--- title: "The `dirichlet()` function in the `hyper2` package" author: "Robin K. S. Hankin" output: html_vignette bibliography: hyper2.bib vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{dirichlet} %\usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE} set.seed(0) library("hyper2") options(rmarkdown.html_vignette.check_title = FALSE) knitr::opts_chunk$set(echo = TRUE) knit_print.function <- function(x, ...){dput(x)} registerS3method( "knit_print", "function", knit_print.function, envir = asNamespace("knitr") ) ``` ```{r out.width='20%', out.extra='style="float:right; padding:10px"',echo=FALSE} knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2")) ``` ```{r, label=showdirichlet,comment=""} dirichlet ``` To cite the `hyper2` package in publications, please use @hankin2017_rmd. We say that non-negative $p_1,\ldots, p_n$ satisfying $\sum p_i=1$ follow a _Dirichlet distribution_, and write $\mathcal{D}(p_1,\ldots,p_n)$, if their density function is \begin{equation} \frac{\Gamma(\alpha_1+\cdots +\alpha_n)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_n)} \prod_{i=1}^n p_i^{\alpha_i-1} \end{equation} for some parameters $\alpha_i>0$. It is interesting in the current context because it is conjugate to the multinomial distribution; specifically, given a Dirichlet prior $\mathcal{D}(\alpha_1,\ldots,\alpha_n)$ and likelihood function $\mathcal{L}(p_1,\ldots,p_n)\propto\prod_{i=1}^n p_i^{\alpha_i}$ then the posterior is $\mathcal{D}(\alpha_1+a_1,\ldots,\alpha_n+a_n)$. In the context of the `hyper2` package, function `dirichlet()` presents some peculiarities, discussed here. ## A simple example Consider a three-way trial between players `a`, `b`, and `c` with eponymous Bradley-Terry strengths. Suppose that, out of $4+5+3=12$ trials, `a` wins 4, `b` wins 5, and `c` wins 3. Then an appropriate likelihood function would be: ```{r dirichletcompleteFALSE} dirichlet(c(a=4,b=5,c=3)) ``` Note the `(a+b+c)^-11` term, which is not strictly necessary according the Dirichlet density function above which has no such term. However, we see that the returned likelihood function is ${\propto} p_{a\vphantom{abc}}^4p_{b\vphantom{abc}}^5p_{c\vphantom{abc}}^3$ (because $p_a+p_b+p_c=1$). ```{r dirichletabc} (L1 <- dirichlet(c(a=4,b=5,c=3))) ``` Now suppose we observe three-way trials between `b`, `c`, and `d`: ```{r dirichletbcd} (L2 <- dirichlet(c(b=1,c=1,d=8))) ``` The overall likelihood function would be $L_1+L_2$: ```{r sumofL1andL2,cache=TRUE} L1+L2 maxp(L1+L2) ``` Observe the dominance of competitor `d`, reasonable on the grounds that `d` won 8 of the 10 trials against `b` and `c`; and `a`, `b`, `c` are more or less evenly matched [chi square test, $p=`r round(chisq.test(c(4,5,3),sim=TRUE)$p.value,2)`$]. Observe further that we can include four-way observations easily: ```{r fourwayobs} (L3 <- dirichlet(c(a=4,b=3,c=2,d=6))) L1+L2+L3 ``` Package idiom `L1+L2+L3` operates as expected because `dirichlet()` includes the denominator. Sometimes we see situations in which a competitor does not win any trials. Consider the following: ```{r} (L4 <- dirichlet(c(a=5,b=3,c=0,d=3))) ``` Above, we see that `c` won no trials and is not present in the numerator of the expression. However, `L4` is informative about competitor `c`: ```{r label=L4testspec,cache=TRUE} maxp(L4) ``` We see that the maximum likelihood estimate for `c` is zero (to within numerical tolerance). Further, we may reject the hypothesis that $p_c=\frac{1}{4}$ [which might be a reasonable consequence of the assumption that all four competitors have the same strength]: ```{r label=testcquarter,cache=TRUE} specificp.test(L4,'c',0.25) ``` However, observe that we cannot reject the equality hypothesis, that is, $p_a=p_b=p_c=p_d=\frac{1}{4}$: ```{r label=testequal,cache=TRUE} equalp.test(L4) ``` ## References