# Maximum likelihood estimate of hypergeometric distribution parameter

Using the notation in the Wikipedia article on the hypergeometric distribution, I'm wondering how one would obtain the maximum likelihood estimate for the parameter $m$, the number of white marbles, given $T$ trials from the same urn. For convenience, I'll copy/paste the notation from the article:

Suppose you are to draw $n$ marbles without replacement from an urn containing $N$ marbles in total, $m$ of which are white. The hypergeometric distribution describes the distribution of the number of white marbles drawn from the urn, $k$.

Again, assume I perform $T$ trials. At each trial I take $n$ balls from the urn, and $k_i$ is the number of white balls at trial $i$. Define $K = (k_1,\ldots,k_T)$. Then the likelihood function $L$ is: $$L(m; K, N, n) = \prod_{i=1}^T \frac{\binom{m}{k_i}\binom{N-m}{n-k_i}}{\binom{N}{n}}$$

Taking a hint from this post, I first tried to solve the inequality $$L(m;K,N,n) \geq L(m-1;K,N,n)$$ when $T=1$. From this I got $$m \leq \frac{Nk+k}{n}$$ so the MLE must be $$m = \left\lfloor \frac{Nk+k}{n} \right\rfloor$$
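For reference, the $T=1$ step can be spelled out as follows. This is a sketch that assumes $k < m$ and $N-m-n+k+1 > 0$, so that both denominators are positive. The ratio of consecutive likelihoods is

$$\frac{L(m)}{L(m-1)} = \frac{\binom{m}{k}}{\binom{m-1}{k}}\cdot\frac{\binom{N-m}{n-k}}{\binom{N-m+1}{n-k}} = \frac{m}{m-k}\cdot\frac{N-m-n+k+1}{N-m+1}$$

Requiring this to be at least $1$ and cross-multiplying gives

$$m(N-m-n+k+1) \geq (m-k)(N-m+1) \iff -mn \geq -kN-k \iff m \leq \frac{(N+1)k}{n}$$

Note that when $k=n$ the bound equals $N+1$, so in that edge case the estimate has to be capped at $N$.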

Now, I'm stuck when I try to generalise to $T \geq 2$.

I first tried doing the same as above and ended up with the following unwieldy inequality: $$\prod_{i=1}^T \frac{m}{m-k_i} \geq \prod_{i=1}^T \frac{N-m+1}{N-m-n+k_i+1}$$ which I'm not sure how to solve.

Then I tried taking the log of the likelihood and differentiating as if $m$ were defined over the positive reals, and I ended up with an equally unwieldy equation to solve: $$\sum_{i=1}^T \left(\Psi(m+1) - \Psi(m-k_i+1) - \Psi(N-m+1) + \Psi(N-m-n+k_i+1)\right) = 0$$ where $\Psi$ is the digamma function (i.e. the derivative of the log-gamma function).

My intuition tells me the solution to either of the above would look something like this: $$m = \left\lfloor \frac{(N+1)\sum_{i=1}^T k_i}{Tn} \right\rfloor$$ but I have no idea how to get there.

The motivation for this problem is pure curiosity, since I've never seen an MLE for the hypergeometric distribution in terms of $m$.

The condition you got $$ \prod_{i=1}^T \frac{m}{m-k_i} \geq \prod_{i=1}^T \frac{N-m+1}{N-m-n+k_i+1} $$ is correct. Rewriting it as $$ \prod_{i=1}^T \left(1+\frac{k_i}{m-k_i}\right) \geq \prod_{i=1}^T \left(1+\frac{n-k_i}{N-m-n+k_i+1}\right) $$ makes obvious the fact that the LHS is a decreasing function of $m$ and the RHS is an increasing function of $m$. Let $k^*$ denote the maximum value of $k_i$ over $i$ and $k_*$ the minimum. When $m\to k^*$, $m>k^*$, the LHS goes to infinity and the RHS stays finite. When $m\to N-n+k_*+1$, $m<N-n+k_*+1$, the LHS stays finite and the RHS goes to infinity. These observations prove that there exists a unique maximum likelihood estimator of $m$.
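Because one side decreases and the other increases in $m$, the inequality holds up to some threshold and fails afterwards, so the estimator can be located by bisection over the integers. Below is a minimal Python sketch of that idea. The function names `likelihood`, `ratio_condition` and `mle_white` are made up for illustration, and the search range (from the largest observed $k_i$ up to $N-n$ plus the smallest observed $k_i$) is an assumption that the data are consistent with some value of $m$. If the inequality holds with equality at some integer, both neighbouring values maximise the likelihood and the sketch returns the larger one.

```python
from fractions import Fraction
from math import comb


def likelihood(m, ks, N, n):
    """Exact likelihood L(m; K, N, n) as a Fraction (product over trials)."""
    L = Fraction(1)
    for k in ks:
        L *= Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))
    return L


def ratio_condition(m, ks, N, n):
    """True when L(m) >= L(m-1), written as the product inequality.

    Only valid for max(ks) < m <= N - n + min(ks), where every
    denominator below is strictly positive.
    """
    lhs = Fraction(1)
    rhs = Fraction(1)
    for k in ks:
        lhs *= Fraction(m, m - k)
        rhs *= Fraction(N - m + 1, N - m - n + k + 1)
    return lhs >= rhs


def mle_white(ks, N, n):
    """Largest integer m maximising the likelihood (hypothetical helper)."""
    lo, hi = max(ks), N - n + min(ks)
    if lo > hi:
        raise ValueError("no value of m is consistent with all trials")
    # The condition is monotone in m: true up to a threshold, false after.
    # Bisect for the largest m in (lo, hi] that still satisfies it;
    # if none does, the answer is lo itself.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ratio_condition(mid, ks, N, n):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, `mle_white([2], 20, 5)` reproduces the $T=1$ formula $\lfloor (N+1)k/n \rfloor$, and for several trials the result can be cross-checked against a brute-force maximum of `likelihood` over $m = 0,\ldots,N$. Exact rational arithmetic is used to avoid floating-point ties; for large $N$ or $T$ a log-likelihood version may be more practical.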

Its value is close to (namely, the integer part of) the root of a polynomial in $m$, obtained by clearing denominators in the inequality, whose degree is a priori at most $2T$. The terms of degree $2T$ cancel, and a careful inspection shows that the coefficient of $m^{2T-1}$ is $nT$ up to sign, hence nonzero (this fact alone is enough to prove that there exists at least one solution, since the degree of the polynomial is odd).

In the end, one must find the unique root in the relevant range of a polynomial of degree $2T-1$. Hence a closed-form formula for the maximum likelihood estimator of $m$ is unlikely when $T\ge3$, the cases $T=1$ (linear polynomial) and $T=2$ (degree $3$ polynomial) being solvable.

Here is an approximate solution. The Poisson approximation to the hypergeometric distribution, valid for $\frac{m}{N} \ll 1$ and $n \gg 1$, has the form:

$$P(K = k \mid n, m, N) = \frac{\exp\left(-\frac{nm}{N}\right) \left(\frac{nm}{N}\right)^k}{k!}$$

The likelihood function becomes

$$L(m; K, N, n) = \frac{\exp\left(-\frac{Tnm}{N}\right) \left(\frac{nm}{N}\right)^{\sum_{i=1}^T k_i}}{\prod_{i=1}^T k_i!}$$

which is easily maximized to get:

$$m = \frac{N\sum_{i=1}^T k_i}{Tn}$$
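The maximization step can be spelled out as follows, treating $m$ as a positive real (a sketch under the same Poisson approximation). Taking logs of the approximate likelihood gives

$$\log L(m) = -\frac{Tnm}{N} + \left(\sum_{i=1}^T k_i\right)\log\frac{nm}{N} - \sum_{i=1}^T \log k_i!$$

and setting the derivative with respect to $m$ to zero gives

$$\frac{d}{dm}\log L(m) = -\frac{Tn}{N} + \frac{1}{m}\sum_{i=1}^T k_i = 0 \iff m = \frac{N\sum_{i=1}^T k_i}{Tn}$$

The second derivative, $-\frac{1}{m^2}\sum_{i=1}^T k_i$, is negative whenever at least one $k_i$ is positive, so this stationary point is a maximum. It differs from the floor formula conjectured in the question only by the factor $N$ in place of $N+1$ and by the rounding.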
