There are almost certainly easier ways, but one way to compute the value is to calculate exactly the number of ways of placing $n$ labeled balls into $m$ labeled bins so that no bin contains $k$ or more balls. We can compute this using a simple recurrence. Let $W(n,j,m',k)$ be the number of ways of placing exactly $j$ of the $n$ labeled balls into $m'$ of the $m$ labeled bins. Then the number we seek is $W(n,n,m,k)$. We have the following recurrence:
$$W(n,j,m',k)=\sum_{i=0}^{k-1}\binom{n-j+i}{i}W(n,j-i,m'-1,k)$$ where $W(n,j,m',k)=0$ when $j<0$, $W(n,j,0,k)=0$ when $j>0$, and $W(n,0,0,k)=1$, as there is one way to place no balls in no bins. This follows from the fact that there are $\binom{n-j+i}{i}$ ways to choose the $i$ out of $n-j+i$ remaining balls to put in the $m'$th bin, and there are $W(n,j-i,m'-1,k)$ ways to put $j-i$ balls in $m'-1$ bins.
The essence of this recurrence is that we can compute the number of ways of placing $j$ out of $n$ balls in $m'$ bins by conditioning on the number of balls placed in the $m'$th bin. If we placed $i$ balls in the $m'$th bin, then there were $j-i$ balls in the previous $m'-1$ bins, and we have already computed the number of ways of doing that as $W(n,j-i,m'-1,k)$; there are $\binom{n-j+i}{i}$ ways of choosing the $i$ balls to put in the $m'$th bin (there were $n-j+i$ balls left after we put $j-i$ balls in the first $m'-1$ bins, and we choose $i$ of them). So $W(n,j,m',k)$ is just the sum over $i$ from $0$ to $k-1$ of $\binom{n-j+i}{i}W(n,j-i,m'-1,k)$.
Once we have computed $W(n,n,m,k)$, the probability that at least one bin has at least $k$ balls is $1-\frac{W(n,n,m,k)}{m^n}$.
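For a tiny sanity check of the recurrence, take $n=m=k=2$ (two labeled balls, two bins, and no bin allowed to hold two or more balls). Then

$$W(2,1,1,2)=\binom{1}{0}W(2,1,0,2)+\binom{2}{1}W(2,0,0,2)=0+2=2,$$
$$W(2,2,1,2)=\binom{0}{0}W(2,2,0,2)+\binom{1}{1}W(2,1,0,2)=0,$$
$$W(2,2,2,2)=\binom{0}{0}W(2,2,1,2)+\binom{1}{1}W(2,1,1,2)=0+2=2,$$

so the probability that some bin gets at least $2$ balls is $1-\frac{2}{2^2}=\frac12$, as it should be.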
Coding this in Python (because it has arbitrary-precision integer arithmetic), we have
import sympy  # to get the decimal approximation

# compute the binomial coefficient
def binomial(n, k):
    if k > n or k < 0:
        return 0
    if k > n / 2:
        k = n - k
    if k == 0:
        return 1
    bin = n - (k - 1)
    for i in range(k - 2, -1, -1):
        # exact integer division: each partial product is itself a binomial coefficient
        bin = bin * (n - i) // (k - i)
    return bin

# compute the number of ways that balls can be put in cells such that no
# cell contains fullbin (or more) balls
def numways(cells, balls, fullbin):
    # x[j] holds W(balls, j, m', fullbin) for the number of cells m' processed so far
    x = [1 if i == 0 else 0 for i in range(balls + 1)]
    for j in range(cells):
        x = [sum(binomial(balls - (i - k), k) * x[i - k] if i - k >= 0 else 0
                 for k in range(fullbin))
             for i in range(balls + 1)]
    return x[balls]

x = sympy.Integer(numways(300, 3000, 20)) / sympy.Integer(300**3000)
print(sympy.N(1 - x, 50))
(sympy is just used to get the decimal approximation).
I get the following answer to 50 decimal places
0.64731643604975767318804860342485318214921593659347
This method would not be feasible for much larger values of $m$ and $n$.
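As a rough sanity check, numways can be compared against direct enumeration for parameters small enough to enumerate every assignment; the brute_force helper below is just a sketch for that purpose.

from itertools import product

def brute_force(cells, balls, fullbin):
    # enumerate every assignment of labeled balls to labeled cells and count
    # those in which no cell receives fullbin or more balls
    count = 0
    for assignment in product(range(cells), repeat=balls):
        tally = [0] * cells
        for c in assignment:
            tally[c] += 1
        if max(tally) < fullbin:
            count += 1
    return count

# only feasible for tiny parameters (cells**balls assignments)
for cells, balls, fullbin in [(2, 2, 2), (3, 4, 3), (4, 5, 2)]:
    assert numways(cells, balls, fullbin) == brute_force(cells, balls, fullbin)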
ADDED
As there appears to be some skepticism as to the accuracy of this answer, I ran my own Monte Carlo approximation (in C using the GSL; I used something other than R to avoid any problems that R may have introduced, and avoided Python because the heat death of the universe is happening any time now). In $10^7$ runs I got 6471264 hits. This seems to agree with my count, and disagrees considerably with whuber's. The code for the Monte Carlo is attached.
I have finished a run of $10^8$ trials and got 64733136 hits, for a probability of 0.64733136. I am fairly sure that things are working correctly.
#include <stdio.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>

const gsl_rng_type * T;
gsl_rng * r;

/* Count the runs in which some cell receives at least `limit` balls. */
int
testrand(int cells, int balls, int limit, int runs) {
    int run;
    int count = 0;
    int *array = malloc(cells * sizeof(int));
    for (run = 0; run < runs; run++) {
        int i;
        int hit = 0;
        for (i = 0; i < cells; i++) array[i] = 0;
        for (i = 0; i < balls; i++) {
            array[gsl_rng_uniform_int(r, cells)]++;
        }
        for (i = 0; i < cells; i++) {
            if (array[i] >= limit) {
                hit = 1;
                break;
            }
        }
        count += hit;
    }
    free(array);
    return count;
}

int
main (void)
{
    int i, n = 10;
    gsl_rng_env_setup();
    T = gsl_rng_default;
    r = gsl_rng_alloc (T);
    for (i = 0; i < n; i++)
    {
        printf("%d\n", testrand(300, 3000, 20, 10000000));
    }
    gsl_rng_free (r);
    return 0;
}
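Assuming GSL and its headers are installed and the source is saved as, say, testrand.c, it should build with something like gcc testrand.c -lgsl -lgslcblas -lm -o testrand.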
EVEN MORE
Note: this should really be a comment on probabilityislogic's answer, but it will not fit.
Reifying probabilityislogic's answer (mostly out of curiosity), this time in R, because a foolish consistency is the hobgoblin of little minds, or some such. This is the normal approximation from the Levin paper (the Edgeworth expansion should be straightforward, but it is more typing than I am willing to do).
# an implementation of the Bruce Levin article; here limit is the upper limit on
# bin size that does not count
approxNorm <- function(balls, cells, limit) {
  # using N = s
  sp <- balls / cells
  mu <- sp * (1 - dpois(limit, sp) / ppois(limit, sp))
  sig2 <- mu - (limit - mu) * (sp - mu)
  x <- (balls - cells * mu) / sqrt(cells * sig2)
  p2 <- exp(-x^2 / 2) / sqrt(2 * pi * cells * sig2)
  p1 <- exp(ppois(limit, sp, log.p=TRUE) * cells)
  sqrt(2 * pi * balls) * p1 * p2
}
and 1 - approxNorm(3000, 300, 19) gives us $p(3000, 300, 20) \approx 0.6468276$, which is not bad at all.
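The same approximation is easy to transliterate into Python as a cross-check, assuming scipy is available (approx_norm below is just a sketch mirroring the R function above):

import math
from scipy.stats import poisson

def approx_norm(balls, cells, limit):
    # a direct transliteration of the R function above
    sp = balls / cells
    mu = sp * (1 - poisson.pmf(limit, sp) / poisson.cdf(limit, sp))
    sig2 = mu - (limit - mu) * (sp - mu)
    x = (balls - cells * mu) / math.sqrt(cells * sig2)
    p2 = math.exp(-x**2 / 2) / math.sqrt(2 * math.pi * cells * sig2)
    p1 = math.exp(poisson.logcdf(limit, sp) * cells)
    return math.sqrt(2 * math.pi * balls) * p1 * p2

print(1 - approx_norm(3000, 300, 19))  # should be close to 0.6468276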