Avalanche-like stochastic process

Consider the following process.

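There are $n$ bins arranged from top to bottom. Initially, each bin contains one ball. In every step, we pick a ball $b$ uniformly at random and move all the balls from the bin containing $b$ to the bin below it. If it was already the lowest bin, we remove the balls from the process.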

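How many steps does it take until the process terminates, i.e., until all $n$ balls have been removed from the process? Has this been studied before? Does the answer follow easily from known techniques?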

In the best case, the process can finish after $n$ steps. In the worst case, it can take $\Theta(n^2)$ steps. Both cases are very unlikely, though. My conjecture is that it takes $\Theta(n\log n)$ steps, and I did some experiments which seem to confirm this.

Note that picking a bin uniformly at random is an entirely different process, which will obviously take $\Theta(n^2)$ steps to finish.
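For anyone who wants to repeat the experiments, here is a minimal simulation sketch of the process as described above. The function names `simulate` and `average_steps`, the bin numbering (bin 1 is the lowest, ball `b` starts in bin `b`), and the seed are my own assumptions for illustration, not part of the question.

```python
import math
import random


def simulate(n, rng):
    """Run the process once on n bins; return the number of steps until empty."""
    # pos[b] = current bin of ball b. Assumption: bin 1 is the lowest bin
    # and ball b starts in bin b. Removed balls are deleted from pos.
    pos = {b: b for b in range(1, n + 1)}
    steps = 0
    while pos:
        # Pick a ball uniformly at random among the balls still in the system.
        b = rng.choice(list(pos))
        h = pos[b]
        # Move every ball in that bin one bin down, or out of the system.
        for x in [x for x, hx in pos.items() if hx == h]:
            if h == 1:
                del pos[x]
            else:
                pos[x] = h - 1
        steps += 1
    return steps


def average_steps(n, runs, seed=0):
    """Average the step count over several independent runs."""
    rng = random.Random(seed)
    return sum(simulate(n, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    n = 50
    print("simulated average:", average_steps(n, runs=200))
    print("n*log2(n+1):      ", n * math.log2(n + 1))
    print("n*ln(n+1):        ", n * math.log(n + 1))
```

Each step lowers the total height of the balls by at least one, so a single run always takes between $n$ and $n(n+1)/2$ steps, which is a cheap sanity check on the code.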

pr.probability markov-chains stochastic-process

— Matthias

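The question looks interesting (although I do not know the answer). It seems difficult because of its non-monotonicity: if all $n$ balls are in the top bin, then the process terminates in exactly $n$ steps.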

— Tsuyoshi Ito

Answers:

Not really an answer, but an extended comment on András' answer.

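András' answer contains some nice intuition, though it is not a rigorous calculation of the expected number of steps. I think it is a good approximation to the answer, but it does not seem to properly handle the cases where a bin below the highest occupied bin becomes empty before the top bin has been emptied down into it. Nonetheless, this may be a reasonable approximation (I'm not sure).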

His calculation contains an error which affects the scaling. I'll take exactly the same starting point, then redo and extend the calculation.

It misses a factor of $p$ within the summation, since the probability of randomly picking the correct bin is $\frac{p}{n}$ rather than $\frac{1}{n}$. As a result we have

$\begin{eqnarray*} n + \sum_{p=1}^n \sum_{k=0}^{\infty} (k+1) \frac{p}{n} \left(\frac{n-p}{n}\right)^k & = & n + \sum_{p=1}^{n} \frac{p}{n} \sum_{k=0}^{\infty} (k+1) \left(\frac{n-p}{n}\right)^k \\& = & n + \sum_{p=1}^{n} \frac{p}{n} \cdot \frac{n^2}{p^2} \\& = & n + n\sum_{p=1}^{n} 1/p \\& = & n (1+H_n) \end{eqnarray*}$

where $H_n = \sum_{p=1}^{n} 1/p$ is the nth Harmonic number. To approximate $H_n$ we can simply replace the summation with an integral: $H_n \approx \int_{1}^{n+1} \frac{1}{x} dx = \log(n+1)$ . Thus the scaling is $n (1+\log(n+1))$ or approximately $n \log(n+1)$ . While this scaling does not match the scaling of the problem exactly (see simulation below) it is out by almost exactly a factor of $\log(2)$ .
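As a quick numerical sanity check of the algebra (not of the modelling assumptions behind it), the sketch below evaluates the double sum directly with a truncated inner series and compares it with $n(1+H_n)$ and with $n(1+\log(n+1))$. The function names and the truncation length `terms` are my own choices for illustration.

```python
import math


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1 / p for p in range(1, n + 1))


def series_value(n, terms):
    """n + sum_{p=1}^{n} sum_{k=0}^{terms-1} (k+1) (p/n) ((n-p)/n)^k."""
    total = float(n)
    for p in range(1, n + 1):
        q = (n - p) / n
        # Truncated inner series; it converges to 1/(1-q)^2 = n^2/p^2.
        total += (p / n) * sum((k + 1) * q**k for k in range(terms))
    return total


def closed_form(n):
    """The closed form n (1 + H_n) from the derivation above."""
    return n * (1 + harmonic(n))


if __name__ == "__main__":
    for n in (10, 100):
        # terms = 50 n keeps the truncation error of the slowest series negligible.
        print(n, series_value(n, terms=50 * n), closed_form(n),
              n * (1 + math.log(n + 1)))
```

The integral approximation is slightly loose: $H_n - \log(n+1)$ stays between 0 and the Euler–Mascheroni constant (about 0.577), so it only changes the lower-order term, not the $n\log n$ scaling.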

Simulation vs theory

Red circles: Data points from simulation of process averaged over 10k runs. Green: $n \log_2(n+1)$ . Blue: $n \log(n+1)$ .

— Joe Fitzsimons

@Joe: Nice work! It would be interesting to now show rigorously how the $\ln 2$ factor comes in from the creation of gaps.

— András Salamon

@András: I don't really have a good feeling for if this is a sound approximation to make or not. @Peter's idea of bunches forming which shift down, seems like it should give the correct expression assuming that these are equally likely to form in any bin.

— Joe Fitzsimons

@Joe: The topmost ball will remain isolated in almost 1/3 of the cases. Consider the top 3 balls. If the middle one is picked first (out of those 3), it will join the third one. These two will, from then on, move twice as fast as the top ball. The distance between them and the top ball is a heavily biased random walk and the probability for the top ball to catch up is bounded by a small(ish) constant (rough estimate 15%). But the good news is that the top $\log n$ balls shouldn't really matter. If everything else is cleared in $n\log n$ steps, they will only add additional $n\log n$ steps.

— Matthias

Here are two plots. Both show the number of steps divided by $n$, until everything but $\log n$ balls are cleared. For the first one, balls that drop out of the system can still be picked (like András proposed it): tinyurl.com/2wg7a9y . For the second one, balls that drop out of the system are not picked anymore: tinyurl.com/33b63pq . As you can see, the bounds the first process can give are probably too weak. Maybe it can be tuned by considering phases (like Peter wrote somewhere) in which we always halve the number of balls in the system?
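A rough sketch of how the two variants could be compared is given below. It rests on my own assumptions: "all but $\log n$ balls" is read as stopping once at most $\lfloor \ln n \rfloor$ balls remain, and picking an already removed ball is treated as a wasted step. The function name and the flag `pick_removed` are hypothetical; this is not the code behind the linked plots.

```python
import math
import random


def steps_until_few_left(n, pick_removed, rng):
    """Count steps until at most floor(ln n) balls remain in the system.

    pick_removed=True : every step picks one of the original n balls;
                        picking a ball that has already left wastes the step.
    pick_removed=False: every step picks among the balls still present.
    """
    pos = {b: b for b in range(1, n + 1)}  # assumption: bin 1 is the lowest
    target = int(math.log(n))              # assumed reading of "log n balls"
    steps = 0
    while len(pos) > target:
        steps += 1
        if pick_removed:
            b = rng.randint(1, n)
            if b not in pos:
                continue  # ball already left the system: nothing moves
        else:
            b = rng.choice(list(pos))
        h = pos[b]
        # Move the whole bin one level down, or out of the system.
        for x in [x for x, hx in pos.items() if hx == h]:
            if h == 1:
                del pos[x]
            else:
                pos[x] = h - 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    n, runs = 60, 100
    for flag in (True, False):
        avg = sum(steps_until_few_left(n, flag, rng) for _ in range(runs)) / runs
        print("pick_removed =", flag, " steps/n =", avg / n)
```

In either variant a run needs at least $n - \lfloor \ln n \rfloor$ steps, because the highest ball that gets removed started at least that many bins up.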

— Matthias

@Matthias: Analyzing the expected time assuming Peter's intuition is correct is not the roadblock (at least from my perspective). To me, proving that this intuition is in fact a fair reflection of what happens is necessary first, though I do suspect it is a good approximation.

— Joe Fitzsimons

Edit: I am leaving this answer as is (for now) to illustrate the messy process of proving theorems, something that is left out of published papers. The core intuition here is that it is enough to focus on the top ball, as it sweeps away all below it. Please see the comments (in particular @Michael pointing out that gaps can occur) and @Joe's later answer for how errors were identified and corrected. I especially like Joe's use of experiments to double-check that the formulas were sensible.

The lower bound is $n$ as you point out, but somewhat surprisingly there seems to be an upper bound of $(1 + \pi^2/6)n$ for the expected number of steps.

To derive this, note that a sequence of balls will clear all the bins precisely if it contains a subsequence $b_1b_2\cdots b_n$ such that $b_1 = n$ , $b_2 \ge n-1$ , $\dots$ , $b_i \ge n-i+1$ . Additional conditions are necessary on the sequence to avoid balls being chosen that are no longer in the system, but for the purposes of an upper bound, suppose that there is an infinite decreasing sequence of bins (so the balls don't disappear when leaving bin 1, but are moved to bin 0, then bin -1, and so on). Then the expected number of steps for such a subsequence to be seen is the expected number of steps before $b_1$ is seen, plus the expected number of steps before $b_2$ is seen, and so on (down to 1, since $b_n$ can be any of the numbers $1,2,\ldots,n$ ). These can be seen as separate events, one after the other. The expected number of steps is then

$\begin{eqnarray*}n + \sum_{p=1}^n \sum_{k=0}^{\infty} \frac{k+1}{n} \left(\frac{n-p}{n}\right)^k & = & n + \sum_{p=1}^{n-1} \frac{1}{n-p} \sum_{k=1}^{\infty} k\left(\frac{n-p}{n}\right)^k \\& = & n + \sum_{p=1}^{n-1} \frac{1}{n-p} n(n-p)/p^2 \\& = & n + n\sum_{p=1}^{n-1} 1/p^2 \\& \le & (1 + \pi^2/6)n. \end{eqnarray*}$

— András Salamon

@Andras @Joe: Holy schmoley. If all the people asking the questions on this site took their questions as seriously as you take answering them, this would be the badassest url on the internet.

— Aaron Sterling

@András: I'm trying to understand your statement "a sequence of balls will clear all the bins precisely if it contains a subsequence...". Maybe I've misunderstood something, but say we have four balls. If the sequence is 3,4,3,2,4 then it seems to satisfy your subsequence requirement, yet not all the bins have been cleared.
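The counterexample is easy to replay mechanically. The sketch below assumes that the numbers in the sequence are ball labels and that ball $i$ starts in bin $i$, with bin 1 the lowest; the helper name `replay` is mine.

```python
def replay(n, picks):
    """Apply a fixed sequence of ball picks; return {ball: bin} for balls left."""
    pos = {b: b for b in range(1, n + 1)}  # assumption: ball i starts in bin i
    for b in picks:
        if b not in pos:
            continue  # ball already removed, so the pick does nothing
        h = pos[b]
        # Move every ball sharing that bin one bin down, or out of the system.
        for x in [x for x, hx in pos.items() if hx == h]:
            if h == 1:
                del pos[x]
            else:
                pos[x] = h - 1
    return pos


print(replay(4, [3, 4, 3, 2, 4]))  # {4: 2}: ball 4 is still sitting in bin 2
```

The picks 4, 3, 2, 4 form a subsequence of the required shape, yet balls 1, 2 and 3 leave together while ball 4 has only moved down two bins.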

— Michael

@András: If you want to show a reasonable upper bound, you have to use the fact that balls disappear from the process and are no longer picked. Otherwise, the topmost ball is always only picked with probability $1/n$ and there is a good chance (maybe slightly less than 1/2) that this ball will stay isolated the whole time. For this ball, you will need $n^2$ steps.

— Matthias

@Michael: I think you have identified the mistake. I'm assuming falsely that the top ball will move down even if there is a gap.

— András Salamon

Here's my intuition. After a few steps, some clump of balls is going to be larger than any other clump of balls. At this point, the clump moves faster than everything else, clears everything below it and falls out of the system. This whole process should take $O(n)$ or maybe $O(n \log n)$ steps. This first clump is uniformly distributed in the line, so on average it takes half the balls with it. Now, we're left with a system of around $n/2$ balls, and another clump forms. So after around $\log n$ clumps, we're done.

— Peter Shor