Distribution of the largest fragment of a broken stick (spacings)

A stick of length 1 is broken into $k+1$ fragments uniformly at random. What is the distribution of the length of the longest fragment?

More formally, let $(U_1, \ldots, U_k)$ be IID $U(0,1)$, and let $(U_{(1)}, \ldots, U_{(k)})$ be the associated order statistics, i.e. we simply order the sample in such a way that $U_{(1)} \leq U_{(2)} \leq \ldots \leq U_{(k)}$. Let $Z_k = \max \left(U_{(1)}, U_{(2)}-U_{(1)}, \ldots, U_{(k)} - U_{(k-1)}, 1-U_{(k)}\right)$.

I am interested in the distribution of $Z_k$ . Moments, asymptotic results, or approximations for $k \uparrow \infty$ are also interesting.

— gui11aume

This is a well studied problem; see R. Pyke (1965), "Spacings," JRSS(B) 27:3, pp. 395-449. I'll try to come back to add some information later unless someone beats me to it. There's also a 1972 paper by the same author ("Spacings revisited") but I think what you're after is pretty much all in the first. There's some asymptotics in Devroye (1981), "Laws of the Iterated Logarithm for Order Statistics of Uniform Spacings" Ann. Probab., 9:5, 860-867.

— Glen_b -Reinstate Monica

Those should also give some good search terms to find later work if you need it.

— Glen_b -Reinstate Monica

This is awesome. The first reference is hard to find. For those interested, I put it on The Grand Locus.

— gui11aume

Please correct the misprint: $Y_{(k)}$ instead of $U_{(k)}$.

— Viktor

Thanks @Viktor! For such small things, don't hesitate to do the edit yourself (I think that it will be reviewed by other users for approval).

— gui11aume

With the information given by @Glen_b I could find the answer. Using the same notations as the question

$P(Z_k \leq x) = \sum_{j=0}^{k+1} { k+1 \choose j } (-1)^j (1-jx)_+^k,$

where $a_+ = a$ if $a > 0$ and $0$ otherwise. I also give the expectation and the asymptotic convergence to the Gumbel (NB: not Beta) distribution

$E(Z_k)= \frac{1}{k+1}\sum_{i=1}^{k+1}\frac{1}{i} \sim \frac{\log(k+1)}{k+1}, \\ P(Z_k \leq x) \sim \exp\left(- e^{-(k+1)x + \log(k+1)} \right).$

The material of the proofs is taken from several publications linked in the references. They are somewhat lengthy, but straightforward.
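As a sanity check before the proofs, here is a minimal Python sketch that evaluates the distribution function stated above and compares it with a simulation of the broken stick. The function names, the sample size and the seed are my own choices for illustration. Exact rational arithmetic is used because the alternating sum can cancel badly in floating point.

```python
import math
import random
from fractions import Fraction


def largest_fragment_cdf(k, x):
    """Exact P(Z_k <= x) for a unit stick broken at k uniform points."""
    x = Fraction(x)
    total = Fraction(0)
    for j in range(k + 2):  # j = 0, ..., k + 1
        base = 1 - j * x
        if base <= 0:  # (1 - j x)_+ is zero from this j onwards
            break
        total += (-1) ** j * math.comb(k + 1, j) * base ** k
    return float(total)


def sample_largest_fragment(k, rng):
    """Draw one realisation of Z_k by breaking the stick at k uniform points."""
    edges = [0.0] + sorted(rng.random() for _ in range(k)) + [1.0]
    return max(b - a for a, b in zip(edges, edges[1:]))


def empirical_cdf(k, x, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(Z_k <= x)."""
    rng = random.Random(seed)
    hits = sum(sample_largest_fragment(k, rng) <= x for _ in range(n_samples))
    return hits / n_samples
```

For $k=1$ the largest fragment is $\max(U, 1-U)$, so `largest_fragment_cdf(1, 0.75)` should return $2 \times 0.75 - 1 = 0.5$. For larger $k$, `empirical_cdf(k, x)` should agree with the formula up to Monte Carlo noise.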

1. Proof of the exact distribution

Let $(U_1, \ldots, U_k)$ be IID uniform random variables in the interval $(0,1)$ . By ordering them, we obtain the $k$ order statistics denoted $(U_{(1)}, \ldots, U_{(k)})$ . The uniform spacings are defined as $\Delta_i = U_{(i)} - U_{(i-1)}$ , with $U_{(0)} = 0$ and $U_{(k+1)} = 1$ . The ordered spacings are the corresponding ordered statistics $\Delta_{(1)} \leq \ldots \leq \Delta_{(k+1)}$ . The variable of interest is $\Delta_{(k+1)}$ .

For fixed $x \in (0,1)$ , we define the indicator variable $\mathbb{1}_i = \mathbb{1}_{\{\Delta_i > x\}}$ . By symmetry, the random vector $(\mathbb{1}_1, \ldots, \mathbb{1}_{k+1})$ is exchangeable, so the joint distribution of a subset of size $j$ is the same as the joint distribution of the first $j$ . By expanding the product, we thus obtain

$P(\Delta_{(k+1)} \leq x) = E \left( \prod_{i=1}^{k+1} (1 - \mathbb{1}_i) \right) = 1 + \sum_{j=1}^{k+1} { k+1 \choose j } (-1)^j E \left( \prod_{i=1}^j \mathbb{1}_i \right).$

We will now prove that $E \left( \prod_{i=1}^j \mathbb{1}_i \right) = (1-jx)_+^k$ , which will establish the distribution given above. We prove this for $j=2$ , as the general case is proved similarly.

$E \left( \prod_{i=1}^2 \mathbb{1}_i \right) = P(\Delta_1 > x \cap \Delta_2 > x) = P(\Delta_1 > x) P(\Delta_2 > x | \Delta_1 > x).$

If $\Delta_1 > x$ , the $k$ breakpoints are in the interval $(x,1)$ . Conditionally on this event, the breakpoints are still exchangeable, so the probability that the distance between the second and the first breakpoint is greater than $x$ is the same as the probability that the distance between the first breakpoint and the left barrier (at position $x$ ) is greater than $x$ . So

$P(\Delta_2 > x | \Delta_1 > x) = P\big(\text{all points are in } (2x,1) \big| \text{all points are in } (x,1)\big), \; \text{so} \\ P(\Delta_2 > x \cap \Delta_1 > x) = P\big(\text{all points are in } (2x,1)\big) = (1-2x)_+^k.$
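The identity $E \left( \prod_{i=1}^j \mathbb{1}_i \right) = (1-jx)_+^k$ can also be checked by simulation. The sketch below is only an illustration (the function name, sample size and seed are arbitrary choices of mine). It estimates the probability that the first $j$ spacings all exceed $x$.

```python
import random


def joint_tail_probability(k, j, x, n_samples=50000, seed=1):
    """Estimate P(Delta_1 > x, ..., Delta_j > x) for k uniform breakpoints."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        edges = [0.0] + sorted(rng.random() for _ in range(k)) + [1.0]
        spacings = [b - a for a, b in zip(edges, edges[1:])]
        # Count the sample only if each of the first j spacings exceeds x.
        if all(s > x for s in spacings[:j]):
            hits += 1
    return hits / n_samples
```

With $k=5$ and $x=0.1$, the estimate for $j=2$ should be close to $(1-0.2)^5 = 0.32768$, and the estimate for $j=3$ close to $(1-0.3)^5 = 0.16807$.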

2. Expectation

For a random variable $X$ with support in $[0,1]$, we have

$E(X) = \int_0^1 P(X > x)dx = 1 - \int_0^1 P(X \leq x)dx.$

Integrating the distribution of $\Delta_{(k+1)}$ , we obtain

$E\left(\Delta_{(k+1)}\right) = \frac{1}{k+1}\sum_{j=1}^{k+1}{k+1 \choose j}\frac{(-1)^{j+1}}{j} = \frac{1}{k+1}\sum_{j=1}^{k+1}\frac{1}{j}.$

The last equality is a classic representation of harmonic numbers $H_i = 1+ \frac{1}{2}+ \ldots + \frac{1}{i}$ , which we demonstrate below.

$H_{k+1} = \int_0^1 \left(1 + x + \ldots + x^k\right) dx = \int_0^1 \frac{1-x^{k+1}}{1-x}dx.$

With the change of variable $u = 1-x$ and the binomial expansion of $(1-u)^{k+1}$, we obtain

$H_{k+1} = \int_0^1\sum_{j=1}^{k+1}{ k+1 \choose j }(-1)^{j+1}u^{j-1}du = \sum_{j=1}^{k+1}{k+1 \choose j}\frac{(-1)^{j+1}}{j}.$
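The harmonic number identity used in this section is easy to verify exactly for small values. The following sketch uses rational arithmetic so that the two sides can be compared without rounding. The function names are illustrative only.

```python
import math
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


def alternating_binomial_sum(n):
    """sum_{j=1}^{n} C(n, j) (-1)^(j+1) / j as an exact fraction."""
    return sum(
        Fraction((-1) ** (j + 1) * math.comb(n, j), j) for j in range(1, n + 1)
    )


def expected_largest_fragment(k):
    """E(Z_k) = H_{k+1} / (k + 1) for a stick broken at k uniform points."""
    return harmonic(k + 1) / (k + 1)
```

For instance, `expected_largest_fragment(1)` gives $3/4$, which is the mean of $\max(U, 1-U)$, and `expected_largest_fragment(2)` gives $11/18$.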

3. Alternative construction of uniform spacings

In order to obtain the asymptotic distribution of the largest fragment, we will need to exhibit a classical construction of uniform spacings as exponential variables divided by their sum. The probability density of the associated order statistics $(U_{(1)}, \ldots, U_{(k)})$ is

$f_{U_{(1)}, \ldots U_{(k)}}(u_{(1)}, \ldots, u_{(k)}) = k!, \; 0 \leq u_{(1)} \leq \ldots \leq u_{(k)} \leq 1.$

If we denote the uniform spacings $\Delta_i = U_{(i)} - U_{(i-1)}$ , with $U_{(0)} = 0$ , we obtain

$f_{\Delta_1, \ldots \Delta_k}(\delta_1, \ldots, \delta_k) = k!, \; \delta_i \geq 0, \; \delta_1 + \ldots + \delta_k \leq 1.$

By defining $U_{(k+1)} = 1$ , we thus obtain

$f_{\Delta_1, \ldots \Delta_{k+1}}(\delta_1, \ldots, \delta_{k+1}) = k!, \; \delta_1 + \ldots + \delta_{k+1} = 1.$

Now, let $(X_1, \ldots, X_{k+1})$ be IID exponential random variables with mean 1, and let $S = X_1 + \ldots + X_{k+1}$ . With a simple change of variable, we can see that

$f_{X_1, \ldots X_k, S}(x_1, \ldots, x_k, s) = e^{-s}.$

Define $Y_i = X_i/S$ , such that by a change of variable we obtain

$f_{Y_1, \ldots Y_k, S}(y_1, \ldots, y_k, s) = s^k e^{-s}.$

Integrating this density with respect to $s$ , we thus obtain

$f_{Y_1, \ldots Y_k}(y_1, \ldots, y_k) = \int_0^{\infty}s^k e^{-s}ds = k!, \; y_i \geq 0, \; y_1 + \ldots + y_k \leq 1, \; \text{and thus} \\ f_{Y_1, \ldots Y_{k+1}}(y_1, \ldots, y_{k+1}) = k!, \; y_1 + \ldots + y_{k+1} = 1.$

So the joint distribution of $k+1$ uniform spacings on the interval $(0,1)$ is the same as the joint distribution of $k+1$ exponential random variables divided by their sum. We come to the following equivalence of distribution

$\Delta_{(k+1)} \equiv \frac{X_{(k+1)}}{X_1 + \ldots + X_{k+1}}.$
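The equivalence in distribution can be illustrated numerically by sampling the largest fragment in both ways and comparing the sample means with $H_{k+1}/(k+1)$. This is only a rough sketch with arbitrary sample size and seed, and agreement of the means is of course much weaker than equality of the distributions.

```python
import random


def largest_uniform_spacing(k, rng):
    """Largest of the k + 1 spacings defined by k uniform points on (0, 1)."""
    edges = [0.0] + sorted(rng.random() for _ in range(k)) + [1.0]
    return max(b - a for a, b in zip(edges, edges[1:]))


def largest_normalized_exponential(k, rng):
    """Largest of k + 1 IID mean-1 exponential variables divided by their sum."""
    xs = [rng.expovariate(1.0) for _ in range(k + 1)]
    return max(xs) / sum(xs)


def mean_of(sampler, k, n_samples=20000, seed=2):
    """Sample mean of sampler(k, rng) over n_samples independent draws."""
    rng = random.Random(seed)
    return sum(sampler(k, rng) for _ in range(n_samples)) / n_samples
```

With $k=9$, both `mean_of(largest_uniform_spacing, 9)` and `mean_of(largest_normalized_exponential, 9)` should be close to $H_{10}/10 \approx 0.2929$.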

4. Asymptotic distribution

Using the equivalence above, we obtain

$\begin{aligned} P\big((k+1)\Delta_{(k+1)} - \log(k+1) \leq x\big) &= P\left(X_{(k+1)} \leq (x + \log(k+1))\frac{X_1 + \ldots + X_{k+1}}{k+1}\right) \\ &= P\left(X_{(k+1)} - \log(k+1) \leq x + (x + \log(k+1))T_{k+1}\right), \end{aligned}$

where $T_{k+1} = \frac{X_1+\ldots+X_{k+1}}{k+1} -1$ . The term $(x + \log(k+1))T_{k+1}$ vanishes in probability because $E\left(T_{k+1}\right) = 0$ and $Var\big(\log(k+1)T_{k+1}\big) = \frac{(\log(k+1))^2}{k+1} \downarrow 0$ . Asymptotically, the distribution is the same as that of $X_{(k+1)} - \log(k+1)$ . Because the $X_i$ are IID, we have

$\begin{aligned} P\left(X_{(k+1)} - \log(k+1) \leq x \right) &= P\left(X_1 \leq x + \log(k+1)\right)^{k+1} \\ &= \left(1-e^{-x - \log(k+1)}\right)^{k+1} = \left(1-\frac{e^{-x}}{k+1}\right)^{k+1} \sim \exp\left\{-e^{-x}\right\}. \end{aligned}$
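To get a feeling for how fast the convergence is, one can compare the exact distribution function with the Gumbel approximation on a grid. The sketch below is a rough illustration only. The grid size and the function names are my own choices, and the maximum over a finite grid is not the true supremum of the error.

```python
import math
from fractions import Fraction


def exact_cdf(k, x):
    """Exact P(Z_k <= x), summed in rational arithmetic to avoid cancellation."""
    x = Fraction(x)
    total = Fraction(0)
    for j in range(k + 2):
        base = 1 - j * x
        if base <= 0:  # (1 - j x)_+ is zero from this j onwards
            break
        total += (-1) ** j * math.comb(k + 1, j) * base ** k
    return float(total)


def gumbel_cdf(k, x):
    """Asymptotic approximation exp(-exp(-(k+1) x + log(k+1)))."""
    n = k + 1
    return math.exp(-math.exp(-(n * float(x) - math.log(n))))


def max_abs_error(k, n_grid=200):
    """Largest absolute gap between the two CDFs on an even grid in (0, 1)."""
    grid = [Fraction(i, n_grid) for i in range(1, n_grid)]
    return max(abs(exact_cdf(k, x) - gumbel_cdf(k, x)) for x in grid)
```

Comparing `max_abs_error(10)` with `max_abs_error(100)` shows the gap shrinking as $k$ grows, although slowly, which is what one expects from the $(\log(k+1))^2/(k+1)$ variance term above.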

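5. Graphical overview

The figure below shows the distribution of the largest fragment for different values of $k$. For $k=10, 20, 50$, I have also overlaid the asymptotic Gumbel distribution (thin line). The Gumbel is a very bad approximation for small values of $k$, so I omit it there to avoid overloading the picture. The Gumbel approximation is good from $k \approx 50$.

6. References

The proofs above are taken from references 2 and 3. The cited literature contains many more results, such as the distribution of the ordered spacings of any rank, their limit distributions and some alternative constructions of the ordered uniform spacings. The key references are not easily accessible, so I also provide links to the full text.

1. Bairamov et al. (2010) Limit results for ordered uniform spacings, Stat Papers, 51:1, pp 227-240
2. Holst (1980) On the lengths of the pieces of a stick broken at random, J. Appl. Prob., 17, pp 623-634
3. Pyke (1965) Spacings, JRSS(B) 27:3, pp. 395-449
4. Renyi (1953) On the theory of order statistics, Acta math Hung, 4, pp 191-231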

— gui11aume

Brilliant. By the way, are there known asymptotics for $E(Z_k^2)$?

— Amir Sagiv

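@AmirSagiv This is a good question. I had a quick look at the references and could not find it. I am also unable to adapt the proofs above. This made me realize that I do not know what the distribution of the square of a Gumbel variable is. Perhaps that would be a good place to start?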

— gui11aume

@gui11aume Look here: mathoverflow.net/a/293381/42864

— Amir Sagiv

@AmirSagiv This is a very good post. For some reason, I misunderstood your question and thought you were interested in the asymptotic distribution of $Z_k^2$ (even though your comment was very clear), so my comment above is not so relevant.

— gui11aume

This is not a complete answer, but I did some quick simulations, and this is what I obtained:

[Figure: histogram of the longest fragment]

This looks remarkably beta-ish, and this makes a bit of sense, since the order statistics of i.i.d. uniform distributions are beta distributed (wiki).

This might give some starting point to derive the resulting p.d.f.

I'll update if I get to a final closed solution.

Cheers!

— Lima

One more thing: the shape of the histogram does not change much as $k$ increases, except that it gets "squashed" closer to zero.

— Lima

Thank you for your thoughts @Lima (and welcome to Cross Validated). I think your answer can be improved. First, I would refrain from making statements without proof. If this is incorrect, you may put the people who see this thread on the wrong track. Second, I would document what you did. Without the value of $k$ that you used or the code, the figure does not help anybody. Finally, I would copy-edit the answer and remove everything that is not directly answering the question.

— gui11aume

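Thank you for your suggestions. They are valid beyond Stack Exchange, and I will remember to use them.

— Lima

I wrote up the answer for a conference in Siena (Italy) in 2005. The paper (2006) is available on my website (pdf). The exact distributions of all the spacings (smallest to largest) can be found on pages 75 and 76.

I hope to give a presentation on this topic at the RSS Conference in Manchester (England) in September 2016.

— C.J.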

Welcome to the site. We are trying to build a permanent repository of high-quality statistical information in the form of questions & answers. Thus, we're wary of link-only answers, due to linkrot. Can you post a full citation & a summary of the information at the link, in case it goes dead? Also, please don't sign your posts here. Every post has a link to your userpage where you can post that information.

— gung - Reinstate Monica