Does MLE require iid data? Or just independent parameters?


To estimate parameters using maximum likelihood estimation (MLE), you have to evaluate the likelihood function, which maps the probability of observing the sample (X) to values (x) over the parameter space (θ), given a distribution family P(X = x | θ) (note: am I right about this?). All the examples I have seen compute P(X = x | θ) by taking the product of F(X), where F is the distribution with the local value of θ and X is the sample (a vector).

Since you multiply the data together, does that mean the data have to be independent? For example, could I not use MLE on time-series data? Or do the parameters just need to be independent?

Answers:


The likelihood function is defined as the probability of an event $E$ (the data set $x$), expressed as a function of the model parameters $\theta$:

$$L(\theta; x) \propto P(\text{Event } E; \theta) = P(\text{observing } x; \theta).$$

Therefore, there is no assumption about the independence of the observations. In the classical approach there is no definition of independence of the parameters, because the parameters are not random variables. Some related concepts are identifiability, parameter orthogonality, and independence of the maximum likelihood estimators (which are random variables).

Some examples

(1) Discrete case. Let $x = (x_1, \ldots, x_n)$ be a sample of (independent) discrete observations with $P(\text{observing } x_j; \theta) > 0$; then

$$L(\theta; x) \propto \prod_{j=1}^{n} P(\text{observing } x_j; \theta).$$

In particular, if $x_j \sim \text{Binomial}(N, \theta)$ with $N$ known, we have

$$L(\theta; x) \propto \prod_{j=1}^{n} \theta^{x_j}(1-\theta)^{N - x_j}.$$
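As a concrete illustration of case (1), here is a small Python sketch (the sample values and the known $N$ are made up for illustration) that maximizes this binomial log-likelihood numerically and compares the result with the closed-form answer $\hat{\theta} = \sum_j x_j / (nN)$:

```python
# Minimal sketch of example (1): iid Binomial(N, theta) observations,
# maximizing the log-likelihood numerically. Data are hypothetical; N is known.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

N = 10                                   # known number of trials
x = np.array([3, 5, 4, 6, 2, 5, 4])      # hypothetical sample x_1, ..., x_n

def neg_log_likelihood(theta):
    # log L(theta; x) = sum_j log P(X = x_j; theta), valid because the draws are independent
    return -np.sum(binom.logpmf(x, N, theta))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("numerical MLE:", res.x)
print("closed form  :", x.sum() / (N * len(x)))   # sum(x_j) / (n * N)
```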

(2) Continuous approximation. Let $x = (x_1, \ldots, x_n)$ be a sample from a continuous random variable $X$ with distribution $F$ and density $f$, observed with measurement error $\epsilon$; that is, you observe the sets $(x_j - \epsilon, x_j + \epsilon)$. Then

$$L(\theta; x) \propto \prod_{j=1}^{n} P[\text{observing } (x_j - \epsilon, x_j + \epsilon); \theta] = \prod_{j=1}^{n} \left[F(x_j + \epsilon; \theta) - F(x_j - \epsilon; \theta)\right].$$

When $\epsilon$ is small, this can be approximated (using the Mean Value Theorem) by

$$L(\theta; x) \propto \prod_{j=1}^{n} f(x_j; \theta).$$

For an example with the normal case, take a look at this.
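Here is a small Python sketch of the approximation in case (2), under the assumption of normally distributed data with known unit variance (the sample is simulated purely for illustration): for small $\epsilon$, the interval-probability likelihood and the density-based likelihood are maximized at essentially the same value.

```python
# Minimal sketch of example (2): the interval likelihood
# prod_j [F(x_j + eps) - F(x_j - eps)] and the density likelihood prod_j f(x_j)
# give (essentially) the same maximizer for the mean of a normal sample.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)   # hypothetical sample
eps = 1e-3                                    # small measurement-error half-width

def nll_interval(mu):
    p = norm.cdf(x + eps, loc=mu) - norm.cdf(x - eps, loc=mu)
    return -np.sum(np.log(p))

def nll_density(mu):
    return -np.sum(norm.logpdf(x, loc=mu))

mle_interval = minimize_scalar(nll_interval, bounds=(-10, 10), method="bounded").x
mle_density = minimize_scalar(nll_density, bounds=(-10, 10), method="bounded").x
print(mle_interval, mle_density, x.mean())    # all three agree closely
```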

(3) Dependent and Markov model. Suppose that $x = (x_1, \ldots, x_n)$ is a set of possibly dependent observations, and let $f$ be the joint density of $x$; then

$$L(\theta; x) \propto f(x; \theta).$$

If additionally the Markov property is satisfied, then

$$L(\theta; x) \propto f(x; \theta) = f(x_1; \theta) \prod_{j=1}^{n-1} f(x_{j+1} \mid x_j; \theta).$$

Take also a look at this.
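A small Python sketch of case (3), using a Gaussian AR(1) process as the Markov chain (the series is simulated and the innovation variance is fixed at 1 purely for illustration); the likelihood is built exactly as above, from the marginal term for $x_1$ and the conditional terms $f(x_{j+1} \mid x_j; \theta)$:

```python
# Minimal sketch of example (3): dependent (Markov) data, Gaussian AR(1).
# Likelihood factored as f(x_1) * prod_j f(x_{j+1} | x_j); estimate phi.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(1)
phi_true, n = 0.6, 200
x = np.zeros(n)
x[0] = rng.normal(scale=1 / np.sqrt(1 - phi_true**2))   # stationary start
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

def neg_log_likelihood(phi):
    # marginal term for x_1 (stationary distribution of the chain) ...
    ll = norm.logpdf(x[0], scale=1 / np.sqrt(1 - phi**2))
    # ... plus the conditional terms f(x_{j+1} | x_j; phi)
    ll += np.sum(norm.logpdf(x[1:], loc=phi * x[:-1], scale=1.0))
    return -ll

res = minimize_scalar(neg_log_likelihood, bounds=(-0.99, 0.99), method="bounded")
print("MLE of phi:", res.x)
```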


From the moment you write the likelihood function as a product, you are implicitly assuming a dependence structure among the observations. So for MLE one needs two assumptions: (a) one on the distribution of each individual outcome, and (b) one on the dependence among the outcomes.

(+1) Very good question.

Minor thing, MLE stands for maximum likelihood estimate (not multiple), which means that you just maximize the likelihood. This does not specify that the likelihood has to be produced by IID sampling.

If the dependence of the sampling can be written in the statistical model, you just write the likelihood accordingly and maximize it as usual.

The one case worth mentioning where you do not assume independence is that of multivariate Gaussian sampling (in time series analysis, for example). The dependence between two Gaussian variables can be modelled by their covariance term, which you incorporate in the likelihood.

To give a simplistic example, assume that you draw a sample of size 2 from correlated Gaussian variables with the same mean and variance. You would write the likelihood as

$$\frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}} \exp\left(-\frac{z}{2\sigma^2(1-\rho^2)}\right),$$

where z is

$$z = (x_1-\mu)^2 - 2\rho(x_1-\mu)(x_2-\mu) + (x_2-\mu)^2.$$

This is not the product of the individual likelihoods. Still, you would maximize this with parameters $(\mu, \sigma, \rho)$ to get their MLE.
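A small Python sketch of this, with the simplistic two-observation example extended (as an assumption of the sketch) to several independent correlated pairs so that the maximizer is well-behaved; within each pair the likelihood is the joint density above, not the product of the two marginal densities:

```python
# Minimal sketch: bivariate Gaussian pairs with common mean/variance and
# correlation rho; the per-pair likelihood is the joint density, maximized
# over (mu, sigma, rho). Data are simulated purely for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
mu_true, sigma_true, rho_true = 1.0, 2.0, 0.7
cov = sigma_true**2 * np.array([[1.0, rho_true], [rho_true, 1.0]])
pairs = rng.multivariate_normal([mu_true, mu_true], cov, size=100)

def neg_log_likelihood(params):
    mu, log_sigma, rho = params
    sigma = np.exp(log_sigma)                       # keeps sigma > 0
    x1, x2 = pairs[:, 0], pairs[:, 1]
    z = (x1 - mu) ** 2 - 2 * rho * (x1 - mu) * (x2 - mu) + (x2 - mu) ** 2
    log_lik = (-np.log(2 * np.pi * sigma**2 * np.sqrt(1 - rho**2))
               - z / (2 * sigma**2 * (1 - rho**2)))
    return -np.sum(log_lik)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0],
               bounds=[(None, None), (None, None), (-0.99, 0.99)])
print("MLE of (mu, sigma, rho):", res.x[0], np.exp(res.x[1]), res.x[2])
```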


These are good answers and examples. The only thing I would add, to see this in simple terms, is that likelihood estimation only requires that a model for the generation of the data be specified in functional form in terms of some unknown parameters.
Michael R. Chernick

(+1) Absolutely true! Do you have an example of a model that cannot be specified in those terms?
gui11aume

@gui11aume I think you are referring to my remark. I would say that I was not giving a direct answer to the question. The answer to the question is yes, because there are examples that can be shown where the likelihood function can be expressed when the data are generated by dependent random variables.
Michael R. Chernick

Examples where this cannot be done would be where the data are given without any description of the data-generating mechanism, or where the model is not presented in parametric form; for example, when you are given two iid data sets and asked to test whether they come from the same distribution, specifying only that the distributions are absolutely continuous.
Michael R. Chernick

Of course, Gaussian ARMA models possess a likelihood, as their covariance function can be derived explicitly. This is basically an extension of gui11aume's answer to more than 2 observations. Minimal googling produces papers like this one where the likelihood is given in the general form.
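A small Python sketch of this idea for the simplest such model, a Gaussian AR(1) (simulated purely for illustration, with unit innovation variance): the autocovariance function $\gamma(h) = \phi^{|h|}/(1-\phi^2)$ is written down explicitly, so the exact likelihood is a single multivariate normal density rather than a product over observations.

```python
# Minimal sketch: exact Gaussian AR(1) likelihood via the explicit
# (Toeplitz) covariance function, maximized over phi. Hypothetical data.
import numpy as np
from scipy.linalg import toeplitz
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
phi_true, n = 0.5, 150
x = np.zeros(n)
x[0] = rng.normal(scale=1 / np.sqrt(1 - phi_true**2))
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

def neg_log_likelihood(phi):
    gamma = phi ** np.arange(n) / (1 - phi**2)    # autocovariances, sigma^2 = 1
    return -multivariate_normal(mean=np.zeros(n), cov=toeplitz(gamma)).logpdf(x)

res = minimize_scalar(neg_log_likelihood, bounds=(-0.99, 0.99), method="bounded")
print("MLE of phi:", res.x)
```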

Another, and to an extent more intriguing, class of examples is given by multilevel random-effect models. If you have data of the form

$$y_{ij} = x_{ij}\beta + u_i + \epsilon_{ij},$$
where indices $j$ are nested in $i$ (think of students $j$ in classrooms $i$, say, for a classic application of multilevel models), then, assuming $\epsilon_{ij} \perp u_i$, the likelihood is
$$\ln L \propto \sum_i \ln \int \prod_j f(y_{ij} \mid \beta, u_i) \, \mathrm{d}F(u_i)$$
and is a sum over the likelihood contributions defined at the level of clusters, not individual observations. (Of course, in the Gaussian case, you can push the integrals around to produce an analytic ANOVA-like solution. However, if you have, say, a logit model for your response $y_{ij}$, then there is no way out of numerical integration.)
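A small Python sketch of that logit case, where the $u_i$ really do have to be integrated out numerically; here each cluster's integral over $u_i \sim N(0, \sigma_u^2)$ is approximated by Gauss-Hermite quadrature (the data, the single covariate, and the quadrature order are all hypothetical choices of this sketch):

```python
# Minimal sketch: random-intercept logit, marginal likelihood per cluster
# approximated by Gauss-Hermite quadrature over u_i ~ N(0, sigma_u^2).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(4)
n_clusters, n_per = 50, 20
beta_true, sigma_u_true = 0.8, 1.0
x = rng.normal(size=(n_clusters, n_per))
u = rng.normal(scale=sigma_u_true, size=n_clusters)
y = rng.binomial(1, expit(beta_true * x + u[:, None]))

nodes, weights = np.polynomial.hermite.hermgauss(20)   # Gauss-Hermite rule

def neg_log_likelihood(params):
    beta, log_sigma_u = params
    sigma_u = np.exp(log_sigma_u)
    ll = 0.0
    for i in range(n_clusters):
        # quadrature points for the integral over u_i ~ N(0, sigma_u^2)
        u_vals = np.sqrt(2) * sigma_u * nodes                   # (K,)
        eta = beta * x[i][None, :] + u_vals[:, None]            # (K, n_per)
        p = expit(eta)
        cluster_lik = np.prod(np.where(y[i] == 1, p, 1 - p), axis=1)   # (K,)
        ll += np.log(np.sum(weights * cluster_lik) / np.sqrt(np.pi))
    return -ll

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
print("MLE of (beta, sigma_u):", res.x[0], np.exp(res.x[1]))
```

Each term in the outer sum is a cluster-level likelihood contribution, exactly as in the expression above.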

StasK and @gui11aume, these three answers are nice, but I think they miss a point: what about the consistency of the MLE for dependent data?
Stéphane Laurent
Licensed under cc by-sa 3.0 with attribution required.