Example of a statistic that is not independent of the sample's distribution?

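This is the definition of a statistic from Wikipedia:

"More formally, statistical theory defines a statistic as a function of a sample where the function itself is independent of the sample's distribution; that is, the function can be stated before realization of the data. The term statistic is used both for the function and for the value of the function on a given sample."

I think I understand most of this definition, but I haven't been able to sort out the part "where the function is independent of the sample's distribution".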

My understanding of a statistic

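A sample is a set of some number of realizations of independent, identically distributed (iid) random variables with distribution F (10 realizations of a roll of a 20-sided fair die, 100 realizations of 5 rolls of a 6-sided fair die, 100 people drawn at random from a population).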

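A function whose domain is that set and whose range is the real numbers (or perhaps something else, such as a vector or another mathematical object) would be considered a statistic.

When I think of examples, the mean, median, and variance all make sense in this context. They are functions on a set of realizations (blood pressure measurements from a random sample). I can also see how a linear regression model could be considered a statistic: $y_{i} = \alpha + \beta \cdot x_{i}$. Is this not just a function on a set of realizations?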

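Where I'm confused

Assuming my understanding above is correct, I haven't been able to see where a function might fail to be independent of the sample's distribution. I've been trying to think of an example to make sense of it, but no luck. Any insight would be much appreciated!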

mathematical-statistics definition

— Jake Kirsch

Answers:

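That definition is a somewhat awkward way of putting it. A "statistic" is any function of the observable values. All the definition means is that a statistic is a function only of the observable values, not a function of the distribution or any of its parameters. For example, if $X_1, X_2, \ldots, X_n \sim \text{N}(\mu, 1)$ then a statistic would be any function $T(X_1, \ldots, X_n)$, whereas a function $H(X_1, \ldots, X_n, \mu)$ would not be a statistic, since it depends on $\mu$. Here are some further examples: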

$$
\begin{aligned}
\text{Statistic} & \qquad \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i, \\[12pt]
\text{Statistic} & \qquad S_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2, \\[12pt]
\text{Not a statistic} & \qquad D_n = \bar{X}_n - \mu, \\[12pt]
\text{Not a statistic} & \qquad p_i = \text{N}(x_i | \mu, 1), \\[12pt]
\text{Not a statistic} & \qquad Q = 10 \mu.
\end{aligned}
$$
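One way to picture the distinction in the examples above is through function signatures. The following is a minimal Python sketch, not part of the original answer; the function names (`sample_mean`, `sample_variance`, `centred_mean`) and the toy data are illustrative assumptions.

```python
def sample_mean(xs):
    """Statistic: computed from the observed values only."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Statistic: S_n^2 with the 1/n divisor, as in the example above."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def centred_mean(xs, mu):
    """Not a statistic: D_n needs the unknown parameter mu as an input."""
    return sample_mean(xs) - mu


xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical observed sample
print(sample_mean(xs))       # needs nothing beyond the data
print(sample_variance(xs))   # needs nothing beyond the data
# centred_mean(xs) cannot be evaluated: mu is not contained in the data.
```

The first two functions can be evaluated as soon as the data arrive; the third cannot be evaluated without being told the value of $\mu$.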

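Every statistic is a function only of the observable values, not of their distribution or its parameters. So there are no examples of a statistic that is a function of the distribution or its parameters (any such function would not be a statistic). However, it is important to note that the distribution of a statistic (as opposed to the statistic itself) will generally depend on the underlying distribution of the values. This is true of all statistics other than ancillary statistics.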
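The difference between a statistic and the distribution of that statistic can be checked with a small simulation. This is only a sketch under assumed settings (sample size 25, 2000 replications, a fixed seed, and the helper name `simulate_means` are all choices made here for illustration): the same function is applied to data drawn from $\text{N}(0, 1)$ and from $\text{N}(5, 1)$.

```python
import random
import statistics


def sample_mean(xs):
    """The statistic itself: its definition never mentions mu."""
    return sum(xs) / len(xs)


def simulate_means(mu, n=25, reps=2000, seed=0):
    """Draw `reps` samples of size n from N(mu, 1); return their sample means."""
    rng = random.Random(seed)
    return [
        sample_mean([rng.gauss(mu, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]


means_mu0 = simulate_means(mu=0.0)
means_mu5 = simulate_means(mu=5.0)

# Same function in both runs; only the data-generating distribution changed.
print(statistics.mean(means_mu0))  # centred near 0
print(statistics.mean(means_mu5))  # centred near 5
```

The function `sample_mean` is identical in both runs, yet the simulated values cluster around different centres, which is the sense in which the distribution of the statistic depends on the underlying distribution.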

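What about a function where the parameters are known?

In the comments below, Alecos asks an excellent follow-up question: what about a function that uses a fixed hypothesised value of the parameter? For example, what about the statistic $\sqrt{n} (\bar{x} - \mu)$, where $\mu = \mu_0$ is taken to be equal to a known hypothesised value $\mu_0 \in \mathbb{R}$? Here the function is indeed a statistic, so long as it is defined on the appropriately restricted domain. So the function $H_0: \mathbb{R}^n \rightarrow \mathbb{R}$ with $H_0(x_1, \ldots, x_n) = \sqrt{n} (\bar{x} - \mu_0)$ would be a statistic, but the function $H: \mathbb{R}^{n+1} \rightarrow \mathbb{R}$ with $H(x_1, \ldots, x_n, \mu) = \sqrt{n} (\bar{x} - \mu)$ would not be a statistic.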
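The restricted-domain point can be sketched in Python with a closure: fixing $\mu_0$ in advance yields a function of the data alone. The names `make_h0` and `h`, the value `mu0=2.0`, and the data are illustrative assumptions, not anything from the original answer.

```python
import math


def make_h0(mu0):
    """Return H_0(x_1,...,x_n) = sqrt(n) * (xbar - mu0) for a fixed, known mu0."""
    def h0(xs):
        n = len(xs)
        xbar = sum(xs) / n
        return math.sqrt(n) * (xbar - mu0)
    return h0


def h(xs, mu):
    """H(x_1,...,x_n, mu): takes the parameter as an argument, so not a statistic."""
    n = len(xs)
    return math.sqrt(n) * (sum(xs) / n - mu)


h0 = make_h0(mu0=2.0)        # hypothesised value fixed before seeing any data
xs = [1.0, 2.0, 3.0, 6.0]    # hypothetical observed sample
print(h0(xs))                # sqrt(4) * (3.0 - 2.0)
```

Once `mu0` is baked in, `h0` has domain $\mathbb{R}^n$ and can be evaluated from the sample alone, whereas `h` still needs a value of $\mu$ supplied from outside the data.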

— Reinstate Monica

Very helpful answer. Treating the underlying parameter as an input that makes a function a non-statistic was especially helpful.

— Jake Kirsch

$10^{10}$

$(X_1+\dots+X_{1000})/1000$

$(X_1+\dots+X_{n/2})/(n/2)$

$(X_{n/2+1}+\dots+X_n)/(n/2)$. These are still statistics.

— James Martin
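The expressions in the comment above can be written as functions of the data to see that each one qualifies. This is a rough sketch assuming an even sample size; the function names and the toy data are hypothetical.

```python
def constant(xs):
    """A constant ignores the data entirely, but is still a function of it."""
    return 10 ** 10


def first_half_mean(xs):
    """Mean of X_1, ..., X_{n/2} (assumes n is even)."""
    half = len(xs) // 2
    return sum(xs[:half]) / half


def second_half_mean(xs):
    """Mean of X_{n/2+1}, ..., X_n (assumes n is even)."""
    half = len(xs) // 2
    return sum(xs[half:]) / half


xs = [2.0, 4.0, 6.0, 8.0]    # hypothetical observed sample
print(constant(xs), first_half_mean(xs), second_half_mean(xs))
```

None of these uses all of the data, yet each is fully determined by the observed values and a rule stated in advance.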

Those examples seem entirely valid to me. Are you saying the idea of dividing data into a training set and a validation set is not valid?

— James Martin

I'm a little confused by that as well. Let me attempt to describe @CarlWitthoft's point. It would still be a statistic in terms of the mathematical definition, but I could see a case where a consultant takes a 'statistic' of the observations but arbitrarily decides to remove a few results (consultants do this all the time, right?). This would be 'valid' in the sense that it's still a function on the observations; however, the way that statistic is presented and interpreted likely wouldn't be valid.

— Jake Kirsch

@CarlWitthoft: With respect to the point you are making, it is important to distinguish between a statistic (which need not include all the data, and may not encompass all the information in the sample) and a sufficient statistic (which will encompass all the information with respect to some parameter). Statistical theory already has well-developed concepts like sufficiency that capture the idea that a statistic includes all relevant information in the sample. It is not necessary, or desirable, to try to build that requirement into the definition of a "statistic".

— Reinstate Monica

I interpret that as saying that you should decide before you see the data what statistic you are going to calculate. So, for instance, if you're going to take out outliers, you should decide before you see the data what constitutes an "outlier". If you decide after you see the data, then your function is dependent on the data.

— Acccumulation

This is also helpful! So the problem is deciding which observations to include in the function after knowing what observations are available, which is more or less what I was describing in my comment on the previous answer.

— Jake Kirsch

(+1) It might be worth noting that this is important because if you define a rule a priori about what constitutes a data point that will be dropped, it is (relatively) easy to derive a distribution for the statistic (e.g., a truncated mean). It's really hard to derive a distribution for a measure that involves dropping data points for reasons that are not cleanly defined beforehand.

— Cliff AB
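A pre-specified dropping rule of the kind described above might look like the following symmetric trimmed mean. This is only a sketch: the name `trimmed_mean`, the 10% trimming fraction, and the data are assumptions chosen for illustration.

```python
def trimmed_mean(xs, trim_fraction=0.1):
    """Mean after dropping the lowest and highest trim_fraction of the sample.

    The rule is fixed before the data are seen, so the whole procedure is
    still a function of the observed values alone.
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(xs)
    k = int(len(ordered) * trim_fraction)   # points dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
print(trimmed_mean(xs, trim_fraction=0.1))  # drops 1.0 and 100.0
```

Because the trimming rule is part of the function's definition, the analyst never chooses which points to drop after looking at them; the same rule would have been applied to any sample.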