What is the intuition behind the formula for conditional probability?


30

The formula for the conditional probability of A happening, once B has happened, is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

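My textbook explains the intuition behind this in terms of a Venn diagram.

[Image: Venn diagram of events A and B]

Given that B has occurred, the only way for A to occur is for the event to fall in the intersection of A and B.

In that case, wouldn't the probability of P(A|B) simply be equal to the probability of A intersection B, since that's the only way the event could happen? What am I missing?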


7
If you set aside how to calculate it, do you have an intuitive understanding of what conditional probability is?
Juho Kokkala

4
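By conditioning on B (the event has happened), you restrict your space of outcomes to only B (not the whole plane $\Omega$). Since probabilities are between 0 and 1, the probability of event A should now be measured with respect to B. Forget everything outside B.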
Vladislavs Dovgalecs

1
Once you know that event B has happened, the white part of the event A circle has vanished: it is no longer part of the population.
Monty Harder

4
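Intuitions are neither correct nor singular, so why ask for the (singular) correct intuition? It is enough for an intuition to be useful, but not every suggestion will be useful to everyone.
John Coleman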

Answers:


23

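Given that B occurred, with or without A, what is the probability of A? That is, we are now in the universe in which B occurred, i.e. the full circle for B. Within that circle, the probability of A is the area of A intersect B divided by the area of the circle.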


5
In other words, I tell you that B occurred, which means we live in the B circle. Within that world, what % of the events are in the lens ($A \cap B$)?
MichaelChirico
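The area picture can be checked numerically. The following is only a rough sketch under assumed geometry: two unit circles whose centres are one unit apart, inside a bounding rectangle that plays the role of Ω. None of these dimensions come from the textbook figure, and the function name is hypothetical.

```python
import math
import random

def estimate_p_a_given_b(n_points=200_000, seed=0):
    """Estimate P(A|B) as (points in both circles) / (points in circle B)."""
    rng = random.Random(seed)
    in_b = 0
    in_a_and_b = 0
    for _ in range(n_points):
        # Uniform point in the bounding rectangle (the assumed Omega).
        x = rng.uniform(-1.0, 2.0)
        y = rng.uniform(-1.0, 1.0)
        in_circle_a = x * x + y * y <= 1.0            # unit circle centred at (0, 0)
        in_circle_b = (x - 1.0) ** 2 + y * y <= 1.0   # unit circle centred at (1, 0)
        if in_circle_b:
            in_b += 1
            if in_circle_a:
                in_a_and_b += 1
    # Points outside B never enter the ratio: B is the whole "universe" here.
    return in_a_and_b / in_b

# Exact value for this geometry: lens area divided by the area of circle B.
lens_area = 2 * math.pi / 3 - math.sqrt(3) / 2
exact = lens_area / math.pi
estimate = estimate_p_a_given_b()
print(round(exact, 3))  # 0.391
```

The size of the bounding rectangle does not matter: enlarging Ω shrinks P(A ∩ B) and P(B) by the same factor, so their ratio stays the same.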

18

I would think about it like this:

Given that B has occurred, the only way for A to occur is for the event to fall in the intersection of A and B.

And I will try to comment on the second image you posted:

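  1. Imagine that the whole white rectangle is your sample space $\Omega$.

    Assigning a probability to a set means that you are, in some sense, measuring that set. It is the same as measuring the area of the rectangle, except that probability is a different kind of measure with specific properties (I will say no more about that here).

  2. You know that $P(\Omega) = 1$, which is interpreted as follows:

    $\Omega$ represents everything that can happen, and something must happen, so something happens with 100% probability.

  3. Similarly, the set $A$ has a probability $P(A)$ that is proportional to the probability of the sample space $\Omega$. Graphically, you can see that $A \subset \Omega$, so the measure of $A$ (its probability $P(A)$) must be smaller than $P(\Omega)$. The same reasoning holds for the set $A \cap B$: this set can be measured, and its measure is $P(A \cap B)$.

  4. If you are told that $B$ has happened, you have to think as if $B$ were your "new" $\Omega$. If $B$ is your "new" $\Omega$, you can be 100% sure that everything happens inside the set $B$.

    And what does that mean? It means that in the "new" context $P(B \mid B) = 1$, and you have to rescale all the probability measures, since they must now be expressed in terms of the "new" sample space $B$. It is a simple proportion.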

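    Your intuition is almost right when you say:

the probability of P(A|B) is simply equal to the probability of A intersection B

The "almost" is because the sample space has now changed (it is now $B$), and you need to rescale $P(A \cap B)$ accordingly.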

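  5. $P(A \mid B)$ is $P(A \cap B)$ in the new world, where the sample space is now $B$. In words you would say it like this (try to visualize it on the image with the sets):

    The ratio between the measure of the new world $B$ and the measure of $A \cap B$ must be the same as the ratio between the measure of $\Omega$ and the measure of $A \mid B$.

  6. Finally, translate this into mathematical language (a simple proportion):

$$P(B) : P(A \cap B) = P(\Omega) : P(A \mid B)$$

Since $P(\Omega) = 1$, it follows that:

$$P(A \mid B) = P(A \cap B) : P(B)$$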
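The rescaling argument can be tried out on a small finite sample space. This is only an illustrative sketch: the fair six-sided die and the particular events A and B are assumptions chosen for the example, and `prob` is a hypothetical helper, not something from the answer above.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}  # assumed event: the roll is at most 3
B = {2, 4, 6}  # assumed event: the roll is even

def prob(event, space):
    """Measure of `event` relative to `space`, all outcomes equally likely."""
    return Fraction(len(event & space), len(space))

p_b = prob(B, omega)            # 1/2
p_a_and_b = prob(A & B, omega)  # 1/6

# Rescale P(A ∩ B) by P(B), i.e. treat B as the "new" sample space.
p_a_given_b = p_a_and_b / p_b

# Measuring A directly inside B gives the same number.
print(p_a_given_b, prob(A, B))  # 1/3 1/3
# B is certain once it is the sample space.
print(prob(B, B))  # 1
```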

5

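The intuition is easy to see with the following problem.

Suppose you have 10 balls (6 black and 4 red). Of the black balls, 3 are awesome, and of the red balls, 1 is awesome. How likely is it that a black ball is also awesome?

The answer is very easy: it is 50%, because 3 of the 6 black balls are awesome.

This is how the probabilities map onto our problem:

  • the 3 balls that are black AND awesome correspond to $P(A \cap B)$
  • the 6 balls that are black correspond to $P(B)$
  • the probability that a ball is awesome, given that it is black, is $P(A \mid B)$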
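As a quick check of this mapping, here is a minimal sketch that turns the counts into probabilities. It assumes the total is 10 balls (6 black plus 4 red) and that each ball is equally likely to be picked. The variable names are made up for the illustration.

```python
from fractions import Fraction

# Counts from the example: 6 black + 4 red = 10 balls, 3 of the black ones awesome.
total_balls = 10
black = 6
black_and_awesome = 3

p_black = Fraction(black, total_balls)                          # P(B) = 3/5
p_black_and_awesome = Fraction(black_and_awesome, total_balls)  # P(A ∩ B) = 3/10

# Dividing the two probabilities cancels the common total of 10.
p_awesome_given_black = p_black_and_awesome / p_black
print(p_awesome_given_black)  # 1/2
```

The total number of balls drops out of the ratio, which is why counting only within the black balls gives the same 50%.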

1
Wouldn't it make more sense to write $n(B) = 6$ rather than $P(B) = 6$?
Silverfish

@Silverfish That would be more correct, but in this case I was going for the intuition.

4

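For basic intuition about the conditional probability formula, I always like to use a two-way table. Suppose there are 150 students in a year group, of whom 80 are female and 70 are male, and each of them must study exactly one language course. The two-way table of students taking the different courses is: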

        | French   German   Italian  | Total
-------- --------------------------- -------
Male    |     30       20        20  |    70
Female  |     25       15        40  |    80
-------- --------------------------- -------
Total   |     55       35        60  |   150

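Given that a student takes the Italian course, what is the probability that they are female? The Italian course has 60 students, of whom 40 are females studying Italian, so the probability must be:

$$P(F \mid \text{Italian}) = \frac{n(F \cap \text{Italian})}{n(\text{Italian})} = \frac{40}{60} = \frac{2}{3}$$

where $n(A)$ is the cardinality of the set $A$, i.e. the number of items it contains. Note that we needed to use $n(F \cap \text{Italian})$ in the numerator and not just $n(F)$, because the latter would have included all 80 females, including the other 40 who do not study Italian.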

But if the question were flipped around, what is the probability that a student takes the Italian course, given that they are female? Then 40 of the 80 female students take the Italian course, so we have:

$$P(\text{Italian} \mid F) = \frac{n(\text{Italian} \cap F)}{n(F)} = \frac{40}{80} = \frac{1}{2}$$

I hope this provides intuition for why

$$P(A \mid B) = \frac{n(A \cap B)}{n(B)}$$

Understanding why the fraction can be written with probabilities instead of cardinalities is a matter of equivalent fractions. For example, let us return to the probability a student is female given that they are studying Italian. There are 150 students in total, so the probability that a student is female and studies Italian is 40/150 (this is a "joint" probability) and the probability a student studies Italian is 60/150 (this is a "marginal" probability). Note that dividing the joint probability by the marginal probability gives:

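$$\frac{P(F \cap \text{Italian})}{P(\text{Italian})} = \frac{40/150}{60/150} = \frac{40}{60} = \frac{n(F \cap \text{Italian})}{n(\text{Italian})} = P(F \mid \text{Italian})$$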

(To see that the fractions are equivalent, multiplying numerator and denominator by 150 removes the "/150" in each.)

More generally, if your sampling space Ω has cardinality n(Ω) — in this example the cardinality was 150 — we find that

$$P(A \mid B) = \frac{n(A \cap B)}{n(B)} = \frac{n(A \cap B)/n(\Omega)}{n(B)/n(\Omega)} = \frac{P(A \cap B)}{P(B)}$$
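The two-way table lends itself to a short computational check. The sketch below stores the six cell counts from the table and computes both conditional probabilities as joint over marginal. The function names are hypothetical and the code is only meant to illustrate the formula.

```python
from fractions import Fraction

# Cell counts copied from the two-way table: keys are (sex, course).
counts = {
    ("Male", "French"): 30, ("Male", "German"): 20, ("Male", "Italian"): 20,
    ("Female", "French"): 25, ("Female", "German"): 15, ("Female", "Italian"): 40,
}
n_omega = sum(counts.values())  # 150 students in total

def p_sex_given_course(sex, course):
    """P(sex | course) = P(sex and course) / P(course)."""
    joint = Fraction(counts[(sex, course)], n_omega)
    marginal = Fraction(sum(n for (_, c), n in counts.items() if c == course), n_omega)
    return joint / marginal

def p_course_given_sex(course, sex):
    """P(course | sex) = P(sex and course) / P(sex)."""
    joint = Fraction(counts[(sex, course)], n_omega)
    marginal = Fraction(sum(n for (s, _), n in counts.items() if s == sex), n_omega)
    return joint / marginal

print(p_sex_given_course("Female", "Italian"))  # 2/3
print(p_course_given_sex("Italian", "Female"))  # 1/2
```

The joint probability 40/150 is the same in both calls. Only the marginal in the denominator changes, which is why flipping the condition changes the answer.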

3

I would reverse the logic. The probability that both A and B happen is either:

  1. The probability that B happened, times the probability that A happened given that B did.
  2. The same, with the roles of A and B reversed.

This will give you

$$p(A \cap B) = p(B)\,p(A \mid B)$$

If you are looking for what is wrong with your suggestion: while it is true that the probability of A given B is contained in the product above, the space you are rolling the dice in is smaller than your original probability space. You know for sure that you are "in" B, hence you divide by the size of the new space.


2

The Venn diagram doesn't represent probability, it represents the measure of subsets of the event space. A probability is the ratio between two measures; the probability of X is the size of "everything that constitutes X" divided by the size of "all the events being considered". Any time you're calculating a probability, you need both a "success space" and a "population space". You can't calculate a probability based just on "how big" the success space is. For instance, the probability of rolling a seven with two dice is the number of ways of rolling a seven divided by the total number of ways of rolling two dice. Just knowing the number of ways of rolling a seven is not enough to calculate the probability. P(A|B) is the ratio of the measure of the "both A and B happen" space and the measure of the "B happens" space. That's what the "|" means: it means "make what comes after this the population space".
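The two-dice example can be enumerated directly. The first part of this sketch follows the text. The conditioning event in the second part ("at least one die shows a six") is an assumption added here to show what changing the population space does.

```python
from fractions import Fraction
from itertools import product

# Population space: all ordered outcomes of two fair six-sided dice.
population = list(product(range(1, 7), repeat=2))
# Success space: the outcomes whose faces sum to seven.
success = [roll for roll in population if sum(roll) == 7]

p_seven = Fraction(len(success), len(population))
print(len(success), len(population), p_seven)  # 6 36 1/6

# Conditioning replaces the population space. Assumed condition for this
# illustration: at least one die shows a six.
new_population = [roll for roll in population if 6 in roll]
new_success = [roll for roll in new_population if sum(roll) == 7]
p_seven_given_six = Fraction(len(new_success), len(new_population))
print(len(new_success), len(new_population), p_seven_given_six)  # 2 11 2/11
```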


2

I think the best way to think about this is drawing step-by-step paths.

Let's describe Event B as rolling a 4 on a fair die - this can be easily shown to have probability $\frac{1}{6}$. Now let's describe Event A as drawing an Ace from a standard 52-card deck of cards - this can be easily shown to have probability $\frac{1}{13}$.

Let's now run an experiment where we roll a die and then pick a card. So P(A|B) would be the probability that we draw an Ace, given that we have already rolled a 4. If you look at the image, this would be the $\frac{1}{6}$ path (go up) and then the $\frac{1}{13}$ path (go up again).

Intuitively, the total probability space is what we have already been given: rolling the 4. We can ignore the $\frac{1}{13}$ and $\frac{12}{13}$ the initial down path leads to, since it was GIVEN that we rolled a 4. By law of multiplication, our total space is then $\left(\frac{1}{6} \times \frac{1}{13}\right) + \left(\frac{1}{6} \times \frac{12}{13}\right)$.

Now what's the probability we drew an Ace, GIVEN that we rolled a 4? The answer by using the path is $\left(\frac{1}{6} \times \frac{1}{13}\right)$, which we then need to divide by the total space. So we get

$$P(A \mid B) = \frac{\frac{1}{6} \times \frac{1}{13}}{\left(\frac{1}{6} \times \frac{1}{13}\right) + \left(\frac{1}{6} \times \frac{12}{13}\right)}.$$

[Image: probability tree for the die roll followed by the card draw]
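The path arithmetic can be verified with exact fractions. This is a small sketch using only the branch probabilities stated in the answer. The variable names are made up for the illustration.

```python
from fractions import Fraction

p_four = Fraction(1, 6)       # P(B): roll a 4
p_ace = Fraction(1, 13)       # P(A): draw an Ace
p_not_ace = Fraction(12, 13)  # the other branch after rolling a 4

# Path "roll a 4, then draw an Ace".
joint = p_four * p_ace
# Total space given the 4: both branches that start with rolling a 4.
total = p_four * p_ace + p_four * p_not_ace

p_ace_given_four = joint / total
print(joint, total, p_ace_given_four)  # 1/78 1/6 1/13
```

The result equals P(A) itself, because the die roll does not affect the card draw. The total space collapses to P(B) = 1/6, so the formula again reduces to P(A ∩ B) / P(B).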


2
I was wondering what the downvote was for, because probability trees can be very instructive. Perhaps the concern is that using independent events for the illustration misses the very point of conditional probability, which is that the probability distribution can change depending on the conditioning event. Using a less-superficial illustration may help.
whuber

1

Think of it in terms of counts. Marginal probability is how many times A occurred divided by sample size. Joint probability of A and B is how many times A occurred together with B divided by sample size. Conditional probability of A given B is how many times A occurred together with B divided by how many times B occurred, i.e. only the A's "within" B's.

You can find a nice visual illustration on this blog, which shows it using Lego blocks.
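The three kinds of counts can be written out on a toy sample. The eight observations below are invented purely for illustration. Each pair records whether A occurred and whether B occurred.

```python
# Hypothetical sample: each pair is (A occurred, B occurred).
observations = [
    (True, True), (True, False), (False, True), (True, True),
    (False, False), (False, True), (True, False), (False, True),
]
n = len(observations)

count_a = sum(1 for a, b in observations if a)              # times A occurred
count_b = sum(1 for a, b in observations if b)              # times B occurred
count_a_and_b = sum(1 for a, b in observations if a and b)  # times both occurred

marginal_a = count_a / n                         # A over the whole sample
joint_a_b = count_a_and_b / n                    # A together with B over the whole sample
conditional_a_given_b = count_a_and_b / count_b  # A's counted only "within" B's

print(count_a, count_b, count_a_and_b)  # 4 5 2
```

Dividing `joint_a_b` by `count_b / n` gives the same value as `conditional_a_given_b`, since the sample size cancels.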


1

At the time of writing there are about 10 answers, which all seem to miss the most important point: you are essentially right.

In that case, wouldn't the probability of P(A | B) simply be equal to the probability of A intersection B, since that's the only way the event could happen?

This is definitely true. This explains why the quantity we use to define P(A|B) is actually $P(A \cap B)$, rescaled.

What am I missing?

You are missing that the probability of B being satisfied given that B is satisfied should be 1, since this is quite a certain event, and not $P(B \cap B) = P(B)$, which can well be less than 1. Dividing by P(B) makes the conditional probability of B given B equal to 1, as expected. Actually this is even better: it makes the map $A \mapsto P(A \mid B)$ a probability, so a conditional probability is actually a probability.
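The claim that a conditional probability is itself a probability can be spelled out in two lines. This is a sketch assuming $P(B) > 0$. The disjoint events $A_1$ and $A_2$ are introduced here only for illustration and do not appear in the answer.

```latex
\begin{align*}
P(B \mid B) &= \frac{P(B \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1, \\
P(A_1 \cup A_2 \mid B)
  &= \frac{P\big((A_1 \cap B) \cup (A_2 \cap B)\big)}{P(B)}
   = \frac{P(A_1 \cap B) + P(A_2 \cap B)}{P(B)}
   = P(A_1 \mid B) + P(A_2 \mid B).
\end{align*}
```

Without the division by $P(B)$, the first line would give $P(B)$ instead of 1, so the unscaled intersection probability would fail the normalization requirement.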


0

I feel it is more intuitive when we have concrete data to estimate the probabilities.

Let's use the mtcars data as an example. The data looks like this (we only use the number of cylinders and the transmission type):

> mtcars[,c("am","cyl")]
                    am cyl
Mazda RX4            1   6
Mazda RX4 Wag        1   6
Datsun 710           1   4
Hornet 4 Drive       0   6
...  
...
Ford Pantera L       1   8
Ferrari Dino         1   6
Maserati Bora        1   8
Volvo 142E           1   4

We can calculate the joint distribution on two variables by doing a cross table:

> prop.table(table(mtcars$cyl,mtcars$am))

          0       1
  4 0.09375 0.25000
  6 0.12500 0.09375
  8 0.37500 0.06250

The joint probability means we want to consider two variables at the same time. For example, we will ask how many cars are 4 cylinder and manual transmission.

Now, we come to conditional probability. I found the most intuitive way to explain conditional probability is using the term filtering on data.

Suppose we want to get P(am=1|cyl=4). We make the following estimate:

> cyl_4_cars=subset(mtcars, cyl==4)
> prop.table(table(cyl_4_cars$am))

        0         1 
0.2727273 0.7272727 

This means we only care about cars that have 4 cylinders, so we filter the data on that. After filtering, we check what proportion of them have a manual transmission.

You can compare this conditional probability with the joint probability I mentioned earlier to feel the difference.
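The link between the two tables can be checked with a few lines of arithmetic. The sketch below is in Python rather than R and uses only the joint proportions printed in the cross table above. It does not load the mtcars data itself.

```python
# Joint proportions copied from the prop.table output above; keys are (cyl, am).
joint = {
    (4, 0): 0.09375, (4, 1): 0.25000,
    (6, 0): 0.12500, (6, 1): 0.09375,
    (8, 0): 0.37500, (8, 1): 0.06250,
}

# Marginal P(cyl = 4): sum the 4-cylinder row over both transmission types.
p_cyl4 = joint[(4, 0)] + joint[(4, 1)]

# Conditional = joint / marginal, which reproduces the filtered estimate.
p_am1_given_cyl4 = joint[(4, 1)] / p_cyl4
print(round(p_am1_given_cyl4, 7))  # 0.7272727
```

Filtering the rows and renormalizing, as done with subset(), is the same operation as dividing the joint proportion by the row total.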


0

If A were a superset of B the probability that A happens is always 1 given that B happened, i.e. P(A|B) = 1. However, B itself may have a probability much smaller than 1.

Consider the following example:

  • given x is a natural number in 1..100,
  • A is 'x is an even number'
  • B is 'x is divisible by 10'

we then have:

  • P(A) is 0.5
  • P(B) is 0.1

If we know that x is divisible by 10 (i.e. x is in B), we know that it is also an even number (i.e. x is in A), so P(A|B) = 1.

From the definition of conditional probability we have:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Note that in our (special) case $P(A \cap B)$, i.e. the probability that x is both an even number and a number divisible by 10, is equal to the probability that x is a number divisible by 10. Therefore we have $P(A \cap B) = P(B)$, and plugging this back into the formula we get P(A|B)=P(B)/P(B)=1.


For a non-degenerate example consider e.g. A is 'x is divisible by 7' and B is 'x is divisible by 3'. Then P(A|B) is equivalent to 'given that we know that x is divisible by 3 what is the probability that it is (also) divisible by 7 ?'. Or equivalently 'What fraction of the numbers 3, 6, ..., 99 are divisible by 7' ?
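Both examples can be counted directly over the numbers 1 to 100. A minimal sketch, assuming each number is equally likely. The helper `conditional` is a hypothetical name for this illustration.

```python
from fractions import Fraction

numbers = range(1, 101)  # the natural numbers 1..100

def conditional(event_a, event_b):
    """P(A|B) by counting: numbers satisfying both, over numbers satisfying B."""
    in_b = [x for x in numbers if event_b(x)]
    in_a_and_b = [x for x in in_b if event_a(x)]
    return Fraction(len(in_a_and_b), len(in_b))

# Degenerate case: every multiple of 10 is even, so A is a superset of B.
p_even_given_div10 = conditional(lambda x: x % 2 == 0, lambda x: x % 10 == 0)

# Non-degenerate case: multiples of 7 among 3, 6, ..., 99 (21, 42, 63 and 84).
p_div7_given_div3 = conditional(lambda x: x % 7 == 0, lambda x: x % 3 == 0)

print(p_even_given_div10, p_div7_given_div3)  # 1 4/33
```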


0

I think your initial statement may be a misunderstanding.

You wrote:

The formula for conditional probability of A happening, once B has happened is:

From your phrasing, it may sound as if there are 2 events "First B happened, and then we want to calculate the probability that A will happen".

This is not the case. (The following is valid whether there was a misunderstanding or not).

We have just 1 event, which is described by one of 4 possibilities:

  1. neither A nor B;

  2. just A, not B;

  3. just B, not A;

  4. both A and B.

Putting some example numbers on it, let's say

$$P(A) = 0.5, \quad P(B) = 0.5, \quad \text{and } A \text{ and } B \text{ are independent.}$$

It follows that

$$P(A \text{ and } B) = 0.25 \quad \text{and} \quad P(\text{neither } A \text{ nor } B) = 0.25.$$

Initially (with no knowledge of the event), we knew $P(A \cap B) = 0.25$.

But once we know that B has happened, we are in a different space. $P(A \cap B)$ is half of P(B), so the probability of A given B, P(A|B), is 0.5. It is not 0.25, knowing that B has happened.
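The example numbers can be laid out over the four possibilities and checked with exact fractions. This is only a sketch of the arithmetic above. The variable names are made up for the illustration.

```python
from fractions import Fraction

p_a = Fraction(1, 2)
p_b = Fraction(1, 2)

# Independence means the joint probability is the product of the marginals.
p_both = p_a * p_b                              # both A and B
p_just_a = p_a - p_both                         # just A, not B
p_just_b = p_b - p_both                         # just B, not A
p_neither = 1 - (p_both + p_just_a + p_just_b)  # neither A nor B

# Knowing B happened leaves only "just B" and "both" as possibilities.
p_a_given_b = p_both / (p_just_b + p_both)
print(p_both, p_neither, p_a_given_b)  # 1/4 1/4 1/2
```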


0

The conditioning probability is NOT equal to the probability of intersection. Here is an intuitive answer:

1) $P(B \mid A)$: "We know that A happened. What is the probability that B will happen?"

2) $P(A \cap B)$: "We don't know if A or B did happen. What is the probability that both will happen?"

The difference is that in the first one, we have extra information (we know that A occurs first). In the second one we do not know anything.

Starting out with the probability of the second one, we can deduce the probability of the first one.

The event that both A and B will occur can happen in two ways:

1) The probability of A AND the probability of B given that A happened.

2) The probability of B AND the probability of A given that B happened.

It turns out that both situations are equally likely to happen. (I cannot myself find out the intuitive reason). Thus we have to weight both scenarios with 0.5

$$P(A \cap B) = \tfrac{1}{2} P(A \cap (B \mid A)) + \tfrac{1}{2} P(B \cap (A \mid B))$$

Now use that $A$ and $B \mid A$ are independent and remember that both scenarios are equally likely to happen.

$$P(A \cap B) = P(A)\,P(B \mid A)$$

Tadaaa... now isolate the probability of the conditioning!

btw. I would love if someone could explain why scenario 1 and 2 are equal. The key lies in there imo.

Licensed under cc by-sa 3.0 with attribution required.