Think Bayes: Bayes' Theorem as I Understand It

Bayes' theorem is one of the most important theorems in statistics, and the Bayesian school of statistics built on it plays an important role in the field. Unlike the frequentist school, which regards randomness as a property of the events themselves, Bayesian statistics reasons from the perspective of the observer: the randomness of an event arises only from the observer's incomplete information, and how much information the observer has shapes how the observer perceives the event.

Conditional probability and total probability

Before introducing Bayes' theorem, let us briefly review conditional probability, which describes the probability that event A occurs given that another event B has already occurred. It is denoted $P(A|B)$, where A and B may or may not be independent of each other:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

Here $P(A \cap B)$ denotes the probability that A and B occur at the same time. If A and B are independent, then $P(A \cap B) = P(A)P(B)$, so:

$$P(A|B) = \frac{P(A)P(B)}{P(B)} = P(A)$$

This derivation confirms that if A and B are independent events, the probability of A occurring does not depend on whether B has occurred.
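The definition of conditional probability and the independence check can be verified by brute-force enumeration. The sketch below is a minimal illustration on a hypothetical sample space of two fair dice; the helpers `prob` and `cond_prob` and the three events are made up for this example. Exact `Fraction` arithmetic avoids floating-point noise when comparing $P(A|B)$ with $P(A)$.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under the uniform distribution on OMEGA."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda w: a(w) and b(w)) / p_b


# Illustrative events (hypothetical, not from the original text).
def first_even(w):   # A: the first die is even
    return w[0] % 2 == 0


def sum_is_7(w):     # B: the two dice sum to 7
    return w[0] + w[1] == 7


def sum_ge_10(w):    # C: the two dice sum to at least 10
    return w[0] + w[1] >= 10


# A and B are independent, so conditioning on B leaves P(A) unchanged.
print(prob(first_even), cond_prob(first_even, sum_is_7))   # 1/2 1/2
# A and C are dependent: knowing C changes the probability of A.
print(prob(first_even), cond_prob(first_even, sum_ge_10))  # 1/2 2/3
```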

Rearranging the definition slightly gives the multiplication rule:

$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$$

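To account for the different possible states of the prior condition B, we introduce the law of total probability:

$$P(A) = P(A|B)P(B) + P(A|\bar{B})P(\bar{B})$$

Here $\bar{B}$ denotes the complement of event B. From the set-theoretic perspective, B and its complement are disjoint and together make up the whole sample space $\Omega$:

$$B \cup \bar{B} = \Omega, \qquad B \cap \bar{B} = \emptyset, \qquad P(\bar{B}) = 1 - P(B)$$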
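The multiplication rule and the law of total probability can be checked the same way. This is again a sketch on a hypothetical two-dice sample space with illustrative event names; the complement $\bar{B}$ is simply the logical negation of B.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two fair dice, 36 equally likely outcomes.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under the uniform distribution on OMEGA."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)


def sum_ge_10(w):       # A: the dice sum to at least 10
    return w[0] + w[1] >= 10


def first_is_6(w):      # B: the first die shows 6
    return w[0] == 6


def not_first_is_6(w):  # the complement of B
    return not first_is_6(w)


# Multiplication rule: P(A and B) = P(A|B)P(B) = P(B|A)P(A).
joint = prob(lambda w: sum_ge_10(w) and first_is_6(w))
assert joint == cond_prob(sum_ge_10, first_is_6) * prob(first_is_6)
assert joint == cond_prob(first_is_6, sum_ge_10) * prob(sum_ge_10)

# Law of total probability: P(A) = P(A|B)P(B) + P(A|not B)P(not B).
total = (cond_prob(sum_ge_10, first_is_6) * prob(first_is_6)
         + cond_prob(sum_ge_10, not_first_is_6) * prob(not_first_is_6))
print(prob(sum_ge_10), total)  # 1/6 1/6
```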


Conditional probability and the law of total probability can be visualized with a Venn diagram.


Bayes' formula

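Dividing the multiplication rule by $P(B)$ and expanding $P(B)$ with the law of total probability over A and $\bar{A}$ gives Bayes' formula:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\bar{A})P(\bar{A})}$$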

At first glance, Bayes' formula merely rewrites the posterior probability $P(A|B)$ as a combination of the likelihood $P(B|A)$, the prior probability $P(A)$, and the marginal probability $P(B)$. Its usefulness comes from the fact that in many real-world problems $P(A|B)$ is hard to observe directly, while $P(B|A)$, $P(A)$, and $P(B)$ are easy to measure; Bayes' formula then lets us compute many practical probabilities.
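As a minimal sketch, Bayes' formula with the denominator expanded by the law of total probability fits in a few lines of Python. The function name `bayes_posterior` and its parameter names are our own; the function assumes a binary hypothesis (A or not A) and plain floating-point probabilities.

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A).

    The marginal P(B) is expanded with the law of total probability
    over A and its complement.
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    p_b = p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)  # marginal P(B)
    if p_b == 0.0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    return p_b_given_a * prior / p_b


# Hypothetical inputs: prior 0.5, likelihoods 0.8 (if A) and 0.2 (if not A).
print(bayes_posterior(0.5, 0.8, 0.2))
```

With the disease-screening numbers that appear later in the article, `bayes_posterior(0.001, 0.99, 0.01)` comes out at roughly 0.09.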

A very interesting example

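In everyday life, almost everyone (statisticians included) unconsciously confuses the two conditional probabilities of a pair of events, treating $P(A|B)$ and $P(B|A)$ as if they were equal, although in general:

$$P(A|B) \neq P(B|A)$$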

The most classic example is disease screening. Suppose a disease has an infection rate of 0.1% in the whole population, and the hospital's current test for it is 99% accurate (99% of people who have the disease test positive, and 99% of healthy people test normal). If a person is randomly selected from the population and tested, and the hospital reports a positive result, what is the probability that the person actually has the disease?

Many people blurt out "99%", but the true probability is much lower, because they confuse the two conditional probabilities. Let A denote that the person has the disease and B that the hospital test is positive. The 99% figure is $P(B|A)$, the probability that a person known to have the disease tests positive, whereas the question asks for $P(A|B)$, the probability that a randomly selected person known to have tested positive actually has the disease.
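The gap between $P(B|A)$ and $P(A|B)$ is easy to see in a simulation. The sketch below is a Monte Carlo illustration rather than part of the original argument: it assumes the 99% accuracy applies both as sensitivity (sick people test positive) and as specificity (healthy people test negative), and `simulate_screening` and its parameters are hypothetical names. With a million simulated people, the first estimate should land close to 0.99 while the second stays around 0.09.

```python
import random


def simulate_screening(n, prevalence=0.001, sensitivity=0.99,
                       specificity=0.99, seed=0):
    """Simulate testing n randomly chosen people.

    A = the person has the disease, B = the test is positive.
    Returns Monte Carlo estimates of (P(B|A), P(A|B)).
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    sick = positive_count = sick_and_positive = 0
    for _ in range(n):
        has_disease = rng.random() < prevalence
        if has_disease:
            positive = rng.random() < sensitivity    # sick: positive w.p. sensitivity
        else:
            positive = rng.random() >= specificity   # healthy: positive w.p. 1 - specificity
        sick += has_disease
        positive_count += positive
        sick_and_positive += has_disease and positive
    if sick == 0 or positive_count == 0:
        raise ValueError("sample too small to estimate both probabilities")
    return sick_and_positive / sick, sick_and_positive / positive_count


p_b_given_a, p_a_given_b = simulate_screening(1_000_000)
print(f"P(B|A) ~ {p_b_given_a:.3f}   P(A|B) ~ {p_a_given_b:.3f}")
```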

We can use Bayes' theorem to calculate the probability that this person actually has the disease:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\bar{A})P(\bar{A})}$$

where:

  • $P(A) = 0.001$, the probability that the tested person has the disease
  • $P(\bar{A}) = 0.999$, the probability that the tested person does not have the disease
  • $P(B|A) = 0.99$, the probability of testing positive given the disease
  • $P(B|\bar{A}) = 0.01$, the probability of testing positive in the absence of the disease

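Substituting these values into Bayes' formula gives:

$$P(A|B) = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999} = \frac{0.00099}{0.01098} \approx 0.0902$$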
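The substitution can be double-checked with exact rational arithmetic. This small sketch (variable names are our own) applies the law of total probability and then Bayes' formula using Python's `fractions.Fraction`, so the result is exact: the posterior is exactly $11/122$.

```python
from fractions import Fraction

p_a = Fraction(1, 1000)              # P(A): infection rate 0.1%
p_not_a = 1 - p_a                    # P(not A)
p_b_given_a = Fraction(99, 100)      # P(B|A): sick person tests positive
p_b_given_not_a = Fraction(1, 100)   # P(B|not A): healthy person tests positive

# Law of total probability for the marginal P(B), then Bayes' formula.
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
posterior = p_b_given_a * p_a / p_b

print(p_b, posterior, f"{float(posterior):.4f}")  # 549/50000 11/122 0.0902
```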


What does this result mean in practice? Let us explain it with a diagram (the probabilities in the figure are rounded, and because of the image size the areas are not strictly proportional to the probabilities):

From the Bayesian point of view, before testing, a randomly selected subject about whom we have insufficient information falls into one of four possible cases: true positive, false positive, false negative, or true negative. The probabilities of these cases are determined by the accuracy of the test and the infection rate of the disease. Once the result is positive, only true positives and false positives remain, and the probability of a true positive is only about one tenth of that of a false positive. The practical meaning of Bayes' formula here is therefore:

Even after a positive hospital test, the actual probability of having the disease is below 10%; the result is most likely a false positive, so a retest is usually needed to confirm whether the person is really sick.

Let us now compute the probability of disease when both the initial test and the retest are positive. Assume the two tests have the same 99% accuracy and that their results are independent given whether the subject is sick. Let A denote that the subject has the disease, B that the first test is positive, and C that the second test is positive. The probability of disease given two positive results can then be expressed as:

$$P(A|B \cap C) = \frac{P(B \cap C|A)P(A)}{P(B \cap C|A)P(A) + P(B \cap C|\bar{A})P(\bar{A})}$$

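where:

  • $P(A) = 0.001$, the probability that the subject has the disease
  • $P(\bar{A}) = 0.999$, the probability that the subject does not have the disease
  • $P(B \cap C|A) = 0.99 \times 0.99 = 0.9801$, the probability that both tests are positive given that the subject is sick
  • $P(B \cap C|\bar{A}) = 0.01 \times 0.01 = 0.0001$, the probability that both tests are positive given that the subject is not sick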

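After substitution:

$$P(A|B \cap C) = \frac{0.9801 \times 0.001}{0.9801 \times 0.001 + 0.0001 \times 0.999} = \frac{0.0009801}{0.00108} = 0.9075$$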
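A sketch of the same calculation in Python, assuming the two tests are independent given disease status. It also illustrates a standard property of Bayesian updating: using the posterior after the first positive test as the prior for the second gives the same answer as conditioning on both results at once. The helper `posterior` and the names `sens` and `fpr` (false-positive rate) are illustrative.

```python
from fractions import Fraction


def posterior(prior, p_pos_given_sick, p_pos_given_healthy):
    """Bayes' formula: P(sick | evidence) for a binary hypothesis."""
    evidence = p_pos_given_sick * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_sick * prior / evidence


prior = Fraction(1, 1000)   # P(A)
sens = Fraction(99, 100)    # P(positive | sick)
fpr = Fraction(1, 100)      # P(positive | healthy), the false-positive rate

# Joint update: "both tests positive" as a single piece of evidence.
# Squaring the likelihoods assumes the tests are independent given A or not A.
joint = posterior(prior, sens ** 2, fpr ** 2)

# Sequential update: the posterior after test 1 becomes the prior for test 2.
after_first = posterior(prior, sens, fpr)
after_second = posterior(after_first, sens, fpr)

print(after_first, joint, after_second, float(joint))  # 11/122 363/400 363/400 0.9075
```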


The retest greatly improves the credibility of the result. In terms of the figure above, its significance is that it sharply reduces the probability of a false positive for a healthy subject (from 0.01 to 0.0001), which raises the probability that a positive result is correct from about 9% to about 91%.
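To connect the retest back to the four possible outcomes (true/false positive and negative), the following sketch tabulates their joint probabilities for one test and for two tests, where "positive" means every test came back positive. It assumes equal 99% accuracy for sick and healthy subjects and tests that are independent given disease status; `outcome_table` is a hypothetical helper.

```python
from fractions import Fraction

PREVALENCE = Fraction(1, 1000)
ACCURACY = Fraction(99, 100)  # assumed equal for sick and healthy subjects


def outcome_table(n_tests):
    """Joint probabilities of the four outcomes when 'positive' means all
    n_tests results are positive (tests assumed independent given status)."""
    p_pos_sick = ACCURACY ** n_tests           # every test catches the disease
    p_pos_healthy = (1 - ACCURACY) ** n_tests  # every test is a false alarm
    return {
        "true positive": PREVALENCE * p_pos_sick,
        "false negative": PREVALENCE * (1 - p_pos_sick),
        "false positive": (1 - PREVALENCE) * p_pos_healthy,
        "true negative": (1 - PREVALENCE) * (1 - p_pos_healthy),
    }


for n in (1, 2):
    table = outcome_table(n)
    assert sum(table.values()) == 1  # the four cases cover the whole population
    ratio = table["true positive"] / table["false positive"]
    cells = ", ".join(f"{k} {float(v):.7f}" for k, v in table.items())
    print(f"{n} test(s): {cells}; TP/FP = {float(ratio):.3f}")
```

The ratio of true to false positives rises from about 0.1 with one test to about 9.8 with two, which is why a double positive is so much more credible than a single one.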