Statistical knowledge | Talking about the P value and the α level


In 2016, a survey of domestic psychology students and researchers (n = 308, including 91 undergraduates, 134 master's students, 56 doctoral students, and 27 holders of a doctoral degree) showed that more than 95% of the respondents could not correctly understand the meaning of the P value. The most common misunderstanding was to take the P value as the false positive rate, that is, the probability of wrongly rejecting the null hypothesis (Hu Chuanpeng et al., 2016).

Figure 1 is quoted from "Reproducibility in Psychological Research: From Crisis to Opportunity" (Hu Chuanpeng et al., 2016).

I also found that the concepts of the P value and the α level, and the relationship between them, are not easy to understand, so I have written down my own understanding of them and hope to share it with you.

First, the P value is a value obtained by conversion. In a hypothesis test, the t, z, or F statistic we compute can be converted to a P value. For example, for Z = 1.96, a table lookup or statistical software gives the corresponding two-tailed P = 0.05. So what does the P value mean? By the textbook definition, the P value is the probability of obtaining the current result or a more extreme result when the null hypothesis is true. This concept is not easy to grasp, so let us illustrate it with an example.
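The conversion from a z statistic to a P value can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `p_from_z` and its `tail` parameter are made up for this example and do not come from any statistics package.

```python
from statistics import NormalDist


def p_from_z(z, tail="two"):
    """Convert a z statistic to a P value under a standard normal null.

    `tail` is a hypothetical parameter for this sketch:
    "right", "left", or "two" (the default).
    """
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    if tail == "right":
        # Probability of a result at least as large as z
        return 1.0 - std_normal.cdf(z)
    if tail == "left":
        # Probability of a result at least as small as z
        return std_normal.cdf(z)
    # Two-tailed: both tails beyond |z|
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


print(round(p_from_z(1.96), 3))           # 0.05
print(round(p_from_z(1.96, "right"), 3))  # 0.025
```

Note that Z = 1.96 corresponds to P = 0.05 only for a two-tailed test; for a right-tailed test the same Z gives about 0.025.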

Here we assume that the distribution under the null hypothesis is a normal distribution, that we are doing a right-tailed test, and that the sample mean we obtain is X1, as shown in Figure 2.

From the figure we can see that, when the null hypothesis is true, the probability of obtaining X1 is P(X1) (strictly speaking, for a continuous distribution this is a probability density, the height of the curve at X1). Substituting X1 into the formula of the normal distribution gives the value of P(X1) easily. Many people mistakenly believe that P(X1) is the P value, but it is not. P(X1) reflects only the current result and does not include the probability of more extreme results.

So which region covers both the current result and the more extreme results?

In this right-tailed test, it includes X1 and everything to its right, shown as the shaded region A in Figure 3. The area of region A is SA = P(X ≥ X1 | H0) = ∫ f(x) dx, integrated over x ≥ X1 (note: f(x) is the density function of the normal distribution). This area SA is the P value we are looking for.
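To make the difference between the height of the curve at X1 and the shaded area to its right concrete, here is a minimal sketch. It assumes a standard normal null distribution and an observed value X1 = 1.5; both numbers are chosen only for illustration, and `tail_area` is a hypothetical helper that approximates the integral with the trapezoid rule.

```python
from statistics import NormalDist

null_dist = NormalDist(mu=0.0, sigma=1.0)  # assumed null distribution
x1 = 1.5                                   # assumed observed sample mean


def tail_area(dist, lower, span=10.0, steps=20_000):
    """Approximate the integral of dist.pdf from `lower` to `lower + span`
    with the trapezoid rule (the density beyond that point is negligible)."""
    width = span / steps
    total = 0.0
    for i in range(steps):
        left = lower + i * width
        total += 0.5 * (dist.pdf(left) + dist.pdf(left + width)) * width
    return total


density_at_x1 = null_dist.pdf(x1)    # height of the curve at X1, not the P value
p_value = 1.0 - null_dist.cdf(x1)    # area to the right of X1, the P value
p_by_integration = tail_area(null_dist, x1)

print(round(density_at_x1, 4))
print(round(p_value, 4))
print(round(p_by_integration, 4))
```

The last two numbers agree, while the first is a different quantity altogether: a density can even exceed 1 for a narrow distribution, which a probability never can.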

As for the α level, it defines the rejection region, which we set ourselves. Generally we set it to 0.05, which means that when the null hypothesis is true, a sampling result has only a 5% probability of falling within that region.

As shown in Figure 4, in this right-tailed test, if the α level is set to 0.05, the area of region B is SB = P(result falls in B | H0) = α = 0.05. In other words, if the null hypothesis is true, a single sampling result has only a 5% probability of falling in region B. According to the principle of small-probability events, we assume that such an unlikely result will not occur in a single sampling. If it does occur, we infer that our sample does not come from this population, so we reject the null hypothesis. When we need a more stringent standard, we can set the α level to 0.01 or lower.
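The left boundary of region B, the critical value, follows directly from α. A small sketch, assuming a standard normal null distribution by default (`right_tail_critical_value` is a name invented for this example):

```python
from statistics import NormalDist


def right_tail_critical_value(alpha, mu=0.0, sigma=1.0):
    """Return the point whose right-tail area under N(mu, sigma) equals alpha."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - alpha)


print(round(right_tail_critical_value(0.05), 3))  # 1.645
print(round(right_tail_critical_value(0.01), 3))  # 2.326
```

Lowering α from 0.05 to 0.01 pushes the boundary further to the right, which is what a more stringent standard means here.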

Therefore, P < α (i.e., SA < SB) means that the probability of the current result or a more extreme one appearing in a single sample is smaller than the probability of the small-probability event that we believe will not occur in a single sample. That is, our observation is more extreme than the boundary of the rejection region we set. Therefore, the further the P value falls below the α level, the more confident we are that this sample does not belong to the population represented by the null distribution, and the more confident we are in rejecting the null hypothesis.
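The equivalence between "P < α" and "the observation lies beyond the boundary of the rejection region" can be checked with a short sketch. It assumes a right-tailed test against a standard normal null distribution; the function `right_tailed_test` and the sample values are hypothetical.

```python
from statistics import NormalDist


def right_tailed_test(x1, alpha=0.05, mu=0.0, sigma=1.0):
    """Return (p_value, critical_value, reject) for a right-tailed test."""
    null_dist = NormalDist(mu, sigma)
    p_value = 1.0 - null_dist.cdf(x1)               # area SA right of x1
    critical_value = null_dist.inv_cdf(1.0 - alpha)  # left boundary of region B
    reject = p_value < alpha
    return p_value, critical_value, reject


for x1 in (1.0, 2.0):  # illustrative observations
    p_value, critical_value, reject = right_tailed_test(x1)
    # The two ways of stating the decision always agree
    assert reject == (x1 > critical_value)
    print(x1, round(p_value, 4), reject)
```

With α = 0.05, the observation 1.0 is not rejected and the observation 2.0 is, whichever of the two comparisons is used.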

Returning to the survey at the beginning of the article: many people regard the P value as the probability of making a mistake when rejecting the null hypothesis. In fact, rejecting a null hypothesis that is true is a Type I error, and the probability of this error is our α. The P value is only a value we calculate from one sampling result. It is like setting the speeding threshold at 120 km/h and then measuring an average speed of 110 km/h for n cars: we cannot conclude that the speeding threshold is 110 km/h.
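A small simulation may help separate the two ideas. The sketch below assumes that the null hypothesis is true, that the test statistic is standard normal, and that the test is right-tailed; the function name, trial count, and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist


def simulate_false_positive_rate(alpha=0.05, trials=20_000, seed=0):
    """Repeat a right-tailed test many times while H0 is true and
    return the fraction of experiments that (wrongly) reject H0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)       # test statistic drawn under H0
        p = 1.0 - std_normal.cdf(z)   # P value of this one experiment
        if p < alpha:
            rejections += 1
    return rejections / trials


print(simulate_false_positive_rate(alpha=0.05))
print(simulate_false_positive_rate(alpha=0.01))
```

Each experiment yields its own P value, which changes from sample to sample, while the long-run share of wrong rejections stays close to whatever α was fixed in advance.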

The above is only my own current understanding. If readers have other opinions, please feel free to correct me.


Hu Chuanpeng, Wang Fei, Guo Jicheng, Song Mengdi, Qi Jie, & Peng Kaiping. (2016). Reproducibility in Psychological Research: From Crisis to Opportunity. Progress in Psychological Science, 24(9), 1504-1518.

Author: Yanji Xing

Proofreading: Ji Xuejun, Zhao Jiawei

Editor: Cui Weijun