
One minute to understand why machine learning models overfit!


What is overfitting?

First, let us explain the concept of overfitting.

Overfitting is the phenomenon in which a trained model performs well on the training set but poorly on the test set. The figure below gives an example:

The third model in the figure above exhibits overfitting: it fits the training data too closely without considering generalization ability. Plotting the accuracy on the training set and on the development set gives the following:

From the figure we can see that the model performs well on the training set but much worse on the cross-validation set. This is the hallmark of overfitting!

There are three main reasons why a model overfits:

(1) The data is noisy

(2) Insufficient (limited) training data

(3) Excessive training, leading to an overly complex model

I will explain these three situations separately (these are my own understandings; feedback and discussion are welcome).

Reason 1: The data is noisy

Why might noisy data lead to overfitting?

All machine learning is a process of searching a hypothesis space. We search the model's parameter space for a set of parameters that minimizes our loss function; that is, we keep approximating the true hypothesis. The true model could only be obtained if we knew the entire data distribution.

In practice, our model finds the loss-minimizing fit on limited training data, and we then generalize that model to the rest of the data. This is the essence of machine learning!

Well, let's assume that our overall data is shown below:

(Here I assume that the overall data distribution satisfies a linear model y = kx + b. Reality is certainly not this simple, and the amount of data would not be this small (rather billions of samples), but that does not affect the explanation: the point is that the overall data satisfies the model y = kx + b.)

Now suppose we obtain some of this data, contaminated with noise, as shown:

A model trained on the training points above is definitely not a linear model (the standard model satisfied by the overall data distribution). For example, the trained model might look like this:

If I train on this noisy data long enough, I can drive the training-set loss to 0. But when this model is used to generalize to the true overall data distribution (which satisfies the linear model), the results will be very bad: we would be using a nonlinear model to predict data whose true distribution is linear, so performance is obviously poor. This is overfitting!
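This can be seen numerically with a small sketch. All the data below is made up for illustration: a hypothetical linear trend y = 2x + 1 with Gaussian noise, fit once with an over-flexible degree-9 polynomial and once with a plain line:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical overall distribution: a linear trend y = 2x + 1.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + 1 + rng.normal(0, 0.3, size=10)  # noisy training samples
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test + 1                                  # noise-free "overall" data

def mse(coeffs, x, y):
    """Mean squared error of a polynomial fit at the points x."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# A degree-9 polynomial can pass through all 10 noisy points,
# driving the training loss to essentially zero...
wiggly = np.polyfit(x_train, y_train, 9)
# ...while a plain linear fit matches the true linear model far better.
linear = np.polyfit(x_train, y_train, 1)
```

The flexible model wins on the noisy training set precisely because it chases the noise, and it loses on the noise-free overall distribution; comparing `mse(wiggly, ...)` and `mse(linear, ...)` on the two sets makes the gap visible.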

Reason 2: Insufficient training data

When our training data is insufficient, even if it contains no noise at all, the trained model may still overfit, as explained below:

Suppose our overall data is distributed as follows:

The training data we obtain is limited; say it is the following:

From this training data, the model I obtain is a linear one. By training long enough, I can get a linear model whose loss on the training data is 0. But when this model is generalized to the true overall distribution (which in fact satisfies a quadratic model), its generalization ability is obviously very poor, and we see an overfitting phenomenon! (Some people would instead call this situation underfitting.)
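A minimal numeric sketch of this scenario, with made-up data: the true relationship is quadratic, but only two (noise-free) samples are available, so a straight line fits them perfectly and still generalizes badly:

```python
import numpy as np

# Hypothetical overall distribution: y = x**2 (a quadratic model).
x_all = np.linspace(-2, 2, 50)
y_all = x_all ** 2

# Only two noise-free training samples are available.
x_train = np.array([0.0, 1.0])
y_train = x_train ** 2

# A line through two points drives the training loss to exactly 0...
line = np.polyfit(x_train, y_train, 1)
train_mse = float(np.mean((np.polyval(line, x_train) - y_train) ** 2))

# ...but it generalizes poorly to the quadratic population.
test_mse = float(np.mean((np.polyval(line, x_all) - y_all) ** 2))
```

Here `train_mse` is essentially zero while `test_mse` is several units of squared error: the limited sample simply cannot reveal the curvature of the true distribution.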

Reason 3: Excessive training leads to an overly complex model

Excessive training leads to a very complex model and can cause overfitting! Combined with the first two reasons, this is easy to understand: if we train on the training data for too long and fit it completely, the resulting model is not necessarily reliable.

For example, on noisy training data, training for too long lets the model learn the characteristics of the noise itself, which inevitably lowers accuracy on a real, noise-free test set!
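"Learning the noise" can be made concrete with a toy sketch (all data hypothetical). Taking excessive training to its limit, a model that reproduces the noisy training labels exactly (here, a degree-9 interpolating polynomial) has memorized the noise: at the very points it was trained on, it matches the noisy labels, not the true ones.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: true labels are linear, observed labels are noisy.
x = np.linspace(0, 1, 10)
y_true = 2 * x + 1
y_noisy = y_true + rng.normal(0, 0.3, size=10)

# A degree-9 polynomial through 10 points reproduces the noisy
# labels (essentially) exactly: the extreme of overtraining.
w = np.polyfit(x, y_noisy, 9)
pred = np.polyval(w, x)

# The model agrees with the noisy labels at the training points...
err_vs_noisy = float(np.mean((pred - y_noisy) ** 2))
# ...but disagrees with the true, noise-free labels at those same points.
err_vs_true = float(np.mean((pred - y_true) ** 2))
```

The residual against the noisy labels is near zero, while the residual against the true labels is roughly the variance of the noise: everything the model "gained" by training harder was noise.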

Well, that concludes this article. I have tried to explain the causes of overfitting from my own understanding, hoping to give more people an intuitive picture of it. I really hope this helps. Corrections are welcome!

The cover image is from Andrew Ng's Machine Learning course slides!

How to prevent overfitting? You can use regularization, dropout, expanding the training data, and other methods.
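As one illustration of these remedies, here is a minimal sketch of L2 (ridge) regularization on the same kind of made-up data as above (a noisy linear trend modeled with degree-9 polynomial features). The penalty shrinks the coefficients toward zero, trading a little training error for better generalization:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical noisy samples of a linear trend y = 2x + 1.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + 1 + rng.normal(0, 0.3, size=10)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test + 1  # noise-free population

degree = 9
Xtr = np.vander(x_train, degree + 1)  # polynomial feature matrix
Xte = np.vander(x_test, degree + 1)

# Unregularized least squares: interpolates the noisy points.
w_plain = np.linalg.lstsq(Xtr, y_train, rcond=None)[0]

# Ridge regression, closed form: (X^T X + lam*I)^-1 X^T y.
# The L2 penalty lam discourages large, wiggly coefficients.
lam = 1e-3
w_ridge = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(degree + 1),
                          Xtr.T @ y_train)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))
```

The unregularized fit has lower training error (it chases the noise), while the ridge fit has lower error on the noise-free data. Dropout and data augmentation pursue the same goal, restraining the effective complexity of the model, by different means.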

In practical data science and machine learning, besides overfitting there are other pitfalls: insufficient sample space, underfitting, the "Xiao Jingteng effect", survivorship bias, and so on.
