What is Over-fitting in Linear Regression and Machine Learning

In this short lesson, we will discuss the concept of over-fitting in linear regression. I will assume you have a basic knowledge of linear regression, where you fit a straight line through a set of data points.

What is Over-fitting?
Over-fitting in regression is a condition where the model corresponds too closely to a particular set of data and may therefore fail to predict new observations. When over-fitting occurs, the model begins to describe the random error in the data rather than the relationship between the variables.

Polynomial Curve Fitting
Assume we have an input variable x and we want to use observations of it to predict a target variable t.
Let’s also assume that we have a training dataset of N observations of the variable x, which can be denoted as a vector:

x = (x_1, x_2, \ldots, x_N)^T

The corresponding training values for t can be denoted as a vector t given by

t = (t_1, t_2, \ldots, t_N)^T

Let’s choose N = 10 values of x in the interval [0, 1].
Let the relationship between x and t for the training data set be

f(x) = sin(2πx)
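
As a minimal sketch of this setup, the training vectors can be built in a few lines of Python. The choice of N = 10 evenly spaced points over [0, 1] is an assumption made for illustration, and the names x, t and N simply mirror the symbols used above.

```python
import math

N = 10  # number of training observations (assumed for this sketch)

# N evenly spaced inputs covering [0, 1] (even spacing is an assumption)
x = [n / (N - 1) for n in range(N)]

# Targets generated from the underlying function f(x) = sin(2*pi*x)
t = [math.sin(2 * math.pi * x_n) for x_n in x]

for x_n, t_n in zip(x, t):
    print(f"x = {x_n:.3f}, t = {t_n:+.3f}")
```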

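Now the objective of polynomial fitting is to discover this function given just the training observations.
To do that, we would use a polynomial function of the form

y(x, w) = w_0 + w_1 x + w_2 x^2 + \ldots + w_M x^M = \sum_{j=0}^{M} w_j x^j

where M is the order of the polynomial,
x^j denotes x raised to the power j, and
w = (w_0, w_1, \ldots, w_M)^T is the vector of polynomial coefficients.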
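
To make the polynomial concrete, here is a small sketch that evaluates a polynomial of order M for a given coefficient list. It assumes the usual form in which the j-th coefficient multiplies x raised to the power j; the function name polynomial is hypothetical and chosen only for this example.

```python
def polynomial(x, w):
    """Evaluate w[0] + w[1]*x + ... + w[M]*x**M, where M = len(w) - 1."""
    return sum(w_j * x ** j for j, w_j in enumerate(w))


# Order M = 2 with coefficients w = (1, 2, 3): 1 + 2*2 + 3*2**2 = 17
print(polynomial(2.0, [1.0, 2.0, 3.0]))  # prints 17.0
```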

First, we need to choose the values of the coefficients w that minimize the error E(w).
Next, we choose the order of the polynomial M that gives the best fit for our regression model.
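
The first step, choosing w to minimize E(w), can be sketched with NumPy. The lesson does not spell out E(w), so this sketch assumes the common sum-of-squares error, E(w) = 1/2 * sum over n of (y(x_n, w) - t_n)^2, which has a standard least-squares solution. The function names fit_polynomial and sum_of_squares_error are hypothetical, and the 10 evenly spaced training points are likewise an assumption.

```python
import numpy as np


def fit_polynomial(x, t, M):
    """Return least-squares coefficients w_0..w_M of an order-M polynomial."""
    # Design matrix with columns x^0, x^1, ..., x^M
    A = np.vander(x, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w


def sum_of_squares_error(x, t, w):
    """Assumed error: E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    y = np.vander(x, len(w), increasing=True) @ w
    return 0.5 * np.sum((y - t) ** 2)


# Training data: 10 evenly spaced points on [0, 1] (an assumption)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x)

w = fit_polynomial(x, t, 3)
print("w =", w)
print("E(w) =", sum_of_squares_error(x, t, w))
```

Solving through np.linalg.lstsq rather than forming the normal equations by hand is a design choice: the matrix of powers of x becomes poorly conditioned as M grows, and lstsq handles that more gracefully.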
We already know the plot of the given function f(x) = sin(2πx). So we will first plot it, and then fit polynomials for different values of M (the order of the polynomial). For each value, we plot the fitted polynomial together with the known function and judge which one fits best.
We choose four values: M = 0, 1, 3, 9.
In the figure, the green plot is the given function and the red plot is our polynomial curve.

Figure: polynomial fits (red) plotted against the given function (green). Panel titles: M = 1 (Under-fitting), M = 3, M = 9 (Over-fitting).
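
The comparison across orders can also be made numerically. The sketch below fits each of the four orders and reports the training error. Two things here are assumptions rather than part of the lesson: a small amount of Gaussian noise is added to the targets so that there is random error to fit, and the error is taken to be the sum-of-squares error. The seed and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed for repeatability

# 10 evenly spaced training inputs on [0, 1] (an assumption)
x = np.linspace(0.0, 1.0, 10)
# Targets: sin(2*pi*x) plus Gaussian noise (noise level 0.2 is an assumption)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

training_error = {}
for M in (0, 1, 3, 9):
    A = np.vander(x, M + 1, increasing=True)   # columns x^0 .. x^M
    w, *_ = np.linalg.lstsq(A, t, rcond=None)  # least-squares coefficients
    training_error[M] = 0.5 * np.sum((A @ w - t) ** 2)
    print(f"M = {M}: training E(w) = {training_error[M]:.6f}")
```

The training error can only shrink as M grows, because each higher-order polynomial contains the lower-order ones as special cases. A small training error alone therefore does not tell us which M is best.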

From the plots, we see that M = 3 provides the best fit.
For M = 9, the polynomial passes through every training point, which means the error E(w) is zero. However, this curve gives a poor representation of the original function and would predict new data points badly. This scenario is what is known as over-fitting.
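
One way to see over-fitting in numbers is to compare the error on the training points with the error on new points drawn from the same function. The sketch below is illustrative only: the noise level, the seed, the 100 new points, and the use of root-mean-square error (so that sets of different sizes are comparable) are all assumptions, and rms_error is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed
noise = 0.3                     # assumed noise level


def rms_error(x, t, w):
    """Root-mean-square difference between the polynomial and the targets."""
    y = np.vander(x, len(w), increasing=True) @ w
    return np.sqrt(np.mean((y - t) ** 2))


# Training set: 10 evenly spaced points (an assumption), noisy targets
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=noise, size=10)

# New observations from the same function, not used for fitting
x_new = rng.uniform(0.0, 1.0, size=100)
t_new = np.sin(2 * np.pi * x_new) + rng.normal(scale=noise, size=100)

results = {}
for M in (3, 9):
    A = np.vander(x_train, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(A, t_train, rcond=None)
    results[M] = (rms_error(x_train, t_train, w), rms_error(x_new, t_new, w))
    print(f"M = {M}: train RMS = {results[M][0]:.4f}, new-data RMS = {results[M][1]:.4f}")
```

For M = 9 the training error is essentially zero while the error on new data is not, which is the signature of over-fitting. With noisy data the M = 9 error on new points is typically also larger than that of M = 3, though the exact numbers depend on the seed and noise level.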