{"id":196,"date":"2018-01-09T14:34:00","date_gmt":"2018-01-09T13:34:00","guid":{"rendered":"https:\/\/kindsonthegenius.com\/blog\/2018\/01\/09\/what-is-over-fitting-in-linear-regression-and-machine-learning\/"},"modified":"2020-08-22T11:04:48","modified_gmt":"2020-08-22T09:04:48","slug":"what-is-over-fitting-in-linear-regression-and-machine-learning","status":"publish","type":"post","link":"https:\/\/kindsonthegenius.com\/blog\/what-is-over-fitting-in-linear-regression-and-machine-learning\/","title":{"rendered":"What is Over-fitting in Linear Regression and Machine Learning"},"content":{"rendered":"<div style=\"color: #555555; font-size: 18px; line-height: 30px; text-align: justify;\">\n<div style=\"font-family: 'segoe ui';\">In this short lesson, we will discuss the concept of over-fitting in Linear Regression. For now I would assume you have a basic knowledge of linear regression, where you have to fit a straight line through a set of data points.<\/p>\n<div style=\"clear: both; text-align: center;\"><a href=\"https:\/\/4.bp.blogspot.com\/-Z0KrkSOwpv8\/WlTSdf3pSnI\/AAAAAAAAArQ\/Hxp_QSF9DywITn11n1oJHMsG5qXKI0qOQCLcBGAs\/s1600\/Overfitting-in-Linear-Regression.jpg\" style=\"margin-left: 1em; margin-right: 1em;\"><img decoding=\"async\" loading=\"lazy\" border=\"0\" data-original-height=\"662\" data-original-width=\"1226\" height=\"172\" src=\"https:\/\/4.bp.blogspot.com\/-Z0KrkSOwpv8\/WlTSdf3pSnI\/AAAAAAAAArQ\/Hxp_QSF9DywITn11n1oJHMsG5qXKI0qOQCLcBGAs\/s320\/Overfitting-in-Linear-Regression.jpg\" width=\"320\" \/><\/a><\/div>\n<p>   <ins data-ad-client=\"ca-pub-7041870931346451\" data-ad-format=\"fluid\" data-ad-layout=\"in-article\" data-ad-slot=\"8227894917\" style=\"display: block; text-align: center;\"><\/ins> <b>What is Over-fitting?<\/b><br \/>Over-fitting of a model in regression is a condition where the model corresponds too closely to a particular set of data and may therefore not be able to predict new observation. 
When over-fitting occurs, the model begins to describe the random error in the data rather than the relationship between the variables.

Polynomial Curve Fitting
Assume we have an input variable x and we want to use observations of it to predict a target variable t.
Let's also assume we have a training dataset of N observations of the variable x, which can be denoted as a vector:

x = (x1, x2, …, xN)^T

The corresponding training values of t can be denoted as a vector t given by

t = (t1, t2, …, tN)^T

Let's take N = 10 observations, with x spaced evenly from 0 to 1 at intervals of 0.1.
Let the relationship between x and t for the training dataset be

f(x) = sin(2πx)

The objective of polynomial fitting is to recover this function given only the training observations.
To do that, we use a polynomial function of the form

y(x, w) = w0 + w1·x + w2·x^2 + … + wM·x^M = Σ_{j=0}^{M} wj·x^j

where M is the order of the polynomial, x^j denotes x raised to the power j, and the coefficients w0, …, wM are collected in the vector w.

First, we choose the value of w that minimizes the error E(w), for example the sum of squared differences between the predictions y(xn, w) and the targets tn.
Next, we choose the order M of the polynomial that gives the best fit for our regression model.
We already know the plot of the given function y(x) = sin(2πx).
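The post itself contains no code, but a minimal Python sketch of this setup may help. It builds a hypothetical training set of N = 10 points with targets sin(2πx) plus noise (the noise level is my own assumption, not from the post) and uses NumPy's least-squares polynomial fit, which chooses the coefficients w that minimize the sum-of-squares error E(w).

```python
import numpy as np

# Hypothetical training set: N = 10 inputs evenly spaced on [0, 1],
# targets t = sin(2*pi*x) plus a little Gaussian noise (assumed noise level).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

M = 3  # order of the polynomial
# np.polyfit solves the least-squares problem, i.e. it picks the
# coefficients w that minimize the sum-of-squares error E(w).
w = np.polyfit(x, t, deg=M)

print("fitted coefficients w:", w)
print("predictions y(x, w):  ", np.polyval(w, x))
```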
black;\"><i><span style=\"font-family: &quot;georgia&quot; , &quot;times new roman&quot; , serif;\">\u03c0x)<\/span><\/i><\/span>. So we will first plot it, and then choose M(the order of the polynomial) for different values. For each value, we would plot it in the same plot with the know function and try to choose which of them best fits<\/span><\/span><br \/><span><span>We would choose four values. <span style=\"color: black;\"><i><span style=\"font-family: &quot;georgia&quot; , &quot;times new roman&quot; , serif;\">M = 0, 1, 3, 9<\/span><\/i><\/span><\/span><\/span><br \/><span><span>From the figure, the green plot is the given model and the red plot is our polynomial curve.<\/span><\/span><\/p>\n<table cellspacing=\"5\">\n<tbody>\n<tr>\n<td><a href=\"https:\/\/3.bp.blogspot.com\/-c03C34BZak4\/WlTOJ7AYuYI\/AAAAAAAAAq8\/R7YGuPhhkdM1z_-0_sdUKgj4MH3tDKYdwCLcBGAs\/s1600\/Polynomial%2B1.JPG\" style=\"clear: left; float: left; margin-bottom: 1em; margin-right: 1em;\">M = 1 (Under-fitting) &nbsp; <img decoding=\"async\" loading=\"lazy\" border=\"0\" data-original-height=\"381\" data-original-width=\"499\" height=\"244\" src=\"https:\/\/3.bp.blogspot.com\/-c03C34BZak4\/WlTOJ7AYuYI\/AAAAAAAAAq8\/R7YGuPhhkdM1z_-0_sdUKgj4MH3tDKYdwCLcBGAs\/s320\/Polynomial%2B1.JPG\" width=\"320\" \/><\/a><\/td>\n<td><a href=\"https:\/\/1.bp.blogspot.com\/-S1m3nX6-syo\/WlTOJqvAkwI\/AAAAAAAAAq4\/Y_ZMGAfjTnwDaMEwsYGGHaV32XhMU-CngCLcBGAs\/s1600\/Polynomial%2B2.JPG\" style=\"margin-left: 1em; margin-right: 1em;\">M = 3<img decoding=\"async\" loading=\"lazy\" border=\"0\" data-original-height=\"378\" data-original-width=\"495\" height=\"244\" src=\"https:\/\/1.bp.blogspot.com\/-S1m3nX6-syo\/WlTOJqvAkwI\/AAAAAAAAAq4\/Y_ZMGAfjTnwDaMEwsYGGHaV32XhMU-CngCLcBGAs\/s320\/Polynomial%2B2.JPG\" width=\"320\" \/><\/a><\/td>\n<\/tr>\n<tr>\n<td>M=3<a href=\"https:\/\/2.bp.blogspot.com\/-KB9LRzaHVbI\/WlTOJzXDfzI\/AAAAAAAAArA\/M52TAKB_zLkMTBZilH_eteDWjrmST_LiQCLcBGAs\/s1600\/Polynomial%2B3.JPG\" style=\"margin-left: 1em; margin-right: 1em;\"><img decoding=\"async\" loading=\"lazy\" border=\"0\" data-original-height=\"365\" data-original-width=\"487\" height=\"239\" src=\"https:\/\/2.bp.blogspot.com\/-KB9LRzaHVbI\/WlTOJzXDfzI\/AAAAAAAAArA\/M52TAKB_zLkMTBZilH_eteDWjrmST_LiQCLcBGAs\/s320\/Polynomial%2B3.JPG\" width=\"320\" \/><\/a><\/td>\n<td>M = 9 (Over-fitting)<a href=\"https:\/\/2.bp.blogspot.com\/-qug3rtvbs-Y\/WlTOKCgB5RI\/AAAAAAAAArE\/LQTrIeeM7W0vt_lASUmn2qmfIejXSRpYgCLcBGAs\/s1600\/Polynomial%2B4.JPG\" style=\"margin-left: 1em; margin-right: 1em;\"><img decoding=\"async\" loading=\"lazy\" border=\"0\" data-original-height=\"367\" data-original-width=\"493\" height=\"238\" src=\"https:\/\/2.bp.blogspot.com\/-qug3rtvbs-Y\/WlTOKCgB5RI\/AAAAAAAAArE\/LQTrIeeM7W0vt_lASUmn2qmfIejXSRpYgCLcBGAs\/s320\/Polynomial%2B4.JPG\" width=\"320\" \/><\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>From the models, we see that for M = 3 provides the best fit.<br \/>For M = 9, the model tends to capture all the data points which means the the value of the error E(w) is zero. However, this curve gives a poor representation of the original function and would not be able to capture new data points. This scenario is what is known as <span style=\"color: #cc0000;\"><i><b>over-fitting<\/b><\/i><\/span>.  
<br \/><ins data-ad-client=\"ca-pub-7041870931346451\" data-ad-format=\"fluid\" data-ad-layout=\"in-article\" data-ad-slot=\"8227894917\" style=\"display: block; text-align: center;\"><\/ins> <br \/><ins data-ad-client=\"ca-pub-7041870931346451\" data-ad-format=\"fluid\" data-ad-layout=\"in-article\" data-ad-slot=\"8227894917\" style=\"display: block; text-align: center;\"><\/ins>  <\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>In this short lesson, we will discuss the concept of over-fitting in Linear Regression. For now I would assume you have a basic knowledge of &hellip; <\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[15,11,16],"tags":[],"_links":{"self":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/196"}],"collection":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/comments?post=196"}],"version-history":[{"count":1,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/196\/revisions"}],"predecessor-version":[{"id":1467,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/196\/revisions\/1467"}],"wp:attachment":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/media?parent=196"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/categories?post=196"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/tags?post=196"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}