{"id":85,"date":"2018-04-19T19:47:00","date_gmt":"2018-04-19T19:47:00","guid":{"rendered":"https:\/\/kindsonthegenius.com\/blog\/2018\/04\/19\/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain\/"},"modified":"2020-07-25T22:44:47","modified_gmt":"2020-07-25T20:44:47","slug":"how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain","status":"publish","type":"post","link":"https:\/\/kindsonthegenius.com\/blog\/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain\/","title":{"rendered":"How to Build Decision Tree for Classification &#8211; (Step by Step Using Entropy and Gain)"},"content":{"rendered":"<p>In this Lesson, I would teach you how to build a decision tree step by step in very easy way, with clear explanations and diagrams.<\/p>\n<div style=\"clear: both; text-align: center;\"><\/div>\n<p><b>Content<\/b><\/p>\n<ol>\n<li><a href=\"#t1\">What are Decision Trees<\/a><\/li>\n<li><a href=\"https:\/\/kindsonthegenius.com\/blog\/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain#t2\">Exercise for this Lesson<\/a><\/li>\n<li><a href=\"#t3\">The ID3 Algorithm for Building Decision Trees<\/a><\/li>\n<li><a href=\"#t4\">Step by Step Procedure<\/a>\n<ul>\n<li><a href=\"https:\/\/kindsonthegenius.com\/blog\/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain#s1\">Step 1: Determine the Root of the Tree<\/a><\/li>\n<li><a href=\"#s2\">Step 2: Calculate Entropy for The Classes<\/a><\/li>\n<li><a href=\"#s3\">Step 3: Calculate Entropy After Split for Each Attribute<\/a><\/li>\n<li><a href=\"https:\/\/kindsonthegenius.com\/blog\/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain#s4\">Step 4: Calculate Information Gain for each split\u00a0<\/a><\/li>\n<li><a href=\"#s5\">Step 5: Perform the Split<\/a><\/li>\n<li><a href=\"https:\/\/kindsonthegenius.com\/blog\/how-to-build-a-decision-tree-for-classification-step-by-step-procedure-using-entropy-and-gain#s6\">Step 6: Perform Further Splits<\/a><\/li>\n<li><a href=\"#s7\">Step 7: Complete the Decision Tree<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"http:\/\/kindsonthegenius.blogspot.com\/#t5\">Final Notes<\/a><\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h3 id=\"t1\">1. What are Decision Trees<\/h3>\n<p>A decision tree is a tree-like structure that is used as a model for classifying data. A decision tree decomposes the data into sub-trees made of other sub-trees and\/or leaf nodes.<br \/>\nA decision tree is made up of three types of nodes<\/p>\n<ul>\n<li><em>Decision Nodes<\/em>: These type of node have two or more branches<\/li>\n<li><em>Leaf Nodes<\/em>: The lowest nodes which represents decision<\/li>\n<li><em>Root Node<\/em>: This is also a decision node but at the topmost level<\/li>\n<\/ul>\n<p>The question is : How to we build a decision tree? Let&#8217;s see!<\/p>\n<p>&nbsp;<\/p>\n<h3 id=\"t2\">2. Exercise for this Lesson<\/h3>\n<p>Consider the table below. It represent factors that affect whether John would go out to play golf or not. 
Using the data in the table, build a decision tree that can be used to predict whether John will play golf or not.</p>

<div style="text-align: center;"><img src="https://2.bp.blogspot.com/-sD_VfJzi8YY/WtTygMEGRCI/AAAAAAAABwA/mnnX-Q14j3kRoFzbygUrhgDS_DQwSemZQCLcBGAs/s640/Decision%2BTree%2BExercise.jpg" alt="The Play Golf dataset" /><br /><b>Figure 1:</b> Exercise on Decision Trees</div>

<h3 id="t3">3. Algorithm for Building Decision Trees &#8211; The ID3 Algorithm (you can skip this!)</h3>
<p>This is the algorithm that is applied when creating a decision tree. You don't need to memorize it; just know what it does. It is the ID3 algorithm by J. R. Quinlan, and it uses Entropy and Information Gain to build the tree.</p>
<p>Let:</p>
<p>S = learning set<br />
A = attribute set<br />
V = attribute values</p>
<hr />
<pre>
Begin
   Load learning set S and create the decision tree root node (rootNode);
   add learning set S to rootNode as its subset.

   For rootNode, compute Entropy(rootNode.subset) first.

   If Entropy(rootNode.subset) == 0 (the subset is homogeneous)
      return a leaf node

   If Entropy(rootNode.subset) != 0 (the subset is not homogeneous)
      compute the Information Gain for each attribute left (not yet used for splitting)
      find the attribute A with Maximum(Gain(S, A))
      create child nodes of rootNode, one for each value of A, and add them to the decision tree

   For each child of rootNode
      apply ID3(S, A, V) recursively,
      until a node with Entropy of 0 (a leaf node) is reached
End
</pre>
<p><i>(You don't have to worry about understanding every bit of this algorithm. The application below will make it very clear.)</i></p>
<hr />
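<p>If you find code easier to follow than pseudocode, here is a minimal Python sketch of the same recursion. It is only an illustration, not the exact program behind the figures in this post: it assumes the learning set is a list of dictionaries, one per row of the table in Figure 1, with a <code>"Play Golf"</code> key holding the class label, and all function names are mine.</p>
<pre>
from collections import Counter
from math import log2

def entropy(rows, target="Play Golf"):
    """Entropy of the class labels in this subset of rows."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return sum(-(c / total) * log2(c / total) for c in counts.values())

def entropy_after_split(rows, attribute, target="Play Golf"):
    """Weighted entropy of the subsets produced by splitting on `attribute`."""
    total = len(rows)
    weighted = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        weighted += (len(subset) / total) * entropy(subset, target)
    return weighted

def id3(rows, attributes, target="Play Golf"):
    """Return a nested dict {attribute: {value: subtree_or_class_label}}."""
    labels = [row[target] for row in rows]
    if entropy(rows, target) == 0:          # homogeneous subset -> leaf node
        return labels[0]
    if not attributes:                      # nothing left to split on -> majority class
        return Counter(labels).most_common(1)[0][0]
    # The attribute with maximum gain is the one with minimum entropy after the split
    best = min(attributes, key=lambda a: entropy_after_split(rows, a, target))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree
</pre>
<p>Calling <code>id3(rows, ["Outlook", "Temperature", "Humidity", "Windy"])</code> on the 14 rows of Figure 1 would carry out the same steps we now walk through by hand.</p>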
<h3 id="t4">4. Step by Step Procedure for Building a Decision Tree</h3>
<p>Here I will give you a simple step by step procedure that is super easy to follow when creating a decision tree, no matter how complex the tree could be. Start by determining the decision (class) column and then the root of the tree.</p>

<h3 id="s1">Step 1: Determine the Decision Column</h3>
<hr />
<p>Since decision trees are used for classification, you need to determine the classes which are the basis for the decision. In this case, it is the last column, that is the <i>Play Golf</i> column, with the classes <i><b>Yes</b></i> and <b><i>No</i></b>.</p>
<p>To determine the root node we need to compute the entropy. To do this, we create a frequency table for the classes (the Yes/No column).</p>

<div style="text-align: center;"><img src="https://3.bp.blogspot.com/-sr5Xk0iBLZM/WtUToEVlKSI/AAAAAAAABwQ/914mIDeieOUpVG38pYwx3Q1uVkOBYYXRwCLcBGAs/s200/Decistion%2BTree%2B-%2BFrequency%2BTable%2B-%2BPlay%2BGolf.jpg" alt="Frequency table for the Play Golf column" /><br /><b>Table 2:</b> Frequency Table</div>

<h3 id="s2">Step 2: Calculate Entropy for the Classes (Play Golf)</h3>
<hr />
<p>In this step, you need to calculate the entropy of the Play Golf column; the calculation steps are given below.</p>
<div style="text-align: center;"><i>Entropy(PlayGolf) = <b>E</b>(5,9)</i></div>
<div style="text-align: center;"><img src="https://2.bp.blogspot.com/-nCz0cZ8jYMQ/WtUWR1NJXdI/AAAAAAAABww/qdjyvECbSr4IiBSpYCevuznnKcNNjHmSgCLcBGAs/s400/Decistion%2BTree%2B-%2BEntropy%2BCalculation.jpg" alt="Entropy calculation for the Play Golf column" /></div>
<p>This gives <b>E</b>(PlayGolf) = <b>0.94</b>, the value we will reuse when calculating the information gain in Step 4.</p>
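<p>You can confirm this value with a few lines of Python. A small sketch: the counts 5 and 9 come straight from the frequency table (Table 2).</p>
<pre>
from math import log2

counts = [5, 9]                  # the two class counts from Table 2
total = sum(counts)

e_playgolf = sum(-(c / total) * log2(c / total) for c in counts)
print(round(e_playgolf, 2))      # 0.94
</pre>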
href=\"https:\/\/2.bp.blogspot.com\/-T24C_trBpMk\/WtUb-9eANzI\/AAAAAAAABxA\/6ACcH5f7b1M691AGf9OQOk1bosS2q3OLQCLcBGAs\/s1600\/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/2.bp.blogspot.com\/-T24C_trBpMk\/WtUb-9eANzI\/AAAAAAAABxA\/6ACcH5f7b1M691AGf9OQOk1bosS2q3OLQCLcBGAs\/s320\/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables.jpg\" width=\"238\" height=\"55\" border=\"0\" data-original-height=\"125\" data-original-width=\"539\" \/><\/a><\/div>\n<p>There to calculate <b>E<\/b>(PlayGolf, Outlook), we would use the formula below:<\/p>\n<div style=\"clear: both; text-align: center;\"><\/div>\n<div style=\"clear: both; text-align: left;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/4.bp.blogspot.com\/-DM_-HQOvEPY\/WtUiVsMmVMI\/AAAAAAAABxc\/_nJMmx6len4btYASn2hJyMghpIddvHk3gCLcBGAs\/s1600\/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables2.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/4.bp.blogspot.com\/-DM_-HQOvEPY\/WtUiVsMmVMI\/AAAAAAAABxc\/_nJMmx6len4btYASn2hJyMghpIddvHk3gCLcBGAs\/s640\/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables2.jpg\" width=\"640\" height=\"28\" border=\"0\" data-original-height=\"61\" data-original-width=\"1300\" \/><\/a>Which is<\/div>\n<div><\/div>\n<div style=\"clear: both; text-align: left;\">the same as:<\/div>\n<div style=\"clear: both; text-align: left;\"><\/div>\n<div style=\"clear: both; text-align: center;\"><i>E(PlayGolf, Outlook)\u00a0 = <b>P<\/b>(Sunny) <b>E<\/b>(3,2) + P(Overcast) <b>E<\/b>(4,0) + <b>P<\/b>(rainy) E(2,30<\/i><\/div>\n<div style=\"clear: both; text-align: center;\"><\/div>\n<p>This formula may look unfriendly, but it is quite clear. The easiest way to approach this calculation is to create a frequency table for the two variables, that is PlayGolf and Outlook.<\/p>\n<p>This frequency table is given below:<\/p>\n<div style=\"clear: both; text-align: center;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/4.bp.blogspot.com\/-R2Y6cMoCA2I\/WtUh53NPSxI\/AAAAAAAABxY\/zxCy_7Iz8gI3ha0sthZt_7nvgat42G2-ACLcBGAs\/s1600\/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables3.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/4.bp.blogspot.com\/-R2Y6cMoCA2I\/WtUh53NPSxI\/AAAAAAAABxY\/zxCy_7Iz8gI3ha0sthZt_7nvgat42G2-ACLcBGAs\/s320\/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables3.jpg\" width=\"320\" height=\"117\" border=\"0\" data-original-height=\"222\" data-original-width=\"603\" \/>\u00a0<\/a><\/div>\n<div style=\"clear: both; text-align: center;\">Table 3: Frequency Table for Outlook<\/div>\n<p>&nbsp;<\/p>\n<p>Using this table, we can then calculate E(PlayGolf, Outlook), which would then be given by the formula below<\/p>\n<div style=\"clear: both; text-align: center;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/2.bp.blogspot.com\/-imdc1oWPMe8\/WtUj2JtJNzI\/AAAAAAAABxs\/ch4jn3jU-2UrzM7vgoWOVihVBEhyPsSuQCLcBGAs\/s1600\/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables4.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/2.bp.blogspot.com\/-imdc1oWPMe8\/WtUj2JtJNzI\/AAAAAAAABxs\/ch4jn3jU-2UrzM7vgoWOVihVBEhyPsSuQCLcBGAs\/s400\/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables4.jpg\" width=\"400\" height=\"40\" border=\"0\" data-original-height=\"83\" data-original-width=\"826\" \/><\/a><\/div>\n<p>&nbsp;<\/p>\n<p>Let&#8217;s go ahead to calculate E(3,2)<br \/>\nWe would not need to calculate the second and the third terms! 
This is because</p>
<p>E(4,0) = 0<br />
E(2,3) = E(3,2)</p>
<p>Isn't that interesting!</p>
<div style="text-align: center;"><img src="https://2.bp.blogspot.com/-HsYFjNR0xdI/WtUma2vNVHI/AAAAAAAABx4/IV0N_y8VlvodugKhxTyaCIatgYVxdRNrQCLcBGAs/s400/Decistion%2BTree%2B-%2BEntropy%2Bof%2BTwo%2BVariables5.jpg" alt="Calculation of E(3,2)" /></div>
<p>Just for clarification, let's show the calculation steps.<br />
The calculation steps for E(4,0):</p>
<div style="text-align: center;"><img src="https://2.bp.blogspot.com/-cxGn4uzTrLk/WtZUyrhFDJI/AAAAAAAAByI/NaVe4iGm8AMyWSOZIYOCBxfmMqfGkolCQCLcBGAs/s320/Rainy.jpg" alt="Calculation of E(4,0)" /></div>
<p>The calculation steps for E(2,3) are given below:</p>
<div style="text-align: center;"><img src="https://2.bp.blogspot.com/-SrB04dFUkZg/WtZU4Uu16WI/AAAAAAAAByM/QX53Nj4Ipq0_aCVJOje1TdI9wXTo5hmrQCLcBGAs/s320/Overcast.jpg" alt="Calculation of E(2,3)" /></div>
<p>Time to put it all together. We calculate <b>E</b>(PlayGolf, Outlook) by substituting the values we calculated for <b>E</b>(Sunny), <b>E</b>(Overcast) and <b>E</b>(Rainy) into the equation:</p>
<div style="text-align: center;"><i>E(PlayGolf, Outlook) = P(Sunny) E(3,2) + P(Overcast) E(4,0) + P(Rainy) E(2,3)</i></div>
<div style="text-align: center;"><img src="https://4.bp.blogspot.com/-DK4jUKpsE5A/WtZVqiOqUFI/AAAAAAAAByc/KaXYSR50STUYeQ8fwGDiExQEs7CY59PagCLcBGAs/s400/Calculating%2BP%2528PlayGolf%252C%2BOutlook%2529.jpg" alt="Substituting the values into E(PlayGolf, Outlook)" /></div>
<p>This gives <b>E</b>(PlayGolf, Outlook) = <b>0.693</b>.</p>
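<p>Here is the same calculation in Python, so you can check the numbers. A small sketch: the counts per Outlook value are the ones from Table 3, and <code>entropy_counts</code> is my own helper name.</p>
<pre>
from math import log2

def entropy_counts(*counts):
    """E(a, b, ...): entropy of a group whose classes occur with the given counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c)

# Frequency table for Outlook (Table 3): Sunny -> E(3,2), Overcast -> E(4,0), Rainy -> E(2,3)
outlook = {"Sunny": (3, 2), "Overcast": (4, 0), "Rainy": (2, 3)}
total = sum(a + b for a, b in outlook.values())        # 14 rows in all

e_outlook = sum(((a + b) / total) * entropy_counts(a, b) for a, b in outlook.values())

print(round(entropy_counts(3, 2), 3))   # 0.971  -> E(3,2), and also E(2,3)
print(round(entropy_counts(4, 0), 3))   # 0.0    -> E(4,0)
print(f"{e_outlook:.4f}")               # 0.6935 -> the 0.693 used in this post
</pre>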
It<br \/>\nIt is easier to do if you form the frequency table for the split for Temperature as shown.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"clear: both; text-align: center;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/3.bp.blogspot.com\/-rc3HnCbker4\/WtZWCpgFIXI\/AAAAAAAAByg\/Og2wNOM1eewwu-gdhmAo9ckng6OgQcQwwCLcBGAs\/s1600\/Decision_Trees_Temperature.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/3.bp.blogspot.com\/-rc3HnCbker4\/WtZWCpgFIXI\/AAAAAAAAByg\/Og2wNOM1eewwu-gdhmAo9ckng6OgQcQwwCLcBGAs\/s320\/Decision_Trees_Temperature.jpg\" width=\"320\" height=\"116\" border=\"0\" data-original-height=\"260\" data-original-width=\"714\" \/>\u00a0<\/a><\/div>\n<div style=\"clear: both; text-align: center;\"><b>Table 4: <\/b>Frequency Table for Temperature<\/div>\n<p>&nbsp;<\/p>\n<div style=\"clear: both; text-align: center;\"><i>E(PlayGolf, Temperature)\u00a0 = P(Hot) E(2,2) + P(Cold) E(3,1) + P(Mild) E(4,2)<\/i><\/div>\n<p>&nbsp;<\/p>\n<div style=\"clear: both; text-align: center;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/2.bp.blogspot.com\/-qbHnJZI0wyM\/WtjT1B1GFII\/AAAAAAAABz4\/HOXdZuurF3wn2OVA0OG6OjxSMShScMK7wCLcBGAs\/s1600\/Entropy%2Bfor%2BTemperature.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/2.bp.blogspot.com\/-qbHnJZI0wyM\/WtjT1B1GFII\/AAAAAAAABz4\/HOXdZuurF3wn2OVA0OG6OjxSMShScMK7wCLcBGAs\/s640\/Entropy%2Bfor%2BTemperature.jpg\" width=\"640\" height=\"416\" border=\"0\" data-original-height=\"769\" data-original-width=\"1172\" \/><\/a><\/div>\n<p><i><b>E(PlayGolf, Humidity) Calculation<\/b><\/i><\/p>\n<p>Just like in the previous calculation, the calculation of E(PlayGolf, Humidity) is given below. It<br \/>\nIt is easier to do if you form the frequency table for the split for Humidity as shown.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"clear: both; text-align: center;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/3.bp.blogspot.com\/-c8GtlQgfO1w\/WtZWVrFwSlI\/AAAAAAAAByo\/7vCeeAs7gJUUluajwqUraoYd1ZpyNtqOACLcBGAs\/s1600\/Decision_Trees_Humidity.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/3.bp.blogspot.com\/-c8GtlQgfO1w\/WtZWVrFwSlI\/AAAAAAAAByo\/7vCeeAs7gJUUluajwqUraoYd1ZpyNtqOACLcBGAs\/s320\/Decision_Trees_Humidity.jpg\" width=\"320\" height=\"97\" border=\"0\" data-original-height=\"201\" data-original-width=\"663\" \/><\/a><\/div>\n<div style=\"clear: both; text-align: center;\"><b>Table 5:<\/b> Frequency Table for Humidity<\/div>\n<p>&nbsp;<\/p>\n<div style=\"clear: both; text-align: center;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/2.bp.blogspot.com\/-oRbf1GmM9kQ\/WtjVJ7cYgYI\/AAAAAAAAB0E\/sjrabvpFL6I2k6c7LnSdQXkCRVksuLhUwCLcBGAs\/s1600\/Entropy%2Bfor%2BHumidity.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/2.bp.blogspot.com\/-oRbf1GmM9kQ\/WtjVJ7cYgYI\/AAAAAAAAB0E\/sjrabvpFL6I2k6c7LnSdQXkCRVksuLhUwCLcBGAs\/s640\/Entropy%2Bfor%2BHumidity.jpg\" width=\"640\" height=\"384\" border=\"0\" data-original-height=\"698\" data-original-width=\"1160\" \/><\/a><\/div>\n<p><i><b>E(PlayGolf, Windy) Calculation<\/b><\/i><\/p>\n<p>Just like in the previous calculation, the calculation of E(PlayGolf, Windy) is given below. 
It<br \/>\nIt is easier to do if you form the frequency table for the split for Windy as shown.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"clear: both; text-align: center;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/2.bp.blogspot.com\/-yX0xhO1UgQ0\/WtZWsWFtxJI\/AAAAAAAAByw\/cvp7qED_4xAbn9tc38S_aWnS23GSYDmYgCLcBGAs\/s1600\/Decision_Trees_Windy.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/2.bp.blogspot.com\/-yX0xhO1UgQ0\/WtZWsWFtxJI\/AAAAAAAAByw\/cvp7qED_4xAbn9tc38S_aWnS23GSYDmYgCLcBGAs\/s320\/Decision_Trees_Windy.jpg\" width=\"320\" height=\"102\" border=\"0\" data-original-height=\"212\" data-original-width=\"659\" \/>\u00a0<\/a><\/div>\n<div style=\"clear: both; text-align: center;\"><b>Table 6: <\/b>Frequency Table for Windy<\/div>\n<p>&nbsp;<\/p>\n<div style=\"clear: both; text-align: center;\"><a style=\"margin-left: 1em; margin-right: 1em;\" href=\"https:\/\/2.bp.blogspot.com\/-4NFnb5Irmsw\/WtjWFpZlpDI\/AAAAAAAAB0Q\/_svnoM6NJCgEjHC4EYW3fqiiMXnj2dD1wCLcBGAs\/s1600\/Entropy%2Bfor%2BWindy.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/2.bp.blogspot.com\/-4NFnb5Irmsw\/WtjWFpZlpDI\/AAAAAAAAB0Q\/_svnoM6NJCgEjHC4EYW3fqiiMXnj2dD1wCLcBGAs\/s400\/Entropy%2Bfor%2BWindy.jpg\" width=\"400\" height=\"316\" border=\"0\" data-original-height=\"688\" data-original-width=\"865\" \/><\/a><\/div>\n<p>Wow! That is so much work! So take break, walk around a little and take a glass of cold water.<br \/>\nThen we continue.<br \/>\nSo now that we have all the entropies for all the four attributes, let&#8217;s go ahead to summarize them as shown in below:<\/p>\n<ol>\n<li>E(PlayGolf, Outloook) = <b>0.693<\/b><\/li>\n<li>E(PlayGolf, Temperature) = <b>0.911<\/b><\/li>\n<li>E(PlayGolf, Humidity) = <b>0.788<\/b><\/li>\n<li>E(PlayGolf,Windy) = <b>0.892<\/b><\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h3 id=\"s4\">Step 4: Calculating Information Gain for Each Split<\/h3>\n<hr \/>\n<p>The next step is to calculate the information gain for each of the attributes. The information gain is calculated from the split using each of the attributes. 
Then the attribute with the largest information gain is used for the split.</p>
<p>The information gain is calculated using the formula:</p>
<p><i>Gain(S, T) = Entropy(S) - Entropy(S, T)</i></p>
<p>For example, the information gain after splitting using the Outlook attribute is given by:</p>
<p><i>Gain(PlayGolf, Outlook) = Entropy(PlayGolf) - Entropy(PlayGolf, Outlook)</i></p>
<p>So let's go ahead and do the calculations:</p>
<p><i>Gain(PlayGolf, Outlook) = Entropy(PlayGolf) - Entropy(PlayGolf, Outlook)<br />
= 0.94 - 0.693 = <b>0.247</b></i></p>
<p><i>Gain(PlayGolf, Temperature) = Entropy(PlayGolf) - Entropy(PlayGolf, Temperature)<br />
= 0.94 - 0.911 = <b>0.029</b></i></p>
<p><i>Gain(PlayGolf, Humidity) = Entropy(PlayGolf) - Entropy(PlayGolf, Humidity)<br />
= 0.94 - 0.788 = <b>0.152</b></i></p>
<p><i>Gain(PlayGolf, Windy) = Entropy(PlayGolf) - Entropy(PlayGolf, Windy)<br />
= 0.94 - 0.892 = <b>0.048</b></i></p>
<p>Having calculated all the information gains, we now choose the attribute that gives the highest information gain after the split.</p>
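<p>In code, this step is just a dictionary and a <code>max()</code>. A small sketch using the rounded values computed above:</p>
<pre>
e_class = 0.94                                   # Entropy(PlayGolf) from Step 2
e_split = {"Outlook": 0.693, "Temperature": 0.911,
           "Humidity": 0.788, "Windy": 0.892}    # the Step 3 summary

gains = {attr: round(e_class - e, 3) for attr, e in e_split.items()}
print(gains)                      # {'Outlook': 0.247, 'Temperature': 0.029, 'Humidity': 0.152, 'Windy': 0.048}
print(max(gains, key=gains.get))  # Outlook -> the attribute we split on first
</pre>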
<h3 id="s5">Step 5: Perform the First Split</h3>
<p><strong>Draw the First Split of the Decision Tree</strong><br />
Now that we have all the information gains, we split the tree based on the attribute with the highest information gain.</p>
<p>From our calculation, the highest information gain comes from Outlook. Therefore the split will look like this:</p>
<div style="text-align: center;"><img src="https://3.bp.blogspot.com/-RBu_zCzBmZc/WtZbzRVHFuI/AAAAAAAABzE/_CWY7woKdnApBTbzx-latDjP3TiTCbUPQCLcBGAs/s640/Decision%2BTree%2BStage%2B1.jpg" alt="Decision tree after the first split" /><br /><b>Figure 2:</b> Decision Tree after the first split</div>
<p>Now that we have the first stage of the decision tree, we see that we have one leaf node. But we still need to split the tree further.</p>
<p>To do that, we also need to split the original table to create sub-tables. These sub-tables are given below.</p>
<div style="text-align: center;"><img src="https://2.bp.blogspot.com/-o5EUwUj19VA/WtZkIV-307I/AAAAAAAABzU/U3vLsmIaRdg_mmwSaFh3ONgzK3eGR8aXwCLcBGAs/s640/Desision%2BTable%2BFirst%2BSplit.jpg" alt="Sub-tables after splitting on Outlook" /><br /><b>Table 7:</b> Initial Split using Outlook</div>
<p>From Table 7, we can see that the Overcast outlook requires no further split, because it is just one homogeneous group. So we have a leaf node.</p>

<h3 id="s6">Step 6: Perform Further Splits</h3>
<hr />
<p>The Sunny and the Rainy branches still need to be split.</p>
<p>The Rainy outlook can be split using either Temperature, Humidity or Windy.</p>
<p><i>Quiz 1</i>: Which attribute would best be used for this split? Why?</p>
<p><i>Answer</i>: <b>Humidity</b>, because it produces homogeneous groups.</p>
<div style="text-align: center;"><img src="https://1.bp.blogspot.com/-yn1hXVNIVj8/WtjiWezBSWI/AAAAAAAAB00/Kivy42FD0zkWfw-hkTAVrh0VkbBGYDQ3gCLcBGAs/s400/Humidity%2BSplitTable.jpg" alt="Rainy sub-table split using Humidity" /><br /><b>Table 8:</b> Split using Humidity</div>
<p>The Rainy branch is thus split using the High and Normal values of Humidity, and that gives us the tree below.</p>
<div style="text-align: center;"><img src="https://3.bp.blogspot.com/-AwUteKA-yXw/WtjYPllqShI/AAAAAAAAB0c/fYwB4Q8-nmk_3u9x8r5X7smfWmuMsu_DwCLcBGAs/s640/Split%2Bby%2BRainy.jpg" alt="Tree after splitting the Rainy branch on Humidity" /><br /><b>Figure 3:</b> Split using the Humidity Attribute</div>
<p>Let's now go ahead and do the same thing for the Sunny outlook. The Sunny outlook can also be split using either Temperature, Humidity or Windy.</p>
<p><i>Quiz 2</i>: Which attribute would best be used for this split? Why?</p>
<p><i>Answer</i>: <b>Windy</b>, because it produces homogeneous groups.</p>
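<p>Why does "it produces homogeneous groups" settle both quizzes? A split whose branches are all homogeneous has a weighted entropy of 0, the smallest possible value, and therefore gives the largest possible gain. A quick illustration in Python; the branch counts below are made up purely to show the check, they are not the counts from Table 8 or Table 9:</p>
<pre>
from math import log2

def entropy_counts(*counts):
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c)

def entropy_after_split(branches):
    """branches: one (yes, no) count pair per branch of the split."""
    total = sum(y + n for y, n in branches)
    return sum(((y + n) / total) * entropy_counts(y, n) for y, n in branches)

print(entropy_after_split([(0, 3), (2, 0)]))   # 0.0      -> every branch homogeneous, best possible split
print(entropy_after_split([(1, 2), (2, 1)]))   # 0.918... -> mixed branches, some entropy remains
</pre>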
<div style="text-align: center;"><img src="https://2.bp.blogspot.com/-EexbgEZsjWs/WtjijSF-ZzI/AAAAAAAAB04/uCkfFxo_9XA7XAGwdzMB0ZFO5cYY6qfmQCLcBGAs/s400/Windy%2BSplitTable.jpg" alt="Sunny sub-table split using Windy" /><br /><b>Table 9:</b> Split using the Windy Attribute</div>
<p>If we do this split using the Windy attribute, we end up with a final tree that requires no further splitting! This is shown in Figure 4.</p>

<h3 id="s7">Step 7: Complete the Decision Tree</h3>
<hr />
<p>The complete tree is shown in Figure 4.<br />
Note that the same calculation we used initially could also be used for the further splits, but that is not necessary, since you can simply look at each sub-table and determine which attribute to use for the split.</p>
<p><span style="color: #990000;">Quiz: What does each of the colors in the tree represent?</span><br />
Leave your answer in the comment box below.</p>
<div style="text-align: center;"><img src="https://1.bp.blogspot.com/-vEy0tVpBuQ4/Wte39ZkiXpI/AAAAAAAABzk/8n-CF4cmYnEylEKUKf0-yiJtWmmYy2pSgCLcBGAs/s640/Decision-Tree-Final.jpg" alt="The final decision tree" /><br /><b>Figure 4:</b> Final Decision Tree</div>

<h3 id="t5"><b>5. Final Notes</b></h3>
<p>Now we have successfully completed the decision tree. I think we need to celebrate with a bottle of beer!</p>
<p>This is how easy it is to build a decision tree. Remember, the initial steps of calculating the entropy and the gain are the most difficult part; after that, everything falls into place.</p>
<p>Do let me know if you have any challenges. Write to me in the comment box below or in the form at the left of this page.</p>
<p>Thanks for reading!!</p>
&hellip; <\/p>\n","protected":false},"author":2,"featured_media":650,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[11],"tags":[],"_links":{"self":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/85"}],"collection":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/comments?post=85"}],"version-history":[{"count":7,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/85\/revisions"}],"predecessor-version":[{"id":1223,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/85\/revisions\/1223"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/media\/650"}],"wp:attachment":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/media?parent=85"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/categories?post=85"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/tags?post=85"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}