{"id":185,"date":"2018-01-12T16:44:00","date_gmt":"2018-01-12T16:44:00","guid":{"rendered":"https:\/\/kindsonthegenius.com\/blog\/2018\/01\/12\/what-is-the-difference-between-classification-and-clustering-in-machine-learning\/"},"modified":"2020-11-05T13:15:57","modified_gmt":"2020-11-05T12:15:57","slug":"what-is-the-difference-between-classification-and-clustering-in-machine-learning","status":"publish","type":"post","link":"https:\/\/kindsonthegenius.com\/blog\/what-is-the-difference-between-classification-and-clustering-in-machine-learning\/","title":{"rendered":"What is the Difference Between Classification and Clustering in Machine Learning"},"content":{"rendered":"<div style=\"color: #555555; font-size: 18px; line-height: 30px; text-align: justify;\">\n<div style=\"font-family: 'segoe ui';\">Today we will discuss the difference between two important topics that appear similar in machine learning.<\/p>\n<ul>\n<li><span style=\"color: #990000;\">Classification and<\/span><\/li>\n<li><span style=\"color: #990000;\">Clustering <\/span><\/li>\n<\/ul>\n<div style=\"clear: both; text-align: center;\"><a href=\"https:\/\/2.bp.blogspot.com\/-Csblqgw-LA0\/WljlFaRJTEI\/AAAAAAAAAwg\/h741f38XzG4HCp4RuWj2FqbZzV0VUCYqACLcBGAs\/s1600\/Difference%2Bbetween%2BClassification%2Band%2BClustering%2Bin%2BMachine%2BLearning.JPG\" style=\"margin-left: 1em; margin-right: 1em;\"><img decoding=\"async\" loading=\"lazy\" border=\"0\" data-original-height=\"380\" data-original-width=\"672\" height=\"180\" src=\"https:\/\/2.bp.blogspot.com\/-Csblqgw-LA0\/WljlFaRJTEI\/AAAAAAAAAwg\/h741f38XzG4HCp4RuWj2FqbZzV0VUCYqACLcBGAs\/s320\/Difference%2Bbetween%2BClassification%2Band%2BClustering%2Bin%2BMachine%2BLearning.JPG\" width=\"320\" \/><\/a><\/div>\n<p>I have decided to create this article because the names of the two topics can cause confusion. Is clustering not the same as classification, since both separate data into different classes or clusters? 
It seems to make sense, right?<br \/>But in machine learning, the two are distinct concepts.<\/p>\n<p>Let&#8217;s start the discussion with classification.<\/p>\n<p><ins data-ad-client=\"ca-pub-7041870931346451\" data-ad-format=\"fluid\" data-ad-layout=\"in-article\" data-ad-slot=\"8227894917\" style=\"display: block; text-align: center;\"><\/ins><span style=\"font-size: large;\"><b><span style=\"color: #45818e;\">What is Classification?<\/span><\/b><\/span><br \/>Classification is a supervised learning technique: an algorithm is trained on a labeled training dataset, in which each input comes with its known category.<br \/>In classification, the goal is to assign each input vector to one of a finite number of discrete categories.<\/p>\n<p>A real-life application of classification is <i>spam detection<\/i>. In this case, an email can belong to one of two discrete categories: spam and non-spam. The input dataset is the incoming emails.<\/p>\n<p><b>Theory of Classification<\/b><br \/>Assume that we are given a training set comprising N observations of a random variable X, with values x<sub>1<\/sub>, x<sub>2<\/sub>, &#8230;, x<sub>N<\/sub>.<br \/>We also have the corresponding observations of the target t, with values t<sub>1<\/sub>, t<sub>2<\/sub>, &#8230;, t<sub>N<\/sub>.<br \/>The first step is to find a function of x that maps each input x to its corresponding t.<br \/>One way to do this is polynomial curve fitting, which has the form:<\/p>\n<div style=\"clear: both; text-align: center;\"><a href=\"https:\/\/1.bp.blogspot.com\/-Kcj2mOTCq9U\/WljewY56bkI\/AAAAAAAAAwM\/63Z2tofSLscpzFpb-Q4hx1QQmVPBZXQlQCLcBGAs\/s1600\/Polynomial%2BCurve%2BFitting.jpg\" style=\"margin-left: 1em; margin-right: 1em;\"><img decoding=\"async\" loading=\"lazy\" border=\"0\" data-original-height=\"149\" data-original-width=\"1130\" height=\"83\" 
src=\"https:\/\/1.bp.blogspot.com\/-Kcj2mOTCq9U\/WljewY56bkI\/AAAAAAAAAwM\/63Z2tofSLscpzFpb-Q4hx1QQmVPBZXQlQCLcBGAs\/s640\/Polynomial%2BCurve%2BFitting.jpg\" width=\"640\" \/><\/a><\/div>\n<p>We will not go further than this, since we are only considering the difference between classification and clustering.<\/p>\n<p>Find a detailed discussion of classification in:<br \/><a href=\"https:\/\/www.kindsonthegenius.com\/machine-learning-introduction-to-machine-learning-course\/\" target=\"_blank\" rel=\"noopener noreferrer\">Introduction to Machine Learning<\/a><br \/><a href=\"https:\/\/kindsonthegenius.com\/blog\/what-is-the-difference-between-classification-and-clustering-in-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">Difference between Classification and Regression<\/a><\/p>\n<p><ins data-ad-client=\"ca-pub-7041870931346451\" data-ad-format=\"fluid\" data-ad-layout=\"in-article\" data-ad-slot=\"8227894917\" style=\"display: block; text-align: center;\"><\/ins> <span style=\"color: #45818e;\"><span style=\"font-size: large;\"><b>What is Clustering?<\/b><\/span><\/span><br \/>Clustering is an unsupervised learning technique in which the input dataset is unlabeled.<br \/>In clustering, we use a finite set of input data, and the goal is to discover groups (or clusters) within the data that have similar characteristics.<\/p>\n<p><b>Theory of Clustering<\/b><br \/>Assume we have a set of observations <i><span style=\"font-family: &quot;times&quot; , &quot;times new roman&quot; , serif;\">{x<sub>1<\/sub>, x<sub>2<\/sub>, &#8230;, x<sub>N<\/sub>}<\/span><\/i> consisting of N observations of a random variable <i><span style=\"font-family: &quot;times&quot; , &quot;times new roman&quot; , serif;\">x<\/span><\/i> (each x is a D-dimensional real vector). 
The goal is to partition the dataset into some number K of clusters, where the value of K is known in advance.<br \/>A cluster is a group of data points whose inter-point distances are small compared with the distances to points outside the cluster.<br \/>The first step is to find the <i><span style=\"font-family: &quot;times&quot; , &quot;times new roman&quot; , serif;\">m<sub>k<\/sub>,<\/span><\/i> for <i><span style=\"font-family: &quot;times&quot; , &quot;times new roman&quot; , serif;\">k = 1, &#8230;, K,<\/span><\/i> where <i><span style=\"font-family: &quot;times&quot; , &quot;times new roman&quot; , serif;\">m<sub>k<\/sub> <\/span><\/i>is the mean associated with the <i><span style=\"font-family: &quot;times&quot; , &quot;times new roman&quot; , serif;\">k<sup>th<\/sup><\/span><\/i> cluster.<br \/>We then assign each data point to a cluster such that the sum of the squares of the distances from each data point to its closest mean <span style=\"font-family: &quot;times&quot; , &quot;times new roman&quot; , serif;\">m<sub>k<\/sub><\/span> is a minimum. This particular case is known as <a href=\"https:\/\/kindsonthegenius.com\/blog\/what-is-k-means-in-clustering-in-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">k-means clustering<\/a>.<br \/>Find a detailed explanation in: <a href=\"https:\/\/kindsonthegenius.com\/blog\/what-is-k-means-in-clustering-in-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">K-Means Clustering.<\/a><\/p>\n<p><b>A summary of the differences between classification and clustering is given below:<\/b><\/p>\n<div style=\"clear: both; text-align: center;\"><a href=\"https:\/\/4.bp.blogspot.com\/-ZZpX0n1GN2w\/Wljk85EPKpI\/AAAAAAAAAwc\/-1Eo6AS6R-sDVZr22HjBIRKGyIV_dbbPACLcBGAs\/s1600\/Difference%2Bbetween%2BClassification%2Band%2BClustering%2Bin%2BMachine%2BLearning2.JPG\" style=\"margin-left: 1em; margin-right: 1em;\"><img decoding=\"async\" loading=\"lazy\" border=\"0\" data-original-height=\"259\" 
data-original-width=\"675\" height=\"244\" src=\"https:\/\/4.bp.blogspot.com\/-ZZpX0n1GN2w\/Wljk85EPKpI\/AAAAAAAAAwc\/-1Eo6AS6R-sDVZr22HjBIRKGyIV_dbbPACLcBGAs\/s640\/Difference%2Bbetween%2BClassification%2Band%2BClustering%2Bin%2BMachine%2BLearning2.JPG\" width=\"640\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<p><ins data-ad-client=\"ca-pub-7041870931346451\" data-ad-format=\"fluid\" data-ad-layout=\"in-article\" data-ad-slot=\"8227894917\" style=\"display: block; text-align: center;\"><\/ins> <br \/><ins data-ad-client=\"ca-pub-7041870931346451\" data-ad-format=\"fluid\" data-ad-layout=\"in-article\" data-ad-slot=\"8227894917\" style=\"display: block; text-align: center;\"><\/ins><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today we will discuss the difference between two important topics that appear similar in machine learning. Classification and Clustering I have decided to create this &hellip; <\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[11,16],"tags":[],"_links":{"self":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/185"}],"collection":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/comments?post=185"}],"version-history":[{"count":3,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/185\/revisions"}],"predecessor-version":[{"id":1697,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/185\/revisions\/1697"}],"wp:attachment":[{
"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/media?parent=185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/categories?post=185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/tags?post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}