{"id":903,"date":"2019-04-23T12:08:35","date_gmt":"2019-04-23T10:08:35","guid":{"rendered":"https:\/\/kindsonthegenius.com\/blog\/?p=903"},"modified":"2021-11-13T20:25:33","modified_gmt":"2021-11-13T19:25:33","slug":"statistics-tutorial-3-hypothesis-testing","status":"publish","type":"post","link":"https:\/\/kindsonthegenius.com\/blog\/statistics-tutorial-3-hypothesis-testing\/","title":{"rendered":"Statistics Tutorial 3 &#8211; Hypothesis Testing"},"content":{"rendered":"<p>In this tutorial we are going to cover Hypothesis Testing. To understand this topic better, we will break it down into the following sub-topics:<\/p>\n<ol>\n<li><a href=\"#t1\">About Estimation<\/a><\/li>\n<li><a href=\"#t2\">Introduction to Hypothesis Testing<\/a><\/li>\n<li><a href=\"#t3\">Confidence Interval<\/a><\/li>\n<li><a href=\"#t4\">t-Statistic<\/a><\/li>\n<li><a href=\"#t5\">p-Value<\/a><\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t1\">1. About Estimation<\/strong><\/h4>\n<p>Estimation is a statistical way of trying to deduce the value of an unknown parameter. For example, to estimate the mean \u00b5 of a population, we can take a sample from the population and calculate its mean. Then we can use the sample mean as an estimate of the population mean.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Point Estimate vs Interval Estimate<\/strong><\/p>\n<p><em>A point estimate<\/em> is a single number that represents the parameter you are trying to estimate.<\/p>\n<p><em>An interval estimate<\/em> is a range of values that is likely to contain the parameter you are trying to estimate. Hence, an interval estimate is usually given as two values that define a range.<\/p>\n<p>The question now is: how accurate is our estimate? We can assess this by performing hypothesis testing.<\/p>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t2\">2. 
Introduction to Hypothesis Testing<\/strong><\/h4>\n<p>Hypothesis testing is simply a statistical way of testing an existing, or null, hypothesis H<sub>0<\/sub> (that is, an estimate that is currently accepted). Therefore, to carry out a hypothesis test, there must be at least one existing hypothesis. We then test the null hypothesis to see whether it is consistent with the data.<\/p>\n<p>To do this we need to formulate an alternative hypothesis H<sub>a<\/sub> or H<sub>1<\/sub>. This is normally the exact opposite of the null hypothesis.<\/p>\n<p>Let&#8217;s take the example of regression from <a href=\"https:\/\/www.kindsonthegenius.com\/machine-learning-introduction-to-machine-learning-course\/\">Machine Learning 101<\/a>. In linear regression, we make an estimate of the regression coefficient\u00a0\u03b2<sub>1<\/sub>.<\/p>\n<p>Let&#8217;s state the null and alternative hypotheses:<\/p>\n<ul>\n<li>H<sub>0<\/sub>:\u00a0\u03b2<sub>1<\/sub> = 0<\/li>\n<li>H<sub>a<\/sub>:\u00a0\u03b2<sub>1<\/sub>\u00a0\u2260 0<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>To carry out hypothesis testing, we need to determine whether our estimate for\u00a0\u03b2<sub>1<\/sub> is far enough from zero. If it is, we can be confident that\u00a0\u03b2<sub>1<\/sub> is non-zero.<\/p>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t3\">3. Confidence Interval<\/strong><\/h4>\n<p>How far is far enough depends on the standard error. The standard error is represented as SE(\u03b2<sub>1<\/sub>) in the case of\u00a0\u03b2<sub>1<\/sub>.<\/p>\n<p>The standard error tells us how much our estimate typically differs from the actual value. 
When estimating the mean of a population, it is given by:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-905\" src=\"https:\/\/kindsonthegenius.com\/blog\/wp-content\/uploads\/2019\/04\/Standard-Error-150x101.jpg\" alt=\"Standard Error\" width=\"116\" height=\"78\" \/><\/p>\n<p>where n is the sample size and\u00a0\u03c3 is the standard deviation of the sample.<\/p>\n<p>This formula also shows a relationship between the standard error and the sample size: the larger the sample size, the lower the standard error.<\/p>\n<p>Standard errors can be used to compute confidence intervals. A 95% confidence interval is a range of values that contains the true value of the unknown parameter with 95% probability. A confidence interval therefore has an upper and a lower limit.<\/p>\n<p>For linear regression, a 95% confidence interval for\u00a0\u03b2<sub>1<\/sub> is approximately:<\/p>\n<p>\u03b2<sub>1<\/sub>\u00a0\u00b1 2SE(\u03b2<sub>1<\/sub>)<\/p>\n<p>That is, there is approximately a 95% chance (or 0.95 probability) that the interval:<\/p>\n<ul>\n<li>upper:\u00a0\u03b2<sub>1<\/sub> + 2SE(\u03b2<sub>1<\/sub>)<\/li>\n<li>lower:\u00a0\u03b2<sub>1<\/sub> \u2212 2SE(\u03b2<sub>1<\/sub>)<\/li>\n<\/ul>\n<p>would contain the real value of\u00a0\u03b2<sub>1<\/sub>.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t4\">4. t-Statistic<\/strong><\/h4>\n<p>To actually carry out hypothesis testing, we compute the t-statistic. In the case of\u00a0\u03b2<sub>1<\/sub>, this is given by:<\/p>\n<p>t =\u00a0\u03b2<sub>1<\/sub> \/ SE(\u03b2<sub>1<\/sub>)<\/p>\n<p>This simply measures the number of standard errors that\u00a0\u03b2<sub>1<\/sub> is away from 0 in our linear regression example. The t-distribution, which is assumed in this case, has a shape similar to the normal distribution for n &gt; 30.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t5\">5. p-Value<\/strong><\/h4>\n<p>Recall that statistics is closely related to probability. 
So in the case of the t-statistic, we can compute the probability of observing any value equal to |t| or larger in absolute value, assuming that\u00a0\u03b2<sub>1<\/sub> is 0.<\/p>\n<p>This probability is what is known as the p-value.<\/p>\n<p>A small p-value indicates that it is unlikely to observe such a strong association between X and Y (in the case of linear regression) due to chance alone. Therefore, when the p-value is small, we can conclude that there is a relationship between X and Y (the predictor and response variables). In this case, we reject the null hypothesis.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial we are going to cover Hypothesis Testing. To understand this topic better, we would break it down into the following sub-topics About &hellip; <\/p>\n","protected":false},"author":1,"featured_media":907,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[552],"tags":[554,553],"_links":{"self":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/903"}],"collection":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/comments?post=903"}],"version-history":[{"count":6,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/903\/revisions"}],"predecessor-version":[{"id":1788,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/posts\/903\/revisions\/1788"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\
/media\/907"}],"wp:attachment":[{"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/media?parent=903"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/categories?post=903"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/blog\/wp-json\/wp\/v2\/tags?post=903"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}