**Question 17**

A research team investigated whether there was any significant correlation between the severity of the progression of a certain disease and the age of the patients. During the study, data for n = 200 patients were collected and grouped according to the severity of the disease and the age of the patient. The table below shows the results.

| Disease progression | Age below 40 | Age 40 – 60 | Age above 60 |
|---|---|---|---|
| slight | 41 | 34 | 9 |
| average | 25 | 25 | 12 |
| serious | 6 | 33 | 15 |

Decide whether there is a significant correlation between the age of the patients and the severity of the disease progression.

**Solution Steps**

As usual, we need to understand the problem and decide on which particular test to carry out.

In this case, the question asks whether there is any significant correlation between the severity of the disease and the age of the patients. Both variables are categorical and the data are counts in a 3 × 3 contingency table, so the appropriate test is the chi-square test of independence. Its null hypothesis is that there is no association between age and severity, i.e. the two variables are independent. That is the hypothesis we are going to test.

**Step 1: State the null and alternative hypotheses**

**H_{0}:** there is no significant correlation between the severity and the age (the two are independent)

**H_{a}:** there is a significant correlation between the severity and the age

Since we are going to use Excel to simplify the calculations, I have transferred the table to MS Excel, as shown in Table 1.

**Step 2: Calculate the totals**

In this step we calculate the total of each row and each column, as well as the grand total. I did this using Excel formulas, as you can see in Table 2.
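As a cross-check on the Excel formulas, here is a minimal Python sketch of the same totals. The variable names (`observed`, `row_totals`, `col_totals`, `grand_total`) are illustrative choices, not part of the original spreadsheet; rows are the severity levels and columns are the age groups, in the order of the table above.

```python
# Observed counts: rows = disease progression (slight, average, serious),
# columns = age group (below 40, 40-60, above 60).
observed = [
    [41, 34, 9],   # slight
    [25, 25, 12],  # average
    [6, 33, 15],   # serious
]

row_totals = [sum(row) for row in observed]        # one total per severity level
col_totals = [sum(col) for col in zip(*observed)]  # one total per age group
grand_total = sum(row_totals)                      # n, the number of patients

print(row_totals)   # [84, 62, 54]
print(col_totals)   # [72, 92, 36]
print(grand_total)  # 200
```

The column totals are needed in the next step even though the table above only lists the cell counts.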

**Step 3: Calculate the expected values**

The expected values are calculated by multiplying the corresponding row and column totals and dividing by the grand total. For example, the first expected value, which corresponds to Slight and Below 40, is calculated as follows:

E = (84 × 72) / 200 = 30.24

Do this for all nine observed values. I used Excel to generate these values automatically; they are shown in Table 3.
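The same rule applied to all nine cells can be sketched in Python, assuming the observed counts are stored row by row as in the table above. Each expected count is (row total × column total) / grand total. The sketch also checks the common rule of thumb that every expected count should be at least 5 for the chi-square approximation to be reliable; that check is an addition, not part of the original steps.

```python
observed = [
    [41, 34, 9],   # slight
    [25, 25, 12],  # average
    [6, 33, 15],   # serious
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total * column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
# [30.24, 38.64, 15.12]
# [22.32, 28.52, 11.16]
# [19.44, 24.84, 9.72]

# Rule of thumb: all expected counts should be at least 5
print(min(min(row) for row in expected) >= 5)  # True
```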

**Step 4: Calculate the Squared Differences (O-E)^2**

Here O is an observed value from Table 2 and E is the corresponding expected value from Table 3. The first squared difference is:

(O-E)^2 = (41 - 30.24)^2 = 10.76^2 = 115.78

Do this for all the observed values and their corresponding expected values. The resulting values are given in Table 4.
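A short Python sketch of this step, pairing each observed count with its expected count cell by cell. It recomputes the expected values from the totals so that it runs on its own; the name `sq_diff` is illustrative.

```python
observed = [
    [41, 34, 9],   # slight
    [25, 25, 12],  # average
    [6, 33, 15],   # serious
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Squared difference (O - E)^2 for each cell, matched position by position
sq_diff = [
    [(o - e) ** 2 for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

for row in sq_diff:
    print([round(d, 2) for d in row])
# [115.78, 21.53, 37.45]
# [7.18, 12.39, 0.71]
# [180.63, 66.59, 27.88]
```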

**Step 5: Calculate the Component**

Each component is the squared difference calculated in Step 4 divided by the corresponding expected value. For the first value it is:

(O-E)^2 / E = 115.78 / 30.24 = 3.83

If you repeat this for all the values, you get the components shown in Table 5.
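The components can be sketched the same way in Python: divide each squared difference by its expected count. As before, the expected values are recomputed from the totals so the block is self-contained, and `components` is an illustrative name.

```python
observed = [
    [41, 34, 9],   # slight
    [25, 25, 12],  # average
    [6, 33, 15],   # serious
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Component for each cell: (O - E)^2 / E
components = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

for row in components:
    print([round(c, 2) for c in row])
# [3.83, 0.56, 2.48]
# [0.32, 0.43, 0.06]
# [9.29, 2.68, 2.87]
```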

**Step 6: Calculate the Test Statistic**

The test statistic is the sum of all the components in Table 5. I calculated it using the SUM() formula in Excel, but you can also add them by hand to verify.

Test Statistic = 3.83 + 0.56 + 2.48 + 0.32 + 0.43 + 0.06 + 9.29 +2.68 + 2.87 **= 22.52**
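To confirm the hand sum, here is a minimal Python sketch that adds up the unrounded components directly. The name `statistic` is illustrative.

```python
observed = [
    [41, 34, 9],   # slight
    [25, 25, 12],  # average
    [6, 33, 15],   # serious
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over all nine cells
statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

print(round(statistic, 2))  # 22.52
```

The unrounded sum is about 22.523, so rounding each component to two decimals first, as in the hand calculation, does not change the result here.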

**Step 7: Look up the Critical Value in the Chi-Square Table**


First we calculate the degrees of freedom, (number of rows - 1) × (number of columns - 1):

df = (3-1) * (3-1) = 4

We use a significance level of alpha = 0.01.

From the table of the Chi-Square distribution, the critical value for alpha = 0.01 and df = 4 is written as

K_{0.01, 4} = **13.28**
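If no printed table is at hand, the critical value can be computed instead. The sketch below avoids a statistics library and relies on the closed-form upper tail of the chi-square distribution, which holds only for an even number of degrees of freedom: P(X > x) = exp(-x/2) × Σ_{k=0}^{df/2-1} (x/2)^k / k!. It then finds K_{0.01, 4} by bisection. The helper names `chi2_sf_even_df` and `chi2_critical_value` are hypothetical. If SciPy is installed, `scipy.stats.chi2.ppf(1 - alpha, df)` should return the same value.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with an
    EVEN number of degrees of freedom (closed form, no library needed)."""
    if df % 2 != 0:
        raise ValueError("this closed form is only valid for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def chi2_critical_value(alpha: float, df: int) -> float:
    """Find x with P(X > x) = alpha by bisection (the tail shrinks as x grows)."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid  # tail still too heavy: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


rows, cols = 3, 3
df = (rows - 1) * (cols - 1)
alpha = 0.01
critical = chi2_critical_value(alpha, df)

print(df)                  # 4
print(round(critical, 2))  # 13.28
```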

**Step 8: State your conclusion**

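Since the calculated value of the test statistic (22.52) is greater than the critical value (13.28), we reject the null hypothesis of independence and conclude that, at the 0.01 significance level, there is a significant association between the age of the patients and the severity of the disease.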
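The whole decision can be sketched end to end in Python. Besides the critical-value comparison used above, it computes a p-value, which is an addition to the original method; the p-value formula is the closed-form upper tail for an even number of degrees of freedom and would not apply unchanged to an odd df. The critical value 13.28 is taken from Step 7.

```python
import math

observed = [
    [41, 34, 9],   # slight
    [25, 25, 12],  # average
    [6, 33, 15],   # serious
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square test statistic: sum of (O - E)^2 / E over all cells
statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        statistic += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 13.28  # K_{0.01, 4}, read from the chi-square table

# p-value via the closed-form upper tail for even df:
# P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
half = statistic / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

if statistic > critical:
    print("Reject H0: age and severity are associated")
else:
    print("Fail to reject H0: no significant association detected")
print(f"statistic = {statistic:.2f}, p-value = {p_value:.5f}")
```

Both routes lead to the same decision: the statistic exceeds the critical value, and the p-value is far below alpha = 0.01.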

All the tables are shown below.