Section 8 ANOVA
William Christensen, Ph.D.

ANalysis Of VAriance
Commonly referred to as ANOVA, analysis of variance is a method of simultaneously testing the equality of 3 or more population means.

In this section we will review two ANOVA techniques:
1. One-way ANOVA
2. Two-way ANOVA

ANOVA & Confidence Intervals
[Figure: normal curve with the middle 95% (.95) between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and .025 in each tail]
Remember from Section 4 we learned about Confidence Intervals. What ANOVA does, simply stated, is test whether the confidence intervals of different samples overlap significantly. If they overlap too much, the conclusion is that the samples are not significantly different.

ANOVA Process Explanation
If we are just going to compare two samples, then we would use the hypothesis method we learned in Section 6. The cool thing about ANOVA is that it allows us to compare three or even more samples, all at the same time. There are two things ANOVA considers in determining whether or not different samples overlap (and therefore are not significantly different). One is how narrow or wide the sample distributions are, and

the other is how far apart the distribution means are from each other. See the following slides for examples.

ANOVA Process Explanation
Example: When ANOVA compares distributions that are relatively wide (i.e., they have large variances), the distributions (means) would have to be really far apart for them not to overlap a lot. The following picture shows how wide distributions (big variances) tend to overlap, even when they are fairly far apart. ANOVA would likely determine that these samples are NOT different.

ANOVA Process Explanation
Example: On the other hand, if ANOVA is comparing distributions that are relatively narrow (i.e., they have small variances), the distributions (means) don't have to be so far apart for them not to overlap a lot. The following picture shows how narrow distributions (small variances) have less tendency to overlap. ANOVA would likely determine that these samples are different.

ANOVA Process Explanation
Example: It is important to remember that only one of the samples has to be different from at least one of the other samples for ANOVA to conclude there is a difference. ANOVA DOES NOT tell us which sample is different, or from which other samples it is different. Any such conclusions we must reach outside of ANOVA, but we can often tell by looking at the means and variances of the samples. In this case, it appears the sample distribution on the right is the one that is different.

Analysis of Variance
From the perspective of a

hypothesis test, what we are doing in ANOVA is testing:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_1$: At least one mean is different

ANOVA Definition
Analysis of Variance (ANOVA): a method of testing the equality of three or more population means by analyzing sample means and variances

One-way ANOVA

One-Way ANOVA Assumptions
The populations have normal distributions
The populations have the same variance $\sigma^2$ (or standard deviation $\sigma$)
The samples are simple random samples

The samples are independent of each other
The different samples are from populations that are categorized in only one way (i.e., we are investigating only a single factor or treatment; you'll see in a minute what we mean by this)

ANOVA Fundamental Concept
Test Statistic for One-Way ANOVA
$F = \frac{\text{variance between samples}}{\text{variance within samples}}$
An excessively large F test statistic is evidence against equal population means.

Key Components of ANOVA Method
SS(Between Groups) is a measure of the variation between the samples. In one-way ANOVA it is also sometimes referred to as SS(factor) or SS(treatment).
SS(Within Groups) is a sum of squares representing the variability that is assumed to be common to all the populations being considered. It is also referred to as SS(error).
SS(total), or total sum of squares, is a measure of the total variation (around the overall mean $\bar{\bar{x}}$) in all the sample data combined.
SS(total) = SS(treatment) + SS(error)
Mean Squared (MS) values are simply SS values divided by the appropriate degrees of freedom (df)

ANOVA Fundamental Concept
Test Statistic for One-Way ANOVA. Here is how ANOVA comes up with F:
$F = \frac{MS(\text{between groups})}{MS(\text{within groups})}$
An excessively large F test statistic is evidence against equal population means.

Using Excel for One-way ANOVA
1. Arrange the raw data by sample and in columns. It is always nice to have data labels at the top of each sample column. Each sample set should be in its own column, and the columns should be next to each other.
2. Next, click on Tools, select Data Analysis, then select Anova: Single Factor
3. In the dialog box, enter the range containing ALL of the sample data. If you include column headings, make sure to click the Labels in First Row box.

Don't worry if the columns have different lengths and some of the cells are empty.
4. Select an Output Range or go with the default of having Excel put the ANOVA output on a new worksheet

One-Way ANOVA Example
The following example is from the Triola text.
A solar electric system is studied in various weather conditions. Use a significance level of 0.05 (alpha=0.05) to test if there is any difference in solar energy collected during the three types of weather conditions tested (sunny days, cloudy days, and rainy days)

Sunny   Cloudy  Rainy
13.5    12.7    12.1
13.0    12.5    12.2
13.2    12.6    12.3
13.9    12.7    11.9
13.8    13.0    11.6
14.0    13.0    12.2

One-Way ANOVA Example
Here is a copy of the spreadsheet: I entered the data then clicked Tools, Data Analysis, Anova: Single Factor

One-Way ANOVA Example
Here are the results after I clicked OK. You need to learn how to interpret the information here.
You should know what all this means (if not, see Sections 1 and 2)
Here are the means from the 3 samples. We can see the mean for Sunny days looks larger, but the next section will tell us if the differences we see here are statistically significant
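For readers who want to see what sits behind the Excel output, here is a minimal Python sketch of the one-way calculation on the solar data from this example. It is not the course's method (the course uses Excel); the helper names `mean` and `one_way_anova` are made up for illustration, and the sketch stops at the F statistic rather than reproducing Excel's full table.

```python
# Solar energy readings from the example, one list per weather condition.
samples = {
    "Sunny":  [13.5, 13.0, 13.2, 13.9, 13.8, 14.0],
    "Cloudy": [12.7, 12.5, 12.6, 12.7, 13.0, 13.0],
    "Rainy":  [12.1, 12.2, 12.3, 11.9, 11.6, 12.2],
}


def mean(values):
    return sum(values) / len(values)


def one_way_anova(groups):
    """Hypothetical helper: sums of squares, mean squares and F for one factor."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of samples (groups)
    n_total = len(all_values)  # total number of observations

    # SS(between groups): how far each sample mean is from the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SS(within groups): spread of each observation around its own sample mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # SS(total): spread of every observation around the overall mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # MS = SS / df
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "f": ms_between / ms_within,
    }


result = one_way_anova(list(samples.values()))
for name, values in samples.items():
    print(f"{name:6s} mean = {mean(values):.3f}")
print(f"F = {result['f']:.2f} with df = ({result['df_between']}, {result['df_within']})")
```

The P-value still has to come from the F distribution with those degrees of freedom (for instance `scipy.stats.f.sf(F, df_between, df_within)` if SciPy happens to be installed); that is the number Excel reports in its P-value column.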

One-Way ANOVA Example
Here are the results after I clicked OK. You need to learn how to interpret the information here.
Our first clue: a large F value is evidence against the means being equal. Also, since F is larger than the F crit or critical value, we conclude the means are NOT equal.
The clincher is here: if the P-value is less than our alpha (0.05 here) then we conclude the means are NOT equal (in this case we see Sunny days mean is ...

Procedure for testing: $H_0: \mu_1 = \mu_2 = \mu_3 = \dots$
1. Use Excel to obtain results.
2. Identify the P-value from the display.
3. Form a conclusion based on these criteria:
If P-value $\le \alpha$, reject the null hypothesis of equal means. At least one mean is different from the others.
If P-value $> \alpha$, fail to reject the null hypothesis of equal means. The means are not significantly different from each other (the means are close enough to being the same that we're not going to worry about it).

One-Way ANOVA Summary
Wasn't that easy! I saved ANOVA for last because it is sooo sweeeet, like dessert.
Make sure you try doing some

problems on your own

Two-way ANOVA

Two-Way Analysis of Variance
Involves two factors

Two-Way ANOVA Assumptions
The populations have normal distributions
The populations have the same variance $\sigma^2$ (or standard deviation $\sigma$)
The samples are simple random samples
The samples are independent of each other
The different samples are from populations that are categorized in two ways (i.e., we are investigating two factors or treatments, including whether or not they interact with each other)

Definition
There is an interaction between two factors if the effect of one of the factors changes for different categories of the other factor.

Example: you may like vanilla ice cream, and you may like mustard, but you may not like mustard on vanilla ice cream

Interaction
Interaction effects are one of the most important aspects of two-way ANOVAs
It is critical that you understand the concept of interaction
Example: Let's say we do an analysis of DSU students' taste for ice cream and find that students prefer strawberry ice cream over other flavors (a one-way ANOVA could be used here to look for a difference in mean taste preference). Let's also say we do an analysis of DSU students' taste for various toppings and find that students prefer chocolate topping. So far, our analysis has looked at ice cream and, separately, looked at toppings. Both of those studies would be done using one-way ANOVA because in each case we had a single factor: ice cream in one study and toppings in another study. Now let's say we want to do an analysis in which we investigate whether students prefer certain toppings on certain types of ice cream. This would now involve two factors or treatments, namely ice cream and toppings considered simultaneously. IF we find that students do indeed prefer (or hate) certain toppings on certain flavors of ice cream, this is an example of an INTERACTION effect. IF students have no preference for certain toppings on certain flavors of ice cream then there would be NO INTERACTION.

Critical Steps in Interpreting Results of a Two-way ANOVA
1. Check the results to see if there is a statistically significant INTERACTION between the two factors.
2. IF there is a significant interaction, STOP! Note there is an interaction, thoroughly discuss/explain the interaction, but DO NOT try to analyze the factors / treatments (labeled Sample and Columns in the ANOVA output). The interaction you have found interferes with any individual factor effects, so we cannot interpret

the individual factors, only the interaction between them.
3. IF there is NOT a significant interaction, THEN proceed to examine each of the two factors to see if there are any significant effects within either of them. Make sure you examine both of them.

Using Excel for Two-way ANOVA
1. Pay close attention to how the data is entered in the examples.
2. Click on Tools, select Data Analysis, then select:
   A. Anova: Two Factor With Replication IF you have more than one set of data for each row (examples here fit this category), OR
   B. Anova: Two Factor Without Replication IF you have only one set of data for each row (there will be problem(s) of this type on the Exam). Two Factor Without Replication does not and cannot analyze interaction effects, so we only examine the factors individually in this case.
3. In the dialog box, enter the range containing ALL of the sample data (same process as with one-way ANOVA)
4. For rows per sample, enter the number of instances of the first factor (male/female in our first example). In our example we have 5 males and 5 females, so we enter 5 as the rows per sample.
5. Click OK

Two-Way ANOVA Example 1
Our first example is a study of New York Marathon runners.
Our two factors are Gender (male and female) and Age (3 different age groups)
Any interaction we find would mean there is some combination of gender and age whose performance is significantly different from other combinations of gender and age.

Remember, we always look for interaction FIRST, then look at the individual factors ONLY IF we find no significant interaction

          21-29y   30-39y   40+y
Male      13615    14677    14528
Male      18784    16090    17034
Male      14256    14086    14935
Male      10905    16461    14996
Male      12077    20808    22146
Female    16401    15357    17260
Female    14216    16771    25399
Female    15402    15036    18647
Female    15326    16297    15077
Female    12047    17636    25898
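As a rough cross-check on what Excel's Two Factor With Replication tool computes for this table, here is a minimal Python sketch. It assumes a balanced design (the same number of rows for every sample, as here) and borrows Excel's labels Sample, Columns and Interaction; the helper names `mean` and `two_way_anova` are invented for illustration and are not part of the course material.

```python
# Marathon times from the example table; each inner list is one runner row
# holding the 21-29y, 30-39y and 40+y values in that order.
male_rows = [
    [13615, 14677, 14528],
    [18784, 16090, 17034],
    [14256, 14086, 14935],
    [10905, 16461, 14996],
    [12077, 20808, 22146],
]
female_rows = [
    [16401, 15357, 17260],
    [14216, 16771, 25399],
    [15402, 15036, 18647],
    [15326, 16297, 15077],
    [12047, 17636, 25898],
]


def mean(values):
    return sum(values) / len(values)


def two_way_anova(samples):
    """Hypothetical helper for a balanced two-factor design with replication.

    samples[i] holds the data rows for level i of the row factor ("Sample"
    in Excel's output); each data row has one value per column level.
    """
    a = len(samples)        # levels of the row factor (Sample)
    b = len(samples[0][0])  # levels of the column factor (Columns)
    n = len(samples[0])     # rows per sample, i.e. replicates per cell

    # cells[i][j] = all observations for row level i and column level j
    cells = [[[row[j] for row in rows] for j in range(b)] for rows in samples]
    grand = mean([x for rows in samples for row in rows for x in row])

    row_means = [mean([x for row in rows for x in row]) for rows in samples]
    col_means = [mean([x for i in range(a) for x in cells[i][j]]) for j in range(b)]

    ss_sample = b * n * sum((m - grand) ** 2 for m in row_means)
    ss_columns = a * n * sum((m - grand) ** 2 for m in col_means)
    ss_cells = n * sum((mean(c) - grand) ** 2 for row in cells for c in row)
    # Interaction is the cell-to-cell variation not explained by either factor alone.
    ss_interaction = ss_cells - ss_sample - ss_columns
    ss_within = sum((x - mean(c)) ** 2 for row in cells for c in row for x in c)

    df = {
        "Sample": a - 1,
        "Columns": b - 1,
        "Interaction": (a - 1) * (b - 1),
        "Within": a * b * (n - 1),
    }
    ms_within = ss_within / df["Within"]
    f = {
        "Sample": (ss_sample / df["Sample"]) / ms_within,
        "Columns": (ss_columns / df["Columns"]) / ms_within,
        "Interaction": (ss_interaction / df["Interaction"]) / ms_within,
    }
    return df, f


df, f_stats = two_way_anova([male_rows, female_rows])
# Report interaction first, matching the order in which results are interpreted.
for source in ("Interaction", "Sample", "Columns"):
    print(f"{source:12s} F = {f_stats[source]:.3f}  df = ({df[source]}, {df['Within']})")
```

Each F value would then be converted to a P-value using the F distribution with the degrees of freedom shown, which is what the P-value column in Excel's ANOVA table contains.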

Two-Way ANOVA Example 1
Note how the data is entered. We have 5 rows of males (each row with one value in each age category) and 5 rows of females. It is critical that the data be entered in this manner in order for the software to perform correctly. It does not matter whether we put the 5 male rows before or after the 5 female rows. It also does not matter what order the 5 male or 5 female rows are in, as long as all male rows are together and all female rows are together.
This Summary information is all self-evident, so the next slides will cut this part out and focus on the ANOVA output

Two-Way ANOVA Example 1
First considering any INTERACTION effects between our two factors, Gender and Age, we find the P-value is greater than alpha=0.05, so THERE IS NO SIGNIFICANT INTERACTION effect here. This now frees us to check for any Age group differences and for any Gender differences.
Notice how the output does NOT use our gender or age labels; it only lists Sample and Columns. Well, what did we put in the columns when we entered the data? We put the Age groups in the columns. Sample in the output refers to what we had in the rows, which was Gender in this example.

Two-Way ANOVA Example 1
For SAMPLE, which represents Gender (male/female), we see a P-value of 0.206, which is greater than alpha (0.05), so we conclude that the mean marathon times DO NOT vary significantly between Male and Female runners.
For COLUMNS, which represents Age groups (21-29, 30-39, and 40+ years of age), we see a P-value of 0.014, which is less than alpha (0.05), so we conclude that the mean marathon times DO vary significantly between the age groups.

2-way ANOVA
Wasn't that great!
Let's do one more example where there IS an interaction between the 2 factors

Two-Way ANOVA Example 2
Our second example involves a taste test of various kinds of Wines and Meats.

Our two factors are Wine (white wine and red wine) and Meat (chicken, steak, and seafood)
Any interaction we find would mean there is some combination of type of wine and type of meat which tastes either significantly better or worse than other combinations of wine and meat.
Remember, we always look for interaction FIRST, then look at the individual factors ONLY IF we find no significant interaction

Two-Way ANOVA Example 2
We have the following data, which we must arrange properly in order to be able to analyze it using Excel
Note: yes, I changed the label from Red Meat to Steak

        Steak   Chicken   Fish
Red     10      4         3
Red     9       4         2
Red     10      2         4
White   3       6         8
White   3       5         6
White   2       7         7

Two-Way ANOVA Example 2
Here is the correct way to enter this sample data. Notice that Wine is our Sample and Meat is our Columns
This Summary information is all self-evident, so the next slides will cut this

part out and focus on the ANOVA output

Two-Way ANOVA Example 2
First considering any INTERACTION effects between our two factors, Wine and Meat, we find the P-value is practically 0, way less than alpha (0.05), so THERE IS an INTERACTION effect here.
STOP! Explain and discuss the interaction and go no further. We simply conclude that there is a significant interaction between Wine and Meat. We can explain the interaction by looking at the data. We see that people seem to like red wine with red meat (steak), but do not like red wine with chicken or fish. Similarly, it appears people do not like white wine with red meat, but they do like white wine with chicken and especially with fish.

Two-factor ANOVA without replication
Without replication means we only have one set of data for each sample or row
When you do a without replication ANOVA in Excel you will see that it is not possible to check for interaction. The results will only show effects within rows and within columns.
Again, NO INTERACTION checks are possible when you have only one set of data for each sample/row and use Excel's two-factor ANOVA without replication

Two-Way ANOVA (without replication) Example 3
Our third example involves a test of emissions from automobiles with various engine sizes and types of transmissions.
Our two factors are engine size (4 cylinder, 6 cylinder and 8 cylinder engines), and

transmission type (automatic and manual transmissions)
Two-way ANOVA without replication IS NOT ABLE to look for any interaction between the factors because there is no replication. Therefore, we can only examine the individual factors (engine size and transmission type) and whether there are any differences related to the amount of emissions produced.

Two-Way ANOVA (without replication) Example 3
We have the following data

          4 Cyl   6 Cyl   8 Cyl
Auto      10      12      14
Manual    10      12      12

Two-Way ANOVA (without replication) Example 3
Here is a picture of the Excel, Tools, Data Analysis, ANOVA screen to solve the problem.

Two-Way ANOVA (without replication) Example 3
Finally, here is the ANOVA output. Our conclusion is as follows:
Assuming an alpha of 0.05, with a P-value of 0.423, there is no support for claiming any difference in emissions between automatic and manual transmissions.
Also, with a P-value of 0.125, there is no support for claiming any difference in emissions between 4, 6, and 8 cylinder engines. However, there was some difference in average emissions between various sizes of engines and a larger sample might reveal a statistically significant difference.

Congratulations!!!

You've made it through to the end.
Please contact me and tell me what you liked or didn't like about the instruction and course.
Thanks!

Section 8 ANOVA END
William Christensen, Ph.D.