- Class: meta Course: ANOVA Lesson: Single Factor ANOVA Author: Peter M. Smyntek Type: Standard Organization: Saint Vincent College Version: 2.4.5 - Class: text Output: This lesson provides a brief overview of practical considerations for conducting single factor (also known as 1-way) analysis of variance (ANOVA) using R. After completing it, you should be able to describe what a single-factor ANOVA evaluates, determine when it is appropriate to carry out a single-factor ANOVA, interpret the results of ANOVA tables and the output of Tukey HSD post hoc tests, carry out a single-factor ANOVA of a dataset, and label an appropriate figure to show the results of an ANOVA along with Tukey HSD post hoc tests. - Class: text Output: ANOVA is used to evaluate differences among the means of 2 or more groups, and it is most often applied when there are more than 2. It asks whether any of the group means is different from any other. In other words, is the variance among groups greater than 0? - Class: text Output: ANOVA answers the question, is the variation in group means greater than what is expected by chance? The null hypothesis for a single factor ANOVA is that the variance among the groups = 0 (the mean of group 1 = mean of group 2 = mean of group 3, etc.). - Class: text Output: The alternative hypothesis is that there is a difference between the groups, which is to say that the mean of at least 1 group is different from that of at least 1 other group. - Class: text Output: In practical terms, ANOVA compares 2 sources of variation - between group variation and within group variation. The former considers differences between group means, while the latter focuses on variability within each group. - Class: figure Output: We will examine between group and within group variation in this figure. 
It has 3 groups, A, B and C, whose data are drawn from normal distributions with the same standard deviation but different means (A with a mean of 5, B of 10, and C of 15). Figure: ThreeGroupComparison_ANOVA2.R FigureType: new - Class: mult_question Output: Based on this graph in your Plots window, which source of variation is greater, the variation within the 3 groups (shown by the blue, yellow, and red arrows) or between the groups (shown by the green, orange, and purple arrows)? AnswerChoices: within group variation; between group variation CorrectAnswer: between group variation AnswerTests: omnitest(correctVal='between group variation') Hint: Is the average size of the arrows that represent the variation bigger within the groups or between the groups? - Class: text Output: The ratio of these 2 sources of variation (between group variation divided by within group variation) is effectively the signal to noise ratio for the ANOVA test. It is referred to as the variance ratio or calculated F-value. - Class: text Output: Formally, the between group variation is summarized as the group mean square (MS_group), while the within group variation is summarized as the error mean square (MS_error). - Class: text Output: If the calculated F-value is well above 1, then we may be able to reject the null hypothesis. However, if it is close to 1 or less than 1, then the evidence against the null hypothesis is very weak, and we would fail to reject it. - Class: mult_question Output: So, based on this information and looking at your graph, do you think the F-value for a comparison of these 3 blue, yellow, and red groups would be large (greater than 1) or small (less than or equal to 1)? AnswerChoices: greater than 1; less than or equal to 1 CorrectAnswer: greater than 1 AnswerTests: omnitest(correctVal='greater than 1') Hint: Remember, an F-value that is greater than 1 indicates that the between group variation is larger than the within group variation. 
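- Class: text Output: To make the variance ratio concrete, here is a minimal sketch that computes MS_group, MS_error and the F-value by hand and compares the result with aov(). It is only an illustration under stated assumptions. The group means (5, 10, 15) come from the figure above, but the group size of 20, the standard deviation of 2, the seed, and every object name (sim, ms_group, and so on) are my own choices and are not part of the lesson's figure script.

```r
# Hypothetical simulated data: 3 groups with a common SD but different means.
# n, the SD of 2 and the seed are assumptions made for this illustration only.
set.seed(42)
n <- 20
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = n)),
  value = c(rnorm(n, mean = 5,  sd = 2),
            rnorm(n, mean = 10, sd = 2),
            rnorm(n, mean = 15, sd = 2))
)

grand_mean  <- mean(sim$value)
group_means <- tapply(sim$value, sim$group, mean)
k <- nlevels(sim$group)   # number of groups
N <- nrow(sim)            # total number of observations

# Between group variation: spread of the group means around the grand mean
ss_group <- sum(n * (group_means - grand_mean)^2)
ms_group <- ss_group / (k - 1)

# Within group variation: spread of each observation around its own group mean
ss_error <- sum((sim$value - group_means[sim$group])^2)
ms_error <- ss_error / (N - k)

# Variance ratio (calculated F-value)
f_by_hand <- ms_group / ms_error

# The same ratio as reported in the ANOVA table from aov()
f_from_aov <- summary(aov(value ~ group, data = sim))[[1]][["F value"]][1]

print(c(by_hand = f_by_hand, from_aov = f_from_aov))
```

The two numbers should agree, because aov() is computing the same ratio of mean squares. Since the group means are far apart relative to the spread within each group, the F-value should come out well above 1.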
- Class: text Output: So, now you are ready to try some practical steps to run a single-factor ANOVA with real data. We will try this using the InsectSprays dataset that is built into R. This contains data on the counts of insect pests on agricultural plots after they had been sprayed with 1 of 6 different insecticide sprays (A - F). - Class: text Output: The key question is whether any of these sprays are more effective than the others at reducing the counts of insects. Our null hypothesis is that there is no difference between any of the sprays in terms of the counts of insect pests on the plots. The alternative hypothesis is that there is a difference in these counts between at least 2 of the sprays. - Class: cmd_question Output: Ok. We will begin by plotting our data to see what it looks like. First, let's attach the data. Please type attach(InsectSprays) CorrectAnswer: attach(InsectSprays) AnswerTests: omnitest(correctExpr= 'attach(InsectSprays)') Hint: Type attach(InsectSprays) - Class: cmd_question Output: Now we will plot the data by typing or copying & pasting the following command (but delete vertical lines like this - | - that show line breaks; they are not part of the command) - boxplot(count ~ spray, main = "Insect counts vs Insecticide Spray Type", xlab = "Spray", ylab = "# of insects") CorrectAnswer: boxplot(count ~ spray, main = "Insect counts vs Insecticide Spray Type", xlab = "Spray", ylab = "# of insects") AnswerTests: omnitest(correctExpr= 'boxplot(count ~ spray, main = "Insect counts vs Insecticide Spray Type", xlab = "Spray", ylab = "# of insects")') Hint: Type boxplot(count ~ spray, main = "Insect counts vs Insecticide Spray Type", xlab = "Spray", ylab = "# of insects") - Class: mult_question Output: So, based on this boxplot, does it appear that some of the sprays are more effective than others at reducing the counts of insects? 
AnswerChoices: Yes; No CorrectAnswer: Yes AnswerTests: omnitest(correctVal='Yes') Hint: Does there appear to be a difference in the spray means? - Class: cmd_question Output: Since we want to see if there is a difference in counts among the spray types, we will use the aov() command and store its output by assigning it a name. We can choose any name that we would like. I will call it aov_spray. Please type (copy and paste) this command - aov_spray <- aov(count ~ spray) CorrectAnswer: aov_spray <- aov(count ~ spray) AnswerTests: expr_creates_var("aov_spray"); omnitest(correctExpr= 'aov_spray <- aov(count ~ spray)') Hint: Type aov_spray <- aov(count ~ spray) - Class: cmd_question Output: Now, to get the output, we will use the summary command. Please type (copy and paste) this command - summary(aov_spray) CorrectAnswer: summary(aov_spray) AnswerTests: omnitest(correctExpr= 'summary(aov_spray)') Hint: Type summary(aov_spray) - Class: mult_question Output: Now we have a result! Please have a look at the F-value as well as the p-value (denoted as Pr(>F)). How would you describe the F-value? AnswerChoices: Small; Medium; Big! CorrectAnswer: Big! AnswerTests: omnitest(correctVal='Big!') Hint: Values substantially more than 1 are considered large. - Class: text Output: Based on this result we would reject the null hypothesis and accept the alternative; namely, there is a difference in these counts between at least 2 of the sprays. - Class: text Output: However, while we know that there is a difference, we do not yet know which of the 6 sprays differ from one another. Which sprays result in significantly lower insect counts than the others? - Class: text Output: To answer this question, we will use a post hoc test called Tukey's Honest Significant Difference (HSD) test. 
This allows us to do pairwise comparisons among all of the sprays while using a larger critical value to hold the overall type 1 error rate (the probability of getting a false positive - rejecting a true null hypothesis) at the chosen significance level. We will try this for our results! - Class: cmd_question Output: We will use the TukeyHSD command. Please type (copy and paste) this command - TukeyHSD(aov_spray) CorrectAnswer: TukeyHSD(aov_spray) AnswerTests: omnitest(correctExpr= 'TukeyHSD(aov_spray)') Hint: Type TukeyHSD(aov_spray) - Class: mult_question Output: Another set of results! Scroll back up in your console window and look at the results. This shows us the difference in means, the lower and upper bounds of a 95% confidence interval, and an adjusted p-value for each pairwise comparison. How many of the p-values for these pairwise comparisons are larger than our default significance level of alpha = 0.05? AnswerChoices: 6; 9; Infinity (and beyond)! CorrectAnswer: 6 AnswerTests: omnitest(correctVal='6') Hint: It is not 9. - Class: text Output: So, sprays A, B and F are not significantly different from each other, but all of them are different from the more effective sprays (C, D, and E). Sprays C, D, and E are not significantly different from each other. - Class: text Output: We want to depict this on our graph. We can do this simply by adding letters to indicate which of our sprays are not statistically different from one another based on our 1-way ANOVA and our Tukey HSD post hoc test. Groups that share a letter are not different from each other. - Class: cmd_question Output: We will use the text command to add these labels to our graph. For this command, you give the x value, the y value, the text in quotes, and a color, if you would like. You will need to be patient, since we will do this 6 times. 
Please type (copy and paste) this command - text(1, 25, "a", col = "red") CorrectAnswer: text(1, 25, "a", col = "red") AnswerTests: omnitest(correctExpr= 'text(1, 25, "a", col = "red")') Hint: Type text(1, 25, "a", col = "red") - Class: cmd_question Output: Ok. Next one. Now, please type (copy and paste) this - text(2, 25, "a", col = "red") CorrectAnswer: text(2, 25, "a", col = "red") AnswerTests: omnitest(correctExpr= 'text(2, 25, "a", col = "red")') Hint: Type text(2, 25, "a", col = "red") - Class: cmd_question Output: Keep it coming. Now, please type (copy and paste) this - text(3, 5.5, "b", col = "blue") CorrectAnswer: text(3, 5.5, "b", col = "blue") AnswerTests: omnitest(correctExpr= 'text(3, 5.5, "b", col = "blue")') Hint: Type text(3, 5.5, "b", col = "blue") - Class: cmd_question Output: Three down and three to go! Now, please type (copy and paste) this - text(4, 8, "b", col = "blue") CorrectAnswer: text(4, 8, "b", col = "blue") AnswerTests: omnitest(correctExpr= 'text(4, 8, "b", col = "blue")') Hint: Type text(4, 8, "b", col = "blue") - Class: cmd_question Output: Nearly there! Now, please type (copy and paste) this - text(5, 8, "b", col = "blue") CorrectAnswer: text(5, 8, "b", col = "blue") AnswerTests: omnitest(correctExpr= 'text(5, 8, "b", col = "blue")') Hint: Type text(5, 8, "b", col = "blue") - Class: cmd_question Output: Last one! Now, please type (copy and paste) this - text(6, 18, "a", col = "red") CorrectAnswer: text(6, 18, "a", col = "red") AnswerTests: omnitest(correctExpr= 'text(6, 18, "a", col = "red")') Hint: Type text(6, 18, "a", col = "red") - Class: text Output: You did it! You have made a nicely labeled graph to show the results of your single factor ANOVA and your Tukey HSD post hoc test! - Class: cmd_question Output: Finally, it's a good habit to detach data when you are finished using it. 
Please type detach(InsectSprays) CorrectAnswer: detach(InsectSprays) AnswerTests: omnitest(correctExpr= 'detach(InsectSprays)') Hint: Type detach(InsectSprays) - Class: text Output: I suggest that you practice doing these tests with the airquality and ToothGrowth datasets within R. The more you practice, the more confident and more skilled you will become!
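- Class: text Output: If you would like a starting point for that practice, here is a minimal sketch of the same workflow applied to ToothGrowth. It is a suggestion, not part of the graded lesson. It assumes you want to compare tooth length (len) among the doses, and it passes the data with the data = argument instead of using attach(). The object names tg, aov_dose and tukey_dose are hypothetical and chosen only for this example.

```r
# ToothGrowth stores dose as a number, so we copy the data and convert dose
# to a factor; otherwise aov() would treat it as a continuous predictor.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Plot first to see what the data look like
boxplot(len ~ dose, data = tg,
        main = "Tooth length vs dose",
        xlab = "Dose", ylab = "Tooth length")

# Single-factor ANOVA, passing the data frame with data = rather than attach()
aov_dose <- aov(len ~ dose, data = tg)
print(summary(aov_dose))

# Tukey HSD post hoc test: one row per pairwise comparison of doses
tukey_dose <- TukeyHSD(aov_dose)
print(tukey_dose)

# Count how many adjusted p-values exceed alpha = 0.05
n_not_different <- sum(tukey_dose$dose[, "p adj"] > 0.05)
print(n_not_different)
```

Because nothing is attached here, there is nothing to detach afterwards. Once you have read the Tukey HSD table, you can add letters to the boxplot with text() in the same way as in the lesson.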