Basic Skills and Principles: Measurement

Inferential Statistics
The t-test: An Example

Let's assume that a researcher is attempting to determine the effects of pesticide pollution on the hatching success of fish eggs. He has noticed that fish eggs in streams near agricultural fields fare poorly, while those in undisturbed areas hatch successfully. The researcher sets up a lab experiment in which ten groups of fish eggs are allowed to develop in unmanipulated stream water, and ten groups of eggs develop in stream water with the addition of pesticide. The proportion of eggs hatching in each group is tallied (Table 1), and descriptive statistics for the two groups are compared.

Statistic                        No Pesticide    Pesticide
Mean proportion eggs hatching    0.87            0.59
Standard deviation               0.07            0.09

         Proportion eggs hatching
Group    No Pesticide    Pesticide
1        0.80            0.50
2        0.76            0.45
3        0.81            0.68
4        0.90            0.77
5        0.95            0.64
6        0.84            0.60
7        0.88            0.54
8        0.99            0.57
9        0.86            0.62
10       0.93            0.57

Table 1. Effects of pesticide pollution on hatching success of fish eggs.
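As a quick check, the descriptive statistics can be recomputed from the values in Table 1. The sketch below uses only Python's standard library and assumes the sample standard deviation (n - 1 in the denominator); the variable and function names are illustrative and do not come from the course materials.

```python
from statistics import mean, stdev

# Proportion of eggs hatching in each of the ten groups (Table 1)
no_pesticide = [0.80, 0.76, 0.81, 0.90, 0.95, 0.84, 0.88, 0.99, 0.86, 0.93]
pesticide = [0.50, 0.45, 0.68, 0.77, 0.64, 0.60, 0.54, 0.57, 0.62, 0.57]


def describe(values):
    """Return (mean, sample standard deviation), rounded to two decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


summary = {
    "No Pesticide": describe(no_pesticide),
    "Pesticide": describe(pesticide),
}

for name, (group_mean, group_sd) in summary.items():
    print(f"{name}: mean = {group_mean:.2f}, standard deviation = {group_sd:.2f}")
```

Rounded to two decimals, the output should agree with the means and standard deviations listed in the descriptive statistics table.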

While the results indicate an adverse effect of pesticide exposure on egg hatching success, how can we be sure that the difference in hatching success is "significant", and didn't just occur by chance? After all, the two data sets do overlap: the lowest value in the no pesticide groups was 76% hatching success, while the highest value in the pesticide groups was 77%. To compare these two data sets, we must first state our hypothesis. Our null hypothesis would be that the two groups are not different, and that both data sets belong to the same distribution.

Ho: The mean proportion eggs hatching in the two groups is not different OR
Ho: The two experimental groups are part of the same distribution

We then conduct a t-test on the data set, which examines the difference between the group means relative to the dispersion within the two groups, and returns a p-value: the probability of observing a difference at least this large if the two groups really were part of the same distribution.
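For readers who want to see what the test computes, here is a sketch of the t statistic, assuming the unpaired, equal-variance form of the test. The notation is an assumption of this sketch and is not taken from the course materials: group means are written as \bar{x}, sample standard deviations as s, and group sizes as n.

```latex
% Pooled variance of the two groups (equal-variance assumption)
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}

% t statistic: difference in means divided by its standard error
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}

% With the Table 1 data (n_1 = n_2 = 10, unrounded means 0.872 and 0.594):
% s_p^2 \approx 0.00677, \quad s_p \approx 0.0823
t \approx \frac{0.872 - 0.594}{0.0823 \sqrt{0.2}} \approx 7.56,
\qquad \text{degrees of freedom} = n_1 + n_2 - 2 = 18
```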

As the p-value returned by the test was less than 0.05, we can reject our null hypothesis and conclude that the two groups are indeed from different distributions, and that they are significantly different from one another. The researcher can therefore conclude that pesticide exposure reduces the hatching success of eggs of this species. All of the comparisons you will be making in laboratory exercises this semester will mimic this example, so you should take special care to ensure you understand the operation and usefulness of the t-test for comparing data sets.

Comparison    t-statistic    p-value
t-test        7.56           p < 0.0005
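The reported result can be reproduced from the Table 1 data. The sketch below is one possible check using only Python's standard library; it assumes an unpaired, equal-variance, two-tailed t-test, and it approximates the p-value by numerically integrating the t distribution with Simpson's rule. All function names are illustrative.

```python
import math
from statistics import mean, variance

# Proportion of eggs hatching in each of the ten groups (Table 1)
no_pesticide = [0.80, 0.76, 0.81, 0.90, 0.95, 0.84, 0.88, 0.99, 0.86, 0.93]
pesticide = [0.50, 0.45, 0.68, 0.77, 0.64, 0.60, 0.54, 0.57, 0.62, 0.57]


def pooled_t(a, b):
    """Unpaired t statistic and degrees of freedom, assuming equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / std_error, df


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=20000):
    """Approximate two-tailed p-value via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    return max(0.0, 1 - 2 * central_area)


t_stat, df = pooled_t(no_pesticide, pesticide)
p_value = two_tailed_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p < 0.0005: {p_value < 0.0005}")
```

The numerical integration is used here only to avoid depending on a statistics package; a dedicated statistics library or calculator would normally supply the p-value directly.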

Performing t-tests
To perform t-tests on data sets, we suggest using an online t-test calculator from Graphpad.com. It can be accessed at the address below, and is exceptionally easy to use. WebStat, the program we are using to calculate descriptive statistics and create graphs, also has a t-test function, but we suggest you use the one below as it is tailored to the needs and statistical sophistication of Science 1101 students.

http://www.graphpad.com/quickcalcs/index.cfm

(1.) Select the "Continuous data" option, then hit the "Continue" button.
(2.) Select the "t test to compare two means" option, then hit the "Continue" button.
(3.) Simply enter your data in the columns by group, select "Unpaired t-test", and then hit the "Calculate now" button. Your p-value and t statistic will be listed on the results page.

t-tests and Statistical Assumptions - A Word of Caution
Those of you familiar with the t-test have likely noticed that we are omitting a step in our use of the t-test - the testing of assumptions. For a given data set to be suitable for analysis with a t-test, it must meet two assumptions: (1.) the variances of the two groups being compared cannot be significantly different from one another, and (2.) the data must roughly fit a normal distribution. When statisticians and scientists conduct a t-test, they first verify these assumptions with statistical tests, and only proceed once these assumptions have been satisfied. If the variances in the two groups differ appreciably, the data can be mathematically "transformed" to bring variances closer together. If the data are not normally distributed, they can be transformed for normality, or an alternative test that does not require a normal distribution can be used.
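To make the first assumption concrete, the sketch below compares the sample variances of the two groups from the fish egg example (Table 1). This is only an informal look at how far apart the variances are, not one of the formal assumption tests described above, and the names used are illustrative.

```python
from statistics import variance

# Proportion of eggs hatching in each of the ten groups (Table 1)
no_pesticide = [0.80, 0.76, 0.81, 0.90, 0.95, 0.84, 0.88, 0.99, 0.86, 0.93]
pesticide = [0.50, 0.45, 0.68, 0.77, 0.64, 0.60, 0.54, 0.57, 0.62, 0.57]

# Sample variances (n - 1 in the denominator)
var_no_pesticide = variance(no_pesticide)
var_pesticide = variance(pesticide)

# Ratio of the larger variance to the smaller one; 1.0 would mean identical spread
variance_ratio = max(var_no_pesticide, var_pesticide) / min(var_no_pesticide, var_pesticide)

print(f"Variance, no pesticide: {var_no_pesticide:.4f}")
print(f"Variance, pesticide:    {var_pesticide:.4f}")
print(f"Ratio (larger / smaller): {variance_ratio:.2f}")
```

Whether a given ratio counts as "significantly different" is exactly what a formal assumption test would decide; the ratio alone does not answer that question.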

As these steps appreciably increase the statistical complexity of t-test analysis, we will not be testing data sets for assumptions in this course. You must therefore realize that the statistical rigor of your results may not be comparable to that in published scientific studies, and that we are consciously avoiding the use of assumption tests to simplify the statistical analyses conducted in the course.