Statistics Analysis

Completed by University of

Outline
1. Averages
2. Variance
3. Standard deviation
4. Abusing Correlation
5. Regression
6. T-tests
7. ANOVAs
8. Reliability
9. Validity

1. Average

Statistics allow researchers to take a large batch of data and summarize it into a couple of numbers, such as an average. Of course, when a large amount of data is summarized into a single number, a lot of information is lost, including the fact that different people have very different experiences. So it is important to remember that, for the most part, statistics do not provide useful information about each individual’s experience. Rather, researchers generally use statistics to make general statements about a population, such as “an average home in the US is sold for $160,000”. Although personal stories are often moving or interesting, it is often important to understand what the typical or average experience is. For this, we need statistics.

Statistics are also used to reach conclusions about general differences between groups. For example, suppose that in my family, there are four children, two men and two women. Suppose that the women in my family are taller than the men. This personal experience may lead me to the conclusion that women are generally taller than men. Of course, we know that, on average, men are taller than women. We know this because researchers have taken large, random samples of men and women and compared their average heights. Researchers are often interested in making such comparisons: Do cancer patients survive longer using one drug than another? Is one method of teaching children to read more effective than another? Do men and women differ in their enjoyment of a certain movie? To answer these questions, we need to collect data from randomly selected samples and compare these data using statistics.
The results we get from such comparisons are often more trustworthy than the simple observations people make from non-random samples, such as the different heights of men and women in my family (Neale and Liebert 1986).

Let’s suppose that I wanted to know the average income of the current full-time, tenured faculty at Harvard. There are two ways that I could find this average. First, I could get a list of every full-time, tenured faculty member at Harvard and find out the annual income of each member on this list. Because this list contains every member of the group that I am interested in, it can be considered a population. If I were to collect these data and calculate the mean, I would have generated a parameter, because a parameter is a value generated from, or applied to, a population. The symbol used to represent the population mean is μ.

Another way to generate the mean income of the tenured faculty at Harvard would be to randomly select a sub-set of faculty names from my list and calculate the average income of this sub-set. The sub-set is known as a sample (in this case it is a random sample), and the mean that I generate from this sample is a type of statistic. Statistics are values derived from sample data, whereas parameters are values that are either derived from, or applied to, population data. It is important to note that all samples are representative of some population and that all sample statistics can be used as estimates of population parameters.

2. Variance

The variance provides a statistical average of the amount of dispersion in a distribution of scores. In general, variance is used more as a step in the calculation of other statistics (e.g., analysis of variance) than as a stand-alone statistic. But with a simple manipulation, the variance can be transformed into the standard deviation, which is one of the statistician’s favorite tools (Iverson and Norpoth 1987). The variance has a nice mathematical property.
In particular, it can be shown that under some common circumstances variability, as measured by variance, adds up in a straightforward way when variable values are added. For example, if we knew that the variance of the heights of pony backs was 225 units and that the variance of the heights of women was 100 units, then under some common circumstances the variance of the distance from the ground to the tops of the heads of women standing on pony backs would be 325 (225 + 100).
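The additivity described above can be checked with a small simulation. The following is only a minimal sketch: the mean heights (140 and 165) are hypothetical, the two sets of heights are drawn independently from normal distributions, and only the variances 225 and 100 come from the text. It also computes the covariance term, which is what makes the simple sum fail when the two variables are not independent.

```python
import random
from statistics import fmean


def pvar(values):
    """Population variance: the average squared deviation from the mean."""
    mean = fmean(values)
    return fmean((v - mean) ** 2 for v in values)


def pcov(xs, ys):
    """Population covariance between two equally long lists."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    return fmean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# Hypothetical means; only the standard deviations (15 and 10) follow
# from the variances 225 and 100 used in the text.
ponies = [rng.gauss(140.0, 15.0) for _ in range(n)]
women = [rng.gauss(165.0, 10.0) for _ in range(n)]  # drawn independently
totals = [p + w for p, w in zip(ponies, women)]

var_ponies = pvar(ponies)
var_women = pvar(women)
var_totals = pvar(totals)
cov = pcov(ponies, women)

print(f"variance of pony heights:   {var_ponies:.1f}")
print(f"variance of women's heights: {var_women:.1f}")
print(f"variance of combined height: {var_totals:.1f}")
print(f"covariance (near 0 when independent): {cov:.2f}")
```

Because the draws are independent, the covariance is close to zero and the combined variance lands close to 325. If tall women were systematically paired with tall ponies, the covariance would be positive and the combined variance would exceed the simple sum.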

This addition of variability as measured by variance applies whenever there is no tendency to pair tall women with tall ponies or vice versa, a condition known as independence. By contrast, the heights of women and the heights of their husbands are unlikely to be independent, as tall women tend to prefer to marry even taller men. In this case we would say that the heights of women and the heights of their husbands are dependent.

Yet variance as a measure of variability has a major drawback. Because of the squaring, the units of measurement don’t match up with what has been measured. The height of women might be measured in cm, but the variance would then be in square cm. To get back to the original units we need to take a square root. The square root of the variance is the most widely used measure of the spread of data. It is called the standard deviation. The standard deviation is then the square root of the average of the squared deviations. It is explained in the following section.

3. Standard deviation

Measures of central tendency, such as the mean and the median, provide useful information. But it is important to recognize that these measures are limited and, by themselves, do not provide a great deal of information. There are three measures of dispersion that researchers typically examine: the range, the variance, and the standard deviation. Of these, the standard deviation is perhaps the most informative and certainly the most widely used.

The best way to understand what a standard deviation is is to consider what the two words mean. Deviation, in this case, refers to the difference between an individual score in a distribution and the average score for the distribution. So if the average score for a distribution is 10, and an individual child has a score of 12, the deviation is 2. The other word in the term standard deviation is standard. In this case, standard means typical, or average.
So a standard deviation (SD) is the typical, or average, deviation between individual scores in a distribution and the mean for the distribution. This is a very useful statistic because it provides a handy measure of how spread out the scores are in the distribution. When combined, the mean and standard deviation provide a pretty good picture of what the distribution of scores is like.

In a sense, the range provides a measure of the total spread in a distribution (i.e., from the lowest to the highest scores), whereas the variance and standard deviation are measures of the average amount of spread within the distribution. Researchers tend to look at the range when they want a quick snapshot of a distribution, such as when they want to know whether all of the response categories on a survey question have been used or they want a sense of the overall balance of scores in the distribution.

It is important to note that the formulas for calculating the variance and the standard deviation differ depending on whether you are working with a distribution of scores taken from a sample or from a population. Briefly, when we do not know the population mean, we must use the sample mean as an estimate. But the sample mean will probably differ from the population mean. Whenever we use a number other than the actual mean to calculate the variance, we will end up with a larger variance, and therefore a larger standard deviation, than if we had used the actual mean. This will be true regardless of whether the number we use in our formula is smaller or larger than our actual mean. Because the scores in a sample lie closer to their own mean than to the population mean, a variance calculated around the sample mean tends to underestimate the population variance, and the sample formulas compensate by dividing the sum of squared deviations by n − 1 rather than by n.

4. Abusing Correlation

Correlation coefficients such as the Pearson are very powerful statistics. They allow us to determine whether, on average, the values on one variable are associated with the values on a second variable. This can be very useful information, but people, including social scientists, are often tempted to ascribe more meaning to correlation coefficients than they deserve.

Namely, people often confuse the concepts of correlation and causation (Aiken and West 1991). Correlation (co-relation) simply means that variation in the scores on one variable corresponds with variation in the scores on a second variable. Causation means that variation in the scores on one variable causes or creates variation in the scores on a second variable. When we make the leap from correlation to causation, we may be wrong.

As an example, let’s consider this story. One winter shortly after World War II, there was an explosion in the number of storks nesting in a northern European country (Denmark). Approximately 9 months later, there was a large jump in the number of babies that were born. Now, the link between storks and babies being what it is, many concluded that this correlation between the number of storks and the number of babies represented a causal relationship. Fortunately, science tells us that babies do not come from storks after all, at least not human babies. However, there is something that storks and babies have in common: Both can be “summoned” by cold temperatures and warm fireplaces. It seems that storks like to nest in warm chimneys during cold winters. As it happens, cold winter nights also foster baby-making behavior. The apparent cause-and-effect relationship between storks and babies was in fact caused by a third variable: a cold winter.

The point of this example is simple: Evidence of a relationship between two variables (i.e., a correlation) does not necessarily mean that there is a causal relationship between the two variables. However, it should also be noted that a correlation between two variables is a necessary ingredient of any argument that the two variables are causally related. In other words, I cannot claim that one variable causes another (e.g., that smoking causes cancer) if there is no correlation between smoking and cancer.
If I do find a correlation between smoking and cancer, I must rule out other factors before I can conclude that it is smoking that causes cancer (Aiken and West 1991).

5. Regression

As with correlation analysis, in regression the dependent and independent variables need to be measured on an interval or ratio scale. Dichotomous (i.e., categorical variables with two categories) predictor variables can also be used (Spatz 2001). Regression, particularly simple linear regression, is a statistical technique that is very closely related to correlations. In fact, when examining the relationship between two continuous (i.e., measured on an interval or ratio scale) variables, either a correlation coefficient or a regression equation can be used. Indeed, the Pearson correlation coefficient is nothing more than a simple linear regression coefficient that has been standardized.

The benefits of conducting a regression analysis rather than a correlation analysis are (a) regression analysis yields more information, particularly when conducted with one of the common statistical software packages, and (b) the regression equation allows us to think about the relation between the two variables of interest in a more intuitive way. Whereas the correlation coefficient provides us with a single number (e.g., r = .40), which we can then try to interpret, the regression analysis yields a formula for calculating the predicted value of one variable when we know the actual value of the second variable (Spatz 2001). In simple linear regression, we begin with the assumption that the two variables are linearly related.
In other words, if the two variables are actually related to each other, we assume that every time there is an increase of a given size in value on the X variable (called the predictor or independent variable), there is a corresponding increase (if there is a positive correlation) or decrease (if there is a negative correlation) of a given size in the Y variable (called the dependent, or outcome, or criterion variable) (Spatz 2001). In other words, if the value of X increases from a value of 1 to a value of 2, and Y increases by 2 points, then when X increases from 2 to 3, we would predict that the value of Y would increase another 2 points.
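The prediction rule described above, and the earlier remark that the Pearson coefficient is a standardized regression coefficient, can be illustrated with a few lines of code. This is a minimal sketch using ordinary least squares on hypothetical data; the function names and the scores are illustrative assumptions, not taken from the sources cited in this section.

```python
from math import sqrt
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    mean = fmean(values)
    return sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


# Hypothetical scores. In the first Y series, every 1-point increase in X
# goes with a 2-point increase in Y, as in the example in the text.
xs = [1, 2, 3, 4, 5]
exact_ys = [3, 5, 7, 9, 11]
noisy_ys = [2.8, 5.3, 6.9, 9.4, 10.6]

exact_slope, exact_intercept = fit_line(xs, exact_ys)
predicted_at_6 = exact_intercept + exact_slope * 6

noisy_slope, noisy_intercept = fit_line(xs, noisy_ys)
r = pearson_r(xs, noisy_ys)
# Rescaling the slope by the two standard deviations gives the correlation.
standardized_slope = noisy_slope * sample_sd(xs) / sample_sd(noisy_ys)

print(f"exact data: Y = {exact_intercept:.2f} + {exact_slope:.2f} * X")
print(f"noisy data: slope = {noisy_slope:.3f}, r = {r:.3f}")
print(f"standardized slope = {standardized_slope:.3f}")
```

The regression equation gives a predicted Y for any X, which a single correlation coefficient cannot do, while the standardized slope and r agree on the strength of the linear relation.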

6. T-tests

Because there is a distinction between the common statistical vernacular definition of t tests and the more technical definition, t tests can be a little confusing. The common-use definition or description of t tests is simply comparing two means to see if they are significantly different from each other. The more technical definition or description of a t test is any statistical test that uses the t, or Student’s t, family of distributions. We will discuss the two most commonly conducted t tests, the independent samples t test and the paired or dependent samples t test (Spatz 2001).

One of the most commonly used t tests is the independent samples t test. You use this test when you want to compare the means of two independent samples on a given variable. For example, if you wanted to compare the average height of 50 randomly selected men to that of 50 randomly selected women, you would conduct an independent samples t test. Note that the sample of men is not related to the sample of women, and there is no overlap between these two samples (i.e., one cannot be a member of both groups). Therefore, these groups are independent, and an independent samples t test is appropriate.

The dependent samples t test is also used to compare two means on a single dependent variable. Unlike the independent samples test, however, a dependent samples t test is used to compare the means of a single sample or of two matched or paired samples. For example, if a group of students took a math test in March and that same group of students took the same math test two months later in May, we could compare their average scores on the two test dates using a dependent samples t test. Or, suppose that we wanted to compare a sample of boys’ Scholastic Aptitude Test (SAT) scores with their fathers’ SAT scores. In this example, each boy in our study would be matched with his father. In both of these examples, each score is matched, or paired with, a second score.
Because of this pairing, we say that the scores are dependent upon each other, and a dependent samples t test is warranted.

7. ANOVAs

Factorial ANOVA and one-way ANOVA are the two most commonly used analyses of this type. The purpose of a one-way analysis of variance (one-way ANOVA) is to compare the means of two or more groups (the independent variable) on one dependent variable to see if the group means are significantly different from each other. In fact, if you want to compare the means of two independent groups on a single variable, you can use either an independent samples t test or a one-way ANOVA. The results will be identical, except instead of producing a t value, the ANOVA will produce an F ratio, which is simply the t value squared (Spatz 2001). Because the t test and the one-way ANOVA produce identical results when there are only two groups being compared, most researchers use the one-way ANOVA only when they are comparing three or more groups.

Because the independent t test and the one-way ANOVA are so similar, people often wonder, why don’t we just use t tests instead of one-way ANOVAs? Perhaps the best way to answer this question is by using an example. Suppose that I want to go into the potato chip business. I’ve got three different recipes, but because I’m new to the business and don’t have a lot of money, I can produce only one flavor. I want to see which flavor people like best and produce that one. I randomly select 90 adults and randomly divide them into three groups. One group tries my BBQ-flavored chips, the second group tries my ranch-flavored chips, and the third group tastes my cheese-flavored chips. All participants in each group fill out a rating form after tasting the chips to indicate how much they liked the taste of the chips. The rating scale goes from a score of 1 (“Hated it”) to 7 (“Loved it”). I then compare the average ratings of the three groups to see which group liked the taste of their chips the most.
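The statement above that, with two groups, the F ratio is simply the t value squared can be checked numerically. The sketch below assumes the pooled-variance (equal-variance) form of the independent samples t test, which is the form for which the identity holds, and it uses a handful of hypothetical 1-to-7 ratings rather than real data from the 90 participants. The function names are illustrative.

```python
from math import sqrt
from statistics import fmean


def sum_sq(values):
    """Sum of squared deviations of the scores from their own mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def independent_t(a, b):
    """Independent samples t value using the pooled variance."""
    pooled_var = (sum_sq(a) + sum_sq(b)) / (len(a) + len(b) - 2)
    std_error = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (fmean(a) - fmean(b)) / std_error


def one_way_f(*groups):
    """One-way ANOVA F ratio: between-group over within-group mean square."""
    all_scores = [score for group in groups for score in group]
    grand_mean = fmean(all_scores)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical taste ratings on the 1-7 scale, six raters per flavor.
bbq = [5, 6, 4, 7, 5, 6]
ranch = [4, 3, 5, 4, 2, 4]
cheese = [6, 5, 7, 6, 6, 7]

t_value = independent_t(bbq, ranch)
f_two_groups = one_way_f(bbq, ranch)
f_three_groups = one_way_f(bbq, ranch, cheese)

print(f"t (BBQ vs Ranch) = {t_value:.3f}, t squared = {t_value ** 2:.3f}")
print(f"F (BBQ vs Ranch) = {f_two_groups:.3f}")
print(f"F (all three flavors) = {f_three_groups:.3f}")
```

With two groups the squared t value and the F ratio coincide; with three groups there is no single t value, but the same F computation still applies as one overall test.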

In this example, the chip flavor (BBQ, Ranch, Cheese) is my categorical, independent variable and the rating of the taste of the chips is my continuous, dependent variable. To see which flavor received the highest average rating, I could run three separate independent t tests comparing (a) BBQ with Ranch, (b) BBQ with Cheese, and (c) Ranch with Cheese. The problem with running three separate t tests is that each time we run a t test, we must make a decision about whether the difference between the two means is meaningful, or statistically significant. This decision is based on probability, and every time we make such a decision, there is a slight chance we might be wrong. The more times we make decisions about the significance of t tests, the greater the chances are that we will be wrong. In other words, the more t tests we run, the greater the chances become of deciding that a t test is significant (i.e., that the means being compared are really different) when it really is not. For instance, if three tests were independent and each were run at the .05 level, the chance of at least one false positive would be 1 − .95 × .95 × .95, or about .14. A one-way ANOVA fixes this problem by adjusting for the number of groups being compared.

Let’s consider factorial ANOVA and repeated-measures ANOVA. These techniques are based on the same general principles as one-way ANOVA. Namely, they all involve the partitioning of the variance of a dependent variable into its component parts (e.g., the part attributable to between-group differences, the part attributable to within-group variance, or error). In addition, these techniques allow us to examine more complex, and often more interesting, questions than is allowed by simple one-way ANOVA.

8. Reliability and Validity

Two important concepts in any research design course are reliability and validity. For example, small samples produce highly variable estimates of population parameters. This relation between sample size and the precision of estimates produces the familiar decrease in statistical power that occurs as sample size decreases.
Similarly, measurement error hampers estimation of population values. Unreliable measures limit the magnitude of the correlation that can be obtained in a sample, with that upper limit being less than the population correlation. This relation is captured in the maxim that reliability is a necessary but not a sufficient condition for validity (Neale and Liebert 1986). Understanding the limits that sample size and reliability place on expected outcomes is central to good research design.

Although these relations are well-known to experienced researchers and are captured by statistical formulas, the concepts are often difficult for students to grasp. How small a sample is “too small”? When is a measure too unreliable? Answering these questions requires a firm understanding of the functional relation of study outcome to sample size and reliability. Many scholars have found it useful to demonstrate these relations with a computer program that visually displays the influences of sample size and reliability on study outcome. The program generates random samples from populations defined by the user. Consequently, it can be used to demonstrate the variability in sample statistics that arises from different sample sizes, the variability in sample correlations that arises from different sample sizes, and the impact of reliability on sample correlations. The program allows people to explore more fully the relations of sample size and reliability to study outcome and to appreciate the relations in a way that is not immediately apparent from examining a statistical formula.
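The kind of demonstration described above can be approximated in a few lines. The following is not the program referred to in the text, only a hypothetical sketch under stated assumptions: true scores are standard normal with a population correlation of .60 (an arbitrary choice), and each observed score is a true score plus independent random error sized so that the ratio of true-score variance to observed-score variance equals the chosen reliability. All function and variable names are illustrative.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def draw_sample(rng, n, rho, reliability):
    """Draw n pairs of observed scores (reliability must be in (0, 1])."""
    error_sd = sqrt((1.0 - reliability) / reliability)
    xs, ys = [], []
    for _ in range(n):
        true_x = rng.gauss(0.0, 1.0)
        true_y = rho * true_x + sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(true_x + rng.gauss(0.0, error_sd))  # add measurement error
        ys.append(true_y + rng.gauss(0.0, error_sd))
    return xs, ys


def sample_correlations(rng, n, rho, reliability, replications=200):
    """Correlations from repeated random samples of size n."""
    return [
        pearson_r(*draw_sample(rng, n, rho, reliability))
        for _ in range(replications)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
rho = 0.6                # assumed population correlation of true scores

# Effect of sample size, with perfectly reliable measures.
spread_small = pstdev(sample_correlations(rng, n=10, rho=rho, reliability=1.0))
spread_large = pstdev(sample_correlations(rng, n=200, rho=rho, reliability=1.0))

# Effect of reliability, with a sample large enough to be stable.
xs, ys = draw_sample(rng, n=20_000, rho=rho, reliability=0.5)
attenuated_r = pearson_r(xs, ys)

print(f"spread of r with n = 10:  {spread_small:.3f}")
print(f"spread of r with n = 200: {spread_large:.3f}")
print(f"r with reliability .50 on both measures: {attenuated_r:.3f}")
```

Under these assumptions, correlations from samples of 10 scatter far more widely than those from samples of 200, and when both measures have a reliability of .50 the observed correlation settles near .30 rather than .60, in line with the classical attenuation relation (observed correlation is roughly the true correlation times the square root of the product of the two reliabilities).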