Class Pricing Project: Hypothesis Testing of Kmart and Walmart Prices

Completed by University of

Outline

1. Research question for the study
2. Research hypothesis
3. Designing a study to test the research hypothesis
4. Conducting the study and collecting numerical data
5. Analyzing the data and calculating the posterior distribution (density)
6. Determining the appropriate critical value for the obtained test statistic and finding the appropriate p value
7. Deciding whether to retain or reject the null hypothesis
8. Confirming the study’s hypothesis
9. Summarizing the study’s conclusions, addressing the study’s research question

The purpose of this Class Pricing Project is to determine whether prices at one retail establishment (Kmart) are lower or higher than prices at another retail establishment (Walmart). We compare the prices of 100 items at these retailers to see whether there is a statistically significant difference between them. The 100 items are exactly the same items (product and brand) sold by both Kmart and Walmart.

Our research hypothesis states that prices at Kmart are generally higher. We test this hypothesis with a real data set taken from the websites of the two retailers, comparing the prices of the same products at Kmart and Walmart. The hypothesis under test states that one retailer (Kmart) is more expensive than the other (Walmart). The parameter of interest in this study is the difference in prices between the two retailers.

The observed pricing rates (the proportion of higher prices for the same products) in this study were .063 for Walmart and .073 for Kmart. Relatively simple calculations from the original data show that, using a uniform prior distribution, the posterior distribution of the difference in pricing rates should be approximately normal, with a mean of .01 (that is, a 1% difference favoring Walmart) and a standard deviation of .0031.
The Bayesian calculation that yields the value .0031 is the same one a classical statistician would use to derive the standard error of a difference in proportions. The small standard deviation of the posterior distribution results from the large sample size: 100 items for each of the Kmart and Walmart groups.

Figure 1

Figure 1 shows the posterior distribution of the difference in pricing rates between Kmart and Walmart. Values greater than 0 indicate that Walmart has lower prices; values less than 0 indicate that Kmart has lower prices.

The points corresponding to effect sizes of .005 and .010, and the posterior probabilities corresponding to the areas below .005, between .005 and .010, and above .010, are displayed. Almost all of the area under the curve lies between 0 and .02, indicating that Walmart has lower prices. A statistician using classical methods would use this same distribution to obtain a point estimate of the difference (.01), to construct confidence intervals (e.g., a 95% confidence interval runs from .004 to .016), and to do hypothesis tests (here, we reject the hypothesis that the difference δ is zero). The power to detect a difference in pricing rates of .001, or one tenth of 1%, is .052, which is close to the alpha value of .05 that would be used. Therefore, the rejection of the exact null hypothesis also allows us to reject the "very small" null hypothesis. This leads us to conclude that the effect is not "very small," if "very small" is defined as less than .001.

One who wanted to use a Bayesian method to do something comparable to a classical hypothesis test would compute the probability that the difference is less than zero. This probability is represented by the area under the curve to the left of zero, and it is very nearly zero. It is the same probability that a classical statistician would derive for a one-tailed test (it would be doubled for a two-tailed test). The Bayesian conclusion from this calculation is that there is little chance that Kmart is superior to Walmart.

As discussed previously, more informative Bayesian results can be presented by choosing an effect size that is large enough to matter in practice. For these data, the general consensus was that the difference in pricing between the retailers would have to be greater than 1% (.01) to justify a clear preference for one store over the other. That is, the pricing rate would have to be at least .01 higher for one group than for the other.
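The interval and tail probability quoted above can be checked with a few lines of Python. This is only a sketch: it takes the posterior mean (.01) and standard deviation (.0031) as reported in the text rather than recomputing them from the raw prices, and the variable names are illustrative.

```python
from statistics import NormalDist

# Posterior for the difference in pricing rates, using the mean and
# standard deviation reported in the text (not recomputed from raw data).
posterior = NormalDist(mu=0.01, sigma=0.0031)

# Central 95% interval: mean plus or minus about 1.96 standard deviations.
z_975 = NormalDist().inv_cdf(0.975)
ci_low = posterior.mean - z_975 * posterior.stdev
ci_high = posterior.mean + z_975 * posterior.stdev

# Posterior probability that the difference is below zero (Kmart cheaper).
# Numerically this equals the classical one-tailed p value.
p_below_zero = posterior.cdf(0.0)
p_two_tailed = 2 * p_below_zero

print(f"95% interval: {ci_low:.3f} to {ci_high:.3f}")
print(f"P(difference < 0): {p_below_zero:.5f}")
print(f"two-tailed p value: {p_two_tailed:.5f}")
```

The same numbers serve both readings: a classical statistician calls the interval a confidence interval, while a Bayesian with a uniform prior calls it a credible interval.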
We determine the importance of this choice by doing a sensitivity analysis, in which we recalculate using other values, such as .02 and .005. The probability that the effect is a large one favoring Walmart is represented by the area to the right of the value .01; because .01 is the mean of the posterior distribution, this probability is .50. The probability that the effect is a large one favoring Kmart is represented by the area to the left of the value -.01 (a value so unlikely that it does not even appear in the figure). This corresponds to a standard normal deviate (z-score) of (-.01 - .01)/.0031 = -.02/.0031, or about -6.45. As expected from examining Figure 1, the probability that a standard normal deviate is less than -6.45 is very near zero. Therefore, by subtraction we calculate that the probability that the effect is "small," in the sense of being less than .01 in absolute value (i.e., between -.01 and .01), is about .50.

We can further refine our method by dividing the area of "small" effect into "small difference in favor of Walmart" and "small difference in favor of Kmart." These probabilities are nearly .50 and nearly zero, respectively. For example, the probability that the effect is less than zero can be calculated from the z-score (0 - .01)/.0031 = -3.2. The desired probabilities then can be calculated by subtraction. One can conclude, therefore, that Kmart is not only extremely unlikely to be highly superior to Walmart, it is quite unlikely to be even slightly superior to Walmart.

How sensitive are these results to the choice of .01 as the threshold for a meaningful price difference? First, suppose that we had used .005 instead of .01. Figure 1 shows that we would calculate the probability of a small effect as, approximately, the probability of finding a standard normal deviate z less than (.005 - .010)/.0031 = -1.61; this probability is about .05. The probability that the effect is large in favor of Walmart is about .95. We are reasonably certain in this case that the effect is practically significant. Next, suppose that we had used a criterion of .02 for a large effect. We see from the figure that the probability is near 1 that the effect is a small one in favor of Walmart, and near zero that it is "large" in favor of Walmart.

From this example, we see that when the mode (peak) of the posterior distribution is near the dividing line between "small" and "large," the location of the dividing line can have an effect on the interpretation. The Bayesian lesson is that we may need to look at the whole posterior distribution.
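The sensitivity analysis described above can be sketched in Python as follows. The posterior mean and standard deviation are the values reported in the text; the function name `effect_probabilities` and the region labels are hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist

# Posterior for the difference in pricing rates, as reported in the text.
posterior = NormalDist(mu=0.01, sigma=0.0031)


def effect_probabilities(threshold, dist=posterior):
    """Split the posterior into four regions around +/- threshold.

    Positive differences favor Walmart; negative differences favor Kmart.
    """
    below_neg = dist.cdf(-threshold)
    below_zero = dist.cdf(0.0)
    below_pos = dist.cdf(threshold)
    return {
        "large_kmart": below_neg,
        "small_kmart": below_zero - below_neg,
        "small_walmart": below_pos - below_zero,
        "large_walmart": 1.0 - below_pos,
    }


# Recalculate for each candidate threshold discussed in the text.
for threshold in (0.005, 0.01, 0.02):
    probs = effect_probabilities(threshold)
    summary = ", ".join(f"{name}={value:.4f}" for name, value in probs.items())
    print(f"threshold {threshold}: {summary}")
```

Printing all four regions for each threshold makes it easy to see how the interpretation shifts as the dividing line moves across the peak of the posterior.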
The usual null hypothesis-testing rationale arises from a test that a parameter is exactly zero (or some other value). This is a point (or simple) hypothesis; that is, a hypothesis that the parameter is exactly equal to some point (single value) in the space of possible parameter values. But, in fact, most researchers probably don’t really want to test this hypothesis. Instead, they probably want to test the hypothesis that an effect is so small that it is unimportant, versus the alternative hypothesis that the effect is large enough to be considered important.

In statistical terms, both the null and alternative hypotheses would be composite hypotheses; that is, they each consist of ranges of possible values, rather than single values. For example, rather than testing the null hypothesis that "The mean difference in IQ scores between two groups is exactly equal to 0," the researcher might really want to test the "small" hypothesis that "The mean difference in IQ scores between the two groups is less than 2 IQ points." The alternative hypothesis would be that the mean difference is at least 2 IQ points. This viewpoint is not new; however, it is not widely known and even less widely practiced.

If this is what researchers want to test, why don’t they test this hypothesis directly? Because in the classical framework for statistics it is much simpler mathematically to test a point null hypothesis than it is to test a composite "nearly null" hypothesis. Some attention is now being devoted to discussing practical procedures for such tests. These procedures, like power tests, involve non-central distributions, and (also like power tests) are not found in most standard statistical packages. (The usual hypothesis-testing procedure involves the use of central distributions, which describe the distribution of a statistic, such as t or F, when the null hypothesis is true. For testing composite hypotheses we must use non-central distributions, which describe the distribution of a statistic when the null hypothesis is false.)

This paper has two main conclusions. First, null hypothesis testing is not as irrational as it is sometimes made to seem. One must merely understand that it does approximately the right thing most of the time, especially in many social science studies, where small to medium sample sizes are typical. If we follow the usual advice to declare an effect significant only if it is both statistically significant and large enough to matter, then our methods may be roughly right.
Exact methods of testing "small" null hypotheses are preferable and available, but not yet as part of popular statistical packages. In any case, null hypothesis tests serve a useful purpose in decision making, from the viewpoint of classical statistics. Second, the Bayesian method offers more gratifying interpretations than classical methods. One can easily present probabilities that the effect size is small, or is large in either direction. These probabilities can be extremely helpful in making decisions. Further, in many situations the Bayesian only needs to use central distributions, which are widely found in tables and computer programs, rather than non-central distributions, which are not.

In some situations, of course, confidence intervals (for a classical statistician) or credible intervals (for a Bayesian) are more useful than explicit hypothesis tests. Finally, note that in all null hypothesis tests, there is a bias against rejecting the null hypothesis. To see an extreme example of this, consider what result is necessary to reject a null hypothesis with a one-tailed test at the .05 level. From a Bayesian perspective, this means that there is a greater than 95% probability that the parameter in question is greater than zero (or less than zero, depending on the direction). So the odds must be at least 19 to 1 that the parameter is on one side of zero before the null hypothesis is rejected.

Problems/Difficulties

What problems did you have doing your Pricing Project?

It was difficult to choose retailers that are very similar in product range yet differ in prices. It was also very difficult to decide on the sample size: a large sample is more desirable, but it is also more difficult to obtain.

Did you have a difficult time determining which statistical tool to use? If you had to do it over, what would you do differently? What other comments do you have about the project?

No, it was easy to determine which statistical tool to use. I knew that null and alternative hypotheses had to be tested. Deriving the posterior distribution was easy because I knew the method was reliable for comparing exactly the same items from two groups (Kmart and Walmart in our case). I also chose to use the Bayesian method to have extra confidence in the results derived from the posterior distribution. If I had to do it over, I would do the Bayesian calculations differently. In many instances the Bayesian calculations can be found by simple modifications of calculations used in classical statistics. When this is true, the only additional tools required are those used to calculate areas under probability distributions.
For normal curves, tables of the standard normal distribution function are found in nearly every textbook, although using a computer can be faster and more accurate. For other distributions, such as the t-distribution, additional aids are needed.
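As an illustration of the point about computers, the area under the standard normal curve can be computed directly from the complementary error function in Python's standard library. This is a minimal sketch; the helper name `standard_normal_cdf` is hypothetical, and the example z-score is the one for a large effect favoring Kmart, built from the posterior mean (.01) and standard deviation (.0031) reported earlier.

```python
import math


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    # Using erfc keeps precision in the far left tail, where 1 - erf(...)
    # would lose digits to rounding.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# z-score for the threshold -.01 under the reported posterior.
z_large_kmart = (-0.01 - 0.01) / 0.0031
tail_area = standard_normal_cdf(z_large_kmart)

print(f"z = {z_large_kmart:.2f}")
print(f"area to the left of z = {tail_area:.3e}")
```

Many printed tables stop well short of z-scores this extreme, so a direct calculation is the practical way to put a number on "very near zero."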