##########################
##   R Assignment 13    ##
##########################
##    Curtis Miller     ##
##       12/17/12       ##
##      MATH 3070       ##
##  Lab Assignment 13   ##
##########################

# "8.2 The samhda(UsingR) data set contains information on marijuana usage
# among children as collected at the Substance Abuse and Mental Health Data
# Archive. The variable marijuana indicates whether the individual has ever
# tried marijuana. A 1 means yes, a 2 no. If it used to be that 50% of the
# target population had tried marijuana, does this data indicate an increase
# in marijuana usage?"

# "1) Identify the null hypothesis and the alternative hypothesis."

# The null hypothesis is that the population proportion still equals .5. The
# alternative hypothesis is that the population proportion has increased and
# is greater than .5.

# "2) Do a significance test of proportion to decide."

# First, I load the UsingR package and attach the samhda dataset, so I can
# more easily access the marijuana vector.
library(UsingR)
attach(samhda)

# I use the prop.test() function to determine whether marijuana usage has
# increased. For x, I use length() to count the marijuana users in the sample
# (given by marijuana[marijuana == 1]), and for n I simply use
# length(marijuana). I leave p at its default (.5) and set alt="greater",
# since my alternative hypothesis is that p is greater than .5.
prop.test(length(marijuana[marijuana == 1]), length(marijuana), alt="greater")

# The resulting p-value is reported as 1, so I fail to reject the null
# hypothesis: the data give no evidence that marijuana usage has increased.
# Having completed the significance test, I detach the samhda dataset.
detach(samhda)

# "8.13 (modified) A consumer-reports group is testing whether a gasoline
# additive changes a car's gas mileage. A test of 7 cars finds an average
# improvement of 0.5 miles per gallon with a standard deviation of 3.77. Is
# this difference significantly greater than 0? Assume the values come from a
# normal population."

# "1) Identify the null hypothesis and the alternative hypothesis."

# The null hypothesis is that the change in gasoline mileage is zero,
# signifying no improvement. The alternative hypothesis is that gasoline
# mileage has improved, signified by a difference greater than zero.

# "2) Do a significance test to decide."

# First, I assign the data to variables: dbar is the sample mean difference,
# Sd is the sample standard deviation, and n is the number of observations in
# the sample.
dbar = .5; Sd = 3.77; n = 7

# Next, I compute the t-statistic (the sample is too small for a
# z-statistic), which is given by (dbar - mu0)/(Sd/sqrt(n)), where mu0 = 0.
tstat = dbar/(Sd/sqrt(n))

# The p-value is the probability of observing a t-statistic at least as large
# as the one computed, assuming the null hypothesis is true. The pt() function
# computes this probability. Since the alternative hypothesis is that the
# difference is greater than 0, I need the upper tail and set
# lower.tail=FALSE.
pt(tstat, df = n - 1, lower.tail = FALSE)

# The p-value computed is 0.3688294. Since this exceeds any conventional
# significance level (such as 0.05), I fail to reject the null hypothesis.

# "3) [EXTRA CREDIT] Compute the Type II error, beta."

# beta is the probability of failing to reject the null hypothesis when the
# alternative hypothesis is true. It is computed for a particular value under
# the alternative; for this example, I compute beta when the true difference
# is 5. Here beta is approximated by the lower tail of the t-distribution
# evaluated at the t critical value for alpha = 0.05 minus betastat, where
# betastat is the difference between the true population difference (5) and
# the null hypothesis difference (0), divided by the standard error
# Sd/sqrt(n).
betastat = 5/(Sd/sqrt(n))
pt(qt(0.05, df = n - 1, lower.tail = FALSE) - betastat, df = n - 1)

# The computed beta is 0.08422042.
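
# A possible cross-check of the beta computation above (a sketch only, not
# part of the original assignment). Subtracting betastat from the critical
# value and using the central t-distribution is an approximation; when the
# true difference is not zero, the t-statistic follows a noncentral
# t-distribution with noncentrality parameter (true difference)/(Sd/sqrt(n)).
# The names delta_alt, sd_alt, n_alt, crit, ncp_alt, beta_nct and pw are
# introduced here purely for illustration; the values 5, 3.77, 7 and
# alpha = 0.05 are the ones used above.
delta_alt = 5; sd_alt = 3.77; n_alt = 7

# Critical value of the one-sided test at alpha = 0.05.
crit = qt(0.05, df = n_alt - 1, lower.tail = FALSE)

# Noncentrality parameter under the assumed true difference.
ncp_alt = delta_alt/(sd_alt/sqrt(n_alt))

# beta is the probability that the noncentral t-statistic falls below the
# critical value, i.e. that the test fails to reject.
beta_nct = pt(crit, df = n_alt - 1, ncp = ncp_alt)
beta_nct

# power.t.test() from the stats package reports power for the same setup;
# 1 - power should agree with beta_nct.
pw = power.t.test(n = n_alt, delta = delta_alt, sd = sd_alt, sig.level = 0.05,
                  type = "one.sample", alternative = "one.sided")
1 - pw$power

# If this value differs noticeably from the beta reported above, the gap
# reflects the central-t approximation rather than an arithmetic error.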