Solution Manual for Using R for Introductory Statistics, 2nd Edition


Solutions Manual for Using R for Introductory Statistics, Second Edition, by John Verzani


Contents

1 Getting Started
2 Univariate data
3 Bivariate data
4 Multivariate data
5 Multivariate graphics
6 Populations
7 Statistical inference
8 Confidence intervals
9 Significance tests
10 Goodness of fit
11 Linear regression
12 Analysis of variance
13 Extensions of linear model
14 Thanks


1 Getting Started

1.1 The only thing to remember is the placement of parentheses, and the need to use * for multiplication:

    1 + 2*(3+4)
    ## [1] 15
    4^3 + 3^(2+1)
    ## [1] 91
    sqrt((4+3)*(2+1))
    ## [1] 4.582576
    ( (1+2)/(3+4) )^2
    ## [1] 0.1836735

1.2 These would be (2 + 3) - 4, 2 + (3 * 4), (2/3)/4, and 2^(3^4); the last working right to left.

1.3 Translating this to R requires attention to the use of parentheses and using an asterisk for multiplication:

    (1 + 2*3^4) / (5/6 - 7)
    ## [1] -26.43243

1.4 We use the 1/2 power as an alternative to the sqrt function:

    (0.25 - 0.2) / (0.2 * (1 - 0.2)/100)^(1/2)
    ## [1] 1.25

1.5 We avoid c as a variable name below, as c is a very commonly used function in R:

    a <- 2; b <- 3; d <- 4; e <- 5
    a * b * d * e
    ## [1] 120

1.6 It is 1770.

1.7 It is 2510. Instead of scanning, this can be automated:

    require(UsingR)
    ## Loading required package: UsingR
    ## Loading required package: MASS
    ##
    ## Attaching package: 'UsingR'
    ##
    ## The following object is masked from 'package:ggplot2':
    ##
    ##     movies
    ##
    ## The following object is masked from 'package:survival':
    ##
    ##     cancer
    max(exec.pay)
    ## [1] 2510
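
The parentheses around 1/2 in 1.4 are doing real work: in R, ^ binds more tightly than / and groups right to left. The following is a small sketch of those rules, using made-up numbers that do not come from the exercises:

```r
# Illustrative values only; none of these numbers come from the exercises.
16^1/2       # parsed as (16^1)/2, which is 8
16^(1/2)     # the square root, which is 4
2^3^2        # ^ groups right to left: 2^(3^2) = 512
(2^3)^2      # forcing left-to-right grouping gives 64
-2^2         # ^ binds tighter than unary minus, so this is -4
```

When in doubt, adding explicit parentheses costs nothing and makes the intended grouping visible.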

1.8 These values come from:

    require(UsingR)
    mean(exec.pay)
    ## [1] 59.88945
    min(exec.pay)
    ## [1] 0
    max(exec.pay)
    ## [1] 2510

1.9 This is done with:

    require(UsingR)
    mean(exec.pay)
    ## [1] 59.88945
    mean(exec.pay, trim=0.10)
    ## [1] 29.96894

The big difference arises because the very large salaries, which are trimmed, have a big influence on the average computed by mean.

1.10 The variable names are printed when the data set is displayed. They are Tree, age, and circumference.

1.11 The only trick is to reference the variable appropriately:

    mean(Orange$age)
    ## [1] 922.1429

1.12 The largest value in a collection is returned by max:

    max(Orange$circumference)
    ## [1] 214
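
To see what the trim argument used in 1.9 does, the trimmed mean can be reproduced by hand on a small vector. The data below are hypothetical (they are not exec.pay), chosen so that one value dominates the ordinary mean:

```r
# Hypothetical data: nine modest values and one very large one.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

mean(x)                 # 14.5, pulled up by the single large value
mean(x, trim = 0.10)    # 5.5, after dropping 10% of the values from each end

# The same trimmed mean by hand: sort, drop k values from each end, average.
n <- length(x)
k <- floor(n * 0.10)    # number of values trimmed per end (here 1)
trimmed_by_hand <- mean(sort(x)[(k + 1):(n - k)])
trimmed_by_hand         # 5.5
```

The gap between 14.5 and 5.5 mirrors, on a small scale, the gap between the two means reported for exec.pay.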

2 Univariate data

2.1 For example:

    p <- c(2, 3, 5, 7, 11, 13, 17, 19)

2.2 The diff function returns the distance between fill-ups, so mean(diff(gas)) is your average mileage per fill-up, and mean(gas) is the uninteresting average of the recorded mileage.

2.3 The data may be entered using c and then manipulated in a natural way.

    x <- c(2, 5, 4, 10, 8)
    x^2
    ## [1]   4  25  16 100  64
    x - 6
    ## [1] -4 -1 -2  4  2
    (x - 9)^2
    ## [1] 49 16 25  1  1

2.4 These can be done with

    rep("a", 10)
    ## [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
    seq(1, 99, by=2)
    ##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
    ## [21] 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79
    ## [41] 81 83 85 87 89 91 93 95 97 99
    rep(1:3, rep(3,3))
    ## [1] 1 1 1 2 2 2 3 3 3
    rep(1:3, 3:1)
    ## [1] 1 1 1 2 2 3
    c(1:5, 4:1)
    ## [1] 1 2 3 4 5 4 3 2 1

2.5 These can be done with the following commands:

    fibs <- c(1, 2, 3, 5, 8, 13, 21, 34)   # Fibonacci numbers, not primes
    ns <- 1:10
    recips <- 1/ns
    cubes <- (1:6)^3
    years <- 1964:2014
    subway <- c(14, 18, 23, 28, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110)
    by25 <- seq(0,1000, by=25)

2.6 We have:

    sum(abs(rivers - mean(rivers))) / length(rivers)
    ## [1] 313.5508

To elaborate, rivers - mean(rivers) centers the values and is a data vector. Calling abs makes all the values non-negative, and sum reduces the result to a single number, which is then divided by the length.

2.7 The unary minus is evaluated before the colon:

    -1:3 # like (-1):3
    ## [1] -1  0  1  2  3

However, the colon is evaluated before multiplication in the latter:

    1:2*3 # not like 1:(2*3)
    ## [1] 3 6

2.8 If we know the cities starting with a "J" then this is just an exercise in indexing by the names attribute, as with:

    precip["Juneau"]
    ## Juneau
    ##   54.7

Getting the cities with the names beginning with "J" can be done by sorting and inspecting, say with sort(names(precip)). This gives:

    j_cities <- c("Jackson", "Jacksonville", "Juneau")
    precip[j_cities]
    ##      Jackson Jacksonville       Juneau
    ##         49.2         54.5         54.7

The inspection of the names by scanning can be tedious for large data sets. The grepl function can be useful here, but requires the specification of a regular expression to indicate words that start with "J". As a teaser, here is how this could be done:

    precip[grepl("^J", names(precip))]
    ##       Juneau Jacksonville      Jackson
    ##         54.7         54.5         49.2

Regular expressions are described in the help page ?regexp.

2.9 There are many ways to do this; the following uses paste:

    paste("Trial", 1:10)
    ##  [1] "Trial 1"  "Trial 2"  "Trial 3"  "Trial 4"  "Trial 5"
    ##  [6] "Trial 6"  "Trial 7"  "Trial 8"  "Trial 9"  "Trial 10"

2.10 This answer will vary depending on the underlying system. One answer is:

    paste(dname, fname, sep=.Platform$file.sep)
    ## [1] "/Library/Frameworks/R.framework/Versions/3.2/Resources/library/UsingR/DESCRIPTION"

2.11 The number of levels and number of cases are returned by:

    require(MASS)
    man <- Cars93$Manufacturer
    length(man) # number of cases
    ## [1] 93
    length(levels(man)) # number of levels
    ## [1] 32
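
For the simple prefix match in 2.8, a regular expression is not strictly required. As a sketch, assuming a version of R recent enough to provide the base function startsWith, the same selection can be written without one and compared against the grepl result:

```r
# precip is a built-in data set; startsWith() is part of base R in recent versions.
j_by_regex  <- precip[grepl("^J", names(precip))]
j_by_prefix <- precip[startsWith(names(precip), "J")]

# Both approaches pick out the same cities, in the same order.
identical(j_by_regex, j_by_prefix)
names(j_by_prefix)
```

The regular expression remains the more general tool; startsWith only covers the fixed-prefix case.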

2.12 Looking at the levels, we see that one is rotary, which is clearly not numeric. As for the 5-cylinder cars, we can get them as follows:

    cyl <- Cars93$Cylinders
    levels(cyl) # "rotary"
    ## [1] "3"      "4"      "5"      "6"      "8"      "rotary"
    which(cyl == "5") # just 5 is also okay
    ## [1] 89 93
    Cars93$Manufacturer[ which(cyl == 5) ] # which companies
    ## [1] Volkswagen Volvo
    ## 32 Levels: Acura Audi BMW Buick Cadillac Chevrolet ... Volvo

2.13 The factor function allows this to be done by specifying the labels argument:

    mtcars$am <- factor(mtcars$am, labels=c("automatic", "manual"))

This produces a modified, local copy of mtcars. The ordering of the labels should match the following: sort(unique(as.character(mtcars$am))).

2.14 The answer is no:

    require(HistData)
    any(Arbuthnot$Female > Arbuthnot$Male)
    ## [1] FALSE

Read the help page to see how this could be construed to show the "guiding hand of a divine being."

2.15 We have:

    A <- c(TRUE, FALSE, TRUE, TRUE)
    B <- c(TRUE, FALSE, TRUE, TRUE)
    !(A & B)
    ## [1] FALSE  TRUE FALSE FALSE
    !A | !B
    ## [1] FALSE  TRUE FALSE FALSE

It is not necessary to express the latter as (!A) | (!B), as the unary ! operator has higher precedence than the binary | operator.

2.16 We use logical extraction for this task:

    names(precip[precip > 50])
    ## [1] "Mobile"       "Juneau"       "Jacksonville" "Miami"
    ## [5] "New Orleans"  "San Juan"

2.17 After parsing the question, it can be seen that this expression answers it:

    m <- mean(precip)
    trimmed_m <- mean(precip, trim=0.25)
    any(precip > m + 1.5 * trimmed_m)
    ## [1] FALSE

A similar question is used for the algorithmic determination of "outliers" in a data set.

2.18 The comparison of strings is done lexicographically. That is, comparisons are done character by character until a tie is broken. The comparison of characters varies with the locale. This may be decided by ASCII codes (which yields alphabetical ordering), but need not be. See ?locales for more detail.

2.19 First we store the data, then we analyze it.

    commutes <- c(17, 16, 20, 24, 22, 15, 21, 15, 17, 22)
    commutes[commutes == 24] <- 18
    max(commutes)
    ## [1] 22
    min(commutes)
    ## [1] 15
    mean(commutes)
    ## [1] 18.3
    sum(commutes >= 20)
    ## [1] 4
    sum(commutes < 18)/length(commutes)
    ## [1] 0.5

2.20 We need to know that the months with 31 days are 1, 3, 5, 7, 8, 10, and 12.

    cds <- c(79, 74, 161, 127, 133, 210, 99, 143, 249, 249, 368, 302)
    longmos <- c(1, 3, 5, 7, 8, 10, 12)
    long <- cds[longmos]
    short <- cds[-longmos]
    mean(long)
    ## [1] 166.5714
    mean(short)
    ## [1] 205.6
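
The negative index in 2.20 drops the listed positions. As a sketch of an equivalent approach, the same split can be written with a logical mask; the names is_long and short_by_mask are introduced here for illustration only:

```r
cds <- c(79, 74, 161, 127, 133, 210, 99, 143, 249, 249, 368, 302)
longmos <- c(1, 3, 5, 7, 8, 10, 12)

# TRUE at the positions of the 31-day months, FALSE elsewhere.
is_long <- seq_along(cds) %in% longmos

# Negating the mask keeps the remaining months, just as cds[-longmos] does.
short_by_mask <- cds[!is_long]

identical(short_by_mask, cds[-longmos])   # TRUE
mean(short_by_mask)                       # 205.6, as with mean(short)
```

A logical mask is convenient when the same condition is reused, for instance to label the months as well as to subset them.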

2.21 Enter the data as follows:

    x <- c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.441, 1.72)
    names(x) <- 1990:1999

Using diff gives

    diff(x)
    ##   1991   1992   1993   1994   1995   1996   1997   1998   1999
    ##  0.320  0.190  0.040  0.060 -0.110  0.100  0.210  0.061  0.279

We can see that one year was negative:

    which(diff(x) < 0)
    ## 1995
    ##    5

The jump between 1994 and 1995 was negative (there was a work stoppage that year). The percentage difference is found by dividing by x[-10] and multiplying by 100. (Recall that x[-10] is all but the tenth element of x.) The first year's jump was the largest.

    diff(x)/x[-10] * 100
    ##      1991      1992      1993      1994      1995      1996
    ## 56.140351 21.348315  3.703704  5.357143 -9.322034  9.345794
    ##      1997      1998      1999
    ## 17.948718  4.420290 19.361554

2.22 We have:

    mean_distance <- function(x) {
      distances <- abs(x - mean(x))
      mean(distances)
    }
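
As a quick check, the function from 2.22 should agree with the direct computation in 2.6 on the built-in rivers data. The sketch below repeats the definition so that it runs on its own; the second call uses a small hypothetical vector:

```r
mean_distance <- function(x) {
  distances <- abs(x - mean(x))
  mean(distances)
}

# rivers is a built-in data set; this repeats the expression used in 2.6.
direct <- sum(abs(rivers - mean(rivers))) / length(rivers)
all.equal(mean_distance(rivers), direct)   # TRUE

# Hypothetical input: the distances from the mean 3 are 2, 1, 0, 1, 2.
mean_distance(c(1, 2, 3, 4, 5))            # 1.2
```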

2.23 This can be done through:

    f <- function(x) {
      mean(x^2) - mean(x)^2
    }
    f(1:10)
    ## [1] 8.25

2.24 A simple answer is just given by:

    iseven <- function(x) x %% 2 == 0

Then isodd would be:

    isodd <- function(x) x %% 2 == 1

The following implementation ensures integers are used, and adds names:

    iseven <- function(x) {
      x <- as.integer(x)
      ans <- x %% 2 == 0
      setNames(ans, x) # add names
    }
    iseven(1:10)
    ##     1     2     3     4     5     6     7     8     9    10
    ## FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE

Restricting a function to handle only integer inputs can be achieved by using generic functions, such as described in Appendix ??.

2.25 A simple implementation looks like this. One could improve it by only looking at integer factors less than or equal to the square root of x.

    isprime <- function(x){
      if (x < 2) return(FALSE)   # 0 and 1 are not prime
      if (x < 3) return(TRUE)    # 2 is prime; 2:(x-1) would count down to 1 here
      !any(x %% 2:(x-1) == 0)
    }
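
The square-root improvement mentioned in 2.25 can be sketched as follows. This is one possible version, not the manual's; the name isprime_sqrt is introduced here, x is assumed to be a single whole number, and inputs below 4 are handled explicitly because 2:floor(sqrt(x)) would count downward for them:

```r
# Sketch only: x is assumed to be a single whole number.
isprime_sqrt <- function(x) {
  if (x < 2) return(FALSE)   # 0 and 1 are not prime
  if (x < 4) return(TRUE)    # 2 and 3 are prime
  # Any composite x has a factor no larger than sqrt(x),
  # so candidates beyond floor(sqrt(x)) need not be tried.
  !any(x %% 2:floor(sqrt(x)) == 0)
}

Filter(isprime_sqrt, 1:30)
## [1]  2  3  5  7 11 13 17 19 23 29
```

For large x this tests roughly sqrt(x) candidate factors instead of x - 2, which is the saving the solution alludes to.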