Solution Manual for Statistics for the Life Sciences, 5th Edition




SOLUTIONS MANUAL
STATISTICS FOR THE LIFE SCIENCES
FIFTH EDITION

Myra Samuels, Purdue University
Jeffrey A. Witmer, Oberlin College
Andrew Schaffner, California Polytechnic State University


Contents

I. General Comments
II. Comments on Chapters
III. Comments on Sampling Exercises
Chapter 1 Introduction
Chapter 2 Description of Samples and Populations
Chapter 3 Probability and the Binomial Distribution
Chapter 4 The Normal Distribution
Chapter 5 Sampling Distributions
Unit I Summary
Chapter 6 Confidence Intervals
Chapter 7 Comparison of Two Independent Samples
Chapter 8 Comparison of Paired Samples
Unit II Summary
Chapter 9 Categorical Data: One-Sample Distributions
Chapter 10 Categorical Data: Relationships
Unit III Summary
Chapter 11 Comparing the Means of Many Independent Samples
Chapter 12 Linear Regression and Correlation
Unit IV Summary
Chapter 13 A Summary of Inference Methods


I GENERAL COMMENTS

COURSE DESIGN

To provide flexibility in course design, a number of sections in the textbook are designated as "Optional." The instructor wishing to adopt a leisurely pace, or who is designing a course for one quarter, can omit all optional sections and, moreover, give only light coverage to some other sections. The Comments on Chapters in Section II of this Manual indicate specifically those sections and parts of sections where light coverage may be appropriate. For an even briefer schedule, Chapters 11 and 12 can be omitted entirely. Also, there is no new material in Chapter 13; rather, this chapter provides a perspective of the preceding chapters. The purpose of Chapter 13 is to help summarize and put into perspective the many inference methods discussed in the text.

Because each chapter builds on the preceding ones, it is not advisable to alter the order of the chapters.

EXERCISES

Calculators and Computers

The exercises in the text are designed to minimize numerical drudgery and emphasize understanding. Exercises with simple numbers familiarize the student with the meaning and structure of formulas, after which additional exercises, based on real data, focus primarily on interpretation. Any calculations required for the latter exercises are easily carried out on a hand calculator.

If computing facilities are available, the use of a computer can be easily integrated into the course. To that end, data files are available for many of the examples and exercises in the book. Several exercises present computer output and ask the student to interpret the results. A number of the exercises give raw data and could be done on either a computer or a calculator. Also, certain exercises, labeled computer exercise, are especially designed for computer use and would not be suitable for hand computation.

Location of Exercises

Most exercises are located at the ends of sections. At the end of each chapter are Supplementary Exercises, some of which use material from more than one section. Exercises in the Unit Summary sections often use material from more than one chapter.

Class Discussion

Many of the exercises can be used as starting points for class discussion. This is especially true of exercises that emphasize interpretation of results or that request a critique of an inappropriate analysis or conclusion.

Sampling Exercises


Scattered throughout the first half of the book are exercises that require students to carry out random sampling. These sampling exercises are discussed in detail in Section III of this Manual.

APPENDICES

At the website for the textbook are appendices that provide more detail on selected topics discussed in the text. These appendices are not intended to form part of the course (except, perhaps, for Appendix 6.1 on significant digits), but rather to provide supplementary material for interested students.

STATISTICAL TABLES

The tables of critical values are all used in essentially the same way. In the t table (Table 4 and inside back cover), the column headings are upper tail areas. Thus, when the alternative hypothesis for a t test is directional, students can easily bracket the P-value by reading the column headings; when the alternative hypothesis is non-directional, the column headings must be multiplied by 2. The tables for the Wilcoxon-Mann-Whitney test, the sign test, and the Wilcoxon signed-rank test give selected values of the test statistic in bold type and corresponding non-directional P-values in italics. This is somewhat non-standard, but we believe it is more informative than standard tables. The column headings direct the reader to critical values for which the P-value is less than or equal to the given column heading. Tables for the chi-square test and the F test give upper tail areas, as is appropriate for use with the usual non-directional alternative hypotheses. When the alternative hypothesis in a chi-square test is directional, the column headings must be multiplied by 1/2.

The examples in the textbook and the answers to the exercises in this Manual do not use interpolation in the statistical tables. For instance, in entering Table 4 the nearest value of df is used; in ambiguous cases (e.g., df = 35 or df = 200), either one of the nearest values is considered correct. Students may need some guidance on this point.
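The bracketing procedure for the t table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name bracket_p_value is hypothetical, the single row shown uses standard t critical values for df = 10 (Table 4 in the text may list more columns), and for a directional alternative the sketch assumes the test statistic falls in the direction specified by the alternative.

```python
# One row of a t table: column headings are upper tail areas.
# The df = 10 critical values are standard; the selection of columns
# is an assumption made for this illustration.
UPPER_TAIL_AREAS = [0.10, 0.05, 0.025, 0.01, 0.005]
T_CRITICAL_DF10 = [1.372, 1.812, 2.228, 2.764, 3.169]


def bracket_p_value(t_stat, critical_values, tail_areas, directional):
    """Return (lower, upper) bounds on the P-value from one table row.

    A bound of None means the table gives no bound on that side.
    For a directional alternative, the statistic is assumed to lie in
    the direction named by the alternative hypothesis.
    """
    t = abs(t_stat)
    # Non-directional alternative: multiply the column headings by 2.
    factor = 1 if directional else 2
    lower = None
    upper = None
    for area, crit in zip(tail_areas, critical_values):
        if t >= crit:
            # Statistic reaches this critical value, so P <= heading.
            upper = area if upper is None else min(upper, area)
        else:
            # Statistic falls short of this critical value, so P > heading.
            lower = area if lower is None else max(lower, area)
    return (
        None if lower is None else factor * lower,
        None if upper is None else factor * upper,
    )


# A test statistic of 2.5 lies between the 0.025 and 0.01 columns.
print(bracket_p_value(2.5, T_CRITICAL_DF10, UPPER_TAIL_AREAS, directional=True))
print(bracket_p_value(2.5, T_CRITICAL_DF10, UPPER_TAIL_AREAS, directional=False))
```

With a test statistic of 2.5, the directional bracket is 0.01 < P < 0.025 and the non-directional bracket is 0.02 < P < 0.05, which is the doubling of column headings described above.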


II COMMENTS ON CHAPTERS

Chapter 1 Introduction

The study of statistics will seem more inviting to students in the life sciences if they see that statistical questions arise in biologically interesting settings. Chapter 1 begins with a series of examples of such settings. Instructors may choose to discuss these or other examples in the first lecture. Section 1.3 then addresses sampling issues, including discussion of sampling errors and nonsampling errors.

Comments on Section 1.2

Section 1.2 discusses data collection, particularly the difference between an observational study and an experiment. By drawing attention to data collection issues at the onset of the course, we hope that students will be mindful of the need to question where data came from before conducting any analysis.

Comments on Section 1.3

In keeping with the nature of most biological data, the text treats the random sampling model as a model -- that is, an idealization -- rather than as a reflection of a physical sampling process. To motivate this approach, and to counter the common preconception that a "random" sample and a "representative" sample are the same thing, the instructor can point out that the statistical approach takes a natural but unanswerable question that a biological researcher might ask and translates it into a slightly different question that can be answered:

Researcher's question: "How representative is my sample?"
Statistical translation: "How representative is a random sample likely to be?"

The word "likely" in the translated question is unavoidable, because a random sample can be quite unrepresentative. This motivates the use of probability in statistical analysis.

The technique of drawing a random sample has two applications in the text: (a) the technique is central to randomized allocation; and (b) sampling exercises in Chapters 1, 5, 6, 7, and 8 (see Section III of this Manual) require the technique.

Note that in some cases we regard the data as being like a random sample, although they did not arise that way. For example, consider a group of students in a class: We would not think of them as being a random sample of all students on campus, but if the variable of interest is blood type, then we might regard their blood types as being like a random sample, on the assumption that blood type is unrelated to a student enrolling in a given class.

Chapter 2 Description of Samples and Populations

Comments on Sections 2.1 and 2.2

Students are sometimes uneasy when approaching these introductory sections because they are not sure what is expected of them. The instructor may wish to reassure them on the following points: (a) the distinction between Y and y in Section 2.1 is for clarity only; (b) for an exercise requiring a grouped frequency distribution, there are many different "right answers."


Comments on Sections 2.3 - 2.6

Students will want to use their calculators, or computers, when solving homework problems. Two useful general principles are:

(a) Minimize roundoff error by keeping intermediate answers in a calculator rather than writing them down. Round the answers only at the end of the entire computation. (The topic of how far to round when reporting the mean and SD is discussed in Section 6.2; for use in Chapter 2, the instructor may wish to give students a temporary answer, such as "always round to four significant digits.")
(b) Take full advantage of a calculator's memory (and parentheses, if it has them) to keep track of intermediate answers.

Students may need guidance in using the statistics function of a calculator; the following tips are worth mentioning:

(a) Use the "change-sign" key (not the "minus" key) to enter negative data.
(b) Use the "data-removal" key to correct erroneous entries.
(c) Note that some calculators have two SD keys, one that uses "n-1" as a divisor, and another that uses "n" instead. The former definition is used throughout the text.

Students may wonder whether they are permitted to use the SD function of their calculator or computer software in doing homework. Generally, the exercises have been written under the assumption that students would use the SD function of a calculator or would use software to obtain a sample SD. However, in the entire text the total number of exercises in which students are expected to calculate a standard deviation from raw data is quite small.

Comments on Optional Section 2.7

This section aims to enhance the student's intuition about the effect of linear and nonlinear transformations of data. The logarithmic transformation is especially important in biology. It may be enlightening to students to point out that choice of scale is rather arbitrary and that there is nothing wrong with choosing a new scale in order to aid presentation and interpretation. For example, acidity is generally measured by pH, which is in log scale.

Comments on Section 2.8

The major goal of this section is to convince students that the population/sample concept is a reasonable one in biological research, and to help them develop some intuition about the relationship between sample statistics and population parameters.

Chapter 3 Probability and the Binomial Distribution

Comments on Section 3.2

This section introduces probability and the use of probability trees. The presentation bypasses formal definitions of sample space, event, etc., and the "addition" and "multiplication" rules for combining probabilities; formal treatment of these topics is taken up in optional Section 3.3. Instead, the emphasis is on a central theme: the interpretation of probability as long-run relative frequency (from which the "addition" rule follows very naturally). The intent of this approach is to spend less time on probability manipulations and more time on later chapters where probability ideas are applied in the analysis of data.

The concept of "independence" is not defined formally in these sections (it is presented in optional Section 3.3), although Section 3.2 introduces probability trees, which implicitly use conditional probabilities. Rather, independence is introduced informally in various settings throughout the text (which is why Section 3.3 can be skipped). Independence of trials and independence as part of the definition of a random sample are introduced in Chapter 3; independence of observations in a sample is discussed in Chapter 6; independence of two samples is also introduced in Chapter 6; independence of two categorical variables and the related concept of conditional probability are introduced in Chapter 10. Conditional distributions, conditional means, and conditional standard deviations are introduced in Chapter 12.

Comments on Optional Section 3.3

This section introduces formal rules for probability, including the "addition" and "multiplication" rules for combining probabilities. This section is optional; some instructors will omit it, while others who wish to cover probability in a more formal way will include the section.

Comments on Sections 3.4 and 3.5

Continuous distributions are introduced in Section 3.4. The purpose of Exercises 3.4.1-3 is to reassure students that the mysterious "area under the curve" will be quite easy for them to handle, and, further, to ward off the common misconception that all continuous distributions are normal. Section 3.5 introduces the concept of a random variable and, more formally, the population mean and variance via a few simple examples.

Comments on Section 3.6

This section introduces the binomial distribution, which is presented as a tool for computing certain kinds of probabilities that will later be seen to be relevant to statistics. The binomial formula is presented, but details of the derivation of this formula are left to Appendix 3.1; the derivations of the mean and of the standard deviation are found in Appendix 3.2.

Comments on Optional Section 3.7

Many students find this section intriguing because it makes probability ideas very concrete; however, the material is not referred to again in the text.

Chapter 4 The Normal Distribution

Chapter 4 is a straightforward treatment of the normal distribution. Some instructors will want to use technology completely and not make use of Table 3. Even so, it is helpful if students are reminded to draw a sketch for each calculation. In addition to skill in determining normal areas, this chapter gives students experience in visualizing a population distribution for a population whose size is large and unspecified.
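For instructors who replace Table 3 with technology, normal areas can be computed directly. The following is a minimal sketch using only Python's standard library; the helper name normal_area is hypothetical, and any statistical package would serve equally well. Students should still sketch the curve and shade the area before computing.

```python
from statistics import NormalDist


def normal_area(lower, upper, mean=0.0, sd=1.0):
    """Area under a normal curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Areas within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(normal_area(-k, k), 4))
```

The three printed areas are 0.6827, 0.9545, and 0.9973.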


Those naturally occurring distributions that are used as examples of normal distributions in Chapter 4 are in fact (approximately) normal, as determined by an examination of the raw data (in Example 4.1 the raw data are shown) or by theory (e.g., in Exercise 4.S.1 the distribution is Poisson with large mean). However, most population distributions encountered in biology are not approximately normal but are distinctly skewed to the right. Thus, the challenge is to convey to the students the twin messages that (a) it is not true that the "typical" distribution is normal, but (b) methods of data analysis based on normal theory are useful anyway. The simplest example of the latter is the "typical" percentages (68%, 95%, > 99%) rule given in Section 2.6, which is derived from the normal distribution but works rather well for many nonnormal distributions. Deeper examples are the many inferential methods (first encountered in Chapter 6) that are based on normal theory but (because of the Central Limit Theorem) can be validly applied in nonnormal settings.

Comments on Section 4.4

Section 4.4 takes up the topic of assessing normality. Normal quantile plots are introduced here and are used to assess normality in many examples later in the text. However, some instructors will choose to spend minimal time on this topic, preferring to rely on other means of assessing normality, such as examining histograms. Others will wish to discuss the optional sub-section on the Shapiro-Wilk test, wherein the presentation is necessarily brief and somewhat informal, given that P-values are not fully discussed until later in the text.

Chapter 5 Sampling Distributions

Comments on Sections 5.1 and 5.2

Chapter 5 introduces the very important concept of a sampling distribution. As motivation, the students can be reminded that the question "How representative is a random sample likely to be?" is the foundation of statistical inference (see Comments on Sections 1.3 and 2.8).

Many students find Chapter 5 difficult. To motivate them to make a special effort, the instructor can stress that the chapter lays an important foundation, because many concepts used in the analysis of real data (two examples are standard errors and P-values) can be understood only if sampling distributions are first understood. Students should be encouraged to read the material in Chapter 5 more than once. Doing the sampling exercises (5.2.1, 5.2.2, and 5.2.3) and then discussing them in class are very helpful to the students. Many students will need to be reminded several times that the sampling distribution of Ȳ is not the same as the distribution of observations in the sample (nor in the population).

To help the student put Chapter 5 in perspective, the instructor can explain that, whereas in Chapter 5 we are assuming that we know the population parameters and we are predicting the behavior of samples, in real inferential data analysis (which starts in Chapter 6) we are in the reverse position: We know the characteristics of the sample and we are trying to learn something about the unknown characteristics of the population. Moreover, although computations using the sampling distribution of Ȳ may appear to require the knowledge of μ and σ, actually, useful computations can be made in ignorance of μ (as shown by Exercises 5.2.7 and 5.S.12), and furthermore in Chapter 6 it will be seen that very useful similar computations can be made in ignorance of σ.

Comments on Optional Section 5.3


Section 5.3 illustrates the effect of the Central Limit Theorem on a moderately skewed population (Example 5.3.1) and a violently skewed population (Example 5.3.2). The two populations are used again in Section 6.5 to show the impact of the Central Limit effect on the coverage probability of a Student's t confidence interval. However, Section 6.5 can be used independently; Section 5.3 need not be covered at all. A sketchy coverage of Section 5.3 can be achieved in ten minutes of class time by simply displaying and briefly discussing Figures 5.3.1-5.3.4; a more leisurely coverage permits detailed discussion of Example 5.3.2 and assignment of one of the exercises (5.3.1 or 5.3.2).

Background Information on Examples 5.3.1 and 5.3.2

For the record, the distribution in Example 5.3.1 is a shifted and scaled chi-square distribution fitted to a graph of the data of Zeleny (see source cited in Example 5.3.1). The distributions in Figure 5.3.2 were estimated by the authors and R. B. Becker using computer simulation; for each sampling distribution, we drew 100,000 samples from the shifted chi-square distribution. The distribution in Example 5.3.2 is a 9:1 mixture of two normal distributions with parameters μ1 = 115, σ1 = 37.5, μ2 = 450, and σ2 = 75; the parameters are based on a simplified version of data of Bradley (see sources cited in Example 5.3.2). The curves in Figure 5.3.4 were determined analytically; each sampling distribution is a mixture of normal distributions with mixing proportions determined from the binomial distribution with parameters n and p = 0.9.

Comments on Optional Section 5.4

This section presents the normal approximation to the binomial distribution, both unadorned and in its guise as the sampling distribution of P̂. The material is never used again. (A brief mention of the normal approximation in Section 9.2 is entirely self-contained.)

Chapter 6 Confidence Intervals

Comments on Sections 6.1 - 6.3

These sections introduce the idea of a standard error and its use in constructing a confidence interval. The confidence interval for μ based on Student's t (σ unknown) is the only one presented; the interval based on Z (σ known) is used as motivation, but is not presented as a technique for analyzing data. Of course, the two intervals are nearly identical if n is large; the instructor can clarify this by explaining the relationship between the normal table (Table 3) and the bottom row of the t table (Table 4 and back cover). Confidence levels are given at the bottom of each column in Table 4, for use with confidence intervals in Section 6.3 and later.

The explicit comparison between the SD and the SE in Section 6.2 helps students to feel more comfortable with the different interpretations of these two statistics -- a difference that is important but subtle. Exercises 6.2.4, 6.2.5, 6.2.6, 6.2.7, 6.3.8, 6.S.12, and 6.S.13 can serve as reinforcement.

Students generally have difficulty interpreting confidence statements; this difficulty is natural. Several of the exercises ask students to interpret, in context, confidence intervals that they have constructed. Students should not be allowed to simply refer to "the mean," lest they confuse in their minds the sample mean and the population mean. If they balk at being held accountable for precise usage of words, they may benefit from hearing the saying that "The temple of reason is entered through the courtyard of habit." A major goal in asking students to interpret confidence intervals is to clarify their reasoning by instilling good habits of English usage.
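A short simulation may help students see what a confidence statement is about: the interval is a statement about the population mean μ, and "95% confidence" describes how often intervals built this way capture μ over many random samples. The sketch below is illustrative only. The population (normal with μ = 50 and σ = 10), the sample size n = 10, and the function names are assumptions chosen for the demonstration; 2.262 is the standard t multiplier for 95% confidence with df = 9.

```python
import random
import statistics

# t multiplier for 95% confidence with df = 9 (upper tail area 0.025).
T_MULTIPLIER_DF9 = 2.262


def t_interval(sample, t_multiplier):
    """Student's t confidence interval for the population mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error uses the sample SD (divisor n - 1).
    se = statistics.stdev(sample) / n ** 0.5
    return sample_mean - t_multiplier * se, sample_mean + t_multiplier * se


def coverage(mu, sigma, n, reps, t_multiplier, seed=0):
    """Fraction of simulated intervals that contain the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = t_interval(sample, t_multiplier)
        if low <= mu <= high:
            hits += 1
    return hits / reps


# Hypothetical population: normal with mean 50 and SD 10.
print(coverage(mu=50.0, sigma=10.0, n=10, reps=5000,
               t_multiplier=T_MULTIPLIER_DF9))
```

The printed fraction should be close to 0.95. Each simulated sample has a different sample mean, but every interval is judged against the same fixed μ, which is the distinction between sample mean and population mean that students are asked to keep clear.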


The "Rule for Rounding" given in Section 6.2 is intended as a guide for reporting summary statistics in research articles. It is not rigidly adhered to in the answers to exercises given in this Manual.

Students who are uncomfortable with the concept of significant digits (used in Section 6.2) can be referred to Appendix 6.1.

The last part of Section 6.3 presents one-sided confidence intervals. This can easily be omitted as the material is not used elsewhere in the text.

Comments on Section 6.4

This section shows students that (a) there is a rational way to decide how large n should be, but (b) the decision requires input from the researcher. While these two principles apply quite generally, the only other explicit sample size computation given in the text is for the two-sample t test (optional Section 7.7 on power).

Section 6.4 includes an informal guideline (anticipated difference between two groups should be at least 4 standard errors) that the instructor may wish to mention again when discussing the two-sample t test. (However, there are no exercises using the guideline.) The basis of the guideline is that it achieves a power of roughly 0.80 for a two-tailed t test at α = 0.05.

Comments on Section 6.5

Here, for the first of many times throughout the text, the students encounter the idea that a statistical calculation can give an answer that is numerically correct, but is misleading or meaningless because of either (a) the way the data were obtained, or (b) some feature of the population distribution.

The requirement of independence of the observations is often violated in biological studies; Exercises 6.5.2, 6.S.7, 6.S.9, and many exercises in later chapters address this point.

The numerical results in Table 6.4 on coverage probability when sampling from nonnormal populations were obtained by computer simulation (see Comments on Optional Section 5.3 in this Manual).

Comments on Sections 6.6 and 6.7

In addition to introducing the notation for two samples, Section 6.6 alerts students to the fact that distributions can differ in dispersion and shape as well as in location, while noting that our attention will be given to comparisons of center. Section 6.6 introduces the standard error of a difference between two means. This notion may be rather unnatural for life science students, who are generally accustomed to comparing two quantities in terms of their ratio rather than their difference. Class discussion of this point can be helpful.

The unpooled method for computing the standard error is emphasized, although pooling of standard deviations is discussed in an optional sub-section, which some instructors may wish to discuss as preparation for later treatment of analysis of variance. (The case σ1 ≠ σ2 often occurs in biological research.)

The textbook primarily uses the Satterthwaite formula (Formula (6.7.1) on page 206) for degrees of freedom, on the view that technology should be used for number crunching, so that the messy form of Formula (6.7.1) should provide no obstacle in practice. However, the formulas df = n1 + n2 - 2 and df = smaller of n1 - 1 and n2 - 1 are given as liberal and conservative values. The philosophy behind presenting three df formulas is that statistical methods are generally approximate. It may be useful to quote George Box here: "All models are wrong, some models are useful." If different df choices lead to qualitatively different conclusions, then great care should be taken in interpreting the results.

In constructing confidence intervals for (μ1 - μ2) a choice must be made, implicitly or explicitly, of which sample to denote by 1 and which by 2. Some students find this indeterminacy unsettling; the instructor can help by explaining that either choice is acceptable, and that the apparently different confidence intervals obtained are actually equivalent.

Comments on Section 6.8

The material in this section is not specifically reinforced in the exercises, but rather serves to "open a window" between the narrow coverage of Chapter 6 and the real world of research.

Chapter 7 Comparison of Two Independent Samples

Chapter 7 introduces the general principles of hypothesis testing by beginning with the randomization test and then developing the t test. The chapter concludes with the Wilcoxon-Mann-Whitney test, which is the distribution-free competitor to the t test.

Although some introductory texts include the F test for comparison of variances, we have omitted this topic. In some situations comparing variances is more relevant than comparing means. However, when means are being compared the use of the F test as a preliminary to the pooled t test is strongly discouraged by many statisticians because it is highly sensitive to nonnormality: in the words of George Box, such use is like "putting out to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!" [Biometrika 40 (1953), p. 333.]

Chapter 7 introduces design issues -- for instance, the term "independent" in the chapter title and the distinction between observational and experimental studies -- that are more fully discussed in Section 7.4.

Comments on Section 7.1

Section 7.1 introduces the randomization test for comparing two populations, which some would call a permutation test since it is based on considering the distribution of possible permutations of sample data. Although this is not labeled as an optional section, it is possible to skip Section 7.1 and to begin with Section 7.2. However, we believe that demonstrating a simple basis for determining a P-value in Section 7.1 and then using the t test as an approximation, starting with Section 7.2, is a good way to secure the idea of hypothesis testing and the proper interpretation of a P-value.

Students readily understand that computer simulation is needed when the sample sizes are large and listing all possible permutations is not practical, but it is advisable to conduct a physical randomization, using the data of Example 7.1.1 or of some other small two-sample comparison, with 3x5 cards that contain one observation each. Shuffling and dealing the cards into two piles and calculating the difference in means demonstrates the concept underlying this test procedure.
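After the card exercise, the same idea can be shown on a computer by listing every possible way of dealing the cards into two piles. The sketch below is a minimal illustration, not the textbook's procedure verbatim: the function name is hypothetical, the data are made up (they are not the data of Example 7.1.1), and the P-value is taken to be non-directional, counting deals whose difference in means is at least as large in absolute value as the observed one.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(group1, group2):
    """Exact non-directional randomization P-value for a difference in means.

    Enumerates every way to deal the pooled observations into piles of
    the original sizes, so it is practical only for small samples.
    """
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = abs(mean(group1) - mean(group2))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n1):
        chosen_set = set(chosen)
        pile1 = [pooled[i] for i in chosen]
        pile2 = [pooled[i] for i in indices if i not in chosen_set]
        diff = abs(mean(pile1) - mean(pile2))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical data: six "cards", three per group.
print(randomization_p_value([1, 2, 3], [4, 5, 6]))
```

For these six hypothetical observations there are 20 possible deals, and only 2 of them give a difference in means as large as the observed difference of 3, so the P-value is 2/20 = 0.1. With larger samples the enumeration becomes impractical, which is where random shuffling by computer takes over.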


Comments on Sections 7.2 and 7.3

These sections introduce the basic ideas of hypothesis testing and the specific technique of the t test. The approach to hypothesis testing is two-pronged, developing both the use of the P-value as a descriptive statistic and also the decision-oriented framework which gives meaning to Type I and Type II error. However, the use of P-values is emphasized and encouraged.

The text places strong emphasis on verbal statements of hypotheses and conclusions, and the solutions given in this Manual reflect that emphasis. The verbal statements help the student appreciate the biological purpose of each statistical test; without them it is all too easy for the student to look only at the numbers in an example or exercise, while virtually ignoring the descriptive paragraph which gives meaning to the numbers. A potential difficulty is that the verbal statements must be, in the interest of brevity, considerably oversimplified. The instructor can call the students' attention to the demurrer in the Answers to Selected Exercises for Chapter 7, which explicitly recognizes this oversimplification. To further emphasize the point, a distinction is made in Section 7.2 between the "formal" hypotheses H0 and HA and the "informal" hypotheses H0* and HA*; but this cumbersome notation is abandoned in later sections.

The verbal conclusions in this Manual usually use the phrases "sufficient evidence" and "insufficient evidence" (for instance, "There is sufficient evidence to conclude that Drug A is a more effective pain reliever than Drug B"). Some students are more comfortable with "enough" and "not enough," but they may mistakenly believe that "sufficient" and "insufficient" are technical terms that they are required to use. The instructor may wish to use "enough" in class discussion, or to encourage students to use more descriptive phrases such as "little or no evidence," "very strong evidence," etc.

Comments on Section 7.4

Section 7.4 gives explicit attention to the difference between association and causation, alerting students to the existence of both experimental and observational studies. This is arguably the most important section in the entire book.

Comments on Section 7.5

When considering one-tailed tests and directional HA, the textbook uses a two-step procedure to bracket the P-value. The advantage of this rather unusual approach is that it extends readily to tests (such as the chi-square test for a 2 x 2 contingency table) for which H0 is rejected in only one tail of the null distribution.

The issue of directional versus nondirectional alternative hypotheses is difficult for many students. One difficulty is that the rule that a directional HA must be formulated before seeing the data is somewhat remote for those students whose only exposure to data is in the textbook itself.

A second difficulty with directional versus nondirectional alternatives concerns the nature of the possible conclusions. The textbook indicates that rejecting H0 in favor of a nondirectional HA should lead to a directional conclusion; for instance, "There is sufficient evidence to conclude that Diet 1 gives a higher mean weight gain than Diet 2." However, some students -- as well as some statisticians -- tend to believe that a nondirectional alternative requires a nondirectional conclusion; for instance, "... Diet 1 gives a different mean weight gain than Diet 2." It is worth noting that a biological researcher, having carried out an expensive and time-consuming experiment to find out which diet gives the higher mean, might reasonably be quite dissatisfied with such a noncommittal conclusion.

(Remark: Strictly speaking, the formal machinery of the Neyman-Pearson theory leads to only two possible decisions -- reject H0 or do not reject H0. The procedure recommended in this textbook can be formally justified as follows. The two-tailed t test yields three possible decisions; namely, D0: do not reject H0; D1: reject H0 and conclude μ1 < μ2; and D2: reject H0 and conclude μ1 > μ2. With this approach one may consider three risks of error; namely, Pr{D1 or D2 | μ1 = μ2}, Pr{D1 | μ1 > μ2}, and Pr{D2 | μ1 < μ2}. It is easy to show that, using the recommended procedure, all three of these risks are bounded by α; indeed, the latter two are bounded by α/2. Related ideas are discussed in Bohrer, R. (1979), "Multiple Three-Decision Rules for Parametric Signs," Journal of the American Statistical Association 74, 432-437.)

Comments on Section 7.6
The most common abuse of statistical testing is to base scientific conclusions solely on the P-value, which leads inevitably to a troublesome confusion between statistical significance and practical importance. Section 7.6 attempts to forestall this confusion in two ways: by showing how confidence intervals can supplement tests and by introducing the concept of effect size.

Comments on Optional Section 7.7
Section 7.7 gives a detailed discussion of power and introduces Table 5, which gives the sample size required to achieve a prescribed power.

Comments on Sections 7.8 and 7.9
Section 7.8 discusses the conditions on which the Student's t methods are based, and gives some guidelines for informally checking the conditions. Note that the word "conditions" is used in place of the commonly used "assumptions." This is because students tend to think of an assumption as something that one simply assumes. This is not at all the case in statistics; these are conditions that should be verified whenever possible.

Section 7.9 places hypothesis testing in a general setting, thus preparing the students for tests other than the t test. The topics of P-value and Type I and Type II error are revisited in this general setting. Section 7.9 explicitly acknowledges an unspoken condition for the t test (and the Wilcoxon-Mann-Whitney test and the randomization test) – that the population distributions are stochastically ordered. (Although, strictly speaking, this condition cannot be satisfied by two normal distributions with different standard deviations, it clearly can be satisfied by the real-world – and therefore finite-tailed – distributions for which the normal distributions are always only approximations.)

Comments on Section 7.10
Section 7.10 introduces the Wilcoxon-Mann-Whitney test. This is the students' first exposure to a classical nonparametric test (they will meet the sign test and Wilcoxon Signed-Rank in Chapter 8). The first part of Section 7.10 describes the Wilcoxon-Mann-Whitney test procedure. The second part gives the rationale for the test; this material is somewhat difficult and can be omitted without loss of continuity. The last part of Section 7.10 gives the conditions for validity of the Wilcoxon-Mann-Whitney test and compares it with the t test and the randomization test.
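For instructors who want to show the mechanics of the Wilcoxon-Mann-Whitney procedure on a computer, the counting step can be sketched as follows. This is a minimal illustration under stated assumptions: the names K1, K2, and Us are intended to mirror the pair-counting form of the statistic, the half-count for ties is a common convention assumed here, and the data are invented. No P-value is computed; the statistic would still be referred to a table or software.

```python
def wmw_statistic(sample1, sample2):
    """Return (K1, K2, Us) by counting pairs, with ties counted as 1/2 (assumed convention).

    K1 counts pairs in which the sample-1 value exceeds the sample-2 value;
    K2 counts the remaining pairs; Us is the larger of the two counts.
    """
    k1 = sum(
        (y1 > y2) + 0.5 * (y1 == y2)
        for y1 in sample1
        for y2 in sample2
    )
    k2 = len(sample1) * len(sample2) - k1
    return k1, k2, max(k1, k2)


# Hypothetical observations, invented for illustration only.
group1 = [1, 4, 6]
group2 = [2, 3, 5]
print(wmw_statistic(group1, group2))
```

Because the statistic uses only the ordering of the observations across the two samples, any monotone transformation of the data leaves it unchanged, which is a useful point for class discussion.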

Some textbooks incorrectly describe the Wilcoxon-Mann-Whitney test as a comparison of medians, and give much stronger conditions for validity of the test than are necessary. In fact, the Wilcoxon-Mann-Whitney procedure tests the null hypothesis that two continuous population distributions are identical against the alternative that one is stochastically larger than the other. (This matter is further discussed in the textbook in Note 65 to Chapter 7.) The confusion is probably due to the fact that many power calculations, and other developments such as the Hodges-Lehmann estimator, assume that the distributions differ only by a shift.

Chapter 8 Comparison of Paired Samples

Comments on Sections 8.1 - 8.2
Section 8.1 contains a brief introduction to the paired design and to a randomization test for this setting. After explaining the basic notion of using pairwise differences, Section 8.2 describes the paired t test and confidence interval. Although the basis for these techniques is the single-sample standard error introduced in Chapter 6, the notation used in Chapter 8 (namely, $(\bar{y}_1 - \bar{y}_2)$ and $SE_{(\bar{Y}_1 - \bar{Y}_2)}$) is the same as in Chapters 6 and 7. This notation emphasizes that the object of inference (namely, μ1 - μ2) is the same in Chapters 6 and 7 as it is in Chapter 8, although the df and the formula for $SE_{(\bar{Y}_1 - \bar{Y}_2)}$ are different.

Comments on Section 8.3
Section 8.3 gives several examples of paired designs and explains that pairing can serve two purposes: (1) control of bias, especially in nonrandomized studies, and (2) increased precision. Pairing can increase precision because there is often a positive correlation between the observations on members of a pair, which reduces the variance of the difference. The term "correlation" is not used in Chapter 8, but the idea is conveyed by a scatterplot (Figure 8.3.1); class discussion of such a scatterplot can help students develop some intuition about the meaning of effective pairing. Together, sampling exercises 8.3.1 and 8.3.3 (or 8.3.1 and 7.3.2 or 7.3.3) illustrate the increase in power achievable by effective pairing.

Comments on Sections 8.4 and 8.5
Section 8.4 introduces the sign test, which is worthwhile for beginning students for two reasons. First, it is widely applicable, even in many nonstandard situations where a parametric analysis may be complicated or unsuitable. Second, because students are familiar with the binomial distribution they can fully understand how P-values for the sign test are calculated, and this enhances their understanding of P-values in general. Section 8.5 presents the Wilcoxon Signed-Rank test, which is more powerful than the sign test, but not as widely applicable.

Comments on Section 8.6
This section contains no new techniques, but rather some deeper discussions of earlier ideas and methods. The discussions illustrate (a) the importance of a control group in studying change; (b) suitable reporting of paired data; (c) the folly of using a paired t analysis to compare measurement methods; and (d) the inability of standard designs and analyses to detect interactions of treatments with experimental units.
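The remark under Section 8.4 that students can fully follow the sign test P-value through the binomial distribution can be made concrete with a short sketch. This is an illustration under assumptions, not the textbook's procedure verbatim: the function name is hypothetical, zero differences are assumed to have been discarded already, and the nondirectional P-value is taken as twice the upper-tail binomial probability (capped at 1) with success probability 0.5 under H0.

```python
from math import comb


def sign_test_p_value(n_plus, n_minus):
    """Nondirectional sign test P-value from counts of positive and negative differences.

    Under H0 the number of positive differences is binomial(n, 0.5), so the
    upper-tail probability is a sum of binomial coefficients divided by 2**n.
    """
    n = n_plus + n_minus
    b = max(n_plus, n_minus)  # test statistic: the larger of the two counts
    upper_tail = sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Hypothetical counts: 9 positive and 1 negative difference among 10 pairs.
print(sign_test_p_value(9, 1))
```

Working through the sum by hand for a small n, then confirming it with the function, lets students see that the P-value is nothing more than a tail area of a distribution they already know.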

Chapter 9 Categorical Data: One-Sample Distributions

Comments on Sections 9.1 and 9.2
Section 9.1 develops the sampling distribution for the Wilson-adjusted sample proportion, $\tilde{P}$. Section 9.2 then presents the Wilson confidence interval for a proportion. Note that some authors refer to this as the “plus 2; plus 4” interval. Appendix 9.1 discusses the use of $\tilde{P}$, which is becoming more widely used in statistical practice, although the traditional $\hat{P}$-based confidence interval remains very common.

Students using graphing calculators or standard statistical software will find that the technology produces confidence intervals based on $\hat{P}$ (although it is easy to add 2 successes and 2 failures to the data so as to get the Wilson interval). If n is large, the familiar $\hat{P}$-based confidence interval and the $\tilde{P}$-based confidence interval are virtually identical. However, the $\tilde{P}$-based confidence interval has superior coverage properties when n is not large. Moreover, using the $\tilde{P}$-based confidence interval means that it is not necessary to construct tables or rules for how large n must be in order for the confidence interval to have good coverage properties. For further discussion, see Appendix 9.1.

The material in these sections is related to the material in Sections 6.2 and 6.3. In Appendix 3.2 it is shown that in the setting of 0-1 data, $\mu = p$ and $\sigma = \sqrt{p(1-p)}$, so the fact that $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$ corresponds directly to the fact that $\sigma_{\hat{P}} = \sqrt{p(1-p)/n}$; this is discussed in Appendix 5.1. However, the normality condition that is the basis of a confidence interval for μ in Section 6.3 is clearly violated when we have 0-1 data. Thus, we must appeal (via the Central Limit Theorem) to the approximate normality of the sampling distribution of $\hat{P}$, and likewise of $\tilde{P}$, when n is large.

Comments on Section 9.3
Optional Section 9.3 considers confidence levels other than 95%, for which the rule “add 2 successes and 2 failures” is modified.

Comments on Section 9.4
The chi-square goodness-of-fit test, introduced in Section 9.4, is already familiar to some biology students from their study of genetics, and these students enjoy seeing this formerly mysterious subject in a new light.

In this textbook, the one-sample binomial test is subsumed under the topic of goodness-of-fit tests. This approach minimizes the number of formulas that the student must master, and nothing is lost, since the commonly used Z test based on a standardized binomial deviate is exactly equivalent to the chi-square test.

The notation $\widehat{\Pr}\{A\}$ for the estimated probability of a category A is presented in Section 9.4. This notation may seem unnecessarily cumbersome, but its extension in Section 10.2 to the conditional probability notation $\widehat{\Pr}\{A|B\}$ will be very useful.

The term "compound hypothesis" introduced in Section 9.4 is not standard, but was coined by the authors. ("Composite hypothesis" would have been more apt, but this term has a different meaning in