Solution Manual for Elementary Survey Sampling, 7th Edition

CHAPTER 2
ELEMENTS OF THE SAMPLING PROBLEM

2.1 An adequate frame listing individuals in a city is difficult to obtain. For that reason, and because data is desired on a family basis, it would be better to sample dwelling units. An adequate frame for dwelling units is also difficult to obtain, so a cluster sampling approach could be used by sampling city blocks and then measuring water consumption for the families living in the sampled blocks.

2.2 A common way to sample trees is to divide the farm into plots and then randomly or systematically sample plots on which trees would be counted. Unless trees are planted according to a regularly spaced design, it is difficult to use the trees themselves as sampling units.

2.3 The sampling design depends on a careful definition of the population of interest. As it would be almost impossible to get a listing of all cars owned by residents of a city, a better option would be to restrict the population of cars to something like “cars that use city parking lots on a working day” or “cars that belong to people visiting the malls on a weekend.” Then, a listing of parking lots or sections of parking lots could serve as frames for collections of cars.

2.4 If the number of plants is not too large, each could serve as a stratum from which employees would be sampled. In this case one would need a list of employees (a frame) for each plant. If the number of plants is large, then a sample of plants (clusters of employees) could be taken and a sample of employees (or all employees) could be interviewed in each sampled plant.

2.5 An area as large as a state is generally broken up into smaller areas, such as counties and farms within counties, for sampling. Each county may contain a number of farms, so there are various sampling options. Counties could be viewed as strata, with farms being sampled from each. If there are many counties, one might sample counties as clusters of farms and then sample farms from each sampled county. In either of these scenarios a list of farms by county would be needed as a frame.

2.6 Most polls of this type are done by telephone using random digit dialing. The state could be stratified by regions, with dialing taking place within each of these regions. Frames may be found for selected populations by using lists of registered voters or lists of property owners, but these frames do not cover the entire population of adults. Personal interviews generally produce the highest response rate, but they are expensive and require a list of individuals from which to sample. Mailed questionnaires also require a list of individual addresses and typically have the lowest response rate. Telephone interviews are probably the most viable choice for such a survey.

2.7 (a) A telephone survey would be the only way to cover the country with a well-designed sampling plan in a reasonable time.
    (b) If the population is defined as subscribers to the paper, then a mailed questionnaire or interviews could be used. If the population is less well defined, to include all readers or potential readers, then a telephone survey with random digit dialing may have to be used.
    (c) Homeowners are a well-defined group, and a sample could be contacted through either mailed questionnaires or personal interviews, although the latter would be time consuming. Telephone interviews could also be used, and random digit dialing would not be necessary.
    (d) Assuming dogs are registered, it should be relatively easy to sample from the list of registered owners and obtain the survey information by either telephone or mail. If there is no list of dog owners, this would be a difficult problem, probably best solved by random digit dialing.

2.8 “Do you consider yourself a political liberal or conservative?” “Do you favor an increase in the minimum wage?” Once the political label is decided, the options for the second question become more limited. Presented in the reverse order, the respondent has more freedom on the answer to the minimum wage question.

2.9 Closed questions limit options and nuances in answers, but are easier to analyze statistically. An open question could be of the form “What is your opinion on the school tax referendum?” A closed version could be “Do you plan to vote for or against the school tax referendum that is on the ballot in the next election?” This is an extremely closed version; other options could be offered.

2.10 “Do you favor an increase in the minimum wage to keep up with inflation?” “Do you favor an increase in the minimum wage so that many wage earners who are now living below the poverty line can afford to adequately feed and clothe their families?”

2.11 The no-opinion option should be used carefully and sparingly because it gives respondents an easy way out of questions on which they may well have a deeper opinion.

2.12 “I’m sure you are aware that most standardized tests contain multiple choice questions that favor students who memorize lots of facts as compared to those who learn to think deeply. Do you favor or oppose the increased use of standardized test scores to measure academic achievement?”

2.13 After errors of non-observation and errors of observation, the next most common source of errors in surveys is the mishandling of data in the data recording and analysis part of the survey. It is imperative that the data management process contain checks to see that data are recorded correctly, and that all recorded data are part of the analysis.

2.14 The pretest, which need not be on a randomly selected sample, is the best way to discover if a questionnaire contains questions that can be answered in reliable and valid ways. It also helps define issues that should be part of the training of field workers as well as issues of data collection, management and analysis.

2.15 The response rate is strongly related to the bias in survey results. A low response rate may imply that important segments of the population (such as retired people or single people) are under-represented in the survey data and, hence, in the reported results.

2.16 More people may well be at home at that hour, but they also do not like being interrupted at mealtime. One type of nonresponse is being traded for another, perhaps.

2.18 It is very difficult to get objective information on sensitive issues. The proportion who admitted to cheating is probably well below the actual proportion who cheated. (A technique in Chapter 11 will show how to improve the accuracy of responses in these situations.)

2.19 The results may be a bit biased because students regularly hear that mathematics and English are the two subjects in which they need to do well in order to succeed in life.

2.20 The response rate is low and those who are most concerned about being able to pay for a college education are the ones most likely to respond. For others, this is not an important issue and they will tend not to respond. Thus, the result could suffer from a large bias.

2.21 The population being sampled here does not represent the population of the country and the responses are voluntary, not from a randomly selected sample. The question has an inherent bias toward favoring nuclear power plants. All aspects of the survey are directed toward obtaining a highly biased result.

2.22 This is a very low response rate and the GAO was justified in questioning the results. It is quite likely that some of the income groups (especially the low incomes) were greatly underrepresented.

2.23 (a) One rating point represents one percent of the viewing households, or
         95.1 million × 0.01 = 951,000 households,
         based on the fact that the sampled population is households.

     (b) As a percentage, a share is larger than a rating because the denominator of the rating is the total number of sampled households, while the denominator of a share is the total number of sampled households that actually have a TV set turned on (viewing households).
     (c) 95.1 million × 0.217 = 20.64 million households could have been viewing this show.
     (d) Much of the data collected by Nielsen depends upon people in the sampled households either pushing a button on a People Meter or writing in a diary to record what they are watching. This is far from a fool-proof system.

2.25 (a)
         Target population    51%      12%       9%
         High-risk cities     57.9%    33.8%    20.7%
         National             58.4%    13.5%     8.3%

         In the national survey, the sample percentages are quite close to those reported by the Census. Thus, randomization did a good job.
         In the survey of high-risk cities, the black and Hispanic percentages are much higher than those reported for the nation as a whole.
     (b) High-risk cities are not the typical cities of the population. One may expect that the randomization actually did a good job here as well.

2.26 (a) National: 100 - 84.9 = 15.1%
         High-risk cities: 100 - 80.4 = 19.6%
     (b) National: 0.151 (1000) = 151
         High-risk cities: 0.196 (1000) = 196

2.28 Care about keeping weight down

                       NS      EX      FS      CS    Total
         A lot        6297    3613     197    2114   12221
         Somewhat     2882    1677      90     793    5442
         A little     1441     625      16     354    2436
         Don’t care   1709     822      22     377    2930
         Total       12329    6737     325    3638   23029
         % A lot       .51     .54     .61     .58

     (a) It is anticipated that responses will be more honest when questions are asked about peers rather than about the respondent himself or herself.

     (b) 12,329,000
     (c) 12221 / 23029 = .53
     (d) 3,638,000; 2114 / 3638 = .58
     (e) 6297 / 12221 = .51, 2114 / 12221 = .17
     (f) No, at least not strongly. As you see from the bottom row of calculated percentages, the percentage of students who care a lot about keeping their weight down is fairly constant across all categories of smoking.

2.29 Care about staying away from marijuana

                       NS      EX      FS      CS    Total
         A lot        7213    2693      75     857   10838
         Somewhat     2482    1861     109    1102    5554
         A little      744     542      27     298    1611
         Don’t care   1878    1550     119    1312    4859
         Total       12317    6646     330    3569   22862
         % A lot       .59     .41     .23     .24

     (a) 7213 / 12317 = .59
     (b) 857 / 3569 = .24
     (c) 7213 / 10838 = .67
     (d) 1878 / 4859 = .39
     (e) Yes. Nonsmokers care more about staying away from marijuana than current smokers: from (a) and (b), 59% of nonsmokers care a lot, compared with 24% of current smokers.

2.30 (a) These are conditional proportions, calculated as the percentages of “yes”, “no” and “don’t know” responses within each smoking category.
     (b) Yes. Twelve percent of nonsmokers think smoking helps reduce stress, while 46.5% of current smokers believe that.
     (c) No. Regardless of their smoking status, teenagers believe that almost all doctors are strongly against smoking.
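
The two kinds of conditional proportion used in 2.29 (conditioning on smoking category versus conditioning on the response) are easy to confuse. The following is a minimal Python sketch, not part of the original solution, that recomputes parts (a) through (d) from the counts in the 2.29 table. The function names `given_status` and `given_response` are illustrative choices, and the column labels NS, EX, FS, CS are used exactly as they appear in the table.

```python
# Counts from the 2.29 table: rows are responses, columns are smoking categories.
counts = {
    "A lot":      {"NS": 7213, "EX": 2693, "FS": 75,  "CS": 857},
    "Somewhat":   {"NS": 2482, "EX": 1861, "FS": 109, "CS": 1102},
    "A little":   {"NS": 744,  "EX": 542,  "FS": 27,  "CS": 298},
    "Don't care": {"NS": 1878, "EX": 1550, "FS": 119, "CS": 1312},
}


def column_total(status):
    """Total number of students in one smoking category."""
    return sum(row[status] for row in counts.values())


def row_total(response):
    """Total number of students giving one response."""
    return sum(counts[response].values())


def given_status(response, status):
    """Proportion giving `response` among students in smoking category `status`."""
    return counts[response][status] / column_total(status)


def given_response(status, response):
    """Proportion in smoking category `status` among students giving `response`."""
    return counts[response][status] / row_total(response)


print(round(given_status("A lot", "NS"), 2))         # part (a)
print(round(given_status("A lot", "CS"), 2))         # part (b)
print(round(given_response("NS", "A lot"), 2))       # part (c)
print(round(given_response("NS", "Don't care"), 2))  # part (d)
```

Parts (a) and (b) divide by a column total, while parts (c) and (d) divide by a row total, which is why the comparison in (e) should be read within smoking categories.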

2.31 In the actual study, 38% favored the law in the A1 form whereas only 29% favored the law in the B1 form. In the later study, 39% favored the law in the A2 form whereas only 26% favored it in the B2 form. There is a stronger counterargument in B2 as compared to B1.

CHAPTER 3
A REVIEW OF SOME BASIC CONCEPTS

3.1 Statistics has many definitions, but almost all involve the process of drawing conclusions from data. Data is subject to variability, so some would say that statistics is the study of variability with the objective of understanding its sources, measuring it, controlling whatever is controllable, and drawing conclusions in the face of it. For sample survey purposes, statistics involves a well-defined population, a sample selected according to an appropriate probabilistic design, and a methodology for making inferences from the sample to the population, usually in terms of estimation of population parameters.

3.2 A statistic is a function of (is calculated from) sample data whereas a parameter is a numerical characteristic of a population. In a common opinion poll, a sample of 500 residents may be asked whether or not they favor a certain candidate for office. The sample percentage is a statistic, but it is used to estimate the population percentage favoring that candidate, an unknown parameter.

3.3 An estimator is a statistic used to estimate a population parameter, like the sample proportion in Exercise 3.2.

3.4 A sampling distribution is a distribution of all possible values of a statistic.

3.5 The goodness of an estimator is usually measured by the standard deviation of its sampling distribution. The margin of error refers to two standard deviations of the sampling distribution of an estimator. Roughly speaking, the difference between an estimator and the true value of the parameter being estimated will be less than the margin of error with probability about .95.

3.6 An estimator should be unbiased (or nearly so) and have a small standard deviation of its sampling distribution. In other words, in repeated usage, an estimator’s values should pile up close to the value of the parameter being estimated.

3.7 An unbiased estimator is one for which the sampling distribution centers at the true value of the parameter being estimated.
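
The "probability about .95" statement in 3.5 can be illustrated numerically. The sketch below is not from the original solution. It assumes the sample of 500 residents from 3.2 behaves like 500 independent draws (a binomial model) and uses a hypothetical true proportion of 0.5. It then computes the exact probability that the sample proportion falls within two standard deviations of the true value.

```python
import math


def coverage_within_two_sd(n, p):
    """Exact P(|p_hat - p| < 2 * SD) when n * p_hat is Binomial(n, p)."""
    sd = math.sqrt(p * (1 - p) / n)  # SD of the sampling distribution of p_hat
    margin = 2 * sd                  # margin of error as defined in 3.5
    total = 0.0
    for k in range(n + 1):
        # Add the binomial probability of every outcome inside the margin.
        if abs(k / n - p) < margin:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return margin, total


# n = 500 comes from Exercise 3.2; p = 0.5 is an assumed value for illustration.
margin, coverage = coverage_within_two_sd(n=500, p=0.5)
print(f"margin of error = {margin:.4f}, coverage = {coverage:.4f}")
```

Under these assumptions the coverage comes out close to, though not exactly, .95, which is consistent with the "roughly speaking" wording of the answer.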

3.8 The error of estimation refers to the difference between an estimator and the true value of a parameter being estimated. It is measured by the standard deviation of the sampling distribution of the estimator in question.

3.10 Summary Statistics

                     Calories               Cost in Dollars
                 w/ Hydra  w/o Hydra      w/ Hydra  w/o Hydra
         Mean     64.78     64.62          .294      .27
         Median   66.0      63.5           .260      .25
         Stdev     8.51      9.09          .097      .05
         Q1       60        60             .230      .225
         Q3       70        70             .345      .320
         Min      50        50             .220      .220
         Max      80        80             .520      .350
         Range    30        30             .300      .130

     [Figure: scatterplot of cost (vertical axis) vs. calories (horizontal axis), with Hydra labeled as the highest-cost point.]

     [Figure: scatterplot of cost vs. calories (without Hydra).]

     (a) The mean is a good summary number for typical calories per serving. The standard deviation is a good summary number for the variation in the calories.

         [Figure: box plot of calories.]

     (b) Since there is an extreme value, the median is a good summary number for typical cost per serving, and the IQR (Q3 - Q1) is a good summary for the variation in costs.

         [Figure: box plot of cost, with Hydra marked as an outlier.]

     (c) Because these drinks would not generally be combined by users, the totals have little practical value here.
     (d) On the average calories per serving: not much impact
         On the standard deviation of calories per serving: slight increase
         On the average cost per serving: decrease

         On the standard deviation of the cost per serving: decrease

         [Figure: box plot of calories (without Hydra).]
         [Figure: box plot of cost (without Hydra).]

     (e) No drink is particularly influential on the average calories per serving, but Snapple (80 calories) has the most influence as it is furthest from the mean.

3.11 (a) Including the powdered drinks on the same list with the liquid drinks does not have much effect on the average calories per serving, as their calorie figures are within the range of the first data set. Including the powdered drinks lowers the average cost per serving and increases the standard deviation of cost because the new cost values are much lower than in the original set.

                          calories            cost
                        mean   stdev      mean   stdev
         w/o powder    64.78    8.51      .294    .097
         w/ powder     64.82    7.93      .278    .102

         [Figure: parallel box plots of calories, w/o powder and w/ powder.]

         [Figure: parallel box plots of cost, w/o powder and w/ powder.]

     (b) Adding the light varieties to the list will not have much of an effect on the average cost and standard deviation of cost.

         Mean and standard deviation for two groups (cost)

                        mean   stdev
         w/o lites      .294    .097
         w/ lites       .286    .088

         [Figure: parallel box plots of cost, w/o lites and w/ lites.]

     (c) Adding the light varieties to the list will decrease the average calories per serving and increase the standard deviation of calories because the new calorie figures are far below those of the original data set.

         Mean and standard deviation for two groups (calories)

                        mean   stdev
         w/o lites     64.78    8.51
         w/ lites      55.45   22.69

         [Figure: parallel box plots of calories, w/o lites and w/ lites.]

     (d) Use the median because the median is not sensitive to extreme values.

3.12 Summary Statistics

         Area              N    mean   median   stdev     Q1      Q3    Q3-Q1
         U.S.             10   25.10    12.50   22.02    7.50   51.25   43.75
         U.S. & Foreign   10    4.80     1.00    7.18    0.00   10.00   10.00

     (a) As shown on the stem plot, these data are split into two groups and neither the mean nor the median is a good measure of center. A more meaningful summary statistic is the total number of endangered species, 251 for those unique to the U.S. and 299 in the U.S. and foreign countries.

         [Figure: box plot of U.S.]

         Stem plot of U.S.
         Stem-and-leaf of US
         Leaf Unit = 1.0

           3    0 | 368
          (3)   1 | 023
           4    2 |
           4    3 | 7
           3    4 |
           3    5 | 057

     (b) Again, the total number of endangered species is a more meaningful statistic than either the mean or median. For the world, this total is 791 species.

         Stem plot of U.S. & Foreign
         Stem-and-leaf of US & Foreign
         Leaf Unit = 1.0

           5    0 | 00000
           5    0 | 23
           3    0 |
           3    0 |
           3    0 | 8
           2    1 |
           2    1 |
           2    1 |
           2    1 | 6
           1    1 | 9

         [Figure: box plot of U.S. & Foreign.]

     (c) No. See part (a).

3.13 (a) $\frac{1}{25}(3 \times 16 + 2 \times 4 + 1 \times 2) = \frac{58}{25} = 2.32$
     (b) $\mu = \sum_x x\,p(x) = 3(.64) + 2(.16) + 1(.08) + 0(.12) = 2.32$
     (c) $V(x) = \sum_x (x - \mu)^2 p(x) = \sum_x x^2 p(x) - \mu^2 = 3^2(.64) + 2^2(.16) + 1^2(.08) + 0^2(.12) - (2.32)^2 = 6.48 - 5.3824 = 1.0976$
         $\sigma = \sqrt{V(x)} = 1.05$

3.14 (a) $\mu = E(x) = \sum_x x\,p(x) = 2(.443) + 3(.229) + 4(.200) + 5(.086) + 6(.028) + 7(.014) = 3.069$
     (b) $\sigma^2 = V(x) = \sum_x (x - \mu)^2 p(x) = 1.458$
         $\sigma = \sqrt{V(x)} = 1.207$
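
The mean and variance calculations in 3.13 and in 3.14(a) and (b) follow the same pattern, so they can be checked with one small helper. This is a hedged Python sketch rather than part of the original solution. The function name `mean_and_variance` is an illustrative choice, and the probability tables are the ones appearing in the two solutions above.

```python
import math


def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    # Shortcut formula: V(x) = sum of x^2 p(x) minus mu^2.
    var = sum(x * x * p for x, p in dist.items()) - mu**2
    return mu, var


# Probability distributions as they appear in the solutions to 3.13 and 3.14.
dist_3_13 = {3: 0.64, 2: 0.16, 1: 0.08, 0: 0.12}
dist_3_14 = {2: 0.443, 3: 0.229, 4: 0.200, 5: 0.086, 6: 0.028, 7: 0.014}

for name, dist in [("3.13", dist_3_13), ("3.14", dist_3_14)]:
    mu, var = mean_and_variance(dist)
    print(f"{name}: mean = {mu:.3f}, variance = {var:.4f}, sd = {math.sqrt(var):.3f}")
```

The shortcut formula is algebraically the same as the definition $\sum_x (x - \mu)^2 p(x)$, so either form should reproduce the values reported in the answers up to rounding.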

     (c) The distribution of the sample data would reflect that of the population. Most of the data values would pile up around 2 and 3, with a few larger values. The distribution of the sample would be skewed toward the larger values, with a center at approximately 3.07 and a standard deviation of approximately 1.21.
     (d) The sample mean $\bar{x}$ has approximately a normal distribution with mean $\mu_{\bar{x}} = \mu_x = 3.07$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1.21/20 = 0.0605$.

3.15 (a) The scatter plot shows that SAT and Percent are negatively correlated, with a curved pattern suggesting that the average score drops quickly as the percentages begin to increase and then levels off for higher percentages. The decreasing scores with increasing percentage taking the exam makes practical sense; in states with small percentages only the very best students are taking the exam.
     (b) The correlation coefficient is -0.877, but this is not a good measure to use here because of the curvature in the pattern. Correlation measures the strength of a linear relationship between two variables.

         [Figure: scatter plot of Average Score (vertical axis) vs. Percent (horizontal axis).]

3.16 Tabled below are the new probabilities for samples of size 2 and estimates of the population total for the unequal probabilities of selection that favor the smaller population values.

         Sample    Probability    $\hat{\tau}_{pps}$
         {1,2}        0.32           3.75
         {1,3}        0.08          16.25
         {1,4}        0.08          21.25
         {2,3}        0.08          17.50
         {2,4}        0.08          22.50
         {3,4}        0.02          35.00
         {1,1}        0.16           2.50
         {2,2}        0.16           5.00
         {3,3}        0.01          30.00
         {4,4}        0.01          40.00

     Calculation of expectations yields:
         $E(\hat{\tau}_{pps}) = 10$
         $V(\hat{\tau}_{pps}) = 81.25$

3.17 The weights given in Section 3.3 for the four population values are $w_1 = 4.0916$, $w_2 = 4.0916$, $w_3 = 1.3236$ and $w_4 = 1.3236$. The sum of the weights for each of the six possible samples, along with the probabilities of selecting each of these samples, are shown in the accompanying table. The expected value of the sum of the weights turns out to be 4.00, the number of values in the population.

         Sample    Sum of weights    Probability of sample, unequal weights
         {1,2}        8.1832            .0222
         {1,3}        5.4152            .1111
         {1,4}        5.4152            .1111
         {2,3}        5.4152            .1111
         {2,4}        5.4152            .1111
         {3,4}        2.6472            .5333

3.18 For samples of size n = 2 taken with probabilities proportional to the populations of the states, the pertinent data and the probabilities of selection with probabilities proportional to the population, both with and without replacement, are given in the first table that follows. With-replacement probabilities of selection (δ) are directly proportional to the population sizes. Without-replacement probabilities of selection (π) are found by first finding the probability for each possible sample, given on the
