Solution Manual for Business Statistics: A First Course, 3rd Edition


Chloe Martinez
Contributor

SOLUTIONS MANUAL
LINDA DAWSON
University of Washington

BUSINESS STATISTICS
A FIRST COURSE
THIRD EDITION

Norean R. Sharpe, Georgetown University
Richard De Veaux, Williams College
Paul Velleman, Cornell University

Chapter 1 – Data and Decisions

SECTION EXERCISES

SECTION 1.1

1. a) Each row represents a different house that was recently sold. It can be described as a case.
b) There are 7 quantitative variables in each row including the house identifier.

2. a) Each row represents a different transaction (not customer or book). It can be described as a case.
b) There are 8 variables including two identifiers in each row; 6 of the variables are quantitative.

SECTION 1.2

3. a) House_ID is an identifier (categorical, not ordinal); Neighborhood is categorical (nominal); Mail_ZIP is categorical (nominal – ordinal in a sense, but only on a national level); Acres is quantitative (units – acres); Yr_Built is quantitative (units – year); Full_Market_Value is quantitative (units – dollars); Size is quantitative (units – sq. ft.).
b) These data are cross-sectional. Each row corresponds to a house that recently sold, so at approximately the same fixed point in time.

4. a) Transaction ID is an identifier (categorical, nominal, not ordinal); Customer ID is an identifier (categorical, nominal); Date can be treated as quantitative (how many days since the transaction took place, days since Jan. 1 2009, for example) or categorical (as month, for example); ISBN is an identifier (categorical, nominal); Price is quantitative (units – dollars); Coupon is categorical (nominal); Gift is categorical (nominal); Quantity is quantitative (unit – counts).
b) These data are cross-sectional. Each row corresponds to a transaction at a fixed point in time. However, the date of the transaction has been recorded. Consequently, since a time variable is included, the data could be reconfigured as a time series.

SECTION 1.3

5. It is not specified whether or not the real estate data of Exercise 1 are obtained from a survey. The data are not from a designed experiment, a data gathering method with specific requirements. Rather, the real estate major’s data set was derived from transactional data (on local home sales).
The major concern with drawing conclusions from this data set is that we cannot be sure that the sample is representative of the population of interest (e.g., all recent local home sales or even all recent national home sales). Therefore, we should be cautious about drawing conclusions from these data about the housing market in general.

6. The student is using a secondary data source (from the Internet). The data are not from a designed experiment, a data gathering method with specific requirements. The main concerns about using these data for drawing conclusions are that the data were collected for a different purpose (not necessarily for developing a stock investment strategy) and that information about how, when, where and why these data were collected may not be available. In addition, the companies may not be representative of companies in general. Therefore, the student should be cautious about using this type of data to predict performance in the future.

CHAPTER EXERCISES

7. The news. Answers will vary.

8. The Internet. Answers will vary.

9. Survey. The description of the study has to be broken down into its components in order to understand the study. Who – who or what was actually sampled – college students; What – what is being measured – opinion of electric vehicles: whether there will be more electric or gasoline powered vehicles in 2025 and the likelihood of whether they would purchase an electric vehicle in the next 10 years; When – current; Where – your location; Why – automobile manufacturer wants college student opinions; How – how was the study conducted – survey; Variables – what is the variable being measured – there is one categorical variable – what students think about whether or not there will be more electric or gasoline powered vehicles in 2025 and one ordinal variable – how likely, using a scale, would the student be to buy an electric vehicle in the next 10 years; Source – the data are not from a designed survey or experiment; Type – the data are cross-sectional; Concerns – none.

10. Your survey. Answers will vary.

11. World databank. Answers will vary but chosen from the following possible indicators:
GDP growth (annual %)
GDP (current US$)
GDP per capita (current US$)
GNI per capita, Atlas method (current US$)
Exports of goods and services (% of GDP)
Foreign direct investment, net inflows (BoP, current US$)
GNI per capita, PPP (current international $)
GINI index
Inflation, consumer prices (annual %)
Population, total
Life expectancy at birth, total (years)
Internet users (per 100 people)
Imports of goods and services (% of GDP)
Unemployment, total (% of total labor force)
Agriculture, value added (% of GDP)
CO2 emissions (metric tons per capita)
Literacy rate, adult total (% of people ages 15 and above)
Central government debt, total (% of GDP)
Inflation, GDP deflator (annual %)
Poverty headcount ratio at national poverty line (% of population)

12. Arby’s menu. Who – Arby’s sandwiches; What – type of meat, number of calories (in calories), and serving size (in ounces); When – not specified; Where – Arby’s restaurants; Why – assess the nutritional value of the different sandwiches; How – information gathered on each of the sandwiches offered on the menu; Variables – there are 3 variables: the number of calories and serving size are quantitative, and the type of meat is categorical; Source – data are not from a designed survey or experiment; Type – data are cross-sectional; Concerns – none.

13. MBA admissions. Who – MBA applicants (in Northeast US); What – sex, age, whether or not accepted, whether or not they attended, and the reasons for not attending (if they did not accept); When – not specified; Where – a school in the Northeastern United States; Why – the researchers wanted to investigate any patterns in female student acceptance and attendance in the MBA program; How – data obtained from the admissions office; Variables – there are 5 variables: sex, whether or not the students accepted, whether or not they attended, and the reasons for not attending if they did not accept (all categorical) and age, which is quantitative; Source – data are not from a designed survey or experiment; Type – data are cross-sectional; Concerns – none.

14. MBA admissions II. Who – MBA students (in a program outside of Paris); What – each student’s standardized test scores and GPA in the MBA program; When – 2009 to 2014; Where – outside of Paris; Why – to investigate the association between standardized test scores and performance in the MBA program over five years (2009–2014); How – not specified; Variables – there are 2 quantitative variables: standardized test scores and GPA; Source – data are not from a designed survey or experiment, data are available from student records; Type – although the data are collected over 6 years, the purpose is to examine them as cross-sectional rather than as time-series; Concerns – none.


15. Pharmaceutical firm. Who – experimental volunteers; What – herbal cold remedy or sugar solution, and cold severity; When – not specified; Where – major pharmaceutical firm; Why – scientists were testing the effectiveness of an herbal compound on the severity of the common cold; How – scientists conducted a controlled experiment; Variables – there are 2 variables: type of treatment (herbal or sugar solution) is categorical, and severity rating is quantitative; Source – data come from an experiment; Type – data are cross-sectional; Concerns – the severity of a cold might be difficult to quantify (beneficial to add actual observations and measurements, such as body temperature). Also, scientists at a pharmaceutical firm could have a predisposed opinion about the herbal solution or may feel pressure to report negative findings about the herbal product.

16. Start-up company. Who – customers of a start-up company; What – customer name, ID number, region of the country, date of last purchase, amount of purchase ($), and item purchased; When – present day; Where – not specified; Why – the company is building a database of customers and sales information; How – assumed that the company records the needed information from each new customer; Variables – there are 6 variables: name, ID number, region of the country, and item purchased are categorical; date and amount of purchase are quantitative; Source – data are not from a designed survey or experiment; Type – data are cross-sectional; Concerns – although region is coded as a number, it is still a categorical variable.

17. Vineyards. Who – vineyards; What – size of vineyard (acres), number of years in existence, state, varieties of grapes grown, average case price ($), gross sales ($), and percent profit; When – not specified; Where – not specified; Why – business analysts hope to provide information that would be helpful to producers of U.S. wines; How – not specified; Variables – there are 5 quantitative variables: the size of vineyard (acres), number of years in existence, average case price ($), gross sales ($), and percent profit; there are 2 categorical variables: state and variety of grapes grown; Source – data come from a designed survey; Type – data are cross-sectional; Concerns – none.

18. Spectrem group polls. Who – not completely clear. Probably a sample of affluent and retired people; What – pet preference, number of pets, services and products bought for pets (from a list); When – not specified; Where – United States; Why – provide services for the affluent; How – survey; Variables – there are 3 categorical variables: pet preference, list of pets and list of services and products bought for pet; Source – data from a designed survey; Type – data are cross-sectional; Concerns – none.

19. EPA. Who – every model of automobile in the United States; What – vehicle manufacturer, vehicle type (car, SUV, etc.), weight (probably pounds), horsepower (units of horsepower), and gas mileage (miles per gallon) for city and highway driving; When – the information is currently collected; Where – United States; Why – the EPA uses the information to track fuel economy of vehicles; How – “among the data EPA analysts collect from the automobile manufacturers are the name of the manufacturer (Ford, Toyota, etc.), vehicle type….”; Variables – there are 6 variables: vehicle manufacturer and vehicle type are categorical variables; weight, horsepower, and gas mileage for both city and highway driving are quantitative variables; Source – data are not from a designed survey or experiment; Type – data are cross-sectional; Concerns – none.

20. Consumer Reports. Who – 46 models of smart phones; What – brand, price (probably dollars), display size (probably inches), operating system, camera image size (megapixels), and memory card slot (yes/no); When – 2013; Where – United States; Why – the information was compiled to provide information to readers of Consumer Reports; How – not specified; Variables – there are a total of 6 variables: price, display size and image size are quantitative variables; brand and operating system are categorical variables, and memory card slot is a nominal variable; Source – not specified; Type – the data are cross-sectional; Concerns – this may or may not be a representative sample of smart phones, or it may include all of them; we don’t know. This is a rapidly changing market, so their data are at best a snapshot of the state of the market at this time.

21. Zagat. Who – restaurants; What – % of customers liking restaurant, average meal cost ($), food rating (1-30), decor rating (1-30), service rating (1-30); When – current; Where – United States; Why – service to provide information for consumers; How – not specified; Variables – there are 5 variables: % liking and average cost are quantitative variables; ratings (food, decor and service) are ordered categories and therefore ordinal variables; Source – not specified; Type – the data are cross-sectional.


22. L.L. Bean. Who – catalog mailings; What – number of catalogs mailed out, square inches in catalog, and sales ($ million) in 4 weeks following mailing; When – current; Where – L.L. Bean (United States); Why – to investigate association among catalog characteristics, timing, and sales; How – collection of internal data; Variables – there are 3 variables: number of catalogs, square inches in catalog, and sales are all quantitative variables; Source – not specified; Type – data are cross-sectional; Concerns – none.

23. Stock market. Who – students in an MBA statistics class; What – total personal investment in stock market ($), number of different stocks held, total invested in mutual funds ($), and the name of each mutual fund; When – not specified; Where – a business school in the northeast US; Why – the information was collected for use in classroom illustrations; How – an online survey was conducted, participation was probably required for all members of the class; Variables – there are 4 variables: total personal investment in stock market ($), number of different stocks held, total invested in mutual funds ($) are quantitative variables; the name of each mutual fund is a categorical variable; Source – data come from a designed survey; Type – data are cross-sectional.

24. Theme park sites. Who – potential theme park locations in Europe; What – country of site, estimated cost (probably €), potential population size (counts), size of site (probably hectares), whether or not mass transportation within 5 minutes of site; When – 2013; Where – Europe; Why – to present to potential developers on the feasibility of various sites; How – not specified; Variables – there are 5 variables: country of site and whether or not mass transportation is within 5 minutes of site are both categorical variables; estimated cost, potential population size and size of site are quantitative variables; Source – data are not from a designed survey or experiment; Type – data are cross-sectional.

25. Indy 2014. Who – Indy 500 races; What – year, winner, chassis, engine, time (hrs), speed (mph), and car number; When – 1911-2014; Where – Indianapolis, Indiana; Why – examine trends in Indy 500 race winners; How – official statistics kept for each race every year; Variables – there are 7 variables: winner, chassis, engine, and car number are categorical variables; year, time and speed are quantitative variables; Source – all race results; Type – data are time-series; Concerns – none.

26. Kentucky Derby 2014. Who – Kentucky Derby races; What – year, winner, jockey, duration of the race (seconds), and track conditions; When – 1875-2014; Where – Churchill Downs, Louisville, Kentucky; Why – examine trends in Kentucky Derby winners; How – official statistics kept for each race every year; Variables – there are 5 variables: winner, winning jockey, and track conditions are categorical variables; year and duration of the race are quantitative variables; Source – race results; Type – data are time-series; Concerns – none.

27. Mortgages. Each row represents each individual mortgage loan. Headings of the columns would be: borrower’s name, mortgage amount.

28. Employee performance. Each row represents each individual employee. Headings of the columns would be: Employee ID Number (to identify the row instead of name), contract average ($), supervisor’s rating (1-10), and years with the company.

29. Company performance. Each row represents a week. Headings of the columns would be: week number of the year (to identify each row), sales prediction ($), sales ($), and difference between predicted sales and realized sales ($).

30. Command performance. Each row represents a Broadway show. Headings of the columns would be: the show name (identifies the row), profit or loss ($), number of investors and investment total ($).

31. Car sales. Cross-sectional data are taken from situations that vary over time but are measured at a single time instant. This problem focuses on data for September only, which is a single time period. Therefore, the data are cross-sectional.


32. Motorcycle sales. Time-series data are measured over time. Usually the time intervals are equally-spaced (e.g. every week, every quarter, or every year). This problem focuses on the number of motorcycles sold by the dealership in each month of 2014; therefore, the data are measured over a period of time and are time-series data.

33. Cross sections. Time-series data are measured over time. Usually the time intervals are equally-spaced (e.g. every week, every quarter, or every year). This problem focuses on the average diameter of trees brought to a sawmill in each week of a year; therefore, the data are measured over a period of time and are time-series data.

34. Series. Cross-sectional data are taken from situations that vary over time but are measured at a single time instant. This problem focuses on data for attendance of the third World Series game. Therefore, the data are cross-sectional.

Ethics in Action

Sarah’s dilemma: The company RSPT Inc. is having Sarah compare their strategies to other companies. However, they could influence the outcome by funding the research and providing free software. In addition, Sarah may feel obliged to favor RSPT Inc. because they were generous in providing her research tools and funding. The company may put pressure on her to favor their methods over others because of their close relationship. The undesirable consequences are that the results are not completely objective and bias exists due to the funding circumstances.

One possible solution would be to find other grants outside of RSPT Inc. but not connected to any of the companies being compared. This might also be true of the software. It is important in a scientific study to be completely objective and not have any influence by one of the clients being examined.

Jim’s dilemma: Statistics and data can often be manipulated to “fudge” results and present a more desirable outcome. The scientific method is constructed to be objective if the rules are followed. The objective of Jim’s study was to increase the percentage of clients who viewed their advisory services as outstanding, not increase the overall satisfaction average. In presenting an increased average, Jim is not being honest about the specific results of his study with respect to his objective. He should be honest about the decrease in the “outstanding” category.

One possible solution might be to compare the number of responses in each survey to see if there is a discrepancy that could explain the change. In addition, he could point out the large increase in the “above average” category (10% to 40%) which shows a huge improvement. Many people may be unwilling to give the highest rating on an intermediate basis but would be willing to identify an improvement.

For further information on the official American Statistical Association’s Ethical Guidelines, visit:
http://www.amstat.org/about/ethicalguidelines.cfm
The Ethical Guidelines address important ethical considerations regarding professionalism and responsibilities.

Brief Case – Credit Card Bank

List the W’s for these data:
Who – company cardholders
What – offer status (type of offer made to cardholder), credit card charges made by cardholder in August 2008, September 2008, and October 2008, marketing segment, industry segment, amount of spend lift after promotion, average spending on card pre- and post-promotion, whether or not cardholder is a retail customer or enrolled in the program and whether or not the spend lift was positive.
Why – to determine what types of offers are most effective in increasing credit card spending
When – most likely in 2008
Where – although not specified, most likely national data collected in U.S.


How – demographic data most likely collected when credit card account was opened and spending data collected during transactions

Classify each variable as categorical or quantitative; if quantitative identify the units:
Offer Status – categorical
Charges August 2008 – quantitative ($)
Charges September 2008 – quantitative ($)
Charges October 2008 – quantitative ($)
Marketing Segment – categorical
Industry Segment – categorical
Spend Lift After Promotion – quantitative ($)
Pre Promotion Avg Spend – quantitative ($)
Post Promotion Avg Spend – quantitative ($)
Retail Customer – categorical
Enrolled in Program – categorical
Spend Lift Positive – categorical


Chapter 3 – Displaying and Describing Quantitative Data

SECTION EXERCISES

SECTION 3.1

1. a) (chart not shown)
b) (chart not shown) The slightly different look to this chart generated in Excel is due to how Excel puts data into bins. Excel puts numbers in a bin that are up to and including the bin amount.
c) (chart not shown) The relative frequencies are calculated by dividing each bin amount by the total number of customers. The shape should look identical to the previous chart.
d)


1 | 14
2 | 0225699
3 | 0002224558
4 | 244488

2. a) (chart not shown)
b) (chart not shown)
c) (chart not shown)


d)
0 | 3
1 | 2
2 | 2
3 | 334889
4 | 1888
5 | 27
6 | 46
7 | 45
8 | 2

SECTION 3.2

3. a) The distribution is unimodal.
b) The mode is around 35 years old.
c) The distribution is fairly symmetric.
d) No outliers are evident.

4. a) The distribution is unimodal.
b) The mode is around $30.
c) The distribution is fairly symmetric, but slightly right skewed.
d) No outliers are evident.

SECTION 3.3

5. a) The mean and median age should be about the same because the distribution is fairly symmetric.
b) 31.84 years
c) 32 years

6. a) Because the distribution is slightly skewed to the right, the mean purchase amount may be a bit higher than the median purchase amount.
b) $45.26
c) $44.17

SECTION 3.4

7. a) Q1 = 26; Q3 = 38 (answers may vary slightly, Q1 = 25.5, Q3 = 40, depending on the software package or calculator that you use, which can use different algorithms)
b) Q1 = 26; Q3 = 38
c) IQR = 12 years
d) Standard Deviation = 9.84 years

8. a) Q1 = $34.01; Q3 = $58.83 (answers may vary slightly, Q1 = 33.67, Q3 = 60.72, depending on the software package that you use, which can use different algorithms)
b) Q1 = $33.67; Q3 = $60.72
c) IQR = $27.05
d) Standard Deviation = $20.67
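The age summaries in Exercises 5 and 7 can be checked with a short Python sketch. It assumes the 25 ages are exactly the values read off the stem-and-leaf display in Exercise 1d (stems as tens, leaves as ones); the variable names are illustrative only. The two `method` options of `statistics.quantiles` are used here as examples of two different quartile algorithms, and for these data they happen to reproduce the two sets of quartiles quoted in Exercise 7a.

```python
import statistics

# Ages read from the stem-and-leaf display in Exercise 1d
# (assumption: stems are tens, leaves are ones).
ages = [11, 14,
        20, 22, 22, 25, 26, 29, 29,
        30, 30, 30, 32, 32, 32, 34, 35, 35, 38,
        42, 44, 44, 44, 48, 48]

mean_age = statistics.mean(ages)      # compare with Exercise 5b
median_age = statistics.median(ages)  # compare with Exercise 5c
sd_age = statistics.stdev(ages)       # sample standard deviation, Exercise 7d

# Two quartile conventions; software packages differ in which one they use.
q1_inc, _, q3_inc = statistics.quantiles(ages, n=4, method="inclusive")
q1_exc, _, q3_exc = statistics.quantiles(ages, n=4, method="exclusive")

print(f"mean = {mean_age:.2f}, median = {median_age}, sd = {sd_age:.2f}")
print(f"inclusive: Q1 = {q1_inc}, Q3 = {q3_inc}, IQR = {q3_inc - q1_inc}")
print(f"exclusive: Q1 = {q1_exc}, Q3 = {q3_exc}, IQR = {q3_exc - q1_exc}")
```

Which convention a given calculator or spreadsheet uses is not stated in the exercise, so a small difference in Q1 and Q3 is not by itself a sign of an error.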


SECTION 3.5

9. a) Shape – the distribution is clearly skewed to the right. Center – it is more difficult to determine visually the center of a skewed distribution. The center of a skewed distribution is best represented by the median, the exact middle data point when the data set is ordered either in ascending or descending order. There are 5000 data points representing the 5000 charge customers. The center of the data would be just to the right of the 2500th data point. It can be estimated which bin contains the median value by adding up the values in each bin. For example, the first bin (-$1000 to -$500) contains about 10 data points. The next bin (-$500 to $0) contains slightly more, about 15 data points (the exact number is not important in this estimation of the median value) and the next bin ($0 to $500) contains about 810 values. The next bin ($500 to $1000) contains about 720 values. The next bin ($1000 to $1500) contains about 830 values. The total so far is 10+15+810+720+830 = 2385 values. The 2500th data point has not yet been reached. The $1500 to $2000 bin contains a large number of values (about 750 values) which means the 2500th value is contained in that interval. Therefore, the center of the distribution is estimated to be between $1500 and $2000 (close to $2000). Spread – the spread is determined from the range of data, low to high, or $5000-(-$1000) = approximately $6000. The exact range cannot be determined from the histogram because the intervals or bins do not represent the exact data points. The IQR is around $2000. There are no outliers. Unusual features – it can be pointed out that there are a few negative values that represent customers that received more credits than charges in the month; therefore the charge shows up as negative on the histogram.
b) The mean will be larger than the median because the distribution is right skewed.
The median represents the exact middle number whereas the mean is an average of all data points, including the data points with higher values represented in the right tail. The mean will be pulled toward the tail with the higher values. The median is always the center value whether the distribution is symmetric or skewed.
c) The median is a more appropriate measure of the center of the distribution when it is skewed because it represents the middle or more typical value. The mean has been pulled toward the right or higher end because of the skew and therefore is not an accurate representation of the center.

10. a) Shape – the distribution is skewed to the right. Center – the center of a skewed distribution is best represented by the median, the exact middle data point when the data set is ordered either in ascending or descending order. There are 36 data points representing the 36 wineries. The center of the data would be close to the 18th value (between the 18th and 19th values). It can be estimated which bin contains the median value by adding up the values in each bin. The 0-30 acre bin contains 15 data points and the next bin (30-60 acres) contains about 13 data points. The total so far is about 15+13 = 28 values. The median value has to be contained in the 30 to 60 bin (near 40) because that interval contains roughly the 16th through 28th data points. Spread – the spread is determined from the range of data, high value minus low value, or 270-0 = approximately 270. Unusual features – there is a possible outlier in the 240-270 acre interval. There is a large gap between 180 and the outlier.
The IQR is relatively small since most data are in the first two bins.
b) The mean is expected to be larger in a right skewed distribution because the larger values affect the mean, pulling it towards the right tail of the distribution.
c) Because the distribution is skewed, the median is a better representation of the center of the distribution.

SECTION 3.6

11. a) Standardize the minimum (11) and maximum (48) ages from Exercise 1 and the mean from Exercise 5b (31.84) and the standard deviation from Exercise 7d (9.84).
Min: $z = (x - \bar{x})/s = (11 - 31.84)/9.84 = -2.12$
Max: $z = (x - \bar{x})/s = (48 - 31.84)/9.84 = 1.64$
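The standardization in Exercise 11 can be sketched in Python as follows. This is a minimal illustration using the mean and standard deviation quoted above; the function names `z_score` and `from_z` are hypothetical helpers, not something the text defines.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def from_z(z: float, mean: float, sd: float) -> float:
    """Invert the z-score formula: the value x that has the given z-score."""
    return z * sd + mean


# Summary values quoted in Exercise 11a (ages, in years).
age_mean, age_sd = 31.84, 9.84

z_min = z_score(11, age_mean, age_sd)   # minimum age
z_max = z_score(48, age_mean, age_sd)   # maximum age
print(f"z(min) = {z_min:.2f}, z(max) = {z_max:.2f}")

# The value with the larger absolute z-score is the more extreme one.
print("minimum is more extreme:", abs(z_min) > abs(z_max))

# Age that would correspond to a z-score of 3.
age_at_z3 = from_z(3, age_mean, age_sd)
print(f"age at z = 3: {age_at_z3:.2f}")
```

The same two helpers apply unchanged to the purchase amounts in Exercise 12 by substituting that exercise's mean and standard deviation.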


b) The minimum standardized value is more extreme.
c) A z-score of 3: $z = (x - \bar{x})/s$; $3 = (x - 31.84)/9.84$. Solve for x in the equation: x = 3×9.84 + 31.84 = 61.36 years old

12. a) Standardize the minimum ($2.73) and maximum ($81.58) from Exercise 2 and the mean from Exercise 6b ($45.26) and the standard deviation from Exercise 8d ($20.67), all in dollars.
Min: $z = (x - \bar{x})/s = (2.73 - 45.26)/20.67 = -2.06$
Max: $z = (x - \bar{x})/s = (81.58 - 45.26)/20.67 = 1.76$
b) The minimum purchase standardized value is more extreme.
c) A z-score of 3.5: $z = (x - \bar{x})/s$; $3.5 = (x - 45.26)/20.67$. Solve for x in the equation: x = 3.5×$20.67 + $45.26 = $117.61

13. a) (boxplot not shown)
b) No outliers are identified on the boxplot.
c) Q3 + 1.5 IQR = 38 + 1.5×12 = 56 years old

14. a) (boxplot not shown)
b) No outliers are nominated on the boxplot.
c) Q3 + 1.5 IQR = 60.72 + 1.5×(27.05) = $101.30

15. a) The distribution can be described as skewed to the right. Symmetry can be determined by comparing the mean and the median. The mean is 46.50 and the median is 33.50. The mean is much larger than the median, indicating a right skew (the higher values are pulling the mean value higher than the median). In addition, symmetry can be determined by comparing the difference between the first quartile and the median and the third quartile and the median. If the distribution is symmetric, these values should be fairly equal. In this summary, the median – Q1 = 33.50 – 18.50 = 15 compared to Q3 – median = 55 – 33.50 = 21.5. The right side of the distribution is wider than the left, which indicates a right skew. Finally, the maximum value of 250 is very high compared to the median of 33.50 while the minimum value of 6 compared to the median of 33.50 is a much smaller number, also indicating a skew to the right.
b) Yes, there is one high outlier at 250. This is clearly shown in the histogram from Exercise 10. In addition, its z-score of 4.26 indicates that 250 is more than 4 standard deviations above the mean and far greater than Q3 + 1.5 IQR.
c) The boxplot shows the outlier at 250 but without the data set, the length of the upper whisker going to the upper fence (limit for the outliers) cannot be determined.

16. a) The distribution can be described at least as roughly symmetric. Symmetry can be determined by comparing the mean and the median. The mean is 68.35 and the median is 69.90. These values are fairly close, indicating at least a roughly symmetric distribution. In addition, symmetry can be determined by comparing the difference between the first quartile and the median and the third quartile and the median. If the distribution is symmetric, these values should be fairly equal. In this summary, the median – Q1 = 69.90 – 59.15 = 10.75 compared to Q3 – median = 74.75 – 69.90 = 4.85. The left side of the distribution is slightly wider than the right but not by a large margin. Finally, the maximum value of 87.40 compared to the median of 69.90 is similar to the minimum value of 43.20 compared to the median of 69.90, although the left side again is wider.
b) Outliers can be determined mathematically by adding the term 1.5×IQR to Q3 to see if any data values fall above or by subtracting 1.5×IQR from Q1 to see if any data values fall below. In this case, IQR = Q3 – Q1 = 74.75 – 59.15 = 15.6, so 1.5×IQR = 1.5×15.6 = 23.4. To check for high outliers, add 1.5×IQR to Q3: 23.4 + 74.75 = 98.15. The maximum data point falls below that value so there are no high outliers. To check for low outliers, subtract 1.5×IQR from Q1: 59.15 – 23.4 = 35.75. The minimum data point is above this value so there are no low outliers.
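The 1.5×IQR check described in Exercise 16b can be written as a small Python sketch. The helper names `outlier_fences` and `flag_outliers` are hypothetical, and the inputs are only the quartiles, minimum, and maximum quoted in Exercises 15 and 16, since the full data sets are not given here.

```python
def outlier_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1: float, q3: float, k: float = 1.5) -> list:
    """Return the values that fall outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Exercise 16: Q1 = 59.15, Q3 = 74.75, minimum 43.20, maximum 87.40.
lower, upper = outlier_fences(59.15, 74.75)
print(f"fences: {lower:.2f} to {upper:.2f}")
print("outliers:", flag_outliers([43.20, 87.40], 59.15, 74.75))

# Exercise 15: Q1 = 18.50, Q3 = 55, minimum 6, maximum 250.
print("outliers:", flag_outliers([6, 250], 18.50, 55))
```

Checking only the minimum and maximum is enough to decide whether any outliers exist on each side, because every other data value lies between them.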


c) (boxplot not shown)

SECTION 3.7

17. The ages of the women are generally higher than those of the men by about 10 years. More than ¾ of the women are older than all of the men. The female distribution is roughly symmetric; the male distribution looks left skewed and no outliers are evident.


18. The distributions are similar, but purchase amounts seem to be about $10 to $15 higher on the weekend. Both distributions appear fairly symmetric and by the 1.5 IQR rule, there are no outliers.

19. In describing side-by-side boxplots, there are two important things to mention – the description of how each data set is distributed and a comparison of the distributions with each other. The distribution of weekly sales at Location #1 is roughly symmetric with some high outliers, one exceeding $320,000. The median sales value is approximately $240,000 and the minimum sales value is approximately $160,000. The distribution of weekly sales at Location #2 is also roughly symmetric with high outliers close to $150,000 to $180,000. The median sales value is approximately $110,000 and the minimum sales value is below $100,000. Location #1 clearly has higher sales than Location #2 in every week except for the high outlier in Location #2. The company might want to compare other stores in locations like these to see if this pattern holds true for other locations.

20. In describing side-by-side boxplots, there are two important things to mention – the description of how each data set is distributed and a comparison of the distributions with each other. The distribution of weekly sales in stores located in Massachusetts (MA) is right skewed with several high outliers. The median sales value is approximately $115,000 and the minimum sales value is approximately $85,000. High outliers extend up to about $220,000. The distribution of weekly sales in stores located in Connecticut (CT) is also right skewed with several outliers. The median sales value is approximately $100,000 and the minimum sales value is approximately $65,000. High outliers extend up to about $170,000. Overall, the stores in MA generally have higher sales than the stores in CT. The median value in MA is higher than the third quartile in CT. The lowest performing store in MA was higher than nearly 25% of the stores in CT.

SECTION 3.9

21. The upper outlier limit is 123 + 1.5×(123 – 44.9) = 240.15. The lower outlier limit is 44.9 – 1.5×(123 – 44.9) < 0. Yes, the maximum value is an outlier. We should look at a boxplot to know how to proceed.

22. The upper outlier limit is 49 + 1.5×(49 – 24) = 86.5. The lower outlier limit is 24 – 1.5×(49 – 24) < 0. Yes, the maximum value of 256 is impossible. We should set that value aside and re-analyze the data.

SECTION 3.10

23. a) Yes.
b) No – data are from a single time point.
c) No – response is “time” but measured at only one time point.
d) Yes.