Solution Manual for Introductory Econometrics: A Modern Approach, 5th Edition
CONTENTS

PREFACE
SUGGESTED COURSE OUTLINES
Chapter 1 The Nature of Econometrics and Economic Data
Chapter 2 The Simple Regression Model
Chapter 3 Multiple Regression Analysis: Estimation
Chapter 4 Multiple Regression Analysis: Inference
Chapter 5 Multiple Regression Analysis: OLS Asymptotics
Chapter 6 Multiple Regression Analysis: Further Issues
Chapter 7 Multiple Regression Analysis with Qualitative Information: Binary (or Dummy) Variables
Chapter 8 Heteroskedasticity
Chapter 9 More on Specification and Data Problems
Chapter 10 Basic Regression Analysis with Time Series Data
Chapter 11 Further Issues in Using OLS with Time Series Data
Chapter 12 Serial Correlation and Heteroskedasticity in Time Series Regressions
Chapter 13 Pooling Cross Sections Across Time: Simple Panel Data Methods
Chapter 14 Advanced Panel Data Methods
Chapter 15 Instrumental Variables Estimation and Two Stage Least Squares
Chapter 16 Simultaneous Equations Models
Chapter 17 Limited Dependent Variable Models and Sample Selection Corrections
Chapter 18 Advanced Time Series Topics
Chapter 19 Carrying Out an Empirical Project
Appendix A Basic Mathematical Tools
Appendix B Fundamentals of Probability
Appendix C Fundamentals of Mathematical Statistics
Appendix D Summary of Matrix Algebra
Appendix E The Linear Regression Model in Matrix Form
PREFACE
This manual contains suggested course outlines, teaching notes, and detailed solutions to all of
the problems and computer exercises in Introductory Econometrics: A Modern Approach, 5e.
For several problems, I have added additional notes about interesting asides or suggestions for
how to modify or extend the problem.
Some of the answers given here are subjective, and you may want to supplement or replace them
with your own answers. I wrote all solutions as if I were preparing them for the students, so you
may find some solutions a bit tedious (if not bordering on an insult to your intelligence). This
way, if you prefer, you can distribute my answers to some of the even-numbered problems
directly to the students. (The student study guide contains answers to all odd-numbered
problems.) Many of the equations in the Word files were created using MathType, and the
equations will not look quite right without MathType. Some equations I have created using the
equation editor in Word 2007.
I solved the computer exercises using various versions of Stata, starting with version 4.0 and
running through version 12.0. Nevertheless, almost all of the estimation methods covered in the
text have been standardized, and different econometrics or statistical packages should give the
same answers. There can be differences when applying more advanced techniques, as
conventions sometimes differ on how to choose or estimate auxiliary parameters. (Examples
include heteroskedasticity-robust standard errors, estimates of a random effects model, and
corrections for sample selection bias.)
While I have endeavored to make the solutions mistake-free, some errors may have crept in. I
would appreciate hearing from you if you find mistakes. I will update the manual occasionally
and correct any mistakes that have been found. I heard from many of you regarding the earlier
editions of the text, and I incorporated many of your suggestions. I welcome any comments that
will help me make improvements to future editions. I can be reached via e-mail at
wooldri1@msu.edu.
The fifth edition of the text drops the chapter numbers preceding the problems and computer
exercises. I have kept the chapter numbers in the solutions manual so that it is easy to keep track
of where one is. For example, the solution to problem 4 in chapter 3 is labeled 3.4 and computer
exercise 6 in chapter 8 is labeled C8.6.
I hope you find this instructor’s manual useful, and I look forward to hearing your reactions to
the fifth edition.
Jeffrey M. Wooldridge
Department of Economics
Michigan State University
486 W. Circle Drive
110 Marshall-Adams Hall
East Lansing, MI 48824-1038
SUGGESTED COURSE OUTLINES
For an introductory, one-semester course, I like to cover most of the material in Chapters 1
through 8 and Chapters 10 through 12, as well as parts of Chapter 9 (but mostly through
examples). I do not typically cover all sections or subsections within each chapter. Under the
chapter headings listed below, I provide some comments on the material I find most relevant for
a first-semester course.
An alternative course ignores time series applications altogether, while delving into some of the
more advanced methods that are particularly useful for policy analysis. This would consist of
Chapters 1 through 8, much of Chapter 9, and the first four sections of Chapter 13. Chapter 9
discusses the practically important topics of proxy variables, measurement error, outlying
observations, and stratified sampling. In addition, I have written a more careful description of
the method of least absolute deviations, including a discussion of its strengths and weaknesses.
Chapter 13 covers, in a straightforward fashion, methods for pooled cross sections (including the
so-called “natural experiment” approach) and two-period panel data analysis. The basic cross-
sectional treatment of instrumental variables in Chapter 15 is a natural topic for cross-sectional,
policy-oriented courses. For an accelerated course, the nonlinear methods used for cross-
sectional analysis in Chapter 17 can be covered.
I typically do not begin with a review of basic algebra, probability, and statistics. In my
experience, this takes too long and the payoff is minimal. (Students tend to think that they are
taking another statistics course and start to drift away from the material.) Instead, when I need a
tool (such as the summation or expectations operator), I briefly review the necessary definitions
and key properties. Statistical inference is not more difficult to describe in the context of multiple
regression than in testing hypotheses about a population mean, and so I briefly review the
principles of statistical inference during multiple regression analysis. Appendices A, B, and C are
fairly extensive. When I cover asymptotic properties of OLS, I provide a brief discussion of the
main definitions and limit theorems. If students need more than the brief review provided in
class, I point them to the appendices.
For a master’s level course, I include a couple of lectures on the matrix approach to linear
regression. This could be integrated into Chapters 3 and 4 or covered after Chapter 4. Again, I do
not summarize matrix algebra before proceeding. Instead, the material in Appendix D can be
reviewed as it is needed in covering Appendix E.
A second semester course, at either the undergraduate or master’s level, could begin with some
of the material in Chapter 9, particularly with the issues of proxy variables and measurement
error. Least absolute deviations and, more generally, quantile regression are used more and more
in empirical work, and the fifth edition has sections that can be used as an introduction to
quantile regression. The advanced chapters, starting with Chapter 13, are particularly useful for
students with an interest in policy analysis. The pooled cross section and panel data chapters
(Chapters 13 and 14) emphasize how these data structures can be used, in conjunction with
econometric methods, for policy evaluation. Chapter 15, which introduces the method of
instrumental variables, is also important for policy analysis. Most modern IV applications are
used to address the problems of omitted variables (unobserved heterogeneity) or measurement
error. I have intentionally separated out the conceptually more difficult topic of simultaneous
equations models in Chapter 16.
Chapter 17, in particular the material on probit, logit, Tobit, and Poisson regression models, is a
good introduction to nonlinear econometric methods. Specialized courses that emphasize
applications in labor economics can use the material on sample selection corrections. Duration
models are also briefly covered as an example of a censored regression model.
Chapter 18 is much different from the other advanced chapters, as it focuses on more advanced
or recent developments in time series econometrics. Combined with some of the more advanced
topics in Chapter 12, it can serve as the basis for a second semester course in time series topics,
including forecasting.
Most second semester courses would include an assignment to write an original empirical paper,
and Chapter 19 should be helpful in this regard.
CHAPTER 1
TEACHING NOTES
You have substantial latitude about what to emphasize in Chapter 1. I find it useful to talk about
the economics of crime example (Example 1.1) and the wage example (Example 1.2) so that
students see, at the outset, that econometrics is linked to economic reasoning, even if the
economics is not complicated theory.
I like to familiarize students with the important data structures that empirical economists use,
focusing primarily on cross-sectional and time series data sets, as these are what I cover in a
first-semester course. It is probably a good idea to mention the growing importance of data sets
that have both a cross-sectional and time dimension.
I spend almost an entire lecture talking about the problems inherent in drawing causal inferences
in the social sciences. I do this mostly through the agricultural yield, return to education, and
crime examples. These examples also contrast experimental and nonexperimental (observational)
data. Students studying business and finance tend to find the term structure of interest rates
example more relevant, although the issue there is testing the implication of a simple theory, as
opposed to inferring causality. I have found that spending time talking about these examples, in
place of a formal review of probability and statistics, is more successful in teaching the students
how econometrics can be used. (And, it is more enjoyable for the students and me.)
I do not use counterfactual notation as in the modern “treatment effects” literature, but I do
discuss causality using counterfactual reasoning. The return to education, perhaps focusing on
the return to getting a college degree, is a good example of how counterfactual reasoning is
easily incorporated into the discussion of causality.
SOLUTIONS TO PROBLEMS
1.1 (i) Ideally, we could randomly assign students to classes of different sizes. That is, each
student is assigned a different class size without regard to any student characteristics such as
ability and family background. For reasons we will see in Chapter 2, we would like substantial
variation in class sizes (subject, of course, to ethical considerations and resource constraints).
(ii) A negative correlation means that larger class size is associated with lower performance.
We might find a negative correlation because larger class size actually hurts performance.
However, with observational data, there are other reasons we might find a negative relationship.
For example, children from more affluent families might be more likely to attend schools with
smaller class sizes, and affluent children generally score better on standardized tests. Another
possibility is that, within a school, a principal might assign the better students to smaller classes.
Or, some parents might insist their children are in the smaller classes, and these same parents
tend to be more involved in their children’s education.
(iii) Given the potential for confounding factors – some of which are listed in (ii) – finding a
negative correlation would not be strong evidence that smaller class sizes actually lead to better
performance. Some way of controlling for the confounding factors is needed, and this is the
subject of multiple regression analysis.
1.2 (i) Here is one way to pose the question: If two firms, say A and B, are identical in all
respects except that firm A supplies job training one hour per worker more than firm B, by how
much would firm A’s output differ from firm B’s?
(ii) Firms are likely to choose job training depending on the characteristics of workers. Some
observed characteristics are years of schooling, years in the workforce, and experience in a
particular job. Firms might even discriminate based on age, gender, or race. Perhaps firms
choose to offer training to more or less able workers, where “ability” might be difficult to
quantify but where a manager has some idea about the relative abilities of different employees.
Moreover, different kinds of workers might be attracted to firms that offer more job training on
average, and this might not be evident to employers.
(iii) The amount of capital and technology available to workers would also affect output. So,
two firms with exactly the same kinds of employees would generally have different outputs if
they use different amounts of capital or technology. The quality of managers would also have an
effect.
(iv) No, unless the amount of training is randomly assigned. The many factors listed in parts
(ii) and (iii) can contribute to finding a positive correlation between output and training even if
job training does not improve worker productivity.
1.3 It does not make sense to pose the question in terms of causality. Economists would assume
that students choose a mix of studying and working (and other activities, such as attending class,
leisure, and sleeping) based on rational behavior, such as maximizing utility subject to the
constraint that there are only 168 hours in a week. We can then use statistical methods to
measure the association between studying and working, including regression analysis, which we
cover starting in Chapter 2. But we would not be claiming that one variable “causes” the other.
They are both choice variables of the student.
SOLUTIONS TO COMPUTER EXERCISES
C1.1 (i) The average of educ is about 12.6 years. There are two people reporting zero years of
education, and 19 people reporting 18 years of education.
(ii) The average of wage is about $5.90, which seems low in the year 2008.
(iii) Using Table B-60 in the 2004 Economic Report of the President, the CPI was 56.9 in
1976 and 184.0 in 2003.
(iv) To convert 1976 dollars into 2003 dollars, we use the ratio of the CPIs, which is
184/56.9 ≈ 3.23. Therefore, the average hourly wage in 2003 dollars is roughly
3.23($5.90) ≈ $19.06, which is a reasonable figure; the arithmetic is spelled out in the sketch below.
(v) The sample contains 252 women (the number of observations with female = 1) and 274
men.
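The deflation in part (iv) is just a ratio of price indexes. A minimal sketch of the arithmetic (no data file needed; the CPI values are those quoted above):

```python
# Minimal sketch of the CPI deflation in C1.1 (iv): 1976 dollars to 2003 dollars.
cpi_1976, cpi_2003 = 56.9, 184.0
avg_wage_1976 = 5.90

factor = cpi_2003 / cpi_1976         # about 3.23
wage_2003 = factor * avg_wage_1976   # about $19.08 unrounded;
                                     # the $19.06 in the text uses the rounded factor 3.23
print(round(factor, 2), round(wage_2003, 2))
```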
C1.2 (i) There are 1,388 observations in the sample. Tabulating the variable cigs shows that 212
women have cigs > 0.
(ii) The average of cigs is about 2.09, but this includes the 1,176 women who did not
smoke. Reporting just the average masks the fact that almost 85 percent of the women did not
smoke. It makes more sense to say that the “typical” woman does not smoke during pregnancy;
indeed, the median number of cigarettes smoked is zero.
(iii) The average of cigs over the women with cigs > 0 is about 13.7. Of course this is
much higher than the average over the entire sample because we are excluding 1,176 zeros.
(iv) The average of fatheduc is about 13.2. There are 196 observations with a missing
value for fatheduc, and those observations are necessarily excluded in computing the average.
(v) The average and standard deviation of faminc are about 29.027 and 18.739,
respectively, but faminc is measured in thousands of dollars. So, in dollars, the average and
standard deviation are $29,027 and $18,739.
C1.3 (i) The largest is 100, the smallest is 0.
(ii) 38 out of 1,823, or about 2.1 percent of the sample.
(iii) 17
(iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least
in 2001, the reading test was harder to pass.
(v) The sample correlation between math4 and read4 is about .843, which is a very high
degree of (linear) association. Not surprisingly, schools that have high pass rates on one test
have a strong tendency to have high pass rates on the other test.
(vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which
shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the
maximum is $11,957.64.]
(vii) The percentage by which school A outspends school B is
100 ∙ (6,000 − 5,500)/5,500 ≈ 9.09%
When we use the approximation based on the difference of the natural logs we get a somewhat
smaller number:
100 ∙ [log(6,000) − log(5,500)] ≈ 8.71%
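The gap between the exact percentage difference and the log approximation in part (vii) is easy to check directly; a minimal sketch:

```python
# Minimal sketch comparing the exact percentage difference with the log approximation.
import math

a, b = 6_000, 5_500
exact = 100 * (a - b) / b                   # about 9.09 percent
approx = 100 * (math.log(a) - math.log(b))  # about 8.7 percent
print(round(exact, 2), round(approx, 2))
```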
C1.4 (i) 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%.
(ii) For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not
receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800,
which is very large. On average, the men receiving the job training had earnings about 40%
higher than those not receiving training.
(iii) About 24.3% of the men who received training were unemployed in 1978; the figure is
35.4% for men not receiving training. This, too, is a big difference.
(iv) The differences in earnings and unemployment rates suggest the training program had
strong, positive effects. Our conclusions about economic significance would be stronger if we
could also establish statistical significance (which is done in Computer Exercise C9.10 in
Chapter 9).
C1.5 (i) The smallest and largest values of children are 0 and 13, respectively. The average is
about 2.27.
(ii) Out of 4,358 women, only 611 have electricity in the home, or about 14.02 percent.
(iii) The average of children for women without electricity is about 2.33, and for those with
electricity it is about 1.90. So, on average, women with electricity have .43 fewer children than
those who do not.
(iv) We cannot infer causality here. There are many confounding factors that may be related
to the number of children and the presence of electricity in the home; household income and
level of education are two possibilities. For example, it could be that women with more
education have fewer children and are more likely to have electricity in the home (the latter due
to an income effect).
CHAPTER 2
TEACHING NOTES
This is the chapter where I expect students to follow most, if not all, of the algebraic derivations.
In class I like to derive at least the unbiasedness of the OLS slope coefficient, and usually I
derive the variance. At a minimum, I talk about the factors affecting the variance. To simplify
the notation, after I emphasize the assumptions in the population model, and assume random
sampling, I just condition on the values of the explanatory variables in the sample. Technically,
this is justified by random sampling because, for example, $E(u_i | x_1, x_2, \ldots, x_n) = E(u_i | x_i)$ by
independent sampling. I find that students are able to focus on the key assumption SLR.4 and
subsequently take my word about how conditioning on the independent variables in the sample is
harmless. (If you prefer, the appendix to Chapter 3 does the conditioning argument carefully.)
Because statistical inference is no more difficult in multiple regression than in simple regression,
I postpone inference until Chapter 4. (This reduces redundancy and allows you to focus on the
interpretive differences between simple and multiple regression.)
You might notice how, compared with most other texts, I use relatively few assumptions to
derive the unbiasedness of the OLS slope estimator, followed by the formula for its variance.
This is because I do not introduce redundant or unnecessary assumptions. For example, once
SLR.4 is assumed, nothing further about the relationship between u and x is needed to obtain the
unbiasedness of OLS under random sampling.
Incidentally, one of the uncomfortable facts about finite-sample analysis is that there is a
difference between an estimator that is unbiased conditional on the outcome of the covariates and
one that is unconditionally unbiased. If the distribution of the $x_i$ is such that they can all equal
the same value with positive probability – as is the case with discreteness in the distribution –
then the unconditional expectation does not really exist. Or, if it is made to exist then the
estimator is not unbiased. I do not try to explain these subtleties in an introductory course, but I
have had instructors ask me about the difference.
SOLUTIONS TO PROBLEMS
2.1 (i) Income, age, and family background (such as number of siblings) are just a few
possibilities. It seems that each of these could be correlated with years of education. (Income
and education are probably positively correlated; age and education may be negatively correlated
because women in more recent cohorts have, on average, more education; and number of siblings
and education are probably negatively correlated.)
(ii) Not if the factors we listed in part (i) are correlated with educ. Because we would like to
hold these factors fixed, they are part of the error term. But if u is correlated with educ then
E(u|educ) ≠ 0, and so SLR.4 fails.
2.2 In the equation $y = \beta_0 + \beta_1 x + u$, add and subtract $\alpha_0$ from the right hand side to get $y = (\alpha_0 + \beta_0) + \beta_1 x + (u - \alpha_0)$. Call the new error $e = u - \alpha_0$, so that $E(e) = 0$. The new intercept is $\alpha_0 + \beta_0$, but the slope is still $\beta_1$.
2.3 (i) Let $y_i = GPA_i$, $x_i = ACT_i$, and $n = 8$. Then $\bar{x} = 25.875$, $\bar{y} = 3.2125$, $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 5.8125$, and $\sum_{i=1}^{n} (x_i - \bar{x})^2 = 56.875$. From equation (2.9), we obtain the slope as $\hat{\beta}_1 = 5.8125/56.875 \approx .1022$, rounded to four places after the decimal. From (2.17), $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \approx 3.2125 - (.1022)(25.875) \approx .5681$. So we can write

$$\widehat{GPA} = .5681 + .1022\,ACT, \qquad n = 8.$$

The intercept does not have a useful interpretation because ACT is not close to zero for the
population of interest. If ACT is 5 points higher, $\widehat{GPA}$ increases by .1022(5) = .511.

(ii) The fitted values and residuals, rounded to four decimal places, are given along with
the observation number $i$ and GPA in the following table:
i    GPA    $\widehat{GPA}$    $\hat{u}$
1    2.8    2.7143     .0857
2    3.4    3.0209     .3791
3    3.0    3.2253    −.2253
4    3.5    3.3275     .1725
5    3.6    3.5319     .0681
6    3.0    3.1231    −.1231
7    2.7    3.1231    −.4231
8    3.7    3.6341     .0659
You can verify that the residuals, as reported in the table, sum to −.0002, which is pretty close to
zero given the inherent rounding error.
(iii) When ACT = 20, $\widehat{GPA} = .5681 + .1022(20) \approx 2.61$.
(iv) The sum of squared residuals, $\sum_{i=1}^{n} \hat{u}_i^2$, is about .4347 (rounded to four decimal places), and the total sum of squares, $\sum_{i=1}^{n} (y_i - \bar{y})^2$, is about 1.0288. So the R-squared from the regression is

$$R^2 = 1 - \mathrm{SSR}/\mathrm{SST} \approx 1 - (.4347/1.0288) \approx .577.$$

Therefore, about 57.7% of the variation in GPA is explained by ACT in this small sample of
students.
2.4 (i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, $\widehat{bwght} = 109.49$. This is about an 8.6% drop.

(ii) Not necessarily. There are many other factors that can affect birth weight, particularly
overall health of the mother and quality of prenatal care. These could be correlated with
cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth
weight, and might also be correlated with cigarette smoking.

(iii) If we want a predicted bwght of 125, then cigs = (125 − 119.77)/(−.524) ≈ −10.18, or
about −10 cigarettes! This is nonsense, of course, and it shows what happens when we are trying
to predict something as complicated as birth weight with only a single explanatory variable. The
largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample
had a birth weight higher than 119.77.
(iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we
are using only cigs to explain birth weight, we have only one predicted birth weight at cigs = 0.
The predicted birth weight is necessarily roughly in the middle of the observed birth weights at
cigs = 0, and so we will underpredict high birth weights.
2.5 (i) The intercept implies that when inc = 0, cons is predicted to be negative $124.84. This, of
course, cannot be true, and reflects the fact that this consumption function might be a poor
predictor of consumption at very low income levels. On the other hand, on an annual basis,
$124.84 is not so far from zero.

(ii) Just plug 30,000 into the equation: $\widehat{cons}$ = −124.84 + .853(30,000) = 25,465.16 dollars.

(iii) The MPC and the APC are shown in the following graph. Even though the intercept is
negative, the smallest APC in the sample is positive. The graph starts at an annual income level
of $1,000 (in 1970 dollars).

[Figure: MPC and APC plotted against inc from $1,000 to $30,000. The MPC is constant at .853; the APC starts at .728 at inc = $1,000 and rises toward .853 as income grows.]
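To see the graph's pattern numerically, the short sketch below evaluates the fitted consumption function on an illustrative income grid (the grid itself is my choice, not from the text) and reports the APC at each point:

```python
# Minimal sketch: APC = conshat/inc for conshat = -124.84 + .853*inc (1970 dollars).
# The income grid is an illustrative assumption.
for inc in (1_000, 5_000, 10_000, 20_000, 30_000):
    cons = -124.84 + 0.853 * inc
    print(inc, round(cons, 2), "APC:", round(cons / inc, 4))  # APC rises toward MPC = .853
```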
2.6 (i) Yes. If living closer to an incinerator depresses housing prices, then being farther away
increases housing prices.
(ii) If the city chose to locate the incinerator in an area away from more expensive
neighborhoods, then log(dist) is positively correlated with housing quality. This would violate
SLR.4, and OLS estimation is biased.
(iii) Size of the house, number of bathrooms, size of the lot, age of the home, and quality of
the neighborhood (including school quality), are just a handful of factors. As mentioned in part
(ii), these could certainly be correlated with dist [and log(dist)].
2.7 (i) When we condition on inc in computing an expectation, $\sqrt{inc}$ becomes a constant. So $E(u|inc) = E(\sqrt{inc} \cdot e|inc) = \sqrt{inc} \cdot E(e|inc) = \sqrt{inc} \cdot 0 = 0$ because $E(e|inc) = E(e) = 0$.

(ii) Again, when we condition on inc in computing a variance, $\sqrt{inc}$ becomes a constant. So $\mathrm{Var}(u|inc) = \mathrm{Var}(\sqrt{inc} \cdot e|inc) = (\sqrt{inc})^2 \mathrm{Var}(e|inc) = \sigma_e^2 \, inc$ because $\mathrm{Var}(e|inc) = \sigma_e^2$.
(iii) Families with low incomes do not have much discretion about spending; typically, a
low-income family must spend on food, clothing, housing, and other necessities. Higher income
people have more discretion, and some might choose more consumption while others more
saving. This discretion suggests wider variability in saving among higher income families.
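A quick simulation makes the result concrete. In the sketch below, $\sigma_e = 1$ and the two income levels are illustrative choices (nothing in the problem pins them down); the simulated conditional mean of u is near zero and its conditional variance is close to $\sigma_e^2 \cdot inc$:

```python
# Minimal simulation sketch for Problem 2.7: u = sqrt(inc)*e with E(e) = 0, Var(e) = sigma_e^2.
# sigma_e and the income levels are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
sigma_e = 1.0
for inc in (10_000, 40_000):
    e = rng.normal(0.0, sigma_e, size=1_000_000)
    u = np.sqrt(inc) * e
    print(inc, round(u.mean(), 2), round(u.var(), 0), "theory:", sigma_e**2 * inc)
```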
2.8 (i) From equation (2.66),

$$\tilde{\beta}_1 = \left( \sum_{i=1}^{n} x_i y_i \right) \Big/ \left( \sum_{i=1}^{n} x_i^2 \right).$$

Plugging in $y_i = \beta_0 + \beta_1 x_i + u_i$ gives

$$\tilde{\beta}_1 = \left[ \sum_{i=1}^{n} x_i (\beta_0 + \beta_1 x_i + u_i) \right] \Big/ \left( \sum_{i=1}^{n} x_i^2 \right).$$

After standard algebra, the numerator can be written as

$$\beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} x_i u_i .$$

Putting this over the denominator shows we can write $\tilde{\beta}_1$ as

$$\tilde{\beta}_1 = \beta_0 \left( \sum_{i=1}^{n} x_i \right) \Big/ \left( \sum_{i=1}^{n} x_i^2 \right) + \beta_1 + \left( \sum_{i=1}^{n} x_i u_i \right) \Big/ \left( \sum_{i=1}^{n} x_i^2 \right).$$

Conditional on the $x_i$, we have

$$E(\tilde{\beta}_1) = \beta_0 \left( \sum_{i=1}^{n} x_i \right) \Big/ \left( \sum_{i=1}^{n} x_i^2 \right) + \beta_1$$
because $E(u_i) = 0$ for all $i$. Therefore, the bias in $\tilde{\beta}_1$ is given by the first term in this equation. This bias is obviously zero when $\beta_0 = 0$. It is also zero when $\sum_{i=1}^{n} x_i = 0$, which is the same as $\bar{x} = 0$. In the latter case, regression through the origin is identical to regression with an intercept.

(ii) From the last expression for $\tilde{\beta}_1$ in part (i) we have, conditional on the $x_i$,

$$\mathrm{Var}(\tilde{\beta}_1) = \left( \sum_{i=1}^{n} x_i^2 \right)^{-2} \mathrm{Var}\!\left( \sum_{i=1}^{n} x_i u_i \right) = \left( \sum_{i=1}^{n} x_i^2 \right)^{-2} \sum_{i=1}^{n} x_i^2 \,\mathrm{Var}(u_i) = \left( \sum_{i=1}^{n} x_i^2 \right)^{-2} \sigma^2 \sum_{i=1}^{n} x_i^2 = \sigma^2 \Big/ \sum_{i=1}^{n} x_i^2 .$$

(iii) From (2.57), $\mathrm{Var}(\hat{\beta}_1) = \sigma^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2$. From the hint, $\sum_{i=1}^{n} x_i^2 \geq \sum_{i=1}^{n} (x_i - \bar{x})^2$, and so $\mathrm{Var}(\tilde{\beta}_1) \leq \mathrm{Var}(\hat{\beta}_1)$. A more direct way to see this is to write $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n \bar{x}^2$, which is less than $\sum_{i=1}^{n} x_i^2$ unless $\bar{x} = 0$.

(iv) For a given sample size, the bias in $\tilde{\beta}_1$ increases as $\bar{x}$ increases (holding the sum of the $x_i^2$ fixed). But as $\bar{x}$ increases, the variance of $\hat{\beta}_1$ increases relative to $\mathrm{Var}(\tilde{\beta}_1)$. The bias in $\tilde{\beta}_1$ is also small when $\beta_0$ is small. Therefore, whether we prefer $\tilde{\beta}_1$ or $\hat{\beta}_1$ on a mean squared error basis depends on the sizes of $\beta_0$, $\bar{x}$, and $n$ (in addition to the size of $\sum_{i=1}^{n} x_i^2$).
2.9 (i) We follow the hint, noting that $\overline{c_1 y} = c_1 \bar{y}$ (the sample average of $c_1 y_i$ is $c_1$ times the sample average of $y_i$) and $\overline{c_2 x} = c_2 \bar{x}$. When we regress $c_1 y_i$ on $c_2 x_i$ (including an intercept) we use equation (2.19) to obtain the slope:

$$\tilde{\beta}_1 = \frac{\sum_{i=1}^{n} (c_2 x_i - c_2 \bar{x})(c_1 y_i - c_1 \bar{y})}{\sum_{i=1}^{n} (c_2 x_i - c_2 \bar{x})^2} = \frac{c_1 c_2 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{c_2^2 \sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{c_1}{c_2} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{c_1}{c_2} \hat{\beta}_1 .$$
From (2.17), we obtain the intercept as $\tilde\beta_0 = (c_1\bar y) - \tilde\beta_1(c_2\bar x) = (c_1\bar y) - [(c_1/c_2)\hat\beta_1](c_2\bar x) = c_1(\bar y - \hat\beta_1\bar x) = c_1\hat\beta_0$, because the intercept from regressing $y_i$ on $x_i$ is $(\bar y - \hat\beta_1\bar x)$.
(ii) We use the same approach from part (i) along with the fact that $\overline{(c_1 + y)} = c_1 + \bar y$ and $\overline{(c_2 + x)} = c_2 + \bar x$. Therefore, $(c_1 + y_i) - \overline{(c_1 + y)} = (c_1 + y_i) - (c_1 + \bar y) = y_i - \bar y$ and $(c_2 + x_i) - \overline{(c_2 + x)} = x_i - \bar x$. So $c_1$ and $c_2$ entirely drop out of the slope formula for the regression of $(c_1 + y_i)$ on $(c_2 + x_i)$, and $\tilde\beta_1 = \hat\beta_1$. The intercept is $\tilde\beta_0 = \overline{(c_1 + y)} - \tilde\beta_1\overline{(c_2 + x)} = (c_1 + \bar y) - \hat\beta_1(c_2 + \bar x) = (\bar y - \hat\beta_1\bar x) + c_1 - c_2\hat\beta_1 = \hat\beta_0 + c_1 - c_2\hat\beta_1$, which is what we wanted to show.
(iii) We can simply apply part (ii) because $\log(c_1 y_i) = \log(c_1) + \log(y_i)$. In other words, replace $c_1$ with $\log(c_1)$, $y_i$ with $\log(y_i)$, and set $c_2 = 0$.
(iv) Again, we can apply part (ii) with $c_1 = 0$, replacing $c_2$ with $\log(c_2)$ and $x_i$ with $\log(x_i)$. If $\hat\beta_0$ and $\hat\beta_1$ are the original intercept and slope, then $\tilde\beta_1 = \hat\beta_1$ and $\tilde\beta_0 = \hat\beta_0 - \hat\beta_1\log(c_2)$.
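The results in parts (i) and (ii) are easy to confirm numerically. In the sketch below, the constants c1 = 100 and c2 = 10 are arbitrary illustrative choices; the slope in the second regression should be (c1/c2) = 10 times the original slope and the intercept c1 = 100 times the original intercept:

* illustrative sketch: rescaling y and x rescales the OLS estimates as in part (i)
clear
set obs 200
set seed 321
gen x = runiform()
gen y = 1 + 2*x + rnormal()
reg y x
gen y2 = 100*y
gen x2 = 10*x
reg y2 x2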
2.10 (i) This derivation is essentially done in equation (2.52), once $(1/\mathrm{SST}_x)$ is brought inside the summation (which is valid because $\mathrm{SST}_x$ does not depend on i). Then, just define $w_i = d_i/\mathrm{SST}_x$.

(ii) Because $\mathrm{Cov}(\hat\beta_1, \bar u) = \mathrm{E}[(\hat\beta_1 - \beta_1)\bar u]$, we show that the latter is zero. But, from part (i),

$\mathrm{E}[(\hat\beta_1 - \beta_1)\bar u] = \mathrm{E}\!\left[\left(\sum_{i=1}^n w_i u_i\right)\bar u\right] = \sum_{i=1}^n w_i\, \mathrm{E}(u_i \bar u)$.

Because the $u_i$ are pairwise uncorrelated (they are independent), $\mathrm{E}(u_i \bar u) = \mathrm{E}(u_i^2/n) = \sigma^2/n$ (because $\mathrm{E}(u_i u_h) = 0$ for $i \ne h$). Therefore,

$\sum_{i=1}^n w_i\, \mathrm{E}(u_i \bar u) = \sum_{i=1}^n w_i (\sigma^2/n) = (\sigma^2/n) \sum_{i=1}^n w_i = 0$.

(iii) The formula for the OLS intercept is $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ and, plugging in $\bar y = \beta_0 + \beta_1 \bar x + \bar u$, gives $\hat\beta_0 = (\beta_0 + \beta_1 \bar x + \bar u) - \hat\beta_1 \bar x = \beta_0 + \bar u - (\hat\beta_1 - \beta_1)\bar x$.

(iv) Because $\hat\beta_1$ and $\bar u$ are uncorrelated,

$\mathrm{Var}(\hat\beta_0) = \mathrm{Var}(\bar u) + \bar x^2\, \mathrm{Var}(\hat\beta_1) = \sigma^2/n + \bar x^2 (\sigma^2/\mathrm{SST}_x) = \sigma^2/n + \sigma^2 \bar x^2/\mathrm{SST}_x$,

which is what we wanted to show.

(v) Using the hint and substitution gives

$\mathrm{Var}(\hat\beta_0) = \sigma^2[(\mathrm{SST}_x/n) + \bar x^2]\big/\mathrm{SST}_x = \sigma^2\left[n^{-1}\sum_{i=1}^n x_i^2 - \bar x^2 + \bar x^2\right]\Big/\mathrm{SST}_x = \sigma^2\left(n^{-1}\sum_{i=1}^n x_i^2\right)\Big/\mathrm{SST}_x$.
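Part (v) can be checked against the standard error that Stata reports for the intercept. The sketch below uses hypothetical simulated data; it rebuilds $\sigma^2(n^{-1}\sum x_i^2)/\mathrm{SST}_x$ from the summarize results (with $\hat\sigma^2$ taken as e(rmse)^2) and compares it with the square of the reported standard error:

* illustrative sketch: verify Var(beta0_hat) = sigma^2*(n^-1*sum of xi^2)/SSTx
clear
set obs 500
set seed 111
gen x = 5*runiform()
gen y = 1 + 2*x + 3*rnormal()
reg y x
sum x
* SSTx = (N-1)*Var(x); sum of xi^2 = SSTx + N*xbar^2
scalar SSTx = (r(N)-1)*r(Var)
scalar sumx2 = SSTx + r(N)*r(mean)^2
display "formula:  " e(rmse)^2*(sumx2/r(N))/SSTx
display "reported: " _se[_cons]^2

The two displayed numbers should agree to machine precision.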
2.11 (i) We would want to randomly assign the number of hours in the preparation course so that
hours is independent of other factors that affect performance on the SAT. Then, we would
collect information on SAT score for each student in the experiment, yielding a data set $\{(sat_i, hours_i): i = 1, \ldots, n\}$, where n is the number of students we can afford to have in the study. From equation (2.7), we should try to get as much variation in $hours_i$ as is feasible.
(ii) Here are three factors: innate ability, family income, and general health on the day of the
exam. If we think students with higher native intelligence think they do not need to prepare for
the SAT, then ability and hours will be negatively correlated. Family income would probably be
positively correlated with hours, because higher income families can more easily afford
preparation courses. Ruling out chronic health problems, health on the day of the exam should
be roughly uncorrelated with hours spent in a preparation course.
(iii) If preparation courses are effective, $\beta_1$ should be positive: other factors equal, an increase in hours should increase sat.
(iv) The intercept, $\beta_0$, has a useful interpretation in this example: because $\mathrm{E}(u) = 0$, $\beta_0$ is the average SAT score for students in the population with hours = 0.
2.12 (i) I will show the result without using calculus. Let 𝑦̅ be the sample average of the 𝑦𝑖 and
write

$\sum_{i=1}^n (y_i - b_0)^2 = \sum_{i=1}^n [(y_i - \bar y) + (\bar y - b_0)]^2$
$= \sum_{i=1}^n (y_i - \bar y)^2 + 2\sum_{i=1}^n (y_i - \bar y)(\bar y - b_0) + \sum_{i=1}^n (\bar y - b_0)^2$
$= \sum_{i=1}^n (y_i - \bar y)^2 + 2(\bar y - b_0)\sum_{i=1}^n (y_i - \bar y) + n(\bar y - b_0)^2$
$= \sum_{i=1}^n (y_i - \bar y)^2 + n(\bar y - b_0)^2$,

where we use the fact (see Appendix A) that $\sum_{i=1}^n (y_i - \bar y) = 0$ always. The first term does not depend on $b_0$ and the second term, $n(\bar y - b_0)^2$, which is nonnegative, is clearly minimized when $b_0 = \bar y$.

(ii) If we define $\hat u_i = y_i - \bar y$, then $\sum_{i=1}^n \hat u_i = \sum_{i=1}^n (y_i - \bar y)$ and we already used the fact that this sum is zero in the proof in part (i).
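Part (i) also has a one-line empirical counterpart: regressing y on a constant alone returns the sample mean as the estimated intercept. A minimal sketch with hypothetical data:

* minimal sketch: the constant-only regression returns ybar
clear
set obs 100
set seed 22
gen y = 5 + 3*rnormal()
* the coefficient on _cons below equals the sample mean of y
reg y
sum y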
SOLUTIONS TO COMPUTER EXERCISES
C2.1 (i) The average prate is about 87.36 and the average mrate is about .732.
(ii) The estimated equation is

$\widehat{prate}$ = 83.05 + 5.86 mrate

n = 1,534, R2 = .075.

(iii) The intercept implies that, even if mrate = 0, the predicted participation rate is 83.05 percent. The coefficient on mrate implies that a one-dollar increase in the match rate – a fairly large increase – is estimated to increase prate by 5.86 percentage points. This assumes, of course, that this change in prate is possible (if, say, prate is already at 98, this interpretation makes no sense).

(iv) If we plug mrate = 3.5 into the equation we get $\widehat{prate}$ = 83.05 + 5.86(3.5) = 103.59. This is impossible, as we can have at most a 100 percent participation rate. This illustrates that, especially when dependent variables are bounded, a simple regression model can give strange predictions for extreme values of the independent variable. (In the sample of 1,534 firms, only 34 have mrate ≥ 3.5.)
(v) mrate explains about 7.5% of the variation in prate. This is not much, and suggests that
many other factors influence 401(k) plan participation rates.
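For readers replicating this exercise, a sketch of the commands follows. It assumes the Wooldridge data file is named 401k.dta and sits in the working directory; the file name is an assumption, so adjust it to your installation:

* sketch, assuming the data file 401k.dta is available
use 401k, clear
sum prate mrate
reg prate mrate
* fitted participation rate at mrate = 3.5, as in part (iv)
display _b[_cons] + _b[mrate]*3.5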
C2.2 (i) Average salary is about 865.864, which means $865,864 because salary is in thousands
of dollars. Average ceoten is about 7.95.
(ii) There are five CEOs with ceoten = 0. The longest tenure is 37 years.
(iii) The estimated equation is

$\widehat{\log(salary)}$ = 6.51 + .0097 ceoten

n = 177, R2 = .013.

We obtain the approximate percentage change in salary given $\Delta ceoten = 1$ by multiplying the coefficient on ceoten by 100: 100(.0097) = .97%. Therefore, one more year as CEO is predicted to increase salary by almost 1%.
C2.3 (i) The estimated equation is

$\widehat{sleep}$ = 3,586.4 – .151 totwrk

n = 706, R2 = .103.
The intercept implies that the estimated amount of sleep per week for someone who does not
work is 3,586.4 minutes, or about 59.77 hours. This comes to about 8.5 hours per night.
(ii) If someone works two more hours per week, then $\Delta totwrk = 120$ (because totwrk is measured in minutes), and so $\Delta\widehat{sleep}$ = –.151(120) = –18.12 minutes. This is only a few minutes a night. If someone were to work one more hour on each of five working days, $\Delta\widehat{sleep}$ = –.151(300) = –45.3 minutes, or about nine minutes a night.
C2.4 (i) Average salary is about $957.95 and average IQ is about 101.28. The sample standard
deviation of IQ is about 15.05, which is pretty close to the population value of 15.
(ii) This calls for a level-level model:

$\widehat{wage}$ = 116.99 + 8.30 IQ

n = 935, R2 = .096.

An increase in IQ of 15 increases predicted monthly salary by 8.30(15) = $124.50 (in 1980 dollars). IQ score does not even explain 10% of the variation in wage.

(iii) This calls for a log-level model:

$\widehat{\log(wage)}$ = 5.89 + .0088 IQ

n = 935, R2 = .099.

If $\Delta IQ = 15$, then $\Delta\widehat{\log(wage)}$ = .0088(15) = .132, which is the (approximate) proportionate change in predicted wage. The percentage increase is therefore approximately 13.2.
C2.5 (i) The constant elasticity model is a log-log model:

log(rd) = $\beta_0$ + $\beta_1$ log(sales) + u,

where $\beta_1$ is the elasticity of rd with respect to sales.

(ii) The estimated equation is

$\widehat{\log(rd)}$ = –4.105 + 1.076 log(sales)

n = 32, R2 = .910.
The estimated elasticity of rd with respect to sales is 1.076, which is just above one. A one
percent increase in sales is estimated to increase rd by about 1.08%.
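A sketch of the commands for this exercise follows. It assumes the Wooldridge file rdchem.dta contains the variables rd and sales (the file and variable names are assumptions); new names are used for the logs in case the file already stores them:

* sketch, assuming rdchem.dta contains rd and sales
use rdchem, clear
gen log_rd = log(rd)
gen log_sales = log(sales)
reg log_rd log_sales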
C2.6 (i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books and computers, and toward hiring better qualified teachers. At high levels of spending, we would expect little, if any, effect because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.
(ii) If we take changes, as usual, we obtain

$\Delta math10 = \beta_1 \Delta\log(expend) = (\beta_1/100)(\%\Delta expend)$,

just as in the second row of Table 2.3. So, if $\%\Delta expend = 10$, then $\Delta math10 = \beta_1/10$.

(iii) The regression results are

$\widehat{math10}$ = –69.34 + 11.16 log(expend)

n = 408, R2 = .0297.
(iv) If expend increases by 10 percent, $\widehat{math10}$ increases by about 1.1 percentage points. This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.

(v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.
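A sketch of the commands, assuming the Wooldridge file meap93.dta contains math10 and expend (the file and variable names are assumptions):

* sketch, assuming meap93.dta contains math10 and expend
use meap93, clear
gen log_expend = log(expend)
reg math10 log_expend
* fitted values, to check the claim about the largest one
predict math10_hat
sum math10 math10_hat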
C2.7 (i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not
give a gift, or about 60 percent.
(ii) The average mailings per year is about 2.05. The minimum value is .25 (which
presumably means that someone has been on the mailing list for at least four years) and the
maximum value is 3.5.
(iii) The estimated equation is

$\widehat{gift}$ = 2.01 + 2.65 mailsyear

n = 4,268, R2 = .0138.
(iv) The slope coefficient from part (iii) means that each mailing per year is associated with –
perhaps even “causes” – an estimated 2.65 additional guilders, on average. Therefore, if each
mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders.
This is only the average, however. Some mailings generate no contributions, or a contribution less than the mailing cost; other mailings generate much more than the mailing cost.

(v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gift is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have
received no mailings, the smallest predicted value is about two. So, with this estimated equation,
we never predict zero charitable gifts.
C2.8 There is no “correct” answer to this question because all answers depend on how the random outcomes are generated. I used Stata 11 and, before generating the outcomes on the $x_i$, I set the seed to the value 123. I reset the seed to 123 to generate the outcomes on the $u_i$. Specifically, to answer parts (i) through (v), I used the sequence of commands
set obs 500
set seed 123
* draw the x(i) from a uniform distribution on [0, 10]
gen x = 10*runiform()
sum x
set seed 123
* draw the u(i) as Normal with mean 0 and standard deviation 6
gen u = 6*rnormal()
sum u
* generate y(i) from the population model with beta0 = 1, beta1 = 2
gen y = 1 + 2*x + u
reg y x
* store the OLS residuals and form x(i)*uhat(i)
predict uh, resid
gen x_uh = x*uh
sum uh x_uh
* form x(i)*u(i) using the true (here observable) errors
gen x_u = x*u
sum u x_u
(i) The sample mean of the $x_i$ is about 4.912 with a sample standard deviation of about 2.874.

(ii) The sample average of the $u_i$ is about .221, which is pretty far from zero. We do not get zero because this is just a sample of 500 from a population with a zero mean. The current sample is “unlucky” in the sense that the sample average is far from the population average. The sample standard deviation is about 5.768, which is nontrivially below 6, the population value.
(iii) After generating the data on $y_i$ and running the regression, I get, rounding to three decimal places, $\hat\beta_0 = 1.862$ and $\hat\beta_1 = 1.870$.
The population values are 1 and 2, respectively. Thus, the estimated intercept based on this
sample of data is well above the population value. The estimated slope is somewhat below the
population value, 2. When we sample from a population our estimates contain sampling error;
that is why the estimates differ from the population values.
(iv) When I use the command sum uh x_uh and multiply by 500 I get, using scientific notation, sums equal to 4.181e-06 and .00003776, respectively. These are zero for practical purposes, and differ from zero only because of the rounding error inherent in finite machine precision (which is unimportant).
(v) We already computed the sample average of the $u_i$ in part (ii). Multiplying it by 500 gives the sum, about 110.74. The sum of the $x_i u_i$ is about 6.46. Neither is close to zero, and nothing says they should be particularly close.
(vi) For this part I set the seed to 789. The sample average and standard deviation of the $x_i$ are about 5.030 and 2.913; those for the $u_i$ are about –.077 and 5.979. When I generate the $y_i$ and run the regression I get $\hat\beta_0 = .701$ and $\hat\beta_1 = 2.044$.
These are different from those in part (iii) because they are obtained from a different random
sample. Here, for both the intercept and slope, we get estimates that are much closer to the
population values. Of course, in practice we would never know that.
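For completeness, a sketch of the part (vi) rerun follows the same pattern as the commands listed above; that the seed is reset before drawing the errors, as in parts (i) through (v), is my assumption:

* sketch of the part (vi) rerun with seed 789
clear
set obs 500
set seed 789
gen x = 10*runiform()
set seed 789
gen u = 6*rnormal()
gen y = 1 + 2*x + u
reg y x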
CHAPTER 3
TEACHING NOTES
For undergraduates, I do not work through most of the derivations in this chapter, at least not in
detail. Rather, I focus on interpreting the assumptions, which mostly concern the population.
Other than random sampling, the only assumption that involves more than population
considerations is the assumption about no perfect collinearity, where the possibility of perfect
collinearity in the sample (even if it does not occur in the population) should be touched on. The
more important issue is perfect collinearity in the population, but this is fairly easy to dispense
with via examples. These come from my experiences with the kinds of model specification
issues that beginners have trouble with.
The comparison of simple and multiple regression estimates – based on the particular sample at
hand, as opposed to their statistical properties – usually makes a strong impression. Sometimes I
do not bother with the “partialling out” interpretation of multiple regression.
As far as statistical properties, notice how I treat the problem of including an irrelevant variable: no separate derivation is needed, as the result follows from Theorem 3.1.
I do like to derive the omitted variable bias in the simple case. This is not much more difficult
than showing unbiasedness of OLS in the simple regression case under the first four Gauss-
Markov assumptions. It is important to get the students thinking about this problem early on,
and before too many additional (unnecessary) assumptions have been introduced.
I have intentionally kept the discussion of multicollinearity to a minimum. This partly indicates
my bias, but it also reflects reality. It is, of course, very important for students to understand the
potential consequences of having highly correlated independent variables. But this is often
beyond our control, except that we can ask less of our multiple regression analysis. If two or
more explanatory variables are highly correlated in the sample, we should not expect to precisely
estimate their ceteris paribus effects in the population.
I find extensive treatments of multicollinearity, where one “tests” or somehow “solves” the
multicollinearity problem, to be misleading, at best. Even the organization of some texts gives
the impression that imperfect collinearity is somehow a violation of the Gauss-Markov
assumptions. In fact, they include multicollinearity in a chapter or part of the book devoted to
“violation of the basic assumptions,” or something like that. I have noticed that master’s
students who have had some undergraduate econometrics are often confused on the
multicollinearity issue. It is very important that students not confuse multicollinearity among the
included explanatory variables in a regression model with the bias caused by omitting an
important variable.
I do not prove the Gauss-Markov theorem. Instead, I emphasize its implications. Sometimes, and
certainly for advanced beginners, I put a special case of Problem 3.12 on a midterm exam, where
I make a particular choice for the function g(x). Rather than have the students directly compare
the variances, they should appeal to the Gauss-Markov theorem for the superiority of OLS over
any other linear, unbiased estimator.
SOLUTIONS TO PROBLEMS
3.1 (i) hsperc is defined so that the smaller it is, the higher the student’s standing in high school. Everything else equal, the worse the student’s standing in high school (the larger hsperc is), the lower is his/her expected college GPA.
(ii) Just plug these values into the equation: $\widehat{colgpa}$ = 1.392 − .0135(20) + .00148(1050) = 2.676.

(iii) The difference between A and B is simply 140 times the coefficient on sat, because hsperc is the same for both students. So A is predicted to have a score .00148(140) ≈ .207 higher.

(iv) With hsperc fixed, $\Delta\widehat{colgpa}$ = .00148 Δsat. Now, we want to find Δsat such that $\Delta\widehat{colgpa}$ = .5, so .5 = .00148(Δsat) or Δsat = .5/(.00148) ≈ 338. Perhaps not surprisingly, a large ceteris paribus difference in SAT score – almost two and one-half standard deviations – is needed to obtain a predicted difference in college GPA of half a point.
3.2 (i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a
family, the less education any one child in the family has. To find the increase in the number of
siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.
(ii) Holding sibs and feduc fixed, one more year of mother’s education implies .131 years
more of predicted education. So if a mother has four more years of education, her son is
predicted to have about a half a year (.524) more years of education.
(iii) Since the number of siblings is the same, but meduc and feduc are both different, the
coefficients on meduc and feduc both need to be accounted for. The predicted difference in
education between B and A is .131(4) + .210(4) = 1.364.
3.3 (i) If adults trade off sleep for work, more work implies less sleep (other things equal), so $\beta_1$ < 0.

(ii) The signs of $\beta_2$ and $\beta_3$ are not obvious, at least to me. One could argue that more educated people like to get more out of life, and so, other things equal, they sleep less ($\beta_2$ < 0). The relationship between sleeping and age is more complicated than this model suggests, and economists are not in the best position to judge such things.
(iii) Since totwrk is in minutes, we must convert five hours into minutes: Δtotwrk = 5(60) = 300. Then sleep is predicted to fall by .148(300) = 44.4 minutes. For a week, 45 minutes less sleep is not an overwhelming change.
(iv) More education implies less predicted time sleeping, but the effect is quite small. If
we assume the difference between college and high school is four years, the college graduate
sleeps about 45 minutes less per week, other things equal.
(v) Not surprisingly, the three explanatory variables explain only about 11.3% of the
variation in sleep. One important factor in the error term is general health. Another is marital
status, and whether the person has children. Health (however we measure that), marital status,
and number and ages of children would generally be correlated with totwrk. (For example, less
healthy people would tend to work less.)
3.4 (i) A larger rank for a law school means that the school has less prestige; this lowers
starting salaries. For example, a rank of 100 means there are 99 schools thought to be better.
(ii) $\beta_1$ > 0, $\beta_2$ > 0. Both LSAT and GPA are measures of the quality of the entering class. No matter where better students attend law school, we expect them to earn more, on average. $\beta_3$, $\beta_4$ > 0. The number of volumes in the law library and the tuition cost are both measures of the school quality. (Cost is less obvious than library volumes, but should reflect quality of the faculty, physical plant, and so on.)
(iii) This is just the coefficient on GPA, multiplied by 100: 24.8%.
(iv) This is an elasticity: a one percent increase in library volumes implies a .095%
increase in predicted median starting salary, other things equal.
(v) It is definitely better to attend a law school with a lower rank. If law school A has a
ranking 20 less than law school B, the predicted difference in starting salary is 100(.0033)(20) =
6.6% higher for law school A.
3.5 (i) No. By definition, study + sleep + work + leisure = 168. Therefore, if we change study,
we must change at least one of the other categories so that the sum is still 168.
(ii) From part (i), we can write, say, study as a perfect linear function of the other independent variables: study = 168 − sleep − work − leisure. This holds for every observation, so MLR.3 is violated.

(iii) Simply drop one of the independent variables, say leisure:

GPA = $\beta_0$ + $\beta_1$ study + $\beta_2$ sleep + $\beta_3$ work + u.
Now, for example, $\beta_1$ is interpreted as the change in GPA when study increases by one hour, where sleep, work, and u are all held fixed. If we are holding sleep and work fixed but increasing study by one hour, then we must be reducing leisure by one hour. The other slope parameters have a similar interpretation.
3.6 Conditioning on the outcomes of the explanatory variables, we have $\mathrm{E}(\hat\theta_1) = \mathrm{E}(\hat\beta_1 + \hat\beta_2) = \mathrm{E}(\hat\beta_1) + \mathrm{E}(\hat\beta_2) = \beta_1 + \beta_2 = \theta_1$.
3.7 Only (ii), omitting an important variable, can cause bias, and this is true only when the omitted variable is correlated with the included explanatory variables. The homoskedasticity assumption, MLR.5, played no role in showing that the OLS estimators are unbiased. (Homoskedasticity was used to obtain the usual variance formulas for the $\hat\beta_j$.) Further, the degree of collinearity between the explanatory variables in the sample, even if it is reflected in a correlation as high as .95, does not affect the Gauss-Markov assumptions. Only if there is a perfect linear relationship among two or more explanatory variables is MLR.3 violated.
3.8 We can use Table 3.2. By definition, $\beta_2$ > 0, and by assumption, Corr($x_1$, $x_2$) < 0. Therefore, there is a negative bias in $\tilde\beta_1$: $\mathrm{E}(\tilde\beta_1) < \beta_1$. This means that, on average across different random samples, the simple regression estimator underestimates the effect of the training program. It is even possible that $\mathrm{E}(\tilde\beta_1)$ is negative even though $\beta_1$ > 0.
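A simulation makes the sign of the bias visible. In the hypothetical sketch below, $\beta_2 = 1 > 0$ and $x_2$ is built to be negatively correlated with $x_1$, so the simple regression slope on $x_1$ centers on roughly 1 + 1(−.5) = .5, well below the true $\beta_1 = 1$:

* illustrative sketch of downward omitted-variable bias
clear
set obs 5000
set seed 42
gen x1 = rnormal()
gen x2 = -.5*x1 + rnormal()
gen y = 1 + x1 + x2 + rnormal()
* simple regression: slope biased toward roughly .5
reg y x1
* multiple regression: slope close to the true value 1
reg y x1 x2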
3.9 (i) $\beta_1$ < 0 because more pollution can be expected to lower housing values; note that $\beta_1$ is the elasticity of price with respect to nox. $\beta_2$ is probably positive because rooms roughly measures the size of a house. (However, it does not allow us to distinguish homes where each room is large from homes where each room is small.)

(ii) If we assume that rooms increases with quality of the home, then log(nox) and rooms are negatively correlated when poorer neighborhoods have more pollution, something that is often true. We can use Table 3.2 to determine the direction of the bias. If $\beta_2$ > 0 and Corr($x_1$, $x_2$) < 0, the simple regression estimator $\tilde\beta_1$ has a downward bias. But because $\beta_1$ < 0, this means that the simple regression, on average, overstates the importance of pollution. [$\mathrm{E}(\tilde\beta_1)$ is more negative than $\beta_1$.]

(iii) This is what we expect from the typical sample based on our analysis in part (ii). The simple regression estimate, −1.043, is more negative (larger in magnitude) than the multiple regression estimate, −.718. As those estimates are only for one sample, we can never know which is closer to $\beta_1$. But if this is a “typical” sample, $\beta_1$ is closer to −.718.
3.10 (i) Because $x_1$ is highly correlated with $x_2$ and $x_3$, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on $x_1$ can differ by large
amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.

(ii) Here we would expect $\tilde\beta_1$ and $\hat\beta_1$ to be similar (subject, of course, to what we mean by “almost uncorrelated”). The amount of correlation between $x_2$ and $x_3$ does not directly affect the multiple regression estimate on $x_1$ if $x_1$ is essentially uncorrelated with $x_2$ and $x_3$.

(iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: $x_2$ and $x_3$ have small partial effects on y, and yet $x_2$ and $x_3$ are highly correlated with $x_1$. Adding $x_2$ and $x_3$ likely increases the standard error of the coefficient on $x_1$ substantially, so se($\hat\beta_1$) is likely to be much larger than se($\tilde\beta_1$).

(iv) In this case, adding $x_2$ and $x_3$ will decrease the residual variance without causing much collinearity (because $x_1$ is almost uncorrelated with $x_2$ and $x_3$), so we should see se($\hat\beta_1$) smaller than se($\tilde\beta_1$). The amount of correlation between $x_2$ and $x_3$ does not directly affect se($\hat\beta_1$).
3.11 From equation (3.22) we have

$\tilde\beta_1 = \dfrac{\sum_{i=1}^n \hat r_{i1} y_i}{\sum_{i=1}^n \hat r_{i1}^2}$,

where the $\hat r_{i1}$ are defined in the problem. As usual, we must plug in the true model for $y_i$:

$\tilde\beta_1 = \dfrac{\sum_{i=1}^n \hat r_{i1} (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + u_i)}{\sum_{i=1}^n \hat r_{i1}^2}$.

The numerator of this expression simplifies because $\sum_{i=1}^n \hat r_{i1} = 0$, $\sum_{i=1}^n \hat r_{i1} x_{i2} = 0$, and $\sum_{i=1}^n \hat r_{i1} x_{i1} = \sum_{i=1}^n \hat r_{i1}^2$. These all follow from the fact that the $\hat r_{i1}$ are the residuals from the regression of $x_{i1}$ on $x_{i2}$: the $\hat r_{i1}$ have zero sample average and are uncorrelated in sample with $x_{i2}$. So the numerator of $\tilde\beta_1$ can be expressed as
$\beta_1 \sum_{i=1}^n \hat r_{i1}^2 + \beta_3 \sum_{i=1}^n \hat r_{i1} x_{i3} + \sum_{i=1}^n \hat r_{i1} u_i$.

Putting these back over the denominator gives

$\tilde\beta_1 = \beta_1 + \beta_3 \dfrac{\sum_{i=1}^n \hat r_{i1} x_{i3}}{\sum_{i=1}^n \hat r_{i1}^2} + \dfrac{\sum_{i=1}^n \hat r_{i1} u_i}{\sum_{i=1}^n \hat r_{i1}^2}$.

Conditional on all sample values on $x_1$, $x_2$, and $x_3$, only the last term is random due to its dependence on $u_i$. But $\mathrm{E}(u_i) = 0$, and so

$\mathrm{E}(\tilde\beta_1) = \beta_1 + \beta_3 \dfrac{\sum_{i=1}^n \hat r_{i1} x_{i3}}{\sum_{i=1}^n \hat r_{i1}^2}$,

which is what we wanted to show. Notice that the term multiplying $\beta_3$ is the regression coefficient from the simple regression of $x_{i3}$ on $\hat r_{i1}$.
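The algebra can be illustrated numerically: residualize $x_1$ with respect to $x_2$ only, regress y on those residuals, and compare with the full regression. The sketch below uses hypothetical simulated data in which $x_3$ loads on $x_1$, so the term multiplying $\beta_3$ is roughly .8 and the short regression slope centers near 1 + 1(.8) = 1.8 rather than $\beta_1 = 1$:

* illustrative sketch of the omitted-x3 term in the 3.11 expression
clear
set obs 2000
set seed 7
gen x2 = rnormal()
gen x1 = .5*x2 + rnormal()
gen x3 = .8*x1 + rnormal()
gen y = 1 + x1 + x2 + x3 + rnormal()
* r1hat: residuals from regressing x1 on x2 alone (x3 omitted)
reg x1 x2
predict r1hat, resid
* slope on r1hat is beta1 plus the beta3 term derived above
reg y r1hat
* the full regression recovers beta1 itself
reg y x1 x2 x3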
3.12 (i) The shares, by definition, add to one. If we do not omit one of the shares then the
equation would suffer from perfect multicollinearity. The parameters would not have a ceteris
paribus interpretation, as it is impossible to change one share while holding all of the other
shares fixed.
(ii) Because each share is a proportion (and can be at most one, when all other shares are zero), it makes little sense to increase sharep by one unit. If sharep increases by .01 – which is equivalent to a one percentage point increase in the share of property taxes in total revenue – holding shareI, shareS, and the other factors fixed, then growth increases by $\beta_1$(.01). With the other shares fixed, the excluded share, shareF, must fall by .01 when sharep increases by .01.
3.13 (i) For notational simplicity, define $s_{zx} = \sum_{i=1}^n (z_i - \bar z) x_i$; this is not quite the sample covariance between z and x because we do not divide by n – 1, but we are only using it to simplify notation. Then we can write $\tilde\beta_1$ as

$\tilde\beta_1 = \dfrac{\sum_{i=1}^n (z_i - \bar z) y_i}{s_{zx}}$.
This is clearly a linear function of the $y_i$: take the weights to be $w_i = (z_i - \bar z)/s_{zx}$. To show unbiasedness, as usual we plug $y_i = \beta_0 + \beta_1 x_i + u_i$ into this equation and simplify:

$\tilde\beta_1 = \dfrac{\sum_{i=1}^n (z_i - \bar z)(\beta_0 + \beta_1 x_i + u_i)}{s_{zx}} = \dfrac{\beta_0 \sum_{i=1}^n (z_i - \bar z) + \beta_1 s_{zx} + \sum_{i=1}^n (z_i - \bar z) u_i}{s_{zx}} = \beta_1 + \dfrac{\sum_{i=1}^n (z_i - \bar z) u_i}{s_{zx}}$,

where we use the fact that $\sum_{i=1}^n (z_i - \bar z) = 0$ always. Now $s_{zx}$ is a function of the $z_i$ and $x_i$, and the expected value of each $u_i$ is zero conditional on all $z_i$ and $x_i$ in the sample. Therefore, conditional on these values,

$\mathrm{E}(\tilde\beta_1) = \beta_1 + \dfrac{\sum_{i=1}^n (z_i - \bar z)\,\mathrm{E}(u_i)}{s_{zx}} = \beta_1$,

because $\mathrm{E}(u_i) = 0$ for all i.
(ii) From the final expression for $\tilde\beta_1$ in part (i) we have (again conditional on the $z_i$ and $x_i$ in the sample),

$\mathrm{Var}(\tilde\beta_1) = \dfrac{\mathrm{Var}\!\left(\sum_{i=1}^n (z_i - \bar z) u_i\right)}{s_{zx}^2} = \dfrac{\sum_{i=1}^n (z_i - \bar z)^2\, \mathrm{Var}(u_i)}{s_{zx}^2} = \dfrac{\sigma^2 \sum_{i=1}^n (z_i - \bar z)^2}{s_{zx}^2}$,

because of the homoskedasticity assumption [$\mathrm{Var}(u_i) = \sigma^2$ for all i]. Given the definition of $s_{zx}$, this is what we wanted to show.
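As a numerical check on part (i), the estimator can be computed by hand from one simulated sample (all values below are hypothetical) and compared with the truth:

* illustrative sketch: the part (i) estimator computed by hand
clear
set obs 2000
set seed 99
gen z = rnormal()
gen x = z + rnormal()
gen y = 2 + 3*x + rnormal()
sum z
gen zd = z - r(mean)
gen num = zd*y
gen den = zd*x
sum num
scalar top = r(sum)
sum den
display "beta1_tilde = " top/r(sum) "   (true beta1 = 3)"

With n = 2,000 the displayed estimate should land close to 3, consistent with unbiasedness.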