Predictive Modeling of Serious Crime Using County Demographic Data: Stats Final Exam Part 2

Final exam solving predictive modeling problems for crime analysis using demographic data.

Andrew Taylor
Contributor
Stats Final Exam Part 2
Given the data set called County Demographic Information, construct a predictive model for
the variable “Total Serious Crime” using some or all of the other variables in the set of data.
The model should be mathematically valid, accurate and reliable.
Total Serious Crime is Variable #8
Other Variables:
#2 Land Area
#3 Total Population
#4 Percent of Population aged 18-34
#5 Percent of Population 65 or over
#6 Number of Active Physicians
#7 Number of Hospital Beds
#9 Percent of High School Graduates
#10 Percent of Population with College Degrees
#11 Percent of Population below poverty level
#12 Unemployment Percent
#13 Per Capita Income
#14 Total Personal Income
#15 Geographic Region
Note: I am omitting the data set to simplify this problem; the following analyses use the data
set described above, and you can assume the math is calculated correctly. I am testing to see
if you can identify which analytical techniques may be validly employed and how effective
they are in building a model.
Variables 2 to 14 are numeric variables and variable 15 is categorical.
Analysis #1
In the given data set, we were asked to determine whether an accurate predictive model for
Variable #8, Total Serious Crime, could be found using the attached data.
Since Variable 15 is categorical, ordinary regression on its numeric codes was not
appropriate; so I used one-way Analysis of Variance (ANOVA) to examine whether there was a
significant relationship between Variable 8 and Variable 15. The results (using Systat 13.0)
are printed below.
Variables            Levels
VAR(15) (4 levels)   1.000   2.000   3.000   4.000

Dependent Variable   VAR(8)
N                    440
Multiple R           0.110
Squared Multiple R   0.012

Estimates of Effects B = (X'X)^-1 X'Y
Factor     Level   VAR(8)
CONSTANT           28,017.368
VAR(15)    1       -4,931.339
VAR(15)    2       -6,236.627
VAR(15)    3       -1,026.394

Analysis of Variance
Source     Type III SS   df    Mean Squares   F-Ratio   p-Value
VAR(15)    1.795E+010    3     5.985E+009     1.774     0.151
Error      1.471E+012    436   3.374E+009
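The entries in the ANOVA table can be cross-checked from the sums of squares and degrees of freedom alone. The short Python sketch below is not part of the Systat output; the variable names are illustrative, and it assumes that with a single factor the Squared Multiple R equals the factor sum of squares divided by the total (factor plus error) sum of squares. Because the printed sums of squares are rounded to four significant figures, the recomputed values agree with the printout only up to rounding.

```python
# Values copied from the Systat ANOVA table; names are illustrative.
ss_factor = 1.795e10   # Type III SS for VAR(15)
ss_error = 1.471e12    # Error SS
df_factor = 3          # 4 region levels - 1
df_error = 436

# Mean square = sum of squares / degrees of freedom.
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error

# F-ratio compares between-region variation to within-region variation.
f_ratio = ms_factor / ms_error

# Assumption: one factor only, so R^2 = SS_factor / (SS_factor + SS_error).
r_squared = ss_factor / (ss_factor + ss_error)
multiple_r = r_squared ** 0.5

# Total sample size implied by the degrees of freedom.
n = df_factor + df_error + 1

print(f"MS factor = {ms_factor:.3e}, MS error = {ms_error:.3e}")
print(f"F = {f_ratio:.3f}, R^2 = {r_squared:.3f}, R = {multiple_r:.3f}, N = {n}")
```

The Squared Multiple R, not the p-value, is the quantity that measures the share of variation in Variable 8 accounted for by Variable 15.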
ANOVA results suggest that Variable 15 is not significantly related to Variable 8
(F-Ratio = 1.774, p-Value = 0.151, which is above the usual 0.05 threshold), and
Variable 15 explains only approximately 1.2% of the variation in Variable 8
(Squared Multiple R = 0.012).
Therefore, I conclude that Variable 15 by itself is not a useful predictor of Variable 8.
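To see why Variable 15 contributes so little, the effect estimates can be converted into estimated mean crime counts per region and compared with the within-region spread. The sketch below is a hedged illustration, not something the printout states: it assumes Systat used sum-to-zero (effects) coding, so that the unprinted effect for level 4 is the negative of the sum of the other three. The names `effects`, `group_means`, and `root_mse` are illustrative.

```python
# Values copied from the "Estimates of Effects" block of the Systat output.
constant = 28017.368
effects = {1: -4931.339, 2: -6236.627, 3: -1026.394}

# ASSUMPTION: sum-to-zero (effects) coding, so the four effects add to 0
# and the level-4 effect is minus the sum of the three printed effects.
effects[4] = -sum(effects.values())

# Under that coding, each region's estimated mean is constant + effect.
group_means = {level: constant + effect for level, effect in effects.items()}

# Root mean square error: typical within-region deviation of Variable 8.
root_mse = 3.374e9 ** 0.5

# Largest gap between any two estimated region means.
spread = max(group_means.values()) - min(group_means.values())

for level in sorted(group_means):
    print(f"Region {level}: estimated mean = {group_means[level]:,.3f}")
print(f"Spread of region means = {spread:,.0f}, root MSE = {root_mse:,.0f}")
```

Under this assumption, the gap between the highest and lowest region means is well under half of the root mean square error, which is consistent with the small Squared Multiple R in the output.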