In Part B of this lab we were tasked with examining one type of crime from the data provided for Mecklenburg County (Charlotte, NC). I chose to examine Auto Thefts by performing both OLS (Ordinary Least Squares) and GWR (Geographically Weighted Regression) and comparing their results.
OLS fits a single, global linear regression equation that models (and can predict) values of a dependent variable from its relationship with one or more explanatory (independent) variables.
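The Python below is a minimal sketch of what an OLS fit computes. It is not the GIS tool's own implementation. It uses NumPy's `lstsq` to find the intercept and slopes that minimize the sum of squared residuals, and it also reports Adjusted R2. The function name `ols_fit` and the toy data are hypothetical and exist only for illustration.

```python
import numpy as np


def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: (n, k) array of explanatory variables (no intercept column).
    y: (n,) array of the dependent variable.
    Returns (coefficients with the intercept first, residuals, adjusted R^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # A column of ones makes the first coefficient the intercept.
    design = np.column_stack([np.ones(n), X])
    # lstsq minimizes the sum of squared residuals ||y - design @ beta||^2.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes each additional explanatory variable.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, residuals, adj_r2


# Toy data (made up, not the lab data): y = 2 + 3*x1 - x2 exactly.
X = np.array([[1, 0], [2, 1], [3, 5], [4, 2], [5, 3]], dtype=float)
y = 2 + 3 * X[:, 0] - X[:, 1]
beta, residuals, adj_r2 = ols_fit(X, y)
print(np.round(beta, 3), round(adj_r2, 3))  # recovers intercept 2 and slopes 3, -1
```

OLS produces one set of coefficients for the whole dataset, so the fit has no notion of where each feature is located.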
After performing OLS, one can map the residuals to examine their spatial autocorrelation. The OLS output report also lists the Koenker statistic, which tests for nonstationarity (and heteroscedasticity). "If the test is significant, the model is not stationary, which means that the relationships between variables changes substantially across the study area. This presents a good reason for a local regression model." GWR, a local model, can then be used to reduce the residuals. GWR fits a separate linear regression equation for every feature in the dataset and gives nearby features more weight than distant ones, so the coefficients can vary from neighborhood to neighborhood. OLS, by contrast, fits one equation for the whole study area and does not take location or distance into account.
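The Koenker statistic is Koenker's studentized form of the Breusch-Pagan test. The sketch below assumes the common n times R2 formulation, and the GIS software may differ in details. It regresses the squared OLS residuals on the explanatory variables and multiplies the R2 of that auxiliary regression by the number of observations. A large value relative to a chi-square distribution with k degrees of freedom suggests that the relationship is not consistent. The `koenker_bp` name and the toy residuals are hypothetical.

```python
import numpy as np


def koenker_bp(X, residuals):
    """Koenker's studentized Breusch-Pagan statistic in the n * R^2 form.

    X: (n, k) array of explanatory variables (no intercept column).
    residuals: (n,) array of OLS residuals.
    Returns (statistic, degrees of freedom for the chi-square reference).
    """
    X = np.asarray(X, dtype=float)
    e2 = np.asarray(residuals, dtype=float) ** 2
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    # Auxiliary regression: squared residuals on the explanatory variables.
    gamma, *_ = np.linalg.lstsq(design, e2, rcond=None)
    fitted = design @ gamma
    ss_res = float(((e2 - fitted) ** 2).sum())
    ss_tot = float(((e2 - e2.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return n * r2, k


# Toy example (made up): residual spread grows with the first variable,
# so the squared residuals equal x1 exactly.
X = np.array([[1, 0], [2, 1], [3, 5], [4, 2], [5, 3], [6, 7], [7, 4], [8, 6]], dtype=float)
residuals = np.sqrt(X[:, 0]) * np.array([1, -1, 1, -1, 1, -1, 1, -1])
stat, df = koenker_bp(X, residuals)
# 5.991 is the 95th percentile of a chi-square distribution with 2 df.
print(stat, df, stat > 5.991)
```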
To predict auto thefts, I first examined the correlation coefficient matrix below. It shows strong multicollinearity between the Med_Income variable and the Black_Per, Rent_Per, and Hu_Value variables (|r| of roughly 0.73 to 0.83).
| | Rate | BLACK_PER | HISP_PER | RENT_PER | MED_INCOME | HU_VALUE |
|---|---|---|---|---|---|---|
| Rate | 1 | | | | | |
| BLACK_PER | 0.467505903 | 1 | | | | |
| HISP_PER | 0.140724435 | 0.150039801 | 1 | | | |
| RENT_PER | 0.496543551 | 0.555423624 | 0.4400933 | 1 | | |
| MED_INCOME | -0.501135457 | -0.729860109 | -0.337189485 | -0.73300131 | 1 | |
| HU_VALUE | -0.291086137 | -0.648849511 | -0.354489318 | -0.427515948 | 0.830892093 | 1 |
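The short sketch below makes the multicollinearity check explicit. It copies the pairwise correlations from the matrix above and flags every pair of explanatory variables whose absolute correlation meets a threshold. Rate, the dependent variable, is excluded. The 0.7 cutoff is an assumed rule of thumb and was not a value used in the lab. With that cutoff, the three flagged pairs are exactly the Med_Income pairs described above.

```python
# Pairwise Pearson correlations copied from the matrix above.
CORRELATIONS = {
    ("Rate", "BLACK_PER"): 0.467505903,
    ("Rate", "HISP_PER"): 0.140724435,
    ("Rate", "RENT_PER"): 0.496543551,
    ("Rate", "MED_INCOME"): -0.501135457,
    ("Rate", "HU_VALUE"): -0.291086137,
    ("BLACK_PER", "HISP_PER"): 0.150039801,
    ("BLACK_PER", "RENT_PER"): 0.555423624,
    ("BLACK_PER", "MED_INCOME"): -0.729860109,
    ("BLACK_PER", "HU_VALUE"): -0.648849511,
    ("HISP_PER", "RENT_PER"): 0.4400933,
    ("HISP_PER", "MED_INCOME"): -0.337189485,
    ("HISP_PER", "HU_VALUE"): -0.354489318,
    ("RENT_PER", "MED_INCOME"): -0.73300131,
    ("RENT_PER", "HU_VALUE"): -0.427515948,
    ("MED_INCOME", "HU_VALUE"): 0.830892093,
}


def flag_collinear(correlations, threshold=0.7, dependent="Rate"):
    """Return explanatory-variable pairs whose |r| is at least `threshold`."""
    return [
        (a, b, r)
        for (a, b), r in correlations.items()
        if dependent not in (a, b) and abs(r) >= threshold
    ]


for a, b, r in flag_collinear(CORRELATIONS):
    print(f"{a} vs {b}: r = {r:.3f}")
# prints:
# BLACK_PER vs MED_INCOME: r = -0.730
# RENT_PER vs MED_INCOME: r = -0.733
# MED_INCOME vs HU_VALUE: r = 0.831
```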
I chose to perform an OLS regression with Black_Per and Rent_Per as the explanatory variables.
| Variable | Coef | Robust_Pr |
|---|---|---|
| Intercept | -13.75320072 | 0.026737542 |
| BLACK_PER | 0.975778626 | 0.000158894 |
| RENT_PER | 1.645772686 | 1.42741E-05 |
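As a worked example of the global equation in the OLS table above, the sketch below plugs inputs into Rate = -13.753 + 0.976 * BLACK_PER + 1.646 * RENT_PER. The input values of 30 and 40 are made up. The sketch also assumes that both variables are expressed as percentages.

```python
# Coefficients copied from the OLS table above.
INTERCEPT = -13.75320072
COEF_BLACK_PER = 0.975778626
COEF_RENT_PER = 1.645772686


def predict_rate(black_per, rent_per):
    """Predicted Rate from the global OLS equation (same everywhere in the county)."""
    return INTERCEPT + COEF_BLACK_PER * black_per + COEF_RENT_PER * rent_per


# Hypothetical neighborhood: BLACK_PER = 30 and RENT_PER = 40 (assumed percentages).
print(round(predict_rate(30, 40), 2))  # 81.35
# Holding BLACK_PER fixed, each extra point of RENT_PER adds about 1.646.
print(round(predict_rate(30, 41) - predict_rate(30, 40), 3))  # 1.646
```

OLS is global, so these three numbers apply everywhere in the county. GWR instead estimates a separate set of coefficients at every feature.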
I then ran GWR in order to reduce the residuals.
| Parameter | OLS | GWR - Fixed | GWR - Adaptive |
|---|---|---|---|
| Adjusted R2 | 0.290 | 0.290 | 0.307 |
| AIC | 1,687.919 | 1,687.924 | 1,687.877 |
| Z-score for residuals | 5.681 | 5.682 | 3.417 |
The GWR model with the kernel type set to Fixed actually performed slightly worse than the OLS model. It had the same Adjusted R2 but a marginally higher AIC and residual z-score. Changing the kernel type to Adaptive gave some improvement over the OLS model: a slightly higher Adjusted R2, a slightly lower AIC, and a lower residual z-score. The GWR Adaptive model therefore performed the best.
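The two kernel types differ in how they set the bandwidth. A Fixed kernel uses the same distance bandwidth for every local regression. An Adaptive kernel stretches the bandwidth at each feature until it reaches a set number of neighbors, so the bandwidth is narrower where features are dense and wider where they are sparse.

The minimal NumPy sketch below illustrates this difference. It assumes a Gaussian weighting function, a hypothetical `gwr_coefficients` helper, and toy data. The GIS tool may use a different kernel function and may choose the bandwidth automatically.

```python
import numpy as np


def gaussian_weights(dists, bandwidth):
    """Gaussian kernel: weight 1 at distance 0, decaying smoothly with distance."""
    return np.exp(-0.5 * (dists / bandwidth) ** 2)


def gwr_coefficients(coords, X, y, kernel="fixed", bandwidth=1.0, k_neighbors=5):
    """Fit one weighted least-squares regression centered on every feature.

    kernel="fixed": the same distance `bandwidth` is used at every feature.
    kernel="adaptive": the bandwidth at each feature is the distance to its
    `k_neighbors`-th nearest neighbor, so it shrinks where features are dense.
    Returns an (n, k + 1) array: intercept and slopes at each feature.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        dists = np.linalg.norm(coords - coords[i], axis=1)
        if kernel == "fixed":
            h = bandwidth
        else:
            # Sorted distances start with 0 (the feature itself).
            h = np.sort(dists)[k_neighbors]
        sw = np.sqrt(gaussian_weights(dists, h))
        # Weighted least squares via rows scaled by sqrt(weight).
        betas[i], *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return betas


# Toy data (made up): 10 features on a line; the slope on x is 1 on the
# left half and 5 on the right half, which one global OLS slope cannot capture.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
x = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
y = 10 + np.where(coords[:, 0] < 5, 1.0, 5.0) * x
fixed = gwr_coefficients(coords, x[:, None], y, kernel="fixed", bandwidth=1.5)
adaptive = gwr_coefficients(coords, x[:, None], y, kernel="adaptive", k_neighbors=4)
print(np.round(fixed[:, 1], 2))     # local slope on x at each feature
print(np.round(adaptive[:, 1], 2))
```

On this toy data, the fixed-kernel local slope stays close to 1 near the left end and close to 5 near the right end. If the relationship were the same everywhere, every local regression would return the global OLS coefficients.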
The problem with both the OLS and GWR models was that they both failed the Moran's I test. Their residuals were spatially autocorrelated rather than randomly distributed. I performed an Exploratory Regression and found that the best-fitting model (lowest AIC and highest Adjusted R2) would use the Black_Per, Med_Income, and Hu_Value variables. However, none of these models had an Adjusted R2 above 0.40 (40%), so auto thefts are difficult to predict from the available variables.
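Global Moran's I on a set of residuals can be computed with the standard library alone, as in the sketch below. A value well above the expected value E[I] = -1/(n - 1) indicates that similar residuals cluster together. GIS tools typically also convert I into a z-score and p-value, but that variance calculation is omitted here. The six-location line, the neighbor-only weights, and the residual values are all hypothetical.

```python
def line_contiguity(n):
    """Binary weights for n locations on a line: neighbors are i - 1 and i + 1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I for `values` under the spatial weights matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    spread = sum(zi * zi for zi in z)
    return (n / s0) * (cross / spread)


def expected_morans_i(n):
    """Expected value of Moran's I under spatial randomness."""
    return -1.0 / (n - 1)


W = line_contiguity(6)
clustered = [2.0, 2.0, 1.0, -1.0, -2.0, -2.0]    # similar residuals sit together
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbors disagree
print(morans_i(clustered, W), morans_i(alternating, W), expected_morans_i(6))
```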
The explanatory variable Rent_Per had the greatest impact in my OLS and GWR models. Below is a comparison of the GWR coefficient surface for Rent_Per with the OLS residuals. Comparing the pattern of the Rent_Per coefficient with the pattern of the OLS residuals, I did not see a correlation that could have predicted the results. I think the explanation is that the model had only two explanatory variables and an Adjusted R2 of only about 0.29 to 0.31. That made it hard to explain the variability of auto thefts from these two variables, and particularly from Rent_Per alone, even though Rent_Per had the larger coefficient and the smaller p-value.
GWR on the left, displaying the regression coefficient for the Rent_Per explanatory variable, compared against the spatial pattern of the OLS residuals on the right.
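The map comparison above is visual. One way to put a number on it is the Pearson correlation between each feature's local Rent_Per coefficient from GWR and its OLS residual. The sketch below assumes both values have been exported for each feature in the same order. The variable names and the five values are hypothetical and are not lab results.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical per-feature values (not lab results), listed in the same feature order.
rent_per_local_coef = [1.2, 1.5, 1.7, 1.6, 1.9]
ols_residuals = [-3.0, 4.5, -1.0, 2.0, 0.5]
r = pearson_r(rent_per_local_coef, ols_residuals)
print(round(r, 3))  # near 0 means the two maps do not track each other
```

Neighboring features are spatially dependent, so treat such an r as descriptive only and not as a formal significance test.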