Wednesday, November 22, 2017

GIS5935 - Lab 12 - Geographically Weighted Regression

In Part B of this lab we were tasked with examining one type of crime from the data provided for Mecklenburg County (Charlotte, NC).  I chose to examine Auto Thefts by performing both OLS (Ordinary Least Squares) and GWR (Geographically Weighted Regression) and comparing their results.

OLS fits a single linear regression equation that models or predicts a dependent variable from its relationship with one or more explanatory (independent) variables.
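
As a rough illustration of what OLS does, the sketch below fits a one-variable least squares line in plain Python. The function name fit_ols and the data values are made up for illustration; they are not from the Mecklenburg County dataset, and the lab itself used the ArcGIS OLS tool with two explanatory variables.

```python
def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical values: percent renters (x) and auto theft rate (y).
rent_per = [10.0, 20.0, 30.0, 40.0, 50.0]
theft_rate = [12.0, 25.0, 41.0, 50.0, 72.0]

intercept, slope = fit_ols(rent_per, theft_rate)

# Residual = observed value minus the value predicted by the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(rent_per, theft_rate)]
print(intercept, slope, residuals)
```

The residuals computed at the end are the quantities that get mapped to look for spatial patterns the model missed.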

After performing OLS, one can map the residuals to examine their spatial autocorrelation.  The OLS output report lists the Koenker statistic, which tests for stationarity. "If the test is significant, the model is not stationary, which means that the relationships between variables changes substantially across the study area. This presents a good reason for a local regression model." GWR, a local model, can then be used to reduce the residuals.  GWR creates a separate linear regression equation for every feature in the dataset, weighting nearby features by distance, so the conditions around each neighborhood are taken into account.  OLS, a global model, fits one equation to the whole study area and does not consider distance.
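
To make the "separate equation for every feature" idea concrete, here is a minimal sketch of a local, distance-weighted regression in plain Python. It assumes a fixed Gaussian kernel and a single explanatory variable; the function names, coordinates, and values are all hypothetical, and the ArcGIS GWR tool is more involved than this (bandwidth selection, adaptive kernels, diagnostics).

```python
import math


def gaussian_weight(distance, bandwidth):
    """Weight that decays smoothly with distance (fixed Gaussian kernel)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares fit of y = intercept + slope * x around target."""
    weights = [
        gaussian_weight(math.hypot(cx - target[0], cy - target[1]), bandwidth)
        for cx, cy in coords
    ]
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical features: a western cluster where y = x
# and an eastern cluster where y = 3x.
coords = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]

# One local equation per feature; nearby features dominate each fit.
local_slopes = [local_fit(c, coords, x, y, bandwidth=1.0)[1] for c in coords]
print(local_slopes)
```

A single global OLS line would average the two clusters together, while the local fits recover a different slope in each part of the study area. That is the non-stationarity a significant Koenker statistic points to.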

To predict Auto Thefts, I first examined the correlation matrix and noticed strong correlations (a sign of multicollinearity) between the Med_Income variable and the Black_Per, Rent_Per, and Hu_Value variables.



            Rate          BLACK_PER     HISP_PER      RENT_PER      MED_INCOME    HU_VALUE
Rate        1
BLACK_PER   0.467505903   1
HISP_PER    0.140724435   0.150039801   1
RENT_PER    0.496543551   0.555423624   0.4400933     1
MED_INCOME  -0.501135457  -0.729860109  -0.337189485  -0.73300131   1
HU_VALUE    -0.291086137  -0.648849511  -0.354489318  -0.427515948  0.830892093   1

I chose to perform an OLS with Black_Per and Rent_Per as the explanatory variables.
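
One way to put a number on the multicollinearity concern is the variance inflation factor (VIF). In a model with exactly two explanatory variables, the VIF of each reduces to 1 / (1 - r^2), where r is the correlation between the pair. The sketch below applies that to two correlations from the matrix; the function name is made up for illustration, and this shortcut only holds for two-variable models.

```python
def vif_two_predictors(r):
    """VIF for either predictor in a model with exactly two predictors.

    With two predictors, the R-squared from regressing one on the other
    is their squared correlation, so VIF = 1 / (1 - r**2).
    """
    return 1.0 / (1.0 - r ** 2)


# Correlations copied from the correlation matrix.
r_black_rent = 0.555423624    # BLACK_PER vs RENT_PER (the pair used in the model)
r_income_value = 0.830892093  # MED_INCOME vs HU_VALUE (the most correlated pair)

vif_black_rent = vif_two_predictors(r_black_rent)
vif_income_value = vif_two_predictors(r_income_value)
print(round(vif_black_rent, 3), round(vif_income_value, 3))
```

A VIF of 1 means no correlation with the other variable, and larger values mean more redundancy, so the Black_Per and Rent_Per pair is the less redundant of the two.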


Variable    Coef           Robust_Pr
Intercept   -13.75320072   0.026737542
BLACK_PER   0.975778626    0.000158894
RENT_PER    1.645772686    1.42741E-05
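
Read as an equation, these coefficients give Rate = -13.753 + 0.976 * BLACK_PER + 1.646 * RENT_PER. The sketch below applies that equation to a hypothetical neighborhood. The input values are invented, and it assumes both variables are expressed as percentages on a 0 to 100 scale, which the field names suggest but the output does not confirm.

```python
# Coefficients copied from the OLS output table.
INTERCEPT = -13.75320072
COEF_BLACK_PER = 0.975778626
COEF_RENT_PER = 1.645772686


def predict_rate(black_per, rent_per):
    """Predicted auto theft rate from the fitted OLS equation."""
    return INTERCEPT + COEF_BLACK_PER * black_per + COEF_RENT_PER * rent_per


# Hypothetical neighborhood (assumed percent scale, not real data).
predicted = predict_rate(black_per=30.0, rent_per=40.0)
print(round(predicted, 3))
```

With an Adjusted R2 of 0.290, a prediction like this carries a lot of unexplained variation, so it is better read as a rough tendency than as an estimate for any one neighborhood.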

I then ran GWR in order to reduce the residuals.


Parameter               OLS         GWR - Fixed   GWR - Adaptive
Adjusted R2             0.290       0.290         0.307
AIC                     1,687.919   1,687.924     1,687.877
Z-score for residuals   5.681       5.682         3.417


The GWR model with Kernel type set to Fixed actually performed slightly worse than the OLS model, with the same Adjusted R2 but a marginally higher AIC and Z-score.  Changing the Kernel type to Adaptive gave some improvement over the OLS model: with a slightly higher Adjusted R2, a slightly lower AIC, and a lower Z-score, the GWR Adaptive model performed the best.
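
To see how small these differences are, the sketch below computes each model's change from the OLS baseline using the values in the comparison table. The dictionary layout is just for illustration. As an outside rule of thumb (an assumption here, not something from the lab output), AIC differences of less than about 3 are often treated as too small to clearly prefer one model.

```python
# Values copied from the model comparison table.
models = {
    "OLS": {"adj_r2": 0.290, "aic": 1687.919, "z": 5.681},
    "GWR - Fixed": {"adj_r2": 0.290, "aic": 1687.924, "z": 5.682},
    "GWR - Adaptive": {"adj_r2": 0.307, "aic": 1687.877, "z": 3.417},
}

baseline = models["OLS"]

# Change in each diagnostic relative to OLS (negative AIC change = better).
deltas = {
    name: {key: stats[key] - baseline[key] for key in stats}
    for name, stats in models.items()
}

# Model with the lowest AIC.
best_by_aic = min(models, key=lambda name: models[name]["aic"])

for name, change in deltas.items():
    print(name, {key: round(value, 3) for key, value in change.items()})
print("Lowest AIC:", best_by_aic)
```

The Adaptive model has the lowest AIC, but only by a few hundredths, so the clearer gain is the drop in the residual Z-score.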

The problem with both the OLS and GWR models was that they both failed the Moran's I test, meaning their residuals were spatially autocorrelated rather than randomly distributed.  I performed an Exploratory Regression and found that the best-fit model (lowest AIC and highest Adjusted R2) would use the Black_Per, Med_Income, and Hu_Value variables.
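
For reference, the global Moran's I index behind that test can be sketched in a few lines of plain Python. This computes only the index itself, not the z-score and p-value that the ArcGIS Spatial Autocorrelation tool reports. The function name, the four-neighborhood layout, and the values are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all spatial weights.
    s0 = sum(sum(row) for row in weights)
    # Cross-products of deviations for every pair of neighbors.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Hypothetical layout: four neighborhoods in a row, each adjacent to the next.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 2.0, 3.0, 4.0]      # similar residuals sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbors always differ

i_clustered = morans_i(clustered, weights)      # positive
i_alternating = morans_i(alternating, weights)  # negative
print(i_clustered, i_alternating)
```

A positive index on the residuals, like the clustered case here, suggests the model is missing an explanatory variable with a spatial pattern.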

None of these models, however, had an Adjusted R2 value over 40%, so predicting Auto Thefts is difficult.

The explanatory variable Rent_Per had the greatest impact in my OLS and GWR models.  Below I compare the Rent_Per coefficient from the GWR results with the OLS residuals.  Comparing the spatial pattern of the Rent_Per coefficient with the pattern of the OLS residuals, I did not see a correlation that could have predicted the results.  I think that because the model had only two explanatory variables and an Adjusted R2 under 30%, it was hard to explain the variability of auto thefts from these two variables, and particularly from Rent_Per alone, even though it had the larger coefficient and the smaller p-value.

Left: GWR regression coefficient for the Rent_Per explanatory variable. Right: spatial pattern of the OLS residuals.
