Tuesday, November 14, 2017

GIS5935 - Lab 11 - Multivariate Regression, Diagnostics and Regression in ArcGIS

In this week's lab we continued our lesson on regression analysis. In Part A we performed multivariate regression analyses using the Correlation and Regression tools in Excel's Data Analysis add-in. A multivariate regression has a single dependent variable and multiple independent variables.

We used the results to predict the selling price of a house with 4 bedrooms and a lot size of 5100. Next we added data for an outlier, reran the analyses, and predicted the selling price again.
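We did this step in Excel. For anyone curious about what the Regression tool computes, here is a minimal pure-Python sketch that fits a multivariate ordinary least squares model by solving the normal equations and then makes the same kind of prediction. The function names (`solve`, `fit_ols`, `predict`) and the house data are hypothetical and exist only for illustration. They are not the lab's dataset, so the printed price says nothing about our lab result.

```python
# Minimal multivariate OLS sketch (pure Python, no external libraries).
# The house data below is HYPOTHETICAL, not the lab's dataset.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the row with the largest pivot into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(xs, y):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1 gives the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, x):
    """Intercept plus the sum of slope * value for each independent variable."""
    return coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))

# Hypothetical (bedrooms, lot size) observations with exactly linear prices.
houses = [(2, 4000), (3, 4500), (3, 6000), (4, 5000), (5, 7000), (2, 3500)]
prices = [10000 + 20000 * beds + 10 * lot for beds, lot in houses]

coef = fit_ols(houses, prices)
print(predict(coef, (4, 5100)))  # price for 4 bedrooms, lot size 5100
```

Because the made-up prices are an exact linear function of the inputs, the fit recovers the slopes used to build them. Real data will not fit exactly, which is why a single added outlier can shift the coefficients and the predicted price.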

In Part B we again used Excel, this time to create a correlation coefficient matrix to help decide which variables to use in a multivariate regression analysis of cirrhosis data. Comparing the Adjusted R Squared values of different variable combinations helped identify the best-fitting model.
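The two quantities used in Part B are easy to compute directly. This sketch builds a Pearson correlation matrix and applies the standard Adjusted R Squared formula, 1 - (1 - R²)(n - 1)/(n - k - 1), for n observations and k explanatory variables. The column names, values, and R² figures are hypothetical, not the lab's cirrhosis data.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ssa * ssb)

def correlation_matrix(columns):
    """Full correlation matrix as nested dicts: matrix[p][q] = r(p, q)."""
    return {p: {q: pearson(columns[p], columns[q]) for q in columns} for p in columns}

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables (plus intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical columns for illustration only.
data = {
    "y":  [3.0, 5.0, 7.0, 9.0, 12.0],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [5.0, 3.0, 4.0, 1.0, 2.0],
}
for name, row in correlation_matrix(data).items():
    print(name, {q: round(r, 3) for q, r in row.items()})

# Adding variables can nudge R^2 up while Adjusted R^2 goes down.
print(adjusted_r2(0.80, 30, 2))  # two explanatory variables
print(adjusted_r2(0.81, 30, 4))  # four explanatory variables
```

The last two lines show why Adjusted R Squared is the better comparison: the four-variable model has the higher R², but after the penalty for the extra variables its adjusted value is lower.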

In Part C we again used Excel to perform a multivariate regression on 911 call data from Portland, Oregon, using the provided variable combination. Then, with a provided shapefile created from the same 911 call data, we ran the Ordinary Least Squares (OLS) tool from the Spatial Statistics toolbox in ArcGIS to analyze both a one-independent-variable (bivariate) model and a three-independent-variable (multivariate) model. The output from these tools let us perform the six checks that determine whether a model is properly specified and statistically sound.
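As we understand ESRI's guidance, the six checks are: coefficients have the expected sign, there is no redundancy among explanatory variables (VIF below about 7.5), coefficients are statistically significant, residuals are normally distributed (Jarque-Bera not significant), Adjusted R Squared is strong, and residuals are not spatially autocorrelated. The sketch below encodes that list as a pass/fail report. The `OlsSummary` class, its field names, and every threshold (0.05 significance, 7.5 VIF, 0.1 for the residual tests, 0.5 for "strong" Adjusted R Squared) are assumptions made for illustration. The values would be copied by hand from the OLS report, and the example numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class OlsSummary:
    """Values copied by hand from an OLS report (hypothetical field names)."""
    coefficients: dict    # variable -> coefficient (intercept excluded)
    expected_signs: dict  # variable -> +1 or -1, from domain knowledge
    p_values: dict        # variable -> coefficient p-value
    vifs: dict            # variable -> variance inflation factor
    jarque_bera_p: float  # p-value of the Jarque-Bera statistic
    adjusted_r2: float
    morans_i_p: float     # p-value of Moran's I run on the residuals

def six_checks(s, alpha=0.05, max_vif=7.5, min_adj_r2=0.5, resid_alpha=0.1):
    """Return check name -> True/False. All thresholds are assumptions."""
    return {
        "expected signs": all((s.coefficients[v] > 0) == (s.expected_signs[v] > 0)
                              for v in s.coefficients),
        "no redundancy (VIF)": all(vif < max_vif for vif in s.vifs.values()),
        "coefficients significant": all(p < alpha for p in s.p_values.values()),
        "residuals normal (JB not significant)": s.jarque_bera_p > resid_alpha,
        "strong adjusted R^2": s.adjusted_r2 >= min_adj_r2,
        "residuals not clustered (Moran's I not significant)": s.morans_i_p > resid_alpha,
    }

# Made-up numbers for two hypothetical variables X1 and X2.
report = OlsSummary(
    coefficients={"X1": 0.02, "X2": 0.1},
    expected_signs={"X1": +1, "X2": +1},
    p_values={"X1": 0.001, "X2": 0.003},
    vifs={"X1": 1.4, "X2": 1.6},
    jarque_bera_p=0.35,
    adjusted_r2=0.72,
    morans_i_p=0.01,
)
for check, ok in six_checks(report).items():
    print(f"{check}: {'pass' if ok else 'FAIL'}")
```

In this made-up report every check passes except the last one. Clustered residuals of this kind are what the bivariate 911 model showed, and they signal that explanatory variables are missing.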

In Part D we used the Exploratory Regression tool to find the OLS model that best explains/predicts the dependent variable. The exploratory regression report summarizes statistics for the individual independent (explanatory) variables as well as for combinations of these variables. Its diagnostics and checks can be used to select the combination of explanatory variables that best models/predicts the dependent variable. Comparing each model's Adjusted R Squared and AIC values (a higher Adjusted R Squared and a lower AIC indicate a better fit), along with each variable's coefficient (slope), p-value, and VIF, helps in selecting or building the model that best predicts/explains the dependent variable.
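ArcGIS's Exploratory Regression tool does more than this. As we understand it, it also screens models on coefficient p-values, Jarque-Bera, and residual spatial autocorrelation, and it reports AICc. Its core idea, though, is to fit every combination of candidate explanatory variables and compare the results, and that can be sketched in a few dozen lines. This pure-Python version is a hypothetical illustration. `explore`, the 7.5 VIF cutoff, and the uncorrected Gaussian AIC (n·ln(RSS/n) + 2p, constant dropped) are assumptions, and the data are synthetic.

```python
import math
from itertools import combinations

def solve(a, b):
    """Solve a x = b (square system) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_rss_r2(cols, y):
    """Fit y on the given columns plus an intercept; return (RSS, R^2)."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean = sum(y) / len(y)
    tss = sum((yi - mean) ** 2 for yi in y)
    return rss, 1 - rss / tss

def max_vif(cols):
    """Largest VIF_j = 1 / (1 - R_j^2), regressing each column on the others."""
    if len(cols) < 2:
        return 1.0
    return max(1 / max(1 - ols_rss_r2(cols[:j] + cols[j + 1:], cols[j])[1], 1e-12)
               for j in range(len(cols)))

def explore(candidates, y, vif_cutoff=7.5):
    """Fit every combination of candidates, drop redundant ones, rank by AIC."""
    n = len(y)
    results = []
    for size in range(1, len(candidates) + 1):
        for names in combinations(candidates, size):
            cols = [candidates[name] for name in names]
            if max_vif(cols) >= vif_cutoff:
                continue  # redundant explanatory variables
            rss, r2 = ols_rss_r2(cols, y)
            adj = 1 - (1 - r2) * (n - 1) / (n - size - 1)
            aic = n * math.log(rss / n) + 2 * (size + 1)  # Gaussian AIC, constant dropped
            results.append((aic, adj, names))
    return sorted(results)  # lowest AIC first

# Synthetic data: y depends on a and b; c is nearly a copy of a (redundant).
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
c = [x + e for x, e in zip(a, [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1])]
noise = [0.3, -0.2, 0.1, -0.4, 0.25, -0.1, 0.2, -0.3, 0.15, 0.05]
y = [2 * ai + 3 * bi + e for ai, bi, e in zip(a, b, noise)]

for aic, adj, names in explore({"a": a, "b": b, "c": c}, y):
    print(f"{'+'.join(names):6s} adjR2={adj:.3f} AIC={aic:.2f}")
```

Any model containing both `a` and `c` is discarded for redundancy, and the two-variable models that include `b` rise to the top. This mirrors how we read the exploratory regression report: first screen out models that fail the checks, then compare Adjusted R Squared and AIC among the rest.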

Below are three snapshots showing the spatial pattern of the residuals from three different models of the 911 call data. "Residuals are the unexplained portion of the dependent variable."

The first snapshot shows the residuals from a bivariate model that uses Population values alone to predict the number of 911 calls. The residuals are clustered, which indicates that more explanatory variables are needed.

OLS Residual Output Results - Bivariate Model with "Pop" as the only independent/explanatory variable

The second snapshot shows the residuals from a multivariate model of the same data that uses three independent variables (Pop, LowEduc, and Dst2UrbCen). This model explained more of the variability in the dependent variable, but some clustering remained.

OLS Residual Output Results - Multivariate Model with "Pop", "LowEduc", and "Dst2UrbCen" as the three independent/explanatory variables used in the model

The third snapshot shows the residuals from the multivariate model selected from the Exploratory Regression results, which uses four independent variables (Pop, Jobs, LowEduc, and Dst2UrbCen). Of the models we compared, this one fit the 911 call data best and would more accurately predict the number of 911 calls in a given census tract.

OLS Residual Output Results - Multivariate Model with "Pop", "Jobs", "LowEduc", and "Dst2UrbCen" as the four independent/explanatory variables, selected from the Exploratory Regression results
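The clustering visible in these residual maps can be measured with global Moran's I, the statistic behind ArcGIS's Spatial Autocorrelation (Global Moran's I) tool, which is commonly run on OLS residuals. The function below is a minimal sketch of the standard formula, I = (n / W) · Σ w_ij z_i z_j / Σ z_i², using a user-supplied weights matrix. It computes only the index, not the z-score or p-value that ArcGIS reports, and the four-tract example is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for n values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(zi * zi for zi in z)

# Four hypothetical tracts in a row; neighbors share an edge (binary weights).
line = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
print(morans_i([1, 1, -1, -1], line))  # like residuals sit together: positive I
print(morans_i([1, -1, 1, -1], line))  # residuals alternate: negative I
```

A clearly positive I, like the first example, corresponds to the clustering seen in the bivariate model's residuals. A value near zero is what a well-specified model should produce.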
