We used the analysis results to predict a selling price for a house with 4 bedrooms and a lot size of 5100. Next, we added data for an outlier feature, reran the analyses, and predicted the selling price again.
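As a rough illustration of that prediction step, here is a minimal sketch in Python with NumPy, standing in for the tools used in the lab. The sales records, the coefficients used to generate the prices, and the name `predict_price` are all made up for the example; only the query (4 bedrooms, lot size 5100) comes from the exercise.

```python
import numpy as np

# Hypothetical sales records (NOT the lab data): bedrooms and lot size.
bedrooms = np.array([2, 3, 3, 4, 5, 4], dtype=float)
lot_size = np.array([3000, 4000, 5500, 5000, 7000, 6200], dtype=float)
# Prices generated from an assumed exact linear relationship,
# so the fitted coefficients are easy to check by hand.
price = 50_000 + 10_000 * bedrooms + 10 * lot_size

# Design matrix: intercept column plus the two explanatory variables.
A = np.column_stack([np.ones_like(bedrooms), bedrooms, lot_size])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, b_bed, b_lot = coef

def predict_price(n_bedrooms, lot):
    """Apply the fitted regression equation to one house."""
    return intercept + b_bed * n_bedrooms + b_lot * lot

predicted = predict_price(4, 5100)
print(round(predicted, 2))
```

Appending one extreme row to the three arrays and refitting is a quick way to see how much a single outlier can shift the coefficients and the predicted price.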
In Part B, we continued in Excel, creating a correlation coefficient matrix to help determine which variables to use in a multivariate regression analysis of cirrhosis data. Comparing the Adjusted R Squared values of different variable combinations helped identify the best-fit model.
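The same two steps can be sketched outside Excel. The example below uses Python with NumPy on synthetic data; the variables `x1`, `x2`, `x3`, and `y` are placeholders and not the cirrhosis variables, and `adjusted_r2` is a helper written for this illustration.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R-squared of an OLS fit of y on the columns of X (intercept added)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R-squared for the number of explanatory variables k.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Synthetic stand-in data: y depends on x1 and x2, while x3 is unrelated.
rng = np.random.default_rng(0)
n = 50
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 * x1 - 1.0 * x2 + 0.5 * rng.normal(size=n)

# Correlation coefficient matrix; row/column order is x1, x2, x3, y.
corr = np.corrcoef(np.column_stack([x1, x2, x3, y]), rowvar=False)
print(np.round(corr, 2))

# Compare candidate variable combinations by Adjusted R Squared.
adj_x1 = adjusted_r2(np.column_stack([x1]), y)
adj_x1_x2 = adjusted_r2(np.column_stack([x1, x2]), y)
adj_all = adjusted_r2(np.column_stack([x1, x2, x3]), y)
print(adj_x1, adj_x1_x2, adj_all)
```

Because the adjustment penalizes every extra variable, adding a variable that explains little can lower the adjusted value even though plain R Squared never decreases.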
In Part C, we again used Excel to perform a multivariate regression, this time using the provided variable combination to analyze 911 call data from Portland, Oregon. With a provided shapefile created from the same 911 call data, we used the Ordinary Least Squares (OLS) tool from the Spatial Statistics toolbox in ArcGIS to analyze both a one-independent-variable (bivariate) model and a three-independent-variable (multivariate) model. The output from these tools enabled us to perform the six checks that determine whether a model is properly specified and statistically sound.
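One of those checks, as I understand the usual OLS guidance, asks whether the residuals are normally distributed, which the OLS report summarizes with the Jarque-Bera statistic. The sketch below computes the textbook form of that statistic in Python with NumPy. The residual values are invented, and the function is an illustration, not the ArcGIS implementation.

```python
import numpy as np

def jarque_bera(residuals):
    """Textbook Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    d = r - r.mean()
    m2 = (d ** 2).mean()                 # second central moment
    skew = (d ** 3).mean() / m2 ** 1.5   # sample skewness S
    kurt = (d ** 4).mean() / m2 ** 2     # sample kurtosis K
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Invented residuals: one symmetric set, one strongly skewed set.
jb_symmetric = jarque_bera([-2, -1, 0, 1, 2])
jb_skewed = jarque_bera([0, 0, 0, 0, 10])
print(jb_symmetric, jb_skewed)
```

For large samples the statistic is compared against a chi-square distribution with 2 degrees of freedom (5% critical value of about 5.99). A significant result suggests the residuals are not normal and the model may be biased.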
In Part D, we used the Exploratory Regression tool to find the OLS model that best explains the dependent variable. The exploratory regression report summarizes the statistics for the individual explanatory variables as well as for combinations of them. Its diagnostics and checks can be used to select the combinations of explanatory variables that best model the dependent variable. Comparing the Adjusted R Squared and AIC values of each candidate model, along with its coefficients (slopes), p-values, and VIF values, helps in selecting or building the model that fits best.
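To make the VIF values in that report less of a black box, here is a minimal sketch of how a variance inflation factor can be computed, written in Python with NumPy. The three variables are synthetic, the helper names are my own, and the 7.5 cutoff mentioned afterward is the rule of thumb from Esri's guidance, not something derived here.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on all other columns."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        factors.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(factors)

# Synthetic explanatory variables: c is almost a copy of a, b is independent.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)

vif_values = vif(np.column_stack([a, b, c]))
print(np.round(vif_values, 2))
```

Variables with a VIF above roughly 7.5 are telling the same story as another variable in the model, so one of each redundant pair is usually dropped before comparing models further.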
Below I have provided three snapshots of the spatial correlation of the residual data of three different models for the 911 call data analysis. "Residuals are the unexplained portion of the dependent variable."
The first snapshot shows the residuals from a bivariate model that uses only population values to predict the number of 911 calls. The residuals are visibly clustered, which indicates that the model is missing explanatory variables.
OLS Residual Output Results - Bivariate Model with "Pop" as the only independent/explanatory variable
OLS Residual Output Results - Multivariate Model with "Pop", "LowEduc", and "Dst2UrbCtr" as the three independent/explanatory variables used in the model
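Clustering of residuals like that seen in the bivariate snapshot is commonly quantified with a global Moran's I. The sketch below, in Python with NumPy, assumes that measure and uses a toy layout of six locations in a row, each neighboring the next. The residual values and the weights matrix are invented for illustration and are not the Portland data.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    n = z.size
    s0 = w.sum()  # sum of all spatial weights
    return (n / s0) * (z @ w @ z) / (z @ z)

# Toy spatial weights: six locations in a row, adjacent ones are neighbors.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

# Invented residuals: over-predictions grouped together vs. alternating.
i_clustered = morans_i([1, 1, 1, -1, -1, -1], w)
i_alternating = morans_i([1, -1, 1, -1, 1, -1], w)
print(i_clustered, i_alternating)
```

Values well above the expected value of -1/(n - 1) point to clustered residuals, which is the pattern that signals a missing explanatory variable. Values near that expectation are consistent with spatially random residuals.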