Sunday, February 21, 2016

GIS 3015 - Module 6 - Data Classification

Comparison of Data Classification Methods
Total Population 65 Years Old or Older Per Square Mile
In this week's lab assignment we learned about four common data classification methods. Using data from the 2010 U.S. Census, we mapped Dade County, Florida, by census tract with each of the classification methods. The Equal Interval method creates class intervals by dividing the range of the data (maximum minus minimum) by the number of classes, so every class has the same width. The Quantile method rank-orders the data and places an equal number of observations in each class; the number of observations per class is calculated by dividing the total number of observations by the number of classes. The Quantile method is useful when classifying ordinal-level data. Both methods are easy to use because the intervals can be determined with simple calculations, and the resulting intervals are easy for the map audience to interpret. With the Equal Interval method, the legend limits match the lower and upper limits of each class, i.e., there are no "gaps" between classes. The Quantile method, however, can leave "gaps" between the highest value in one class and the lowest value in the next, which can confuse the map reader. More importantly, neither method considers how the data are distributed. Outliers can be grouped into the same class as distinctly different observations, and with the Equal Interval method some classes might not contain any observations at all.
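To see the Equal Interval and Quantile calculations side by side, here is a small Python sketch. The ten density values are made up for illustration (they are not the Dade County Census data), and the function names are my own, not anything from ArcMap. A value is assigned to the first class whose upper limit is greater than or equal to it.

```python
from bisect import bisect_left

def equal_interval_breaks(values, k):
    """Upper class limits for Equal Interval: each class is (max - min) / k wide."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # k-1 interior limits, then the maximum itself so the last class closes exactly
    return [lo + width * i for i in range(1, k)] + [hi]

def quantile_breaks(values, k):
    """Upper class limits for Quantile: roughly n / k observations per class."""
    ordered = sorted(values)                 # rank-order the observations
    n = len(ordered)
    breaks = []
    for i in range(1, k + 1):
        last = (n * i + k - 1) // k - 1      # index of the last observation in class i
        breaks.append(ordered[last])
    return breaks

def class_counts(values, breaks):
    """Count observations per class (first class whose upper limit is >= the value)."""
    counts = [0] * len(breaks)
    for v in values:
        counts[bisect_left(breaks, v)] += 1
    return counts

# Hypothetical densities for ten tracts, including one extreme outlier
densities = [12, 15, 18, 20, 22, 25, 30, 35, 40, 950]

ei = equal_interval_breaks(densities, 4)
q = quantile_breaks(densities, 4)
print(ei, class_counts(densities, ei))  # [246.5, 481.0, 715.5, 950] [9, 0, 0, 1]
print(q, class_counts(densities, q))    # [18, 22, 35, 950] [3, 2, 3, 2]
```

With this skewed sample, Equal Interval crowds nine tracts into one class and leaves two classes empty, while Quantile balances the counts but places 40 and 950 in the same class.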

The Mean-Standard Deviation method does consider how the data are distributed along the number line. Class limits are created by calculating the mean of the data and then adding and subtracting multiples of the standard deviation from the mean. This method assumes that the data are normally distributed. If a class interval extends into negative values, it may be empty, because data such as population counts or densities cannot be negative. The Mean-Standard Deviation method also assumes that the map audience has some statistical training in order to interpret the mapped data.

The final method we reviewed is the Natural Breaks method, which looks for the "natural" groupings present in the data. Sometimes the data can be examined visually to determine logical breaks. This method minimizes the differences between data values within a class and maximizes the differences between classes. An obvious disadvantage is that the method can be subjective, since natural break decisions can vary from one map creator to another.
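The two distribution-aware methods can also be sketched in Python, again with made-up values. For Mean-Standard Deviation, I assume limits at the mean plus or minus one and two population standard deviations; other variants exist. For Natural Breaks, ArcMap's option is labeled Natural Breaks (Jenks); this sketch swaps in an exact dynamic-programming search that minimizes the total within-class squared deviation, so its breaks may not match ArcMap's exactly.

```python
import statistics

def mean_sd_limits(values, n_sd=2):
    """Class limits at the mean plus or minus 1..n_sd population standard deviations."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return [m + j * s for j in range(-n_sd, n_sd + 1)]

def jenks_breaks(values, k):
    """Upper class limits minimizing total within-class squared deviation (exact DP)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # Prefix sums give the squared deviation of any run data[i:j] in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(data):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total squared deviation when the first j values form c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]  # index where the last class begins
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    start[c][j] = i
    # Walk back from the full data set to recover each class's upper limit
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = start[c][j]
    return breaks[::-1]

# Hypothetical, right-skewed densities (not Census data)
densities = [12, 15, 18, 20, 22, 25, 30, 35, 40, 950]
print(mean_sd_limits(densities))   # the two lowest limits are negative here
print(jenks_breaks(densities, 3))  # [25, 40, 950]
```

Because this sample is far from normal, both lower Mean-Standard Deviation limits fall below zero and those classes stay empty, while the natural breaks isolate the outlier in its own class.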

Our first map employed the four classification methods described above to display the percent of the population 65 years old and older in Dade County, Florida, by census tract. On the Symbology tab we applied graduated color schemes to the PCT_65ABV field, once for each classification method. I chose a sequential, single-hue green color scheme from ColorBrewer and recreated it as my own color scheme in ArcMap. Because the lowest values were a light shade of green, I placed a grey background behind the Dade County tracts in each classification data frame. This made the lighter hues easier for the map reader to see, especially along the coastline. To invoke figure-ground, I made the background of the frame a darker shade of grey. Each data frame has its own legend showing the class intervals created by its method. After styling one legend, I realized I could copy and paste it into another data frame, where it would use the data layer in that data frame. This helped me tremendously, since styling the map usually takes up most of the time I devote to a lab assignment. I added the essential map elements after ensuring that every data frame was displayed at the same scale.

Our second map used the same U.S. Census Bureau shapefile and the same four classification methods, but this time we mapped the population field AGE_65_UP normalized by area in square miles. This map displays the population density of persons 65 years old or older, which gives a more accurate picture of where seniors actually live. The first map can be misleading because a census tract may have a high percentage of seniors without having a large number of seniors. The first map also doesn't take the area of each census tract into consideration: a large tract may be sparsely populated, while a small tract may be densely populated. By focusing attention on the tracts with the highest senior population density, one could reach the most seniors within the smallest coverage area. In my opinion, the second map (image shown above) presents the data in the most useful manner.
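The difference between the two maps comes down to what the count of seniors is divided by. This Python sketch uses two made-up tracts; apart from the idea of the AGE_65_UP count, the field names (a total-population field and an area-in-square-miles field) are hypothetical stand-ins rather than the actual shapefile attributes.

```python
from dataclasses import dataclass

@dataclass
class Tract:
    tract_id: str
    total_pop: int     # hypothetical total-population field
    age_65_up: int     # persons 65 and older (cf. the AGE_65_UP field)
    sq_miles: float    # hypothetical tract-area field, in square miles

    @property
    def pct_65_up(self) -> float:
        """Percent of residents who are 65+ (like the first map's PCT_65ABV)."""
        return 100.0 * self.age_65_up / self.total_pop

    @property
    def density_65_up(self) -> float:
        """Residents 65+ per square mile (the second map's normalization)."""
        if self.sq_miles <= 0:
            raise ValueError("tract area must be positive")
        return self.age_65_up / self.sq_miles

# Two made-up tracts: a large, sparse tract vs. a small, crowded one
tracts = [
    Tract("A", total_pop=200, age_65_up=80, sq_miles=20.0),
    Tract("B", total_pop=5000, age_65_up=750, sq_miles=0.5),
]
for t in tracts:
    print(t.tract_id, t.pct_65_up, t.density_65_up)
# A 40.0 4.0
# B 15.0 1500.0
```

Tract A would stand out on a percentage map, but tract B holds far more seniors per square mile, which is the kind of tract the density map highlights.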

Viewing and analyzing the attribute table for the Dade County shapefile helped me understand the concepts behind the four data classification methods and the differences between them. I accidentally ran a statistical tool on one of the field columns and had to replace the shapefile with the original; thankfully, all of my style and symbology edits were preserved. This lab made me realize another aspect of understanding your map audience: it is important to know why and how a map will be used. Providing an aesthetically pleasing map matters, but knowing your data and how to present it is even more important. In this lab assignment, how the data are presented could affect how the map is interpreted and, therefore, how it is used to make future decisions.
