I noticed Ari Lamstein’s call for submissions to the R Shapefile Contest with interest. We commonly see spatial data in R used for visualization, e.g. choropleth maps. However, R also has a massive ecosystem for using spatial data in a wide variety of analyses that leverage its geographic properties.

I commonly read posts about whether spatial data is “special” or not; we geographers tend to say yes (see here: https://www.google.com/search?q=spatial+data+special). I like Justin Holman’s post on the topic (http://www.justinholman.com/2012/03/20/spatial-is-indeed-special/), which reminds us that attributes of geographic data tend to exhibit spatial autocorrelation: nearby data points in geographic space tend to be more similar than data points that are farther apart.

In turn, we are often interested in the concept of a neighborhood in spatial analysis, which refers to those data points that we consider to be proximate to a given focal data point. With areal vector data (polygons) this is not always straightforward, as there are multiple ways that we can measure proximity. In R, we accomplish this with the fantastic spdep package. spdep allows for the creation of objects of class nb, which are lists of vectors in which each vector contains the row positions of the neighboring units. You can read about the neighborhood functionality in the package here: https://cran.r-project.org/web/packages/spdep/vignettes/nb.pdf. I’m going to illustrate comparisons between neighborhoods for two common neighbor types: contiguity-based neighbors and distance-based neighbors.

Consider here the R object named spoly of class SpatialPolygonsDataFrame:
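Here is a minimal sketch of how the two neighbor types might be built with spdep. It assumes a toy 3 x 3 grid of unit squares standing in for spoly (the real object would hold the Census tracts), and the object names and the distance threshold of 1.1 are illustrative choices of mine, not values taken from the app:

```r
library(sp)
library(spdep)

# Toy stand-in for spoly: a 3 x 3 grid of unit squares (hypothetical data,
# not the Travis County tracts used in the app).
grd   <- GridTopology(cellcentre.offset = c(0.5, 0.5),
                      cellsize = c(1, 1), cells.dim = c(3, 3))
polys <- as.SpatialPolygons.GridTopology(grd)
spoly <- SpatialPolygonsDataFrame(polys, data.frame(id = 1:9),
                                  match.ID = FALSE)

# Contiguity-based neighbors: queen = shared edge or corner,
# rook = shared edge only.
nb_queen <- poly2nb(spoly, queen = TRUE)
nb_rook  <- poly2nb(spoly, queen = FALSE)

# Distance-based neighbors, measured between polygon centroids:
# everything within a distance band, or the k nearest units.
coords  <- coordinates(spoly)
nb_dist <- dnearneigh(coords, d1 = 0, d2 = 1.1)
nb_knn  <- knn2nb(knearneigh(coords, k = 4))

# Each nb object is a list of integer vectors of row positions;
# card() returns the number of neighbors per unit.
nb_queen[[5]]
card(nb_queen)
card(nb_rook)
```

On this grid the center cell (row position 5) has eight queen neighbors but only four rook neighbors, and the 1.1 distance band happens to reproduce the rook set for that cell. With irregular polygons like Census tracts, the different definitions will generally diverge much more.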

This all makes more sense with an interactive illustration - so let’s test this out with the Shiny app below. Click on any Census tract in Travis County, Texas to show its neighbors for the chosen neighborhood type. Note how the neighborhoods for any given tract change as you modify the options.

I’ve also included an applied example that shows how the choice of neighborhood can influence the results of a spatial analysis, available on the second tab. The map shows Getis-Ord \(G^{*}_{i}\) z-scores for median household income from the 2010-2014 American Community Survey. In a nutshell, \(G^{*}_{i}\) compares the local (neighborhood) sum of attribute values (in which the focal tract is included) to the global sum, which is then converted to a z-score. Following the spdep documentation:

High positive values indicate the possibility of a local cluster of high values of the variable being analysed, very low relative values a similar cluster of low values.
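For reference, the standardized form of the statistic is usually written along the following lines. The notation is my own assumption for this sketch: \(w_{ij}\) is the spatial weight between tracts \(i\) and \(j\) (with the focal tract included, so \(w_{ii} \neq 0\)), \(x_{j}\) is the value for tract \(j\), \(n\) is the number of tracts, and all sums run over every \(j\), including \(j = i\).

\[
G^{*}_{i} = \frac{\sum_{j} w_{ij} x_{j} - \bar{x} \sum_{j} w_{ij}}
{s \sqrt{\dfrac{n \sum_{j} w_{ij}^{2} - \left( \sum_{j} w_{ij} \right)^{2}}{n - 1}}},
\qquad
\bar{x} = \frac{1}{n} \sum_{j} x_{j},
\qquad
s = \sqrt{\frac{1}{n} \sum_{j} x_{j}^{2} - \bar{x}^{2}}
\]

The numerator is the local weighted sum minus what we would expect if values were spread evenly across the study area; the denominator scales that difference so the result can be read as a z-score.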

If you’re interested in learning more, take a look at Getis and Ord’s paper here.

Change the neighborhood type to see how the values change - and how it might influence the interpretation of the results. The app suggests that high values tend to be found to the west of downtown Austin, and low values to the east – but the map can vary significantly depending on how you modify the input parameters. Techniques like this can also fall victim to edge effects; computations for tracts on the edges of Travis County don’t incorporate their full “neighborhoods,” as only one county is included here!
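To make the dependence on the neighborhood definition concrete, here is a small sketch that computes \(G^{*}_{i}\) z-scores under two neighbor types. Everything in it is an assumption for illustration: a 5 x 5 toy grid instead of the Travis County tracts, a synthetic "income" variable that declines from west to east, binary weights (style = "B"), and a helper I've named gstar(). It is not the code behind the app.

```r
library(sp)
library(spdep)

# Hypothetical 5 x 5 grid of unit squares with a synthetic west-to-east
# gradient (high values in the west, low in the east) plus some noise.
grd    <- GridTopology(cellcentre.offset = c(0.5, 0.5),
                       cellsize = c(1, 1), cells.dim = c(5, 5))
polys  <- as.SpatialPolygons.GridTopology(grd)
coords <- coordinates(polys)

set.seed(1)
income <- 90000 - 10000 * coords[, 1] + rnorm(nrow(coords), sd = 2000)

# Two competing neighborhood definitions.
nb_queen <- poly2nb(polys, queen = TRUE)
nb_knn   <- knn2nb(knearneigh(coords, k = 4))

# Illustrative helper: include.self() adds each unit to its own
# neighborhood, so localG() returns G*_i rather than G_i.
gstar <- function(nb, x) {
  lw <- nb2listw(include.self(nb), style = "B")
  as.numeric(localG(x, lw))
}

z_queen <- gstar(nb_queen, income)
z_knn   <- gstar(nb_knn, income)

# Same data, different neighborhoods, different z-scores.
summary(z_queen - z_knn)
```

Both versions should show positive z-scores in the west and negative ones in the east, but the magnitudes for individual cells differ, which is the same effect you see when switching options in the app.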

This really only scratches the surface of this topic; there is a lot more to consider, such as more complex conceptualizations of spatial relationships, how to select a spatial weights matrix, and how to explore and model your data based on these relationships. For further reading, I’d recommend:

All code for this Shiny app can be found on GitHub at https://gist.github.com/walkerke/6915b02ac7f0c215bc2c75a687b3d269.