Explore and Plot by Vector Layer Attributes
Last updated on 2024-08-19 | Edit this page
Estimated time: 60 minutes
Overview
Questions
- How can I compute on the attributes of a spatial object?
Objectives
- Query attributes of a spatial object.
- Subset spatial objects using specific attribute values.
- Plot a vector feature, colored by unique attribute values.
This episode continues our discussion of vector layer attributes and covers how to work with vector layer attributes in R. It covers how to identify and query layer attributes, as well as how to subset features by specific attribute values. Finally, we will learn how to plot a feature according to a set of attribute values.
Load the Data
We will continue using the sf
and ggplot2
packages in this episode. Make sure that you have these packages loaded.
We will continue to work with the three ESRI shapefiles
(vector layers) that we loaded in a previous episode.
R
library(ggplot2)
library(dplyr)
library(sf)
R
point_HARV <-
st_read("data/NEON-DS-Site-Layout-Files/HARV/HARVtower_UTM18N.shp")
# If you are getting an error, check your file path:
# You might need change your file path to:
# "data/2009586/NEON-DS-Site-Layout-Files/HARV/HARVtower_UTM18N.shp"
lines_HARV <- st_read("data/NEON-DS-Site-Layout-Files/HARV/HARV_roads.shp")
# If you are getting an error, check your file path:
# You might need change your file path to:
# "data/2009586/NEON-DS-Site-Layout-Files/HARV/HARV_roads.shp"
aoi_boundary_HARV <-
st_read("data/NEON-DS-Site-Layout-Files/HARV/HarClip_UTMZ18.shp")
# If you are getting an error, check your file path:
# You might need change your file path to:
# "data/2009586/NEON-DS-Site-Layout-Files/HARV/HarClip_UTMZ18.shp"
Query Vector Feature Metadata
As we discussed in the Open and Plot Vector Layers in R episode, we can view metadata associated with an R object using:
-
st_geometry_type()
- The type of vector data stored in the object. -
st_bbox()
- The spatial extent (geographic area covered by) of the object. -
st_crs()
- The CRS (spatial projection) of the data.
We started to explore our point_HARV
object in the
previous episode. To see a summary of all of the metadata associated
with our object, we can view the object with
View(point_HARV)
or print a summary of the object itself to
the console. We’re going to do this, but with lines_HARV
today:
R
lines_HARV
OUTPUT
Simple feature collection with 13 features and 15 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 730741.2 ymin: 4711942 xmax: 733295.5 ymax: 4714260
Projected CRS: WGS 84 / UTM zone 18N
First 10 features:
OBJECTID_1 OBJECTID TYPE NOTES MISCNOTES RULEID
1 14 48 woods road Locust Opening Rd <NA> 5
2 40 91 footpath <NA> <NA> 6
3 41 106 footpath <NA> <NA> 6
4 211 279 stone wall <NA> <NA> 1
5 212 280 stone wall <NA> <NA> 1
6 213 281 stone wall <NA> <NA> 1
7 214 282 stone wall <NA> <NA> 1
8 215 283 stone wall <NA> <NA> 1
9 216 284 stone wall <NA> <NA> 1
10 553 674 boardwalk <NA> <NA> 2
MAPLABEL SHAPE_LENG LABEL BIKEHORSE RESVEHICLE RECMAP
1 Locust Opening Rd 1297.35706 Locust Opening Rd Y R1 Y
2 <NA> 146.29984 <NA> Y R1 Y
3 <NA> 676.71804 <NA> Y R2 Y
4 <NA> 231.78957 <NA> <NA> <NA> <NA>
5 <NA> 45.50864 <NA> <NA> <NA> <NA>
6 <NA> 198.39043 <NA> <NA> <NA> <NA>
7 <NA> 143.19240 <NA> <NA> <NA> <NA>
8 <NA> 90.33118 <NA> <NA> <NA> <NA>
9 <NA> 35.88146 <NA> <NA> <NA> <NA>
10 <NA> 67.43464 <NA> N R3 N
Shape_Le_1 ResVehic_1 BicyclesHo
1 1297.10617 R1 - All Research Vehicles Allowed Bicycles and Horses Allowed
2 146.29983 R1 - All Research Vehicles Allowed Bicycles and Horses Allowed
3 676.71807 R2 - 4WD/High Clearance Vehicles Only Bicycles and Horses Allowed
4 231.78962 <NA> <NA>
5 45.50859 <NA> <NA>
6 198.39041 <NA> <NA>
7 143.19241 <NA> <NA>
8 90.33114 <NA> <NA>
9 35.88152 <NA> <NA>
10 67.43466 R3 - No Vehicles Allowed DO NOT SHOW ON REC MAP
geometry
1 MULTILINESTRING ((730819.2 ...
2 MULTILINESTRING ((732040.2 ...
3 MULTILINESTRING ((732057 47...
4 MULTILINESTRING ((731903.6 ...
5 MULTILINESTRING ((732039.1 ...
6 MULTILINESTRING ((732056.2 ...
7 MULTILINESTRING ((731964 47...
8 MULTILINESTRING ((732105.2 ...
9 MULTILINESTRING ((732222.9 ...
10 MULTILINESTRING ((732153.8 ...
We can use the ncol
function to count the number of
attributes associated with a spatial object too. Note that the geometry
is just another column and counts towards the total.
R
ncol(lines_HARV)
OUTPUT
[1] 16
We can view the individual name of each attribute using the
names()
function in R:
R
names(lines_HARV)
OUTPUT
[1] "OBJECTID_1" "OBJECTID" "TYPE" "NOTES" "MISCNOTES"
[6] "RULEID" "MAPLABEL" "SHAPE_LENG" "LABEL" "BIKEHORSE"
[11] "RESVEHICLE" "RECMAP" "Shape_Le_1" "ResVehic_1" "BicyclesHo"
[16] "geometry"
We could also view just the first 6 rows of attribute values using
the head()
function to get a preview of the data:
R
head(lines_HARV)
OUTPUT
Simple feature collection with 6 features and 15 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 730741.2 ymin: 4712685 xmax: 732232.3 ymax: 4713726
Projected CRS: WGS 84 / UTM zone 18N
OBJECTID_1 OBJECTID TYPE NOTES MISCNOTES RULEID
1 14 48 woods road Locust Opening Rd <NA> 5
2 40 91 footpath <NA> <NA> 6
3 41 106 footpath <NA> <NA> 6
4 211 279 stone wall <NA> <NA> 1
5 212 280 stone wall <NA> <NA> 1
6 213 281 stone wall <NA> <NA> 1
MAPLABEL SHAPE_LENG LABEL BIKEHORSE RESVEHICLE RECMAP
1 Locust Opening Rd 1297.35706 Locust Opening Rd Y R1 Y
2 <NA> 146.29984 <NA> Y R1 Y
3 <NA> 676.71804 <NA> Y R2 Y
4 <NA> 231.78957 <NA> <NA> <NA> <NA>
5 <NA> 45.50864 <NA> <NA> <NA> <NA>
6 <NA> 198.39043 <NA> <NA> <NA> <NA>
Shape_Le_1 ResVehic_1 BicyclesHo
1 1297.10617 R1 - All Research Vehicles Allowed Bicycles and Horses Allowed
2 146.29983 R1 - All Research Vehicles Allowed Bicycles and Horses Allowed
3 676.71807 R2 - 4WD/High Clearance Vehicles Only Bicycles and Horses Allowed
4 231.78962 <NA> <NA>
5 45.50859 <NA> <NA>
6 198.39041 <NA> <NA>
geometry
1 MULTILINESTRING ((730819.2 ...
2 MULTILINESTRING ((732040.2 ...
3 MULTILINESTRING ((732057 47...
4 MULTILINESTRING ((731903.6 ...
5 MULTILINESTRING ((732039.1 ...
6 MULTILINESTRING ((732056.2 ...
Challenge 1: Attributes for Different Spatial Classes
Explore the attributes associated with the point_HARV
and aoi_boundary_HARV
spatial objects.
How many attributes does each have?
Who owns the site in the
point_HARV
data object?Which of the following is NOT an attribute of the
point_HARV
data object?
- Latitude B) County C) Country
- To find the number of attributes, we use the
ncol()
function:
R
ncol(point_HARV)
OUTPUT
[1] 15
R
ncol(aoi_boundary_HARV)
OUTPUT
[1] 2
- Ownership information is in a column named
Ownership
:
R
point_HARV$Ownership
OUTPUT
[1] "Harvard University, LTER"
- To see a list of all of the attributes, we can use the
names()
function:
R
names(point_HARV)
OUTPUT
[1] "Un_ID" "Domain" "DomainName" "SiteName" "Type"
[6] "Sub_Type" "Lat" "Long" "Zone" "Easting"
[11] "Northing" "Ownership" "County" "annotation" "geometry"
“Country” is not an attribute of this object.
Explore Values within One Attribute
We can explore individual values stored within a particular
attribute. Comparing attributes to a spreadsheet or a data frame, this
is similar to exploring values in a column. We did this with the
gapminder
dataframe in an earlier lesson. For spatial
objects, we can use the same syntax:
objectName$attributeName
.
We can see the contents of the TYPE
field of our lines
feature:
R
lines_HARV$TYPE
OUTPUT
[1] "woods road" "footpath" "footpath" "stone wall" "stone wall"
[6] "stone wall" "stone wall" "stone wall" "stone wall" "boardwalk"
[11] "woods road" "woods road" "woods road"
To see only unique values within the TYPE
field, we can
use the unique()
function for extracting the possible
values of a character variable.
R
unique(lines_HARV$TYPE)
OUTPUT
[1] "woods road" "footpath" "stone wall" "boardwalk"
Subset Features
We can use the filter()
function from dplyr
to select a subset of features from a spatial object in R, just like
with data frames.
For example, we might be interested only in features that are of
TYPE
“footpath”. Once we subset out this data, we can use
it as input to other code so that code only operates on the footpath
lines.
R
footpath_HARV <- lines_HARV %>%
filter(TYPE == "footpath")
nrow(footpath_HARV)
OUTPUT
[1] 2
Our subsetting operation reduces the features
count to
2. This means that only two feature lines in our spatial object have the
attribute TYPE == footpath
. We can plot only the footpath
lines:
R
ggplot() +
geom_sf(data = footpath_HARV) +
labs(title = "NEON Harvard Forest Field Site",
subtitle = "Footpaths") +
coord_sf()
There are two features in our footpaths subset. Why does the plot
look like there is only one feature? Let’s adjust the colors used in our
plot. If we have 2 features in our vector object, we can plot each using
a unique color by assigning a column name to the color aesthetic
(color =
). We use the syntax
mapping = aes(color = )
to do this. We’ll use the
OBJECTID
column for the color aesthetic in this example,
but because it is a number, we will have to convert it to a categorical
value (a factor) first. We can also alter the default line thickness by
using the linewidth =
parameter, as the default value can
be hard to see. Note that linewidth
is placed outside of
the aes()
function, as we are not connecting line thickness
to a data variable.
R
ggplot() +
geom_sf(data = footpath_HARV,
mapping = aes(color = factor(OBJECTID)), linewidth = 1.5) +
labs(color = "Footpath ID",
title = "NEON Harvard Forest Field Site",
subtitle = "Footpaths") +
coord_sf()
Now, we see that there are in fact two features in our plot!
First we will save an object with only the boardwalk lines:
R
boardwalk_HARV <- lines_HARV %>%
filter(TYPE == "boardwalk")
Let’s check how many features there are in this subset:
R
nrow(boardwalk_HARV)
OUTPUT
[1] 1
Now let’s plot that data:
R
ggplot() +
geom_sf(data = boardwalk_HARV, linewidth = 1.5) +
labs(title = "NEON Harvard Forest Field Site",
subtitle = "Boardwalks") +
coord_sf()
First we will save an object with only the stone wall lines and check the number of features:
R
stoneWall_HARV <- lines_HARV %>%
filter(TYPE == "stone wall")
nrow(stoneWall_HARV)
OUTPUT
[1] 6
Now we can plot the data:
R
ggplot() +
geom_sf(data = stoneWall_HARV,
mapping = aes(color = factor(OBJECTID)), linewidth = 1.5) +
labs(color = "Wall ID",
title = "NEON Harvard Forest Field Site",
subtitle = "Stonewalls") +
coord_sf()
Customize Plots
In the examples above, ggplot()
automatically selected
colors for each line based on a default color order. If we don’t like
those default colors, we can create a vector of colors - one for each
feature.
First we will check how many unique line types there are:
R
unique(lines_HARV$TYPE)
OUTPUT
[1] "woods road" "footpath" "stone wall" "boardwalk"
Then we can create a palette of four colors, one for each feature in our vector object.
R
road_colors <- c("blue", "green", "navy", "purple")
We can tell ggplot
to use these colors when we plot the
data.
R
ggplot() +
geom_sf(data = lines_HARV, mapping = aes(color = TYPE)) +
scale_color_manual(values = road_colors) +
labs(color = "Road Type",
title = "NEON Harvard Forest Field Site",
subtitle = "Roads & Trails") +
coord_sf()
Data Tip
Note that ggplot
automatically decides which colors
belong with which variables, based on the order of the colors in the
color vector, and the alphabetical order of the road types. You can
specify which colors to use for which road type by creating a named
vector:
R
road_colors <- c("stone wall" = "blue", "boardwalk" = "green",
"footpath" = "navy", "woods road" = "purple")
Adjust Line Width
We adjusted line width universally earlier. If we want a unique line width for each level or attribute category in our spatial object, we can use the same syntax that we used for colors, above.
We already know that we have four different TYPE
s in the
lines_HARV object, so we will set four different line widths.
R
line_widths <- c(1, 2, 3, 4)
We can use those line widths when we plot the data.
R
ggplot() +
geom_sf(data = lines_HARV,
mapping = aes(color = TYPE, linewidth = TYPE)) +
scale_color_manual(values = road_colors) +
scale_linewidth_manual(values = line_widths) +
labs(color = "Road Type",
linewidth = "Road Type",
title = "NEON Harvard Forest Field Site",
subtitle = "Roads & Trails - Line width varies") +
coord_sf()
Challenge 4: Plot Line Width by Attribute
In the example above, we set the line widths to be 1, 2, 3, and 4. Because R orders alphabetically by default, this gave us a plot where woods roads (the last type) were the thickest and boardwalks were the thinnest.
Let’s create another plot where we show the different line types with the following thicknesses:
- woods road width = 6
- boardwalks width = 1
- footpath width = 3
- stone wall width = 2
First we need to look at the levels of our factor to see what order the road types are in:
R
unique(lines_HARV$TYPE)
OUTPUT
[1] "woods road" "footpath" "stone wall" "boardwalk"
We then can create our line_width
vector setting each of
the levels to the desired thickness.
R
line_width <- c("woods road" = 6, "footpath" = 3,
"stone wall" = 2, "boardwalk" = 1)
Now we can create our plot.
R
ggplot() +
geom_sf(data = lines_HARV, mapping = aes(linewidth = TYPE)) +
scale_linewidth_manual(values = line_width) +
labs(title = "NEON Harvard Forest Field Site",
subtitle = "Roads & Trails - Line width varies") +
coord_sf()
Add Plot Legend
When we plot a vector object and specify colors, line widths, etc.
using the mapping = aes()
function, R will automatically
add a plot to our legend. By default, the plot legend will show up on
the right side of the plot and will include the values that we supplied
to the ggplot object. We can adjust the way our legends look in a
variety of ways.
- We specify the location of our legend by using a default keyword.
For example, we could use
bottomright
,top
,topright
, etc. - We can change the font size
- We can change the little things, like the color of the legend box outline
- And so much more!
Pretty much anything you would want to change on a plot, you can! We
do this using the theme()
function in R.
Let’s start with our basic plot. We will use the
road_colors
object that we created above to color the
legend. We’ll also move the legend to the bottom by specifying
theme(legend.position = "bottom")
.
R
ggplot() +
geom_sf(data = lines_HARV,
mapping = aes(color = TYPE), linewidth = 1.5) +
scale_color_manual(values = road_colors) +
labs(color = "Road Type",
title = "NEON Harvard Forest Field Site",
subtitle = "Roads & Trails - Default Legend") +
coord_sf() +
theme(legend.position = "bottom")
Now, let’s change the appearance of our legend by manually setting
different parameters using the theme()
function.
We can start by changing the font size in the legend. The way we
would do this is by adding the following line to our plot code:
theme(legend.text = element_text(size = 20))
This may seem a little confusing, but there is a method to the
madness. First, we note what we’re trying to change (the
legend.text
argument). Next, we have to specify what kind
of object it needs to be. Since we’re altering the legend text,
it needs to be a text element, so we use the element_text()
function. Within this function, we can specify what we want to change
from the default. In this case, it’s the size of the font
(size = 20
), but we could do this for things like color,
the face (plain, italic, bold, bold.italic), the angle of rotation for
the text, etc.
R
ggplot() +
geom_sf(data = lines_HARV,
mapping = aes(color = TYPE), linewidth = 1.5) +
scale_color_manual(values = road_colors) +
labs(color = "Road Type",
title = "NEON Harvard Forest Field Site",
subtitle = "Roads & Trails - Default Legend") +
coord_sf() +
theme(legend.position = "bottom",
legend.text = element_text(size = 20))
We can decide to have a box around the legend, by altering the
legend.box.background
to have a black border. Since the
legend is a rectangle of sorts, we will use the element:
element_rect()
to adjust these parameters.
So our new line would be:
theme(legend.box.background = element_rect(color = "black"))
R
ggplot() +
geom_sf(data = lines_HARV,
mapping = aes(color = TYPE), linewidth = 1.5) +
scale_color_manual(values = road_colors) +
labs(color = "Road Type",
title = "NEON Harvard Forest Field Site",
subtitle = "Roads & Trails - Default Legend") +
coord_sf() +
theme(legend.position = "bottom",
legend.text = element_text(size = 20),
legend.box.background = element_rect(color = "black"))
Data Tip
You can modify the default R color palette using the palette method.
For example palette(rainbow(6))
or
palette(terrain.colors(6))
. You can reset the palette
colors using palette("default")
!
You can also use colorblind-friendly palettes such as those in the viridis package.
Challenge 5: Plot Lines by Attribute
Create a plot that emphasizes only roads where bicycles and horses
are allowed. To emphasize this, make the lines where bicycles are not
allowed THINNER than the roads where bicycles are allowed. NOTE: this
attribute information is located in the
lines_HARV$BicyclesHo
attribute.
Be sure to add a title and legend to your map. You might consider a color palette that has all bike/horse-friendly roads displayed in a bright color. All other lines can be black.
First we explore the BicyclesHo
attribute to learn the
values that correspond to the roads we need.
R
lines_HARV %>%
pull(BicyclesHo) %>%
unique()
OUTPUT
[1] "Bicycles and Horses Allowed" NA
[3] "DO NOT SHOW ON REC MAP" "Bicycles and Horses NOT ALLOWED"
Now, we can create a data frame with only those roads where bicycles and horses are allowed.
R
lines_showHarv <-
lines_HARV %>%
filter(BicyclesHo == "Bicycles and Horses Allowed")
Finally, we plot the needed roads after setting them to magenta and a thicker line width.
R
ggplot() +
geom_sf(data = lines_HARV) +
geom_sf(data = lines_showHarv,
mapping = aes(color = BicyclesHo), linewidth = 2) +
scale_color_manual(values = "magenta") +
labs(title = "NEON Harvard Forest Field Site",
subtitle = "Roads Where Bikes and Horses Are Allowed",
color = NULL) +
coord_sf()
Challenge 6: Plot Polygon by Attribute
Create a map of the state boundaries in the United States using the
data located in your downloaded data folder:
NEON-DS-Site-Layout-Files/US-Boundary-Layers\US-State-Boundaries-Census-2014
.
Note that this dataset includes Z and M coordinates, which we do not
need. Use the st_zm()
function to drop these dimensions
before plotting.
When plotting, apply a line color to each state using its
region
value. Be sure to add a legend to your plot as
well.
First we read in the data and check how many levels there are in the
region
column:
R
state_boundary_US <-
st_read("data/NEON-DS-Site-Layout-Files/US-Boundary-Layers/US-State-Boundaries-Census-2014.shp") %>%
# NOTE: We need neither Z nor M coordinates!
st_zm()
OUTPUT
Reading layer `US-State-Boundaries-Census-2014' from data source
`/Users/echelleburns/Documents/2024-07-01-ucsb-intro-geospatial/site/built/data/NEON-DS-Site-Layout-Files/US-Boundary-Layers/US-State-Boundaries-Census-2014.shp'
using driver `ESRI Shapefile'
Simple feature collection with 58 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension: XY, XYZ
Bounding box: xmin: -124.7258 ymin: 24.49813 xmax: -66.9499 ymax: 49.38436
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
R
# If you are getting an error, check your file path:
# You might need change your file path to:
# "data/2009586/NEON-DS-Site-Layout-Files/US-Boundary-Layers/US-State-Boundaries-Census-2014.shp"
state_boundary_US$region <- as.factor(state_boundary_US$region)
levels(state_boundary_US$region)
OUTPUT
[1] "Midwest" "Northeast" "Southeast" "Southwest" "West"
Next we set a color vector with that many items:
R
colors <- c("purple", "springgreen", "yellow", "brown", "navy")
Now we can create our plot:
R
ggplot() +
geom_sf(data = state_boundary_US,
mapping = aes(color = region), size = 1) +
scale_color_manual(values = colors) +
labs(title = "Contiguous U.S. State Boundaries") +
coord_sf()