Writing Data
Last updated on 2024-08-19
Estimated time: 20 minutes
Overview
Questions
- How can I save plots and data created in R?
Objectives
- To be able to write out plots and data from R.
Learners will need to have created the directory structure described in Project Management With RStudio in order for the code in this episode to work.
First, let’s load in all relevant libraries and data to be used in this lesson:
R
library(ggplot2)
library(dplyr)
gapminder <- read.csv("data/gapminder_data.csv", header = TRUE)
We also need to create a cleaned-data folder within the data folder and a figures folder within the main project folder. We can do this manually or using code:
R
dir.create("data/cleaned-data")
dir.create("figures")
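Calling dir.create() on a folder that already exists only raises a warning, but a script that is re-run often is easier to read if it handles that case quietly. A minimal sketch, assuming the working directory is the project root; showWarnings and recursive are documented arguments of base R's dir.create():

```r
# Folders this episode writes to, relative to the project root
# (assumption: the project root is the current working directory)
output_dirs <- c("data/cleaned-data", "figures")

for (d in output_dirs) {
  # recursive = TRUE also creates missing parent folders such as "data";
  # showWarnings = FALSE keeps re-runs quiet when the folder already exists
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that both folders are now present
dir.exists(output_dirs)
```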
Saving plots
You can save a plot from within RStudio using the ‘Export’ button in the ‘Plot’ window. This will give you the option of saving as a .pdf or as .png, .jpg or other image formats.
Sometimes you will want to save plots without creating them in the ‘Plot’ window first. Perhaps you want to make a pdf document, for example. Or perhaps you’re looping through multiple subsets of a file, plotting data from each subset, and you want to save each plot. In this case you can use a more flexible approach. The ggsave function saves the latest plot by default. You can control the size and resolution using the arguments to this function.
R
ggplot(data = gapminder, mapping = aes(x = gdpPercap)) +
  geom_histogram()
ggsave("figures/Distribution-of-gdpPercap.pdf", width = 12, height = 4)
Open up this document and have a look.
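For the looping case mentioned above, one possible pattern is to build each plot inside the loop and hand it to ggsave() through its plot argument. This is a sketch rather than part of the lesson's workflow: it uses R's built-in iris data set as a stand-in so that it runs without the gapminder file, and the output file names are made up for illustration.

```r
library(ggplot2)

# Make sure the output folder exists
# (assumption: the project root is the current working directory)
dir.create("figures", showWarnings = FALSE)

# Stand-in data: one subset of the built-in 'iris' data per species
subsets <- split(iris, iris$Species)

for (name in names(subsets)) {
  p <- ggplot(data = subsets[[name]], mapping = aes(x = Sepal.Length)) +
    geom_histogram(bins = 10) +
    labs(title = name)

  # Hypothetical file name pattern: one PDF per subset
  out_file <- file.path("figures", paste0("sepal-length-", name, ".pdf"))

  # Passing the plot explicitly avoids relying on the "latest plot" default
  ggsave(out_file, plot = p, width = 6, height = 4)
}

# List the files that were written
list.files("figures", pattern = "^sepal-length-")
```

With the gapminder data, the same loop could run over split(gapminder, gapminder$continent) instead.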
R
gapminder_small <- gapminder %>%
  filter(continent == "Americas" & year %in% c(1952, 2007))
ggplot(data = gapminder_small,
       mapping = aes(x = country, y = gdpPercap, fill = as.factor(year))) +
  geom_col(position = "dodge") +
  coord_flip()
# By default, ggsave() saves the latest plot.
# This call reuses the file name from the histogram above, so that file is overwritten.
ggsave("figures/Distribution-of-gdpPercap.pdf", width = 12, height = 4)
To produce documents in different formats, change the file extension to jpeg, png, tiff, or bmp.
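A sketch of changing the output format and resolution: the snippet below saves one plot twice, once as a PDF and once as a PNG with an explicit resolution. width, height, units and dpi are arguments of ggsave(); the built-in iris data and the file names are stand-ins chosen so the example runs on its own.

```r
library(ggplot2)

# Make sure the output folder exists
# (assumption: the project root is the current working directory)
dir.create("figures", showWarnings = FALSE)

p <- ggplot(data = iris, mapping = aes(x = Sepal.Length)) +
  geom_histogram(bins = 20)

# Same plot, two formats: ggsave() picks the graphics device from the file extension
ggsave("figures/example-histogram.pdf", plot = p, width = 12, height = 4)

# For raster formats such as PNG, dpi sets the resolution in dots per inch
ggsave("figures/example-histogram.png", plot = p,
       width = 12, height = 4, units = "in", dpi = 150)

file.exists(c("figures/example-histogram.pdf", "figures/example-histogram.png"))
```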
Writing data
At some point, you’ll also want to write out data from R.
We can use the write.csv function for this, which is very similar to read.csv from before.
Let’s create a data-cleaning script. For this analysis, we only want to focus on the gapminder data for Australia:
R
aust_subset <- gapminder %>%
  filter(country == "Australia")
write.csv(aust_subset,
          file = "data/cleaned-data/gapminder-aus.csv")
Let’s open the file to make sure it contains the data we expect.
Navigate to your cleaned-data directory and double-click the file name. It will open using your computer’s default for opening files with a .csv extension. To open in a specific application, right-click and select the application. Using a spreadsheet program (like Excel) to open this file shows us that we do have properly formatted data including only the data points from Australia. However, there are row numbers associated with the data that are not useful to us (they refer to the row numbers from the gapminder data frame).
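The same check can be done without leaving R by reading the file back in. A small sketch, using a few rows of the built-in iris data as a stand-in for the lesson's subset and a temporary file instead of the cleaned-data folder:

```r
# Stand-in data: three rows of R's built-in 'iris' data set
# (not the lesson's gapminder data)
example_df <- head(iris, 3)

# Write to a temporary file so nothing in the project is touched
csv_path <- tempfile(fileext = ".csv")
write.csv(example_df, file = csv_path)

# Read the file back and inspect its columns
check <- read.csv(csv_path)
names(check)  # the extra first column, which read.csv names "X", holds the row names
```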
Let’s look at the help file to work out how to change this behaviour.
R
?write.csv
By default, R writes out both row names and column names when writing data to a file. To leave out the row names, we can do the following:
R
write.csv(
  aust_subset,
  file = "data/cleaned-data/gapminder-aus.csv",
  row.names = FALSE
)
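To confirm the effect of row.names = FALSE, the file can be read back and its columns compared with the original data frame. A minimal sketch, again using a few rows of the built-in iris data and a temporary file as stand-ins for the lesson's data and paths:

```r
# Stand-in data: three rows of R's built-in 'iris' data set
example_df <- head(iris, 3)

# Write to a temporary file, this time without row names
csv_path <- tempfile(fileext = ".csv")
write.csv(example_df, file = csv_path, row.names = FALSE)

# The raw header line now starts with the first real column name
readLines(csv_path, n = 2)

# Reading the file back gives the same columns as the original data frame
check <- read.csv(csv_path)
identical(names(check), names(example_df))
```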
R
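# Keep only the observations from years after 1990
gapminder_after_1990 <- gapminder %>%
  filter(year > 1990)
# Write into the cleaned-data folder created inside data/ earlier
write.csv(gapminder_after_1990,
          file = "data/cleaned-data/gapminder-after-1990.csv",
          row.names = FALSE)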
- Now that learners know the fundamentals of R, the rest of the workshop will apply these concepts to working with geospatial data in R.
- Packages and functions specific to working with geospatial data will be the focus of the rest of the workshop.
- They will have lots of challenges to practice applying and expanding these skills in the next lesson.