Read and Write Shapefiles

Authors

Andree Valle Campos

Laure Vancauwenberghe

Kene David Nwosu

Published

December 6, 2024


1 Introduction

• We built Thematic maps using {ggplot2}.

Figure 1. Thematic maps: (A) Choropleth map, (B) Dot map, (C) Density map with city roads as background, (D) Basemap below a Dot map.

• But, how can we use external data from other GIS software?

• What is the standard file format to store and share data?

Figure 2. sf package illustration by Allison Horst.

• Today we are going to read and write Shapefiles, and

• Dive into the components of a sf object!


2 Learning objectives

  1. Read Spatial data from Shapefiles using the read_sf() function from the {sf} package.

  2. Identify the components of sf objects.

  3. Identify the components of Shapefiles.

  4. Write Spatial data in Shapefiles using write_sf().


3 Prerequisites

This lesson requires the following packages:

Code
if(!require('pacman')) install.packages('pacman')

pacman::p_load(rnaturalearth,
               ggplot2,
               cholera,
               here,
               sf, 
               rgeoboundaries)

pacman::p_load_gh("afrimapr/afrilearndata")

4 Shapefiles

• The most common data format for storing Spatial data.

4.1 How to read Shapefiles?

• Using read_sf() from {sf}.

• From local files with a .shp extension,

• To a ready-to-use sf object.

• Let’s read the sle_adm3.shp file:

  1. Identify the file path up to the .shp filename:
"data/boundaries/sle_adm3.shp"
  1. Then, paste that path into read_sf() within here():
Code
shape_file <- read_sf(here("data/boundaries/sle_adm3.shp"))

• The output is an sf object and can be plotted using geom_sf():

Code
shape_file %>% class()

# Plot with geom_sf
# Your code here
shape_file %>% 
  ggplot() +
  geom_sf()

Read the shapefile called sle_hf.shp inside the data/healthsites/ folder. Use the read_sf() function:

Code
q1 <- read_sf(here("data/healthsites/sle_hf.shp"))
q1

• Wait! Shapefiles do not come alone!

• They came with a list of sub-component files.

• Let’s check at the files in the data/boundaries/ folder:

# A tibble: 4 × 1
  value       
  <chr>       
1 sle_adm3.dbf
2 sle_adm3.prj
3 sle_adm3.shp
4 sle_adm3.shx

• How are these files related with the sf object?

• Let’s now look under the hood to understand sf objects better.


5 Understanding sf objects

"sf" stands for Simple Features, a set of widely-used standards for storing geospatial information in databases.

• Now, what do sf objects look like and how do we work with them?

• We’ll look at a slice of the countries object:

Code
countries <- ne_countries(returnclass = "sf")

sf is a special kind of data frame

• We can manipulate it with {tidyverse} functions like dplyr::select().

• Let’s select three columns:

Code
countries %>% 
  select(name,    # country name
         pop_est) # estimated population
Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS:  WGS 84
First 10 features:
                       name   pop_est                       geometry
1                      Fiji    889953 MULTIPOLYGON (((180 -16.067...
2                  Tanzania  58005463 MULTIPOLYGON (((33.90371 -0...
3                 W. Sahara    603253 MULTIPOLYGON (((-8.66559 27...
4                    Canada  37589262 MULTIPOLYGON (((-122.84 49,...
5  United States of America 328239523 MULTIPOLYGON (((-122.84 49,...
6                Kazakhstan  18513930 MULTIPOLYGON (((87.35997 49...
7                Uzbekistan  33580650 MULTIPOLYGON (((55.96819 41...
8          Papua New Guinea   8776109 MULTIPOLYGON (((141.0002 -2...
9                 Indonesia 270625568 MULTIPOLYGON (((141.0002 -2...
10                Argentina  44938712 MULTIPOLYGON (((-68.63401 -...

• What do we see?

• 5-line header and a data frame.

5.1 The sf header

• The header provides context about the rest of the object.

• Let’s go through the most relevant sections:

5.2 Features and Fields

• Tells you the number of features and fields in the sf object:

👉Simple feature collection with 177 features and 2 fields👈
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: +proj=longlat +datum=WGS84

Features are the row of the data frame.

• In our countries dataset, each country is a feature.

Fields are the Attributes of each Feature.

• Equivalent to columns, not counting the “geometry” column.

Figure 3. Features and Fields of an `sf` object

The spData::nz dataset contains mapping information for the regions of New Zealand. How many features and fields does the dataset have?

Code
# Delete the wrong lines and run the correct line
q_nz_features_fields <- "C. 5 features and 4 fields"

5.3 Geometry

• Gives you the type of geometry in the sf object:

Simple feature collection with 177 features and 2 fields
👉Geometry type: MULTIPOLYGON👈
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: +proj=longlat +datum=WGS84

Geometry is a synonym for “shape”.

• Three main geometry types: points, lines and polygons.

• Each has its respective “multi” version: multipoints, multilines and multipolygons.

Fig: Geometry types and example maps for each. Points, lines (or linestrings) and polygons are the most common geometries you will encounter.

The ne_download() function from {rnaturalearth} can be used to obtain a map of major world roads, using the code below:

Code
roads <- 
  ne_download(scale = 10, 
              category = "physical",
              type = "geographic_lines", 
              returnclass = "sf") 

◘ What type of geometry is used to represent the rivers?

Code
# Delete the wrong lines and run the correct line:
q_rivers_geom_type <- "MULTILINESTRING"

• Each individual sf object can only contain one geometry type

• All points, all lines or all polygons.

• You will not find a mixture of geometries in a single sf object.

It is related with the geometry column of the sf dataframe

• The geometry column is the most special property of the sf data frame.

• It holds the core geospatial data (points, linestrings or polygons).

👉Geometry type: MULTIPOLYGON👈

First 10 features:
                                  👇👇👇👇👇👇👇👇👇👇👇👇👇👇👇👇
                    name  pop_est                       geometry
0            Afghanistan 28400000 MULTIPOLYGON (((61.21082 35...
1                 Angola 12799293 MULTIPOLYGON (((16.32653 -5...
2                Albania  3639453 MULTIPOLYGON (((20.59025 41...
3   United Arab Emirates  4798491 MULTIPOLYGON (((51.57952 24...
4              Argentina 40913584 MULTIPOLYGON (((-65.5 -55.2...
5                Armenia  2967004 MULTIPOLYGON (((43.58275 41...
6             Antarctica     3802 MULTIPOLYGON (((-59.57209 -...
7 Fr. S. Antarctic Lands      140 MULTIPOLYGON (((68.935 -48....
8              Australia 21262641 MULTIPOLYGON (((145.398 -40...
9                Austria  8210281 MULTIPOLYGON (((16.97967 48...

• Some noteworthy points about this column:

• The geometry column can’t be dropped,

geom_sf() automatically recognizes the geometry column.

5.4 Geodetic CRS

• Tells us the Coordinate Reference System (CRS) used.

Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
👉Geodetic CRS: +proj=longlat +datum=WGS84👈

CRS relate the spatial elements of the data with the surface of Earth.

Figure 4. CRS components include (A) the Coordinate System (Longitude/Latitude) and (B) the Map Projection (e.g., Conical or Cylindrical)

• CRS are a key component of geographic objects.

We will cover them in detail later!


6 Delving into Shapefiles

• They are a collection of files,

• At least three files: .shp, .shx, and .dbf.

• Related with components of a sf object.

Figure 5. Shapefile is a collection of at least three files.

• Let’s see the component files of a Shapefile called sle_adm3.shp.

• All of them are located in the same data/boundaries/ folder:

# A tibble: 4 × 1
  value       
  <chr>       
1 sle_adm3.dbf
2 sle_adm3.prj
3 sle_adm3.shp
4 sle_adm3.shx

• What is inside each file?

  • .shp: contains the Geometry data,
  • .dbf: stores the Attributes (Fields) for each shape.
  • .shx: is a positional index that links each Geometry with its Attributes,
  • .prj: plain text file describing the CRS, including the Map Projection,

• These files can be compressed into a ZIP folder and shared!

All of these sub-component files must be present in a given directory (folder) for the shapefile to be readable.

Which of the following options of component files of Shapefiles:

  1. "shp"
  2. "shx"
  3. "dbf"

contains the Geometry data?

stores the Attributes for each shape?

6.1 How to write Shapefiles?

• Let’s write the countries object to an countries.shp file:

  1. Define the file path up to the .shp filename:
"data/newshapefile/countries.shp"
  1. Paste that path in write_sf() within here():
Code
countries %>% write_sf(here("data/newshapefile/countries.shp"))

• Now, all the components of a sf object are in four new files of one Shapefile:

# A tibble: 5 × 1
  value        
  <chr>        
1 countries.dbf
2 countries.prj
3 countries.shp
4 countries.shx
5 ignore.md    

7 Wrap up

• We read and write Shapefiles using the {sf} package,

• Identified the components of an sf object, and

• Their relation with the files within a Shapefile.

Figure 6. sf package illustration by Allison Horst.

• Now we need to dive into CRS’s.

• Learn how to manage their zoom and transform them!

• Follow along with the lessons to find how to train these skills!


Contributors

The following team members contributed to this lesson:


References

Some material in this lesson was adapted from the following sources:

This work is licensed under the Creative Commons Attribution Share Alike license. Creative Commons License


Answer Key

Practice Q1

Code
q1 <- sf::read_sf(here("data/healthsites/sle_hf.shp"))
q1
Simple feature collection with 44 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -13.26473 ymin: 8.358015 xmax: -13.0821 ymax: 8.490478
Geodetic CRS:  WGS 84
# A tibble: 44 × 12
    osm_id source addrfull building healthcare operatorty addrcity name  amenity
     <dbl> <chr>  <chr>    <chr>    <chr>      <chr>      <chr>    <chr> <chr>  
 1  3.20e9 UNMEE… <NA>     <NA>     <NA>       <NA>       <NA>     Chin… hospit…
 2  3.20e9 UNMEE… <NA>     <NA>     <NA>       <NA>       <NA>     Lakk… hospit…
 3  3.21e9 https… <NA>     <NA>     <NA>       <NA>       <NA>     Prin… hospit…
 4  3.34e9 MSF-CH <NA>     <NA>     <NA>       <NA>       <NA>     COMM… clinic 
 5  3.34e9 MSF-CH <NA>     <NA>     <NA>       <NA>       <NA>     Den … clinic 
 6  3.34e9 <NA>   <NA>     <NA>     <NA>       <NA>       <NA>     MABE… clinic 
 7  3.34e9 MSF-CH <NA>     <NA>     <NA>       <NA>       <NA>     MAYE… clinic 
 8  3.34e9 MSF-CH <NA>     <NA>     <NA>       <NA>       <NA>     HEAL… clinic 
 9  3.34e9 <NA>   <NA>     <NA>     <NA>       <NA>       <NA>     Dent… dentist
10  3.34e9 MSF-CH <NA>     <NA>     <NA>       <NA>       <NA>     GINE… clinic 
# ℹ 34 more rows
# ℹ 3 more variables: healthca_1 <chr>, capacitype <chr>, geometry <POINT [°]>

Practice Q2

The spData::nz dataset contains mapping information for the regions of New Zealand. How many features and fields does the dataset have?

Code
# The correct line is:
q_nz_features_fields <- "A. 16 features and 6 fields"

Practice Q3

The ne_download() function from {rnaturalearth} can be used to obtain a map of major world roads, using the code below:

Code
roads <- 
  ne_download(scale = 10, 
              category = "physical",
              type = "geographic_lines", 
              returnclass = "sf") 

What type of geometry is used to represent the rivers?

Code
# The correct line is:
q_rivers_geom_type <- "MULTILINESTRING"

Practice Q4 & Q5

Which of the following options of component files of Shapefiles: a. "shp" b. "shx" c. "dbf"

contains the Geometry data?

Code
# The correct line is:
q4 <- "shp"

stores the Attributes for each shape?

Code
# The correct line is:
q5 <- "dbf"