In geospatial data visualization, maps are powerful storytelling tools. However, a map without clear annotations and labels is like a book without titles or chapter headings. While the story might still be there, it becomes significantly harder to understand, interpret, and appreciate.
In this lesson, we place a special emphasis on the importance of annotating and labeling maps. Proper annotation transforms a simple visualization into an informative guide, allowing viewers to quickly grasp complex spatial data. With precise labeling, areas of interest can be immediately recognized, facilitating a clearer comprehension of the data’s narrative.
Before diving into any visualization or analysis, it’s essential to load and preprocess our data. This includes reading in datasets, merging related information, and filtering out unnecessary or irrelevant entries.
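The code in this lesson calls many functions without a package prefix (for example `read_csv()`, `filter()`, `st_as_sf()`, `str_wrap()` and `drop_na()`), so the corresponding packages need to be attached first. Below is a minimal sketch for checking that they are available. The package list is an assumption inferred from the functions used in this lesson, not an official requirements list, so adjust it to your own setup.

```r
# Packages inferred from the functions used in this lesson (an assumption,
# not an official requirements list).
pkgs <- c("sf", "ggplot2", "dplyr", "tidyr", "readr", "stringr",
          "here", "malariaAtlas", "ggspatial")

# requireNamespace() returns FALSE (instead of erroring) when a package is absent.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
}

# Attach only the packages that are actually available.
for (p in pkgs[is_installed]) {
  library(p, character.only = TRUE)
}
```

If `missing_pkgs` is not empty, installing those packages with `install.packages()` before running the rest of the lesson should avoid "could not find function" errors.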
Code
    # Reading the geographical shapefile data
    nga_adm1 <- sf::st_read(here::here("data/raw/NGA_adm_shapefile/NGA_adm1.shp"))
Reading layer `NGA_adm1' from data source
`C:\Users\Hon.Olayinka\Documents\GitHub\GRAPH\Data Reporting\Week 2\Labelling_Map\data\raw\NGA_adm_shapefile\NGA_adm1.shp'
using driver `ESRI Shapefile'
Simple feature collection with 38 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668431 ymin: 4.270418 xmax: 14.67642 ymax: 13.89201
Geodetic CRS: WGS 84
The geographical shapefile data for Nigeria’s administrative boundaries (like states or provinces) is read and stored in the nga_adm1 object.
Code
    # Reading the attribute data related to malaria cases
    malaria_cases <- read_csv(here::here("data/malaria.csv"))
This step loads data related to malaria cases in different states of Nigeria.
Code
    # Filtering out the 'Water body' entries from the geographical data
    nga_adm1 <- filter(nga_adm1, NAME_1 != "Water body")

    # Merging the geographical data with the malaria cases data
    malaria <- malaria_cases %>%
      left_join(nga_adm1, by = c("state_name" = "NAME_1")) %>%
      st_as_sf()
Here, we’re combining the geographical boundaries data with the malaria cases data using state names as a reference. This merged data is then converted into a format suitable for geospatial visualizations.
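Because the join relies on state names matching exactly, it can help to compare the two sets of names before merging. The sketch below uses two small, made-up name vectors as stand-ins for `malaria_cases$state_name` and `nga_adm1$NAME_1`; the spelling mismatch is hypothetical and only there to show what the check reports.

```r
# Toy stand-ins for the lesson's name columns (hypothetical values).
shape_names <- c("Abia", "Federal Capital Territory", "Kano", "Nassarawa")
cases_names <- c("Abia", "Federal Capital Territory", "Kano", "Nasarawa")

# Names in the case data with no matching boundary: after left_join(),
# these rows would have an empty geometry.
unmatched_cases <- setdiff(cases_names, shape_names)

# Boundaries with no case data: these states would be missing from the map.
unmatched_shapes <- setdiff(shape_names, cases_names)

unmatched_cases
unmatched_shapes
```

With the real objects, the same two `setdiff()` calls on `malaria_cases$state_name` and `nga_adm1$NAME_1` should both return empty vectors if every state matches.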
Code
    # Filtering down to essential columns for our analysis
    malaria2 <- malaria %>%
      select(state_name, cases_2000, cases_2006, cases_2010, cases_2015, cases_2021, geometry)
We’re narrowing down our dataset to specific columns, mainly the state names, the malaria cases from various years, and the geographical boundaries of these states (i.e. geometry).
Code
    # Reading in population data for different regions of Nigeria
    population_nigeria <- read_csv(here::here("data/population_nigeria.csv"))
This step loads data indicating the population of different states or regions in Nigeria.
Code
    # Combining the population data with our malaria data
    malaria3 <- malaria2 %>%
      left_join(population_nigeria, by = c("state_name")) %>%
      st_as_sf()
By merging the population data with our malaria cases data, we enrich our dataset. This combined data allows for more comprehensive visualizations and analyses, such as calculating incidence rates or assessing trends relative to population size.
To address aesthetic concerns with labeling a small area like the “Federal Capital Territory” on a map, we can modify the labels before plotting. We can do this by creating a new column for the modified state names or by directly changing the state_name column within the malaria3 dataset. Given that the region in question has a small surface area and the full name may overflow its borders, here’s how you can adjust the name to “Capital” for better display:
Code
    # Modify the state names, changing "Federal Capital Territory" to "Capital"
    malaria3$state_name <- ifelse(malaria3$state_name == "Federal Capital Territory",
                                  "Capital",
                                  malaria3$state_name)
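The other option mentioned above, creating a new column for the modified names, could look like the sketch below. It uses a small toy data frame in place of `malaria3`, and `map_label` is a column name chosen for this illustration only.

```r
# Toy data frame standing in for malaria3 (hypothetical values).
states <- data.frame(
  state_name = c("Kano", "Federal Capital Territory", "Lagos"),
  stringsAsFactors = FALSE
)

# Keep state_name untouched; put the shortened text in a new column.
# `map_label` is a name chosen for this illustration.
states$map_label <- ifelse(
  states$state_name == "Federal Capital Territory",
  "Capital",
  states$state_name
)

states
```

This variant leaves the official name available for tables or later joins, and the map would then use `aes(label = map_label)` instead of `aes(label = state_name)`.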
0.4 Building a Simple Choropleth Map
Choropleth maps are powerful visualization tools that display divided geographical areas shaded or patterned in proportion to the value of a variable.
In this example, we’ll be using a choropleth map to visualize the distribution of malaria cases across different regions of Nigeria for the year 2021.
Code
    # Constructing the choropleth map using ggplot2
    ggplot(data = malaria3) +
      # The fill color is determined by the number of malaria cases in 2021, divided by 1000
      geom_sf(aes(fill = cases_2021 / 1000)) +
      # Adding labels and title to the plot
      labs(title = "Nigeria Malaria Distributed Cases in 2021", fill = "Cases per 1000") +
      # Using a continuous color scale transitioning from green to red
      scale_fill_continuous(low = "green", high = "red") +
      # Using a minimal theme for better visualization of the map
      theme_void()
Here’s a detailed explanation of the code:
ggplot(data=malaria3): Initiates a ggplot object using the malaria3 dataset.
geom_sf(aes(fill=cases_2021/1000)): Adds the geographical data from malaria3 and fills each region based on the number of malaria cases in 2021, divided by 1000. The legend therefore shows case counts in thousands. Note that this is not a rate per thousand people, since population is not yet taken into account.
labs(title = "Nigeria Malaria Distributed Cases in 2021", fill = "Cases per 1000"): Specifies the title of the plot and the label for the color scale.
scale_fill_continuous(low = "green", high = "red"): Applies a continuous color scale where regions with fewer cases are colored green and regions with more cases are colored red.
theme_void(): Removes axis text, ticks, and other non-data ink to emphasize the map.
The resulting plot provides a clear view of how malaria cases are distributed across Nigeria in 2021, with the color intensity indicating the magnitude of cases in each region.
0.4.1 Q: Modify the provided choropleth map to visualize the distribution of malaria cases in Nigeria for the year 2015.
Instructions
Update the data mapping in the geom_sf() function to reflect malaria cases for the year 2015.
Adjust the title in the labs() function to indicate that the visualization pertains to 2015.
Change the color gradient in scale_fill_continuous() to transition from blue (low cases) to yellow (high cases).
Below is some starter code:
Code
    # Constructing the choropleth map for 2015 using ggplot2
    ggplot(data = malaria3) +
      geom_sf(aes(fill = ___________)) +  # Fill in the correct data column for 2015
      labs(title = "______________________", fill = "Cases per 1000") +  # Update the title appropriately
      scale_fill_continuous(___________) +  # Modify the color scale
      theme_void()
0.5 Adding Continuous Data Indicators to the Choropleth Map
When analyzing disease data, it’s often useful to look beyond raw case numbers and focus on rates, particularly incidence rates. The incidence rate provides a normalized measure that can account for differing population sizes across regions, making comparisons more meaningful.
Understanding Incidence
The incidence of a disease is a cornerstone of epidemiological research. It quantifies the number of new cases of a disease that occur within a specific time frame, typically a year, relative to a population at risk.
Mathematically, it’s given by:
\[ \text{Incidence} = \frac{\text{Number of new cases during the time period}}{\text{Population at risk at the start of the period}} \]
In this context, we’re looking at the incidence of malaria in different states of Nigeria for the year 2021. Specifically, we’ll compute the incidence rate by dividing the number of new malaria cases in 2021 by each state’s 2019 population (the population_2019 column), which we use here as an approximation of the population at risk.
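As a small worked illustration of the formula, the sketch below computes incidence for three made-up states. The numbers are hypothetical and only the column names `cases_2021` and `population_2019` mirror the lesson's data.

```r
# Hypothetical numbers for three states (not taken from the lesson's data).
toy <- data.frame(
  state_name      = c("A", "B", "C"),
  cases_2021      = c(120000, 450000, 300000),
  population_2019 = c(4000000, 15000000, 3000000)
)

# Incidence as a proportion: new cases divided by population.
toy$incidence <- toy$cases_2021 / toy$population_2019

# The same quantity expressed per 1,000 people, which is often easier to read.
toy$incidence_per_1000 <- toy$incidence * 1000

# Show the states from highest to lowest incidence.
toy[order(-toy$incidence), ]
```

State B has far more cases than state A, yet both have the same incidence, while state C has the highest incidence with fewer cases than B. This is why a map of rates can tell a different story from a map of raw counts.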
The following R code accomplishes this and visualizes the data:
Code
    # Visualizing Malaria Incidence in 2021 using a Choropleth Map
    ggplot(data = malaria3) +
      # Filling each state with a color representing the incidence rate in 2021.
      geom_sf(aes(fill = round(cases_2021 / population_2019, 2))) +
      # Add the name of each state to its centroid
      geom_sf_text(aes(label = str_wrap(state_name, 1)), size = 2) +
      # Adding title and legend title.
      labs(title = "Nigeria Malaria Incidence in 2021", fill = "Incidence") +
      # Using a continuous color scale from green (low incidence) to red (high incidence).
      scale_fill_continuous(low = "green", high = "red") +
      # Using a minimalistic theme for clearer visualization.
      theme_void()
Here’s a brief explanation of the visualization:
The geom_sf() function creates a choropleth map where each state’s color represents its malaria incidence rate in 2021.
The geom_sf_text() function labels each state with its name, automatically positioning each label at a point inside the state.
The str_wrap() function is part of the {stringr} package and is used here to wrap the text of state_name. Because its second argument (the target width) is 1, each word of a state name is placed on its own line; words themselves are never split. This keeps multi-word names compact and reduces overlap with neighboring labels.
The color scale, transitioning from green to red, visually emphasizes regions with higher incidence rates.
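To see what `str_wrap()` does with a width of 1, it can be run on a few names outside of the plot. The short sketch below uses three state names as plain strings.

```r
library(stringr)

state_names <- c("Kano", "Akwa Ibom", "Cross River")

# With a width of 1, str_wrap() puts each word on its own line;
# it never splits inside a word.
wrapped <- str_wrap(state_names, width = 1)

# cat() prints the embedded line breaks as they would appear on the map.
cat(wrapped, sep = "\n---\n")
```

Single-word names such as "Kano" are returned unchanged, while "Akwa Ibom" becomes two lines.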
This visualization offers an immediate grasp of the malaria situation in Nigeria, revealing areas of high incidence that might need more focused public health interventions.
0.6 Exploring Malaria Case Increase Rates Using Choropleth Maps
Understanding the change in the number of disease cases over time can provide insights into the effectiveness of interventions, the progression of the disease, and areas where increased resources might be needed. In this context, we’re looking to visualize the percentage increase in malaria cases from 2015 to 2021 across different states in Nigeria.
Computing the Increase Rate
The increase rate for each state is computed as:
\[ \text{Increase Rate} = \left( \frac{\text{Cases in 2021} - \text{Cases in 2015}}{\text{Cases in 2015}} \right) \times 100\% \]
This formula provides the percentage growth (or decrease) in malaria cases from 2015 to 2021.
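The sketch below applies the formula to four made-up states, including one decrease and one state with no change. The counts are hypothetical; the last column flags values that would fall outside a fill scale restricted to the range 0 to 70.

```r
# Hypothetical case counts for four states (not the lesson's data).
toy <- data.frame(
  state_name = c("A", "B", "C", "D"),
  cases_2015 = c(1000, 2000, 500, 800),
  cases_2021 = c(1500, 1800, 1000, 800)
)

# Percentage change from 2015 to 2021, rounded to a whole number.
toy$increase_rate <- round((toy$cases_2021 - toy$cases_2015) / toy$cases_2015 * 100)

# Flag values that fall outside a fill scale limited to c(0, 70).
toy$outside_limits <- toy$increase_rate < 0 | toy$increase_rate > 70

toy
```

In ggplot2, values outside the `limits` of a continuous scale are treated as missing by default and drawn in the scale's `na.value` color (grey). If your data contain decreases or very large increases, one option is to widen the limits; another is to pass `oob = scales::squish` to the scale so that out-of-range values take the nearest end color.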
Visualization of Increase Rates
Let’s delve into the code that accomplishes this visualization:
Code
    # Calculate the increase rate for each state
    malaria3 <- malaria3 %>%
      mutate(increase_rate = round(((cases_2021 - cases_2015) / cases_2015) * 100))

    # Visualizing the increase rates using a choropleth map
    ggplot(data = malaria3) +
      # Coloring each state based on its increase rate
      geom_sf(aes(fill = increase_rate), color = "white", size = 0.2) +
      # Using a continuous color scale to represent increase rates, transitioning from green to red
      scale_fill_continuous(name = "Increase rate (%)", limits = c(0, 70),
                            low = "green", high = "red", breaks = c(0, 20, 40, 60)) +
      # Labeling each state with its increase rate percentage
      geom_sf_text(aes(label = str_wrap(paste0(increase_rate, "%"), 1)), size = 2) +
      # Adding a title to the plot
      labs(title = "Nigeria Malaria Increase rate in 2021 compared with 2015") +
      theme_void()
In this visualization:
The geom_sf() function creates the choropleth map where each state’s color intensity represents its malaria increase rate.
scale_fill_continuous() sets the color scale for the increase rates, making areas of higher increase more prominent.
geom_sf_text() adds labels to each state.
str_wrap() has no visible effect here: a label such as “45%” contains no spaces, so there is nowhere to break the line.
The resulting map allows us to quickly identify regions with significant growth in malaria cases over the selected period.
Through this visualization, you can pinpoint regions where malaria is on the rise and potentially allocate resources more effectively.
0.7 Labeling the Choropleth Map with State Names
In a choropleth map, while color gradients offer a visual cue to understand the distribution of a variable across regions, adding labels can significantly enhance the clarity of the visualization. This is especially true when audiences might not be familiar with all geographical boundaries shown. In the case of our malaria dataset, adding state names to the map makes the data more accessible and understandable.
Code
    # Constructing a choropleth map with state names
    ggplot(data = malaria3) +
      # Fill each state based on the number of malaria cases in 2021, scaled per 10,000
      geom_sf(aes(fill = cases_2021 / 10000)) +
      # Add the name of each state to its centroid
      geom_sf_text(aes(label = str_wrap(state_name, 1)), size = 2) +
      # Add titles and labels
      labs(title = "Nigeria Malaria cases in 2021", fill = "Cases/10000") +
      # Use a continuous color scale from green (low case numbers) to red (high case numbers)
      scale_fill_continuous(low = "green", high = "red") +
      theme_void()
Here’s a detailed explanation of the visualization:
geom_sf(aes(fill = cases_2021/10000)): This creates the choropleth map where each state’s color represents the number of malaria cases in 2021, scaled down by a factor of 10,000.
labs(title = "Nigeria Malaria cases in 2021", fill = "Cases/10000"): This function adds a title to the plot and a label to the color scale.
scale_fill_continuous(low = "green", high = "red"): This sets the color scale for the map, transitioning from green for states with fewer cases to red for those with more cases.
The resulting visualization is a clear and informative map of malaria cases across Nigeria in 2021, with each state labeled for easy reference.
0.8 Displaying Combined State Names and Increase Rates on the Choropleth Map
Visualizations can convey a wealth of information when they incorporate multiple data points in an intuitive manner. By pairing state names with their corresponding increase rates, we can provide a richer, more detailed view of the data without overwhelming the audience.
Let’s delve into this visualization:
We want to present a choropleth map showing 2021 malaria case counts across Nigerian states (in units of 10,000 cases), with labels that combine each state’s name and its increase rate since 2015.
Code
    # Combine the state names and their respective increase rates into a single label
    malaria3$label_text <- paste(malaria3$state_name, malaria3$increase_rate)

    # Visualize the data
    ggplot(data = malaria3) +
      # Create a choropleth map shaded based on the number of malaria cases in 2021, in units of 10,000 cases
      geom_sf(aes(fill = cases_2021 / 10000)) +
      # Add combined labels (state name and increase rate) to each state's centroid
      geom_sf_text(aes(label = str_wrap(label_text, 2)), size = 2) +
      # Add titles and customize the color legend
      labs(title = "Nigeria Malaria cases in 2021 (%)", fill = "Cases/10000") +
      # Set color gradient
      scale_fill_continuous(low = "green", high = "red") +
      # Apply a minimal theme for clarity
      theme_void()
In this visualization:
The line malaria3$label_text <- paste(malaria3$state_name, malaria3$increase_rate) constructs our combined labels by concatenating the state name with its increase rate, separated by a space.
scale_fill_continuous(low = "green", high = "red") assigns a color gradient based on the number of malaria cases. States with fewer cases will be colored green, transitioning to red for states with more cases.
This approach communicates two pieces of information (case counts in units of 10,000 and the increase rate) in the same visualization, while keeping state names for easy identification.
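The labels above show the increase rate as a bare number. As a possible variation (not the lesson's original code), the sketch below builds labels that put the rate on its own line and append a percent sign. It uses a toy data frame with hypothetical values.

```r
# Toy data with hypothetical increase rates.
toy <- data.frame(
  state_name    = c("Kano", "Lagos"),
  increase_rate = c(35, -4)
)

# paste0() adds no separator, so the line break and the "%" sign are explicit.
toy$label_text <- paste0(toy$state_name, "\n", toy$increase_rate, "%")

# cat() prints the embedded line breaks as they would appear on the map.
cat(toy$label_text, sep = "\n\n")
```

Because the line break is already part of the label, such a column could be passed to `geom_sf_text(aes(label = label_text))` directly, without `str_wrap()`.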
0.8.1 Q: Create a single choropleth map that showcases the malaria increase rates across different states in Nigeria with each state labeled by its name
Instructions
Start by setting up the base plot using the malaria3 dataset.
Create a choropleth map where the fill color represents the increase rate from 2015 to 2021.
Label each state with its name using the centroids.
Customize the color scale to transition from green (low increase) to red (high increase).
Add appropriate titles and labels to the plot.
Below is some starter code:
Code
    # Combining visualization of increase rates with state names
    ggplot(data = malaria3) +
      _______ +  # Fill in the code to generate the choropleth map
      _______ +  # Add state names to each region
      _______ +  # Specify the color scale for increase rates
      _______    # Add titles and labels
0.9 Highlighting a Specific Region on the Map while Preserving Context
Sometimes, you might want to draw attention to a particular area or region on your map, without omitting details from the surrounding areas. By using specific graphical elements, like larger markers or distinctive colors, you can emphasize certain regions while still showcasing the broader data. In this example, we’re focusing on the “Kano” region of Nigeria.
Code
    # Calculate centroid coordinates for labeling
    centroid_coords <- st_coordinates(st_centroid(malaria3$geometry))

    # Visualize the malaria cases across Nigeria with an emphasis on Kano
    ggplot(data = malaria3) +
      # Create a choropleth map with color based on malaria cases in 2021
      geom_sf(aes(fill = cases_2021 / 1000)) +
      # Set a continuous color gradient from green to red
      scale_fill_continuous(low = "green", high = "red") +
      # Overlay a point on Kano to emphasize it. The size of the point corresponds to the number of cases
      geom_point(
        data = subset(malaria3, state_name == "Kano"),
        aes(
          x = st_coordinates(st_centroid(geometry))[1],
          y = st_coordinates(st_centroid(geometry))[2],
          size = round(cases_2021 / 1000)
        ),
        color = "black", shape = 22, fill = "transparent"
      ) +
      # Label each region with its name
      geom_sf_text(aes(label = str_wrap(state_name, 1)), size = 3) +
      # Customize the scale of the size of the emphasized point
      scale_size_continuous(range = c(15, 20)) +
      # Add title and legends (the fill legend matches the cases_2021 / 1000 mapping)
      labs(
        title = "Kano Malaria cases in 2021",
        subtitle = "Data from the Report on Malaria in Nigeria 2022",
        size = "Kano cases",
        fill = "Cases/1000"
      ) +
      # Apply a minimal theme for clarity
      theme_void()
In this visualization:
The geom_point() function is used to place a hollow square marker (shape = 22) over the Kano region. The size of the marker reflects the number of cases in Kano in 2021. The marker has a transparent interior (fill = "transparent") and a black border (color = "black") to set it apart.
The scale_size_continuous() function is used to customize the size scale for points (like the one on Kano), setting the range between 15 and 20. This ensures that the marker for Kano stands out relative to other elements on the map.
While “Kano” is emphasized, all other regions are also displayed with their respective color shading based on malaria cases. This offers a holistic view of the situation across Nigeria, enabling viewers to compare Kano with other regions.
Such an approach is invaluable when you wish to spotlight specific details or areas of interest without sidelining the broader dataset, enriching your data presentations.
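One detail of the code above deserves care if you adapt it to highlight several regions: the `[1]` and `[2]` indexing of the coordinate matrix only picks out X and Y when there is a single row. The sketch below illustrates this with plain matrices shaped like the output of `st_coordinates()` on point geometries (one row per point, columns `X` and `Y`). The coordinate values are rough and purely illustrative.

```r
# A coordinate matrix shaped like the output of sf::st_coordinates() on
# point geometries: one row per point, columns "X" and "Y".
# The coordinates are rough, illustrative values.
one_point  <- matrix(c(8.5, 11.9), nrow = 1,
                     dimnames = list(NULL, c("X", "Y")))
two_points <- rbind(one_point, c(3.4, 6.5))

# With a single row, positions 1 and 2 are X and Y.
one_point[1]   # X
one_point[2]   # Y

# With two rows, position 2 is the X of the second point (column-major order),
# so indexing by column name is safer.
two_points[2]          # 3.4, not a Y value
two_points[, "Y"]      # Y of both points
```

When highlighting more than one region, indexing with `[, "X"]` and `[, "Y"]` keeps the coordinates paired correctly.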
0.10 Labeling Point Locations: Exploring Malaria Positive Rate and Incidence
Mapping and visualizing specific data points on a geographical map can provide crucial insights, especially when dealing with epidemiological data. Let’s delve into the code to understand the processes and the visualization we’re aiming to achieve:
Code
    # Data Retrieval
    # The malariaAtlas package provides the `getPR` function to access the parasite rate (PR).
    # PR is an essential indicator of malaria prevalence.

    # Fetching data for Nigeria for both malaria species
    nigeria_pr <- malariaAtlas::getPR(ISO = "NGA", species = "both") %>%
      # Filtering out records with missing PR values
      filter(!is.na(pr)) %>%
      # Removing any rows with missing longitude or latitude
      drop_na(longitude, latitude) %>%
      # Converting the data into a spatial dataframe to facilitate mapping
      st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
This chunk fetches malaria prevalence data for Nigeria. After retrieval, the data is cleansed by filtering out missing values. It’s then transformed into a format suitable for geospatial visualization (sf object).
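The cleaning logic can be tried offline on a small table before running it on the downloaded data. The sketch below uses made-up survey rows with the same column names as those used above (`longitude`, `latitude`, `examined`, `positive`, `pr`) and base R instead of `filter()` and `drop_na()`.

```r
# Hypothetical survey rows mimicking the columns used above.
surveys <- data.frame(
  longitude = c(7.49, NA, 3.38, 8.52),
  latitude  = c(9.06, 6.45, 6.52, 12.00),
  examined  = c(100, 80, 50, 120),
  positive  = c(25, 10, NA, 60)
)
surveys$pr <- surveys$positive / surveys$examined

# Keep only rows where PR and both coordinates are present.
keep  <- !is.na(surveys$pr) & !is.na(surveys$longitude) & !is.na(surveys$latitude)
clean <- surveys[keep, ]

# Number of rows dropped by the cleaning step.
nrow(surveys) - nrow(clean)
```

Counting the dropped rows in this way is a quick check that the filters are not discarding more data than expected.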
Code
    # Setting up a geospatial visualization using ggplot2
    # Starting the plot
    ggplot() +
      # Plotting the administrative boundaries of Nigeria
      geom_sf(data = nga_adm1) +
      # Adding points for each testing location
      # The color of each point shows the number of positive tests, and the size shows the number of people tested
      geom_sf(aes(size = examined, color = positive), data = nigeria_pr) +
      # Setting titles, axis labels, and legends for the plot
      labs(title = "Location of Positive and Examined Malaria Cases in Nigeria",
           subtitle = "Data from Malaria Atlas Project",
           color = "Positive",
           size = "Examined") +
      # Adding labels for longitude and latitude
      xlab("Longitude") +
      ylab("Latitude") +
      # Incorporating a north arrow to provide orientation
      ggspatial::annotation_north_arrow(location = "br") +
      # Incorporating a scale bar to assist in distance interpretation
      ggspatial::annotation_scale(location = "bl") +
      # Applying a minimalistic theme for visual clarity
      theme_void()
This code chunk visualizes the malaria data on a map. Each survey location is drawn as a point whose size reflects the number of people examined and whose color reflects the number who tested positive. The annotation_north_arrow() and annotation_scale() functions from the ggspatial package add cartographic elements, making the map more informative.
0.10.1 Q: Visualize Positive Rate by Size and Color
Your final challenge is to create a visualization representing the positive rate of malaria for each location. Adjust the size of the points to reflect the positive rate. This task will test your understanding of combining different data indicators on a map. Good luck!
Geospatial data visualization is more than just plotting data on a map. It’s about narrating a story that’s rooted in location and space. This lesson delved deep into a layers-based narrative approach, from the importance of clear annotations to the integration of diverse data types for a richer understanding.
Learning outcomes
By engaging with this lesson on map labeling, you should now be able to:
Demonstrate the ability to use clear annotations and labels to make complex spatial data understandable.
Show proficiency in integrating continuous indicators, overlaying labels, emphasizing regions, and determining optimal point placements on geospatial maps.
Illustrate the importance of contextualizing highlighted areas within the larger geographical landscape to preserve the integrity and interpretability of the map.
Utilize the functionalities of key R packages like ggplot2, sf, and ggspatial to facilitate advanced mapping processes.
Answer Key
0.10.2 Solution for practice 1
To visualize the distribution of malaria cases in Nigeria for the year 2015, follow the instructions provided in the exercise. Here’s the complete solution:
Code
    # Constructing the choropleth map for 2015 using ggplot2
    ggplot(data = malaria3) +
      geom_sf(aes(fill = cases_2015 / 1000)) +  # Updated data column to reflect 2015
      labs(title = "Nigeria Malaria Distributed Cases in 2015", fill = "Cases per 1000") +  # Updated title for 2015
      scale_fill_continuous(low = "blue", high = "yellow") +  # Modified color scale to blue-to-yellow gradient
      theme_void()
0.10.3 Solution for practice 2
To combine the visualization of increase rates with state names, follow the steps below:
Code
    # Combining visualization of increase rates with state names
    ggplot(data = malaria3) +
      # Create a choropleth map with fill color based on the increase rate
      geom_sf(aes(fill = increase_rate), color = "white", size = 0.2) +
      # Label each region with its state name
      geom_sf_text(aes(label = str_wrap(state_name, 1)), size = 2) +
      # Specify the color scale for increase rates
      scale_fill_continuous(name = "Increase rate (%)", limits = c(0, 70),
                            low = "green", high = "red", breaks = c(0, 20, 40, 60)) +
      # Add a title and legend to the plot
      labs(title = "Nigeria Malaria Increase Rate from 2015 to 2021", fill = "Increase Rate (%)") +
      # Apply a minimalistic theme for clarity
      theme_void()
In this solution, the choropleth map is created based on the increase rate of malaria cases from 2015 to 2021. Each state in Nigeria is labeled by its name, and the color gradient (from green to red) showcases the magnitude of the increase rate. The map is enhanced with a title and a legend to ensure clarity and comprehension.
0.10.4 Solution for practice 3
Create a visualization that represents the positive rate:
Code
    # Adding a positive rate column
    nigeria_pr$positive_rate <- (nigeria_pr$positive / nigeria_pr$examined) * 100

    ggplot() +
      geom_sf(data = nga_adm1) +
      geom_sf(aes(size = positive_rate, color = positive_rate), data = nigeria_pr) +
      labs(title = "Location and Positive Rate of Malaria Cases in Nigeria",
           subtitle = "Data from Malaria Atlas Project",
           color = "Positive Rate (%)",
           size = "Positive Rate (%)") +
      xlab("Longitude") +
      ylab("Latitude") +
      ggspatial::annotation_north_arrow(location = "br") +
      ggspatial::annotation_scale(location = "bl") +
      theme_void()
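One edge case to keep in mind when computing the positive rate is a survey with zero people examined, where the division gives `NaN`. Whether such rows occur in the downloaded data is not something this lesson establishes, so the sketch below is only a defensive variation, shown on made-up numbers.

```r
# Hypothetical survey counts, including one row with nobody examined.
surveys <- data.frame(
  examined = c(100, 0, 50),
  positive = c(25, 0, 5)
)

# 0 / 0 is NaN in R; set the rate to NA when nobody was examined.
surveys$positive_rate <- ifelse(
  surveys$examined > 0,
  surveys$positive / surveys$examined * 100,
  NA_real_
)

surveys$positive_rate
```

Rows with an `NA` rate can then be dropped or left for ggplot2 to omit, rather than silently carrying `NaN` values into the size and color scales.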
Source Code
---title: "Enhancing Disease Maps with Labels in R"author: "The Graph Network"date: "`r Sys.Date()`"toc: truenumber-sections: truehighlight-style: skateformat: html: code-fold: true code-tools: true css: global/style/style.css highlight: kate---```{r, include = FALSE, warning = FALSE, message = FALSE}# Load packages if(!require(pacman)) install.packages("pacman")pacman::p_load(tidyverse, knitr, here)# Source functions source(here("global/functions/misc_functions.R"))# knitr settingsknitr::opts_chunk$set(warning = F, message = F, class.source = "tgc-code-block", error = T)```## IntroductionIn the geospatial data visualization, maps are powerful storytelling tools. However, a map without clear annotations and labels is like a book without titles or chapter headings. While the story might still be there, it becomes significantly harder to understand, interpret, and appreciate.In this lesson, we place a special emphasis on the importance of annotating and labeling maps. Proper annotation transforms a simple visualization into an informative guide, allowing viewers to quickly grasp complex spatial data. 
With precise labeling, areas of interest can be immediately recognized, facilitating a clearer comprehension of the data's narrative.## Learning objectives {.unlisted .unnumbered}**Learning Objectives: Advanced Geospatial Visualization Techniques**By the end of this section, you should be able to:- Incorporate continuous data indicators into choropleth maps for enhanced granularity.- Effectively overlay state names onto choropleth maps, ensuring clarity and readability.- Seamlessly integrate state names with increase rates on maps without compromising legibility.- Apply techniques to accentuate specific regions on a map while retaining the overall context.- Determine optimal point placement strategies and integrate them effectively into geospatial visualizations.Upon mastering these objectives, you will have the tools and knowledge to create rich, detailed, and informative geospatial visualizations.## Packages```{r warning = F, message = F}# Load packages if(!require(pacman)) install.packages("pacman")pacman::p_load(malariaAtlas, ggplot2, geodata, dplyr, tidyr, here, readr, sf, patchwork) # Unable scientific notation options(scipen=100000)```## Data PreparationBefore diving into any visualization or analysis, it's essential to load and preprocess our data. 
This includes reading in datasets, merging related information, and filtering out unnecessary or irrelevant entries.```{r}# Reading the geographical shapefile datanga_adm1 <- sf::st_read(here::here("data/raw/NGA_adm_shapefile/NGA_adm1.shp")) ```The geographical shapefile data for Nigeria's administrative boundaries (like states or provinces) is read and stored in the `nga_adm1` object.```{r}# Reading the attribute data related to malaria casesmalaria_cases <-read_csv(here::here("data/malaria.csv"))```This step loads data related to malaria cases in different states of Nigeria.```{r}# Filtering out the 'Water body' entries from the geographical datanga_adm1 <-filter(nga_adm1, NAME_1 !="Water body")# Merging the geographical data with the malaria cases datamalaria <- malaria_cases %>%left_join(nga_adm1, by =c("state_name"="NAME_1")) %>%st_as_sf()```Here, we're combining the geographical boundaries data with the malaria cases data using state names as a reference. This merged data is then converted into a format suitable for geospatial visualizations.```{r}# Filtering down to essential columns for our analysismalaria2 <- malaria %>%select(state_name, cases_2000, cases_2006, cases_2010, cases_2015, cases_2021, geometry)```We're narrowing down our dataset to specific columns, mainly the state names, the malaria cases from various years, and the geographical boundaries of these states (i.e. `geometry`).```{r}# Reading in population data for different regions of Nigeriapopulation_nigeria <-read_csv(here::here("data/population_nigeria.csv"))```This step loads data indicating the population of different states or regions in Nigeria.```{r}# Combining the population data with our malaria datamalaria3 <- malaria2 %>%left_join(population_nigeria, by =c("state_name")) %>%st_as_sf()```By merging the population data with our malaria cases data, we enrich our dataset. 
This combined data allows for more comprehensive visualizations and analyses, such as calculating incidence rates or assessing trends relative to population size.To address aesthetic concerns with labeling a small area like the "Federal Capital Territory" on a map, we can modify the labels before plotting. We can do this by creating a new column for the modified state names or by directly changing the `state_name` column within the `malaria3` dataset. Given that the region in question has a small surface area and the full name may overflow its borders, here's how you can adjust the name to "Capital" for better display:```{r}# Modify the state names, changing "Federal Capital Territory" to "Capital"malaria3$state_name <-ifelse(malaria3$state_name =="Federal Capital Territory", "Capital", malaria3$state_name)```## Building a Simple Choropleth Map::: reminderChoropleth maps are powerful visualization tools that display divided geographical areas shaded or patterned in proportion to the value of a variable.:::In this example, we'll be using a choropleth map to visualize the distribution of malaria cases across different regions of Nigeria for the year 2021.```{r}# Constructing the choropleth map using ggplot2ggplot(data=malaria3) +geom_sf(aes(fill=cases_2021/1000)) +# The fill color is determined by the number of malaria cases in 2021, scaled per 1000labs(title ="Nigeria Malaria Distributed Cases in 2021", fill ="Cases per 1000")+# Adding labels and title to the plotscale_fill_continuous(low ="green", high ="red")+# Using a continuous color scale transitioning from green to redtheme_void() # Using a minimal theme for better visualization of the map```Here's a detailed explanation of the code:- `ggplot(data=malaria3)`: Initiates a ggplot object using the `malaria3` dataset.- `geom_sf(aes(fill=cases_2021/1000))`: Adds the geographical data from `malaria3` and fills each region based on the number of malaria cases in 2021, scaled down by a factor of 1000. 
This effectively represents the number of cases per thousand people.- `labs(title = "Nigeria Malaria Distributed Cases in 2021", fill = "Cases per 1000")`: Specifies the title of the plot and the label for the color scale.- `scale_fill_continuous(low = "green", high = "red")`: Applies a continuous color scale where regions with fewer cases are colored green and regions with more cases are colored red.- `theme_void()`: Removes axis text, ticks, and other non-data ink to emphasize the map.The resulting plot provides a clear view of how malaria cases are distributed across Nigeria in 2021, with the color intensity indicating the magnitude of cases in each region.::: practice### Q: Modify the provided choropleth map to visualize the distribution of malaria cases in Nigeria for the year 2015.**Instructions**1) Update the data mapping in the `geom_sf()` function to reflect malaria cases for the year 2015.2) Adjust the title in the `labs()` function to indicate that the visualization pertains to 2015.3) Change the color gradient in `scale_fill_continuous()` to transition from blue (`low` cases) to yellow (`high` cases).Below a starter code:```{r, eval=FALSE}# Constructing the choropleth map for 2015 using ggplot2ggplot(data=malaria3) + geom_sf(aes(fill=___________)) + # Fill in the correct data column for 2015 labs(title = "______________________", fill = "Cases per 1000")+ # Update the title appropriately scale_fill_continuous(___________)+ # Modify the color scale theme_void() ```:::------------------------------------------------------------------------## Adding Continuous Data Indicators to the Choropleth MapWhen analyzing disease data, it's often useful to look beyond raw case numbers and focus on rates, particularly incidence rates. 
The incidence rate provides a normalized measure that accounts for differing population sizes across regions, making comparisons more meaningful.

**Understanding Incidence**

The incidence of a disease is a cornerstone of epidemiological research. It quantifies the number of new cases of a disease that occur within a specific time frame, typically a year, relative to a population at risk.

Mathematically, it's given by:

$$ \text{Incidence} = \frac{\text{Number of new cases during the time period}}{\text{Population at risk at the start of the period}} $$

In this context, we're looking at the incidence of malaria in different states of Nigeria for the year 2021. Specifically, we'll compute the incidence rate by dividing the number of new malaria cases in 2021 by each state's 2019 population (the `population_2019` column), which serves as a proxy for the population at risk.

The following R code accomplishes this and visualizes the data:

```{r}
# Visualizing Malaria Incidence in 2021 using a Choropleth Map
ggplot(data = malaria3) +
  # Filling each state with a color representing the incidence rate in 2021.
  geom_sf(aes(fill = round(cases_2021/population_2019, 2))) +
  # Add the name of each state to its centroid
  geom_sf_text(aes(label = str_wrap(state_name, 1)), size = 2) +
  # Adding title and legend title.
  labs(title = "Nigeria Malaria Incidence in 2021", fill = "Incidence") +
  # Using a continuous color scale from green (low incidence) to red (high incidence).
  scale_fill_continuous(low = "green", high = "red") +
  # Using a minimalistic theme for clearer visualization.
  theme_void()
```

Here's a brief explanation of the visualization:

- The `geom_sf()` function creates a choropleth map where each state's color represents its malaria incidence rate in 2021.
- The `geom_sf_text()` function labels each state with its name, automatically positioning each label at a point inside the state (by default computed with `sf::st_point_on_surface()`, which usually lies close to the centroid).
- The `str_wrap()` function is part of the {stringr} package and is used here to wrap the text of `state_name`. Since the second argument to
`str_wrap()` is 1, each word of a state name is placed on its own line (words themselves are never split), which keeps labels narrow and reduces overlap.
- The color scale, transitioning from green to red, visually emphasizes regions with higher incidence rates.

This visualization offers an immediate grasp of the malaria situation in Nigeria, revealing areas of high incidence that might need more focused public health interventions.

## Exploring Malaria Case Increase Rates Using Choropleth Maps

Understanding the change in the number of disease cases over time can provide insights into the effectiveness of interventions, the progression of the disease, and areas where increased resources might be needed. In this context, we want to visualize the percentage increase in malaria cases from 2015 to 2021 across different states in Nigeria.

**Computing the Increase Rate**

The increase rate for each state is computed as:

$$ \text{Increase Rate} = \left( \frac{\text{Cases in 2021} - \text{Cases in 2015}}{\text{Cases in 2015}} \right) \times 100\% $$

This formula provides the percentage growth (or decrease) in malaria cases from 2015 to 2021.

**Visualization of Increase Rates**

Let's look at the code that produces this visualization:

```{r}
# Calculate the increase rate for each state
malaria3 <- malaria3 %>%
  mutate(increase_rate = round(((cases_2021 - cases_2015) / cases_2015) * 100))

# Visualizing the increase rates using a choropleth map
ggplot(data = malaria3) +
  # Coloring each state based on its increase rate
  geom_sf(aes(fill = increase_rate), color = "white", size = 0.2) +
  # Using a continuous color scale to represent increase rates, transitioning from green to red
  scale_fill_continuous(name = "Increase rate (%)", limits = c(0, 70),
                        low = "green", high = "red", breaks = c(0, 20, 40, 60)) +
  # Labeling each state with its increase rate percentage
  geom_sf_text(aes(label = str_wrap(paste0(increase_rate, "%"), 1)), size = 2) +
  # Adding a title to the plot
  labs(title = "Nigeria Malaria Increase rate in 2021 compared with 2015") +
  theme_void()
```

In
this visualization:

- The `geom_sf()` function creates the choropleth map where each state's color intensity represents its malaria increase rate.
- `scale_fill_continuous()` sets the color scale for the increase rates, making areas of higher increase more prominent.
- `geom_sf_text()` adds a label to each state.
- `str_wrap()` with a width of 1 only breaks text at spaces. Labels such as "12%" contain no spaces, so they stay on a single line.

The resulting map lets us quickly identify regions where malaria cases grew substantially over the selected period, which can help when deciding where to allocate resources.

## Labeling the Choropleth Map with State Names

In a choropleth map, color gradients offer a visual cue to the distribution of a variable across regions, but adding labels can significantly enhance the clarity of the visualization. This is especially true when audiences might not be familiar with all geographical boundaries shown. 
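
Because the labels in this lesson are all passed through `str_wrap()` with a width of 1, it can help to see what that call returns. The sketch below is a minimal illustration on a hand-typed vector of names; the object name `wrapped` is only used here.

```r
library(stringr)

# With a width of 1, str_wrap() puts each word on its own line.
# Words are never broken apart, even if they are longer than the width.
wrapped <- str_wrap(c("Federal Capital Territory", "Akwa Ibom", "Kano"), width = 1)

# Print each wrapped label, separated by a divider line
cat(wrapped, sep = "\n---\n")
```

Multi-word names are stacked vertically, while single-word names are returned unchanged.
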
In the case of our malaria dataset, adding state names to the map makes the data more accessible and understandable.

```{r}
# Constructing a choropleth map with state names
ggplot(data = malaria3) +
  # Fill each state based on the number of malaria cases in 2021, scaled per 10,000
  geom_sf(aes(fill = cases_2021/10000)) +
  # Add the name of each state to its centroid
  geom_sf_text(aes(label = str_wrap(state_name, 1)), size = 2) +
  # Add titles and labels
  labs(title = "Nigeria Malaria cases in 2021", fill = "Cases/10000") +
  # Use a continuous color scale from green (low case numbers) to red (high case numbers)
  scale_fill_continuous(low = "green", high = "red") +
  theme_void()
```

Here's a detailed explanation of the visualization:

- `geom_sf(aes(fill = cases_2021/10000))`: This creates the choropleth map where each state's color represents the number of malaria cases in 2021, scaled down by a factor of 10,000.
- `labs(title = "Nigeria Malaria cases in 2021", fill = "Cases/10000")`: This function adds a title to the plot and a label to the color scale.
- `scale_fill_continuous(low = "green", high = "red")`: This sets the color scale for the map, transitioning from green for states with fewer cases to red for those with more cases.

The resulting visualization is a clear and informative map of malaria cases across Nigeria in 2021, with each state labeled for easy reference.

## Displaying Combined State Names and Increase Rates on the Choropleth Map

Visualizations can convey a wealth of information when they incorporate multiple data points in an intuitive manner. 
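
As a minimal sketch of what putting several data points into one label can look like, the snippet below joins made-up names and rates into a single string per region. The vectors `state_name` and `increase_rate` are typed in by hand for illustration, and the explicit line break and percent sign are one possible formatting choice rather than the only one.

```r
# Hand-typed illustration values (not taken from the malaria dataset)
state_name    <- c("Kano", "Lagos")
increase_rate <- c(12, 30)

# paste0() concatenates element by element; "\n" puts the rate on a second line
label_text <- paste0(state_name, "\n", increase_rate, "%")

cat(label_text, sep = "\n\n")
```
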
By pairing state names with their corresponding increase rates, we can provide a richer, more detailed view of the data without overwhelming the audience.

We want to present a choropleth map of the 2021 malaria case counts (expressed in units of 10,000 cases) across Nigerian states, with labels that combine each state's name and its increase rate since 2015.

```{r}
# Combine the state names and their respective increase rates into a single label
malaria3$label_text <- paste(malaria3$state_name, malaria3$increase_rate)

# Visualize the data
ggplot(data = malaria3) +
  # Create a choropleth map shaded based on the number of malaria cases in 2021, in units of 10,000
  geom_sf(aes(fill = cases_2021/10000)) +
  # Add combined labels (state name and increase rate) to each state's centroid
  geom_sf_text(aes(label = str_wrap(label_text, 2)), size = 2) +
  # Add titles and customize the color legend
  labs(title = "Nigeria Malaria cases in 2021 (%)", fill = "Cases/10000") +
  scale_fill_continuous(low = "green", high = "red") + # Set color gradient
  theme_void() # Apply a minimal theme for clarity
```

In this visualization:

- The line `malaria3$label_text <- paste(malaria3$state_name, malaria3$increase_rate)` constructs our combined labels by concatenating the state name with its increase rate.
- `scale_fill_continuous(low = "green", high = "red")` assigns a color gradient based on the number of malaria cases. 
States with fewer cases will be colored green, transitioning to red for states with more cases.

This approach lets us communicate two pieces of information (case counts and increase rates) within the same visualization while keeping the state names for easy identification.

::: practice
### Q: Create a single choropleth map that showcases the malaria increase rates across different states in Nigeria with each state labeled by its name

**Instructions**

1) Start by setting up the base plot using the `malaria3` dataset.
2) Create a choropleth map where the fill color represents the increase rate from 2015 to 2021.
3) Label each state with its name using the centroids.
4) Customize the color scale to transition from green (low increase) to red (high increase).
5) Add appropriate titles and labels to the plot.

Below is some starter code:

```{r, eval=FALSE}
# Combining visualization of increase rates with state names
ggplot(data = malaria3) +
  _______ + # Fill in the code to generate the choropleth map
  _______ + # Add state names to each region
  _______ + # Specify the color scale for increase rates
  _______ # Add titles and labels
```
:::

## Highlighting a Specific Region on the Map while Preserving Context

Sometimes, you might want to draw attention to a particular area or region on your map without omitting details from the surrounding areas. By using specific graphical elements, like larger markers or distinctive colors, you can emphasize certain regions while still showcasing the broader data. 
In this example, we focus on the "Kano" region of Nigeria.

```{r}
# Calculate centroid coordinates for labeling
centroid_coords <- st_coordinates(st_centroid(malaria3$geometry))

# Visualize the malaria cases across Nigeria with an emphasis on Kano
ggplot(data = malaria3) +
  # Create a choropleth map with color based on malaria cases in 2021
  geom_sf(aes(fill = cases_2021/1000)) +
  # Set a continuous color gradient from green to red
  scale_fill_continuous(low = "green", high = "red") +
  # Overlay a point on Kano to emphasize it. The size of the point corresponds to the number of cases
  geom_point(data = subset(malaria3, state_name == "Kano"),
             aes(x = st_coordinates(st_centroid(geometry))[1],
                 y = st_coordinates(st_centroid(geometry))[2],
                 size = round(cases_2021/1000)),
             color = "black", shape = 22, fill = "transparent") +
  # Label each region with its name
  geom_sf_text(aes(label = str_wrap(state_name, 1)), size = 3) +
  # Customize the scale of the size of the emphasized point
  scale_size_continuous(range = c(15, 20)) +
  # Add title and legends (the fill legend matches the cases_2021/1000 mapping above)
  labs(title = "Kano Malaria cases in 2021",
       subtitle = "Data from the Report on Malaria in Nigeria 2022",
       size = "Kano cases", fill = "Cases/1000") +
  # Apply a minimal theme for clarity
  theme_void()
```

In this visualization:

- The `geom_point()` function overlays a hollow square marker (`shape = 22`) on the Kano region. The size of the marker reflects the number of cases in Kano in 2021. The marker has a transparent fill (`fill = "transparent"`) and a black border (`color = "black"`) to set it apart.
- The `scale_size_continuous()` function is used to customize the size scale for points (like the one on Kano), setting the range between 15 and 20. This ensures that the marker on Kano stands out relative to other elements on the map.

::: recap
While "Kano" is emphasized, all other regions are also displayed with their respective color shading based on malaria cases. 
This offers a holistic view of the situation across Nigeria, enabling viewers to compare Kano with other regions.

Such an approach is invaluable when you wish to spotlight specific details or areas of interest without sidelining the broader dataset, enriching your data presentations.
:::

------------------------------------------------------------------------

## Labeling Point Locations: Exploring Malaria Positive Rate and Incidence

Mapping and visualizing specific data points on a geographical map can provide crucial insights, especially when dealing with epidemiological data. Let's walk through the code to understand the processing steps and the visualization we're aiming for:

```{r}
# Data Retrieval
# The malariaAtlas package provides the `getPR` function to access the parasite rate (PR).
# PR is an essential indicator of malaria prevalence.

# Fetching data for Nigeria for both malaria species
nigeria_pr <- malariaAtlas::getPR(ISO = "NGA", species = "both") %>%
  # Filtering out records with missing PR values
  filter(!is.na(pr)) %>%
  # Removing any rows with missing longitude or latitude
  drop_na(longitude, latitude) %>%
  # Converting the data into a spatial dataframe to facilitate mapping
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)
```

This chunk fetches malaria prevalence data for Nigeria. After retrieval, the data is cleaned by filtering out missing values. 
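
The cleaning step can be tried without a network connection. The sketch below applies the same `filter()` and `drop_na()` calls to a small hand-made table; `toy_pr` and its values are invented for illustration and only mimic the column names used above.

```r
library(dplyr)
library(tidyr)

# Hand-made survey table with some missing values (illustrative only)
toy_pr <- tibble(
  longitude = c(7.5, NA, 3.4, 8.5),
  latitude  = c(9.1, 6.5, 6.5, 12.0),
  pr        = c(0.25, 0.40, NA, 0.10)
)

clean_pr <- toy_pr %>%
  # Drop rows where the parasite rate is missing
  filter(!is.na(pr)) %>%
  # Drop rows where either coordinate is missing
  drop_na(longitude, latitude)

# Number of rows that survive the cleaning
nrow(clean_pr)
```

Only rows with a parasite rate and both coordinates remain, which is what the conversion to a spatial object requires.
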
It's then transformed into an `sf` object, a format suitable for geospatial visualization.

```{r}
# Setting up a geospatial visualization using ggplot2
# Starting the plot
ggplot() +
  # Plotting the administrative boundaries of Nigeria
  geom_sf(data = nga_adm1) +
  # Adding points for each testing location
  # The color of each point indicates the number of positive tests, and the size represents the number of people examined
  geom_sf(aes(size = examined, color = positive), data = nigeria_pr) +
  # Setting titles, axis labels, and legends for the plot
  labs(title = "Location of Positive and Examined Malaria Cases in Nigeria",
       subtitle = "Data from Malaria Atlas Project",
       color = "Positive",
       size = "Examined") +
  # Adding labels for longitude and latitude
  xlab("Longitude") +
  ylab("Latitude") +
  # Incorporating a north arrow to provide orientation
  ggspatial::annotation_north_arrow(location = "br") +
  # Incorporating a scale bar to assist in distance interpretation
  ggspatial::annotation_scale(location = "bl") +
  # Applying a minimalistic theme for visual clarity
  theme_void()
```

::: recap
This code chunk visualizes the malaria data on a map. It plots each survey location, with point size showing the number of people examined and color showing the number of positive tests. Specific tools from the `ggspatial` package are used to add cartographic elements, making the map more informative.
:::

::: practice
### Q: Visualize Positive Rate by Size and Color

_Your final challenge is to create a visualization representing the positive rate of malaria for each location. Adjust the size of the points to reflect the positive rate. This task will test your understanding of combining different data indicators on a map. Good luck!_

```{r, eval=FALSE}
# Visualizing positive rate
ggplot() +
  _________ +
  _________ +
  _________ +
  _________
```
:::

------------------------------------------------------------------------

# WRAP UP! {.unnumbered}

Geospatial data visualization is more than just plotting data on a map. It's about narrating a story that's rooted in location and space. 
This lesson took a layer-by-layer approach to that narrative, from the importance of clear annotations to the integration of diverse data types for a richer understanding.

------------------------------------------------------------------------

## Learning outcomes {.unlisted .unnumbered}

By engaging with this lesson on map labeling, you should now be able to:

- Demonstrate the ability to use clear annotations and labels to make complex spatial data understandable.
- Show proficiency in integrating continuous indicators, overlaying labels, emphasizing regions, and determining optimal point placements on geospatial maps.
- Illustrate the importance of contextualizing highlighted areas within the larger geographical landscape to preserve the integrity and interpretability of the map.
- Utilize the functionalities of key R packages like ggplot2, sf, and ggspatial to facilitate advanced mapping processes.

------------------------------------------------------------------------

# Answer Key {.unnumbered}

### Solution for practice 1

To visualize the distribution of malaria cases in Nigeria for the year 2015, follow the instructions provided in the exercise. 
Here's the complete solution:

```{r, eval=FALSE}
# Constructing the choropleth map for 2015 using ggplot2
ggplot(data = malaria3) +
  geom_sf(aes(fill = cases_2015 / 1000)) + # Updated data column to reflect 2015
  labs(title = "Nigeria Malaria Distributed Cases in 2015", fill = "Cases per 1000") + # Updated title for 2015
  scale_fill_continuous(low = "blue", high = "yellow") + # Modified color scale to blue-to-yellow gradient
  theme_void()
```

### Solution for practice 2

To combine the visualization of increase rates with state names, follow the steps below:

```{r, eval=FALSE}
# Combining visualization of increase rates with state names
ggplot(data = malaria3) +
  # Create a choropleth map with fill color based on the increase rate
  geom_sf(aes(fill = increase_rate), color = "white", size = 0.2) +
  # Label each region with its state name
  geom_sf_text(aes(label = str_wrap(state_name, 1)), size = 2) +
  # Specify the color scale for increase rates
  scale_fill_continuous(name = "Increase rate (%)", limits = c(0, 70),
                        low = "green", high = "red", breaks = c(0, 20, 40, 60)) +
  # Add a title and legend to the plot
  labs(title = "Nigeria Malaria Increase Rate from 2015 to 2021", fill = "Increase Rate (%)") +
  theme_void() # Apply a minimalistic theme for clarity
```

In this solution, the choropleth map is created based on the increase rate of malaria cases from 2015 to 2021. Each state in Nigeria is labeled by its name, and the color gradient (from green to red) showcases the magnitude of the increase rate. 
The map is enhanced with a title and a legend to ensure clarity and comprehension.

------------------------------------------------------------------------

### Solution for practice 3

Create a visualization that represents the positive rate:

```{r, eval=FALSE}
# Adding a positive rate column
nigeria_pr$positive_rate <- (nigeria_pr$positive / nigeria_pr$examined) * 100

ggplot() +
  geom_sf(data = nga_adm1) +
  geom_sf(aes(size = positive_rate, color = positive_rate), data = nigeria_pr) +
  labs(title = "Location and Positive Rate of Malaria Cases in Nigeria",
       subtitle = "Data from Malaria Atlas Project",
       color = "Positive Rate (%)",
       size = "Positive Rate (%)") +
  xlab("Longitude") +
  ylab("Latitude") +
  ggspatial::annotation_north_arrow(location = "br") +
  ggspatial::annotation_scale(location = "bl") +
  theme_void()
```

## Contributors {.unlisted .unnumbered}

The following team members contributed to this lesson:

`r tgc_contributors_list(ids = c("imad", "kendavidn", "joy"))`

## References {.unlisted .unnumbered}

- Wickham, Hadley, and Winston Chang. "Package 'ggplot2'." *Create elegant data visualisations using the grammar of graphics. Version* 2, no. 1 (2016): 1-189.
- Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. *R for data science*. O'Reilly Media, Inc., 2023.
- Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. *Geocomputation with R*. CRC Press, 2019.