This lesson covers bar charts built with geom_col(), geom_errorbar(), and position adjustments, as well as pie charts built by using coord_polar() with geom_col().
In this lesson we will use the following packages:
{tidyverse} for data wrangling and data visualization
{here} for project-relative file paths
pacman::p_load("tidyverse", "here")
The cases column counts TB cases started on treatment. Let’s import the tb_outcomes data subset.
# Import data from CSV
tb_outcomes <- read_csv(here::here('data/benin_tb.csv'))
# Print data frame
tb_outcomes
Here are the detailed variable definitions for each column:
Time Frame Tracking (period and period_date): Quarterly records from 2015Q1 to 2017Q4.
Health Facility Identifier (hospital):
Treatment Outcome Categories (outcome):
  completed: Treatment finished, outcome marked as completed.
  cured: Treatment succeeded with sputum smear confirmation.
  died: Patient succumbed to TB during treatment.
  failed: Treatment did not succeed.
  unevaluated: Treatment outcome not determined.
Diagnosis Categorization (diagnosis_type):
  bacteriological: Diagnosis confirmed by bacteriological tests.
  clinical: Diagnosis based on clinical symptoms, without bacteriological confirmation.
Case Counts (cases): The number of TB cases starting treatment.
Use geom_col() for plotting categorical against numerical data. Let’s illustrate this by visualizing the number of cases per treatment outcome in the tb_outcomes dataset:
# Basic bar plot example 1: Frequency of treatment outcomes
tb_outcomes %>%
  # Pass the data to ggplot as a basis for creating the visualization
  ggplot(
    # Specify the x and y axis variables
    aes(x = outcome, y = cases)) +
  # geom_col() creates a bar plot
  geom_col() +
  labs(title = "Number of cases per treatment outcome")
Note that geom_col() sums up cases by outcome, across all periods, hospitals, and diagnosis types.
# Basic bar plot example 2: Case counts per hospital
tb_outcomes %>%
  ggplot(aes(x = hospital, y = cases)) +
  geom_col() +
  labs(title = "Number of Cases per Hospital")
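The summing behaviour described above comes from stacking: geom_col() uses position = "stack" by default, so rows that share an x value are piled on top of each other. The sketch below checks this on a small, hypothetical toy data frame (toy is not part of the lesson data) by comparing explicit totals with the bar tops that ggplot2 computes, read back with layer_data().

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two rows per outcome, standing in for tb_outcomes
toy <- tibble(
  outcome = c("cured", "cured", "died", "died"),
  cases   = c(10, 5, 2, 1)
)

# Totals computed explicitly
toy_totals <- toy %>%
  group_by(outcome) %>%
  summarise(total_cases = sum(cases))

# Build the same kind of plot as above and read back the drawn rectangles
p <- ggplot(toy, aes(x = outcome, y = cases)) +
  geom_col()
drawn <- layer_data(p)

# Top of each stack, one value per x position
bar_tops <- tapply(drawn$ymax, as.integer(drawn$x), max)

toy_totals
bar_tops
```

If the two results agree, the bar height can be read as the group total. Summarising first, as in toy_totals, makes that total explicit and avoids drawing one rectangle per row.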
Use coord_flip() to transform a vertical bar chart into a horizontal layout.
# Basic bar plot example 3: Horizontal bars
tb_outcomes %>%
  ggplot(aes(x = hospital, y = cases)) +
  geom_col() +
  labs(title = "Number of Cases per Hospital") +
  # new code line here:
  coord_flip()
Customization: in ggplot(), use the fill aesthetic to differentiate categories within the bars.
# Stacked bar plot:
tb_outcomes %>%
  ggplot(
    # Fill color of bars by the 'outcome' variable
    aes(x = hospital,
        y = cases,
        # new code here
        fill = outcome)) +
  geom_col()
Set the position argument to "dodge" in geom_col() to display bars side by side:
# Grouped bar plot:
tb_outcomes %>%
  ggplot(
    aes(x = hospital,
        y = cases,
        fill = outcome)) +
  # Add position argument for side-by-side bars
  geom_col(position = "dodge")
# Grouped bar plot: split into 2 bars
tb_outcomes %>%
  ggplot(
    # Fill color of bars by the 'diagnosis_type'
    aes(x = hospital,
        y = cases,
        # different variable here
        fill = diagnosis_type)) +
  geom_col(position = "dodge")
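One caveat when dodging data with several rows per group: unlike stacking, position = "dodge" does not add rows together. Rows with the same x value and fill group are drawn on top of each other, each starting at zero, so the visible bar reflects the largest single row, not the total. The sketch below illustrates this on hypothetical toy data (toy and total_cases are illustrative names) and shows one way to get totals, assuming totals are what you want: summarise before plotting.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two rows share the same hospital and diagnosis type
toy <- tibble(
  hospital       = c("A", "A", "A"),
  diagnosis_type = c("clinical", "clinical", "bacteriological"),
  cases          = c(10, 5, 7)
)

# Dodging without aggregating: every row becomes its own bar starting at zero
p_raw <- ggplot(toy, aes(x = hospital, y = cases, fill = diagnosis_type)) +
  geom_col(position = "dodge")
raw_bars <- layer_data(p_raw)

# Aggregating first gives exactly one bar per hospital and diagnosis type
toy_totals <- toy %>%
  group_by(hospital, diagnosis_type) %>%
  summarise(total_cases = sum(cases), .groups = "drop")

p_totals <- ggplot(toy_totals,
                   aes(x = hospital, y = total_cases, fill = diagnosis_type)) +
  geom_col(position = "dodge")

raw_bars[, c("x", "ymin", "ymax")]
toy_totals
```

In raw_bars the two clinical rows keep their own heights (10 and 5), whereas toy_totals holds the sum (15) that a reader would usually expect a grouped bar to show.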
Question 1: Basic bar plot
Write code that generates a basic bar chart of the number of cases per quarter, with period_date on the x axis.
# PQ1 answer:
tb_outcomes %>%
  ggplot(
    aes(x = period_date, y = cases)) +
  geom_col()
Question 2: Stacked bar plot
Create a stacked bar chart to display treatment outcomes over different time periods.
tb_outcomes %>%
  ggplot(
    aes(x = period_date, y = cases, fill = outcome)) +
  geom_col()
To add error bars, use the geom_errorbar() function.
First, let’s create the necessary summary data, since we need some kind of error measurement. In our case we will compute the standard deviation:
hosp_dx_error <- tb_outcomes %>%
  group_by(period_date, diagnosis_type) %>%
  summarise(
    total_cases = sum(cases, na.rm = TRUE),
    error = sd(cases, na.rm = TRUE))
hosp_dx_error
# A tibble: 24 × 4
# Groups: period_date [12]
period_date diagnosis_type total_cases error
<date> <chr> <dbl> <dbl>
1 2015-01-01 bacteriological 143 11.9
2 2015-01-01 clinical 47 4.40
3 2015-04-01 bacteriological 163 13.0
4 2015-04-01 clinical 35 3.84
5 2015-07-01 bacteriological 146 11.2
6 2015-07-01 clinical 34 3.33
7 2015-10-01 bacteriological 152 10.4
8 2015-10-01 clinical 43 3.55
9 2016-01-01 bacteriological 201 15.7
10 2016-01-01 clinical 71 7.01
# ℹ 14 more rows
Now, let’s use this data to create the plot:
# Recreate grouped bar chart and add error bars
hosp_dx_error %>%
  ggplot(
    aes(x = period_date,
        y = total_cases,
        fill = diagnosis_type)) +
  geom_col(position = "dodge") + # Dodge the bars
  # geom_errorbar() adds error bars
  geom_errorbar(
    # Specify upper and lower limits of the error bars
    aes(ymin = total_cases - error, ymax = total_cases + error),
    position = "dodge" # Dodge the error bars to align them with side-by-side bars
  )
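A common refinement is to draw error bars narrower than the bars they sit on. The sketch below is one way to do that, under two assumptions: the x variable is discrete (the quarter labels in period stand in for period_date here), and both layers share an explicit position_dodge(width = 0.9), which matches the default bar width on a discrete axis. The four rows are copied from the first rows of the hosp_dx_error output above; toy_error is an illustrative name.

```r
library(dplyr)
library(ggplot2)

# First four rows of the summary shown above, with quarter labels as a discrete x
toy_error <- tibble(
  period         = c("2015Q1", "2015Q1", "2015Q2", "2015Q2"),
  diagnosis_type = c("bacteriological", "clinical", "bacteriological", "clinical"),
  total_cases    = c(143, 47, 163, 35),
  error          = c(11.9, 4.40, 13.0, 3.84)
)

# Use the same dodge for both layers so the error bars stay centred on the bars
dodge <- position_dodge(width = 0.9)

p <- ggplot(toy_error, aes(x = period, y = total_cases, fill = diagnosis_type)) +
  geom_col(position = dodge) +
  geom_errorbar(
    aes(ymin = total_cases - error, ymax = total_cases + error),
    position = dodge,
    width = 0.2 # Narrower than the bars
  )

# Centres of the bars (layer 1) and of the error bars (layer 2)
bar_x <- as.numeric(layer_data(p, 1)$x)
err_x <- as.numeric(layer_data(p, 2)$x)
p
```

On a date axis such as period_date, a dodge width of 0.9 would mean 0.9 days, so the string form position = "dodge" used in the lesson code is the safer choice there.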
# Regular stacked bar plot
tb_outcomes %>%
  ggplot(
    aes(x = hospital,
        y = cases,
        fill = outcome)) +
  geom_col()
A percent-stacked (normalized) bar plot is achieved by setting the position argument to "fill" in geom_col().
# Percent-stacked bar plot
tb_outcomes %>%
  ggplot(
    aes(x = hospital,
        y = cases,
        fill = outcome)) +
  # Add position argument for normalized bars
  geom_col(position = "fill")
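To make the normalization concrete, the sketch below computes by hand the shares that position = "fill" displays: within each x value, every segment is divided by that bar's total, so each bar reaches exactly 1. The data frame toy and the column proportion are hypothetical and used only for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two hospitals with very different case totals
toy <- tibble(
  hospital = c("A", "A", "B", "B"),
  outcome  = c("cured", "died", "cured", "died"),
  cases    = c(30, 10, 5, 15)
)

# Share of each outcome within its hospital
toy_props <- toy %>%
  group_by(hospital) %>%
  mutate(proportion = cases / sum(cases)) %>%
  ungroup()

# The same normalization done by ggplot2
p <- ggplot(toy, aes(x = hospital, y = cases, fill = outcome)) +
  geom_col(position = "fill")
drawn <- layer_data(p)

# Top of each bar after normalization, one value per hospital
fill_tops <- tapply(drawn$ymax, as.integer(drawn$x), max)

toy_props
fill_tops
```

Because every bar is rescaled to the same height, this view compares composition across hospitals but hides the differences in absolute case counts.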
outcome_totals <- tb_outcomes %>%
  group_by(outcome) %>%
  summarise(
    total_cases = sum(cases, na.rm = TRUE))
outcome_totals
# A tibble: 6 × 2
outcome total_cases
<chr> <dbl>
1 completed 573
2 cured 1506
3 died 130
4 failed 30
5 lost 87
6 unevaluated 15
A pie chart is basically a round version of a single 100% stacked bar.
# Single-bar chart (precursor to pie chart)
ggplot(outcome_totals,
       aes(x = 4, # Set arbitrary x value
           y = total_cases,
           fill = outcome)) +
  geom_col()
coord_*() functions can change a plot’s perspective, like tweaking aspect ratios or axis limits. Here we use coord_polar(), which will shape our data into slices for a pie chart. By mapping the y aesthetic to angles (using the theta argument), we create a visual that clearly displays the distribution of our categorical data.
# Basic pie chart
ggplot(outcome_totals,
       aes(x = 4,
           y = total_cases,
           fill = outcome)) +
  geom_col() +
  coord_polar(theta = "y") # Change y axis to be circular
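Pie slices are easier to read with percentage labels. The sketch below is one possible way to add them, not part of the original lesson code: it rebuilds outcome_totals from the values printed above, adds a hypothetical percent column, and places each label at the middle of its slice with position_stack(vjust = 0.5).

```r
library(dplyr)
library(ggplot2)

# Totals copied from the outcome_totals output shown earlier
outcome_totals <- tibble(
  outcome     = c("completed", "cured", "died", "failed", "lost", "unevaluated"),
  total_cases = c(573, 1506, 130, 30, 87, 15)
)

# Hypothetical helper column: each outcome's share of all cases, in percent
outcome_pct <- outcome_totals %>%
  mutate(percent = 100 * total_cases / sum(total_cases))

pie <- ggplot(outcome_pct, aes(x = 4, y = total_cases, fill = outcome)) +
  geom_col() +
  # Stack the labels like the bars and centre each one within its segment
  geom_text(aes(label = paste0(round(percent), "%")),
            position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void() # Drop axes and grid, which carry no meaning in a pie chart

pie
```

Labels for very small slices (here failed and unevaluated) are likely to overlap, so in practice you may want to label only the larger slices or report the small ones in a table.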
# PQ1 answer:
tb_outcomes %>%
  ggplot(aes(x = period_date,
             y = cases)) +
  geom_col()
# PQ2 answer:
tb_outcomes %>%
  ggplot(
    aes(x = period_date,
        y = cases,
        fill = outcome)) +
  geom_col()