The previous {gt} lesson focused mainly on the components of the table, its structure, and how to manipulate it properly. This lesson, the second part of the {gt} series, focuses on using the package to polish, style, and customize the visual appearance of tables in a way that elevates the quality and efficiency of your reports.
Let’s dig in.
Using Pre-defined Themes with the opt_stylize Function: Understand how to apply pre-defined styles and color themes to tables using the opt_stylize function. Explore various styling options, including the choice of style number and color, to enhance the visual appeal of tables.
Formatting Values in a Table with the data_color Function: Learn to visually distinguish values in specific columns using the data_color function. This technique is useful in large tables for highlighting important data.
Conditional Formatting Using the tab_style Function: Master the concept of conditional formatting in tables. Learn to apply styles to cells based on their values when they meet a certain threshold.
Enhancing Table Text with Fonts and Colors: Explore the customization of table text using the gt::tab_style() function. Learn to apply different fonts and colors to table titles, subtitles, and other text elements for a more visually appealing presentation.
Adding Borders with the tab_style and cell_borders Functions: Gain skills in drawing borders within tables to highlight specific areas. Understand how to use tab_style in conjunction with cell_borders to add both vertical and horizontal lines of varying colors and thicknesses.
By the end of this lesson, you will have the skills to artfully style your {gt} tables to meet your specific preferences, achieving a level of detail similar to this:
In this lesson, we will use the following packages:
{gt}: to create our tables.
{dplyr}, {tidyr}, and {purrr}: to manipulate data and automate functions.
{janitor}: for rapid cleaning of data names.
{paletteer}, {ggsci}: for calling pre-defined palettes.
pacman::p_load(tidyverse, janitor, gt, here)
In the previous {gt} lesson we had the opportunity to:
Discover the HIV prevalence data of Malawi.
Discover the grammar of tables and the {gt} package.
Create a simple table.
Add details like a title and a footnote to the table.
Group columns into spanners.
Create summary rows.
In this lesson, we will use the same data as in the previous lesson. You can go back to it for a detailed description of the data and the preparation steps.
Here are the full details of the columns we will use:
region: The geographical region or area where the data was collected or is being analyzed.
period: A specific time period associated with the data, often used for temporal analysis.
previous_negative: The count or number of individuals with a previous negative test result.
previous_positive: The count or number of individuals with a previous positive test result.
new_negative: The count or number of newly diagnosed cases with a negative result.
new_positive: The count or number of newly diagnosed cases with a positive result.
For the purposes of this lesson, however, we will work with the table directly. This is the table we created with the right spanners and column labels; the rest of the lesson builds on it.
# Import data
hiv_malawi <- read_csv(here::here("data/clean/hiv_malawi.csv"))
Rows: 17235 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): region, zone, district, traditional_authority, facility_name, dati...
dbl (18): new_women_registered, total_women_in_booking_cohort, not_tested_fo...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Variables to summarize
cols <- c("new_positive", "previous_positive", "new_negative", "previous_negative")

# Create a gt table summary
hiv_malawi_summary <- hiv_malawi %>%
  group_by(period) %>%
  summarize(
    across(all_of(cols), sum) # Summarize all columns
  ) %>%
  gt(rowname_col = "period") %>%
  tab_header(
    title = "HIV Testing in Malawi",
    subtitle = "Q1 to Q4 2019"
  ) %>%
  tab_source_note("Source: Malawi HIV Program") %>%
  # First spanner for "New tests"
  tab_spanner(
    label = "New tests",
    columns = starts_with("new") # selects columns starting with "new"
  ) %>%
  # Second spanner for "Previous tests"
  tab_spanner(
    label = "Previous tests",
    columns = starts_with("previous") # selects columns starting with "previous"
  ) %>%
  cols_label(
    new_positive = "Positive",
    new_negative = "Negative",
    previous_positive = "Positive",
    previous_negative = "Negative"
  ) %>%
  # Use grand_summary_rows() instead of summary_rows()
  grand_summary_rows(
    columns = cols, # Ensure 'cols' is defined
    fns = list(
      TOTAL = "sum",
      AVERAGE = "mean"
    )
  )

hiv_malawi_summary

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
Since the objective of this lesson is mainly styling, let’s start by using a pre-defined theme to add more visuals and colors to the table and its components. To do so we use the opt_stylize function. The function offers multiple pre-defined styles and accepts a color as well. In our case we chose style No. 1 and the color ‘cyan’; you can set these arguments to your liking.
t1 <- hiv_malawi_summary %>%
  opt_stylize(
    style = 1,
    color = 'cyan'
  ) %>%
  tab_options(
    stub.background.color = '#F4F4F4'
  )

t1

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
For more sophisticated themes and styling, you can refer to the tab_options function (see its documentation), which is essentially the equivalent of the theme function in ggplot2. This function has arguments and options for every layer and component of the table. For the purposes of this lesson we won’t dive into it.
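As a brief illustration of what tab_options offers, here is a minimal sketch on a tiny made-up table. The data frame demo and its values are purely illustrative assumptions, not part of the lesson's dataset; the three options shown are just a small sample of what the function accepts.

```r
library(gt)

# Illustrative stand-in data; not the lesson's dataset.
demo <- data.frame(period = c("Q1", "Q2"), cases = c(10, 20))

demo_tbl <- demo %>%
  gt(rowname_col = "period") %>%
  tab_header(title = "Demo table") %>%
  tab_options(
    table.font.size = px(14),              # base font size for the whole table
    heading.background.color = "#F4F4F4",  # fill behind the title area
    column_labels.font.weight = "bold"     # bold column labels
  )

demo_tbl
```

Each option name follows a component.property pattern, which makes it easier to guess where a given setting lives.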
Wouldn’t it be useful to use color to show the differences between values in a specific column? In many reports, this kind of table is quite useful, especially when the number of rows is large. Let’s do this for our table so that the new_positive column is shaded in red.
We can do this with the data_color function, for which we need to specify two arguments: columns (which column or columns the styling applies to) and palette (the color palette we intend to use).
t2 <- t1 %>%
  data_color(
    columns = new_positive, # the column or columns, as we will see later
    palette = "ggsci::red_material" # the palette from the ggsci package
  )

t2

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
ggsci::red_material is not the only palette we can use; in fact, there are hundreds of palettes designed to be used in R. You can find many more in the paletteer package documentation or in the official data_color documentation.
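A named palette is not the only option. As a minimal sketch, assuming a version of {gt} in which data_color accepts the palette argument used in this lesson, you can also pass a plain vector of colors and let {gt} interpolate between them. The demo data below is made up for illustration.

```r
library(gt)

# Illustrative stand-in data; not the lesson's dataset.
demo <- data.frame(period = c("Q1", "Q2", "Q3"), cases = c(120, 95, 150))

demo_tbl <- demo %>%
  gt(rowname_col = "period") %>%
  data_color(
    columns = cases,
    palette = c("white", "firebrick") # low values near white, high values near red
  )

demo_tbl
```

This can be handy when you want a gradient that matches a specific brand or report color rather than a ready-made palette.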
We can do this for the previous_negative column as well, using a different palette. In this case I’m using the green palette from the same package, ggsci::green_material. The palette you choose is a matter of convenience and personal taste; you can explore more options by following the side note above.
t2 %>%
  data_color(
    columns = previous_negative,
    palette = "ggsci::green_material"
  )

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
Similarly, we can color multiple columns at once. For example, we can style the columns with positive cases in red and those with negative cases in green. To do this we need to write two data_color statements, one for each color style:
t4 <- t1 %>%
  data_color(
    columns = ends_with("positive"), # selecting columns ending with the word positive
    palette = "ggsci::red_material" # red palette
  ) %>%
  data_color(
    columns = ends_with("negative"), # selecting columns ending with the word negative
    palette = "ggsci::green_material" # green palette
  )

t4

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
Remember that in the previous lesson we used tidyselect functions to select columns. In the code above we used the ends_with function to select the columns ending with either ‘negative’ or ‘positive’, which is perfect for the purpose of our table.
Again, the column labels in the {gt} table and the actual column names in the data.frame can be different; in our case we refer to the names in the data.
We can also set up the table to conditionally change the style of a cell given its value. In our case we want to highlight values in the previous_positive column according to a threshold (the value 15700). Values greater than or equal to the threshold should be in green.
To achieve this we use the tab_style function, where we specify two arguments:
style: where we specify the color in the cell_text function, since we intend to manipulate the text within the cells.
locations: where we specify the columns and the rows of our manipulation in the cells_body function, since these cells are in the main body of the table.
Let’s use the t2 table as an example:
t5 <- t2 %>%
  tab_style(
    style = cell_text(
      color = "green" # green text for values at or above the threshold
    ),
    locations = cells_body(
      columns = previous_positive,
      rows = previous_positive >= 15700
    )
  )

t5

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
In the code above, the condition that triggers the styling is stated in:
locations = cells_body(columns = previous_positive, rows = previous_positive >= 15700)
Also, note that we can pass more arguments to the cell_text function, such as the size and the font of the cells we intend to style.
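To make that last point concrete, here is a minimal sketch that passes size and weight to cell_text alongside color. The demo data and the threshold of 120 are hypothetical, chosen only for illustration.

```r
library(gt)

# Illustrative stand-in data; not the lesson's dataset.
demo <- data.frame(period = c("Q1", "Q2", "Q3"), cases = c(120, 95, 150))

demo_tbl <- demo %>%
  gt(rowname_col = "period") %>%
  tab_style(
    style = cell_text(
      color = "darkgreen",
      size = px(18),    # text size in pixels
      weight = "bold"   # font weight
    ),
    locations = cells_body(
      columns = cases,
      rows = cases >= 120 # hypothetical threshold
    )
  )

demo_tbl
```

Only the cells that satisfy the rows condition pick up the larger, bold, green text; the rest keep the table defaults.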
What if we want a two-sided condition over the same threshold? Can we have cells with values greater than or equal to the threshold styled in green and, simultaneously, cells with values below the threshold styled in cyan?
We absolutely can. We’ve already done the first part (in the previous code chunk); we just need to add a second condition in a similar manner, in a separate tab_style statement:
t6 <- t5 %>%
  tab_style(
    style = cell_text(
      color = 'cyan'
    ),
    locations = cells_body(
      columns = 'previous_positive',
      rows = previous_positive < 15700
    )
  )

t6

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
Question 1: Conditional Formatting
To highlight (in yellow) rows in a {gt} table where the “hiv_positive” column exceeds 1,000, which R code snippet should you use?

data %>%
  gt() %>%
  tab_style(
    style = cells_body(),
    columns = "Sales",
    conditions = style_number(Sales > 1000, background = "yellow")
  )

data %>%
  gt() %>%
  tab_style(
    style = cells_data(columns = "Sales"),
    conditions = style_number(Sales > 1000, background = "yellow")
  )

data %>%
  gt() %>%
  tab_style(
    style = cell_fill(
      color = "yellow"
    ),
    locations = cells_body(
      columns = "hiv_positive",
      rows = hiv_positive > 1000
    )
  )

data %>%
  gt() %>%
  tab_style(
    style = cells_data(columns = "Sales"),
    conditions = style_text(Sales > 1000, background = "yellow")
  )
Question 2: Cell Coloration Fill
Using the hiv_malawi data frame, create a {gt} table that displays the total number (sum) of “new_positive” cases for each “region”. Highlight cells with more than 50 cases in red and cells with 50 cases or fewer in green. Complete the missing parts (________) of this code to achieve this.
# Calculate the total_new_positive summary
total_summary <- hiv_malawi %>%
  group_by(region) %>%
  summarize(total_new_positive = sum(new_positive))

# Create a gt table and apply cell coloration
summary_table <- total_summary %>%
  gt() %>%
  tab_style(
    style = cell_fill(color = "red"),
    locations = cells_body(
      columns = "total_new_positive",
      rows = total_new_positive > 50 # more than 50 cases
    )
  ) %>%
  tab_style(
    style = cell_fill(color = "green"),
    locations = cells_body(
      columns = "total_new_positive",
      rows = total_new_positive <= 50 # 50 cases or fewer
    )
  )
Now, we’ll enhance the visual appeal of our table’s text. For this, we’ll use the gt::tab_style() function once again.
Let’s modify the font and color of the title and the subtitle. We’ll select the Yanone Kaffeesatz font from Google Fonts, a resource offering a vast array of fonts that can add a unique touch to your table, beyond the standard options in Excel.
To apply these changes, we’ll configure the gt::tab_style() function as follows:
The style argument is assigned the cell_text() function, which here takes two arguments:
font is assigned the google_font() function with our chosen font name.
color is set to a hexadecimal color code that corresponds to our desired text color.
The locations argument is assigned the cells_title() function:
Both title and subtitle are passed to the groups argument using vector notation c(…).
To modify only the title or only the subtitle, you can use locations = cells_title(groups = "title") or locations = cells_title(groups = "subtitle"), respectively, without the need for c(…).
Using lists to pass arguments in gt: Lists in R are an integral part of the language and are extremely versatile. A list can contain elements of different types (numbers, strings, vectors, and even other lists) and each element can be accessed by its index. In the context of our {gt} table, we use lists to group together style properties (with the style argument) and to specify multiple locations in the table where these styles should be applied (with the locations argument).
Using Hexadecimal Color Codes: Colors in many programming languages, including R, can be specified using hexadecimal color codes. These codes start with a hash symbol (#) and are followed by six hexadecimal digits. The first two digits represent the red component, the next two the green component, and the last two the blue component. So, when we set color = "#00353f", we’re specifying a color with no red (00) and low, roughly equal amounts of green (35) and blue (3f), which gives a very dark teal. This gives us precise control over the colors we use in our tables.
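If you want to check how a hexadecimal code breaks down, base R can do the conversion for you. The short sketch below uses col2rgb() and rgb() from the grDevices package that ships with R; the variable names are just illustrative.

```r
# Split the hex code used for the title text into red, green, and blue (0-255).
rgb_parts <- grDevices::col2rgb("#00353f")
rgb_parts
# red = 0, green = 53, blue = 63

# Going the other way: build a hex code from the three components.
rebuilt <- grDevices::rgb(0, 53, 63, maxColorValue = 255)
rebuilt
# "#00353F" (same color; R prints the digits in upper case)
```

Hex codes are not case sensitive, so "#00353f" and "#00353F" refer to the same color.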
t7 <- t4 %>%
  tab_style(
    style = cell_text(
      font = google_font(name = 'Yanone Kaffeesatz'),
      color = "#00353f"
    ),
    locations = cells_title(groups = c("title", "subtitle"))
  )

t7

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
We can extend our customization to include the labels for columns, spanners, and stubs, as well as the source note. Within the locations argument, we’ll supply a list indicating the specific locations for these changes. For a comprehensive understanding of the locations, please refer to Appendix (List 1).
t8 <- t7 %>%
  tab_style(
    style = list(
      cell_text(
        font = google_font(name = "Montserrat"),
        color = "#00353f"
      )
    ),
    locations = list(
      cells_column_labels(columns = everything()), # select every column
      cells_column_spanners(spanners = everything()), # select all spanners
      cells_source_notes(),
      cells_stub()
    )
  )

t8

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
If you want to change the background fill of the title, you can do so by pointing the locations argument at cells_title(groups = "title"). Here’s how you could do it:
t9 <- t7 %>%
  tab_style(
    style = cell_fill(color = "#ffffff"),
    locations = cells_title(groups = "title")
  )

t9
In this code, cell_fill(color = "#ffffff") changes the background color to white, and locations = cells_title(groups = "title") applies this change specifically to the title of the table.
Question 3: Fonts and Text
Which R code snippet allows you to change the font size of the footnote text in a {gt} table?

data %>%
  gt() %>%
  tab_header(font.size = px(16))

data %>%
  gt() %>%
  tab_style(
    style = cell_text(
      size = 16
    ),
    locations = cells_footnotes()
  )

data %>%
  gt() %>%
  tab_style(
    style = cells_header(),
    css = "font-size: 16px;"
  )

data %>%
  gt() %>%
  tab_style(
    style = cells_header(),
    css = "font-size: 16;"
  )
In {gt} it’s also possible to draw borders in tables to help the end user focus on a specific area of the table. To add borders to a {gt} table we will, again, use the tab_style function and specify the style and locations arguments. The only difference now is that we use the cell_borders helper function and assign it to the style argument. Here’s how:
Let’s first add a vertical line:
t10 <- t8 %>%
  tab_style(
    style = cell_borders( # we are adding a border
      sides = "left",     # to the left of the selected location
      color = "#45785e",  # with a dark green color
      weight = px(5)      # and five pixels of thickness
    ),
    locations = cells_body(columns = 2) # add this border line to the left of column 2
  )

t10

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
Now let’s add a horizontal border line in the same dark green:
t11 <- t10 %>%
  tab_style(
    style = cell_borders( # we are adding a border line
      sides = "bottom",   # to the bottom of the selected location
      color = "#45785e",  # with the same dark green color
      weight = px(5)      # and five pixels of thickness
    ),
    locations = list(
      cells_column_labels(columns = everything()), # add this border line to the bottom of the column labels
      cells_stubhead() # and to the stubhead
    )
  )

t11

HIV Testing in Malawi
Q1 to Q4 2019

| | New tests: Positive | New tests: Negative | Previous tests: Positive | Previous tests: Negative |
|---|---|---|---|---|
| 2019 Q1 | 6199 | 284694 | 14816 | 6595 |
| 2019 Q2 | 6132 | 282249 | 15101 | 5605 |
| 2019 Q3 | 5907 | 300529 | 15799 | 6491 |
| 2019 Q4 | 5646 | 291622 | 15700 | 6293 |
| sum | 23884 | 1159094.0 | 61416 | 24984 |
| mean | 5971 | 289773.5 | 15354 | 6246 |

Source: Malawi HIV Program
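The cell_borders helper also accepts several sides at once and a line style. The following is a minimal sketch on made-up data (the demo data frame is an illustrative assumption) that frames a single row with thin dashed lines above and below it.

```r
library(gt)

# Illustrative stand-in data; not the lesson's dataset.
demo <- data.frame(period = c("Q1", "Q2", "Q3"), cases = c(120, 95, 150))

demo_tbl <- demo %>%
  gt(rowname_col = "period") %>%
  tab_style(
    style = cell_borders(
      sides = c("top", "bottom"), # more than one side at once
      color = "#45785e",          # same dark green as above
      weight = px(2),             # thinner line
      style = "dashed"            # dashed instead of the default solid line
    ),
    locations = cells_body(rows = 2) # second body row, all columns
  )

demo_tbl
```

Because no columns argument is given to cells_body, the border applies across every column of the selected row.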
Question 4: Borders
To add a solid border around the entire {gt} table, which R code snippet should you use?
Hint: we can use a function that sets options for the entire table, just like the theme function in the ggplot2 package.

data %>%
  gt() %>%
  tab_options(table.border.top.style = "solid")

data %>%
  gt() %>%
  tab_options(table.border.style = "solid")

data %>%
  gt() %>%
  tab_style(
    style = cells_table(),
    css = "border: 1px solid black;"
  )

data %>%
  gt() %>%
  tab_style(
    style = cells_body(),
    css = "border: 1px solid black;"
  )
Wrapping up our series, we began with a comprehensive recap of part one, setting the stage for advanced table styling techniques. We explored the use of the opt_stylize() function to apply predefined styles and colors, enhancing the visual appeal of our tables. A key part of our discussion introduced the data_color function, a powerful tool for applying color-coded distinctions to table values, which aids quick data assessment and visual tracking. We then delved into conditional formatting, using the tab_style function to alter cell styles based on the data they contain, a step that draws attention to critical metrics and trends. Moving beyond functionality, we focused on aesthetics with gt::tab_style(), showing how fonts from Google Fonts can improve the readability and appearance of text within tables. Finally, we covered the addition of borders using the tab_style function in conjunction with the cell_borders helper function, guiding users’ focus to specific areas of the table and improving overall data interpretation. Each of these techniques contributes to a more engaging and informative table presentation, ensuring that the data is not only accessible but also compelling.
# Solutions are where the numbered lines are
# Calculate the total_new_positive summary
total_summary <- hiv_malawi %>%
  group_by(region) %>% ##1
  summarize(total_new_positive = sum(new_positive)) ##2

# Create a gt table and apply cell coloration
summary_table <- total_summary %>%
  gt() %>% ##3
  tab_style(
    style = cell_fill(color = "red"),
    locations = cells_body( ##4
      columns = "total_new_positive",
      rows = total_new_positive > 50 ##5
    )
  ) %>%
  tab_style(
    style = cell_fill(color = "green"), ##6
    locations = cells_body(
      columns = "total_new_positive",
      rows = total_new_positive <= 50 ##7
    )
  )
The following team members contributed to this lesson:
The definitive cookbook of {gt} by Tom Mock: https://themockup.blog/static/resources/gt-cookbook.html#introduction
The Grammar of Tables article: https://themockup.blog/posts/2020-05-16-gt-a-grammar-of-tables/#add-titles
Official {gt} documentation page: https://gt.rstudio.com/articles/intro-creating-gt-tables.html
Create Awesome HTML Table with knitr::kable and kableExtra by Hao Zhu: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html#Overview
The {gt} package in R provides a variety of functions to specify locations within a table where certain styles or formatting should be applied. Here are some of them:
cells_body(): This function targets cells within the body of the table. You can further specify rows and columns to target a subset of the body.
cells_column_labels(): This function targets the cells that contain the column labels.
cells_column_spanners(): This function targets the spanner labels, that is, the cells that span multiple columns.
cells_footnotes(): This function targets cells that contain footnotes.
cells_grand_summary(): This function targets cells that contain grand summary rows.
cells_group(): This function targets cells that contain group label rows. It is an older name for cells_row_groups(), which should be preferred in current versions of {gt}.
cells_row_groups(): This function targets cells that contain row group labels.
cells_source_notes(): This function targets cells that contain source notes.
cells_stub(): This function targets cells in the table stub (the labels in the first column of the table).
cells_stubhead(): This function targets the cell that contains the stubhead.
cells_stub_summary(): This function targets the stub cells of summary rows.
cells_title(): This function targets cells that contain the table title and subtitle.
cells_summary(): This function targets cells that contain summary rows.
These functions can be used in the locations argument of the tab_style() function to apply specific styles to different parts of the table.