Quiz_Conditional_mutate

if(!require(pacman)) install.packages("pacman")   
Loading required package: pacman
Warning: package 'pacman' was built under R version 4.3.3
pacman::p_load(rio, tidyverse, here, janitor)   
india_tb <- import("https://docs.google.com/uc?id=1dorSmZ09JtuchIYS18T2r1lxndayUFoz&export=download",
 format = "csv",
 setclass = "tibble")

head(india_tb, 10)
# A tibble: 10 × 13
       id sex      age education    employment alcohol smoking wtin_kgs htin_cms
    <int> <chr>  <int> <chr>        <chr>      <chr>   <chr>      <dbl>    <dbl>
 1 100687 Male      27 Middle       Non - Wor… No      No            51      165
 2 101172 Female    21 Graduate     Non - Wor… No      No            31      160
 3 101948 Female    30 Secondary    Working    No      No            38      152
 4 103209 Male      65 No Education Non - Wor… No      No            48      162
 5 103694 Male      55 Middle       Working    No      No            46      155
 6 103791 Female    22 Higher Seco… Non - Wor… No      No            32      145
 7 104276 Male      38 Primary      Working    No      No            49      165
 8 104373 Male      55 No Education Non - Wor… No      No            49      165
 9 200209 Male      25 Higher Seco… Working    Yes     No            43      150
10 200306 Male      42 No Education Working    Yes     No            45      175
# ℹ 4 more variables: diabetes <int>, form_of_tb <chr>, chest_xray <chr>,
#   total_cost <int>

The code below calculates BMI from weight and height, classifies each BMI value, and tabulates the BMI categories by frequency.

india_tb_bmis_1 <- 
  india_tb %>%
  mutate(height_m = htin_cms/100, 
         bmi = wtin_kgs/(height_m^2)) %>% 
  mutate(bmi_class = case_when(bmi < 18.5 ~ 'Underweight',
                               bmi >= 18.5 & bmi < 25 ~ 'Normal range',
                               bmi >= 25 & bmi < 30 ~ 'Overweight',
                               bmi >= 30 ~ 'Obese'))
tabyl(india_tb_bmis_1, bmi_class) %>% 
  select(bmi_class, n)
    bmi_class   n
 Normal range  61
        Obese   1
   Overweight   5
  Underweight 179
         <NA>   4
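
To check the arithmetic by hand, here is a minimal sketch on a made-up tibble (the name `toy` and its values are illustrative assumptions, not rows from india_tb). Because case_when() returns the label of the first condition that is TRUE, the lower bounds can be dropped when the conditions are listed in ascending order. A missing weight or height gives a missing BMI, which matches no condition and so stays NA, as in the <NA> row of the table above.

```r
library(dplyr)

# Hypothetical toy data, not taken from india_tb
toy <- tibble(wtin_kgs = c(50, 65, 80, NA),
              htin_cms = c(170, 170, 160, 170))

toy_bmi <- toy %>%
  mutate(height_m = htin_cms / 100,
         bmi = wtin_kgs / height_m^2,
         # conditions are checked top to bottom; the first TRUE one wins
         bmi_class = case_when(bmi < 18.5 ~ "Underweight",
                               bmi < 25   ~ "Normal range",
                               bmi < 30   ~ "Overweight",
                               bmi >= 30  ~ "Obese"))

toy_bmi
```

The first three BMI values are roughly 17.3, 22.5 and 31.25, so they should be classed as Underweight, Normal range and Obese.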

For Asian populations, lower BMI cut-off values are sometimes used to define overweight and obesity. (See, for example, this paper.)

india_tb_bmis <- 
  india_tb %>%
  mutate(height_m = htin_cms/100, 
         bmi = wtin_kgs/(height_m^2)) %>% 
  mutate(bmi_class = case_when(bmi < 18.5 ~ 'Underweight',
                               bmi >= 18.5 & bmi < 23 ~ 'Normal range',
                               bmi >= 23 & bmi < 25 ~ 'Overweight',
                               bmi >= 25 & bmi < 30 ~ 'Pre-Obese',
                               bmi >= 30 ~ 'Obese'))
tabyl(india_tb_bmis, bmi_class) %>% 
  select(bmi_class, n)
    bmi_class   n
 Normal range  52
        Obese   1
   Overweight   9
    Pre-Obese   5
  Underweight 179
         <NA>   4
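
As an alternative to case_when(), the same cut-offs can be sketched with base R's cut(). This is a different technique from the one used above, shown only for comparison. The vector `bmi_values` is a hypothetical example. Setting right = FALSE makes each interval closed on the left, so a BMI of exactly 23 lands in 'Overweight', matching the >= conditions above.

```r
# Hypothetical BMI values chosen to sit on and around the cut-offs
bmi_values <- c(17, 18.5, 22.9, 23, 25, 30, NA)

bmi_class_cut <- cut(bmi_values,
                     breaks = c(-Inf, 18.5, 23, 25, 30, Inf),
                     labels = c("Underweight", "Normal range", "Overweight",
                                "Pre-Obese", "Obese"),
                     right = FALSE)  # intervals are [lower, upper)

bmi_class_cut
```

cut() returns a factor whose levels follow the order of `labels`, which can be convenient for plotting.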

Using the BMI classifications from the first question (not the Asian-specific classification), you would like to create a bar graph of BMI class frequencies for just women.

data_for_bmi_plot <- 
  india_tb_bmis_1 %>% 
  mutate(bmi_class = factor(x = bmi_class,
                            levels = c("Underweight", "Normal range", "Overweight", "Obese"))) %>% 
  filter(sex == "Female")

ggplot(data_for_bmi_plot) +
  aes(x = bmi_class) +
  geom_bar(fill = "#112446") +
  theme_minimal()
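
The factor() step above is what controls the order of the bars. A minimal sketch with a made-up character vector (`classes` is a hypothetical example) shows the difference that explicit levels make:

```r
# Hypothetical class labels; "Overweight" is deliberately absent
classes <- c("Underweight", "Normal range", "Underweight", "Obese")

# Without explicit levels, factor() sorts the levels alphabetically
levels(factor(classes))

# With explicit levels, the stated order is kept, and a level with no
# observations ("Overweight" here) is still recorded with a count of zero
ordered_classes <- factor(classes,
                          levels = c("Underweight", "Normal range",
                                     "Overweight", "Obese"))
levels(ordered_classes)
table(ordered_classes)
```

Note that ggplot2 drops unused factor levels from a discrete axis by default; adding scale_x_discrete(drop = FALSE) keeps an empty category visible in the bar graph.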

Hint: You can use esquisse::esquisser(data_for_bmi_plot) to obtain the appropriate ggplot2 code.

esquisse::esquisser(data_for_bmi_plot)

Recruiting subgroups

You would like to recruit individuals who drink alcohol or who smoke into a further study on health habits. Women and men will be recruited into separate studies, and a new column, recruit_to_mental_health_study, records which study (if any) each individual is recruited into.

health_habits_recruitment_df <- 
  india_tb %>% 
  select(sex, alcohol, smoking) %>% # subset to make manipulations more visible
  mutate(recruit_to_mental_health_study = 
           case_when(sex == "Female" & (alcohol == "Yes" | smoking == "Yes") ~ "F study",
                     sex == "Male" & (alcohol == "Yes" | smoking == "Yes") ~ "M study",
                     TRUE ~ "Do not recruit")) # do not recruit everyone else

tabyl(health_habits_recruitment_df, recruit_to_mental_health_study) %>% 
  select(recruit_to_mental_health_study, n)
 recruit_to_mental_health_study   n
                 Do not recruit 128
                        F study   2
                        M study 120
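
The shared eligibility condition can also be computed once and reused. The following is a minimal sketch on a made-up tibble; `toy_habits`, `eligible` and `recruit` are hypothetical names, not part of the quiz solution.

```r
library(dplyr)

# Hypothetical toy data, not taken from india_tb
toy_habits <- tibble(sex     = c("Female", "Female", "Male", "Male"),
                     alcohol = c("No",     "Yes",    "No",   "No"),
                     smoking = c("No",     "No",     "Yes",  "No"))

toy_recruit <- toy_habits %>%
  mutate(eligible = alcohol == "Yes" | smoking == "Yes",
         recruit = case_when(eligible & sex == "Female" ~ "F study",
                             eligible & sex == "Male"   ~ "M study",
                             TRUE ~ "Do not recruit"))  # catch-all for everyone else

toy_recruit
```

The final TRUE ~ "Do not recruit" line acts as a fallback: any row that matched none of the earlier conditions receives that label instead of NA.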

Now, imagine you would like to recruit individuals who are overweight or obese (BMI ≥ 25) or who have diabetes (diabetes == 1) into a further study on comorbidities. Employed and not-employed individuals are to be recruited into separate studies (an “Employed study” and a “Not employed study”, respectively).

employment_indicators_df <- 
  india_tb_bmis_1 %>% 
  select(employment, diabetes, bmi) %>% # subset to make manipulations more visible
  mutate(recruit_comorbidity_study = 
           case_when(employment == "Non - Working" & (bmi >= 25 | diabetes == 1) ~ "Not employed study",
                     employment == "Working" & (bmi >= 25 | diabetes == 1) ~ "Employed study",
                     TRUE ~ "Do not recruit")) # do not recruit everyone else

tabyl(employment_indicators_df, recruit_comorbidity_study) %>% 
  select(recruit_comorbidity_study, n)
 recruit_comorbidity_study   n
            Do not recruit 207
            Employed study  33
        Not employed study  10
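
Some individuals have a missing BMI, so it helps to see how NA behaves inside the (bmi >= 25 | diabetes == 1) condition. Here is a minimal sketch on made-up values; `toy_comorb`, `eligible` and `recruit` are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data, not taken from india_tb
toy_comorb <- tibble(bmi      = c(27, NA, NA, 20),
                     diabetes = c(0L, 1L, 0L, 0L))

toy_flagged <- toy_comorb %>%
  mutate(eligible = bmi >= 25 | diabetes == 1,
         # NA | TRUE is TRUE, but NA | FALSE stays NA;
         # case_when() treats an NA condition as not matched
         recruit = case_when(eligible ~ "Recruit",
                             TRUE ~ "Do not recruit"))

toy_flagged
```

A person with a missing BMI is therefore still recruited if they have diabetes, but falls through to "Do not recruit" if they do not.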

Replacing NAs

You would like to replace the missing values in the chest_xray variable with the string “X-ray not performed”, then cross-tabulate the TB form and chest_xray variables.

india_tb %>% 
  # missing chest_xray values are stored as empty strings ("") in this import, not as NA
  mutate(chest_xray = if_else(condition = chest_xray == "",
                              true = "X-ray not performed",
                              false = chest_xray)) %>% 
  tabyl(form_of_tb, chest_xray)
  form_of_tb Negative Positive X-ray not performed
 Ini smear -        0       50                   5
 Ini smear +       22       19                 147
     Missing        0        7                   0
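
The comparison chest_xray == "" only catches empty strings. If a column could also contain true NA values, a sketch along the following lines would cover both cases; the vector `xray` is a hypothetical example.

```r
library(dplyr)

# Hypothetical values: one empty string and one true NA
xray <- c("Positive", "", NA, "Negative")

# NA == "" evaluates to NA, so if_else() leaves the true NA untouched
xray_partial <- if_else(xray == "", "X-ray not performed", xray)

# TRUE | NA is TRUE, so adding is.na() means the condition is never NA
xray_clean <- if_else(is.na(xray) | xray == "", "X-ray not performed", xray)

xray_partial
xray_clean
```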