This online appendix provides R codes to apply the methods presented in the companion paper (Charpentier, Flachaire, and Gallic (2023)).

1 Data

To illustrate the methods, we rely on the Linked Birth/Infant Death Cohort Data (2013 cohort). The CSV files for the 2013 cohort were downloaded from the NBER collection of Birth Cohort Linked Birth and Infant Death Data of the National Vital Statistics System of the National Center for Health Statistics, on the NBER website. We formated the data (see ./births_stats.html).

Let us load the birth data.

library(tidyverse)
load("../data/births.RData")

Then, we can only keep a subsample of variables to illustrate the method.

base <- 
  births %>% 
  mutate(
    black_mother = mracerec == "Black",
    nonnatural_delivery = rdmeth_rec != "Vaginal"
  ) %>% 
  select(
    sex, dbwt, cig_rec, wtgain, 
    black_mother, nonnatural_delivery,
    mracerec, rdmeth_rec
  ) %>% 
  mutate(
    cig_rec = replace_na(cig_rec, "Unknown or not stated"),
    black_mother = ifelse(black_mother, yes = "Yes", no = "No"),
    is_girl = ifelse(sex == "Female", yes = "Yes", no = "No")
  ) %>% 
  labelled::set_variable_labels(
    black_mother = "Is the mother Black?",
    is_girl = "Is the newborn a girl?",
    nonnatural_delivery = "Is the delivery method non-natural?"
  ) %>% 
  rename(birth_weight = dbwt)
base

# A tibble: 3,940,764 × 9
   sex    birth_weight cig_rec wtgain black_mo…¹ nonna…² mrace…³ rdmet…⁴ is_girl
   <fct>         <dbl> <fct>    <dbl> <chr>      <lgl>   <fct>   <fct>   <chr>  
 1 Male           3017 No           1 No         FALSE   "White" Vaginal No     
 2 Female         3367 No          18 No         FALSE   "White" Vaginal Yes    
 3 Male           4409 No          36 No         FALSE   "White" Vaginal No     
 4 Male           4047 No          25 No         FALSE   "White" Vaginal No     
 5 Male           3119 Yes          0 No         FALSE   "Ameri… Vaginal No     
 6 Female         4295 Yes         34 No         FALSE   "White" Vaginal Yes    
 7 Male           3160 No          47 No         FALSE   "Ameri… Vaginal No     
 8 Male           3711 No          39 No         FALSE   "White" Vaginal No     
 9 Male           3430 No          NA No         FALSE   "White" Vaginal No     
10 Female         3772 No          36 No         FALSE   "White" Vaginal Yes    
# … with 3,940,754 more rows, and abbreviated variable names ¹black_mother,
#   ²nonnatural_delivery, ³mracerec, ⁴rdmeth_rec

The variables that were kept are the following:

sex: sex of the newborn
birth_weight: Birth Weight (in Grams)
cig_rec: mother smokes cigarettes
wtgain: mother weight gain
mracerec: mother’s race
black_mother: is mother black?
rdmeth_rec: delivery method
nonnatural_delivery: is the delivery method non-natural?

Then, let us discard individuals with missing values.

base <- 
  base %>% 
  filter(
    !is.na(birth_weight),
    !is.na(wtgain),
    !is.na(nonnatural_delivery),
    !is.na(black_mother)
  )

Let us define some colours for later use:

library(wesanderson)
colr1 <- wes_palette("Darjeeling1")
colr2 <- wes_palette("Darjeeling2")
couleur1 <- colr1[2]
couleur2 <- colr1[4]
couleur3 <- colr2[2]

coul1 <- "#882255"
coul2 <- "#DDCC77"

We need to load some packages.

library(car)
library(transport)
library(splines)
library(mgcv)
library(expm)
library(tidyverse)

2 Objectives

We would like to measure the effect of a treatment variable \(T\) on the probability of observing a binary outcome \(y\), depending on a binary treatment \(T\). The outcome is assumed to depend on another variable \(\boldsymbol{x}^m\) that is also influenced by the treatment; this variable is called a mediator.

Output (\(y\), nonnatural_delivery): Probability of having a non-natural delivery
Treatment (\(T\)), either:
- cig_rec: Whether the mother smokes \(\color{couleur2}{t=1}\) or not \(\color{couleur1}{t=0}\)
- black_mother: Whether the mother is Black \(\color{couleur2}{t=1}\) or not \(\color{couleur1}{t=0}\)
- sex: Whether the baby is a girl \(\color{couleur2}{t=1}\) or not \(\color{couleur1}{t=0}\)
Mediator (\(\boldsymbol{x}^m\)) is either:
- birth_weight: Birth weight of the newborn
- wtgain: weight gain of the mother.

Sample Conditional Average Treatment Effect: \(\text{SCATE}(\boldsymbol{x})\)

Consider two models, \(\color{couleur1}{\widehat{m}_0(x)}\) and \(\color{couleur2}{\widehat{m}_1(x)}\), that estimate, respectively, \(\color{couleur1}{\mathbb{E}[Y|X=x,T=0]}\) and \(\color{couleur2}{\mathbb{E}[Y|X=x,T=1]}\), \[ \text{SCATE}(x)=\color{couleur2}{\widehat{m}_1}\big(\widehat{\mathcal{T}}(x)\big)\color{black}{} - \color{couleur1}{\widehat{m}_0}\big(x\big) \] where \(\widehat{\mathcal{T}}(\cdot)\) is a transport function:

if we consider a quantile-based matching, \(\widehat{\mathcal{T}}(x)= \color{couleur2}{\widehat{F}_1^{-1}}\color{black}{}\circ \color{couleur1}{\widehat{F}_0}(x)\), with \(\color{couleur1}{\widehat{F}_0}\) and \(\color{couleur2}{\widehat{F}_1}\) denoting the empirical distribution functions of \(x\) conditional on \(\color{couleur1}{t=0}\) and \(\color{couleur2}{t=1}\), respectively.
assuming a Gaussian distribution of the mediator variable, we can consider \(\widehat{\mathcal{T}}(x) := \widehat{\mathcal{T}}_{\mathcal{N}}(x)= \color{couleur2}{\overline{x}_1}\color{black}{}+\color{couleur2}{s_1}\color{couleur1}{s_0^{-1}}\color{black}{} (x-\color{couleur1}{\overline{x}_0}\color{black}{})\), \(\color{couleur1}{\overline{x}_0}\) and \(\color{couleur2}{\overline{x}_1}\) being respectively the averages of \(x\) in the two sub-populations, and \(\color{couleur1}{s_0}\) and \(\color{couleur2}{s_1}\) the sample standard deviations.

What is Needed

To estimate the mutatis mutandis Sample Conditional Average Treatment Effect, the following are required:

The two estimated models \(\color{couleur1}{\widehat{m}_0(x)}\) and \(\color{couleur2}{\widehat{m}_1(x)}\).
A function to predict new values with these each of these two models.
A transport method \(\mathcal{T}(\cdot)\).

We will compute the SCATE at the different values of \(\boldsymbol{x}\):

x_0_birth_weight <- seq(2000,4500, by=500)

x_0_weight_gain <- seq(5,55, by=10)

3 Models

We will consider here different models:

GAM
Local averages computed using kernels

For each model, we need to define two functions: a first one that estimates the model, and a second one that makes predictions based on the estimated model.

3.1 GAM (with cubic splines)

The function used to estimate both models \(\color{couleur1}{\hat{m}_0}\color{black}{}()\) and \(\color{couleur2}{\hat{m}_1}\color{black}{}()\)

library(splines)

#' Returns $\hat{m}_0()$ and $\hat{m}_1()$, using GAM
#' @param target name of the target variable
#' @param treatment_name name of the treatment variable (column in `data`)
#' @param x_m_name name of the mediator variable
#' @param data data frame with the observations
#' @param treatment_0 value for non treated
#' @param treatment_1 value for treated
#' @param df degrees of freedom (in the `bs()` function, deefault to `df=3`)
models_spline <- function(target, treatment_name, x_m_name, data, treatment_0, treatment_1, df = 3){
  # \hat{m}_0()
  reg_0 <-
    bquote(glm(.(target)~bs(.(x_m_name), df = df), data=data,
               family=binomial, subset = (.(treatment_name) == .(treatment_0))),
           list(target = as.name(target), x_m_name=as.name(x_m_name),
                treatment_name = as.name(treatment_name),
                treatment_0 = treatment_0)) %>% 
    eval()
  # \hat{m}_1()
  reg_1 <- bquote(glm(.(target)~bs(.(x_m_name), df = df), data=data,
                      family=binomial, subset = (.(treatment_name) == .(treatment_1))),
                  list(target = as.name(target), x_m_name=as.name(x_m_name),
                       treatment_name = as.name(treatment_name),
                       treatment_1 = treatment_1)) %>% 
    eval()
  
  list(reg_0 = reg_0, reg_1 = reg_1)
}

Let us estimate these models.

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

reg_univ_smoker_bw <- 
  models_spline(target = target,
              treatment_name = treatment_name, 
              x_m_name = x_m_name,
              data = base, treatment_0 = treatment_0, treatment_1 = treatment_1, df = 3)

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

reg_univ_smoker_wg <- 
  models_spline(target = target,
              treatment_name = treatment_name, 
              x_m_name = x_m_name,
              data = base, treatment_0 = treatment_0, treatment_1 = treatment_1, df = 3)

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

reg_univ_blackm_bw <- 
  models_spline(target = target,
              treatment_name = treatment_name, 
              x_m_name = x_m_name,
              data = base, treatment_0 = treatment_0, treatment_1 = treatment_1, df = 3)

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

reg_univ_blackm_wg <- 
  models_spline(target = target,
              treatment_name = treatment_name, 
              x_m_name = x_m_name,
              data = base, treatment_0 = treatment_0, treatment_1 = treatment_1, df = 3)

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "birth_weight"
treatment_0    <- "Male"
treatment_1    <- "Female"

reg_univ_sex_bw <- 
  models_spline(target = target,
              treatment_name = treatment_name, 
              x_m_name = x_m_name,
              data = base, treatment_0 = treatment_0, treatment_1 = treatment_1, df = 3)

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "wtgain"
treatment_0    <- "Male"
treatment_1    <- "Female"

reg_univ_sex_wg <- 
  models_spline(target = target,
              treatment_name = treatment_name, 
              x_m_name = x_m_name,
              data = base, treatment_0 = treatment_0, treatment_1 = treatment_1, df = 3)

3.2 Kernels

Let us define a function to compute the local average of a single observation:

#' @param target name of the response variable
#' @param x_m_name name of the mediator variable
#' @param x single value which will be transported
#' @param h bandwidth
#' @param data data frame, subset of data with t=0 or t=1
pred_kernel_single <- function(target, x_m_name, x, h=10, data){
  w <- dnorm(pull(data, x_m_name), x, h)
  sum(w*pull(data, target))/sum(w)
}

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

mod_0_k_smoker_bw <- list(target = target, x_m_name = x_m_name, h= 50,
                     data = base[pull(base, treatment_name) == treatment_0,])
mod_1_k_smoker_bw <- list(target = target, x_m_name = x_m_name, h= 50, 
                     data = base[pull(base, treatment_name) == treatment_1,])

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

mod_0_k_smoker_wg <- list(target = target, x_m_name = x_m_name, h = 50,
                     data = base[pull(base, treatment_name) == treatment_0,])
mod_1_k_smoker_wg <- list(target = target, x_m_name = x_m_name, h = 50, 
                     data = base[pull(base, treatment_name) == treatment_1,])

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

mod_0_k_blackm_bw <- list(target = target, x_m_name = x_m_name, h= 50,
                     data = base[pull(base, treatment_name) == treatment_0,])
mod_1_k_blackm_bw <- list(target = target, x_m_name = x_m_name, h= 50, 
                     data = base[pull(base, treatment_name) == treatment_1,])

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

mod_0_k_blackm_wg <- list(target = target, x_m_name = x_m_name, h= 50,
                     data = base[pull(base, treatment_name) == treatment_0,])
mod_1_k_blackm_wg <- list(target = target, x_m_name = x_m_name, h= 50, 
                     data = base[pull(base, treatment_name) == treatment_1,])

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "birth_weight"
treatment_0    <- "Male"
treatment_1    <- "Female"

mod_0_k_sex_bw <- list(target = target, x_m_name = x_m_name, h= 50,
                     data = base[pull(base, treatment_name) == treatment_0,])
mod_1_k_sex_bw <- list(target = target, x_m_name = x_m_name, h= 50, 
                     data = base[pull(base, treatment_name) == treatment_1,])

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "wtgain"
treatment_0    <- "Male"
treatment_1    <- "Female"

mod_0_k_sex_wg <- list(target = target, x_m_name = x_m_name, h= 50,
                     data = base[pull(base, treatment_name) == treatment_0,])
mod_1_k_sex_wg <- list(target = target, x_m_name = x_m_name, h= 50, 
                     data = base[pull(base, treatment_name) == treatment_1,])

4 Prediction Function

4.1 GAM

The prediction function for GAM:

#' @param object regression model (GAM)
#' @param newdata data frame in which to look for the mediator variable used to predict the target
model_spline_predict <- function(object, newdata){
  predict(object, newdata = newdata, type="response")
}

4.2 Kernels

Function used to compute the local average of a single observation:

#' @param target name of the response variable
#' @param x_m_name name of the mediator variable
#' @param x single value which will be transported
#' @param h bandwidth
#' @param data data frame, subset of data with t=0 or t=1
pred_kernel_single <- function(target, x_m_name, x, h=10, data){
  w <- dnorm(pull(data, x_m_name), x, h)
  sum(w*pull(data, target))/sum(w)
}

And the prediction function:

#' @param object regression model (GAM)
#' @param newdata data frame in which to look for the mediator variable used to predict the target
pred_kernel <- function(object, newdata){
  target <- object$target
  x_m_name <- object$x_m_name
  h <- object$h
  data <- object$data
  pull(newdata, 1) %>% 
    map_dbl(~pred_kernel_single(target = target, x_m_name = x_m_name, x = ., h = h,
                                data = data))
}

5 Transport

Let us now turn to the transport of the mediator variable.

5.1 Quantile-based

Then, we define a function to transport the values of a mediator.

#' Quantile-based transport (Univariate)
#' @param x_0 vector of values to be transported
#' @param x_m_name name of the mediator variable
#' @param treatment_name name of the treatment variable (column in `data`)
#' @param treatment_0 value for non treated
#' @param treatment_1 value for treated
#' @param data data frame with both T=0 or T=1
transport_quantile <- function(x_0, x_m_name, treatment_name, treatment_0, treatment_1, data){
  ind_0 <- pull(data, treatment_name) == treatment_0
  x_val <- pull(data, x_m_name)
  # Empirical Cumulative Distribution Function for values of the mediator variable for non-treated
  Fn <- ecdf(x_val)
  # Probability associated in the non-treated
  u <- Fn(x_0)
  # Transported_values
  x_1 <- pull(data, x_m_name)[pull(data, treatment_name) == treatment_1]
  x_t_quantile <- quantile(x_1, u)
  list(x_0 = x_0, u = u, x_t_quantile = x_t_quantile)
}

The transported values of \(\boldsymbol{x}\), i.e., \(\mathcal{T}(\boldsymbol{x})\):

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at a few points
trans_q_smoker_bw <- 
  transport_quantile(x_0 = x_0_birth_weight, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
probs <- seq(0,1,length=601)
# Quantiles for non treated
x_0_probs_smoker_bw <- quantile(pull(base, x_m_name)[pull(base, treatment_name) == treatment_0], probs = probs)
# Transported values
trans_q_smoker_bw_probs <- 
  transport_quantile(x_0 = x_0_probs_smoker_bw, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at a few points
trans_q_smoker_wg <- 
  transport_quantile(x_0 = x_0_weight_gain, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
probs <- seq(0,1,length=601)
# Quantiles for non treated
x_0_probs_smoker_wg <- quantile(pull(base, x_m_name)[pull(base, treatment_name) == treatment_0], probs = probs)
# Transported values
trans_q_smoker_wg_probs <- 
  transport_quantile(x_0 = x_0_probs_smoker_wg, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at a few points
trans_q_blackm_bw <- 
  transport_quantile(x_0 = x_0_birth_weight, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
probs <- seq(0,1,length=601)
# Quantiles for non treated
x_0_probs_blackm_bw <- quantile(pull(base, x_m_name)[pull(base, treatment_name) == treatment_0], probs = probs)
# Transported values
trans_q_blackm_bw_probs <- 
  transport_quantile(x_0 = x_0_probs_blackm_bw, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at a few points
trans_q_blackm_wg <- 
  transport_quantile(x_0 = x_0_weight_gain, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
probs <- seq(0,1,length=601)
# Quantiles for non treated
x_0_probs_blackm_wg <- quantile(pull(base, x_m_name)[pull(base, treatment_name) == treatment_0], probs = probs)
# Transported values
trans_q_blackm_wg_probs <- 
  transport_quantile(x_0 = x_0_probs_blackm_wg, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "birth_weight"
treatment_0    <- "Male"
treatment_1    <- "Female"

# Only at a few points
trans_q_sex_bw <- 
  transport_quantile(x_0 = x_0_birth_weight, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
probs <- seq(0,1,length=601)
# Quantiles for non treated
x_0_probs_sex_bw <- quantile(pull(base, x_m_name)[pull(base, treatment_name) == treatment_0], probs = probs)
# Transported values
trans_q_sex_bw_probs <- 
  transport_quantile(x_0 = x_0_probs_sex_bw, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "wtgain"
treatment_0    <- "Male"
treatment_1    <- "Female"

# Only at a few points
trans_q_sex_wg <- 
  transport_quantile(x_0 = x_0_weight_gain, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
probs <- seq(0,1,length=601)
# Quantiles for non treated
x_0_probs_sex_wg <- quantile(pull(base, x_m_name)[pull(base, treatment_name) == treatment_0], probs = probs)
# Transported values
trans_q_sex_wg_probs <- 
  transport_quantile(x_0 = x_0_probs_sex_wg, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

We can visualize the transported values from the control group to the treated group. To do so, let us define a function.

Show the R codes

#' @param x_0 vector of values
#' @param x_0_t vector of transported values
#' @param data data frame with the observations
#' @param treatment_name name of the treatment variable (column in `data`)
#' @param treatment_0 value for non treated
#' @param treatment_1 value for treated
#' @param x_m_name name of the mediator variable
#' @param distrib_plot_type type of visualization for the distribution in each subset ("density", "histogram", "none")
#' @param breaks breaks for histogram (if `distrib_plot_type` is not "none")
#' @param xlim limits on the x-axis
#' @param x_c_label label of the mediator variable
#' @param treated_label label of the treated variable
#' @param non_treated_label label of the non-treated variable
plot_transport_univariate <- function(x_0, x_0_t, data,
         treatment_name, treatment_0, treatment_1, x_m_name,
         distrib_plot_type = c("density", "histogram", "none"),
         breaks = NULL,
         xlim, x_c_label, treated_label, non_treated_label){

  distrib_plot_type <- match.arg(distrib_plot_type)

  # Subset of non treated
  d_0 <- filter(data, !!sym(treatment_name) == treatment_0)
  # Subset of treated
  d_1 <- filter(data, !!sym(treatment_name) == treatment_1)

  if(distrib_plot_type == "density"){
    # Density for non-treated
    dens_0 <- density(pull(d_0, x_m_name))
    # Density for treated
    dens_1 <- density(pull(d_1, x_m_name))
  }else if(distrib_plot_type == "histogram"){
    hist_0 <- hist(pull(d_0, x_m_name), breaks = breaks, plot = FALSE)
    hist_1 <- hist(pull(d_1, x_m_name), breaks = breaks, plot = FALSE)
  }

  if(distrib_plot_type != "none"){
    mat <- matrix(c(1,2,0,3), 2)
    par(mfrow = c(2,2))
    layout(mat, c(3.5,1), c(1,3))
    par(mar = c(0.5, 4.5, 0.5, 0.5))
  }

  if(distrib_plot_type == "density"){
    # Density of the mediator variable on the subset of non treated (green)
    plot(dens_0$x, dens_0$y,
         main="", axes=FALSE, xlab="", ylab="",
         xlim = xlim, col="white")
    polygon(dens_0$x, dens_0$y, col = couleur1, border = NA)
  }else if(distrib_plot_type == "histogram"){
    # Histogram of the mediator variable on the subset of non treated (green)
    barplot(hist_0$counts[hist_0$breaks >= min(xlim) & hist_0$breaks <= max(xlim)],
            width=5, space=0, main="",
            axes=FALSE,xlab="", ylab="",
            xlim=xlim, col=couleur1, border="white")

  }
  
  par(mar=c(4.5, 4.5, 0.5, 0.5))
  # Optimal transport (quantile based)
  plot(x_0, x_0_t,
       col = couleur3, lwd = 2,
       type = "l",
       xlab = "",
       ylab = "",
       xlim = xlim, ylim = xlim)
  abline(a = 0, b = 1, col = couleur3, lty = 2)
  
  ylab_1 <- bquote(.(x_c_label) * " (" *phantom(.(non_treated_label)) * ")")
  ylab_2 <- bquote(phantom(.(x_c_label)) * phantom(" (") *.(non_treated_label) * phantom(")"))
  mtext(ylab_1, side=1,line=3, col = "black")
  mtext(ylab_2, side=1,line=3, col = couleur1)
  
  xlab_1 <- bquote(.(x_c_label) * " (" *phantom(.(treated_label)) * ")")
  xlab_2 <- bquote(phantom(.(x_c_label)) * phantom(" (") *.(treated_label) *phantom(")"))
  mtext(xlab_1, side=2,line=3, col = "black")
  mtext(xlab_2, side=2,line=3, col = couleur2)

  if(distrib_plot_type == "density"){
    # Density of the mediator variable the subset of treated (orange)
    par(mar = c(4.5, 0.5, 0.5, 0.5))
    plot(dens_1$y, dens_1$x,
         main="", axes=FALSE, xlab="", ylab="",
         ylim = xlim, col="white")
    polygon(dens_1$y, dens_1$x, col = couleur2, border = NA)
  }else if(distrib_plot_type == "histogram"){
    # Histogram of the mediator variable the subset of treated (orange)
    par(mar = c(4.5, 0.5, 0.5, 0.5))
    barplot(hist_1$counts[hist_1$breaks >= min(xlim) & hist_1$breaks <= max(xlim)],
            width=5, space=0, main="",
            axes=FALSE,xlab="", ylab="",
            ylim=xlim, col=couleur2, border="white", horiz=TRUE)
  }
}

Show the R codes

plot_transport_univariate(x_0 = trans_q_smoker_bw_probs$x_0, 
                          x_0_t = trans_q_smoker_bw_probs$x_t_quantile, 
                          data = base, 
                          treatment_name = "cig_rec", 
                          treatment_0 = "No", treatment_1 = "Yes", 
                          x_m_name = "birth_weight", distrib_plot_type = "density",
                          breaks = NULL, xlim = c(1800,4600), 
                          x_c_label = "Weight of the baby",
                          treated_label = "Smoker mother", non_treated_label = "non-smoker mother")

Figure 1. Transported newborn weights from the control group (non-smoking mothers) to the treated group (smoking mothers), with estimated densities

Show the R codes

plot_transport_univariate(x_0 = trans_q_smoker_wg_probs$x_0, 
                          x_0_t = trans_q_smoker_wg_probs$x_t_quantile, 
                          data = base, 
                          treatment_name = "cig_rec", 
                          treatment_0 = "No", treatment_1 = "Yes", 
                          x_m_name = "wtgain", distrib_plot_type = "histogram",
                          breaks = seq(0,105,by=5), xlim = c(0,90), 
                          x_c_label = "Weight of the baby",
                          treated_label = "Smoker mother", non_treated_label = "non-smoker mother")

Figure 2. Transported mother’s weight gain from the control group (non-smoking mothers) to the treated group (smoking mothers), with histograms

Show the R codes

plot_transport_univariate(x_0 = trans_q_blackm_bw_probs$x_0, 
                          x_0_t = trans_q_blackm_bw_probs$x_t_quantile, 
                          data = base, 
                          treatment_name = "black_mother", 
                          treatment_0 = "No", treatment_1 = "Yes", 
                          x_m_name = "birth_weight", distrib_plot_type = "density",
                          breaks = NULL, xlim = c(1800,4600), 
                          x_c_label = "Weight of the baby",
                          treated_label = "Black mother", non_treated_label = "non-Black mother")

Figure 3. Transported newborn weights from the control group (non-Black mothers) to the treated group (Black mothers), with estimated densities

Show the R codes

plot_transport_univariate(x_0 = trans_q_blackm_wg_probs$x_0, 
                          x_0_t = trans_q_blackm_wg_probs$x_t_quantile, 
                          data = base, 
                          treatment_name = "black_mother", 
                          treatment_0 = "No", treatment_1 = "Yes", 
                          x_m_name = "wtgain", distrib_plot_type = "histogram",
                          breaks = seq(0,105,by=5), xlim = c(0,90),
                          x_c_label = "Weight gain of the mother",
                          treated_label = "Black mother", non_treated_label = "non-Black mother")

Figure 4. Transported mother’s weight gain from the control group (non-Black mothers) to the treated group (Black mothers), with histograms

\(\boldsymbol{x}^m\): Mother’s weight gain

Show the R codes

plot_transport_univariate(x_0 = trans_q_sex_wg_probs$x_0, 
                          x_0_t = trans_q_sex_wg_probs$x_t_quantile, 
                          data = base, 
                          treatment_name = "sex", 
                          treatment_0 = "Male", treatment_1 = "Female", 
                          x_m_name = "wtgain", distrib_plot_type = "histogram",
                          breaks = seq(0,105,by=5), xlim = c(0,90),
                          x_c_label = "Weight gain of the mother",
                          treated_label = "baby girl", non_treated_label = "baby boy")

Figure 5. Transported mother’s weight gain from the control group (baby boy) to the treated group (baby girl), with histograms

5.2 Gaussian Assumption

Now, let us consider the case in which the mediator is assumed to be Gaussian.

The transport function can be defined as follows:

#' @param x_0 vector of values to be transported
#' @param x_m_name name of the mediator variable
#' @param treatment_name name of the treatment variable (column in `data`)
#' @param treatment_0 value for non treated
#' @param treatment_1value for treated
#' @param data data frame with both T=0 or T=1
transport_univ_gaussian <- function(x_0, x_m_name, treatment_name, treatment_0, treatment_1, data){
  x0 <- mean(pull(data, x_m_name)[pull(data, treatment_name) == treatment_0])
  x1 <- mean(pull(data, x_m_name)[pull(data, treatment_name) == treatment_1])
  s0 <- sd(pull(data, x_m_name)[pull(data, treatment_name) == treatment_0])
  s1 <- sd(pull(data, x_m_name)[pull(data, treatment_name) == treatment_1])
  u_N <- pnorm(x_0,x0,s0)
  x_t_N <- qnorm(u_N, x1, s1)
  list(x_0 = x_0, u_N = u_N, x_t_N = x_t_N)
}

The transported values of \(\boldsymbol{x}\), i.e., \(\mathcal{T}_\mathcal{N}(\boldsymbol{x})\):

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at a few points
trans_N_smoker_bw <- 
  transport_univ_gaussian(x_0 = x_0_birth_weight, x_m_name = x_m_name, 
                          treatment_name = treatment_name, 
                          treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
trans_N_smoker_bw_probs <- 
  transport_univ_gaussian(x_0 = x_0_probs_smoker_bw, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

trans_N_smoker_wg <- 
  transport_univ_gaussian(x_0 = x_0_weight_gain, x_m_name = x_m_name, 
                          treatment_name = treatment_name, 
                          treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
trans_N_smoker_wg_probs <- 
  transport_univ_gaussian(x_0 = x_0_probs_smoker_wg, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

trans_N_blackm_bw <- 
  transport_univ_gaussian(x_0 = x_0_birth_weight, x_m_name = x_m_name, 
                          treatment_name = treatment_name, 
                          treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
trans_N_blackm_bw_probs <- 
  transport_univ_gaussian(x_0 = x_0_probs_blackm_bw, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

trans_N_blackm_wg <- 
  transport_univ_gaussian(x_0 = x_0_weight_gain, x_m_name = x_m_name, 
                          treatment_name = treatment_name, 
                          treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
trans_N_blackm_wg_probs <- 
  transport_univ_gaussian(x_0 = x_0_probs_blackm_wg, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "birth_weight"
treatment_0    <- "Male"
treatment_1    <- "Female"

trans_N_sex_bw <- 
  transport_univ_gaussian(x_0 = x_0_birth_weight, x_m_name = x_m_name, 
                          treatment_name = treatment_name, 
                          treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
trans_N_sex_bw_probs <- 
  transport_univ_gaussian(x_0 = x_0_probs_sex_bw, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "wtgain"
treatment_0    <- "Male"
treatment_1    <- "Female"

trans_N_sex_wg <- 
  transport_univ_gaussian(x_0 = x_0_weight_gain, x_m_name = x_m_name, 
                          treatment_name = treatment_name, 
                          treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

# At more points
trans_N_sex_wg_probs <- 
  transport_univ_gaussian(x_0 = x_0_probs_sex_wg, x_m_name = x_m_name, 
                     treatment_name = treatment_name, 
                     treatment_0 = treatment_0, treatment_1 = treatment_1, data = base)

We can visualize the transported values from the control group to the treated group.

Show the R codes

plot_transport_univariate(x_0 = trans_N_smoker_bw_probs$x_0, 
                          x_0_t = trans_N_smoker_bw_probs$x_t_N, 
                          data = base, 
                          treatment_name = "cig_rec", 
                          treatment_0 = "No", treatment_1 = "Yes", 
                          x_m_name = "birth_weight", distrib_plot_type = "density",
                          breaks = NULL, xlim = c(1800,4600), 
                          x_c_label = "Weight of the baby",
                          treated_label = "Smoker mother", non_treated_label = "non-smoker mother")

Figure 6. Transported newborn weights (Gaussian assumption) from the control group (non-smoking mothers) to the treated group (smoking mothers), with estimated densities

Show the R codes

plot_transport_univariate(x_0 = trans_N_smoker_wg_probs$x_0, 
                          x_0_t = trans_N_smoker_wg_probs$x_t_N, 
                          data = base, 
                          treatment_name = "cig_rec", 
                          treatment_0 = "No", treatment_1 = "Yes", 
                          x_m_name = "wtgain", distrib_plot_type = "histogram",
                          breaks = seq(0,105,by=5), xlim = c(0,90), 
                          x_c_label = "Weight of the baby",
                          treated_label = "Smoker mother", non_treated_label = "non-smoker mother")

Figure 7. Transported mother’s weight gain (Gaussian assumption)from the control group (non-smoking mothers) to the treated group (smoking mothers), with histograms

Show the R codes

plot_transport_univariate(x_0 = trans_N_blackm_bw_probs$x_0, 
                          x_0_t = trans_N_blackm_bw_probs$x_t_N, 
                          data = base, 
                          treatment_name = "black_mother", 
                          treatment_0 = "No", treatment_1 = "Yes", 
                          x_m_name = "birth_weight", distrib_plot_type = "density",
                          breaks = NULL, xlim = c(1800,4600), 
                          x_c_label = "Weight of the baby",
                          treated_label = "Black mother", non_treated_label = "non-Black mother")

Figure 8. Transported newborn weights (Gaussian assumption) from the control group (non-Black mothers) to the treated group (Black mothers), with estimated densities

Show the R codes

plot_transport_univariate(x_0 = trans_N_blackm_wg_probs$x_0, 
                          x_0_t = trans_N_blackm_wg_probs$x_t_N, 
                          data = base, 
                          treatment_name = "black_mother", 
                          treatment_0 = "No", treatment_1 = "Yes", 
                          x_m_name = "wtgain", distrib_plot_type = "histogram",
                          breaks = seq(0,105,by=5), xlim = c(0,90),
                          x_c_label = "Weight gain of the mother",
                          treated_label = "Black mother", non_treated_label = "non-Black mother")

Figure 9. Transported mother’s weight gain (Gaussian assumption)from the control group (non-Black mothers) to the treated group (Black mothers), with histograms

Show the R codes

plot_transport_univariate(x_0 = trans_N_sex_bw_probs$x_0, 
                          x_0_t = trans_N_sex_bw_probs$x_t_N, 
                          data = base, 
                          treatment_name = "sex", 
                          treatment_0 = "Male", treatment_1 = "Female", 
                          x_m_name = "birth_weight", distrib_plot_type = "density",
                          breaks = NULL, xlim = c(1800,4600), 
                          x_c_label = "Weight of the baby",
                          treated_label = "baby girl", non_treated_label = "baby boy")

Figure 10. Transported newborn weights (Gaussian assumption) from the control group (baby boy) to the treated group (baby girl), with estimated densities

Show the R codes

plot_transport_univariate(x_0 = trans_N_sex_wg_probs$x_0, 
                          x_0_t = trans_N_sex_wg_probs$x_t_N, 
                          data = base, 
                          treatment_name = "sex", 
                          treatment_0 = "Male", treatment_1 = "Female",
                          x_m_name = "wtgain", distrib_plot_type = "histogram",
                          breaks = seq(0,105,by=5), xlim = c(0,90),
                          x_c_label = "Weight gain of the mother",
                          treated_label = "baby girl", non_treated_label = "baby boy")

Figure 11. Transported mother’s weight gain (Gaussian assumption)from the control group (baby boy) to the treated group (baby girl), with histograms

6 Estimation

Now that the two models \(\color{couleur1}{\widehat{m}_0(x)}\) and \(\color{couleur2}{\widehat{m}_1(x)}\) are defined, we can compute the Mutatis Mutandis Sample Conditional Average Effect.

#' Computes the Sample Average Treatment Effect with and without transport
#' @param x_0 vector of values at which to compute $\hat{m}_0(x_0)$, $\hat{m}_1(x_0)$
#' @param x_t vector of values at which to compute $\hat{m}_1(\mathcal{T}(x_t))$
#' @param x_m_name vector of names of the mediator variables
#' @param mod_0 model $\hat{m}_0$
#' @param mod_1 model $\hat{m}_1$
#' @param pred_mod_0 prediction function for model $\hat{m}_0(\cdot)$
#' @param pred_mod_1 prediction function for model $\hat{m}_1(\cdot)$
#' @param return_x if `TRUE` (default) the mediator variables are returned in the table, as well as their transported values
sate <- function(x_0, x_t, x_m_names, mod_0, mod_1, pred_mod_0, pred_mod_1, return_x = TRUE){
  if(is.vector(x_0)){
    # Univariate case
    new_data <- tibble(!!x_m_names := x_0)
  }else{
    new_data <- x_0
  }
  if(is.vector(x_t)){
    # Univariate case
    new_data_t <- tibble(!!x_m_names := x_t)
  }else{
    new_data_t <- x_t
  }
  
  
  # $\hat{m}_0(x_0)$
  y_0 <- pred_mod_0(object = mod_0, newdata = new_data)
  # $\hat{m}_1(x_0)$
  y_1 <- pred_mod_1(object = mod_1, newdata = new_data)
  # $\hat{m}_1(\mathcal{T}(x_0))$
  y_1_t <- pred_mod_1(object = mod_1, newdata = new_data_t)
  
  
  scate_tab <- 
    tibble(y_0 = y_0, y_1 = y_1, y_1_t = y_1_t,
         CATE = y_1-y_0, SCATE = y_1_t-y_0)
  
  if(return_x){
    new_data_t <- new_data_t %>% rename_all(~paste0(.x, "_t"))
    scate_tab <- bind_cols(new_data, new_data_t, scate_tab)
  }
  scate_tab
}

6.1 GAM, with Quantile-based transport

Let us compute here the SCATE where the models \(\color{couleur1}{\widehat{m}_0(x)}\) and \(\color{couleur2}{\widehat{m}_1(x)}\) are GAM and where the transport method is quantile-based.

treatment_name <- "cig_rec"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at some points
cate_q_gam_univ_smoker_bw <-
  sate(x_0 = x_0_birth_weight,
       x_t = trans_q_smoker_bw$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_smoker_bw$reg_0,
       mod_1 = reg_univ_smoker_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_q_gam_univ_smoker_bw_2 <- 
  sate(x_0 = trans_q_smoker_bw_probs$x_0,
       x_t = trans_q_smoker_bw_probs$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_smoker_bw$reg_0,
       mod_1 = reg_univ_smoker_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "cig_rec"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at some points
cate_q_gam_univ_smoker_wg <-
  sate(x_0 = x_0_weight_gain,
       x_t = trans_q_smoker_wg$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_smoker_wg$reg_0,
       mod_1 = reg_univ_smoker_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_q_gam_univ_smoker_wg_2 <- 
  sate(x_0 = trans_q_smoker_wg_probs$x_0,
       x_t = trans_q_smoker_wg_probs$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_smoker_wg$reg_0,
       mod_1 = reg_univ_smoker_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "black_mother"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at some points
cate_q_gam_univ_blackm_bw <-
  sate(x_0 = x_0_birth_weight,
       x_t = trans_q_blackm_bw$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_blackm_bw$reg_0,
       mod_1 = reg_univ_blackm_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_q_gam_univ_blackm_bw_2 <- 
  sate(x_0 = trans_q_blackm_bw_probs$x_0,
       x_t = trans_q_blackm_bw_probs$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_blackm_bw$reg_0,
       mod_1 = reg_univ_blackm_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "black_mother"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at some points
cate_q_gam_univ_blackm_wg <- 
  sate(x_0 = x_0_weight_gain,
       x_t = trans_N_blackm_wg$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_blackm_wg$reg_0,
       mod_1 = reg_univ_blackm_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_q_gam_univ_blackm_wg_2 <- 
  sate(x_0 = trans_q_blackm_wg_probs$x_0,
       x_t = trans_q_blackm_wg_probs$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_blackm_wg$reg_0,
       mod_1 = reg_univ_blackm_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "sex"
x_m_name       <- "birth_weight"
treatment_0    <- "Male"
treatment_1    <- "Female"

# Only at some points
cate_q_gam_univ_sex_bw <-
  sate(x_0 = x_0_birth_weight,
       x_t = trans_q_sex_bw$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_sex_bw$reg_0,
       mod_1 = reg_univ_sex_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_q_gam_univ_sex_bw_2 <- 
  sate(x_0 = trans_q_sex_bw_probs$x_0,
       x_t = trans_q_sex_bw_probs$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_sex_bw$reg_0,
       mod_1 = reg_univ_sex_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "sex"
x_m_name       <- "wtgain"
treatment_0    <- "Male"
treatment_1    <- "Female"

# Only at some points
cate_q_gam_univ_sex_wg <- 
  sate(x_0 = x_0_weight_gain,
       x_t = trans_N_sex_wg$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_sex_wg$reg_0,
       mod_1 = reg_univ_sex_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_q_gam_univ_sex_wg_2 <- 
  sate(x_0 = trans_q_sex_wg_probs$x_0,
       x_t = trans_q_sex_wg_probs$x_t_quantile,
       x_m_names = x_m_name,
       mod_0 = reg_univ_sex_wg$reg_0,
       mod_1 = reg_univ_sex_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

6.2 GAM, with Gaussian assumption for transport

Let us compute here the SCATE where the models \(\color{couleur1}{\widehat{m}_0(x)}\) and \(\color{couleur2}{\widehat{m}_1(x)}\) are GAM and where the transport method is made under the Gaussian assumption.

treatment_name <- "cig_rec"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

# Only at a few points
cate_N_gam_univ_smoker_bw <- 
  sate(x_0 = x_0_birth_weight,
       x_t = trans_N_smoker_bw$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_smoker_bw$reg_0,
       mod_1 = reg_univ_smoker_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_N_gam_univ_smoker_bw_2 <-
  sate(x_0 = x_0_probs_smoker_bw,
       x_t = trans_N_smoker_bw_probs$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_smoker_bw$reg_0,
       mod_1 = reg_univ_smoker_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "cig_rec"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

cate_N_gam_univ_smoker_wg <- 
  sate(x_0 = x_0_weight_gain,
       x_t = trans_N_smoker_wg$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_smoker_wg$reg_0,
       mod_1 = reg_univ_smoker_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_N_gam_univ_smoker_wg_2 <-
  sate(x_0 = x_0_probs_smoker_wg,
       x_t = trans_N_smoker_wg_probs$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_smoker_wg$reg_0,
       mod_1 = reg_univ_smoker_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "black_mother"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

cate_N_gam_univ_blackm_bw <- 
  sate(x_0 = x_0_birth_weight,
       x_t = trans_N_blackm_bw$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_blackm_bw$reg_0,
       mod_1 = reg_univ_blackm_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_N_gam_univ_blackm_bw_2 <-
  sate(x_0 = x_0_probs_blackm_bw,
       x_t = trans_N_blackm_bw_probs$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_blackm_bw$reg_0,
       mod_1 = reg_univ_blackm_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "black_mother"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

cate_N_gam_univ_blackm_wg <- 
  sate(x_0 = x_0_weight_gain,
       x_t = trans_N_blackm_wg$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_blackm_wg$reg_0,
       mod_1 = reg_univ_blackm_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_N_gam_univ_blackm_wg_2 <-
  sate(x_0 = x_0_probs_blackm_wg,
       x_t = trans_N_blackm_wg_probs$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_blackm_wg$reg_0,
       mod_1 = reg_univ_blackm_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "sex"
x_m_name       <- "birth_weight"
treatment_0    <- "Male"
treatment_1    <- "Female"

cate_N_gam_univ_sex_bw <- 
  sate(x_0 = x_0_birth_weight,
       x_t = trans_N_sex_bw$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_sex_bw$reg_0,
       mod_1 = reg_univ_sex_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_N_gam_univ_sex_bw_2 <-
  sate(x_0 = x_0_probs_sex_bw,
       x_t = trans_N_sex_bw_probs$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_sex_bw$reg_0,
       mod_1 = reg_univ_sex_bw$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

treatment_name <- "sex"
x_m_name       <- "wtgain"
treatment_0    <- "Male"
treatment_1    <- "Female"

cate_N_gam_univ_sex_wg <- 
  sate(x_0 = x_0_weight_gain,
       x_t = trans_N_sex_wg$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_sex_wg$reg_0,
       mod_1 = reg_univ_sex_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

# At more values
cate_N_gam_univ_sex_wg_2 <-
  sate(x_0 = x_0_probs_sex_wg,
       x_t = trans_N_sex_wg_probs$x_t_N,
       x_m_names = x_m_name,
       mod_0 = reg_univ_sex_wg$reg_0,
       mod_1 = reg_univ_sex_wg$reg_1,
       pred_mod_0 = model_spline_predict, pred_mod_1 = model_spline_predict)

6.3 Kernel, with Gaussian assumption for transport

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

cate_N_kernel_univ_smoker_bw <- 
  sate(x_0 = x_0_birth_weight,
       x_t = trans_N_smoker_bw$x_t_N,
       x_m_names = x_m_name,
       mod_0 = mod_0_k_smoker_bw,
       mod_1 = mod_1_k_smoker_bw,
       pred_mod_0 = pred_kernel, pred_mod_1 = pred_kernel)

target         <- "nonnatural_delivery"
treatment_name <- "cig_rec"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

cate_N_kernel_univ_smoker_wg <- 
  sate(x_0 = x_0_weight_gain,
       x_t = trans_N_smoker_wg$x_t_N,
       x_m_names = x_m_name,
       mod_0 = mod_0_k_smoker_wg,
       mod_1 = mod_1_k_smoker_wg,
       pred_mod_0 = pred_kernel, pred_mod_1 = pred_kernel)

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "birth_weight"
treatment_0    <- "No"
treatment_1    <- "Yes"

cate_N_kernel_univ_blackm_bw <- 
  sate(x_0 = x_0_birth_weight,
       x_t = trans_N_blackm_bw$x_t_N,
       x_m_names = x_m_name,
       mod_0 = mod_0_k_blackm_bw,
       mod_1 = mod_1_k_blackm_bw,
       pred_mod_0 = pred_kernel, pred_mod_1 = pred_kernel)

target         <- "nonnatural_delivery"
treatment_name <- "black_mother"
x_m_name       <- "wtgain"
treatment_0    <- "No"
treatment_1    <- "Yes"

cate_N_kernel_univ_blackm_wg <- 
  sate(x_0 = x_0_weight_gain,
       x_t = trans_N_blackm_wg$x_t_N,
       x_m_names = x_m_name,
       mod_0 = mod_0_k_blackm_wg,
       mod_1 = mod_1_k_blackm_wg,
       pred_mod_0 = pred_kernel, pred_mod_1 = pred_kernel)

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "birth_weight"
treatment_0    <- "Male"
treatment_1    <- "Female"

cate_N_kernel_univ_sex_bw <- 
  sate(x_0 = x_0_birth_weight,
       x_t = trans_N_sex_bw$x_t_N,
       x_m_names = x_m_name,
       mod_0 = mod_0_k_sex_bw,
       mod_1 = mod_1_k_sex_bw,
       pred_mod_0 = pred_kernel, pred_mod_1 = pred_kernel)

target         <- "nonnatural_delivery"
treatment_name <- "sex"
x_m_name       <- "wtgain"
treatment_0    <- "Male"
treatment_1    <- "Female"

cate_N_kernel_univ_sex_wg <- 
  sate(x_0 = x_0_weight_gain,
       x_t = trans_N_sex_wg$x_t_N,
       x_m_names = x_m_name,
       mod_0 = mod_0_k_sex_wg,
       mod_1 = mod_1_k_sex_wg,
       pred_mod_0 = pred_kernel, pred_mod_1 = pred_kernel)

7 Results

7.1 Sample Conditional Average Treatment Effect at some values of \(\boldsymbol{x}\)

Let us wrap up the results in a single output. To do so, we define a small function (only useful in this notebook).

print_table <- function(x_m_name, x_label, gam_q, gam_n, kernel_n, u){
  gam_q %>% 
    rename(x = !!x_m_name, x_t = !!paste0(x_m_name, "_t")) %>% 
    select(x, x_t_quantile = x_t, CATE, SCATE) %>% 
    left_join(
      gam_n %>% 
        rename(x = !!x_m_name, x_t_N = !!paste0(x_m_name, "_t")) %>% 
        select(x, x_t_N, SCATE_N_gam = SCATE)
    ) %>% 
    left_join(
      kernel_n %>% 
        rename(x = !!x_m_name) %>% 
        select(x, SCATE_N_kernel = SCATE)
    ) %>% 
    mutate(u = u) %>% 
    mutate(across(.cols = c(u, CATE,   SCATE, SCATE_N_gam, SCATE_N_kernel), .fns = ~scales::percent(., accuracy = .01))) %>% 
    mutate(across(.cols = c(x, x_t_quantile, x_t_N), .fns = ~scales::number(.))) %>% 
    select(x, u, CATE, x_t_quantile, SCATE, x_t_N, SCATE_N_gam, SCATE_N_kernel) %>% 
    rename("\\(\\text{CATE}_0(x)\\) (GAM)" = "CATE",
           "\\(\\widehat{\\mathcal{T}}(x)\\)" = "x_t_quantile",
           "\\(\\text{SCATE}(x)\\) (GAM)" = "SCATE",
           "\\(\\widehat{\\mathcal{T}}_{\\mathcal{N}}(x)\\)" = "x_t_N",
           "\\(\\text{SCATE}_{\\mathcal{N}}(x)\\) (GAM)" = "SCATE_N_gam",
           "\\(\\text{SCATE}_{\\mathcal{N}}(x)\\) (kernel)" = "SCATE_N_kernel") %>% 
    pivot_longer(cols = -x) %>% 
    pivot_wider(names_from = x, values_from = value) %>% 
    rename(!!sym(x_label) := "name") %>% 
    knitr::kable(format = "html", escape=FALSE) %>% 
    kableExtra::kable_classic(full_width = F, html_font = "Cambria")
}

Show the codes used to gather the results in a table

x_m_name <- "birth_weight" ; x_label <- "\\(x\\) (newborn's weight)"
print_table(x_m_name = x_m_name, x_label = x_label,
            gam_q = cate_q_gam_univ_smoker_bw, 
            gam_n = cate_N_gam_univ_smoker_bw, 
            kernel_n = cate_N_kernel_univ_smoker_bw,
            u = trans_q_smoker_bw$u)

Table 1. Estimation of the conditional average treatment (CATE), on the probability to have a non-natural birth (y), as a function of the newborn’s weight (x, in grams), when the treatment variable T is whether the mother is a smoker (T=1) or not (T=0).
\(x\) (newborn's weight)	2 000	2 500	3 000	3 500	4 000	4 500
u	2.88%	7.90%	26.15%	65.18%	92.06%	98.91%
\(\text{CATE}_0(x)\) (GAM)	-4.41%	-2.55%	-0.69%	0.79%	1.97%	2.50%
\(\widehat{\mathcal{T}}(x)\)	1 800	2 301	2 815	3 340	3 850	4 338
\(\text{SCATE}(x)\) (GAM)	-0.55%	0.78%	1.01%	0.49%	-0.77%	-4.36%
\(\widehat{\mathcal{T}}_{\mathcal{N}}(x)\)	1 786	2 295	2 805	3 314	3 824	4 333
\(\text{SCATE}_{\mathcal{N}}(x)\) (GAM)	-0.28%	0.88%	1.12%	0.50%	-1.15%	-4.53%
\(\text{SCATE}_{\mathcal{N}}(x)\) (kernel)	-0.80%	0.24%	1.72%	0.15%	-1.75%	-2.78%

Show the codes used to gather the results in a table

x_m_name <- "wtgain" ; x_label <- "\\(x\\) (weight gain of the mother)"
print_table(x_m_name = x_m_name, x_label = x_label, 
            gam_q = cate_q_gam_univ_smoker_wg, 
            gam_n = cate_N_gam_univ_smoker_wg, 
            kernel_n = cate_N_kernel_univ_smoker_wg,
            u = trans_q_smoker_wg$u)

Table 2. Estimation of the conditional average treatment (CATE), on the probability to have a non-natural birth (y), as a function of the mother’s weight gain (x, in lbs), when the treatment variable T is whether the mother is a smoker (T=1) or not (T=0).
\(x\) (weight gain of the mother)	5	15	25	35	45	55
u	4.87%	14.95%	37.78%	66.91%	86.19%	94.80%
\(\text{CATE}_0(x)\) (GAM)	0.06%	0.84%	0.69%	-0.08%	-1.26%	-2.59%
\(\widehat{\mathcal{T}}(x)\)	1	13	25	37	49	60
\(\text{SCATE}(x)\) (GAM)	1.60%	1.21%	0.69%	0.14%	-0.38%	-1.08%
\(\widehat{\mathcal{T}}_{\mathcal{N}}(x)\)	1	13	24	36	48	59
\(\text{SCATE}_{\mathcal{N}}(x)\) (GAM)	1.63%	1.29%	0.71%	0.01%	-0.71%	-1.33%
\(\text{SCATE}_{\mathcal{N}}(x)\) (kernel)	0.35%	0.32%	0.29%	0.27%	0.25%	0.25%

Show the codes used to gather the results in a table

x_m_name <- "birth_weight" ; x_label <- "\\(x\\) (newborn's weight)"
print_table(x_m_name = x_m_name, x_label = x_label, 
            gam_q = cate_q_gam_univ_blackm_bw, 
            gam_n = cate_N_gam_univ_blackm_bw, 
            kernel_n = cate_N_kernel_univ_blackm_bw,
            u = trans_q_blackm_bw$u)

Table 3. Estimation of the conditional average treatment (CATE), on the probability to have a non-natural birth (y), as a function of the mother’s weight gain (x, in lbs), when the treatment variable T is whether the mother is Black (T=1) or not (T=0).
\(x\) (newborn's weight)	2 000	2 500	3 000	3 500	4 000	4 500
u	2.88%	7.90%	26.15%	65.18%	92.06%	98.91%
\(\text{CATE}_0(x)\) (GAM)	-1.33%	0.37%	1.89%	4.09%	8.46%	15.18%
\(\widehat{\mathcal{T}}(x)\)	1 550	2 268	2 835	3 345	3 856	4 366
\(\text{SCATE}(x)\) (GAM)	6.65%	4.53%	3.41%	3.38%	4.64%	7.95%
\(\widehat{\mathcal{T}}_{\mathcal{N}}(x)\)	1 652	2 205	2 758	3 311	3 864	4 417
\(\text{SCATE}_{\mathcal{N}}(x)\) (GAM)	5.03%	5.73%	4.31%	3.33%	4.82%	10.63%
\(\text{SCATE}_{\mathcal{N}}(x)\) (kernel)	7.34%	7.07%	3.53%	3.62%	4.77%	7.70%

Show the codes used to gather the results in a table

x_m_name <- "wtgain" ; x_label <- "\\(x\\) (weight gain of the mother)"
print_table(x_m_name = x_m_name, x_label = x_label, 
            gam_q = cate_q_gam_univ_blackm_wg, 
            gam_n = cate_N_gam_univ_blackm_wg, 
            kernel_n = cate_N_kernel_univ_blackm_wg,
            u = trans_q_blackm_wg$u)

Table 4. Estimation of the conditional average treatment (CATE), on the probability to have a non-natural birth (y), as a function of the newborn’s weight (x, in grams), when the treatment variable T is whether the mother is Black (T=1) or not (T=0).
\(x\) (weight gain of the mother)	5	15	25	35	45	55
u	4.87%	14.95%	37.78%	66.91%	86.19%	94.80%
\(\text{CATE}_0(x)\) (GAM)	2.10%	3.72%	4.34%	4.24%	3.58%	2.52%
\(\widehat{\mathcal{T}}(x)\)	0	11	23	34	45	57
\(\text{SCATE}(x)\) (GAM)	3.73%	4.17%	4.26%	4.08%	3.66%	3.02%
\(\widehat{\mathcal{T}}_{\mathcal{N}}(x)\)	0	11	23	34	45	57
\(\text{SCATE}_{\mathcal{N}}(x)\) (GAM)	3.73%	4.17%	4.26%	4.08%	3.66%	3.02%
\(\text{SCATE}_{\mathcal{N}}(x)\) (kernel)	3.87%	3.88%	3.89%	3.90%	3.92%	3.94%

Show the codes used to gather the results in a table

x_m_name <- "birth_weight" ; x_label <- "\\(x\\) (newborn's weight)"
print_table(x_m_name = x_m_name, x_label = x_label, 
            gam_q = cate_q_gam_univ_sex_bw, 
            gam_n = cate_N_gam_univ_sex_bw, 
            kernel_n = cate_N_kernel_univ_sex_bw,
            u = trans_q_sex_bw$u)

Table 5. Estimation of the conditional average treatment (CATE), on the probability to have a non-natural birth (y), as a function of the mother’s weight gain (x, in lbs), when the treatment variable T is whether the newborn is a baby girl (T=1) or a baby boy (T=0).
\(x\) (newborn's weight)	2 000	2 500	3 000	3 500	4 000	4 500
u	2.88%	7.90%	26.15%	65.18%	92.06%	98.91%
\(\text{CATE}_0(x)\) (GAM)	-0.24%	-1.66%	-2.24%	-2.00%	-0.77%	2.38%
\(\widehat{\mathcal{T}}(x)\)	1 984	2 466	2 948	3 430	3 920	4 403
\(\text{SCATE}(x)\) (GAM)	0.14%	-0.97%	-1.66%	-2.08%	-2.34%	-2.32%
\(\widehat{\mathcal{T}}_{\mathcal{N}}(x)\)	1 947	2 424	2 901	3 377	3 854	4 331
\(\text{SCATE}_{\mathcal{N}}(x)\) (GAM)	1.02%	-0.08%	-1.07%	-2.05%	-3.41%	-5.43%
\(\text{SCATE}_{\mathcal{N}}(x)\) (kernel)	2.06%	-0.27%	-1.02%	-2.30%	-3.53%	-5.10%

Show the codes used to gather the results in a table

x_m_name <- "wtgain" ; x_label <- "\\(x\\) (weight gain of the mother)"
print_table(x_m_name = x_m_name, x_label = x_label, 
            gam_q = cate_q_gam_univ_sex_wg, 
            gam_n = cate_N_gam_univ_sex_wg, 
            kernel_n = cate_N_kernel_univ_sex_wg,
            u = trans_q_sex_wg$u)

Table 6. Estimation of the conditional average treatment (CATE), on the probability to have a non-natural birth (y), as a function of the newborn’s weight (x, in grams), when the treatment variable T is whether the newborn is a baby girl (T=1) or a baby boy (T=0).
\(x\) (weight gain of the mother)	5	15	25	35	45	55
u	4.87%	14.95%	37.78%	66.91%	86.19%	94.80%
\(\text{CATE}_0(x)\) (GAM)	-1.79%	-1.60%	-1.60%	-1.73%	-1.90%	-2.02%
\(\widehat{\mathcal{T}}(x)\)	4.4	14.3	24.3	34.2	44.1	54.0
\(\text{SCATE}(x)\) (GAM)	-1.52%	-1.47%	-1.61%	-1.87%	-2.17%	-2.41%
\(\widehat{\mathcal{T}}_{\mathcal{N}}(x)\)	4.4	14.3	24.3	34.2	44.1	54.0
\(\text{SCATE}_{\mathcal{N}}(x)\) (GAM)	-1.52%	-1.47%	-1.61%	-1.87%	-2.17%	-2.41%
\(\text{SCATE}_{\mathcal{N}}(x)\) (kernel)	-1.77%	-1.78%	-1.80%	-1.81%	-1.83%	-1.84%

7.2 Sample Conditional Average Treatment Effect at more values of \(\boldsymbol{x}\)

7.2.1 GAM, with Quantile-based transport

Now, we can visualize how the predicted probability of the response variable evolves at different values of the mediator variable, depending on the treatment variable and whether or not the mediator’s values are transported. Let us first define a function that can be applied depending on the treatment and on the mediator variable.

# Plot the evolution of the SCATE
plot_evolution <- function(cate, trans_q_x0, trans_q_xt, x_m_name,
                           title, xlab, ylab,
                           lab_0, lab_1, lab_1_t, base,
                           treatment_name, treatment_0, treatment_1,
                           legend_x, legend_y, xlim, ylim){
  
  x_0 <- pull(base, x_m_name)[pull(base, treatment_name) == treatment_0]
  x_1 <- pull(base, x_m_name)[pull(base, treatment_name) == treatment_1]
  
  # Estimated probability of the target, for non-treated
  plot(pull(cate, x_m_name),
       cate$y_0, type="l", 
       col = couleur1, main = "",
       ylab = title, xlab = "",
       axes = FALSE, ylim = ylim, xlim = xlim)
  axis(1)
  axis(2)
  axis(3, at = trans_q_xt, label = trans_q_x0)
  mtext(xlab,side=3,line=3)
  mtext(ylab,side=1,line=3)
  b <- .6
  # Empirical quantiles of mediator among non-treated
  a <- quantile(x_0, c(.05,(1:9)/10,.95))
  # Legend for the quantile probabilities
  L <- paste(c(5,(1:9)*10,95),"%",sep="")
  segments(a,b,a,b+1)
  i = c(1,2,3,5,7,9,10,11)
  text(a[i],b,labels = L[i],pos=1)
  # Estimated probability of y=1, for treated, without transport
  lines(pull(cate, x_m_name), cate$y_1, col = scales::alpha(couleur2,.2))
  # Estimated probability of y=1, for treated, with transport
  lines(pull(cate, x_m_name), cate$y_1_t, col = couleur2)
  legend(legend_x,legend_y, c(lab_0, lab_1, lab_1_t),
         lty = 1, col =c (couleur1,couleur2, scales::alpha(couleur2,.2)),bty="n")
}