Optimal Transport for Counterfactual Estimation: A Method for Causal Inference

Linked Birth/Infant Death Cohort Data: Descriptive Statistics (2013 cohort)

Authors

Arthur Charpentier

Emmanuel Flachaire

Ewen Gallic

1 Load Data

The CSV files for the 2013 cohort were downloaded from the NBER website, which hosts the Birth Cohort Linked Birth and Infant Death Data of the National Vital Statistics System (National Center for Health Statistics).

library(tidyverse)
linked_deaths <- read_csv("../data/linkco2013us_num.csv.zip")
linked_deaths
linked_births <- read_csv("../data/linkco2013us_den.csv.zip")
linked_births

Each row corresponds to a newborn. When the newborn died, a unique identifier (column idnumber) is provided in both tables. Many columns in the two files contain the same information, so we will only keep some of them.

1.1 Deaths

First, let us focus on the subsample of deaths, keeping only the following variables.

deaths <- 
  linked_deaths %>% 
  select(idnumber, d_restatus, hospd, weekdayd, dthyr, dthmon,
         dob_yy, dob_mm, dob_wk, aged)
Warning

Some variables specific to the deaths file contain only NA:

  • stoccfipd: “State of Occurrence (FIPS) - Death”,
  • cntocfipd: “County of Occurrence (FIPS) of Death”,
  • stresfipd: “State of Residence (FIPS) - Death”,
  • drcnty: “State of Residence Death Recode”,
  • cntyrfpd: “County of Residence (FIPS) - Death”,
  • cntrsppd: “Population Size of County of Residence of Death”.

We therefore do not keep these.
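Such columns can be spotted programmatically. The following is a minimal base R sketch on a toy data frame; `toy_deaths` and its values are hypothetical and only stand in for `linked_deaths`.

```r
# Toy data frame standing in for linked_deaths (hypothetical values).
toy_deaths <- data.frame(
  idnumber  = c(1, 2, 3),
  aged      = c(56, 12, 0),
  stoccfipd = c(NA, NA, NA),
  cntocfipd = c(NA, NA, NA)
)

# A column is entirely missing when it has no non-NA value.
all_na_cols <- names(toy_deaths)[colSums(!is.na(toy_deaths)) == 0]
all_na_cols
```

Applied to the real table, the same expression should list the variables enumerated above, assuming they were read as NA.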

Let us encode the values for categorical variables:

deaths <- 
  deaths %>% 
  mutate(
    d_restatus = factor(
      d_restatus, 
      levels = c(1:4), 
      labels = c("Residents", "Intrastate Nonresidents", 
                 "Interstate or Interterritory Nonresidents", "Foreign Residents")
    ),
    hospd = factor(
      hospd,
      levels = c(1:7, 9),
      labels = c("Hospital, clinic or Medical Center - Inpatient",
                 "Hospital, clinic or Medical Center - ER",
                 "Hospital, clinic or Medical Center - Dead on Arrival",
                 "Decedent's home",
                 "Hospice facility",
                 "Nursing home/long term care",
                 "Other",
                 "Unknown"
      )
    ),
    weekdayd = factor(
      weekdayd, 
      levels = c(1:7, 9),
      labels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Unknown")
    ),
    dthmon = factor(dthmon, levels = 1:12),
    dob_mm = factor(dob_mm, levels = 1:12),
    dob_wk = factor(
      dob_wk,
      levels = c(1:7, 9),
      labels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Unknown")
    )
  )

The description of those columns can be added to our tibble, using the set_variable_labels() function from {labelled}.

library(labelled)
deaths <- 
  deaths %>% 
  labelled::set_variable_labels(
    d_restatus = "Death Resident Status",
    hospd = "Place of Death and Decedent's Status",
    weekdayd = "Day of Week of Death",
    dthyr = "Year of Death",
    dthmon = "Month of Death",
    dob_yy = "Birth Year",
    dob_mm = "Birth Month",
    dob_wk = "Birth Weekday",
    aged = "Age at Death in Days"
  )
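To check that a label was attached, it can be read back. As far as we understand, {labelled} stores a variable label in the `label` attribute of the column, so base R `attr()` is enough for a quick look. The sketch below sets the attribute by hand on a hypothetical toy data frame, so that it runs without {labelled}.

```r
# Hypothetical toy data frame; the attribute is set by hand here,
# mimicking what set_variable_labels() is assumed to do.
toy <- data.frame(aged = c(56, 12, 0))
attr(toy$aged, "label") <- "Age at Death in Days"

# Read the label back.
aged_label <- attr(toy$aged, "label")
aged_label
```

With {labelled} loaded, `var_label(deaths$aged)` should return the same kind of string for the real tibble.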

Let us save this tibble, which contains only the information that we are interested in.

save(deaths, file = "../data/deaths.RData")

Here is an overview of the first rows:

deaths
# A tibble: 23,159 × 10
   idnumber d_restatus     hospd weekd…¹ dthyr dthmon dob_yy dob_mm dob_wk  aged
      <dbl> <fct>          <fct> <fct>   <dbl> <fct>   <dbl> <fct>  <fct>  <dbl>
 1        1 Residents      Hosp… Thursd…  2013 2        2013 1      Thurs…    56
 2        2 Intrastate No… Hosp… Friday   2013 1        2013 1      Sunday    12
 3        3 Residents      Hosp… Saturd…  2013 1        2013 1      Satur…     0
 4        4 Residents      Hosp… Sunday   2013 1        2013 1      Sunday     0
 5        5 Intrastate No… Unkn… Tuesday  2013 1        2013 1      Monday     1
 6        6 Residents      Hosp… Sunday   2013 1        2013 1      Sunday     0
 7        7 Residents      Hosp… Sunday   2013 1        2013 1      Monday     6
 8        8 Residents      Hosp… Friday   2013 1        2013 1      Wedne…     9
 9        9 Residents      Hosp… Monday   2013 7        2013 1      Wedne…   180
10       10 Residents      Hosp… Tuesday  2013 1        2013 1      Tuesd…     0
# … with 23,149 more rows, and abbreviated variable name ¹​weekdayd

1.2 Births

Now let us turn to the births dataset. As for the deaths dataset, let us recode the values and add some labels.

births <- 
  linked_births %>% 
  select(
    # Mother's characteristics
    mager41, mbrace, mracerec, mar, meduc,
    # Father's characteristics
    fagecomb, fbrace, fracerec,
    # Child's characteristics
    idnumber, bfacil, bfacil3, restatus, sex, dbwt,
    # Pregnancy
    lbo, tbo, precare_rec, uprevis, wtgain, cig_1, cig_2, cig_3, cig_rec,
    dlmp_mm, dlmp_yy, estgest, combgest,
    # Risk Factor
    rf_diab, rf_gest, rf_phyp, rf_ghyp, rf_eclam, rf_ppterm,
    rf_ppoutc, rf_cesar,
    # Obstetric Procedures
    op_cerv, op_tocol, op_ecvs, op_ecvf, uop_induc, uop_tocol,
    # Onset of Labor
    on_ruptr, on_abrup, on_prolg,
    # Charact. of Labor and Delivery
    ld_induct, ld_augment, ld_nvrtx, ld_steroids, ld_antibio, 
    ld_chorio, ld_mecon, ld_fintol, ld_anesth,
    # Complications of Labor and Delivery
    uld_meco, uld_precip, uld_breech,
    # Method of Delivery, Delivery
    md_present, md_route, md_trial, rdmeth_rec, dmeth_rec, 
    attend, apgar5, dplural,
    # Abnormal Conditions of the Newborn
    ab_vent, ab_vent6, ab_nicu, ab_surfac, ab_antibio, ab_seiz, ab_inj,
    # Congenital Anomalies of the Newborn
    ca_anen, ca_menin, ca_heart, ca_hernia, ca_ompha, ca_gastro, 
    ca_limb, ca_cleftlp, ca_cleft, ca_downs, ca_chrom, ca_hypos,
    # Death
    manner
  )

Let us recode the values, following the guide:

births <- 
  births %>% 
  mutate(
    mager41 = factor(
      mager41,
      levels = c(12:50),
      labels = c("10-12", 13:49, "50-64")
    ),
    bfacil = factor(
      bfacil,
      levels = c(1:7, 9),
      labels = c(
        "Hospital",
        "Freestanding Birth Center",
        "Home (intended)",
        "Home (not intended)",
        "Home (unknown if intended)",
        "Clinic / Doctor’s Office",
        "Other",
        "Unknown"
      )
    ),
    bfacil3 = factor(
      bfacil3,
      levels = c(1,2,3),
      labels = c("Hospital", "Not in Hospital", "Unknown or Not Stated")
    ),
    restatus = factor(
      restatus,
      levels = c(1:4), 
      labels = c("Residents", "Intrastate Nonresidents", 
                 "Interstate or Interterritory Nonresidents", "Foreign Residents")
    ),
    mbrace = factor(
      mbrace,
      levels = c(1:14, 21:24),
      labels = c(
        "White – single race",
        "Black – single race",
        "American Indian – single race",
        "Asian Indian – single race",
        "Chinese – single race",
        "Filipino – single race",
        "Japanese – single race",
        "Korean – single race",
        "Vietnamese – single race",
        "Other Asian – single race",
        "Hawaiian – single race",
        "Guamanian – single race",
        "Samoan – single race",
        "Other Pacific Islander – single race",
        "White – bridged multiple race",
        "Black – bridged multiple race",
        "American Indian & Alaskan Native – bridged multiple race",
        "Asian / Pacific Islander – bridged multiple race"
      )
    ),
    mracerec = factor(
      mracerec,
      levels = c(0, 1:4),
      labels = c(
        "Other (not classified as White or Black)",
        "White",
        "Black",
        "American Indian / Alaskan Native ",
        "Asian / Pacific Islander"
      )
    ),
    mar = factor(
      mar,
      levels = c(1,2, 3, 9),
      labels = c("Yes", "No", "Unmarried parents not living together",
                 "Unknown or not Stated")
    ),
    meduc = factor(
      meduc,
      levels = c(1:9),
      labels = c(
        "8th grade or less",
        "9th through 12th grade with no diploma",
        "High school graduate or GED completed",
        "Some college credit, but not a degree",
        "Associate degree (AA,AS)",
        "Bachelor’s degree (BA, AB, BS)",
        "Master’s degree (MA, MS, MEng, MEd, MSW, MBA)",
        "Doctorate (PhD, EdD) or Professional Degree (MD, DDS, DVM, LLB, JD)",
        "Unknown"
      )
    ),
    fagecomb = ifelse(fagecomb == 99, NA, fagecomb),
    fbrace = factor(
      fbrace,
      levels = c(1:14, 21:24, 99),
      labels = c(
        "White – single race",
        "Black – single race",
        "American Indian – single race",
        "Asian Indian – single race",
        "Chinese – single race",
        "Filipino – single race",
        "Japanese – single race",
        "Korean – single race",
        "Vietnamese – single race",
        "Other Asian – single race",
        "Hawaiian – single race",
        "Guamanian – single race",
        "Samoan – single race",
        "Other Pacific Islander – single race",
        "White – bridged multiple race",
        "Black – bridged multiple race",
        "American Indian & Alaskan Native – bridged multiple race",
        "Asian / Pacific Islander – bridged multiple race",
        "Unknown or not stated, also includes states not reporting multiple race"
      )
    ),
    fracerec = factor(
      fracerec,
      levels = c(0:4, 9),
      labels = c(
       "Other (not classified as White or Black)",
       "White",
       "Black",
       "American Indian / Alaskan Native",
       "Asian / Pacific Islander",
       "Unknown or not stated"
      )
    ),
    lbo = ifelse(lbo == 9, NA, lbo), # careful: 8: 8+
    tbo = ifelse(tbo == 9, NA, tbo), # careful: 8: 8+
    precare_rec = factor(
      precare_rec,
      levels = c(1:5),
      labels = c(
        "1st to 3rd month",
        "4th to 6th month",
        "7th to final month",
        "No prenatal care",
        "Unknown or not stated"
      )
    ),
    uprevis = ifelse(uprevis == 99, NA, uprevis),
    wtgain = ifelse(wtgain == 99, NA, wtgain), # careful: 98: 98+
    cig_1 = ifelse(cig_1 == 99, NA, cig_1), # careful: 98: 98+
    cig_2 = ifelse(cig_2 == 99, NA, cig_2), # careful: 98: 98+
    cig_3 = ifelse(cig_3 == 99, NA, cig_3), # careful: 98: 98+
    cig_rec = factor(
      cig_rec,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    rf_diab = factor(
      rf_diab,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    rf_gest = factor(
      rf_gest,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    rf_phyp = factor(
      rf_phyp,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    rf_ghyp = factor(
      rf_ghyp,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    rf_eclam = factor(
      rf_eclam,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    rf_ppterm = factor(
      rf_ppterm,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    rf_ppoutc = factor(
      rf_ppoutc,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    rf_cesar = factor(
      rf_cesar,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    op_cerv = factor(
      op_cerv,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    op_tocol = factor(
      op_tocol,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    op_ecvs = factor(
      op_ecvs,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    op_ecvf = factor(
      op_ecvf,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    uop_induc = factor(
      uop_induc,
      levels = c(1,2,8,9),
      labels = c("Yes", "No", "Not on certificate", "Unknown or not stated")
    ),
    uop_tocol = factor(
      uop_tocol,
      levels = c(1,2,8,9),
      labels = c("Yes", "No", "Not on certificate", "Unknown or not stated")
    ),
    on_ruptr = factor(
      on_ruptr,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    on_abrup = factor(
      on_abrup,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    on_prolg = factor(
      on_prolg,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_induct = factor(
      ld_induct,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_augment = factor(
      ld_augment,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_nvrtx = factor(
      ld_nvrtx,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_steroids = factor(
      ld_steroids,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_antibio = factor(
      ld_antibio,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_chorio = factor(
      ld_chorio,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_mecon = factor(
      ld_mecon,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_fintol = factor(
      ld_fintol,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    ld_anesth = factor(
      ld_anesth,
      levels = c("Y", "N", "U"),
      labels = c("Yes", "No", "Unknown or not stated")
    ),
    uld_meco = factor(
      uld_meco,
      levels = c(1,2,8,9),
      labels = c("Yes", "No", "Not on certificate", "Unknown or not stated")
    ),
    uld_precip = factor(
      uld_precip,
      levels = c(1,2,8,9),
      labels = c("Yes", "No", "Not on certificate", "Unknown or not stated")
    ),
    uld_breech = factor(
      uld_breech,
      levels = c(1,2,8,9),
      labels = c("Yes", "No", "Not on certificate", "Unknown or not stated")
    ),
    md_present = factor(
      md_present,
      levels = c(1,2,3,9),
      labels = c(
        "Cephalic",
        "Breech",
        "Other",
        "Unknown or not stated"
      )
    ),
    md_route = factor(
      md_route,
      levels = c(1,2,3,4,9),
      labels = c(
        "Spontaneous",
        "Forceps",
        "Vacuum",
        "Cesarean",
        "Unknown or not stated"
      )
    ),
    md_trial = factor(
      md_trial,
      levels = c("Y", "N", "X", "U"),
      labels = c(
        "Yes", "No",
        "Not applicable ", "Unknown or not stated"
      )
    ),
    rdmeth_rec = factor(
      rdmeth_rec,
      levels = c(1:6, 9),
      labels = c(
        "Vaginal",
        "Vaginal after previous c-section",
        "Primary C-section",
        "Repeat C-section",
        "Vaginal (unknown if previous c-section)",
        "C-section (unknown if previous c-section)",
        "Not stated"
      )
    ),
    dmeth_rec = factor(
      dmeth_rec,
      levels = c(1,2,9),
      labels = c("Vaginal", "C-Section", "Unknown")
    ),
    attend = factor(
      attend,
      levels = c(1:5, 9),
      labels = c(
        "Doctor of Medicine",
        "Doctor of Osteopathy",
        "Certified Nurse Midwife",
        "Other Midwife",
        "Other",
        "Unknown or not stated"
      )
    ),
    apgar5 = ifelse(apgar5 == 99, yes = NA, no = apgar5),
    dplural = factor(
      dplural,
      levels = c(1:5),
      labels = c(
        "Single",
        "Twin",
        "Triplet",
        "Quadruplet",
        "Quintuplet or higher"
      )
    ),
    sex = factor(
      sex,
      levels = c("M", "F"),
      labels = c("Male", "Female")
    ),
    dlmp_mm = ifelse(dlmp_mm == 99, NA, dlmp_mm),
    dlmp_yy = ifelse(dlmp_yy == 9999, NA, dlmp_yy),
    estgest = ifelse(estgest == 99, NA, estgest),
    combgest = ifelse(combgest == 99, NA, combgest),
    dbwt = ifelse(dbwt == 9999, NA, dbwt),
    ab_vent = factor(
      ab_vent,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, Complication reported",
        "No Complication reported",
        "Unknown or not stated"
      )
    ),
    ab_vent6 = factor(
      ab_vent6,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, Complication reported",
        "No Complication reported",
        "Unknown or not stated"
      )
    ),
    ab_nicu = factor(
      ab_nicu,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, Complication reported",
        "No Complication reported",
        "Unknown or not stated"
      )
    ),
    ab_surfac = factor(
      ab_surfac,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, Complication reported",
        "No Complication reported",
        "Unknown or not stated"
      )
    ),
    ab_antibio = factor(
      ab_antibio,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, Complication reported",
        "No Complication reported",
        "Unknown or not stated"
      )
    ),
    ab_seiz = factor(
      ab_seiz,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, Complication reported",
        "No Complication reported",
        "Unknown or not stated"
      )
    ),
    ab_inj = factor(
      ab_inj,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, Complication reported",
        "No Complication reported",
        "Unknown or not stated"
      )
    ),
    ca_anen = factor(
      ca_anen,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_menin = factor(
      ca_menin,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_heart = factor(
      ca_heart,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_hernia = factor(
      ca_hernia,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_ompha = factor(
      ca_ompha,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_gastro = factor(
      ca_gastro,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_limb = factor(
      ca_limb,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_cleftlp = factor(
      ca_cleftlp,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_cleft = factor(
      ca_cleft,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_downs = factor(
      ca_downs,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_chrom = factor(
      ca_chrom,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    ca_hypos = factor(
      ca_hypos,
      levels = c("Y", "N", "U"),
      labels = c(
        "Yes, anomaly reported",
        "No, anomaly not reported",
        "Unknown"
      )
    ),
    manner = factor(
      manner,
      levels = c(1:7),
      labels = c(
        "Accident",
        "Suicide",
        "Homicide",
        "Pending investigation",
        "Could not determine",
        "Self-inflicted",
        "Natural"
      )
    )
  )
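Most of the recoding above repeats two patterns: a Y/N/U flag turned into a factor, and a sentinel code (such as 99) turned into NA. A possible way to avoid copy-paste slips is to wrap each pattern in a small helper and apply it to a vector of column names. This is only a sketch in base R on toy data: the helper names `recode_ynu()` and `sentinel_to_na()` and the data frame `toy_births` are hypothetical, and the ab_* and ca_* variables would need their own labels.

```r
# Hypothetical helper: Y/N/U flag -> labelled factor.
recode_ynu <- function(x) {
  factor(x, levels = c("Y", "N", "U"),
         labels = c("Yes", "No", "Unknown or not stated"))
}

# Hypothetical helper: replace a sentinel code by NA, leave other values as is.
sentinel_to_na <- function(x, sentinel) {
  replace(x, which(x == sentinel), NA)
}

# Toy data standing in for the births table (hypothetical values).
toy_births <- data.frame(
  rf_diab = c("Y", "N", "U"),
  rf_gest = c("N", "N", "Y"),
  cig_1   = c(0, 99, 10),
  cig_2   = c(99, 5, 98)
)

ynu_cols <- c("rf_diab", "rf_gest")
cig_cols <- c("cig_1", "cig_2")

# Each column is recoded from itself, so no column can be overwritten by another.
toy_births[ynu_cols] <- lapply(toy_births[ynu_cols], recode_ynu)
toy_births[cig_cols] <- lapply(toy_births[cig_cols], sentinel_to_na, sentinel = 99)
toy_births
```

Inside a {dplyr} pipeline, the same idea could be written as `mutate(across(all_of(ynu_cols), recode_ynu))`.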

Then, let us add some labels to each column.

births <- 
  births %>% 
  labelled::set_variable_labels(
    # Child's characteristics
    bfacil = "Birth Place",
    bfacil3 = "Birth Place",
    restatus ="Resident Status",
    sex = "Sex of Infant",
    dbwt = "Birth Weight (in Grams)",
    # Death of the newborn
    manner = "Manner of Death",
    # Mother's characteristics
    mager41 = "Mother's Age",
    mbrace = "Mother's Bridged Race",
    mracerec = "Mother's Race",
    mar ="Mother's Marital Status",
    meduc = "Mother's Education",
    # Father's characteristics
    fagecomb = "Father's Combined Age",
    fbrace = "Father's Bridged Race",
    fracerec = "Father's Race",
    # Pregnancy
    lbo = "Live Birth Order",
    tbo = "Total Birth Order",
    precare_rec = "Month Prenatal Care Began",
    uprevis = "Number of Prenatal Visits",
    wtgain = "Weight Gain",
    cig_1 = "Cigarettes 1st Trimester",
    cig_2 = "Cigarettes 2nd Trimester",
    cig_3 = "Cigarettes 3rd Trimester",
    cig_rec = "Smokes cigarettes",
    dlmp_mm = "Last Normal Menses: Month",
    dlmp_yy = "Last Normal Menses: Year",
    estgest = "Obstetric/Clinical Gestation Est.",
    combgest = "Gestation – Detail in Weeks",
    # Risk factor
    rf_diab = "Risk Factor: Prepregnancy Diabetes",
    rf_gest = "Risk Factor: Gestational Diabetes",
    rf_phyp = "Risk Factor: Prepregnancy Hypertension",
    rf_ghyp = "Risk Factor: Gestational Hypertension",
    rf_eclam = "Risk Factor: Hypertension Eclampsia",
    rf_ppterm = "Risk Factor: Previous Preterm Birth",
    rf_ppoutc = "Risk Factor: Poor Pregnancy Outcome",
    rf_cesar = "Risk Factor: Previous Cesarean Deliveries",
    # Obstetric procedures
    op_cerv =  "Obstetric Procedures: Cervical Cerclage",
    op_tocol = "Obstetric Procedures: Tocolysis",
    op_ecvs = "Obstetric Procedures: Successful External Cephalic",
    op_ecvf = "Obstetric Procedures: Failed External Cephalic",
    uop_induc = "Obstetric Procedures: Induction of labor",
    uop_tocol = "Obstetric Procedures: Tocolysis",
    on_ruptr = "Onset of Labor: Premature Rupture of Membrane",
    on_abrup = "Onset of Labor: Abruptio placenta",
    on_prolg = "Onset of Labor: Prolonged Labor",
    # Labor
    ld_induct = "Charact. of Labor and Delivery: Induction of Labor",
    ld_augment = "Charact. of Labor and Delivery: Augmentation of Labor",
    ld_nvrtx = "Charact. of Labor and Delivery: Non-Vertex Presentation",
    ld_steroids = "Charact. of Labor and Delivery: Steroids",
    ld_antibio = "Charact. of Labor and Delivery: Antibiotics",
    ld_chorio = "Charact. of Labor and Delivery: Chorioamnionitis",
    ld_mecon = "Charact. of Labor and Delivery: Meconium Staining",
    ld_fintol = "Charact. of Labor and Delivery: Fetal Intolerance",
    ld_anesth = "Charact. of Labor and Delivery: Anesthesia",
    uld_meco = "Complications of Labor and Delivery: Meconium",
    uld_precip = "Complications of Labor and Delivery: Precipitous labor",
    uld_breech = "Complications of Labor and Delivery: Breech",
    md_present = "Method of Delivery: Fetal Presentation",
    md_route = "Method of Delivery: Final Route and Method of Delivery",
    md_trial = "Method of Delivery: Trial of Labor Attempted",
    rdmeth_rec = "Delivery Method",
    dmeth_rec = "Delivery Method",
    attend = "Attendant",
    apgar5 = "Five Minute Apgar Score",
    dplural = "Plurality",
    # Abnormal Conditions and anomalies of the newborn
    ab_vent = "Abnormal Conditions of the Newborn: Assisted Ventilation",
    ab_vent6 = "Abnormal Conditions of the Newborn: Assisted Ventilation >6hrs",
    ab_nicu = "Abnormal Conditions of the Newborn: Admission to NICU",
    ab_surfac = "Abnormal Conditions of the Newborn: Surfactant",
    ab_antibio = "Abnormal Conditions of the Newborn: Antibiotics",
    ab_seiz = "Abnormal Conditions of the Newborn: Seizures",
    ab_inj = "Abnormal Conditions of the Newborn: Birth Injury",
    #
    ca_anen = "Congenital Anomalies of the Newborn: Anencephaly",
    ca_menin = "Congenital Anomalies of the Newborn: Meningomyelocele/Spina Bifida",
    ca_heart = "Congenital Anomalies of the Newborn: Cyanotic Congenital Heart Disease",
    ca_hernia = "Congenital Anomalies of the Newborn: Congenital Diaphragmatic Hernia",
    ca_ompha = "Congenital Anomalies of the Newborn: Omphalocele",
    ca_gastro = "Congenital Anomalies of the Newborn: Gastroschisis",
    ca_limb = "Congenital Anomalies of the Newborn: Limb Reduction Deficit",
    ca_cleftlp = "Congenital Anomalies of the Newborn: Cleft Lip w/ or w/o Cleft Palate",
    ca_cleft = "Congenital Anomalies of the Newborn: Cleft Palate Alone",
    ca_downs = "Congenital Anomalies of the Newborn: Downs Syndrome",
    ca_chrom = "Congenital Anomalies of the Newborn: Suspected Chromosomal Disorder",
    ca_hypos = "Congenital Anomalies of the Newborn: Hypospadias"
  )

And let us save the data:

save(births, file = "../data/births.RData")
Warning

The father’s education is missing from the data.

Here is an extract from the first rows of the births data:

births
# A tibble: 3,940,764 × 84
   mager41 mbrace      mrace…¹ mar   meduc fagec…² fbrace frace…³ idnum…⁴ bfacil
   <fct>   <fct>       <fct>   <fct> <fct>   <dbl> <fct>  <fct>     <dbl> <fct> 
 1 22      White – si… "White" Yes   High…      23 Unkno… Unknow…      NA Hospi…
 2 25      White – br… "White" Yes   Some…      27 White… White        NA Hospi…
 3 25      White – si… "White" Yes   Bach…      25 White… White        NA Hospi…
 4 30      White – si… "White" Yes   Bach…      31 White… White        NA Hospi…
 5 23      American I… "Ameri… No    9th …      30 Ameri… Americ…      NA Hospi…
 6 32      White – br… "White" No    High…      32 Ameri… Americ…      NA Hospi…
 7 23      American I… "Ameri… No    Some…      26 Ameri… Americ…      NA Hospi…
 8 21      White – si… "White" Yes   Some…      21 White… White        NA Hospi…
 9 31      White – si… "White" Yes   Bach…      35 White… White        NA Frees…
10 25      White – si… "White" Yes   Some…      32 White… White        NA Hospi…
# … with 3,940,754 more rows, 74 more variables: bfacil3 <fct>, restatus <fct>,
#   sex <fct>, dbwt <dbl>, lbo <dbl>, tbo <dbl>, precare_rec <fct>,
#   uprevis <dbl>, wtgain <dbl>, cig_1 <dbl>, cig_2 <dbl>, cig_3 <dbl>,
#   cig_rec <fct>, dlmp_mm <dbl>, dlmp_yy <dbl>, estgest <dbl>, combgest <dbl>,
#   rf_diab <fct>, rf_gest <fct>, rf_phyp <fct>, rf_ghyp <fct>, rf_eclam <fct>,
#   rf_ppterm <fct>, rf_ppoutc <fct>, rf_cesar <fct>, op_cerv <fct>,
#   op_tocol <fct>, op_ecvs <fct>, op_ecvf <fct>, uop_induc <fct>, …

2 Descriptive statistics

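Let us merge the two tables on idnumber, keeping every birth. The columns coming from the deaths table are NA for newborns with no matching death record.

df <- 
  births %>% 
  left_join(deaths, by = "idnumber")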
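The effect of this join can be sketched on toy data. The example below uses base R `merge()` with `all.x = TRUE` as a stand-in for `left_join()`; the data frames `toy_births` and `toy_deaths` and the indicator `died` are hypothetical and not part of the original analysis.

```r
# Toy births: idnumber is NA for newborns who survived (hypothetical values).
toy_births <- data.frame(
  idnumber = c(NA, 1, NA, 2),
  dbwt     = c(3300, 900, 3500, 2400)
)
# Toy deaths: one row per infant death.
toy_deaths <- data.frame(
  idnumber = c(1, 2),
  aged     = c(0, 56)
)

# Base R counterpart of a left join on idnumber: every birth is kept.
toy_df <- merge(toy_births, toy_deaths, by = "idnumber", all.x = TRUE)

# Hypothetical indicator: a birth is linked to a death when 'aged' is not NA.
toy_df$died <- !is.na(toy_df$aged)
table(toy_df$died)
```

Note that `merge()` may reorder the rows, which `left_join()` does not.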

We can quickly produce some descriptive statistics from those data, using the tbl_summary() function from {gtsummary}.

library(gtsummary)
get_table_desc_stat <- function(df, variables){
  df %>% 
  select(!!variables) %>% 
  tbl_summary(
    # by = ,
    type = all_continuous() ~ "continuous2",
    statistic = list(
      all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
      all_categorical() ~ "{n} ({p}%)"),
    digits = list(
      all_continuous() ~ 2,
      all_categorical() ~ 0
    ),
    missing_text = "Missing value"
  ) %>% 
  modify_header(label ~ "**Variable**") %>% 
  add_stat_label(
    label = list(
      all_continuous() ~ c("Mean (Std)", "Median (IQR)"),
      all_categorical() ~ "n (%)"
    )
  )
}
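When reading the resulting tables, it helps to know that the percentages appear to be computed over non-missing values only, the number of missing values being reported on a separate row. For instance, for the birth place, 3,510,218 / (3,940,764 − 376,347) ≈ 98%. The base R sketch below reproduces this logic on a hypothetical toy factor; it does not call {gtsummary}.

```r
# Hypothetical toy factor with two missing values.
x <- factor(c("Hospital", "Hospital", "Hospital", "Home", NA, NA))

n_missing <- sum(is.na(x))
counts    <- table(x)                        # table() drops NA by default
percents  <- round(100 * counts / sum(counts))

counts
percents
n_missing
```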

2.1 Raw data

childrens_characteristics <- c("sex", "dbwt", "bfacil", "bfacil3", "restatus")

get_table_desc_stat(df, childrens_characteristics)
Variable N = 3,940,764
Sex of Infant, n (%)
    Male 2,017,374 (51%)
    Female 1,923,390 (49%)
Birth Weight (in Grams)
    Mean (Std) 3,270.78 (593.63)
    Median (IQR) 3,316.00 (2,977.00, 3,630.00)
    Missing value 646
Birth Place, n (%)
    Hospital 3,510,218 (98%)
    Freestanding Birth Center 17,531 (0%)
    Home (intended) 25,499 (1%)
    Home (not intended) 3,693 (0%)
    Home (unknown if intended) 4,552 (0%)
    Clinic / Doctor’s Office 355 (0%)
    Other 2,456 (0%)
    Unknown 113 (0%)
    Missing value 376,347
Birth Place, n (%)
    Hospital 3,883,310 (99%)
    Not in Hospital 57,331 (1%)
    Unknown or Not Stated 123 (0%)
Resident Status, n (%)
    Residents 2,847,673 (72%)
    Intrastate Nonresidents 999,320 (25%)
    Interstate or Interterritory Nonresidents 85,188 (2%)
    Foreign Residents 8,583 (0%)
death_variables <- c("d_restatus", "hospd", "weekdayd", "dthyr", "dthmon",
                      "dob_yy", "dob_mm", "dob_wk", "aged", "manner")

get_table_desc_stat(df, death_variables)
Variable N = 3,940,764
Death Resident Status, n (%)
    Residents 14,430 (62%)
    Intrastate Nonresidents 7,508 (32%)
    Interstate or Interterritory Nonresidents 1,176 (5%)
    Foreign Residents 45 (0%)
    Missing value 3,917,605
Place of Death and Decedent's Status, n (%)
    Hospital, clinic or Medical Center - Inpatient 17,943 (77%)
    Hospital, clinic or Medical Center - ER 2,907 (13%)
    Hospital, clinic or Medical Center - Dead on Arrival 283 (1%)
    Decedent's home 1,586 (7%)
    Hospice facility 54 (0%)
    Nursing home/long term care 24 (0%)
    Other 323 (1%)
    Unknown 39 (0%)
    Missing value 3,917,605
Day of Week of Death, n (%)
    Sunday 3,223 (14%)
    Monday 3,120 (13%)
    Tuesday 3,249 (14%)
    Wednesday 3,469 (15%)
    Thursday 3,376 (15%)
    Friday 3,391 (15%)
    Saturday 3,331 (14%)
    Unknown 0 (0%)
    Missing value 3,917,605
Year of Death, n (%)
    2013 20,487 (88%)
    2014 2,672 (12%)
    Missing value 3,917,605
Month of Death, n (%)
    1 1,897 (8%)
    2 1,799 (8%)
    3 1,973 (9%)
    4 1,920 (8%)
    5 2,014 (9%)
    6 1,977 (9%)
    7 1,930 (8%)
    8 1,937 (8%)
    9 1,946 (8%)
    10 2,019 (9%)
    11 1,816 (8%)
    12 1,931 (8%)
    Missing value 3,917,605
Birth Year, n (%)
    2013 23,159 (100%)
    Missing value 3,917,605
Birth Month, n (%)
    1 1,951 (8%)
    2 1,740 (8%)
    3 1,897 (8%)
    4 1,853 (8%)
    5 2,030 (9%)
    6 1,971 (9%)
    7 1,991 (9%)
    8 2,010 (9%)
    9 1,995 (9%)
    10 2,032 (9%)
    11 1,795 (8%)
    12 1,894 (8%)
    Missing value 3,917,605
Birth Weekday, n (%)
    Sunday 2,733 (12%)
    Monday 3,316 (14%)
    Tuesday 3,667 (16%)
    Wednesday 3,578 (15%)
    Thursday 3,511 (15%)
    Friday 3,474 (15%)
    Saturday 2,880 (12%)
    Unknown 0 (0%)
    Missing value 3,917,605
Age at Death in Days
    Mean (Std) 41.56 (72.56)
    Median (IQR) 3.00 (0.00, 52.00)
    Missing value 3,917,605
Manner of Death, n (%)
    Accident 1,205 (6%)
    Suicide 0 (0%)
    Homicide 261 (1%)
    Pending investigation 301 (2%)
    Could not determine 2,017 (10%)
    Self-inflicted 0 (0%)
    Natural 15,622 (81%)
    Missing value 3,921,358
mother_variables <- c("mager41", "mbrace", "mracerec", "mar", "meduc")
get_table_desc_stat(df, mother_variables)
Variable N = 3,940,764
Mother's Age, n (%)
    10-12 71 (0%)
    13 459 (0%)
    14 2,569 (0%)
    15 9,439 (0%)
    16 22,596 (1%)
    17 42,956 (1%)
    18 76,261 (2%)
    19 122,346 (3%)
    20 151,182 (4%)
    21 167,417 (4%)
    22 185,362 (5%)
    23 194,103 (5%)
    24 200,099 (5%)
    25 208,091 (5%)
    26 215,347 (5%)
    27 227,545 (6%)
    28 234,861 (6%)
    29 237,526 (6%)
    30 237,377 (6%)
    31 231,394 (6%)
    32 212,709 (5%)
    33 191,239 (5%)
    34 166,761 (4%)
    35 143,151 (4%)
    36 118,161 (3%)
    37 93,346 (2%)
    38 73,923 (2%)
    39 56,507 (1%)
    40 41,758 (1%)
    41 29,721 (1%)
    42 19,821 (1%)
    43 11,936 (0%)
    44 6,502 (0%)
    45 3,701 (0%)
    46 1,887 (0%)
    47 977 (0%)
    48 587 (0%)
    49 387 (0%)
    50-64 689 (0%)
Mother's Bridged Race, n (%)
    White – single race 2,689,736 (75%)
    Black – single race 556,437 (15%)
    American Indian – single race 33,908 (1%)
    Asian Indian – single race 56,790 (2%)
    Chinese – single race 49,453 (1%)
    Filipino – single race 32,191 (1%)
    Japanese – single race 7,108 (0%)
    Korean – single race 14,910 (0%)
    Vietnamese – single race 19,687 (1%)
    Other Asian – single race 41,740 (1%)
    Hawaiian – single race 1,136 (0%)
    Guamanian – single race 1,282 (0%)
    Samoan – single race 2,285 (0%)
    Other Pacific Islander – single race 6,022 (0%)
    White – bridged multiple race 38,302 (1%)
    Black – bridged multiple race 24,363 (1%)
    American Indian & Alaskan Native – bridged multiple race 5,243 (0%)
    Asian / Pacific Islander – bridged multiple race 14,306 (0%)
    Missing value 345,865
Mother's Race, n (%)
    Other (not classified as White or Black) 0 (0%)
    White 2,993,686 (76%)
    Black 635,120 (16%)
    American Indian / Alaskan Native 46,011 (1%)
    Asian / Pacific Islander 265,947 (7%)
Mother's Marital Status, n (%)
    Yes 2,342,660 (59%)
    No 1,598,104 (41%)
    Unmarried parents not living together 0 (0%)
    Unknown or not Stated 0 (0%)
Mother's Education, n (%)
    8th grade or less 136,701 (4%)
    9th through 12th grade with no diploma 421,293 (12%)
    High school graduate or GED completed 879,956 (25%)
    Some college credit, but not a degree 753,056 (21%)
    Associate degree (AA,AS) 280,660 (8%)
    Bachelor’s degree (BA, AB, BS) 669,170 (19%)
    Master’s degree (MA, MS, MEng, MEd, MSW, MBA) 297,054 (8%)
    Doctorate (PhD, EdD) or Professional Degree (MD, DDS, DVM, LLB, JD) 84,707 (2%)
    Unknown 41,820 (1%)
    Missing value 376,347
Show the R codes
father_variables <- c("fagecomb", "fbrace", "fracerec")
get_table_desc_stat(df, father_variables)
Variable N = 3,940,764
Father's Combined Age
    Mean (Std) 31.09 (6.89)
    Median (IQR) 31.00 (26.00, 35.00)
    Missing value 826,291
Father's Bridged Race, n (%)
    White – single race 2,210,002 (61%)
    Black – single race 418,722 (12%)
    American Indian – single race 22,558 (1%)
    Asian Indian – single race 55,700 (2%)
    Chinese – single race 42,055 (1%)
    Filipino – single race 21,498 (1%)
    Japanese – single race 4,108 (0%)
    Korean – single race 11,221 (0%)
    Vietnamese – single race 15,867 (0%)
    Other Asian – single race 33,597 (1%)
    Hawaiian – single race 941 (0%)
    Guamanian – single race 983 (0%)
    Samoan – single race 2,325 (0%)
    Other Pacific Islander – single race 4,458 (0%)
    White – bridged multiple race 28,772 (1%)
    Black – bridged multiple race 14,438 (0%)
    American Indian & Alaskan Native – bridged multiple race 7,617 (0%)
    Asian / Pacific Islander – bridged multiple race 12,986 (0%)
    Unknown or not stated, also includes states not reporting multiple race 687,051 (19%)
    Missing value 345,865
Father's Race, n (%)
    Other (not classified as White or Black) 0 (0%)
    White 2,466,993 (63%)
    Black 476,535 (12%)
    American Indian / Alaskan Native 35,143 (1%)
    Asian / Pacific Islander 222,529 (6%)
    Unknown or not stated 739,564 (19%)
Show the R codes
pregnancy_variables <- c("lbo", "tbo", "precare_rec", "uprevis", "wtgain",
                         "cig_1", "cig_2", "cig_3", "cig_rec", 
                         "dlmp_mm", "dlmp_yy", "estgest", "combgest")
get_table_desc_stat(df, pregnancy_variables)
Variable N = 3,940,764
Live Birth Order, n (%)
    1 1,550,114 (40%)
    2 1,246,847 (32%)
    3 654,946 (17%)
    4 276,936 (7%)
    5 108,168 (3%)
    6 44,188 (1%)
    7 20,301 (1%)
    8 20,732 (1%)
    Missing value 18,532
Total Birth Order, n (%)
    1 1,272,200 (33%)
    2 1,108,655 (28%)
    3 714,320 (18%)
    4 392,389 (10%)
    5 202,062 (5%)
    6 99,603 (3%)
    7 51,311 (1%)
    8 61,099 (2%)
    Missing value 39,125
Month Prenatal Care Began, n (%)
    1st to 3rd month 2,539,077 (71%)
    4th to 6th month 672,306 (19%)
    7th to final month 158,902 (4%)
    No prenatal care 51,392 (1%)
    Unknown or not stated 142,740 (4%)
    Missing value 376,347
Number of Prenatal Visits
    Mean (Std) 11.27 (4.04)
    Median (IQR) 12.00 (9.00, 13.00)
    Missing value 118,625
Weight Gain
    Mean (Std) 30.33 (14.92)
    Median (IQR) 30.00 (20.00, 39.00)
    Missing value 190,653
Cigarettes 1st Trimester
    Mean (Std) 0.88 (3.79)
    Median (IQR) 0.00 (0.00, 0.00)
    Missing value 549,170
Cigarettes 2nd Trimester
    Mean (Std) 0.88 (3.79)
    Median (IQR) 0.00 (0.00, 0.00)
    Missing value 550,029
Cigarettes 3rd Trimester
    Mean (Std) 0.88 (3.78)
    Median (IQR) 0.00 (0.00, 0.00)
    Missing value 550,272
Smokes cigarettes, n (%)
    Yes 287,250 (8%)
    No 3,104,885 (87%)
    Unknown or not stated 172,282 (5%)
    Missing value 376,347
Last Normal Menses: Month
    Mean (Std) 6.60 (3.49)
    Median (IQR) 7.00 (4.00, 10.00)
    Missing value 195,745
Last Normal Menses: Year, n (%)
    2011 1,455 (0%)
    2012 2,810,941 (75%)
    2013 940,736 (25%)
    Missing value 187,632
Obstetric/Clinical Gestation Est.
    Mean (Std) 38.51 (2.14)
    Median (IQR) 39.00 (38.00, 40.00)
    Missing value 7,483
Gestation – Detail in Weeks
    Mean (Std) 38.65 (2.50)
    Median (IQR) 39.00 (38.00, 40.00)
    Missing value 3,695
Show the R codes
risk_factor_variables <- c("rf_diab", "rf_gest", "rf_phyp", "rf_ghyp", 
                           "rf_eclam", "rf_ppterm", "rf_ppoutc", "rf_cesar")
get_table_desc_stat(df, risk_factor_variables)
Variable N = 3,940,764
Risk Factor: Prepregnancy Diabetes, n (%)
    Yes 26,896 (1%)
    No 3,529,191 (99%)
    Unknown or not stated 8,330 (0%)
    Missing value 376,347
Risk Factor: Gestational Diabetes, n (%)
    Yes 187,064 (5%)
    No 3,369,023 (95%)
    Unknown or not stated 8,330 (0%)
    Missing value 376,347
Risk Factor: Prepregnancy Hypertension, n (%)
    Yes 54,198 (2%)
    No 3,501,889 (98%)
    Unknown or not stated 8,330 (0%)
    Missing value 376,347
Risk Factor: Gestational Hypertension, n (%)
    Yes 172,919 (5%)
    No 3,383,168 (95%)
    Unknown or not stated 8,330 (0%)
    Missing value 376,347
Risk Factor: Hypertension Eclampsia, n (%)
    Yes 8,148 (0%)
    No 3,547,939 (100%)
    Unknown or not stated 8,330 (0%)
    Missing value 376,347
Risk Factor: Previous Preterm Birth, n (%)
    Yes 92,331 (3%)
    No 3,463,756 (97%)
    Unknown or not stated 8,330 (0%)
    Missing value 376,347
Risk Factor: Poor Pregnancy Outcome, n (%)
    Yes 78,310 (2%)
    No 3,477,777 (98%)
    Unknown or not stated 8,330 (0%)
    Missing value 376,347
Risk Factor: Previous Cesarean Deliveries, n (%)
    Yes 523,128 (15%)
    No 3,032,959 (85%)
    Unknown or not stated 8,330 (0%)
    Missing value 376,347
Show the R codes
obstetric_procedures_variables <- c("op_cerv", "op_tocol", "op_ecvs", "op_ecvf",
                                    "uop_induc", "uop_tocol", "on_ruptr", 
                                    "on_abrup", "on_prolg")
get_table_desc_stat(df, obstetric_procedures_variables)
Variable N = 3,940,764
Obstetric Procedures: Cervical Cerclage, n (%)
    Yes 10,900 (0%)
    No 3,542,733 (99%)
    Unknown or not stated 10,784 (0%)
    Missing value 376,347
Obstetric Procedures: Tocolysis, n (%)
    Yes 33,727 (1%)
    No 3,519,906 (99%)
    Unknown or not stated 10,784 (0%)
    Missing value 376,347
Obstetric Procedures: Successful External Cephalic, n (%)
    Yes 4,790 (0%)
    No 3,548,843 (100%)
    Unknown or not stated 10,784 (0%)
    Missing value 376,347
Obstetric Procedures: Failed External Cephalic, n (%)
    Yes 4,070 (0%)
    No 3,549,563 (100%)
    Unknown or not stated 10,784 (0%)
    Missing value 376,347
Obstetric Procedures: Induction of labor, n (%)
    Yes 904,437 (23%)
    No 3,031,183 (77%)
    Not on certificate 0 (0%)
    Unknown or not stated 5,144 (0%)
Obstetric Procedures: Tocolysis, n (%)
    Yes 37,355 (1%)
    No 3,892,312 (99%)
    Not on certificate 0 (0%)
    Unknown or not stated 11,097 (0%)
Onset of Labor: Premature Rupture of Membrane, n (%)
    Yes 123,588 (3%)
    No 3,429,618 (96%)
    Unknown or not stated 11,211 (0%)
    Missing value 376,347
Onset of Labor: Abruptio placenta, n (%)
    Yes 110,247 (3%)
    No 3,442,959 (97%)
    Unknown or not stated 11,211 (0%)
    Missing value 376,347
Onset of Labor: Prolonged Labor, n (%)
    Yes 47,198 (1%)
    No 3,506,008 (98%)
    Unknown or not stated 11,211 (0%)
    Missing value 376,347
Show the R codes
labor_variables <- 
  c("ld_induct", "ld_augment", "ld_nvrtx", "ld_steroids", "ld_antibio", 
    "ld_chorio", "ld_mecon", "ld_fintol", "ld_anesth", "uld_meco", 
    "uld_precip", "uld_breech", "md_present", "md_route", "md_trial", 
    "rdmeth_rec", "dmeth_rec", "attend", "apgar5", "dplural")
get_table_desc_stat(df, labor_variables)
Variable N = 3,940,764
Charact. of Labor and Delivery: Induction of Labor, n (%)
    Yes 825,322 (23%)
    No 2,734,264 (77%)
    Unknown or not stated 4,831 (0%)
    Missing value 376,347
Charact. of Labor and Delivery: Augmentation of Labor, n (%)
    Yes 716,538 (20%)
    No 2,843,048 (80%)
    Unknown or not stated 4,831 (0%)
    Missing value 376,347
Charact. of Labor and Delivery: Non-Vertex Presentation, n (%)
    Yes 0 (NA%)
    No 0 (NA%)
    Unknown or not stated 0 (NA%)
    Missing value 3,940,764
Charact. of Labor and Delivery: Steroids, n (%)
    Yes 52,190 (1%)
    No 3,507,396 (98%)
    Unknown or not stated 4,831 (0%)
    Missing value 376,347
Charact. of Labor and Delivery: Antibiotics, n (%)
    Yes 775,645 (22%)
    No 2,783,941 (78%)
    Unknown or not stated 4,831 (0%)
    Missing value 376,347
Charact. of Labor and Delivery: Chorioamnionitis, n (%)
    Yes 46,183 (1%)
    No 3,513,403 (99%)
    Unknown or not stated 4,831 (0%)
    Missing value 376,347
Charact. of Labor and Delivery: Meconium Staining, n (%)
    Yes 184,992 (5%)
    No 3,374,594 (95%)
    Unknown or not stated 4,831 (0%)
    Missing value 376,347
Charact. of Labor and Delivery: Fetal Intolerance, n (%)
    Yes 160,446 (5%)
    No 3,399,140 (95%)
    Unknown or not stated 4,831 (0%)
    Missing value 376,347
Charact. of Labor and Delivery: Anesthesia, n (%)
    Yes 2,548,929 (72%)
    No 1,010,657 (28%)
    Unknown or not stated 4,831 (0%)
    Missing value 376,347
Complications of Labor and Delivery: Meconium, n (%)
    Yes 204,711 (5%)
    No 3,730,794 (95%)
    Not on certificate 0 (0%)
    Unknown or not stated 5,259 (0%)
Complications of Labor and Delivery: Precipitous labor, n (%)
    Yes 119,166 (3%)
    No 3,809,959 (97%)
    Not on certificate 0 (0%)
    Unknown or not stated 11,639 (0%)
Complications of Labor and Delivery: Breech, n (%)
    Yes 221,560 (6%)
    No 3,615,311 (92%)
    Not on certificate 0 (0%)
    Unknown or not stated 103,893 (3%)
Method of Delivery: Fetal Presentation, n (%)
    Cephalic 3,253,574 (91%)
    Breech 131,733 (4%)
    Other 75,645 (2%)
    Unknown or not stated 103,465 (3%)
    Missing value 376,347
Method of Delivery: Final Route and Method of Delivery, n (%)
    Spontaneous 2,281,509 (64%)
    Forceps 21,220 (1%)
    Vacuum 97,440 (3%)
    Cesarean 1,161,670 (33%)
    Unknown or not stated 2,578 (0%)
    Missing value 376,347
Method of Delivery: Trial of Labor Attempted, n (%)
    Yes 301,959 (8%)
    No 838,556 (24%)
    Not applicable 2,400,169 (67%)
    Unknown or not stated 23,733 (1%)
    Missing value 376,347
Delivery Method, n (%)
    Vaginal 2,339,254 (66%)
    Vaginal after previous c-section 55,320 (2%)
    Primary C-section 692,015 (19%)
    Repeat C-section 467,728 (13%)
    Vaginal (unknown if previous c-section) 5,595 (0%)
    C-section (unknown if previous c-section) 1,927 (0%)
    Not stated 2,578 (0%)
    Missing value 376,347
Delivery Method, n (%)
    Vaginal 2,648,618 (67%)
    C-Section 1,287,178 (33%)
    Unknown 4,968 (0%)
Attendant, n (%)
    Doctor of Medicine 3,316,572 (84%)
    Doctor of Osteopathy 244,057 (6%)
    Certified Nurse Midwife 321,236 (8%)
    Other Midwife 29,119 (1%)
    Other 27,377 (1%)
    Unknown or not stated 2,403 (0%)
Five Minute Apgar Score
    Mean (Std) 8.79 (0.83)
    Median (IQR) 9.00 (9.00, 9.00)
    Missing value 17,008
Plurality, n (%)
    Single 3,803,447 (97%)
    Twin 132,589 (3%)
    Triplet 4,392 (0%)
    Quadruplet 270 (0%)
    Quintuplet or higher 66 (0%)
Show the R codes
conditions_newborn_variable <- 
  c("ab_vent", "ab_vent6", "ab_nicu", "ab_surfac", "ab_antibio", "ab_seiz", 
    "ab_inj", "ca_anen", "ca_menin", "ca_heart", "ca_hernia", "ca_ompha", 
    "ca_gastro", "ca_limb", "ca_cleftlp", "ca_cleft", "ca_downs", 
    "ca_chrom", "ca_hypos")
get_table_desc_stat(df, conditions_newborn_variable)
Variable N = 3,940,764
Abnormal Conditions of the Newborn: Assisted Ventilation, n (%)
    Yes, Complication reported 118,090 (3%)
    No Complication reported 3,436,435 (96%)
    Unknown or not stated 9,892 (0%)
    Missing value 376,347
Abnormal Conditions of the Newborn: Assisted Ventilation >6hrs, n (%)
    Yes, Complication reported 37,317 (1%)
    No Complication reported 3,517,208 (99%)
    Unknown or not stated 9,892 (0%)
    Missing value 376,347
Abnormal Conditions of the Newborn: Admission to NICU, n (%)
    Yes, Complication reported 283,207 (8%)
    No Complication reported 3,271,318 (92%)
    Unknown or not stated 9,892 (0%)
    Missing value 376,347
Abnormal Conditions of the Newborn: Surfactant, n (%)
    Yes, Complication reported 14,339 (0%)
    No Complication reported 3,540,186 (99%)
    Unknown or not stated 9,892 (0%)
    Missing value 376,347
Abnormal Conditions of the Newborn: Antibiotics, n (%)
    Yes, Complication reported 2,071 (0%)
    No Complication reported 3,552,454 (100%)
    Unknown or not stated 9,892 (0%)
    Missing value 376,347
Abnormal Conditions of the Newborn: Seizures, n (%)
    Yes, Complication reported 0 (NA%)
    No Complication reported 0 (NA%)
    Unknown or not stated 0 (NA%)
    Missing value 3,940,764
Abnormal Conditions of the Newborn: Birth Injury, n (%)
    Yes, Complication reported 0 (NA%)
    No Complication reported 0 (NA%)
    Unknown or not stated 0 (NA%)
    Missing value 3,940,764
Congenital Anomalies of the Newborn: Anencephaly, n (%)
    Yes, anomaly reported 379 (0%)
    No, anomaly not reported 3,550,054 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Meningomyelocele/Spina Bifida, n (%)
    Yes, anomaly reported 575 (0%)
    No, anomaly not reported 3,549,858 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Cyanotic Congenital Heart Disease, n (%)
    Yes, anomaly reported 3,047 (0%)
    No, anomaly not reported 3,547,386 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Congenital Diaphragmatic Hernia, n (%)
    Yes, anomaly reported 463 (0%)
    No, anomaly not reported 3,549,970 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Omphalocele, n (%)
    Yes, anomaly reported 376 (0%)
    No, anomaly not reported 3,550,057 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Gastroschisis, n (%)
    Yes, anomaly reported 993 (0%)
    No, anomaly not reported 3,549,440 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Limb Reduction Deficit, n (%)
    Yes, anomaly reported 471 (0%)
    No, anomaly not reported 3,549,962 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Cleft Lip w/ or w/o Cleft Palate, n (%)
    Yes, anomaly reported 1,834 (0%)
    No, anomaly not reported 3,548,599 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Cleft Palate Alone, n (%)
    Yes, anomaly reported 847 (0%)
    No, anomaly not reported 3,549,586 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347
Congenital Anomalies of the Newborn: Downs Syndrome, n (%)
    Yes, anomaly reported 0 (0%)
    No, anomaly not reported 3,548,582 (100%)
    Unknown 13,984 (0%)
    Missing value 378,198
Congenital Anomalies of the Newborn: Suspected Chromosomal Disorder, n (%)
    Yes, anomaly reported 0 (0%)
    No, anomaly not reported 3,549,154 (100%)
    Unknown 13,984 (0%)
    Missing value 377,626
Congenital Anomalies of the Newborn: Hypospadias, n (%)
    Yes, anomaly reported 2,104 (0%)
    No, anomaly not reported 3,548,329 (100%)
    Unknown 13,984 (0%)
    Missing value 376,347

2.2 Recoding some variables

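Let us recode some variables, based on what the descriptive statistics tables show. Some variables are discarded because they duplicate information available elsewhere. For example, the race of the mother is provided through two variables: mbrace and mracerec. The former gives a much finer breakdown, but some of its categories contain too few observations. Variables that contain only missing values (ld_nvrtx, ab_seiz, ab_inj) are dropped as well. We also keep only the rows where bfacil3 is not missing and, when a manner of death is recorded, only those for which it is "Natural" or "Could not determine".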

df <- 
  df %>% 
  select(
    -bfacil, -hospd, -mbrace, -fbrace, -tbo, -ld_nvrtx, -rdmeth_rec, 
    -ab_seiz, -ab_inj) %>% 
  filter(!is.na(bfacil3)) %>% 
  filter(is.na(manner) | manner %in% c("Could not determine", "Natural")) %>% 
  mutate(
    d_restatus = fct_recode(
      d_restatus, 
      "Other" = "Interstate or Interterritory Nonresidents", 
      "Other" = "Foreign Residents"
    ),
    precare_rec = replace_na(precare_rec, "Unknown or not stated"),
    cig_rec = replace_na(cig_rec, "Unknown or not stated"),
    op_cerv = replace_na(op_cerv, "Unknown or not stated"),
    op_tocol = replace_na(op_tocol, "Unknown or not stated"),
    op_ecvs = replace_na(op_ecvs, "Unknown or not stated"),
    op_ecvf = replace_na(op_ecvf, "Unknown or not stated"),
    uop_tocol = replace_na(uop_tocol, "Unknown or not stated"),
    on_ruptr = replace_na(on_ruptr, "Unknown or not stated"),
    on_abrup = replace_na(on_abrup, "Unknown or not stated"),
    on_prolg = replace_na(on_prolg, "Unknown or not stated"),
    ld_induct = replace_na(ld_induct, "Unknown or not stated"),
    ld_augment = replace_na(ld_augment, "Unknown or not stated"),
    ld_steroids = replace_na(ld_steroids, "Unknown or not stated"),
    ld_antibio = replace_na(ld_antibio, "Unknown or not stated"),
    ld_chorio = replace_na(ld_chorio, "Unknown or not stated"),
    ld_mecon = replace_na(ld_mecon, "Unknown or not stated"),
    ld_fintol = replace_na(ld_fintol, "Unknown or not stated"),
    ld_anesth = replace_na(ld_anesth, "Unknown or not stated"),
    uld_meco = replace_na(uld_meco, "Unknown or not stated"),
    uld_precip = replace_na(uld_precip, "Unknown or not stated"),
    uld_breech = replace_na(uld_breech, "Unknown or not stated"),
    md_present = replace_na(md_present, "Unknown or not stated"),
    md_route = replace_na(md_route, "Unknown or not stated"),
    md_trial = replace_na(md_trial, "Unknown or not stated"),
    dmeth_rec = replace_na(dmeth_rec, "Unknown"),
    attend = replace_na(attend, "Unknown or not stated"),
    dplural = fct_recode(
      dplural,
      "Quadruplet or higher" = "Quadruplet",
      "Quadruplet or higher" = "Quintuplet or higher"
    ),
    ab_vent = replace_na(ab_vent, "Unknown or not stated"),
    ab_vent6 = replace_na(ab_vent6, "Unknown or not stated"),
    ab_nicu = replace_na(ab_nicu, "Unknown or not stated"),
    ab_surfac = replace_na(ab_surfac, "Unknown or not stated"),
    ab_antibio = replace_na(ab_antibio, "Unknown or not stated"),
    ca_anen = replace_na(ca_anen, "Unknown"),
    ca_menin = replace_na(ca_menin, "Unknown"),
    ca_heart = replace_na(ca_heart, "Unknown"),
    ca_hernia = replace_na(ca_hernia, "Unknown"),
    ca_ompha = replace_na(ca_ompha, "Unknown"),
    ca_gastro = replace_na(ca_gastro, "Unknown"),
    ca_limb = replace_na(ca_limb, "Unknown"),
    ca_cleftlp = replace_na(ca_cleftlp, "Unknown"),
    ca_cleft = replace_na(ca_cleft, "Unknown"),
    ca_downs = replace_na(ca_downs, "Unknown"),
    ca_chrom = replace_na(ca_chrom, "Unknown"),
    ca_hypos = replace_na(ca_hypos, "Unknown")
  )
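The long list of replace_na() calls above follows two patterns: most variables receive "Unknown or not stated", and the ca_* variables receive "Unknown". As a sketch only, the same idea can be written with dplyr::across(). The toy data and the two name vectors below are hypothetical stand-ins; the columns are plain character vectors here, whereas the original columns are factors whose levels already contain the replacement label.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for df (hypothetical values)
toy <- tibble(
  op_cerv   = c("Yes", NA, "No"),
  ld_induct = c(NA, "No", "Yes"),
  ca_anen   = c("No", NA, NA)
)

# Hypothetical groupings of the variables by replacement label
vars_unknown_or_not_stated <- c("op_cerv", "ld_induct")
vars_unknown <- c("ca_anen")

toy_recoded <- toy %>%
  mutate(
    # Same replacement applied to every column named in the vector
    across(all_of(vars_unknown_or_not_stated),
           ~ replace_na(.x, "Unknown or not stated")),
    across(all_of(vars_unknown),
           ~ replace_na(.x, "Unknown"))
  )

toy_recoded
```

Listing the variables explicitly, as the original code does, is longer but makes each recoding visible; the across() form mainly reduces the risk of a copy-paste slip.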

Let us also create three dummy variables to state whether the mother smoked during each trimester of her pregnancy.

df <- 
  df %>% 
  mutate(
    # The labels produced here must match the factor levels declared below:
    # a label that is not one of the levels would silently become NA.
    cig_1_d = case_when(
      cig_1 == 0 ~ "No",
      cig_1 > 0 ~ "Yes",
      is.na(cig_1) ~ "Unknown or not stated"
    ),
    cig_2_d = case_when(
      cig_2 == 0 ~ "No",
      cig_2 > 0 ~ "Yes",
      is.na(cig_2) ~ "Unknown or not stated"
    ),
    cig_3_d = case_when(
      cig_3 == 0 ~ "No",
      cig_3 > 0 ~ "Yes",
      is.na(cig_3) ~ "Unknown or not stated"
    ),
    cig_1_d = factor(cig_1_d, levels = c("Yes", "No", "Unknown or not stated")),
    cig_2_d = factor(cig_2_d, levels = c("Yes", "No", "Unknown or not stated")),
    cig_3_d = factor(cig_3_d, levels = c("Yes", "No", "Unknown or not stated"))
  )
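When an indicator is built with case_when() and then converted with factor(), it is worth checking that every label produced is one of the declared levels, since factor() turns any other value into NA without warning. A small check on hypothetical cigarette counts (the vector cig is made up for illustration) could look like this:

```r
library(dplyr)

cig <- c(0, 5, NA, 20, 0)  # hypothetical daily cigarette counts

smoked <- case_when(
  cig == 0 ~ "No",
  cig > 0 ~ "Yes",
  is.na(cig) ~ "Unknown or not stated"
)

lv <- c("Yes", "No", "Unknown or not stated")
smoked_f <- factor(smoked, levels = lv)

# Every label is a declared level, so factor() introduces no NA here
stopifnot(all(smoked %in% lv))
table(smoked_f, useNA = "ifany")

# By contrast, a label outside the declared levels is silently turned into NA
factor(c("True", "No"), levels = lv)
```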

Let us add some labels for these three dummies.

df <- 
  df %>% 
  labelled::set_variable_labels(
    cig_1_d = "Smoked cigarettes 1st Trimester",
    cig_2_d = "Smoked cigarettes 2nd Trimester",
    cig_3_d = "Smoked cigarettes 3rd Trimester"
  )
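To illustrate what this step does, here is a minimal sketch on a toy data frame (the data are hypothetical). The label is stored as an attribute of the column, which is what summary tables built with gtsummary display in place of the raw column name.

```r
library(labelled)

# Toy data frame with one of the dummies (hypothetical values)
toy <- data.frame(cig_1_d = factor(c("Yes", "No")))

toy <- set_variable_labels(
  toy,
  cig_1_d = "Smoked cigarettes 1st Trimester"
)

# The label can be read back with var_label()
var_label(toy$cig_1_d)
```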

And then, let us save the data:

save(df, file = "../data/df_2013.rda")
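Note that save() stores the object under its own name, so load() later restores an object called df into the current environment rather than returning it. A short sketch of this round trip, using a temporary file and a toy object instead of the real data:

```r
toy <- data.frame(x = 1:3)

# Temporary path used for illustration only
path <- tempfile(fileext = ".rda")
save(toy, file = path)

# Remove the object, then restore it from disk
rm(toy)
restored <- load(path)  # load() returns the names of the restored objects

restored
toy
```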

2.3 Grouping according to smoking

Let us provide the same kind of descriptive statistics as above, this time splitting the data according to whether the mother smokes (variable cig_rec).

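# Summary table of `variables`, split by the mother's smoking status (cig_rec),
# with a whole-sample column and p-values for between-group differences.
get_table_desc_stat_smoke <- function(df, variables){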
  df %>% 
    select(!!variables, cig_rec) %>% 
    tbl_summary(
      by = cig_rec,
      type = all_continuous() ~ "continuous2",
      statistic = list(
        all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
        all_categorical() ~ "{n} ({p}%)"),
      digits = list(
        all_continuous() ~ 2,
        all_categorical() ~ 0
      ),
      missing_text = "Missing value"
    ) %>% 
    add_p() %>% 
    add_overall(col_label = "Whole sample") %>% 
    modify_header(label ~ "**Variable**") %>% 
    modify_spanning_header(c("stat_1", "stat_2") ~ "**Mother is a smoker**") %>% 
    add_stat_label(
      label = list(
        all_continuous() ~ c("Mean (Std)", "Median (IQR)"),
        all_categorical() ~ "n (%)"
      )
    )
}
Show the R codes
childrens_characteristics <- c("sex", "dbwt", "bfacil3", "restatus")

get_table_desc_stat_smoke(df, childrens_characteristics)
Variable Whole sample Mother is a smoker: Yes, N = 286,824 Mother is a smoker: No, N = 3,103,859 Unknown or not stated, N = 548,314 p-value¹
Sex of Infant, n (%) 0.4
    Male 2,016,365 (51%) 147,124 (51%) 1,588,849 (51%) 280,392 (51%)
    Female 1,922,632 (49%) 139,700 (49%) 1,515,010 (49%) 267,922 (49%)
Birth Weight (in Grams) <0.001
    Mean (Std) 3,270.92 (593.50) 3,096.68 (600.91) 3,288.66 (588.60) 3,261.68 (602.18)
    Median (IQR) 3,316.00 (2,977.00, 3,630.00) 3,146.00 (2,790.00, 3,469.00) 3,325.00 (2,997.00, 3,657.00) 3,310.00 (2,965.00, 3,629.00)
    Missing value 642 22 508 112
Birth Place, n (%) <0.001
    Hospital 3,881,580 (99%) 285,364 (99%) 3,053,119 (98%) 543,097 (99%)
    Not in Hospital 57,295 (1%) 1,450 (1%) 50,684 (2%) 5,161 (1%)
    Unknown or Not Stated 122 (0%) 10 (0%) 56 (0%) 56 (0%)
Resident Status, n (%) <0.001
    Residents 2,846,402 (72%) 193,302 (67%) 2,261,285 (73%) 391,815 (71%)
    Intrastate Nonresidents 998,869 (25%) 86,150 (30%) 770,291 (25%) 142,428 (26%)
    Interstate or Interterritory Nonresidents 85,144 (2%) 7,327 (3%) 65,016 (2%) 12,801 (2%)
    Foreign Residents 8,582 (0%) 45 (0%) 7,267 (0%) 1,270 (0%)
¹ Pearson's Chi-squared test; Kruskal-Wallis rank sum test
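As a rough check of how the p-values for categorical variables arise, the Pearson chi-squared test can be re-run in base R on the counts printed in the "Sex of Infant" rows. This is only a sketch from the published counts, not the call made internally by add_p().

```r
# Counts copied from the "Sex of Infant" rows of the table above
sex_by_smoking <- matrix(
  c(147124, 1588849, 280392,
    139700, 1515010, 267922),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex = c("Male", "Female"),
    smoker = c("Yes", "No", "Unknown or not stated")
  )
)

# Pearson chi-squared test of independence between sex and smoking group
res <- chisq.test(sex_by_smoking)
res$statistic
res$parameter  # degrees of freedom: (2 - 1) * (3 - 1) = 2
res$p.value
```

The resulting p-value should come out at roughly 0.4, in line with the value reported in the table.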
Show the R codes
death_variables <- c("d_restatus", "weekdayd", "dthyr", "dthmon",
                      "dob_yy", "dob_mm", "dob_wk", "aged", "manner")

get_table_desc_stat_smoke(df, death_variables)
Variable Whole sample Mother is a smoker: Yes, N = 286,824 Mother is a smoker: No, N = 3,103,859 Unknown or not stated, N = 548,314 p-value¹
Death Resident Status, n (%) 0.010
    Residents 13,044 (61%) 1,478 (60%) 9,524 (61%) 2,042 (61%)
    Intrastate Nonresidents 7,204 (34%) 880 (36%) 5,234 (34%) 1,090 (33%)
    Other 1,144 (5%) 112 (5%) 822 (5%) 210 (6%)
    Missing value 3,917,605 284,354 3,088,279 544,972
Day of Week of Death, n (%)
    Sunday 2,992 (14%) 372 (15%) 2,132 (14%) 488 (15%)
    Monday 2,876 (13%) 326 (13%) 2,086 (13%) 464 (14%)
    Tuesday 3,025 (14%) 345 (14%) 2,227 (14%) 453 (14%)
    Wednesday 3,232 (15%) 346 (14%) 2,422 (16%) 464 (14%)
    Thursday 3,122 (15%) 378 (15%) 2,255 (14%) 489 (15%)
    Friday 3,105 (15%) 334 (14%) 2,275 (15%) 496 (15%)
    Saturday 3,040 (14%) 369 (15%) 2,183 (14%) 488 (15%)
    Unknown 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Missing value 3,917,605 284,354 3,088,279 544,972
Year of Death, n (%) <0.001
    2013 19,246 (90%) 2,127 (86%) 14,143 (91%) 2,976 (89%)
    2014 2,146 (10%) 343 (14%) 1,437 (9%) 366 (11%)
    Missing value 3,917,605 284,354 3,088,279 544,972
Month of Death, n (%) 0.5
    1 1,762 (8%) 220 (9%) 1,261 (8%) 281 (8%)
    2 1,670 (8%) 196 (8%) 1,225 (8%) 249 (7%)
    3 1,832 (9%) 217 (9%) 1,320 (8%) 295 (9%)
    4 1,772 (8%) 196 (8%) 1,289 (8%) 287 (9%)
    5 1,867 (9%) 213 (9%) 1,366 (9%) 288 (9%)
    6 1,827 (9%) 210 (9%) 1,317 (8%) 300 (9%)
    7 1,798 (8%) 185 (7%) 1,314 (8%) 299 (9%)
    8 1,781 (8%) 182 (7%) 1,327 (9%) 272 (8%)
    9 1,808 (8%) 229 (9%) 1,308 (8%) 271 (8%)
    10 1,858 (9%) 206 (8%) 1,388 (9%) 264 (8%)
    11 1,638 (8%) 190 (8%) 1,182 (8%) 266 (8%)
    12 1,779 (8%) 226 (9%) 1,283 (8%) 270 (8%)
    Missing value 3,917,605 284,354 3,088,279 544,972
Birth Year, n (%)
    2013 21,392 (100%) 2,470 (100%) 15,580 (100%) 3,342 (100%)
    Missing value 3,917,605 284,354 3,088,279 544,972
Birth Month, n (%) >0.9
    1 1,796 (8%) 217 (9%) 1,290 (8%) 289 (9%)
    2 1,602 (7%) 178 (7%) 1,207 (8%) 217 (6%)
    3 1,752 (8%) 199 (8%) 1,281 (8%) 272 (8%)
    4 1,727 (8%) 192 (8%) 1,264 (8%) 271 (8%)
    5 1,890 (9%) 222 (9%) 1,370 (9%) 298 (9%)
    6 1,831 (9%) 202 (8%) 1,333 (9%) 296 (9%)
    7 1,830 (9%) 203 (8%) 1,321 (8%) 306 (9%)
    8 1,848 (9%) 207 (8%) 1,363 (9%) 278 (8%)
    9 1,819 (9%) 218 (9%) 1,321 (8%) 280 (8%)
    10 1,888 (9%) 216 (9%) 1,380 (9%) 292 (9%)
    11 1,661 (8%) 205 (8%) 1,185 (8%) 271 (8%)
    12 1,748 (8%) 211 (9%) 1,265 (8%) 272 (8%)
    Missing value 3,917,605 284,354 3,088,279 544,972
Birth Weekday, n (%)
    Sunday 2,555 (12%) 295 (12%) 1,861 (12%) 399 (12%)
    Monday 3,062 (14%) 343 (14%) 2,246 (14%) 473 (14%)
    Tuesday 3,363 (16%) 427 (17%) 2,428 (16%) 508 (15%)
    Wednesday 3,292 (15%) 366 (15%) 2,392 (15%) 534 (16%)
    Thursday 3,224 (15%) 345 (14%) 2,372 (15%) 507 (15%)
    Friday 3,198 (15%) 381 (15%) 2,331 (15%) 486 (15%)
    Saturday 2,698 (13%) 313 (13%) 1,950 (13%) 435 (13%)
    Unknown 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Missing value 3,917,605 284,354 3,088,279 544,972
Age at Death in Days <0.001
    Mean (Std) 35.66 (67.92) 47.95 (72.18) 33.50 (66.79) 36.65 (68.92)
    Median (IQR) 2.00 (0.00, 37.00) 8.00 (0.00, 72.75) 1.00 (0.00, 30.00) 2.00 (0.00, 38.00)
    Missing value 3,917,605 284,354 3,088,279 544,972
Manner of Death, n (%)
    Accident 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Suicide 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Homicide 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Pending investigation 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Could not determine 2,017 (11%) 521 (23%) 1,113 (9%) 383 (13%)
    Self-inflicted 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Natural 15,622 (89%) 1,712 (77%) 11,392 (91%) 2,518 (87%)
    Missing value 3,921,358 284,591 3,091,354 545,413
¹ Pearson's Chi-squared test; Kruskal-Wallis rank sum test
Show the R codes
mother_variables <- c("mager41", "mracerec", "mar", "meduc")
get_table_desc_stat_smoke(df, mother_variables)
Variable Whole sample Mother is a smoker: Yes, N = 286,824 Mother is a smoker: No, N = 3,103,859 Unknown or not stated, N = 548,314 p-value¹
Mother's Age, n (%) <0.001
    10-12 71 (0%) 0 (0%) 61 (0%) 10 (0%)
    13 458 (0%) 9 (0%) 386 (0%) 63 (0%)
    14 2,567 (0%) 60 (0%) 2,188 (0%) 319 (0%)
    15 9,426 (0%) 331 (0%) 7,866 (0%) 1,229 (0%)
    16 22,581 (1%) 1,113 (0%) 18,384 (1%) 3,084 (1%)
    17 42,923 (1%) 3,006 (1%) 33,793 (1%) 6,124 (1%)
    18 76,187 (2%) 7,277 (3%) 58,088 (2%) 10,822 (2%)
    19 122,239 (3%) 13,187 (5%) 91,432 (3%) 17,620 (3%)
    20 151,061 (4%) 17,499 (6%) 111,820 (4%) 21,742 (4%)
    21 167,280 (4%) 19,788 (7%) 123,190 (4%) 24,302 (4%)
    22 185,224 (5%) 21,780 (8%) 136,737 (4%) 26,707 (5%)
    23 193,979 (5%) 22,191 (8%) 143,773 (5%) 28,015 (5%)
    24 199,972 (5%) 21,055 (7%) 150,686 (5%) 28,231 (5%)
    25 207,972 (5%) 20,028 (7%) 158,782 (5%) 29,162 (5%)
    26 215,237 (5%) 18,665 (7%) 166,403 (5%) 30,169 (6%)
    27 227,443 (6%) 17,354 (6%) 178,969 (6%) 31,120 (6%)
    28 234,771 (6%) 16,158 (6%) 186,336 (6%) 32,277 (6%)
    29 237,448 (6%) 14,178 (5%) 190,606 (6%) 32,664 (6%)
    30 237,305 (6%) 13,161 (5%) 191,389 (6%) 32,755 (6%)
    31 231,337 (6%) 11,654 (4%) 187,748 (6%) 31,935 (6%)
    32 212,653 (5%) 10,262 (4%) 173,427 (6%) 28,964 (5%)
    33 191,198 (5%) 8,662 (3%) 156,639 (5%) 25,897 (5%)
    34 166,722 (4%) 7,094 (2%) 136,557 (4%) 23,071 (4%)
    35 143,123 (4%) 5,638 (2%) 117,884 (4%) 19,601 (4%)
    36 118,137 (3%) 4,502 (2%) 97,566 (3%) 16,069 (3%)
    37 93,337 (2%) 3,495 (1%) 76,973 (2%) 12,869 (2%)
    38 73,908 (2%) 2,683 (1%) 61,194 (2%) 10,031 (2%)
    39 56,495 (1%) 1,979 (1%) 46,844 (2%) 7,672 (1%)
    40 41,750 (1%) 1,506 (1%) 34,675 (1%) 5,569 (1%)
    41 29,715 (1%) 1,086 (0%) 24,623 (1%) 4,006 (1%)
    42 19,816 (1%) 685 (0%) 16,491 (1%) 2,640 (0%)
    43 11,933 (0%) 403 (0%) 9,934 (0%) 1,596 (0%)
    44 6,501 (0%) 189 (0%) 5,467 (0%) 845 (0%)
    45 3,701 (0%) 93 (0%) 3,092 (0%) 516 (0%)
    46 1,887 (0%) 27 (0%) 1,575 (0%) 285 (0%)
    47 977 (0%) 15 (0%) 832 (0%) 130 (0%)
    48 587 (0%) 6 (0%) 502 (0%) 79 (0%)
    49 387 (0%) 0 (0%) 336 (0%) 51 (0%)
    50-64 689 (0%) 5 (0%) 611 (0%) 73 (0%)
Mother's Race, n (%)
    Other (not classified as White or Black) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    White 2,992,542 (76%) 239,932 (84%) 2,348,230 (76%) 404,380 (74%)
    Black 634,570 (16%) 37,526 (13%) 501,090 (16%) 95,954 (17%)
    American Indian / Alaskan Native 45,974 (1%) 6,734 (2%) 30,805 (1%) 8,435 (2%)
    Asian / Pacific Islander 265,911 (7%) 2,632 (1%) 223,734 (7%) 39,545 (7%)
Mother's Marital Status, n (%)
    Yes 2,342,150 (59%) 83,748 (29%) 1,940,699 (63%) 317,703 (58%)
    No 1,596,847 (41%) 203,076 (71%) 1,163,160 (37%) 230,611 (42%)
    Unmarried parents not living together 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Unknown or not Stated 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Mother's Education, n (%) <0.001
    8th grade or less 136,648 (4%) 5,880 (2%) 126,262 (4%) 4,506 (3%)
    9th through 12th grade with no diploma 420,930 (12%) 67,193 (23%) 334,434 (11%) 19,303 (11%)
    High school graduate or GED completed 879,360 (25%) 114,693 (40%) 717,840 (23%) 46,827 (27%)
    Some college credit, but not a degree 752,694 (21%) 71,611 (25%) 641,677 (21%) 39,406 (23%)
    Associate degree (AA,AS) 280,582 (8%) 16,205 (6%) 252,219 (8%) 12,158 (7%)
    Bachelor’s degree (BA, AB, BS) 669,084 (19%) 8,026 (3%) 631,975 (20%) 29,083 (17%)
    Master’s degree (MA, MS, MEng, MEd, MSW, MBA) 297,029 (8%) 1,465 (1%) 281,653 (9%) 13,911 (8%)
    Doctorate (PhD, EdD) or Professional Degree (MD, DDS, DVM, LLB, JD) 84,701 (2%) 262 (0%) 81,151 (3%) 3,288 (2%)
    Unknown 41,789 (1%) 1,489 (1%) 36,648 (1%) 3,652 (2%)
    Missing value 376,180 0 0 376,180
¹ Pearson's Chi-squared test
Show the R codes
father_variables <- c("fagecomb", "fracerec")
get_table_desc_stat_smoke(df, father_variables)
Variable Whole sample Mother is a smoker: Yes, N = 286,824 Mother is a smoker: No, N = 3,103,859 Unknown or not stated, N = 548,314 p-value¹
Father's Combined Age <0.001
    Mean (Std) 31.09 (6.89) 29.39 (7.04) 31.23 (6.86) 30.91 (6.81)
    Median (IQR) 31.00 (26.00, 35.00) 28.00 (24.00, 33.00) 31.00 (26.00, 35.00) 31.00 (26.00, 35.00)
    Missing value 825,598 80,392 342,123 403,083
Father's Race, n (%)
    Other (not classified as White or Black) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    White 2,466,248 (63%) 161,227 (56%) 1,961,775 (63%) 343,246 (63%)
    Black 476,215 (12%) 28,421 (10%) 375,761 (12%) 72,033 (13%)
    American Indian / Alaskan Native 35,122 (1%) 4,729 (2%) 24,002 (1%) 6,391 (1%)
    Asian / Pacific Islander 222,505 (6%) 1,858 (1%) 186,679 (6%) 33,968 (6%)
    Unknown or not stated 738,907 (19%) 90,589 (32%) 555,642 (18%) 92,676 (17%)
¹ Kruskal-Wallis rank sum test
pregnancy_variables <- c("lbo", "precare_rec", "uprevis", "wtgain",
                         "cig_1", "cig_2", "cig_3", "cig_rec", 
                         "cig_1_d", "cig_2_d", "cig_3_d",
                         "dlmp_mm", "dlmp_yy", "estgest", "combgest")
get_table_desc_stat_smoke(df, pregnancy_variables)
Variable | Whole sample | Mother is a smoker: Yes, N = 286,824 | Mother is a smoker: No, N = 3,103,859 | Unknown or not stated, N = 548,314 | p-value¹
Live Birth Order, n (%) <0.001
    1 1,549,617 (40%) 95,054 (33%) 1,239,353 (40%) 215,210 (39%)
    2 1,246,311 (32%) 85,290 (30%) 986,084 (32%) 174,937 (32%)
    3 654,590 (17%) 56,171 (20%) 507,684 (16%) 90,735 (17%)
    4 276,749 (7%) 27,839 (10%) 211,055 (7%) 37,855 (7%)
    5 108,082 (3%) 12,173 (4%) 81,057 (3%) 14,852 (3%)
    6 44,142 (1%) 4,970 (2%) 32,856 (1%) 6,316 (1%)
    7 20,282 (1%) 2,294 (1%) 15,119 (0%) 2,869 (1%)
    8 20,712 (1%) 1,977 (1%) 15,786 (1%) 2,949 (1%)
    Missing value 18,512 1,056 14,865 2,591
Month Prenatal Care Began, n (%) <0.001
    1st to 3rd month 2,538,197 (64%) 173,111 (60%) 2,254,884 (73%) 110,202 (20%)
    4th to 6th month 671,883 (17%) 75,017 (26%) 567,356 (18%) 29,510 (5%)
    7th to final month 158,790 (4%) 20,054 (7%) 131,511 (4%) 7,225 (1%)
    No prenatal care 51,313 (1%) 8,415 (3%) 39,878 (1%) 3,020 (1%)
    Unknown or not stated 518,814 (13%) 10,227 (4%) 110,230 (4%) 398,357 (73%)
Number of Prenatal Visits <0.001
    Mean (Std) 11.27 (4.04) 10.39 (4.49) 11.35 (3.96) 11.26 (4.19)
    Median (IQR) 12.00 (9.00, 13.00) 11.00 (8.00, 13.00) 12.00 (9.00, 13.00) 11.00 (9.00, 13.00)
    Missing value 118,528 10,074 97,285 11,169
Weight Gain <0.001
    Mean (Std) 30.33 (14.92) 30.55 (17.25) 30.41 (14.80) 29.79 (14.25)
    Median (IQR) 30.00 (20.00, 39.00) 30.00 (19.00, 41.00) 30.00 (21.00, 39.00) 30.00 (20.00, 38.00)
    Missing value 190,520 13,534 144,864 32,122
Cigarettes 1st Trimester <0.001
    Mean (Std) 0.88 (3.79) 10.45 (8.38) 0.00 (0.00) NA (NA)
    Median (IQR) 0.00 (0.00, 0.00) 10.00 (5.00, 15.00) 0.00 (0.00, 0.00) NA (NA, NA)
    Missing value 548,853 539 0 548,314
Cigarettes 2nd Trimester <0.001
    Mean (Std) 0.88 (3.78) 10.45 (8.37) 0.00 (0.00) NA (NA)
    Median (IQR) 0.00 (0.00, 0.00) 10.00 (5.00, 15.00) 0.00 (0.00, 0.00) NA (NA, NA)
    Missing value 549,711 1,397 0 548,314
Cigarettes 3rd Trimester <0.001
    Mean (Std) 0.88 (3.78) 10.45 (8.37) 0.00 (0.00) NA (NA)
    Median (IQR) 0.00 (0.00, 0.00) 10.00 (5.00, 15.00) 0.00 (0.00, 0.00) NA (NA, NA)
    Missing value 549,950 1,636 0 548,314
Smoked cigarettes 1st Trimester, n (%)
    Yes 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    No 3,110,782 (85%) 6,923 (93%) 3,103,859 (100%) 0 (0%)
    Unknown or not stated 548,853 (15%) 539 (7%) 0 (0%) 548,314 (100%)
    Missing value 279,362 279,362 0 0
Smoked cigarettes 2nd Trimester, n (%)
    Yes 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    No 3,110,782 (85%) 6,923 (83%) 3,103,859 (100%) 0 (0%)
    Unknown or not stated 549,711 (15%) 1,397 (17%) 0 (0%) 548,314 (100%)
    Missing value 278,504 278,504 0 0
Smoked cigarettes 3rd Trimester, n (%)
    Yes 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    No 3,110,777 (85%) 6,918 (81%) 3,103,859 (100%) 0 (0%)
    Unknown or not stated 549,950 (15%) 1,636 (19%) 0 (0%) 548,314 (100%)
    Missing value 278,270 278,270 0 0
Last Normal Menses: Month <0.001
    Mean (Std) 6.60 (3.49) 6.57 (3.48) 6.60 (3.49) 6.62 (3.46)
    Median (IQR) 7.00 (4.00, 10.00) 7.00 (4.00, 10.00) 7.00 (4.00, 10.00) 7.00 (4.00, 10.00)
    Missing value 195,573 23,950 151,869 19,754
Last Normal Menses: Year, n (%) <0.001
    2011 1,454 (0%) 161 (0%) 1,177 (0%) 116 (0%)
    2012 2,809,765 (75%) 196,255 (74%) 2,212,677 (75%) 400,833 (76%)
    2013 940,312 (25%) 67,292 (26%) 744,884 (25%) 128,136 (24%)
    Missing value 187,466 23,116 145,121 19,229
Obstetric/Clinical Gestation Est. <0.001
    Mean (Std) 38.51 (2.14) 38.27 (2.31) 38.54 (2.11) 38.48 (2.20)
    Median (IQR) 39.00 (38.00, 40.00) 39.00 (38.00, 39.00) 39.00 (38.00, 40.00) 39.00 (38.00, 40.00)
    Missing value 7,469 935 5,568 966
Gestation – Detail in Weeks <0.001
    Mean (Std) 38.65 (2.50) 38.49 (2.81) 38.66 (2.45) 38.63 (2.61)
    Median (IQR) 39.00 (38.00, 40.00) 39.00 (38.00, 40.00) 39.00 (38.00, 40.00) 39.00 (38.00, 40.00)
    Missing value 3,685 492 2,611 582
¹ Pearson's Chi-squared test; Kruskal-Wallis rank sum test
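The percentages in these tables appear to be computed over the non-missing observations of each column, not over the column size N. A minimal check, using only the counts reported above for "Last Normal Menses: Year" in the "Mother is a smoker: Yes" column (the object names are ours and purely illustrative):

```r
# Counts copied from the table above ("Mother is a smoker: Yes" column)
dlmp_yy_smokers <- c("2011" = 161, "2012" = 196255, "2013" = 67292)
n_missing <- 23116    # "Missing value" row, same column
n_group   <- 286824   # group size given in the column header

# Non-missing counts plus missing values add up to the group size
stopifnot(sum(dlmp_yy_smokers) + n_missing == n_group)

# Denominator = non-missing observations: matches the table (0%, 74%, 26%)
pct_non_missing <- round(100 * dlmp_yy_smokers / sum(dlmp_yy_smokers))

# Denominator = full group size: does not match the table
pct_full_group <- round(100 * dlmp_yy_smokers / n_group)

rbind(pct_non_missing, pct_full_group)
```

The rows labelled "Missing value" are therefore reported as raw counts only and do not enter the percentages.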
risk_factor_variables <- c("rf_diab", "rf_gest", "rf_phyp", "rf_ghyp", 
                           "rf_eclam", "rf_ppterm", "rf_ppoutc", "rf_cesar")
get_table_desc_stat_smoke(df, risk_factor_variables)
Variable | Whole sample | Mother is a smoker: Yes, N = 286,824 | Mother is a smoker: No, N = 3,103,859 | Unknown or not stated, N = 548,314 | p-value¹
Risk Factor: Prepregnancy Diabetes, n (%) <0.001
    Yes 26,874 (1%) 2,638 (1%) 23,033 (1%) 1,203 (1%)
    No 3,527,629 (99%) 283,268 (99%) 3,074,720 (99%) 169,641 (99%)
    Unknown or not stated 8,314 (0%) 918 (0%) 6,106 (0%) 1,290 (1%)
    Missing value 376,180 0 0 376,180
Risk Factor: Gestational Diabetes, n (%) <0.001
    Yes 186,992 (5%) 14,720 (5%) 164,545 (5%) 7,727 (4%)
    No 3,367,511 (95%) 271,186 (95%) 2,933,208 (95%) 163,117 (95%)
    Unknown or not stated 8,314 (0%) 918 (0%) 6,106 (0%) 1,290 (1%)
    Missing value 376,180 0 0 376,180
Risk Factor: Prepregnancy Hypertension, n (%) <0.001
    Yes 54,172 (2%) 5,405 (2%) 46,240 (1%) 2,527 (1%)
    No 3,500,331 (98%) 280,501 (98%) 3,051,513 (98%) 168,317 (98%)
    Unknown or not stated 8,314 (0%) 918 (0%) 6,106 (0%) 1,290 (1%)
    Missing value 376,180 0 0 376,180
Risk Factor: Gestational Hypertension, n (%) <0.001
    Yes 172,825 (5%) 13,624 (5%) 151,296 (5%) 7,905 (5%)
    No 3,381,678 (95%) 272,282 (95%) 2,946,457 (95%) 162,939 (95%)
    Unknown or not stated 8,314 (0%) 918 (0%) 6,106 (0%) 1,290 (1%)
    Missing value 376,180 0 0 376,180
Risk Factor: Hypertension Eclampsia, n (%) <0.001
    Yes 8,138 (0%) 691 (0%) 7,257 (0%) 190 (0%)
    No 3,546,365 (100%) 285,215 (99%) 3,090,496 (100%) 170,654 (99%)
    Unknown or not stated 8,314 (0%) 918 (0%) 6,106 (0%) 1,290 (1%)
    Missing value 376,180 0 0 376,180
Risk Factor: Previous Preterm Birth, n (%) <0.001
    Yes 92,251 (3%) 14,261 (5%) 73,747 (2%) 4,243 (2%)
    No 3,462,252 (97%) 271,645 (95%) 3,024,006 (97%) 166,601 (97%)
    Unknown or not stated 8,314 (0%) 918 (0%) 6,106 (0%) 1,290 (1%)
    Missing value 376,180 0 0 376,180
Risk Factor: Poor Pregnancy Outcome, n (%) <0.001
    Yes 78,255 (2%) 11,227 (4%) 63,498 (2%) 3,530 (2%)
    No 3,476,248 (98%) 274,679 (96%) 3,034,255 (98%) 167,314 (97%)
    Unknown or not stated 8,314 (0%) 918 (0%) 6,106 (0%) 1,290 (1%)
    Missing value 376,180 0 0 376,180
Risk Factor: Previous Cesarean Deliveries, n (%) <0.001
    Yes 522,869 (15%) 45,860 (16%) 453,483 (15%) 23,526 (14%)
    No 3,031,634 (85%) 240,046 (84%) 2,644,270 (85%) 147,318 (86%)
    Unknown or not stated 8,314 (0%) 918 (0%) 6,106 (0%) 1,290 (1%)
    Missing value 376,180 0 0 376,180
¹ Pearson's Chi-squared test
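To illustrate what the footnoted test does, the sketch below applies Pearson's Chi-squared test to the "Risk Factor: Previous Preterm Birth" counts reported above. It is a simplification and not the computation behind the table: we keep only the "Yes" and "No" smoker columns and drop the "Unknown or not stated" rows, whereas the p-value in the table compares all groups. The object names are ours.

```r
# Counts copied from the table above (rows: smoker status, columns: risk factor)
preterm <- matrix(
  c(14261,  271645,    # Mother is a smoker: Yes
    73747, 3024006),   # Mother is a smoker: No
  nrow = 2, byrow = TRUE,
  dimnames = list(
    smoker = c("Yes", "No"),
    previous_preterm = c("Yes", "No")
  )
)

# Row proportions: share of previous preterm births within each group
round(100 * prop.table(preterm, margin = 1), 1)

# Pearson's Chi-squared test of independence
# (for a 2x2 table, chisq.test() applies Yates' continuity correction by default)
test_preterm <- chisq.test(preterm)
test_preterm$p.value < 0.001
```

With samples of this size, even a difference of a few percentage points between groups yields a very small p-value, which is consistent with the "<0.001" entries reported throughout the table.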
obstetric_procedures_variables <- c("op_cerv", "op_tocol", "op_ecvs", "op_ecvf",
                                    "uop_induc", "uop_tocol", "on_ruptr", 
                                    "on_abrup", "on_prolg")
get_table_desc_stat_smoke(df, obstetric_procedures_variables)
Variable | Whole sample | Mother is a smoker: Yes, N = 286,824 | Mother is a smoker: No, N = 3,103,859 | Unknown or not stated, N = 548,314 | p-value¹
Obstetric Procedures: Cervical Cerclage, n (%) <0.001
    Yes 10,895 (0%) 815 (0%) 9,230 (0%) 850 (0%)
    No 3,541,148 (90%) 285,365 (99%) 3,085,945 (99%) 169,838 (31%)
    Unknown or not stated 386,954 (10%) 644 (0%) 8,684 (0%) 377,626 (69%)
Obstetric Procedures: Tocolysis, n (%) <0.001
    Yes 33,707 (1%) 4,064 (1%) 28,820 (1%) 823 (0%)
    No 3,518,336 (89%) 282,116 (98%) 3,066,355 (99%) 169,865 (31%)
    Unknown or not stated 386,954 (10%) 644 (0%) 8,684 (0%) 377,626 (69%)
Obstetric Procedures: Successful External Cephalic, n (%) <0.001
    Yes 4,787 (0%) 366 (0%) 4,265 (0%) 156 (0%)
    No 3,547,256 (90%) 285,814 (100%) 3,090,910 (100%) 170,532 (31%)
    Unknown or not stated 386,954 (10%) 644 (0%) 8,684 (0%) 377,626 (69%)
Obstetric Procedures: Failed External Cephalic, n (%) <0.001
    Yes 4,069 (0%) 330 (0%) 3,618 (0%) 121 (0%)
    No 3,547,974 (90%) 285,850 (100%) 3,091,557 (100%) 170,567 (31%)
    Unknown or not stated 386,954 (10%) 644 (0%) 8,684 (0%) 377,626 (69%)
Obstetric Procedures: Induction of labor, n (%)
    Yes 904,069 (23%) 77,594 (27%) 712,180 (23%) 114,295 (21%)
    No 3,029,795 (77%) 208,841 (73%) 2,388,093 (77%) 432,861 (79%)
    Not on certificate 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Unknown or not stated 5,133 (0%) 389 (0%) 3,586 (0%) 1,158 (0%)
Obstetric Procedures: Tocolysis, n (%)
    Yes 37,332 (1%) 4,064 (1%) 28,820 (1%) 4,448 (1%)
    No 3,890,578 (99%) 282,116 (98%) 3,066,355 (99%) 542,107 (99%)
    Not on certificate 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Unknown or not stated 11,087 (0%) 644 (0%) 8,684 (0%) 1,759 (0%)
Onset of Labor: Premature Rupture of Membrane, n (%) <0.001
    Yes 123,507 (3%) 11,401 (4%) 106,288 (3%) 5,818 (1%)
    No 3,428,111 (87%) 274,695 (96%) 2,988,702 (96%) 164,714 (30%)
    Unknown or not stated 387,379 (10%) 728 (0%) 8,869 (0%) 377,782 (69%)
Onset of Labor: Abruptio placenta, n (%) <0.001
    Yes 110,169 (3%) 10,674 (4%) 91,062 (3%) 8,433 (2%)
    No 3,441,449 (87%) 275,422 (96%) 3,003,928 (97%) 162,099 (30%)
    Unknown or not stated 387,379 (10%) 728 (0%) 8,869 (0%) 377,782 (69%)
Onset of Labor: Prolonged Labor, n (%) <0.001
    Yes 47,182 (1%) 3,656 (1%) 41,280 (1%) 2,246 (0%)
    No 3,504,436 (89%) 282,440 (98%) 3,053,710 (98%) 168,286 (31%)
    Unknown or not stated 387,379 (10%) 728 (0%) 8,869 (0%) 377,782 (69%)
¹ Pearson's Chi-squared test
labor_variables <- 
  c("ld_induct", "ld_augment", "ld_steroids", "ld_antibio", 
    "ld_chorio", "ld_mecon", "ld_fintol", "ld_anesth", "uld_meco", 
    "uld_precip", "uld_breech", "md_present", "md_route", "md_trial", 
    "dmeth_rec", "attend", "apgar5", "dplural")
get_table_desc_stat_smoke(df, labor_variables)
Variable | Whole sample | Mother is a smoker: Yes, N = 286,824 | Mother is a smoker: No, N = 3,103,859 | Unknown or not stated, N = 548,314 | p-value¹
Charact. of Labor and Delivery: Induction of Labor, n (%) <0.001
    Yes 824,982 (21%) 77,594 (27%) 712,180 (23%) 35,208 (6%)
    No 2,733,015 (69%) 208,841 (73%) 2,388,093 (77%) 136,081 (25%)
    Unknown or not stated 381,000 (10%) 389 (0%) 3,586 (0%) 377,025 (69%)
Charact. of Labor and Delivery: Augmentation of Labor, n (%) <0.001
    Yes 716,271 (18%) 62,446 (22%) 620,256 (20%) 33,569 (6%)
    No 2,841,726 (72%) 223,989 (78%) 2,480,017 (80%) 137,720 (25%)
    Unknown or not stated 381,000 (10%) 389 (0%) 3,586 (0%) 377,025 (69%)
Charact. of Labor and Delivery: Steroids, n (%) <0.001
    Yes 52,145 (1%) 6,458 (2%) 43,437 (1%) 2,250 (0%)
    No 3,505,852 (89%) 279,977 (98%) 3,056,836 (98%) 169,039 (31%)
    Unknown or not stated 381,000 (10%) 389 (0%) 3,586 (0%) 377,025 (69%)
Charact. of Labor and Delivery: Antibiotics, n (%) <0.001
    Yes 775,285 (20%) 71,955 (25%) 669,069 (22%) 34,261 (6%)
    No 2,782,712 (71%) 214,480 (75%) 2,431,204 (78%) 137,028 (25%)
    Unknown or not stated 381,000 (10%) 389 (0%) 3,586 (0%) 377,025 (69%)
Charact. of Labor and Delivery: Chorioamnionitis, n (%) <0.001
    Yes 46,172 (1%) 2,373 (1%) 42,262 (1%) 1,537 (0%)
    No 3,511,825 (89%) 284,062 (99%) 3,058,011 (99%) 169,752 (31%)
    Unknown or not stated 381,000 (10%) 389 (0%) 3,586 (0%) 377,025 (69%)
Charact. of Labor and Delivery: Meconium Staining, n (%) <0.001
    Yes 184,920 (5%) 16,822 (6%) 159,631 (5%) 8,467 (2%)
    No 3,373,077 (86%) 269,613 (94%) 2,940,642 (95%) 162,822 (30%)
    Unknown or not stated 381,000 (10%) 389 (0%) 3,586 (0%) 377,025 (69%)
Charact. of Labor and Delivery: Fetal Intolerance, n (%) <0.001
    Yes 160,372 (4%) 16,456 (6%) 136,328 (4%) 7,588 (1%)
    No 3,397,625 (86%) 269,979 (94%) 2,963,945 (95%) 163,701 (30%)
    Unknown or not stated 381,000 (10%) 389 (0%) 3,586 (0%) 377,025 (69%)
Charact. of Labor and Delivery: Anesthesia, n (%) <0.001
    Yes 2,547,825 (65%) 211,505 (74%) 2,216,646 (71%) 119,674 (22%)
    No 1,010,172 (26%) 74,930 (26%) 883,627 (28%) 51,615 (9%)
    Unknown or not stated 381,000 (10%) 389 (0%) 3,586 (0%) 377,025 (69%)
Complications of Labor and Delivery: Meconium, n (%)
    Yes 204,630 (5%) 16,822 (6%) 159,631 (5%) 28,177 (5%)
    No 3,729,119 (95%) 269,613 (94%) 2,940,642 (95%) 518,864 (95%)
    Not on certificate 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Unknown or not stated 5,248 (0%) 389 (0%) 3,586 (0%) 1,273 (0%)
Complications of Labor and Delivery: Precipitous labor, n (%)
    Yes 119,085 (3%) 10,674 (4%) 91,062 (3%) 17,349 (3%)
    No 3,808,285 (97%) 275,422 (96%) 3,003,928 (97%) 528,935 (96%)
    Not on certificate 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Unknown or not stated 11,627 (0%) 728 (0%) 8,869 (0%) 2,030 (0%)
Complications of Labor and Delivery: Breech, n (%)
    Yes 221,432 (6%) 16,283 (6%) 181,516 (6%) 23,633 (4%)
    No 3,613,736 (92%) 266,362 (93%) 2,843,775 (92%) 503,599 (92%)
    Not on certificate 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Unknown or not stated 103,829 (3%) 4,179 (1%) 78,568 (3%) 21,082 (4%)
Method of Delivery: Fetal Presentation, n (%) <0.001
    Cephalic 3,252,160 (83%) 266,362 (93%) 2,843,775 (92%) 142,023 (26%)
    Breech 131,667 (3%) 11,368 (4%) 114,056 (4%) 6,243 (1%)
    Other 75,589 (2%) 4,915 (2%) 67,460 (2%) 3,214 (1%)
    Unknown or not stated 479,581 (12%) 4,179 (1%) 78,568 (3%) 396,834 (72%)
Method of Delivery: Final Route and Method of Delivery, n (%) <0.001
    Spontaneous 2,280,543 (58%) 182,242 (64%) 1,986,844 (64%) 111,457 (20%)
    Forceps 21,209 (1%) 1,813 (1%) 18,579 (1%) 817 (0%)
    Vacuum 97,401 (2%) 8,170 (3%) 85,132 (3%) 4,099 (1%)
    Cesarean 1,161,090 (29%) 94,440 (33%) 1,011,203 (33%) 55,447 (10%)
    Unknown or not stated 378,754 (10%) 159 (0%) 2,101 (0%) 376,494 (69%)
Method of Delivery: Trial of Labor Attempted, n (%) <0.001
    Yes 301,828 (8%) 25,765 (9%) 261,416 (8%) 14,647 (3%)
    No 838,115 (21%) 67,228 (23%) 730,935 (24%) 39,952 (7%)
    Not applicable 2,399,153 (61%) 192,225 (67%) 2,090,555 (67%) 116,373 (21%)
    Unknown or not stated 399,901 (10%) 1,606 (1%) 20,953 (1%) 377,342 (69%)
Delivery Method, n (%) <0.001
    Vaginal 2,647,505 (67%) 192,225 (67%) 2,090,555 (67%) 364,725 (67%)
    C-Section 1,286,529 (33%) 94,440 (33%) 1,011,203 (33%) 180,886 (33%)
    Unknown 4,963 (0%) 159 (0%) 2,101 (0%) 2,703 (0%)
Attendant, n (%) <0.001
    Doctor of Medicine 3,315,042 (84%) 236,528 (82%) 2,622,845 (85%) 455,669 (83%)
    Doctor of Osteopathy 243,944 (6%) 25,036 (9%) 169,203 (5%) 49,705 (9%)
    Certified Nurse Midwife 321,145 (8%) 22,467 (8%) 260,953 (8%) 37,725 (7%)
    Other Midwife 29,116 (1%) 708 (0%) 26,141 (1%) 2,267 (0%)
    Other 27,352 (1%) 1,880 (1%) 22,816 (1%) 2,656 (0%)
    Unknown or not stated 2,398 (0%) 205 (0%) 1,901 (0%) 292 (0%)
Five Minute Apgar Score <0.001
    Mean (Std) 8.79 (0.83) 8.74 (0.95) 8.79 (0.83) 8.82 (0.79)
    Median (IQR) 9.00 (9.00, 9.00) 9.00 (9.00, 9.00) 9.00 (9.00, 9.00) 9.00 (9.00, 9.00)
    Missing value 16,979 1,281 14,147 1,551
Plurality, n (%) <0.001
    Single 3,801,760 (97%) 278,646 (97%) 2,995,139 (96%) 527,975 (96%)
    Twin 132,510 (3%) 8,025 (3%) 104,889 (3%) 19,596 (4%)
    Triplet 4,392 (0%) 148 (0%) 3,533 (0%) 711 (0%)
    Quadruplet or higher 335 (0%) 5 (0%) 298 (0%) 32 (0%)
¹ Pearson's Chi-squared test; Kruskal-Wallis rank sum test
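Several variables in this table overlap, which allows a quick consistency check. For instance, in the "Yes" and "No" smoker columns, the "Vaginal" count of "Delivery Method" should equal the sum of the spontaneous, forceps and vacuum deliveries of "Final Route and Method of Delivery". A short sketch using only the counts reported above (object names are ours):

```r
# Final route and method of delivery, counts copied from the table above
route <- matrix(
  c(182242, 1986844,
      1813,   18579,
      8170,   85132,
     94440, 1011203),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Spontaneous", "Forceps", "Vacuum", "Cesarean"),
    c("smoker_yes", "smoker_no")
  )
)

# Delivery method (recode), counts copied from the table above
vaginal_reported  <- c(smoker_yes = 192225, smoker_no = 2090555)
csection_reported <- c(smoker_yes = 94440,  smoker_no = 1011203)

# Vaginal deliveries rebuilt from the detailed routes
vaginal_from_route <- colSums(route[c("Spontaneous", "Forceps", "Vacuum"), ])

all(vaginal_from_route == vaginal_reported)      # TRUE
all(route["Cesarean", ] == csection_reported)    # TRUE
```

The same identity does not hold in the "Unknown or not stated" column, where most values of the detailed route variable are themselves unknown.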
conditions_newborn_variable <- 
  c("ab_vent", "ab_vent6", "ab_nicu", "ab_surfac", "ab_antibio", 
    "ca_anen", "ca_menin", "ca_heart", "ca_hernia", "ca_ompha", 
    "ca_gastro", "ca_limb", "ca_cleftlp", "ca_cleft", "ca_downs", 
    "ca_chrom", "ca_hypos")
get_table_desc_stat_smoke(df, conditions_newborn_variable)
Variable | Whole sample | Mother is a smoker: Yes, N = 286,824 | Mother is a smoker: No, N = 3,103,859 | Unknown or not stated, N = 548,314 | p-value¹
Abnormal Conditions of the Newborn: Assisted Ventilation, n (%) <0.001
    Yes, Complication reported 117,969 (3%) 13,618 (5%) 99,463 (3%) 4,888 (1%)
    No Complication reported 3,434,968 (87%) 272,386 (95%) 2,996,427 (97%) 166,155 (30%)
    Unknown or not stated 386,060 (10%) 820 (0%) 7,969 (0%) 377,271 (69%)
Abnormal Conditions of the Newborn: Assisted Ventilation >6hrs, n (%) <0.001
    Yes, Complication reported 37,266 (1%) 4,671 (2%) 31,221 (1%) 1,374 (0%)
    No Complication reported 3,515,671 (89%) 281,333 (98%) 3,064,669 (99%) 169,669 (31%)
    Unknown or not stated 386,060 (10%) 820 (0%) 7,969 (0%) 377,271 (69%)
Abnormal Conditions of the Newborn: Admission to NICU, n (%) <0.001
    Yes, Complication reported 282,937 (7%) 29,714 (10%) 240,725 (8%) 12,498 (2%)
    No Complication reported 3,270,000 (83%) 256,290 (89%) 2,855,165 (92%) 158,545 (29%)
    Unknown or not stated 386,060 (10%) 820 (0%) 7,969 (0%) 377,271 (69%)
Abnormal Conditions of the Newborn: Surfactant, n (%) <0.001
    Yes, Complication reported 14,318 (0%) 1,946 (1%) 11,874 (0%) 498 (0%)
    No Complication reported 3,538,619 (90%) 284,058 (99%) 3,084,016 (99%) 170,545 (31%)
    Unknown or not stated 386,060 (10%) 820 (0%) 7,969 (0%) 377,271 (69%)
Abnormal Conditions of the Newborn: Antibiotics, n (%) <0.001
    Yes, Complication reported 2,066 (0%) 199 (0%) 1,777 (0%) 90 (0%)
    No Complication reported 3,550,871 (90%) 285,805 (100%) 3,094,113 (100%) 170,953 (31%)
    Unknown or not stated 386,060 (10%) 820 (0%) 7,969 (0%) 377,271 (69%)
Congenital Anomalies of the Newborn: Anencephaly, n (%) <0.001
    Yes, anomaly reported 379 (0%) 32 (0%) 330 (0%) 17 (0%)
    No, anomaly not reported 3,548,468 (90%) 285,826 (100%) 3,092,784 (100%) 169,858 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Meningomyelocele/Spina Bifida, n (%) <0.001
    Yes, anomaly reported 574 (0%) 80 (0%) 473 (0%) 21 (0%)
    No, anomaly not reported 3,548,273 (90%) 285,778 (100%) 3,092,641 (100%) 169,854 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Cyanotic Congenital Heart Disease, n (%) <0.001
    Yes, anomaly reported 3,045 (0%) 241 (0%) 2,759 (0%) 45 (0%)
    No, anomaly not reported 3,545,802 (90%) 285,617 (100%) 3,090,355 (100%) 169,830 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Congenital Diaphragmatic Hernia, n (%) <0.001
    Yes, anomaly reported 463 (0%) 44 (0%) 410 (0%) 9 (0%)
    No, anomaly not reported 3,548,384 (90%) 285,814 (100%) 3,092,704 (100%) 169,866 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Omphalocele, n (%) <0.001
    Yes, anomaly reported 375 (0%) 35 (0%) 327 (0%) 13 (0%)
    No, anomaly not reported 3,548,472 (90%) 285,823 (100%) 3,092,787 (100%) 169,862 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Gastroschisis, n (%) <0.001
    Yes, anomaly reported 992 (0%) 198 (0%) 750 (0%) 44 (0%)
    No, anomaly not reported 3,547,855 (90%) 285,660 (100%) 3,092,364 (100%) 169,831 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Limb Reduction Deficit, n (%) <0.001
    Yes, anomaly reported 468 (0%) 68 (0%) 386 (0%) 14 (0%)
    No, anomaly not reported 3,548,379 (90%) 285,790 (100%) 3,092,728 (100%) 169,861 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Cleft Lip w/ or w/o Cleft Palate, n (%) <0.001
    Yes, anomaly reported 1,832 (0%) 259 (0%) 1,499 (0%) 74 (0%)
    No, anomaly not reported 3,547,015 (90%) 285,599 (100%) 3,091,615 (100%) 169,801 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Cleft Palate Alone, n (%) <0.001
    Yes, anomaly reported 847 (0%) 118 (0%) 694 (0%) 35 (0%)
    No, anomaly not reported 3,548,000 (90%) 285,740 (100%) 3,092,420 (100%) 169,840 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
Congenital Anomalies of the Newborn: Downs Syndrome, n (%)
    Yes, anomaly reported 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    No, anomaly not reported 3,546,998 (90%) 285,715 (100%) 3,091,485 (100%) 169,798 (31%)
    Unknown 391,999 (10%) 1,109 (0%) 12,374 (0%) 378,516 (69%)
Congenital Anomalies of the Newborn: Suspected Chromosomal Disorder, n (%)
    Yes, anomaly reported 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    No, anomaly not reported 3,547,568 (90%) 285,717 (100%) 3,092,011 (100%) 169,840 (31%)
    Unknown 391,429 (10%) 1,107 (0%) 11,848 (0%) 378,474 (69%)
Congenital Anomalies of the Newborn: Hypospadias, n (%) <0.001
    Yes, anomaly reported 2,103 (0%) 214 (0%) 1,780 (0%) 109 (0%)
    No, anomaly not reported 3,546,744 (90%) 285,644 (100%) 3,091,334 (100%) 169,766 (31%)
    Unknown 390,150 (10%) 966 (0%) 10,745 (0%) 378,439 (69%)
¹ Pearson's Chi-squared test