This chapter uses Jordà (2005) Local Projection framework to measure how sensitive agricultural output is to exogenous changes in the weather. Instead of using monthly data as in Chapter 7, we use here quarterly data.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Functions useful to shape the data for local projectionssource("../weatherperu/R/format_data-quarter.R")# Load function in utilssource("../weatherperu/R/utils.R")# Load detrending functionssource("../weatherperu/R/detrending-quarter.R")
12.1 Linear Local Projections
In this section, we focus on estimating the Local Projections (Jordà 2005) to quantify the impact of weather on agricultural production. We utilize panel data, similar to the approach used in the study by Acevedo et al. (2020), and independently estimate models for each specific crop.
For a particular crop denoted as \(c\), the model can be expressed as follows: \[
\begin{aligned}
\underbrace{y_{c,i,{\color{wongGold}t+h}}}_{\text{Production}} = & \underbrace{\alpha_{c,i,h}}_{\text{reg. fixed effect}} + {\color{wongOrange}\beta_{c,{\color{wongGold}h}}^{{\color{wongPurple}T}}} {\color{wongPurple}{T_{i,{\color{wongGold}t}}}} + {\color{wongOrange}\beta_{c,{\color{wongGold}h}}^{{\color{wongPurple}P}}} {\color{wongPurple}P_{i,{\color{wongGold}t}}}\\
&+\delta_{c,i,h}\underbrace{X_{t}}_{\text{controls}}+\varepsilon_{c,i,t+h}
\end{aligned}
\tag{12.1}\]
Here, \(i\) represents the region, \(t\) represents the time, and \(h\) represents the horizon. The primary focus lies on estimating the coefficients associated with temperature and precipitation for different time horizons\(\color{wongGold}h=\{0,1,...,T_{c}\}\)
12.1.1 Functions
To estimate the models, we develop a function, get_data_lp(), that generates the endogenous variable and the regressors for a specific crop, considering a given time horizon. This function is designed to return a list where each element represents the dataset that will be utilized for estimating the model corresponding to a specific time horizon.
When we call the get_data_lp() function, we check for missing values in the weather data. If missing values are present for a specific region and crop, we keep only the longest consecutive sequence without missing values. To achieve this, we use the two functions defined previously: get_index_longest_non_na() and keep_values_longest_non_na().
#' Get the data in a table for the local projections, for a specific crop#'#' @param df original dataset#' @param horizons number of horizons#' @param y_name name of the exogenous variable#' @param group_name name of the group variable#' @param crop_name name of the crop to focus on#' @param control_names vector of names of the control variables#' @param weather_names vector of names of the weather variables#' @param add_quarter_fe should columns with quarter dummy variables be added?#' Default to `TRUE`#' @param share_geo vector of names of the variables that contain the share of#' each type of geographical pattern. By default `NULL`: no share used#' @param transition_name name of the variable used to define the transition to#' the two states. By default `NULL`#' @param transition_method if transition function, name of the method to use:#' `logistic` or `normal` (default to `NULL`, i.e., no transition)#' @param state_names name of the two states in a vector of characters (only if#' `transition_name` is not `NULL`). First period corresponds to mapped values#' of `transition_name` close to 0, which is for large positive values of#' `transition_name`#' @param gamma logistic growth rate (default to 3, only used if#' `transition_name` is not `NULL`)#' @param other_var_to_keep vector of names of other variables to keep (default#' to `NULL`: no additional vairable kept)#' @export#' @importFrom dplyr filter select mutate sym group_by across rowwise arrange#' slice lead ends_with#' @importFrom fastDummies dummy_cols#' @importFrom stringr str_c str_detectget_data_lp <-function(df, horizons, y_name, group_name, crop_name, control_names, weather_names,add_quarter_fe =TRUE,share_geo =NULL,transition_name =NULL,transition_method =NULL,state_names =c("planted", "harvested"),gamma =3,other_var_to_keep =NULL) {if (!is.null(share_geo) &!is.null(transition_name)) {stop("You can only use one between share_geo and transition_name") }# Init empty object to return: list of length horizons df_horizons <-vector(mode ="list", length = horizons +1)# Keep only the variables needed df_focus <- df |>filter(product_eng ==!!crop_name) |>select(!!y_name,!!group_name, year, quarter, product_eng,!!!control_names,!!!weather_names,!!!share_geo,!!transition_name,!!other_var_to_keep ) |>mutate(!!group_name :=factor(!!sym(group_name)) )# Quarter dummy fixed-effectsif (add_quarter_fe) { df_focus <- df_focus |># mutate(# quarter = as.character(lubridate::quarter(date))# ) |>dummy_cols(select_columns ="quarter", remove_first_dummy =FALSE) }# For each region, only keep the longest sequence of non NA values found in# the weather variables df_focus <- df_focus |>group_by(region_id) |>mutate(across(.cols =!!weather_names,.fns = keep_values_longest_non_na,.names ="{.col}_keep" ) ) |>rowwise() |>mutate(keep_cols =all(across(ends_with("_keep")))) |>ungroup() |>filter(keep_cols) |>select(-keep_cols, -!!paste0(weather_names, "_keep"))if (!is.null(share_geo)) {# For each geographical type, multiply the weather variables by the share# that the geo. type representsfor(share_geo_type in share_geo) { df_focus <- df_focus |>mutate(across(.cols =!!weather_names,.fns =~ .x *!!sym(share_geo_type),.names =str_c("{.col}_", share_geo_type) ) ) } }if (!is.null(transition_name)) { state_1_name <- state_names[1] state_2_name <- state_names[2]if (transition_method =="logistic") { df_focus <- df_focus |>mutate(fz =logist(!!sym(transition_name), gamma = gamma) ) } elseif (transition_method =="normal") { df_focus <- df_focus |>mutate(fz =pnorm(-!!sym(transition_name)) ) } else {stop("transition method must be either \"losistic\" or \"normal\"") } df_focus <- df_focus |>dummy_cols(group_name, remove_first_dummy =FALSE) ind_dummies_group <-str_detect(colnames(df_focus), str_c(group_name, "_")) dummies_group_name <-colnames(df_focus)[ind_dummies_group]if (add_quarter_fe) { ind_dummies_quarter <-str_detect(colnames(df_focus), "^quarter_") dummies_quarter_name <-colnames(df_focus)[ind_dummies_quarter] dummies_group_name <-c(dummies_group_name, dummies_quarter_name) } df_focus <- df_focus |>mutate(# First regime:across(.cols =c(!!!control_names, !!!weather_names, !!!dummies_group_name),.fns =list(state_1_name =~ (1- fz) * .x,state_2_name =~ fz * .x ),.names ="{fn}_{col}" ) ) |>rename_with(.fn =~str_replace(string = .x, pattern ="state_1_name", replacement = state_1_name),.cols =starts_with("state_1_name") ) |>rename_with(.fn =~str_replace(string = .x, pattern ="state_2_name", replacement = state_2_name),.cols =starts_with("state_2_name") ) } else { df_focus <- df_focus |>dummy_cols(group_name, remove_first_dummy =FALSE) }# Prepare the values for y at t+hfor (h in0:horizons) { df_horizons[[h+1]] <- df_focus |>group_by(!!sym(group_name)) |>arrange(year, quarter) |>mutate(y_lead = dplyr::lead(!!sym(y_name), n = h)) |>slice(1:(n()-h)) }names(df_horizons) <-0:horizons df_horizons}
Following the data preparation step, we proceed to define a function, estimate_linear_lp() that performs the estimation of models for all time horizons. This function utilizes the datasets obtained through the get_data_lp() function.
#' Estimate Local Projections#'#' @param df original dataset#' @param horizons number of horizons#' @param y_name name of the exogenous variable#' @param group_name name of the group variable#' @param crop_name name of the crop to focus on#' @param control_names vector of names of the control variables#' @param weather_names vector of names of the weather variables#' @param add_quarter_fe should columns with quarter dummy variables be added?#' Default to `TRUE`#' @param add_intercept should an intercept we added to the regressions?#' (default to `FALSE`)#' @param share_geo vector of names of the variables that contain the share of#' each type of geographical pattern. By default `NULL`: no share used#' @param std type of standard error (`"NW"` for Newey-West, `"Cluster"`,#' `"Standard"` otherwise)#' @param transition_name name of the variable used to define the transition to#' the two states. By default `NULL`#' @param transition_method if transition function, name of the method to use:#' `logistic` or `normal` (default to `NULL`, i.e., no transition)#' @param state_names name of the two states in a vector of characters (only if#' `transition_name` is not `NULL`). First period corresponds to mapped values#' of `transition_name` close to 0, which is for large positive values of#' `transition_name`#' @param gamma logistic growth rate (default to 3, only used if#' `transition_name` is not `NULL`)#' @param other_var_to_keep vector of names of other variables to keep in the#' returned dataset (default to `NULL`: no additional vairable kept)#' @export#' @importFrom dplyr mutate sym ungroup summarise across left_join#' @importFrom stringr str_c str_detect#' @importFrom purrr map map_dbl list_rbind#' @importFrom tibble enframe#' @importFrom tidyr pivot_longer#' @importFrom sandwich NeweyWest#' @importFrom stats sd model.matrix nobs residuals lm coefestimate_linear_lp <-function(df, horizons, y_name, group_name, crop_name, control_names, weather_names,add_quarter_fe =TRUE,add_intercept =FALSE,share_geo =NULL,transition_name =NULL,transition_method =NULL,state_names =c("planted", "harvested"),gamma =3,std =c("nw", "cluster", "standard"),other_var_to_keep =NULL) {# Format the dataset data_lp <-get_data_lp(df = df,horizons = horizons,y_name = y_name,group_name = group_name,crop_name = crop_name,control_names = control_names,weather_names = weather_names,share_geo = share_geo,transition_name = transition_name,transition_method = transition_method,state_names = state_names,gamma = gamma,other_var_to_keep = other_var_to_keep )# Recode levels for the groupsfor(h in0:horizons){ data_lp[[h +1]] <- data_lp[[h +1]] |>mutate(!!group_name :=as.factor(as.character(!!sym(group_name))) ) } control_names_full <- control_names weather_names_full <- weather_names ind_names_groups <-str_detect(colnames(data_lp[[1]]), str_c("^", group_name, "_") ) group_names_full <-colnames(data_lp[[1]])[ind_names_groups]if (!is.null(share_geo)) {# Name of the weather variables weather_names_full <-paste(rep(weather_names, each =length(share_geo)), share_geo,sep ="_" ) }if (!is.null(transition_name)) { state_1_name <-str_c(state_names[1], "_") state_2_name <-str_c(state_names[2], "_")# Name of the variables weather_names_full <-str_c(rep(c(state_1_name, state_2_name),each =length(weather_names) ),rep(weather_names, 2) ) control_names_full <-str_c(rep(c(state_1_name, state_2_name),each =length(control_names) ),rep(control_names, 2) ) ind_names_groups <-str_detect(colnames(data_lp[[1]]),str_c("(^", state_1_name, "|", state_2_name, ")", group_name, "_") ) group_names_full <-colnames(data_lp[[1]])[ind_names_groups] }# Observed standard deviations in the data sd_weather_shock <-map(.x = data_lp,.f =~ungroup(.x) |>summarise(across(.cols =c(!!control_names_full, !!weather_names_full, !!share_geo),.fns = sd ) ) ) |>list_rbind(names_to ="horizon") |>pivot_longer(cols =-horizon, names_to ="name", values_to ="std_shock") |>mutate(horizon =as.numeric(horizon))# Observed median value in the data median_weather_shock <-map(.x = data_lp,.f =~ungroup(.x) |>summarise(across(.cols =c(!!control_names_full, !!weather_names_full, !!share_geo),.fns =~quantile(.x, probs = .5) ) ) ) |>list_rbind(names_to ="horizon") |>pivot_longer(cols =-horizon, names_to ="name", values_to ="median_shock") |>mutate(horizon =as.numeric(horizon))# Observed quantile of order 0.05 value in the data q05_weather_shock <-map(.x = data_lp,.f =~ungroup(.x) |>summarise(across(.cols =c(!!control_names_full, !!weather_names_full, !!share_geo),.fns =~quantile(.x, probs = .05) ) ) ) |>list_rbind(names_to ="horizon") |>pivot_longer(cols =-horizon, names_to ="name", values_to ="q05_shock") |>mutate(horizon =as.numeric(horizon))# Observed quantile of order 0.95 value in the data q95_weather_shock <-map(.x = data_lp,.f =~ungroup(.x) |>summarise(across(.cols =c(!!control_names_full, !!weather_names_full, !!share_geo),.fns =~quantile(.x, probs = .95) ) ) ) |>list_rbind(names_to ="horizon") |>pivot_longer(cols =-horizon, names_to ="name", values_to ="q95_shock") |>mutate(horizon =as.numeric(horizon))# Formula for the regressions formula_lp <-paste0("y_lead"," ~ -1+",# " ~ 1+", # interceptpaste(weather_names_full, collapse =" + ")," + ",paste(control_names_full, collapse =" + ")," + ",ifelse( add_intercept,# removing last groupyes =paste(group_names_full[-length(group_names_full)], collapse =" + "),# keeping last groupno =paste(group_names_full, collapse =" + ") ) )if (add_quarter_fe) { formula_lp <-paste0( formula_lp, " + ", paste(paste0("quarter_", 1:11), collapse =" + ") ) }# Regressions reg_lp <-map(data_lp, ~lm(formula = formula_lp, data = .))# Standard error of the residuals sig_ols <-map_dbl(.x = reg_lp, .f =~sd(.x$residuals)) log_likelihood <-map_dbl(.x = reg_lp,.f =function(reg) { err_x <- reg$residuals sig_ols <-sd(err_x)sum(log(1/sqrt(2* pi * sig_ols^2) *exp(-err_x^2/ (2* sig_ols^2)))) } ) mse <-map_dbl(.x = reg_lp,.f =function(reg) { err_x <- reg$residualsmean(err_x^2) } ) coefs <-map( reg_lp,~enframe(coef(.)) ) |>list_rbind(names_to ="horizon")if (std =="nw") {# Newey-West cov_nw_pw <-map( reg_lp,~ sandwich::NeweyWest(.x, lag =1, prewhite =FALSE) ) se_df <-map( cov_nw_pw,~enframe(sqrt(diag(.)), value ="std") ) |>list_rbind(names_to ="horizon") } elseif (std =="Cluster") { cl_std <-vector(mode ="list", length = horizons +1)names(cl_std) <-0:horizonsfor (h in0:horizons) { ind_NA_coef <-which(is.na(reg_lp[[h +1]]$coefficients)) X <-model.matrix(reg_lp[[h +1]])if (length(ind_NA_coef) >1) { X <- X[, -ind_NA_coef] } errx <-residuals(reg_lp[[h +1]]) omega <-0 m_clust <-ifelse( add_intercept,yes =length(group_names_full[-length(group_names_full)]),no =length(group_names_full) ) n_clust <-length(errx) k_clust <-ncol(X)for (ri in1:m_clust) {# Identify id row within the cluster idx <-which(X[, group_names_full[ri]] ==1) omega <- omega +t(X[idx, ]) %*% errx[idx] %*%t(errx[idx]) %*% X[idx, ] } omega <- m_clust / (m_clust -1) * (n_clust -1) / (n_clust - k_clust) * omega t1 <-solve(t(X) %*% X)colSums(t1) v_cluster <- t1 %*% omega %*% t1 cl_std[[h +1]] <-sqrt(diag(v_cluster))names(cl_std[[h +1]]) <-colnames(omega) } se_df <-map(cl_std, ~enframe(., value ="std")) |>list_rbind(names_to ="horizon") } else { se_df <-map( reg_lp,~enframe(sqrt(diag(vcov(.))), value ="std") ) |>list_rbind(names_to ="horizon") } coefs <- coefs |>left_join(se_df, by =c("horizon", "name")) |>mutate(crop = crop_name,horizon =as.numeric(horizon) ) |>left_join(sd_weather_shock, by =c("horizon", "name")) |>left_join(median_weather_shock, by =c("horizon", "name")) |>left_join(q05_weather_shock, by =c("horizon", "name")) |>left_join(q95_weather_shock, by =c("horizon", "name"))list(# reg_lp = reg_lp,coefs = coefs,horizons = horizons,log_likelihood = log_likelihood,mse = mse,crop_name = crop_name,data_lp = data_lp )}
Note
The get_data_lp() function is defined in the R script saved here: /weatherperu/R/format_data-quarter.R. The estimation function, estimate_linear_lp(), is defined in the R script saved here: /weatherperu/R/estimations-quarter.R.
source("../weatherperu/R/estimations-quarter.R")
12.1.2 Estimation
To loop over the different crops, we can utilize the map() function. This function enables us to apply the estimate_linear_lp() function to each crop iteratively, facilitating the estimation process.
We can visualize the Impulse Response Functions (IRFs) by plotting the estimated coefficients associated with the weather variables. These coefficients represent the impact of weather on agricultural production and can provide valuable insights into the dynamics of the system. By plotting the IRFs, we can gain a better understanding of the relationship between weather variables and the response of agricultural production over time.
Acevedo, Sebastian, Mico Mrkaic, Natalija Novta, Evgenia Pugacheva, and Petia Topalova. 2020. “The Effects of Weather Shocks on Economic Activity: What Are the Channels of Impact?”Journal of Macroeconomics 65: 103207.
Jordà, Òscar. 2005. “Estimation and Inference of Impulse Responses by Local Projections.”American Economic Review 95 (1): 161–82. https://doi.org/10.1257/0002828053828518.