6  Descriptive Statistics

Objectives

This chapter presents some descriptive statistics on the variables used in the article.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The different ggplot2 themes can be loaded:

source("../weatherperu/R/utils.R")

Let us load the data used in the local projections estimations.

load("../data/output/df_lp.rda")

We also load the weather data:

load("../data/output/weather/weather_regions_df.rda")

The map of Peru can be loaded (See Section 1.1 for more details on the map).

map_peru <- "../data/raw/shapefile_peru/departamentos/" |> 
  sf::st_read(quiet = TRUE)
map_peru <-
  rmapshaper::ms_simplify(input = as(map_peru, 'Spatial')) |>
  sf::st_as_sf()

6.1 Dictionary of the Variables

Table 6.1: Variables in the df dataset stored in "../data/output/df_lp.rda" file
Variable name Type Description
y_new numeric Monthly Agricultural Production (tons)
y_dev_pct numeric Agricultural Production (percent deviation from monthly regional values)
product character Name of the crop (in Spanish)
product_eng character Name of the Product (in English)
region_id integer Region numerical ID
region character Name of the region
year numeric Year (YYYY)
month numeric Month (MM)
date Date Date (YYYY-MM-DD)
ln_prices numeric Product price (log)
ln_produc numeric Production (log of tons)
Value_prod numeric Production (tons)
surf_m numeric Planted Surface during the current month (hectares)
Value_surfR numeric Harvested Surface (hectares)
Value_prices numeric Unit Price (Pesos)
campaign numeric ID of the planting campaing (starting in August)
campaign_plain character Years of the planting campaing (starting in August)
month_campaign numeric Month of the planting campaing (August = 1)
surf_lag_calend numeric Planted Surface laggued by the growth duration computed from the caledars (hectares)
perc_product numeric Share of the annual production harvested at month m
perc_product_mean numeric Average share of the annual production harvested at month m
diff_plant_harv numeric Difference between planted and harvested surfaces during month m
exposition numeric Cumulative difference between planted and harvested surfaces
exposition_trend numeric Trend of the exposition using HP filter
exposition_detrended numeric Difference between the exposition and its trend
exposition_norm numeric Normalisation of the detrended exposition
temp_min numeric Monthly average of daily min temperature
temp_max numeric Monthly average of daily max temperature
temp_mean numeric Monthly average of daily mean temperature
precip_sum, precip_piscop_sum numeric Monthly sum of daily rainfall
perc_gamma_precip, perc_gamma_precip_piscop numeric Percentile of the monthly precipitation (Estimated Gamma Distribution)
temp_min_dev numeric Deviation of monthly min temperatures (temp_min) from climate normals (1986 – 2015)
temp_max_dev numeric Deviation of monthly max temperatures (temp_max) from climate normals (1986 – 2015)
temp_mean_dev numeric Deviation of monthly mean temperatures (temp_mean) from climate normals (1986 – 2015)
precip_sum_dev, precip_piscop_sum_dev numeric Deviation of monthly total rainfall (precip_sum) from climate normals (1986 – 2015)
spi_1, spi_piscop_1 numeric SPI Drought Index, Scale = 1
spi_3, spi_piscop_3 numeric SPI Drought Index, Scale = 3
spi_6, spi_piscop_6 numeric SPI Drought Index, Scale = 6
spi_12, spi_piscop_12 numeric SPI Drought Index, Scale = 12
spei_1, spei_piscop_1 numeric SPEI Drought Index, Scale = 1
spei_3, spei_piscop_3 numeric SPEI Drought Index, Scale = 3
spei_6, spei_piscop_6 numeric SPEI Drought Index, Scale = 6
spei_12, spei_piscop_12 numeric SPEI Drought Index, Scale = 12
ONI numeric Oceanic Niño Index
elnino numeric 1 if El-Niño event, 0 otherwise
lanina numeric 1 if La-Niña event, 0 otherwise
State numeric State: "La Niña", "Normal", or "El Niño"
enso_start numeric 1 if current date corresponds to the begining of one of the three states, 0 otherwise
enso_end numeric 1 if current date corresponds to the end of one of the three states, 0 otherwise
temp_min_dev_ENSO numeric Deviation of Min. Temperature from ENSO Normals
temp_max_dev_ENSO numeric Deviation of Max. Temperature from ENSO Normals
temp_mean_dev_ENSO numeric Deviation of Mean Temperature from ENSO Normals
precip_sum_dev_ENSO, precip_piscop_sum_dev_ENSO numeric Deviation of Total Rainfall from ENSO Normals
gdp numeric GDP in percentage point, percentage deviation from trend, detrended and seasonally adjusted
ya numeric Agricultural GDP in percentage point, percentage deviation from trend, detrended and seasonally adjusted
rer_hp numeric Real exchange rate, detrended using HP filter
rer_dt_sa numeric Real exchange rate, detrended and seasonally adjusted
r numeric Interest rate, in percentage point, detrended
r_hp numeric Interest rate, in percentage point, detrended using HP filter
pi numeric Inflation rate, in percentage point
pia numeric Food inflation rate, in percentage point, seasonally adjusted
ind_prod numeric Manufacturing Production, in percentage point, percentage deviation from trend, detrended and seasonally adjusted
price_int numeric International commodity prices
price_int_inf numeric Growth rate of international commodity prices
share_sierra numeric Share of highlands
share_selva numeric Share of forest
share_costa numeric Share of coast

6.2 Crops

We focus on the following crops:

crops <- c("Rice", "Dent corn", "Potato", "Cassava")

Table 6.2 presents some descriptive statistics about the monthly production of the selected crops, averaged over the region

Code
df |> 
  group_by(product_eng) |> 
  summarise(
    Mean = mean(y_new, na.rm = TRUE),
    Median = median(y_new, na.rm = TRUE),
    `Standard Deviation` = sd(y_new, na.rm = TRUE),
    Min = min(y_new, na.rm = TRUE),
    Max = max(y_new, na.rm = TRUE),
    `No. regions` = length(unique(region)),
    `No. obs.` = n()
  ) |> 
  rename(Culture = product_eng) |> 
  knitr::kable(
    booktabs = TRUE, 
    format.args = list(big.mark = ",")
  )
Table 6.2: Descriptive statistics for monthly production (in tons) per type of crop
Culture Mean Median Standard Deviation Min Max No. regions No. obs.
Cassava 4,531.272 1,798.6500 7,109.64 0 57,135.0 20 3,600
Dent corn 4,290.878 983.4735 7,254.97 0 74,623.7 23 4,140
Potato 16,393.353 5,048.8500 29,574.83 0 360,070.0 19 3,420
Rice 13,458.593 1,654.0000 28,312.71 0 318,706.0 16 2,880

6.2.1 National Production

Figure 6.1 provides a visual representation of the national production of each time of crop over our time sample, which is the sum of the monthly regional production.

Code
ggplot(
  data = df |> 
    group_by(product_eng, date) |> 
    summarise(
      nat_prod = sum(y_new),
      .groups = "drop"
    ) |> 
    mutate(
      product_eng = factor(
        product_eng, 
        levels = c("Rice", "Dent corn", "Potato", "Cassava"),
        labels = c("Rice", "Maize", "Potato", "Cassava"),
      )
    )
) +
  geom_line(
    mapping = aes(x = date, y = nat_prod),
    colour = "#1f78b4",
    linewidth = 1
  ) +
  facet_wrap(~product_eng, scales = "free_y", ncol = 2) +
  labs(x = NULL, y=  "Aggregate Production (tons)") +
  scale_y_continuous(labels = scales::number_format(big.mark = ",")) +
  theme_paper()
Figure 6.1: National monthly crop production for selected cultures (in tons)

6.2.2 National Production by Month and Type of Region

In Figure 6.2, we document the regional differences and the seasonality by averaging the monthly production over the different types of natural regions.

Code
ggplot(
  data = df |> 
    group_by(product_eng, year, month) |> 
    # Average each month at the national level
    summarise(
      prod_nat_costa = sum(y_new * share_costa),
      prod_nat_selva = sum(y_new * share_selva),
      prod_nat_sierra = sum(y_new * share_sierra),
      .groups = "drop"
    ) |> 
    pivot_longer(
      cols = c(prod_nat_costa, prod_nat_selva, prod_nat_sierra),
      names_to = "geo",
      values_to = "monthly_prod_geo"
    ) |> 
    # Average in each region type for each calendar month
    group_by(product_eng, month, geo) |> 
    summarise(
      monthly_prod_geo = mean(monthly_prod_geo),
      .groups = "drop"
    ) |> 
    mutate(
      product_eng = factor(
        product_eng, 
        levels = c("Rice", "Dent corn", "Potato", "Cassava"),
        labels = c("Rice", "Maize", "Potato", "Cassava"),
      ),
      geo = factor(
        geo,
        levels = c("prod_nat_costa", "prod_nat_selva", "prod_nat_sierra"),
        labels = c("Coast", "Forest", "Highlands")
      )
    ),
  mapping = aes(
    x = month, y = monthly_prod_geo, 
    colour = geo, linetype = geo
  )
) +
  geom_line(
    linewidth = 1
  ) +
  facet_wrap(~product_eng, scales = "free") +
  labs(x = NULL, y = "Average Production (tons)") +
  scale_colour_manual(
    NULL,
    values = c(
      "Coast" = "#56B4E9", "Forest" = "#009E73", "Highlands" = "#E69F00"
    )
  ) +
  scale_linetype_discrete(NULL) +
  scale_x_continuous(breaks= 1:12, labels = month.abb) +
  scale_y_continuous(labels = scales::number_format(big.mark = ",")) +
  theme_paper()
Figure 6.2: Crop production by months and natural regions (in tons)

6.2.3 Share of agricultural land

Let us load the share of agricultural area in the cell, for each cell of the grid (see Section 1.2 for more details on that specific grid):

load("../data/output/land/map_peru_grid_agri.rda")

Figure 6.3 presents the production distribution for each type of crop.

Code
ggplot() +
  geom_sf(
    data = map_peru_grid_agri |> 
      mutate(share_cropland = percent_cropland / 100), 
    mapping = aes(fill = share_cropland),
    colour = "grey"
  ) +
  geom_sf(data = map_peru, fill = NA) +
  scale_fill_gradient2(
    "Share of\nagricultural\nland", 
    low = "white", high = "#61C250",
    labels = scales::label_percent()
  ) +
  theme_map_paper()
Figure 6.3: Regional distribution of crop production by administrative regions

6.2.4 Regions Used in the Local Projection Analysis

We need to know, for each region of the map, whether it is included or not in the analysis.

The number of regions used in the analysis for each crop:

df |> 
  group_by(product_eng) |> 
  summarise(nb_regions = length(unique(region_id)))
# A tibble: 4 × 2
  product_eng nb_regions
  <chr>            <int>
1 Cassava             20
2 Dent corn           23
3 Potato              19
4 Rice                16

We can also look at the spatial distribution of these regions. Let us create a map that shows whether each region is included/excludes. On the same map, we would like to have an idea of the relative share in the total production over the period each region represents, for each crop.

df_plot_map_regions <- 
  map_peru |> 
  left_join(
    df |> 
      select(product_eng, region) |> unique() |> 
      mutate(used = TRUE) |> 
      pivot_wider(names_from = product_eng, values_from = used),
    by = c("DEPARTAMEN" = "region")
  )

Let us load the dataset obtained at the end of Chapter 5.

load("../data/output/dataset_2001_2015.rda")

We compute the crop-specific share that each region represents in the annual production.

prod_reg <- 
  data_total |> 
  group_by(region, product_eng) |> 
  summarise(
    total_production = sum(Value_prod, na.rm = T),
    .groups = "drop"
  ) |> 
  unique() |> 
  group_by(product_eng) |> 
  mutate(total_culture = sum(total_production)) |> 
  ungroup() |> 
  mutate(share = total_production / total_culture)
prod_reg
# A tibble: 144 × 5
   region   product_eng     total_production total_culture   share
   <chr>    <chr>                      <dbl>         <dbl>   <dbl>
 1 AMAZONAS Amylaceous corn           99281.      3976940. 0.0250 
 2 AMAZONAS Cassava                 1894369.     16268850. 0.116  
 3 AMAZONAS Dent corn                329168.     17776855. 0.0185 
 4 AMAZONAS Potato                   925576.     55753747. 0.0166 
 5 AMAZONAS Rice                    4001141.     38770607. 0.103  
 6 AMAZONAS Wheat                     13652.      3041972. 0.00449
 7 ANCASH   Amylaceous corn          183573.      3976940. 0.0462 
 8 ANCASH   Cassava                   99944.     16268850. 0.00614
 9 ANCASH   Dent corn               1259909.     17776855. 0.0709 
10 ANCASH   Potato                  1582314.     55753747. 0.0284 
# ℹ 134 more rows

We prepare a map dataset for each crop:

map_production_regions <- NULL
for (i in 1:length(crops)) {
  culture <- as.character(crops[i])
  map_peru_tmp <- map_peru |> 
    left_join(
      prod_reg |> 
        filter(product_eng == !!culture), 
      by = c("DEPARTAMEN" = "region")
    ) |> 
    mutate(product_eng = !!culture)
  map_production_regions <- bind_rows(map_production_regions, map_peru_tmp)
}

Lastly, let us add information on inclusion/exclusion from the analysis:

included_regions <- 
  df_plot_map_regions |> 
  pivot_longer(cols = !!crops, values_to = "included") |> 
  as_tibble() |> 
  select(IDDPTO, name, included) |> 
  mutate(included = replace_na(included, FALSE)) |> 
  rename(product_eng = name)

Figure 6.4 allows to visualize the relative importance of the discarded series in relation to the overall national agricultural production.

Code
library(ggpattern)
p_regions_use_production <- 
  ggplot(
    data = map_production_regions |> 
      left_join(included_regions, by = c("IDDPTO", "product_eng")) |> 
      mutate(
        included = factor(
          included, 
          levels = c(TRUE, FALSE), 
          labels = c("Yes", "No")
        )
      ) |> 
      mutate(
        product_eng = factor(
          product_eng,
          levels = c("Cassava", "Dent corn", "Potato", "Rice"),
          labels = c("Cassava", "Maize", "Potato", "Rice")
        )
      )
  ) +
  geom_sf_pattern(
    mapping = aes(
      pattern = included,
      pattern_type = included, 
      fill = share
    ),
    pattern_fill = "white",
    pattern_alpha = .8
  ) +
  # scale_pattern_type_discrete("Included", values = c("Yes" = "none", "No" = "stripe")) +
  scale_pattern_manual("Included", values = c("No" = "stripe", "Yes" = "none")) +
  scale_pattern_type_manual("Included", values=c(NA, NA)) +
  scale_fill_gradient2(
    "Share", low = "#FFC107", mid = "white", high = "#009E73",
    labels = scales::percent_format()
  ) +
  facet_wrap(~product_eng) +
  theme_map_paper()
p_regions_use_production
Figure 6.4: Comparison of Discarded Series with National Production

6.3 Natural regions

Let us now focus on the natural regions: Coast, Forest, Highlands.

First, we load the map (see Section 4.1):

map_regiones_naturales <-  sf::st_read(
  str_c(
    "../data/raw/shapefile_peru/regiones_naturales/",
    "region natural_geogpsperu_JuanPabloSuyoPomalia.geojson"
  ),
  quiet = TRUE
)

We prepare a map:

map_regiones_naturales <-
  rmapshaper::ms_simplify(input = as(map_regiones_naturales, 'Spatial')) |> 
  sf::st_as_sf() |> 
  mutate(
    Natural_region = case_when(
      Nm_RegNat == "Costa" ~ "Coast",
      Nm_RegNat == "Selva" ~ "Forest",
      Nm_RegNat == "Sierra" ~ "Highlands"
    )
  )

We use the following colours for each type of region:

cols <- c("Coast" = "#56B4E9", "Forest" = "#009E73", "Highlands" = "#E69F00")

Figure 6.5 shows these regions, and adds the grid used with the weather data from Chapter 1 as well as the regional boundaries used in the analysis.

Code
ggplot(data = map_regiones_naturales) +
  geom_sf(mapping = aes(fill = Natural_region), lwd = 0) +
  scale_fill_manual(values = cols, name = "Natural region") +
  geom_sf(data = map_peru, fill = NA) +
  geom_sf(data = map_peru_grid_agri, fill = NA, lwd = 0.25) +
  theme_map_paper()
Figure 6.5: Natural Regions in Peru

6.4 Correlations Between Agricultural Production and the Weather

Let us compute the correlation between the agricultural production and our various weather variables. Recall from Chapter 5 that the agricultural production is defined as the percent deviation from the monthly trend.

The name of the weather variables:

name_weather_variables <- 
  weather_regions_df |> 
  select(where(is.numeric)) |> 
  select(-year, -month) |> 
  colnames()
name_weather_variables
 [1] "temp_min"                 "temp_max"                
 [3] "temp_mean"                "precip_sum"              
 [5] "precip_piscop_sum"        "perc_gamma_precip"       
 [7] "perc_gamma_precip_piscop" "gdd_rice"                
 [9] "gdd_maize"                "gdd_potato"              
[11] "gdd_cassava"              "hdd_rice"                
[13] "hdd_maize"                "hdd_potato"              
[15] "hdd_cassava"              "temp_min_dev"            
[17] "temp_max_dev"             "temp_mean_dev"           
[19] "precip_sum_dev"           "precip_piscop_sum_dev"   
[21] "gdd_rice_dev"             "gdd_maize_dev"           
[23] "gdd_potato_dev"           "gdd_cassava_dev"         
[25] "hdd_rice_dev"             "hdd_maize_dev"           
[27] "hdd_potato_dev"           "hdd_cassava_dev"         
[29] "cold_surprise_maize"      "cold_surprise_rice"      
[31] "cold_surprise_potato"     "cold_surprise_cassava"   
[33] "hot_surprise_maize"       "hot_surprise_rice"       
[35] "hot_surprise_potato"      "hot_surprise_cassava"    
[37] "dry_surprise_maize"       "dry_surprise_rice"       
[39] "dry_surprise_potato"      "dry_surprise_cassava"    
[41] "wet_surprise_maize"       "wet_surprise_rice"       
[43] "wet_surprise_potato"      "wet_surprise_cassava"    
[45] "spi_1"                    "spi_3"                   
[47] "spi_6"                    "spi_12"                  
[49] "spei_1"                   "spei_3"                  
[51] "spei_6"                   "spei_12"                 
[53] "spi_piscop_1"             "spi_piscop_3"            
[55] "spi_piscop_6"             "spi_piscop_12"           
[57] "spei_piscop_1"            "spei_piscop_3"           
[59] "spei_piscop_6"            "spei_piscop_12"          
Code
df |>
  select(product_eng, y_new, !!!syms(name_weather_variables)) |>
  na.omit() |>
  nest(.by = product_eng) |> 
  mutate(
    correlation = map(
      data, ~cor(.x) |> 
        data.frame() |> 
        as_tibble(rownames = "var")
    )
  ) |> 
  select(-data) |> 
  unnest(correlation) |> 
  select(Crop = product_eng, Variable = var, y_new) |> 
  pivot_wider(names_from = Crop, values_from = y_new) |> 
  mutate(across(where(is.numeric), ~round(.x, 2))) |> 
  knitr::kable()
Table 6.3: Correlations between the weather and the agricultural production
Variable Cassava Dent corn Potato Rice
y_new 1.00 1.00 1.00 1.00
temp_min 0.33 0.17 -0.19 0.15
temp_max 0.30 0.08 -0.24 0.11
temp_mean 0.32 0.14 -0.22 0.13
precip_sum 0.31 -0.13 0.05 -0.17
precip_piscop_sum 0.36 -0.08 0.06 -0.12
perc_gamma_precip 0.07 -0.06 0.03 -0.07
perc_gamma_precip_piscop -0.18 -0.08 0.01 -0.17
gdd_rice 0.32 0.13 -0.22 0.14
gdd_maize 0.32 0.13 -0.22 0.14
gdd_potato 0.32 0.13 -0.23 0.14
gdd_cassava 0.32 0.13 -0.23 0.14
hdd_rice 0.17 0.01 -0.04 -0.02
hdd_maize 0.17 0.01 -0.04 -0.02
hdd_potato 0.04 -0.01 -0.01 -0.01
hdd_cassava 0.04 -0.01 -0.01 -0.01
temp_min_dev 0.03 0.01 0.03 0.01
temp_max_dev -0.06 -0.05 0.02 -0.01
temp_mean_dev -0.02 -0.03 0.03 0.00
precip_sum_dev 0.04 -0.04 0.00 -0.02
precip_piscop_sum_dev 0.05 -0.04 0.00 -0.02
gdd_rice_dev -0.01 -0.01 0.02 0.01
gdd_maize_dev -0.01 -0.01 0.02 0.01
gdd_potato_dev -0.01 -0.02 0.02 0.00
gdd_cassava_dev -0.01 -0.02 0.02 0.00
hdd_rice_dev 0.08 0.02 0.03 -0.01
hdd_maize_dev 0.08 0.02 0.03 -0.01
hdd_potato_dev 0.03 0.00 0.05 0.00
hdd_cassava_dev 0.03 0.00 0.05 0.00
cold_surprise_maize 0.02 0.03 0.01 0.04
cold_surprise_rice 0.02 0.03 0.01 0.04
cold_surprise_potato 0.02 0.03 0.01 0.04
cold_surprise_cassava 0.02 0.03 0.01 0.04
hot_surprise_maize 0.08 0.01 0.00 0.00
hot_surprise_rice 0.08 0.01 0.00 0.00
hot_surprise_potato 0.04 0.00 0.02 -0.01
hot_surprise_cassava 0.04 0.00 0.02 -0.01
dry_surprise_maize -0.13 -0.06 0.01 0.20
dry_surprise_rice -0.13 -0.06 0.01 0.20
dry_surprise_potato -0.13 -0.06 0.01 0.20
dry_surprise_cassava -0.13 -0.06 0.01 0.20
wet_surprise_maize -0.14 0.05 -0.01 0.08
wet_surprise_rice -0.14 0.05 -0.01 0.08
wet_surprise_potato -0.14 0.05 -0.01 0.08
wet_surprise_cassava -0.14 0.05 -0.01 0.08
spi_1 0.05 -0.04 0.02 -0.02
spi_3 0.06 -0.04 0.04 -0.03
spi_6 0.05 -0.04 0.06 -0.03
spi_12 0.06 -0.05 0.05 -0.05
spei_1 0.05 -0.04 0.02 -0.03
spei_3 0.06 -0.04 0.04 -0.02
spei_6 0.05 -0.04 0.06 -0.02
spei_12 0.06 -0.05 0.05 -0.04
spi_piscop_1 0.00 -0.03 -0.02 0.00
spi_piscop_3 0.01 -0.03 0.01 -0.01
spi_piscop_6 0.03 -0.02 0.05 0.00
spi_piscop_12 0.07 -0.02 0.06 0.00
spei_piscop_1 0.01 -0.03 -0.01 -0.01
spei_piscop_3 0.01 -0.03 0.01 -0.01
spei_piscop_6 0.03 -0.01 0.05 0.01
spei_piscop_12 0.06 -0.01 0.05 0.01

6.5 Summary Statistics for the Local Projection Data

The number of observation in each region and crops (ordered by decreasing values):

df |> 
  count(product_eng, region_id) |> 
  arrange(n)
# A tibble: 78 × 3
   product_eng region_id     n
   <chr>       <fct>     <int>
 1 Cassava     1           180
 2 Cassava     2           180
 3 Cassava     3           180
 4 Cassava     4           180
 5 Cassava     5           180
 6 Cassava     6           180
 7 Cassava     7           180
 8 Cassava     9           180
 9 Cassava     10          180
10 Cassava     11          180
# ℹ 68 more rows

Let us plot the agricultural production series for each crop in each region.

Code
plots_crops_lp <- vector(mode = "list", length = length(crops))
for (i_crop in 1:length(crops)) {
  current_crop <- crops[i_crop]
  # The series in each region for the current crop
  p_crop_lp <- 
    ggplot(
    data = df |> filter(product_eng == !!current_crop),
    mapping = aes(x = date, y = y_new)
  ) +
    geom_line() +
    facet_wrap(~region, scales = "free_y") +
    labs(
      title = current_crop, x = NULL,
      y = "Production (tons)") +
    theme_paper()
  plots_crops_lp[[i_crop]] <- p_crop_lp
}
names(plots_crops_lp) <- crops
print(plots_crops_lp[["Rice"]])
Figure 6.6: Regional Production of Rice (tons)
print(plots_crops_lp[["Dent corn"]])
Figure 6.7: Regional Production of Maize (tons)
print(plots_crops_lp[["Potato"]])
Figure 6.8: Regional Production of Potato (tons)
print(plots_crops_lp[["Cassava"]])
Figure 6.9: Regional Production of Cassava (tons)
Code
plots_crops_lp <- vector(mode = "list", length = length(crops))
for (i_crop in 1:length(crops)) {
  current_crop <- crops[i_crop]
  # The series in each region for the current crop
  p_crop_lp <- 
    ggplot(
    data = df |> filter(product_eng == !!current_crop),
    mapping = aes(x = date, y = y_dev_pct)
  ) +
    geom_line() +
    facet_wrap(~region, scales = "free_y") +
    labs(
      title = current_crop, x = NULL,
      y = "Percent deviation from regional monthly trend") +
    theme_paper()
  plots_crops_lp[[i_crop]] <- p_crop_lp
}
names(plots_crops_lp) <- crops
print(plots_crops_lp[["Rice"]])
Figure 6.10: Regional Production of Rice (Deviation from Monthly Trend)
print(plots_crops_lp[["Dent corn"]])
Figure 6.11: Regional Production of Maize (Deviation from Monthly Trend)
print(plots_crops_lp[["Potato"]])
Figure 6.12: Regional Production of Potato (Deviation from Monthly Trend)
print(plots_crops_lp[["Cassava"]])
Figure 6.13: Regional Production of Cassava (Deviation from Monthly Trend)
Code
library(gtsummary)
df |>
  tbl_summary(
    include = c(
      "product_eng",
      # Production
      "y_new", "y_dev_pct",
      # Weather
      "temp_max", "precip_sum", "precip_piscop_sum",
      "temp_max_dev", "precip_sum_dev", "precip_piscop_sum_dev",
      # Control
      "rer_hp", "r_hp", "pi", "ind_prod", "ONI", "price_int_inf",
      # Type of region
      "share_costa", "share_selva", "share_sierra"
    ),
    by = product_eng,
    type = all_continuous() ~ "continuous2",
    statistic = list(
      all_continuous() ~ c("{mean} ({sd})", "{median} ({p25}, {p75})"),
      all_categorical() ~ "{n} ({p}%)"),
    digits = list(
      all_continuous() ~ 2,
      all_categorical() ~ 0
    )
  ) |> 
  add_overall(col_label = "Whole sample") |> 
  modify_header(label ~ "**Variable**") |> 
  modify_spanning_header(
    c("stat_1", "stat_2", "stat_3", "stat_4") ~ "**Crop**"
  ) |> 
  add_stat_label(
    label = list(
      all_continuous() ~ c("Mean (SD)", "Median (IQR)"),
      all_categorical() ~ "n (%)"
    )
  )
Table 6.4: Descriptive Statistics of the Data Used in the Local Projections
Variable Whole sample Crop
Cassava, N = 3,600 Dent corn, N = 4,140 Potato, N = 3,420 Rice, N = 2,880
Monthly Agricultural Production (tons)




    Mean (SD) 9,181.11 (20,854.21) 4,531.27 (7,109.64) 4,290.88 (7,254.97) 16,393.35 (29,574.83) 13,458.59 (28,312.71)
    Median (IQR) 1,934.00 (218.00, 8,840.35) 1,798.65 (412.75, 5,619.30) 983.47 (56.06, 5,648.00) 5,048.85 (904.75, 20,775.86) 1,654.00 (45.00, 15,162.07)
Agricultural Production (percent deviation from monthly regional values)




    Mean (SD) 0.00 (0.96) 0.00 (0.70) 0.00 (1.02) 0.00 (0.77) 0.00 (1.30)
    Median (IQR) -0.08 (-0.45, 0.22) -0.06 (-0.35, 0.20) -0.14 (-0.54, 0.25) -0.06 (-0.35, 0.21) -0.12 (-0.62, 0.23)
Max. Temperature




    Mean (SD) 23.43 (4.75) 24.10 (4.84) 23.53 (4.78) 21.47 (3.77) 24.75 (4.92)
    Median (IQR) 21.85 (19.63, 27.42) 22.67 (19.97, 29.03) 21.82 (19.64, 27.85) 20.56 (18.96, 22.97) 23.97 (20.47, 29.85)
Total Rainfall (Chirps)




    Mean (SD) 70.67 (75.27) 76.17 (80.04) 71.58 (77.66) 52.89 (52.89) 83.61 (83.91)
    Median (IQR) 47.07 (12.23, 107.66) 51.28 (13.60, 115.54) 46.61 (11.72, 109.18) 33.40 (9.95, 84.62) 62.44 (16.70, 126.40)
Total Rainfall (Piscop)




    Mean (SD) 80.58 (97.55) 88.55 (104.81) 81.61 (100.96) 54.62 (61.77) 99.98 (110.57)
    Median (IQR) 43.64 (7.98, 122.41) 50.20 (9.06, 135.91) 42.42 (7.02, 123.12) 27.60 (5.49, 88.01) 65.50 (13.10, 150.77)
Deviation of Max. Temperature from Normals




    Mean (SD) 0.10 (0.60) 0.09 (0.59) 0.10 (0.59) 0.13 (0.60) 0.07 (0.60)
    Median (IQR) 0.07 (-0.29, 0.48) 0.06 (-0.30, 0.47) 0.08 (-0.28, 0.49) 0.10 (-0.27, 0.52) 0.04 (-0.32, 0.45)
Deviation of Total Rainfall from Normals (Chirps)




    Mean (SD) 2.78 (25.00) 2.84 (26.24) 2.90 (25.77) 2.39 (19.76) 2.97 (27.71)
    Median (IQR) -0.17 (-5.87, 9.24) -0.12 (-6.31, 10.05) -0.19 (-5.83, 9.26) -0.20 (-4.57, 7.37) -0.10 (-6.85, 10.96)
Deviation of Total Rainfall from Normals (Piscop)




    Mean (SD) 2.60 (32.56) 2.81 (34.96) 2.64 (33.23) 1.90 (22.51) 3.11 (38.10)
    Median (IQR) -0.23 (-7.57, 9.90) -0.13 (-8.49, 10.60) -0.24 (-7.37, 9.77) -0.24 (-5.42, 7.46) -0.29 (-10.45, 13.34)
Real exchange rate




    Mean (SD) 0.00 (1.87) 0.00 (1.87) 0.00 (1.87) 0.00 (1.87) 0.00 (1.87)
    Median (IQR) 0.11 (-1.38, 1.25) 0.11 (-1.38, 1.25) 0.11 (-1.38, 1.25) 0.11 (-1.38, 1.25) 0.11 (-1.38, 1.25)
Interest rate (pp)




    Mean (SD) 0.00 (1.28) 0.00 (1.28) 0.00 (1.28) 0.00 (1.28) 0.00 (1.28)
    Median (IQR) 0.06 (-0.25, 0.36) 0.06 (-0.25, 0.36) 0.06 (-0.25, 0.36) 0.06 (-0.25, 0.36) 0.06 (-0.25, 0.36)
Inflation rate (pp)




    Mean (SD) 0.22 (0.26) 0.22 (0.26) 0.22 (0.26) 0.22 (0.26) 0.22 (0.26)
    Median (IQR) 0.24 (0.03, 0.38) 0.24 (0.03, 0.38) 0.24 (0.03, 0.38) 0.24 (0.03, 0.38) 0.24 (0.03, 0.38)
Manufacturing Prod. (pp)




    Mean (SD) -0.16 (5.78) -0.16 (5.78) -0.16 (5.78) -0.16 (5.78) -0.16 (5.78)
    Median (IQR) -0.30 (-3.93, 3.99) -0.30 (-3.93, 3.99) -0.30 (-3.93, 3.99) -0.30 (-3.93, 3.99) -0.30 (-3.93, 3.99)
Oceanic Niño Index




    Mean (SD) 0.01 (0.80) 0.01 (0.80) 0.01 (0.80) 0.01 (0.80) 0.01 (0.80)
    Median (IQR) -0.10 (-0.43, 0.43) -0.10 (-0.43, 0.43) -0.10 (-0.43, 0.43) -0.10 (-0.43, 0.43) -0.10 (-0.43, 0.43)
Int. commodity inflation rate




    Mean (SD) 0.00 (0.05) 0.00 (0.03) 0.00 (0.06) 0.00 (0.03) 0.01 (0.06)
    Median (IQR) 0.00 (-0.02, 0.03) 0.00 (-0.02, 0.02) 0.00 (-0.03, 0.04) 0.00 (-0.02, 0.02) 0.00 (-0.02, 0.02)
Share of coastal areas in the region




    Mean (SD) 0.27 (0.35) 0.27 (0.36) 0.27 (0.35) 0.28 (0.33) 0.25 (0.35)
    Median (IQR) 0.00 (0.00, 0.42) 0.00 (0.00, 0.42) 0.00 (0.00, 0.42) 0.11 (0.00, 0.42) 0.00 (0.00, 0.41)
Share of forest areas in the region




    Mean (SD) 0.33 (0.39) 0.38 (0.40) 0.33 (0.40) 0.19 (0.27) 0.47 (0.40)
    Median (IQR) 0.06 (0.00, 0.69) 0.21 (0.00, 0.73) 0.05 (0.00, 0.69) 0.00 (0.00, 0.44) 0.48 (0.00, 0.87)
Share of highland areas in the region




    Mean (SD) 0.40 (0.31) 0.36 (0.32) 0.40 (0.33) 0.53 (0.29) 0.29 (0.25)
    Median (IQR) 0.46 (0.07, 0.58) 0.38 (0.04, 0.56) 0.46 (0.05, 0.60) 0.53 (0.31, 0.71) 0.24 (0.00, 0.53)

6.6 Growing Season Duration

Let us have a look at the growing season duration for each crop.

We need these additional two packages:

library(readxl)

We load the dataset with agricultural data (see Chapter 2).

load("../data/output/minagri/data_S_TOTAL.rda")
load("../data/output/minagri/data_SR_TOTAL.rda")
load("../data/output/df_lp.rda")

Let us focus on the regions kept in the final data for each crop.

list_of_regions <- 
  df |> 
  select(product, region) |> 
  unique() |> 
  mutate(keep_data = 1)
data_S_TOTAL <- data_S_TOTAL |> 
  mutate(
    region = str_replace_all(region, "á", "a"),
    region = str_replace_all(region, "í", "i"),
    region = str_replace_all(region, "é", "e"),
    region = str_replace_all(region, "ó", "o"),
    region = str_replace_all(region, "ú", "u"),
    region = str_replace_all(region, "ñ", "n")
  )


data_SR_TOTAL <- data_SR_TOTAL |> 
  mutate(
    region = str_replace_all(region, "á", "a"),
    region = str_replace_all(region, "í", "i"),
    region = str_replace_all(region, "é", "e"),
    region = str_replace_all(region, "ó", "o"),
    region = str_replace_all(region, "ú", "u"),
    region = str_replace_all(region, "ñ", "n")
  )
harv_cum <- data_SR_TOTAL |> 
  select(-date) |> 
  mutate(month = str_c("month", month)) |> 
  pivot_wider(names_from = month, values_from = Value_surfR) |> 
  mutate(
    cum_sum1  = month1, 
    cum_sum2  = month2 + cum_sum1, 
    cum_sum3  = month3 + cum_sum2, 
    cum_sum4  = month4 + cum_sum3, 
    cum_sum5  = month5 + cum_sum4, 
    cum_sum6  = month6 + cum_sum5, 
    cum_sum7  = month7 + cum_sum6, 
    cum_sum8  = month8 + cum_sum7, 
    cum_sum9  = month9 + cum_sum8, 
    cum_sum10 = month10 + cum_sum9, 
    cum_sum11 = month11 + cum_sum10, 
    cum_sum12 = month12 + cum_sum11
  ) |> 
  mutate(
    perc_cum1   = cum_sum1/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum2   = cum_sum2/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum3   = cum_sum3/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum4   = cum_sum4/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum5   = cum_sum5/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum6   = cum_sum6/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum7   = cum_sum7/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum8   = cum_sum8/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum9   = cum_sum9/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum10  = cum_sum10/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum11  = cum_sum11/ifelse(cum_sum12 == 0, 1, cum_sum12),
    perc_cum12  = cum_sum12/ifelse(cum_sum12 == 0, 1, cum_sum12)
  ) |>  
  select(
    product, region, year, perc_cum1, perc_cum2, perc_cum3, perc_cum4,
    perc_cum5, perc_cum6, perc_cum7, perc_cum8, perc_cum9, perc_cum10,
    perc_cum11, perc_cum12
  ) |> 
  pivot_longer(
    cols = c(
      perc_cum1, perc_cum2, perc_cum3, perc_cum4, perc_cum5, perc_cum6,
      perc_cum7, perc_cum8, perc_cum9, perc_cum10, perc_cum11, perc_cum12
    ),
    names_to = "month"
  ) |> 
  rename(perc_cum_harv_obs = value) |> 
  mutate(perc_cum_harv_obs = perc_cum_harv_obs*100) |> 
  mutate(
    month = case_when(
      month == "perc_cum1"  ~ 1, 
      month == "perc_cum2"  ~ 2, 
      month == "perc_cum3"  ~ 3, 
      month == "perc_cum4"  ~ 4, 
      month == "perc_cum5"  ~ 5, 
      month == "perc_cum6"  ~ 6, 
      month == "perc_cum7"  ~ 7, 
      month == "perc_cum8"  ~ 8, 
      month == "perc_cum9"  ~ 9, 
      month == "perc_cum10" ~ 10, 
      month == "perc_cum11" ~ 11, 
      month == "perc_cum12" ~ 12, 
    )
  ) |> 
  mutate(
    first = ifelse(perc_cum_harv_obs == 100, 1, 0),
    first = ifelse(first == 1 & lag(first) == 1, -1, first),
    perc_cum_harv_obs_first100 = ifelse(first == -1,0,perc_cum_harv_obs)
  ) |> 
  select(-first)

Selecting the total value (December) as an additional month:

temp <- data_S_TOTAL |> 
  select(
    region, product, campaign_plain, month_campaign, Value_surf, campaign
  ) |>
  filter(month_campaign == 12) |> 
  mutate(month_campaign = 13) |> 
  rename(surf_m = Value_surf)
plan_perc <- data_S_TOTAL |> 
  select(
    region, product, campaign_plain, month_campaign, surf_m, campaign
  ) |> 
  rbind(temp) %>%
  .[with(., order(region, product, campaign, month_campaign) ), ] |> 
  select(-campaign) |> 
  mutate(month_campaign = str_c("cum_sum", month_campaign)) |> 
  unique() |> 
  filter(! product == "TOTAL") |> 
  pivot_wider(names_from = month_campaign, values_from = surf_m) |> 
  relocate(
    region, product, campaign_plain, cum_sum1, cum_sum2, cum_sum3,
    cum_sum4, cum_sum5, cum_sum6, cum_sum7, cum_sum8,
    cum_sum9, cum_sum10, cum_sum11, cum_sum12, cum_sum13
  ) |> 
  rename(total = cum_sum13) |> 
  mutate(
    perc_cum1  = cum_sum1/total*100,
    perc_cum2  = cum_sum2/total*100,
    perc_cum3  = cum_sum3/total*100,
    perc_cum4  = cum_sum4/total*100,
    perc_cum5  = cum_sum5/total*100,
    perc_cum6  = cum_sum6/total*100,
    perc_cum7  = cum_sum7/total*100,
    perc_cum8  = cum_sum8/total*100,
    perc_cum9  = cum_sum9/total*100,
    perc_cum10 = cum_sum10/total*100,
    perc_cum11 = cum_sum11/total*100, 
    perc_cum12 = cum_sum12/total*100
    ) |> 
  select(
    product, region, campaign_plain, perc_cum1, perc_cum2, perc_cum3,
    perc_cum4, perc_cum5, perc_cum6, perc_cum7, perc_cum8, perc_cum9, 
    perc_cum10, perc_cum11, perc_cum12
  ) |> 
  pivot_longer(
    cols = c(
      perc_cum1, perc_cum2, perc_cum3, perc_cum4, perc_cum5, perc_cum6, 
      perc_cum7, perc_cum8, perc_cum9, perc_cum10, perc_cum11, perc_cum12
    ), 
    names_to = "month_campaign"
  ) |> 
  rename(perc_plan_obs = value) |> 
  mutate(perc_plan_obs = perc_plan_obs) |> 
  mutate(
    month_campaign = case_when(
      month_campaign == "perc_cum1"  ~ 1,
      month_campaign == "perc_cum2"  ~ 2,
      month_campaign == "perc_cum3"  ~ 3,
      month_campaign == "perc_cum4"  ~ 4,
      month_campaign == "perc_cum5"  ~ 5,
      month_campaign == "perc_cum6"  ~ 6,
      month_campaign == "perc_cum7"  ~ 7,
      month_campaign == "perc_cum8"  ~ 8,
      month_campaign == "perc_cum9"  ~ 9,
      month_campaign == "perc_cum10" ~ 10,
      month_campaign == "perc_cum11" ~ 11,
      month_campaign == "perc_cum12" ~ 12,
    )
  )
plan_perc_cum <- data_S_TOTAL |> 
  select(
    region, product, campaign_plain, month_campaign, Value_surf, campaign
  ) %>%
  .[with(., order(region, product, campaign, month_campaign) ), ] |> 
  select(-campaign) |> 
  mutate(month_campaign = str_c("cum_sum", month_campaign)) |> 
  unique() |> 
  filter(! product == "TOTAL") |> 
  pivot_wider(
    names_from = month_campaign, values_from = Value_surf
  ) |> 
  relocate(
    region, product, campaign_plain, 
    cum_sum1, cum_sum2, cum_sum3, cum_sum4, cum_sum5, cum_sum6, 
    cum_sum7, cum_sum8, cum_sum9, cum_sum10, cum_sum11, cum_sum12
  ) |> 
  mutate(
    perc_cum1   = cum_sum1/cum_sum12, 
    perc_cum2   = cum_sum2/cum_sum12, 
    perc_cum3   = cum_sum3/cum_sum12, 
    perc_cum4   = cum_sum4/cum_sum12,
    perc_cum5   = cum_sum5/cum_sum12, 
    perc_cum6   = cum_sum6/cum_sum12, 
    perc_cum7   = cum_sum7/cum_sum12, 
    perc_cum8   = cum_sum8/cum_sum12, 
    perc_cum9   = cum_sum9/cum_sum12, 
    perc_cum10  = cum_sum10/cum_sum12, 
    perc_cum11  = cum_sum11/cum_sum12, 
    perc_cum12  = cum_sum12/cum_sum12
  ) |> 
  select(
    product, region, campaign_plain, 
    perc_cum1, perc_cum2, perc_cum3, perc_cum4, perc_cum5, perc_cum6, 
    perc_cum7, perc_cum8, perc_cum9, perc_cum10, perc_cum11, perc_cum12
  ) |> 
  pivot_longer(
    cols = c(
      perc_cum1, perc_cum2, perc_cum3, perc_cum4, perc_cum5, perc_cum6, 
      perc_cum7, perc_cum8, perc_cum9, perc_cum10, perc_cum11, perc_cum12
    ), 
    names_to = "month_campaign"
  ) |> 
  rename(perc_cum_plan_obs = value) |> 
  mutate(perc_cum_plan_obs = perc_cum_plan_obs*100) |> 
  mutate(
    month_campaign = case_when(
      month_campaign == "perc_cum1"  ~ 1, 
      month_campaign == "perc_cum2"  ~ 2, 
      month_campaign == "perc_cum3"  ~ 3, 
      month_campaign == "perc_cum4"  ~ 4, 
      month_campaign == "perc_cum5"  ~ 5, 
      month_campaign == "perc_cum6"  ~ 6, 
      month_campaign == "perc_cum7"  ~ 7, 
      month_campaign == "perc_cum8"  ~ 8, 
      month_campaign == "perc_cum9"  ~ 9, 
      month_campaign == "perc_cum10" ~ 10, 
      month_campaign == "perc_cum11" ~ 11, 
      month_campaign == "perc_cum12" ~ 12, 
    )
  )

We load the calendars (see Chapter 2).

load("../data/output/Calendario agricola/calendar3.rda")
load("../data/output/Calendario agricola/calendar4.rda")
calendar3 <- calendar3 |> 
  mutate(
    region = str_replace_all(region, "á", "a"),
    region = str_replace_all(region, "í", "i"),
    region = str_replace_all(region, "é", "e"),
    region = str_replace_all(region, "ó", "o"),
    region = str_replace_all(region, "ú", "u"),
    region = str_replace_all(region, "ñ", "n")
  )|> 
  mutate(
    region = toupper(region), 
    product = toupper(product)
  ) |> 
  rename("perc_cum_harv_calend" = "perc_cum_harv")

calendar4 <- calendar4 |> 
  mutate(
    region = str_replace_all(region, "á", "a"),
    region = str_replace_all(region, "í", "i"),
    region = str_replace_all(region, "é", "e"),
    region = str_replace_all(region, "ó", "o"),
    region = str_replace_all(region, "ú", "u"),
    region = str_replace_all(region, "ñ", "n")
  ) |> 
  mutate(
    region = toupper(region), 
    product = toupper(product)
  ) |> 
  rename("perc_cum_plan_calend" = "perc_cum_plan")

Now, we can determine the optimal growth duration from the data.

optimal_growth_duration <- data_SR_TOTAL |> 
  mutate(
    region = toupper(region), 
    product = toupper(product)
  ) |>
  filter(product  %in% c("PAPA","MAÍZ AMARILLO DURO", "ARROZ CÁSCARA",  "YUCA")) |> 
  full_join(list_of_regions) |> 
  filter(keep_data == 1) |> 
  select(-keep_data)
Joining with `by = join_by(region, product)`

Looping over months:

for (ii in 1:12) {
  optimal_growth_duration <- optimal_growth_duration |> 
    left_join(
      data_S_TOTAL |> 
        group_by(region, product) |> 
        select(region,product, date, surf_m) |> 
        mutate(!!as.name(paste("surf_lag", ii, sep="")) := lag(surf_m, ii)) |> 
        select(-surf_m) , 
      by = c("region", "product", "date")
    ) |> 
    mutate(
      !!as.name(paste("diff", ii, sep="")) := 
        !!as.name(paste("surf_lag", ii, sep="")) - Value_surfR
    )
}
rm(ii)
optimal_growth_duration_results <- optimal_growth_duration |> 
  unique() |> 
  group_by(region, product) |> 
  mutate(sum_totale = sum(Value_surfR, na.rm = T )) |> 
  ungroup() |> 
  filter(! sum_totale == 0) |> 
  select(- sum_totale) |> 
  filter(! year == 2001) |> 
  group_by(region, product) |> 
  transmute(
    nb_neg1 = sum(diff1 <0, na.rm = T),
    nb_neg2 = sum(diff2 <0, na.rm = T),
    nb_neg3 = sum(diff3 <0, na.rm = T),
    nb_neg4 = sum(diff4 <0, na.rm = T),
    nb_neg5 = sum(diff5 <0, na.rm = T),
    nb_neg6 = sum(diff6 <0, na.rm = T),
    nb_neg7 = sum(diff7 <0, na.rm = T),
    nb_neg8 = sum(diff8 <0, na.rm = T),
    diff1 = mean(abs(diff1), na.rm = T), 
    diff2 = mean(abs(diff2), na.rm = T), 
    diff3 = mean(abs(diff3), na.rm = T), 
    diff4 = mean(abs(diff4), na.rm = T), 
    diff5 = mean(abs(diff5), na.rm = T), 
    diff6 = mean(abs(diff6), na.rm = T), 
    diff7 = mean(abs(diff7), na.rm = T), 
    diff8 = mean(abs(diff8), na.rm = T), 
    diff9 = mean(abs(diff9), na.rm = T), 
    diff10 = mean(abs(diff10), na.rm = T), 
    diff11 = mean(abs(diff11), na.rm = T), 
    diff12 = mean(abs(diff12), na.rm = T)
  ) |> 
  unique() |> 
  mutate(
    min_lag = min(
      diff1,diff2,diff3, diff4, diff5, diff6, 
      diff7, diff8, diff9, diff10, diff11, diff12
    ), 
    max_neg_values = max(
      nb_neg1, nb_neg2, nb_neg3, nb_neg4, nb_neg5, nb_neg6, nb_neg7, nb_neg8
    )
  ) |> 
  ungroup() |> 
  filter(! is.nan(min_lag)) |> 
  mutate(
    best_lag = case_when(
      diff1 == min_lag ~ as.numeric(13), 
      diff2 == min_lag ~ as.numeric(14), 
      diff3 == min_lag ~ as.numeric(3), 
      diff4 == min_lag ~ as.numeric(4), 
      diff5 == min_lag ~ as.numeric(5), 
      diff6 == min_lag ~ as.numeric(6), 
      diff7 == min_lag ~ as.numeric(7), 
      diff8 == min_lag ~ as.numeric(8), 
      diff9 == min_lag ~ as.numeric(9), 
      diff10 == min_lag ~ as.numeric(10), 
      diff11 == min_lag ~ as.numeric(11), 
      diff12 == min_lag ~ as.numeric(12), 
    ),
  worse_neg_values = case_when(
    nb_neg1 == max_neg_values ~ as.numeric(1),
    nb_neg2 == max_neg_values ~ as.numeric(2), 
    nb_neg3 == max_neg_values ~ as.numeric(3),
    nb_neg4 == max_neg_values ~ as.numeric(4),
    nb_neg5 == max_neg_values ~ as.numeric(5), 
    nb_neg6 == max_neg_values ~ as.numeric(6), 
    nb_neg7 == max_neg_values ~ as.numeric(7), 
    nb_neg8 == max_neg_values ~ as.numeric(8)), 
  bad_result = ifelse(best_lag == worse_neg_values, 1,0)
  )
summary_opt_growth_duration <- optimal_growth_duration_results |> 
  filter(! best_lag == 1) |> 
  select(region, product, max_neg_values, best_lag) |> 
  group_by(product) |> 
  transmute(
    mean_best_lag = mean(best_lag),
    med_best_lag  = median(best_lag), 
    sd_best_lag   = sd(best_lag), 
    min_best_lag  = min(best_lag), 
    max_best_lag  = max(best_lag), 
    nb_regions    = n()
  ) |> 
  ungroup() |> 
  mutate(
    product_eng = case_when(
      product == "ARROZ CÁSCARA"      ~ "Rice",
      product == "MAÍZ AMARILLO DURO" ~ "Maize",
      product == "PAPA"               ~ "Potato", 
      product == "YUCA"               ~ "Cassava", 
      TRUE ~ "delete")
  ) |> 
  unique() |> 
  select(- product) |> 
  relocate(product_eng) |> 
  arrange(product_eng)
knitr::kable(summary_opt_growth_duration, digits = 2)
Table 6.5: Calculated growing season duration.
product_eng mean_best_lag med_best_lag sd_best_lag min_best_lag max_best_lag nb_regions
Cassava 9.00 9 2.96 3 14 20
Maize 5.17 5 1.64 4 12 23
Potato 5.79 6 1.08 4 8 19
Rice 4.44 4 0.73 3 6 16