Some country names need to be changed to match those found in the Oxford data:
population <- population |>mutate(country_wdi = country,country_wdi =recode( country_wdi,"Brunei Darussalam"="Brunei","Cabo Verde"="Cape Verde","Congo, Dem. Rep."="Democratic Republic of Congo","Congo, Rep."="Congo","Egypt, Arab Rep."="Egypt","Gambia, The"="Gambia","Iran, Islamic Rep."="Iran","Lao PDR"="Laos","Russian Federation"="Russia","Syrian Arab Republic"="Syria","Venezuela, RB"="Venezuela","Yemen, Rep."="Yemen" ) )
2.2 Functions
Our approach is based on an empirical approach. We estimate the parameters of a generalized Richards equation which belongs to the family of exponential growth models, extensively used in the literature:
\[
\Delta C_t = C_{t+1} - C_t = r C_t^P \left[ 1- \left(\frac{C_t}{K}\right)^\alpha\right] + \varepsilon_t, \quad P \in [0,1].
\] where \(\varepsilon_t\) is the model error and is distributed as a Gaussian noise process \(N(0,\sigma^2_{\varepsilon}) \ \forall t\).
The four parameters will be estimated using non linear last square (using nls.lm() from {minpack.lm}).
Let us load the required package:
library(minpack.lm)
We need to define the function objective function, i.e., the one that will be optimized:
#' Generalized Richards#' #' @param theta list of named parameters (r, P, alpha, and K)#' @param x numeric, cumulative number of infectedgeneralized_richards_f <-function(theta, x) { r <- theta[["r"]]; P <- theta[["P"]]; alpha <- theta[["alpha"]] ; K <- theta[["K"]] ; r * x^P *(1-(x/K)^alpha) + x}
The function that will be minimized, the quadratic error:
#' Loss function#' @param theta list of named parameters#' @param fun function to be optimized#' @param y observed number of new infected#' @param x cumulative number of infectedloss_function <-function(theta, fun, y, x) { (y -fun(theta = theta, x = x))^2}
Let us also define a function to plot multiple graphs on a single figure, with a shared legend:
Let us wrap the code used for a single country in a function.
estimate_country <-function(i) { country <- names_countries[i] pop_country <- population |>filter(country ==!!country) |>pull(pop)# Filtering the Oxford data to keep only those of the current country dat <- covid_cases |>filter(country %in%!!country) |>filter(rm_NA_headtail(value))if(nrow(dat) ==0){return(NULL) } dat <- dat |>mutate(C =lead(value), C_m = value) |>filter(rm_NA_headtail(C))# starting param values start <-list(r =1,P = .5,alpha = .5,K = .6*pop_country )# non linear least square estimation out <-try(nls.lm(par = start, fn = loss_function,y = dat$C,x = dat$C_m,fun = generalized_richards_f,control =nls.lm.control(maxiter =1024),lower =c(r=0, P=0, alpha=0, K=0), upper =c(r=100, P=1, alpha=Inf, K=pop_country) ) )if (!inherits(out, "try-error")) {# fitted values fit <-generalized_richards_f(out$par, dat$C_m)# extract estimated model parameters nls_estimates <- out$par# RMSE err_nls =sqrt(sum((dat$C - fit)^2)) } else { fit <- nls_estimates <- err_nls <-NULL } resul_c <-list(nls =list(estimates = nls_estimates,fit = fit,rmse = err_nls ),dat = dat ) resul_c}# End of estimate_country()
Then, let us loop over the countries. As each run is independent from the others, we can perform the computation in parallel. Please note that in this notebook, we will only load the results obtained using the same codes and not actually run those here.