We sometimes need the distance between two countries in econometric models, for instance in gravity models. But for some countries, I think the usual measure, the distance between capital cities, can be misleading. Let’s take an example. If we want to model the volume of trade between country (i) and country (j), economic theory says it will depend on the distance between (i) and (j). Let (i) be China, (j_1) Japan, and (j_2) India. Measured between capitals, the distance between (i) and (j_1) is lower than between (i) and (j_2). Can we consider, though, that Japan is closer to China than India is, when India shares a border with China?

So, rather than computing the distance between two capital cities, it might be more accurate to compute the shortest distance between the borders. If country (i) and country (j) share a border, then the distance should be zero.
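
To make the idea concrete, here is a minimal sketch in base R, not the code used for the final table. It assumes each border is a matrix of (long, lat) points, uses a hand-written haversine formula with an assumed mean Earth radius of 6371 km, and works on made-up square "countries". The names haversine_km and min_border_distance are hypothetical. Note that only the listed border points are compared, so two countries get a distance of zero only if they have a border point in common.

```r
# Great-circle distance (km) between one point and a matrix of points.
# Columns are (long, lat) in degrees; 6371 km is an assumed mean Earth radius.
haversine_km <- function(p, pts, radius = 6371) {
  to_rad <- pi / 180
  dlon <- (pts[, 1] - p[1]) * to_rad
  dlat <- (pts[, 2] - p[2]) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(p[2] * to_rad) * cos(pts[, 2] * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Smallest distance between any border point of A and any border point of B
min_border_distance <- function(borderA, borderB) {
  min(apply(borderA, 1, haversine_km, pts = borderB))
}

# Toy borders (hypothetical coordinates, not real countries)
A <- cbind(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))
B <- cbind(long = c(1, 2, 2, 1), lat = c(0, 0, 1, 1))  # has border points in common with A
C <- cbind(long = c(5, 6, 6, 5), lat = c(0, 0, 1, 1))  # four degrees east of A

d_AB <- min_border_distance(A, B)  # zero: A and B share border points
d_AC <- min_border_distance(A, C)  # strictly positive
```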

This is extremely simple to do using R, and it gives me an occasion to use the rbind_all function from the R package dplyr (since deprecated in favour of bind_rows). We could obtain the table even faster using cbind_all, but at the moment it is still in development.
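
As a rough illustration of what this stacking step does, here is a minimal sketch using made-up per-country tables. Base R's do.call(rbind, ...) is swapped in for the dplyr call so that the example runs without any package; the values are purely illustrative.

```r
# Hypothetical per-country results, each with the same columns
parts <- list(
  data.frame(pays1 = 1, pays2 = c(2, 3), dist = c(10, 20)),
  data.frame(pays1 = 2, pays2 = 3, dist = 15)
)

# Base R: stack all the tables of the list into a single one
stacked <- do.call(rbind, parts)

# With dplyr installed, the same result can be obtained with:
# stacked <- dplyr::bind_rows(parts)
```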

Note that the relation is symmetric (the distance from (i) to (j) equals the distance from (j) to (i)), so each pair of countries only needs to be computed once.
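
The saving from symmetry can be sketched as follows. This is only an illustration: toy_dist is a hypothetical stand-in for the expensive border computation, and the four integer ids are made up.

```r
# Hypothetical placeholder for an expensive, symmetric distance
toy_dist <- function(i, j) abs(i - j)

ids <- 1:4
pairs <- t(combn(ids, 2))  # each unordered pair (i < j) appears exactly once

# Compute the distance once per unordered pair
half <- data.frame(pays1 = pairs[, 1], pays2 = pairs[, 2],
                   dist = mapply(toy_dist, pairs[, 1], pairs[, 2]))

# Mirror the half table to recover both orderings of every pair
full <- rbind(half,
              data.frame(pays1 = half$pays2, pays2 = half$pays1, dist = half$dist))
```

With n countries this takes n(n-1)/2 distance computations instead of n(n-1).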

library(maps)
library(sp) # provides spDists()
library(dplyr)
world.map <- map("world", fill = TRUE)

# Keep only the polygons whose name contains no ":" (sub-regions are named "Country:Region")
indicePays <- seq(1,length(world.map$names))[-grep(":", world.map$names)]

# https://stat.ethz.ch/pipermail/r-help/2010-April/237031.html
splitNA <- function(x){
  idx <- 1 + cumsum(is.na(x))
  not.na <- !is.na(x)
  split(x[not.na], idx[not.na])
}

# Coordinates of every country
lesCoordsX <- splitNA(world.map$x)
lesCoordsY <- splitNA(world.map$y)

lesDistancesUnPays <- function(unIndicePays){
  # Borders coordinates for current country
  coordsPays <- data.frame(long = lesCoordsX[[unIndicePays]], lat = lesCoordsY[[unIndicePays]])
  
  # Indexes of countries except the current one
  # and the one for which the computation has already been done
  lesIndicesAutresPays <- indicePays[indicePays > unIndicePays]
  
  
  # Shortest distances (in km) between one border point of the current country
  # and each of the other countries listed in lesIndicesAutresPays
  distancePoint <- function(unPoint){
    unPoint.m <- matrix(unPoint, ncol = 2)
    
    # Shortest distance between unPoint and the country whose index is unIndicePays2
    distancePointPays <- function(unIndicePays2){
      coordsPays2 <- matrix(cbind(long = lesCoordsX[[unIndicePays2]], lat = lesCoordsY[[unIndicePays2]]), ncol = 2)
      lesDistPointPays2 <- spDists(x = coordsPays2, y = unPoint.m, longlat = TRUE)
      return(min(lesDistPointPays2))
    }
    lesDistPointPays2 <- lapply(lesIndicesAutresPays, distancePointPays)
    res <- unlist(lesDistPointPays2)
    return(res)
  }
  
  # One column per border point of the current country, one row per other country
  distancesPays <- apply(coordsPays, 1, distancePoint)
  # Shortest distance between the current country and every other country
  if(!is.matrix(distancesPays)){
    # Only one other country remains, so apply() returned a vector
    plusCourtesDistances <- min(distancesPays)
  }else{
    plusCourtesDistances <- apply(distancesPays, 1, min)
  }
  
  resul <- cbind(pays1 = rep(unIndicePays, length(plusCourtesDistances)),pays2 = lesIndicesAutresPays, dist = plusCourtesDistances)
  return(resul)
}

# We don't need distances for the last country (they have all been computed)
lesDist <- lapply(indicePays[-length(indicePays)], lesDistancesUnPays)
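# rbind_all() is deprecated in dplyr; bind_rows() replaces it and expects data frames
lesDist <- bind_rows(lapply(lesDist, as.data.frame))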

# We need the distance for every ordered pair (i, j), not only for i < j
lesDist$ID <- paste(sprintf("%04d", lesDist$pays1), sprintf("%04d", lesDist$pays2), sep = "")

lesDist2 <- data.frame(cbind(pays1 = rep(indicePays, each = length(indicePays)),
                         pays2 = rep(indicePays, length(indicePays))))
lesDist2  <-  lesDist2[-which(lesDist2$pays1 == lesDist2$pays2),]
lesDist2$ID <- paste(sprintf("%04d", lesDist2$pays1), sprintf("%04d", lesDist2$pays2), sep = "")
lesDist2$ID2 <- paste(sprintf("%04d", lesDist2$pays2), sprintf("%04d", lesDist2$pays1), sep = "")
lesDist2$match <- match(lesDist2$ID, lesDist$ID)
lesDist2[is.na(lesDist2$match),"match"] <- match(lesDist2$ID2[is.na(lesDist2$match)], lesDist$ID)
lesDist2$dist <- lesDist[lesDist2$match, "dist"]
lesDist2 <- lesDist2[,c("pays1", "pays2", "dist")]

lesDist <- lesDist2
rm(lesDist2)

# Let's add the country names
lesDist$pays1 <- world.map$names[lesDist$pays1]
lesDist$pays2 <- world.map$names[lesDist$pays2]

There you go. If you wish to use these distances, here is the CSV file. You could also download the RData file.

> load(url("http://egallic.fr/R/Blog/Cartes/countries_distances.RData"))
> head(lesDist)
   pays1        pays2      dist
1 Canada South Africa 11225.350
2 Canada      Denmark  3963.909
3 Canada         USSR  1254.421
4 Canada     Pakistan  7831.515
5 Canada     Aral Sea  6706.607
6 Canada        Italy  4466.916
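
As a usage sketch, a distance can be looked up in such a table regardless of the order in which the two countries are given. The toy table below reuses three rows from the output above; distance_between is a hypothetical helper, not part of the downloadable files.

```r
# Toy table with the same columns as lesDist (rows taken from the output above)
toy <- data.frame(pays1 = c("Canada", "Canada", "Canada"),
                  pays2 = c("South Africa", "Denmark", "Italy"),
                  dist = c(11225.350, 3963.909, 4466.916),
                  stringsAsFactors = FALSE)

# Hypothetical helper: look a pair up in either order, NA if it is absent
distance_between <- function(tab, a, b) {
  hit <- (tab$pays1 == a & tab$pays2 == b) | (tab$pays1 == b & tab$pays2 == a)
  if (!any(hit)) return(NA_real_)
  tab$dist[which(hit)[1]]
}

d_dk_ca <- distance_between(toy, "Denmark", "Canada")
d_missing <- distance_between(toy, "Denmark", "Italy")
```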

4 thoughts on “Closest distance between countries”

  1. This was exactly what I needed – sadly the data is ancient. I can’t use distance to USSR when trying to figure out distance from Denmark to a number of former USSR countries 🙁

    Really cool work though
