We sometimes need the distance between two countries in econometric models, for instance in gravity models. But for some countries, I think the usual measure can cause problems. Let’s take an example. If we want to model the volume of trade between country (i) and country (j), economic theory says it will depend on the distance between (i) and (j). Let (i) be China, (j_1) Japan, and (j_2) India. Measured between capital cities, the distance between (i) and (j_1) is lower than between (i) and (j_2). Can we really consider, though, that Japan is closer to China than India is, given that India and China share a border?

So, rather than computing the distance between two capital cities, it might be more accurate to compute the shortest distance between the borders. If countries (i) and (j) share a border, then the distance should be zero.
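To make the idea concrete, here is a minimal base-R sketch of a "closest border points" distance. It is not the code used later in this post (which relies on `spDists`); the function names `haversine_km` and `min_border_distance`, the toy borders, and the mean Earth radius of 6371 km are all assumptions made for illustration. Like the full script, it only compares border vertices with each other.

```r
# Great-circle (haversine) distance in kilometres between two points given
# as c(long, lat) in degrees. The radius of 6371 km is an assumed mean value.
haversine_km <- function(p1, p2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (p2[2] - p1[2]) * to_rad
  dlon <- (p2[1] - p1[1]) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(p1[2] * to_rad) * cos(p2[2] * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(min(1, sqrt(a)))
}

# Shortest distance between two borders, each a two-column matrix (long, lat)
min_border_distance <- function(border_a, border_b) {
  best <- Inf
  for (i in seq_len(nrow(border_a))) {
    for (j in seq_len(nrow(border_b))) {
      best <- min(best, haversine_km(border_a[i, ], border_b[j, ]))
    }
  }
  best
}

# Hypothetical toy borders: A and B share the vertex (1, 0); C lies apart
border_a <- cbind(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))
border_b <- cbind(long = c(1, 2, 2), lat = c(0, 0, 1))
border_c <- cbind(long = c(5, 6, 6), lat = c(0, 0, 1))

min_border_distance(border_a, border_b)  # 0, because a vertex is shared
min_border_distance(border_a, border_c)  # distance from (1, 0) to (5, 0)
```

With real map data, two neighbouring countries only get exactly zero if their polygons happen to contain an identical vertex, which is worth keeping in mind when reading the results.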

This is simple to do in R, and it gives me an occasion to use the *rbind_all* function from the R package dplyr (superseded by *bind_rows* in later dplyr releases). We could obtain the table even faster using *cbind_all*, but at the moment it is still in development.

[**Edit 2017-11-17**: See in the comments below an updated and improved version of this by Matt T.]

Note that the relation is symmetric: the distance from (i) to (j) equals the distance from (j) to (i). Hence each pair of countries only needs to be computed once.
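As a small illustration of that saving, the sketch below computes each unordered pair once and then mirrors the values into a full matrix. The `toy_distance` function and the `positions` vector are hypothetical stand-ins for the (much more expensive) border computation.

```r
# Hypothetical stand-in for an expensive pairwise distance: here simply the
# absolute difference between two numbers
toy_distance <- function(a, b) abs(a - b)

positions <- c(A = 0, B = 3, C = 10, D = 4)
n <- length(positions)

# Each unordered pair once: n * (n - 1) / 2 calls instead of n * (n - 1)
pairs <- t(combn(n, 2))
upper <- apply(pairs, 1, function(p) {
  toy_distance(positions[p[1]], positions[p[2]])
})

# Fill a full matrix by mirroring the upper triangle into the lower one
dist_matrix <- matrix(0, n, n,
                      dimnames = list(names(positions), names(positions)))
dist_matrix[pairs] <- upper
dist_matrix[pairs[, 2:1]] <- upper
dist_matrix
```

For four items this means 6 evaluations instead of 12; with a few hundred countries and thousands of border points each, halving the work is noticeable.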

```
library(maps)
library(sp)        # provides spDists()
library(geosphere)
library(dplyr)
world.map <- map("world", fill = TRUE)
# Keep only the main polygons: names containing ":" are sub-regions
indicePays <- seq(1, length(world.map$names))[-grep(":", world.map$names)]
# https://stat.ethz.ch/pipermail/r-help/2010-April/237031.html
# Split a vector into pieces at each NA
splitNA <- function(x){
  idx <- 1 + cumsum(is.na(x))
  not.na <- !is.na(x)
  split(x[not.na], idx[not.na])
}
# Coordinates of every country
lesCoordsX <- splitNA(world.map$x)
lesCoordsY <- splitNA(world.map$y)
lesDistancesUnPays <- function(unIndicePays){
  # Border coordinates for the current country
  coordsPays <- data.frame(long = lesCoordsX[[unIndicePays]], lat = lesCoordsY[[unIndicePays]])
  # Indexes of the other countries, excluding the current one
  # and those for which the computation has already been done
  lesIndicesAutresPays <- indicePays[indicePays > unIndicePays]
  distancePoint <- function(unPoint){
    unPoint.m <- matrix(unPoint, ncol = 2)
    # Distances between unPoint and every border point of every other
    # country listed in lesIndicesAutresPays
    distancePointPays <- function(unIndicePays2){
      coordsPays2 <- matrix(cbind(long = lesCoordsX[[unIndicePays2]], lat = lesCoordsY[[unIndicePays2]]), ncol = 2)
      lesDistPointPays2 <- spDists(x = coordsPays2, y = unPoint.m, longlat = TRUE)
      # Shortest distance between unPoint and the country whose index is unIndicePays2
      return(min(lesDistPointPays2))
    }
    lesDistPointPays2 <- lapply(lesIndicesAutresPays, distancePointPays)
    res <- unlist(lesDistPointPays2)
    return(res)
  }
  distancesPays <- apply(coordsPays, 1, distancePoint)
  # Shortest distances between the current country and every other country
  if(!is.matrix(distancesPays)){
    # When only one other country is left (second-to-last on the list)
    plusCourtesDistances <- min(distancesPays)
  }else{
    plusCourtesDistances <- apply(distancesPays, 1, min)
  }
  resul <- cbind(pays1 = rep(unIndicePays, length(plusCourtesDistances)),
                 pays2 = lesIndicesAutresPays,
                 dist = plusCourtesDistances)
  return(resul)
}
# We don't need distances for the last country (they have all been computed)
lesDist <- lapply(indicePays[-length(indicePays)], lesDistancesUnPays)
# rbind_all() has been removed from dplyr; bind_rows() replaces it and
# expects data frames rather than matrices
lesDist <- bind_rows(lapply(lesDist, as.data.frame))
# We need to recover distances for each pair, in both orders
lesDist$ID <- paste(sprintf("%04d", lesDist$pays1), sprintf("%04d", lesDist$pays2), sep = "")
lesDist2 <- data.frame(cbind(pays1 = rep(indicePays, each = length(indicePays)),
                             pays2 = rep(indicePays, length(indicePays))))
lesDist2 <- lesDist2[lesDist2$pays1 != lesDist2$pays2, ]
lesDist2$ID <- paste(sprintf("%04d", lesDist2$pays1), sprintf("%04d", lesDist2$pays2), sep = "")
lesDist2$ID2 <- paste(sprintf("%04d", lesDist2$pays2), sprintf("%04d", lesDist2$pays1), sep = "")
lesDist2$match <- match(lesDist2$ID, lesDist$ID)
lesDist2[is.na(lesDist2$match), "match"] <- match(lesDist2$ID2[is.na(lesDist2$match)], lesDist$ID)
lesDist2$dist <- lesDist$dist[lesDist2$match]
lesDist2 <- lesDist2[, c("pays1", "pays2", "dist")]
lesDist <- lesDist2
rm(lesDist2)
# Let's add country names
lesDist$pays1 <- world.map$names[lesDist$pays1]
lesDist$pays2 <- world.map$names[lesDist$pays2]
```
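The `splitNA` helper is what turns the output of `map()` into one set of coordinates per polygon: `map()` returns all longitudes in a single vector (and all latitudes in another), with `NA` separating consecutive polygons. The small example below shows the helper on made-up longitudes; the vector `x` is purely hypothetical.

```r
# Same helper as in the script above: split a vector into pieces at each NA
splitNA <- function(x) {
  idx <- 1 + cumsum(is.na(x))
  not.na <- !is.na(x)
  split(x[not.na], idx[not.na])
}

# Hypothetical longitudes for three polygons, separated by NA
x <- c(10, 11, 12, NA, 20, 21, NA, 30, 31, 32, 33)
pieces <- splitNA(x)

lengths(pieces)  # number of border points per polygon: 3, 2 and 4
pieces[[2]]      # second polygon: 20 21
```

Because the pieces come back in the same order as `world.map$names`, the position of a piece in the list can serve as the country index, which is how the script uses it.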

There you go. If you wish to use these distances, here is the CSV file. You could also download the RData file.

```
> load(url("http://egallic.fr/R/Blog/Cartes/countries_distances.RData"))
> head(lesDist)
   pays1        pays2      dist
1 Canada South Africa 11225.350
2 Canada      Denmark  3963.909
3 Canada         USSR  1254.421
4 Canada     Pakistan  7831.515
5 Canada     Aral Sea  6706.607
6 Canada        Italy  4466.916

What is the unit for this distance?

It is in kilometres.

This was exactly what I needed – sadly the data is ancient. I can’t use distance to USSR when trying to figure out distance from Denmark to a number of former USSR countries 🙁

Really cool work though

You can easily adapt the code with more recent data though!

Hi Ewen, this is great, exactly what I needed. However, I found that your data excluded island countries such as the UK, New Zealand, Japan, etc. Your script also didn’t work on the latest version of R (3.4.2). I adapted your code to provide for all countries. Code is here: https://gist.githubusercontent.com/mtriff/185e15be85b44547ed110e412a1771bf/raw/1bb4d287f79ca07f63d4c56110099c26e7c6ee7d/getCountryDist.r and CSV output is here: https://gist.githubusercontent.com/mtriff/185e15be85b44547ed110e412a1771bf/raw/1bb4d287f79ca07f63d4c56110099c26e7c6ee7d/countries_distances.csv. Hope this helps someone else.

Hi Matt,

Thank you!

Hi guys, this is great, thank you! Any particular reason why the distance between France and Germany isn’t 0?

Many bordering countries have non-zero distance.

I presume it’s because of the inaccuracies in the data set, but can’t be sure.

Maybe you can define a threshold value? If the distance between two bordering countries is lower than that threshold, you can consider that they actually share a border.
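A rough sketch of that threshold idea, assuming a table shaped like `lesDist`. The France rows are made-up values for illustration (only the Canada to Denmark figure comes from the output shown in the post), and the 5 km tolerance is an arbitrary assumption that would need tuning against the resolution of the map data.

```r
# Hypothetical distances in km; real values would come from lesDist
distances <- data.frame(
  pays1 = c("France", "France", "Canada"),
  pays2 = c("Germany", "Italy", "Denmark"),
  dist  = c(0.8, 1.3, 3963.909),
  stringsAsFactors = FALSE
)

# Assumed tolerance: pairs closer than this are treated as sharing a border
threshold_km <- 5

distances$shares_border <- distances$dist < threshold_km
distances$dist_adjusted <- ifelse(distances$shares_border, 0, distances$dist)
distances
```

A threshold like this would also set to zero the distance between countries that are very close without touching, so it is a trade-off rather than a fix.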