We sometimes need the distance between two countries in econometric models, for instance in gravity models. But for some countries, I think this measure can be problematic. Let’s take an example. If we want to model the volume of trade between country (i) and country (j), economic theory says it will depend on the distance between (i) and (j). Let (i) be China, (j_1) Japan, and (j_2) India. Measured between capitals, the distance between (i) and (j_1) is lower than between (i) and (j_2). Can we consider, though, that Japan is closer to China than India is, when India shares a border with China?

So, rather than computing the distance between two capital cities, it might be more accurate to compute the closest distance between the borders. If a border is shared by country (i) and country (j), then the distance should be zero.
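To make the idea concrete, here is a minimal sketch in base R. It is not the code used in this post: it swaps in a spherical haversine formula for the distance, and the helper names (`haversine_km`, `closest_border_km`) and the square "countries" are made up for illustration.

```r
# Great-circle distance in km (haversine formula, sphere of radius 6371 km).
# Assumption: a spherical Earth, which is only an approximation.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Smallest distance between any border point of a and any border point of b
closest_border_km <- function(a, b) {
  d <- outer(seq_len(nrow(a)), seq_len(nrow(b)),
             function(i, j) haversine_km(a$long[i], a$lat[i],
                                         b$long[j], b$lat[j]))
  min(d)
}

# Made-up border points (not real country outlines)
A <- data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))
B <- data.frame(long = c(1, 2, 2, 1), lat = c(0, 0, 1, 1)) # shares two points with A
C <- data.frame(long = c(3, 4, 4, 3), lat = c(0, 0, 1, 1)) # separated from A

closest_border_km(A, B) # 0: A and B have border points in common
closest_border_km(A, C) # roughly 222 km
```

Because only the recorded border points are compared, two neighbouring countries get a distance of exactly zero only if their outlines have at least one point in common.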

This is extremely simple to do using R, and it gives me an occasion to use the rbind_all function from the R package dplyr. We could obtain the table even faster using cbind_all, but at the moment it is still in development.

[Edit 2017-11-17: See the comments below for an updated and improved version of this by Matt T.]

Note that the relation is symmetric, hence we can optimize the computation.
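As a small illustration of that symmetry, the sketch below evaluates a distance only for the pairs with i < j and mirrors the result. The function `toy_dist` is a hypothetical stand-in for the real (expensive) border computation.

```r
# Hypothetical stand-in for an expensive distance computation
toy_dist <- function(i, j) abs(i - j) * 100

n <- 4
d <- matrix(0, n, n)
n_calls <- 0
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d[i, j] <- toy_dist(i, j) # computed once for the pair (i, j)
    d[j, i] <- d[i, j]        # mirrored, since dist(i, j) == dist(j, i)
    n_calls <- n_calls + 1
  }
}

n_calls # n * (n - 1) / 2 evaluations instead of n * (n - 1)
```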

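library(maps)  # map()
library(sp)    # spDists()
library(dplyr) # rbind_all()

world.map <- map("world", fill = TRUE)

# Keep only the polygons whose name has no ":" (names with ":" are sub-regions of a country, such as islands)
indicePays <- seq(1,length(world.map$names))[-grep(":", world.map$names)]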

# https://stat.ethz.ch/pipermail/r-help/2010-April/237031.html
splitNA <- function(x){
  idx <- 1 + cumsum(is.na(x))
  not.na <- !is.na(x)
  split(x[not.na], idx[not.na])
}

# Coordinates of every country
lesCoordsX <- splitNA(world.map$x)
lesCoordsY <- splitNA(world.map$y)

lesDistancesUnPays <- function(unIndicePays){
  # Borders coordinates for current country
  coordsPays <- data.frame(long = lesCoordsX[[unIndicePays]], lat = lesCoordsY[[unIndicePays]])
  # Indexes of countries except the current one
  # and the one for which the computation has already been done
  lesIndicesAutresPays <- indicePays[indicePays > unIndicePays]
  distancePoint <- function(unPoint){
    unPoint.m <- matrix(unPoint, ncol = 2)
    # We need to compute distances between unPoint and every border points of every other countries
    # it is given by lesIndicesAutresPays
    distancePointPays <- function(unIndicePays2){
      coordsPays2 <- matrix(cbind(long = lesCoordsX[[unIndicePays2]], lat = lesCoordsY[[unIndicePays2]]), ncol = 2)
      lesDistPointPays2 <- spDists(x = coordsPays2, y = unPoint.m, longlat = TRUE)
      return(min(lesDistPointPays2)) # shortest distance between unPoint and the country whose index is unIndicePays2
    }
    lesDistPointPays2 <- lapply(lesIndicesAutresPays, distancePointPays)
    res <- unlist(lesDistPointPays2)
    return(res)
  }
  # One column per border point of the current country, one row per other country
  distancesPays <- apply(coordsPays, 1, distancePoint)
  # Shortest distances between the current country and every other country
  if(length(lesIndicesAutresPays) == 1){
    # Only one other country left: apply() returns a plain vector
    plusCourtesDistances <- min(distancesPays)
  }else{
    plusCourtesDistances <- apply(distancesPays, 1, min)
  }
  resul <- data.frame(pays1 = rep(unIndicePays, length(plusCourtesDistances)), pays2 = lesIndicesAutresPays, dist = plusCourtesDistances)
  resul
}

# We don't need distances for the last country (they have all been computed)
lesDist <- lapply(indicePays[-length(indicePays)], lesDistancesUnPays)
# Note: rbind_all() has since been deprecated in dplyr; bind_rows() is its replacement
lesDist <- rbind_all(lesDist)

# We need to recover distances for each ordered pair of countries
lesDist$ID <- paste(sprintf("%04d", lesDist$pays1), sprintf("%04d", lesDist$pays2), sep = "")

lesDist2 <- data.frame(cbind(pays1 = rep(indicePays, each = length(indicePays)),
                         pays2 = rep(indicePays, length(indicePays))))
lesDist2  <-  lesDist2[-which(lesDist2$pays1 == lesDist2$pays2),]
lesDist2$ID <- paste(sprintf("%04d", lesDist2$pays1), sprintf("%04d", lesDist2$pays2), sep = "")
lesDist2$ID2 <- paste(sprintf("%04d", lesDist2$pays2), sprintf("%04d", lesDist2$pays1), sep = "")
lesDist2$match <- match(lesDist2$ID, lesDist$ID)
lesDist2[is.na(lesDist2$match),"match"] <- match(lesDist2$ID2[is.na(lesDist2$match)], lesDist$ID)
# Index the column as a vector so this also works if lesDist is a tbl_df
lesDist2$dist <- lesDist$dist[lesDist2$match]
lesDist2 <- lesDist2[,c("pays1", "pays2", "dist")]

lesDist <- lesDist2

# Let's add countries names
lesDist$pays1 <- world.map$names[lesDist$pays1]
lesDist$pays2 <- world.map$names[lesDist$pays2]
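The `splitNA` helper used above relies on the fact that `map()` returns all the polygons in one long vector, separated by `NA` values. A toy vector (made up for illustration) shows what it does:

```r
splitNA <- function(x){
  idx <- 1 + cumsum(is.na(x)) # group number increases after each NA
  not.na <- !is.na(x)
  split(x[not.na], idx[not.na])
}

# Three "polygons" separated by NA values
x <- c(1, 2, NA, 3, 4, 5, NA, 6)
parts <- splitNA(x)

length(parts) # 3
parts[[2]]    # 3 4 5
```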

There you go. If you wish to use these distances, here is the CSV file. You could also download the RData file.

> load(url("http://egallic.fr/R/Blog/Cartes/countries_distances.RData"))
> head(lesDist)
   pays1        pays2      dist
1 Canada South Africa 11225.350
2 Canada      Denmark  3963.909
3 Canada         USSR  1254.421
4 Canada     Pakistan  7831.515
5 Canada     Aral Sea  6706.607
6 Canada        Italy  4466.916

12 thoughts on “Closest distance between countries”

  1. This was exactly what I needed – sadly the data is ancient. I can’t use distance to USSR when trying to figure out distance from Denmark to a number of former USSR countries 🙁

    Really cool work though

  2. Hi Ewen, this is great, exactly what I needed. However, I found that your data excluded island countries such as the UK, New Zealand, Japan, etc. Your script also didn’t work on the latest version of R (3.4.2). I adapted your code to provide for all countries. Code is here: https://gist.githubusercontent.com/mtriff/185e15be85b44547ed110e412a1771bf/raw/1bb4d287f79ca07f63d4c56110099c26e7c6ee7d/getCountryDist.r and CSV output is here: https://gist.githubusercontent.com/mtriff/185e15be85b44547ed110e412a1771bf/raw/1bb4d287f79ca07f63d4c56110099c26e7c6ee7d/countries_distances.csv. Hope this helps someone else.

    1. Hello Matt!
      Your data is very useful, thank you!
      May I ask how I could calculate the shortest distance between a coordinate and a country, and not just country to country? I’m not familiar with the programming language you’re using here, unfortunately.

      PS: It’s been a few years since your comment has been sent. I hope you see this message 🙂

    2. Many bordering countries have non-zero distance.

      I presume it’s because of the inaccuracies in the data set, but can’t be sure.

      1. Maybe you can define a threshold value? If the distance between bordering countries is lower than that threshold, you can consider that they actually share a border?

  3. On one hand, it’s cool, thanks. On the other hand, how old is this database? Czechoslovakia? No Slovakia? Czechoslovakia has not existed for over 30 years. Also, no Russia? But the USSR is still on it????? It also has not existed for 29 years. Anyways, still cool!

  4. Possibly because it’s a very old list. It still has USSR and other former Eastern-Bloc countries no longer in existence. We had two Germanies, who knows which entry is for which. It seems as though someone renamed East-Germany to Germany. But left all the other non-existing countries there. There used to be a distance between France and East-Germany.
