The other day during my R lecture, something I did not expect happened… I should have known, or at least guessed, that it would not work… but I wanted to believe it would. When I gave examples of how to read both English- and French-formatted dates in R, what worked perfectly on Mac OS X did not work on Windows 7. The example was the following:

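# French date: abbreviated weekday, day, abbreviated month, year
d_char_fr <- "Mer 04 Fév 2015"
# The same date written in English
d_char <- "Wed 04 Feb 2015"
as.Date(d_char_fr, format = c("%a %d %b %Y"))
as.Date(d_char, format = c("%a %d %b %Y"))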

Both returned NA on Windows, while on my own computer, running Mac OS X, the first one returned "2015-02-04" as expected. So, what is the problem here? We can blame Mac for not respecting French abbreviations! Indeed, running the following two lines on Windows shows that the French locale abbreviates months differently (and weekdays too, e.g. "mer." rather than "Mer"):

> (d <- seq(as.Date("2014-01-01"), as.Date("2014-12-01"), by = "month"))
 [1] "2014-01-01" "2014-02-01" "2014-03-01" "2014-04-01"
 [5] "2014-05-01" "2014-06-01" "2014-07-01" "2014-08-01"
 [9] "2014-09-01" "2014-10-01" "2014-11-01" "2014-12-01"
> format(d, "%b")
 [1] "janv." "févr." "mars"  "avr."  "mai"   "juin"  "juil."
 [8] "août"  "sept." "oct."  "nov."  "déc."
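To check which abbreviations your own machine expects, you can ask R to format one date per month and one full week in the current LC_TIME locale. These are the names %b and %a refer to, so they are the safest spellings to feed back to as.Date(). A minimal sketch; current_abbreviations() is a hypothetical name, and the output will differ between Windows, Mac OS X and Linux, which is precisely the point:

```r
# List the month and weekday abbreviations of the current LC_TIME locale,
# i.e. the names that %b and %a refer to when as.Date() parses a string.
current_abbreviations <- function() {
  # One date per month of 2014
  months <- seq(as.Date("2014-01-01"), by = "month", length.out = 12)
  # 2014-01-06 was a Monday, so these seven dates run from Monday to Sunday
  days <- seq(as.Date("2014-01-06"), by = "day", length.out = 7)
  list(locale   = Sys.getlocale("LC_TIME"),
       months   = format(months, "%b"),
       weekdays = format(days, "%a"))
}
current_abbreviations()
```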

How can we get to the bottom of this issue? I have no elegant answer to that question, but I've come up with something that does the job. Let's create a function that takes a date string as an argument, along with a vector of incorrect abbreviations and a vector of the corresponding correct abbreviations.

# Weekdays: abbreviations as written in our data, then as expected by R
jours_ab <- c("Lun", "Mar", "Mer", "Jeu", "Ven", "Sam", "Dim")
jours_ab_correcte <- c("lun.", "mar.", "mer.", "jeu.", "ven.", "sam.", "dim.")
# Months: abbreviations as written in our data, then as expected by R
mois_ab <- c("jan", "fév", "mar", "avr", "mai", "jui", "jul",
             "aou", "sep", "oct", "nov", "déc")
mois_ab_correcte <- c("janv.", "févr.", "mars", "avr.", "mai", "juin", "juil.",
                      "août", "sept.", "oct.", "nov.", "déc.")

library(stringr)
# Replaces the first incorrect abbreviation found by the correct one
# @string:        input character string (a single date)
# @abbreviation:  string vector of abbreviations to replace
# @correct_abb:   string vector of corresponding correct abbreviations
replace_abb <- function(string, abbreviation, correct_abb){
  # Case-insensitive search for every abbreviation in the string
  ind <- str_detect(string, regex(abbreviation, ignore_case = TRUE))
  if(any(ind)){
    # Keep only the first matching abbreviation
    ind <- which(ind)[1]
    str_replace(string, regex(abbreviation[ind], ignore_case = TRUE),
                correct_abb[ind])
  }else{
    string
  }
}

Let us use this function on our incorrectly formatted date:

d_char_fr <- replace_abb(d_char_fr, jours_ab, jours_ab_correcte)
d_char_fr <- replace_abb(d_char_fr, mois_ab, mois_ab_correcte)

Our character date ends up looking like this:

> d_char_fr
[1] "mer. 04 févr. 2015"

Let's transform this string into a Date object:

> as.Date(d_char_fr, format = c("%a %d %b %Y"))
[1] "2015-02-04"
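One limitation I can see in replace_abb(): it searches the whole string, so a date such as "Mar 03 Mar 2015" (a Tuesday in March) would, as far as I can tell, end up as "mars. 03 Mar 2015", because the month pattern "mar" first matches the weekday that was just rewritten to "mar.". A possible workaround, sketched below under the assumption that every date follows the "%a %d %b %Y" layout, is to look only at the first token (weekday) and the third token (month). normalise_fr_date() is a hypothetical helper, not the approach used above:

```r
# Hypothetical alternative to replace_abb(): only the weekday (1st token) and
# the month (3rd token) of a "%a %d %b %Y" date are rewritten, so "Mar"
# (mardi) can never be mistaken for "mar" (mars).
normalise_fr_date <- function(x) {
  # Lower-case abbreviations as typed -> abbreviations expected by R
  # (same spellings as jours_ab / mois_ab and their corrected versions)
  jours <- c(lun = "lun.", mar = "mar.", mer = "mer.", jeu = "jeu.",
             ven = "ven.", sam = "sam.", dim = "dim.")
  mois <- c(jan = "janv.", "fév" = "févr.", mar = "mars", avr = "avr.",
            mai = "mai", jui = "juin", jul = "juil.", aou = "août",
            sep = "sept.", oct = "oct.", nov = "nov.", "déc" = "déc.")
  # Return the corrected abbreviation, or the token unchanged if unknown
  lookup <- function(token, table) {
    hit <- table[tolower(token)]
    if (is.na(hit)) token else unname(hit)
  }
  # Split each date on spaces and rebuild it with the corrected tokens
  parts <- strsplit(x, " ", fixed = TRUE)
  vapply(parts, function(p) {
    # Anything that is not "weekday day month year" becomes NA
    if (anyNA(p) || length(p) != 4) return(NA_character_)
    p[1] <- lookup(p[1], jours)
    p[3] <- lookup(p[3], mois)
    paste(p, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}
```

With this version, normalise_fr_date("Mar 03 Mar 2015") gives "mar. 03 mars 2015", which as.Date(..., format = "%a %d %b %Y") should then parse under the French Windows locale. Since it works on a whole vector, a data frame column can be cleaned in a single call.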

So this function solves the first problem we encountered during the lesson. There was another one: locale names in R. I am used to Mac OS and didn't expect locale names to be so different on Windows (e.g. "French_France" there, versus "fr_FR.UTF-8" on Mac OS X)... What happens if we try to convert our second date string into a Date object?

> as.Date(d_char, format = c("%a %d %b %Y"))
[1] NA

Of course, we get NA, because the weekday and month are spelled in English while our time locale setting is French. So we need to change this time locale setting:

> Sys.setlocale("LC_TIME", "English_United States")
[1] "English_United States.1252"
> as.Date(d_char, format = c("%a %d %b %Y"))
[1] "2015-02-04"
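Locale names are exactly where Windows and Mac OS X part ways: "English_United States" works on Windows, while Mac OS X and Linux use names such as "en_US.UTF-8". If the same script has to run on both, one option is to try several spellings and keep the first one the system accepts. A sketch; set_time_locale() is a hypothetical helper, and which candidate names are available depends on what is installed on your machine:

```r
# Hypothetical helper: try several spellings of a time locale and keep the
# first one the operating system accepts.
set_time_locale <- function(candidates) {
  for (loc in candidates) {
    # Sys.setlocale() returns "" (with a warning) when the OS rejects the name
    res <- suppressWarnings(Sys.setlocale("LC_TIME", loc))
    if (nzchar(res)) return(invisible(res))
  }
  stop("none of the candidate locales is available: ",
       paste(candidates, collapse = ", "))
}
```

For instance, set_time_locale(c("English_United States", "en_US.UTF-8")) before calling as.Date(d_char, format = "%a %d %b %Y") should work on Windows as well as on Mac OS X, assuming an English locale is installed.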

Now be careful: once you have changed this setting, don't expect to be able to convert a date string written with French month and weekday names into a Date object. To do so, we need to set the time locale back to French:

> as.Date(d_char_fr, format = c("%a %d %b %Y"))
[1] NA
> Sys.setlocale('LC_TIME', "French_France")
[1] "French_France.1252"
> as.Date(d_char_fr, format = c("%a %d %b %Y"))
[1] "2015-02-04"
> as.Date(d_char, format = c("%a %d %b %Y"))
[1] NA
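Since only one LC_TIME locale is active at a time, a vector mixing French and English dates cannot be converted in a single as.Date() call. A possible sketch: try each locale in turn on the dates that are still NA, and restore the original setting when done. parse_multi_locale() is a hypothetical helper, and the locale names you pass to it depend on your operating system:

```r
# Hypothetical helper: parse dates written in different languages by trying
# each LC_TIME locale in turn on the dates that are still NA.
parse_multi_locale <- function(x, format, locales) {
  # Start with an all-NA Date vector of the right length
  out <- structure(rep(NA_real_, length(x)), class = "Date")
  # Remember the current setting and restore it however the function exits
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  for (loc in locales) {
    todo <- is.na(out)
    if (!any(todo)) break
    # Skip locales the operating system does not know ("" is returned)
    if (!nzchar(suppressWarnings(Sys.setlocale("LC_TIME", loc)))) next
    out[todo] <- as.Date(x[todo], format = format)
  }
  out
}
```

On Windows, parse_multi_locale(c(d_char_fr, d_char), "%a %d %b %Y", c("French_France", "English_United States")) should then return both dates, with the French string being the corrected "mer. 04 févr. 2015". The on.exit() call is what guarantees we never leave the session in the wrong locale; if you already use the withr package, withr::with_locale() offers the same guarantee for a single locale.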

To finish this short note, I'd like to remind you that this can easily be accomplished with the awesome lubridate package (after library(lubridate)), whose dmy() function takes a locale argument:

> dmy("mer. 04 févr. 2015", locale = "french_france")
[1] "2015-02-04 UTC"
> dmy("Wed 04 Feb 2015", locale = "english_us")
[1] "2015-02-04 UTC"

3 thoughts on “French dates in R – From Mac OS to Windows”

  1. I came across this post by chance.

    Very nice explanation of the VIF (I wish I had read your post a few years ago), but I don't understand the title 🙂

  2. Thanks for the help. It solved my problem.
    I work on a French Windows system and I wanted to convert my date column from character to date format, but got NA until I changed the locale:
    Sys.setlocale("LC_TIME", "English_United States")
    [1] "English_United States.1252"
    Now
    data_res$date <- as.Date(data_res$date, "%d%b%Y")
    works perfectly fine.
