I also reported on the progress of our research during some morning presentations with the members of the Actinfo Chair (see the blog post). Today, Arthur and I published a working paper on HAL entitled "Étude de la démographie française du XIXe siècle à partir de données collaboratives de généalogie" (French for "Nineteenth-century French demography from collaborative genealogy data"). The paper is written in French. Here is the English abstract:

In the digital age, collaborative data can be collected massively at low costs. Genealogy sites are blooming on the Internet to offer their users the chance to recover their family tree online. The collection and digitalization processes done by these users can potentially be reused in historical demography to complete the knowledge of our ancestors’ past. In our study, based on records of 2,457,450 French or French-born individuals who lived in the nineteenth century, we show that it is possible to find, although some biases sometimes remain, certain results of the literature. We propose to explore the temporal characteristics contained in the family trees to study longevity. We also investigate the spatial characteristics of the data to analyze internal migrations of France.

In this paper, we explore a dataset of 2.45 million individuals: people born in France between 1800 and 1804 and their descendants over three generations. The raw data were huge: more than 700 million lines. Each line represents an event (birth, marriage or death) for an individual in the tree of a geneanet.org user. However, because each user builds their own tree, some individuals appear several times in the database (note that we do not have access to the trees that users chose not to make public). A lot of work went into matching and cleaning the trees, which ultimately left us with 2.45 million people.

Distribution of years of birth.

In the paper, we investigate two aspects: the first uses the temporal characteristics of the data to study the mortality of individuals; the second uses the spatial characteristics to study migratory movements from generation to generation.

A small snapshot of the results is shown in the figure below, which plots our estimates of the survival function (left) and the force of mortality (right). We compared our estimates with those of Vallin and Meslé (2001).

Comparison of survival (left) and force of mortality functions (right) estimated for women and men with historical estimates based on life tables.
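
To make the two quantities concrete, here is a minimal Python sketch of how a survival function and an approximate force of mortality can be computed from ages at death. It is not the method used in the paper: it assumes a complete cohort with integer ages at death and no censoring, it approximates the force of mortality by the central death rate (deaths divided by person-years lived, with deaths spread evenly over the year), and the name `life_table` is purely illustrative.

```python
from collections import Counter


def life_table(ages_at_death):
    """Return {age: (survival, central_death_rate)} for integer ages at death.

    Assumptions (illustrative, not taken from the paper):
    - every individual is observed until death (no censoring);
    - deaths are spread uniformly within each year of age, so the
      person-years lived at age x are at_risk - deaths / 2.
    """
    n = len(ages_at_death)
    deaths = Counter(ages_at_death)
    at_risk = n  # number of individuals alive at exact age x
    table = {}
    for age in range(max(ages_at_death) + 1):
        survival = at_risk / n  # S(x): share of the cohort alive at age x
        d = deaths.get(age, 0)
        # Central death rate m(x), a standard approximation of the
        # force of mortality around age x + 0.5.
        rate = d / (at_risk - d / 2)
        table[age] = (survival, rate)
        at_risk -= d
    return table


if __name__ == "__main__":
    for age, (s, m) in life_table([0, 1, 1, 3]).items():
        print(f"age {age}: S = {s:.2f}, m = {m:.3f}")
```

With real genealogical data, individuals with a missing death date would need special treatment, which this sketch deliberately ignores.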

With regard to migration, we examined, for example, the distances between the birthplaces of ancestors born between 1800 and 1804 and those of their descendants. The figure below shows the distribution of these distances, with a logarithmic scale on the x-axis.

Migration between generations.
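
As an illustration of this kind of computation, the following Python sketch measures the great-circle distance between two birthplaces and groups distances by powers of ten, which is what a logarithmic axis does visually. It is only a sketch under stated assumptions: birthplaces are assumed to be already geocoded as latitude and longitude in degrees, the haversine formula on a spherical Earth is my choice rather than something stated in the paper, and the names `haversine_km` and `log10_bins` are hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1 to guard against rounding errors near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def log10_bins(distances_km):
    """Count distances per decade: key k covers [10**k, 10**(k+1)) km.

    Zero distances (same birthplace) have no logarithm and are
    counted under the key None.
    """
    counts = Counter()
    for d in distances_km:
        key = math.floor(math.log10(d)) if d > 0 else None
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    # Approximate coordinates of Paris and Marseille (illustrative only).
    d = haversine_km(48.8566, 2.3522, 43.2965, 5.3698)
    print(f"Paris to Marseille: {d:.0f} km")
    print(log10_bins([0, 0.5, 5, 50, 500, 700]))
```

Descendants born in the same place as their ancestor have a distance of zero, so they have to be set aside before taking logarithms, as the `None` key does here.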

The rest of the paper is available online on HAL. We also provide a companion online methodology annex published on GitHub.


Vallin, J. et Meslé, F. (2001). Tables de mortalité françaises pour les XIXe et XXe siècles et projections pour le XXIe siècle. Éditions de l’Institut national d’études démographiques.

    Thanks for the link, Fr.! The study by Kaplanis and co-authors in Nature is generating quite a bit of buzz. They have very interesting results and seem to have pretty amazing spatial coverage!

