Tag Archives: Statistiques


Meetup Machine Learning Aix-Marseille S04E02

Meetup Machine Learning Aix-Marseille
Tonight I am taking part in the Machine Learning Aix-Marseille Meetup, for the second session of its fourth season. I am speaking after Leonardo Noleto, senior data scientist at Bleckwen, a FinTech company developing a solution to fight financial fraud with machine learning. I will present the project that Enora Belz, Romain Gaté, Vincent Malardé, Jimmy Merlet, Arthur Charpentier and I worked on last summer for the 2018 Football World Cup (see a previous post). The idea was to use machine learning techniques to predict the outcome of football matches (win, draw or defeat).

The slides are available here (in French): http://www.egallic.fr/Recherche/Worldcup_2018/2018_meetup_ML/egallic_meetup.html


2018 World Cup: Paul the octopus is back

FIFA World Cup 2018

On the occasion of Euro 2008 and the 2010 World Cup, the Oberhausen oracle (better known as “Paul the octopus”) made the headlines. His correct predictions of the German team's results at Euro 2008 and his pick of the 2010 World Cup winner (Spain) are still etched in memory. With some colleagues (Enora Belz, Romain Gaté, Vincent Malardé and Jimmy Merlet), we tried to carry on the work of the late Paul the octopus and predict the outcome of the upcoming matches of the 2018 World Cup. To do this, we rely on the results of past World Cup and continental cup matches.


Historical Demographics and Collaborative Data

Genealogy of Victor Hugo

A few months ago, I mentioned in a blog post that I had presented the beginnings of the work undertaken with Arthur Charpentier on historical demographics using collaborative data from a genealogy website, geneanet.org. I also reported on the progress of our research during some morning presentations with the members of the Actinfo Chair (see the blog post). Today, Arthur and I published a working paper on HAL titled “Étude de la démographie française du XIXe siècle à partir de données collaboratives de généalogie” (French for “Nineteenth-century French demography from collaborative genealogy data”). The paper is written in French.


Where’s Waldo? Here he is!

Yesterday, I came across a nice article titled “Here’s Waldo: Computing the optimal search strategy for finding Waldo”, written by Randal S. Olson. I used the data he shared to apply a correction to the kernel density estimation of Waldo’s location.

In this article, Randal explains that he devoted some time to computing the optimal search strategy for finding Waldo. To that end, he used some machine learning techniques.
From an image provided by Slate (Here’s Waldo, 2013, by Ben Blatt), Randal S. Olson retrieved the coordinates of 68 different locations of Waldo, and kindly shared the data afterwards.
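To make the idea of a kernel density estimate of Waldo’s location more concrete, here is a minimal sketch of a plain two-dimensional Gaussian kernel density estimator in Python. Everything in it is an assumption made for illustration: the page dimensions, the randomly generated placeholder points (the 68 real coordinates are not reproduced here), the function names and the choice of Scott's rule for the bandwidth. It does not implement the correction mentioned above.

```python
import numpy as np


def scott_bandwidth(points):
    """Per-axis bandwidths from Scott's rule: sigma * n ** (-1 / (d + 4))."""
    n, d = points.shape
    return points.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))


def gaussian_kde_2d(points, grid_x, grid_y, bandwidth):
    """Evaluate a product-Gaussian KDE on a rectangular grid.

    points    : array of shape (n, 2) holding (x, y) coordinates
    bandwidth : pair (hx, hy) of per-axis bandwidths
    Returns an array of shape (len(grid_y), len(grid_x)).
    """
    points = np.asarray(points, dtype=float)
    hx, hy = bandwidth
    # Standardised distance between each grid coordinate and each point.
    zx = (grid_x[:, None] - points[None, :, 0]) / hx
    zy = (grid_y[:, None] - points[None, :, 1]) / hy
    kx = np.exp(-0.5 * zx ** 2) / (np.sqrt(2.0 * np.pi) * hx)
    ky = np.exp(-0.5 * zy ** 2) / (np.sqrt(2.0 * np.pi) * hy)
    # Average over the points of the product of the two 1D kernels.
    return ky @ kx.T / len(points)


# Placeholder data: 68 random points on a hypothetical 13 x 8 page.
# These are NOT the real Waldo coordinates.
PAGE_WIDTH, PAGE_HEIGHT = 13.0, 8.0
rng = np.random.default_rng(0)
points = rng.uniform([0.0, 0.0], [PAGE_WIDTH, PAGE_HEIGHT], size=(68, 2))

grid_x = np.linspace(0.0, PAGE_WIDTH, 130)
grid_y = np.linspace(0.0, PAGE_HEIGHT, 80)
density = gaussian_kde_2d(points, grid_x, grid_y, scott_bandwidth(points))

# Approximate share of the estimated probability mass that lies on the page.
cell_area = (grid_x[1] - grid_x[0]) * (grid_y[1] - grid_y[0])
mass_on_page = density.sum() * cell_area
print(f"Estimated mass inside the page: {mass_on_page:.3f}")
```

Because a Gaussian kernel has unbounded support, this uncorrected estimator places part of the probability mass outside the page, so the printed value is below 1. That kind of leakage near the borders is one common reason to correct a kernel density estimate on a bounded region.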



[L3 Eco-Gestion] Linear regression with R: model selection

After a quick introduction to multiple linear regression with R, and a brief discussion of multicollinearity problems, we now turn to various techniques that can be used to select a model. Of course, many others exist. The goal here is to give a quick overview.


Lights!

The other day, as I was leaving the university's copy room, I got a mild telling-off. Someone told me that it is annoying to have to get up and turn the lights back on every time I come by (this person works in the office next door). I found the remark odd, having always been convinced that it was normal to switch off the lights when leaving a room.