This tutorial aims to introduce RMarkdown, a working environment for creating documents in data science.
In a single RMarkdown file, it is possible to write code, execute it, and then produce reports (mixing text, code, and the results of evaluating that code) to be shared. The code is usually R, but not necessarily: instructions written in other languages, such as Python or SQL, can also be evaluated. Many output formats are available; among the most used are HTML, PDF, Word, notebooks and ioslides presentations.
The {rmarkdown} package can be installed using the following instruction.
install.packages("rmarkdown")
The main reference document on which this guide is based is the R Markdown Cookbook, written by Yihui Xie, Christophe Dervieux and Emily Riederer (Chapman & Hall/CRC, 2020). An electronic version is available free of charge at the following address: https://bookdown.org/yihui/rmarkdown-cookbook/.
A two-page cheat sheet on R Markdown has been produced by RStudio: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf.
On RStudio, it is good practice to work with projects. First, create an RStudio project, following the tree structure shown in the image below.
Creating a project
In RStudio:
1. Click on the File menu, then on New Project....
2. Choose New Directory, then New Project.
3. Use the Browse... button to choose where the project directory will be created, then click on the Open button.
4. Click on the Create Project button to create the project. This opens a new RStudio session, and the working directory becomes the one in which you created the project. A file with the .Rproj extension has been created in this directory: in the future, simply open this file to launch RStudio and work on this project.
On the University computers, make sure you create the project in the C:/Workout folder. Compiling on VDI (virtual desktop infrastructure) seems to be impossible at the moment.
Please note: at the end of the session, remember to copy the entire directory containing your project into your Documents folder. The contents of the C:/Workout folder are deleted when you log out.
Now that the project is created, it is time to create an R Markdown document.
Creating an R Markdown document
In RStudio:
1. Click on the File menu, then on New File....
2. Click on R Markdown....
3. In the window that appears, select the HTML option so that the report created afterwards is an HTML document (HTML being a language designed for presenting web pages), then click on the OK button.
4. Save the newly created file, giving it a name of your choice (e.g. first_report.Rmd).
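An RMarkdown document, with the .Rmd extension, is then created. This document consists of three parts:
- a YAML header, delimited by two lines of three dashes (---), which contains the document's metadata;
- text written in Markdown;
- pieces of code (chunks), which can be executed.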
In the document you have just created, the YAML header reads:
---
title: "My first R Markdown document"
author: "Ewen Gallic"
date: "2/7/2022"
output: html_document
---
The document title, author and date are specified in this header. When the Rmd file is converted to an html file by Pandoc (document conversion software), this information is stored in variables and appears in one or more places in the html file (depending on the template used). The output: html_document line indicates that the output document will be an html document. Other elements can be specified, notably in the output part: a table of contents, section numbering, a style sheet, etc.
In a few words, the conversion steps are as follows:
1. The knit() function of the {knitr} package executes the code contained in the chunks and produces a Markdown file (.md).
2. Pandoc converts this .md file into the requested output format (html, for instance).
If the output format is pdf, an additional step is added: the .md file is converted into a LaTeX file (.tex), which is then compiled by LaTeX to obtain the final pdf file. This requires LaTeX (for example, the TinyTeX distribution) to be installed on your system.
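To see the first step in isolation, {knitr} can be called directly. The sketch below assumes the {knitr} package is installed; it writes a small, hypothetical demo.Rmd file to a temporary directory and runs knitr::knit() on it, which returns the path of the Markdown file it wrote. The Pandoc step that follows is normally left to rmarkdown::render().

```r
# Sketch: run only the {knitr} step of the conversion on a throwaway file.
# The file name "demo.Rmd" is hypothetical; any .Rmd file would do.
rmd_file <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "x <- c(20, 18, 15, 16)",
  "mean(x)",
  "```"
), rmd_file)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() evaluates the chunks and writes a Markdown file next to the input
  md_file <- knitr::knit(rmd_file,
                         output = sub("\\.Rmd$", ".md", rmd_file),
                         quiet = TRUE)
  # The .md file contains the code followed by its output ("## [1] 17.25")
  cat(readLines(md_file), sep = "\n")
}
```

Opening the resulting .md file shows what Pandoc receives: plain Markdown in which each chunk has been replaced by its code and its printed result.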
To add a table of contents to an html file, key-value pairs are added under html_document:
- toc: yes requests the creation of a table of contents;
- toc_depth: 3 sets the depth of the table of contents (1: sections only, 2: sections and sub-sections, 3: sections, sub-sections and sub-sub-sections, etc.);
- toc_float: true inserts the table of contents as a floating object, permanently visible throughout the document.
---
title: "My first R Markdown document"
author: "Ewen Gallic"
date: "2/7/2022"
output:
  html_document:
    toc: yes
    toc_depth: 3
    toc_float: true
---
Please note that indentations must be respected, as in the previous example.
For a table of contents in a final pdf document, the pairs toc: yes and toc_depth: 3 are placed under pdf_document:
---
title: "My first R Markdown document"
author: "Ewen Gallic"
date: "2/7/2022"
output:
  pdf_document:
    toc: yes
    toc_depth: 3
---
The .Rmd
file can contain text that is written in markdown. More information will be given later in this sheet about markdown syntax (which is very simple).
The pieces of code contain two parts:
To be executed, the code relies on an environment (in which variables can be created). Executing the code can modify this environment, for instance by creating or overwriting objects.
To compile an R Markdown document, once the YAML header is correctly specified, you can:
- call the render() function of the {rmarkdown} package, giving the path to the Rmd file via the input argument: rmarkdown::render(input = "your_document_rmarkdown.Rmd");
- click on the Knit button (easy to spot thanks to its knitting needle and ball of wool icon);
- press the Ctrl / Cmd + Shift + K keys simultaneously.
The last two solutions display the result in a window that opens at the end of the compilation.
Compile your first R Markdown file in one of the three ways shown, then look at the result.
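The first option is convenient in scripts. The sketch below assumes {rmarkdown} is installed and Pandoc is available (RStudio ships with its own copy); the report it compiles is a throwaway file written to a temporary directory. Passing an output format object such as rmarkdown::html_document(toc = TRUE, toc_depth = 3) to the output_format argument overrides the output field of the YAML header, which can be handy for trying out options.

```r
# Sketch: compile a minimal R Markdown file from the console.
# "first_report.Rmd" is written to a temporary directory for the demo.
rmd_file <- file.path(tempdir(), "first_report.Rmd")
writeLines(c(
  "---",
  "title: \"My first R Markdown document\"",
  "output: html_document",
  "---",
  "",
  "# Introduction",
  "",
  "The average grade is `r mean(c(20, 18, 15, 16))`."
), rmd_file)

# Only attempt the compilation when both {rmarkdown} and Pandoc are present
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() knits the file, then calls Pandoc; it returns the output path
  html_file <- rmarkdown::render(
    input = rmd_file,
    output_format = rmarkdown::html_document(toc = TRUE, toc_depth = 3),
    quiet = TRUE
  )
  print(html_file)
}
```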
The text parts that add narrative to the reports can be written in markdown. The syntax is very simple.
Simply write as you would in a text editor, and the text will be displayed in the final report.
Ending a line with two spaces creates a line break.
Leaving a blank line between two texts creates a new paragraph.
| Style | Syntax | Example | Rendering |
|---|---|---|---|
| Bold | `** **` or `__ __` | `**bold** text` | **bold** text |
| Italic | `* *` or `_ _` | `A word in *italic*` | A word in *italic* |
| Strikethrough | `~~ ~~` | `I ~~like~~ love R` | I ~~like~~ love R |
| Italics inside bold | `**_ _**` | `A **_very_ important text**` | A **_very_ important text** |
| Bold and italics | `*** ***` | `***新年快樂*** (Xin nian kuai le)` | ***新年快樂*** (Xin nian kuai le) |
| Superscript | `^ ^` | `January 1^st^` | January 1^st^ |
There are six levels of headings in R Markdown documents. The heading text is preceded by as many hash symbols (#) as the desired level, followed by a space.
# Level 1 title
## Level 2 title
### Level 3 title
#### Level 4 title
##### Level 5 title
###### Level 6 title
Notes :
To create a block quotation, the quotation must be preceded by the > symbol, placed at the beginning of the line. Example with > chúc mừng năm mới:
chúc mừng năm mới
For a quotation containing several paragraphs, a > must also be added at the beginning of the empty lines separating them.
> “How can two people hate so much without knowing each other?”
>
> --- Alan Moore, _The Killing Joke_
“How can two people hate so much without knowing each other?”
— Alan Moore, The Killing Joke
To insert an em dash (long dash), three dashes are used: ---; for an en dash (short dash), two dashes are used: --.
| Desired symbol | Syntax | Example | Rendering |
|---|---|---|---|
| Em dash (long dash) | `---` | `--- a line` | — a line |
| En dash (short dash) | `--` | `The France--Italy border` | The France–Italy border |
Typing three dashes (---) alone on a line, with a blank line before it, inserts a horizontal line.
To write an ellipsis, just type three dots (...) in a row…
A hyperlink is created using two elements: the text to click on, enclosed in square brackets ([]), and the address the link points to, enclosed in parentheses (()).
Look at this [wonderful video](https://www.youtube.com/watch?v=dQw4w9WgXcQ).
Look at this wonderful video.
To create a link that simply displays the URL, it is possible to write the URL as is. However, it is preferable to enclose it in angle brackets (<>). The same applies to an email address.
<https://www.youtube.com/watch?v=oavMtUWDBTM>
<ewen.gallic@univ-amu.fr>
https://www.youtube.com/watch?v=oavMtUWDBTM
ewen.gallic@univ-amu.fr
To create an anchor (a link to a specific location in the page being displayed) pointing to a heading of the document, you need to know the identifier of that heading. A simple way is to define it yourself, using the following syntax:
# Title {#name-of-the-ref}
The identifier must not contain spaces or underscores (_). It may, however, as in the example, contain dashes (-).
In this document, the sub-section in which this text is included is defined as follows:
## Hyperlinks {#liens-hypertextes}
This makes it easy to link to this [section](#liens-hypertextes).
This makes it easy to link to this section.
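When a document has many headings, identifiers can be generated rather than typed. The helper below, make_anchor(), is hypothetical (it is not part of any package); it turns a heading text into an identifier that follows the rules stated above: lowercase, no spaces, no underscores, words separated by dashes. Accented letters are transliterated with base R's iconv(), whose exact output depends on the platform, so the result is worth checking.

```r
# Hypothetical helper: build a heading identifier such as "liens-hypertextes"
# from free text, following the rules above (no spaces, no underscores).
make_anchor <- function(title) {
  id <- tolower(title)
  # Try to transliterate accented letters to ASCII (platform-dependent);
  # characters that cannot be converted are dropped.
  ascii <- iconv(id, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "")
  if (!is.na(ascii)) id <- ascii
  id <- gsub("[^a-z0-9]+", "-", id)   # spaces, underscores, punctuation -> "-"
  id <- gsub("^-+|-+$", "", id)       # trim leading/trailing dashes
  id
}

make_anchor("Liens hypertextes")
make_anchor("Code chunks_and options!")
```

The two calls return "liens-hypertextes" and "code-chunks-and-options", which can then be written after the heading as {#liens-hypertextes}.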
Numbered footnotes are inserted using square brackets ([]) containing a caret (^) and a reference, which can be either a number or text (without spaces or other blank characters).
The footnote number is a link to the note itself. When the output is an html document, a return arrow is also provided to go back to the text.
A simple footnote[^1] followed by a longer note[^long-note].

[^1]: the footnote.

[^long-note]: a longer footnote.

    In which a paragraph can be written.

    `{ some code }`

    Several paragraphs can even be written.
There are two types of lists: ordered and unordered.
To create an ordered list, a number is placed at the beginning of the line in front of each item in the list, followed immediately by a full stop and a space.
1. A first element
2. A second element
3. A third element.
The numbering does not need to be sequential:
1. A first element
10. A second element
5. A third element.
The number of the first element of the ordered list sets the starting value of the counter:
4. A first element
10. A second element
5. A third element.
To insert an unordered list, precede each element with the - symbol or the * symbol, followed by a space.
A list including:
* A first element.
* A second element.
* A third element.
To add a list within a list, either a tab or 4 spaces must be added before the dash or star.
- A first element.
- A second element:
    - Which contains a sub-element.
    - And a second sub-element.
    - And a third one.
- A third element.
To write a paragraph inside a list item, a tab or 4 spaces must be added before it to maintain the continuity of the list. The paragraph must also be preceded by an empty line (an empty line can also be added after the paragraph, but this is optional).
- A first element.
- A second element:

    This element contains a paragraph.

- A third element.
A first element.
A second element:
This element contains a paragraph.
A third element.
It is perfectly possible to nest an ordered list in an unordered list and conversely.
1. A first element:
    - With a sub-element.
2. A second element.
An image is added by inserting an exclamation mark (!), followed by a caption in square brackets, and then the path to the image in parentheses (()). A description of the image can be added in double quotes ("") after the path, still within the parentheses (this description is displayed when the mouse pointer hovers over the image for a few seconds, and can be read aloud by assistive technologies for people with disabilities). Finally, image size parameters can be specified in curly braces ({}).
![A Tiger](figs/tigre.JPG "Paper tiger"){width="200px"}
A table is created by separating the header from the body of the table. Cells are separated with a vertical bar (|, typed with Alt Gr + 6 on an AZERTY keyboard under Windows, and Alt + Shift + L on Mac OS). Vertical bars are also added at the beginning and end of each row. To separate the header from the body, three or more dashes are placed between the vertical bars.
| First column | Second column | Third column |
| ------------ | ------------- | ------------ |
| cell 1, row 1 | cell 2, row 1 | cell 3, row 1 |
| cell 1, row 2 | cell 2, row 2 | cell 3, row 2 |
| cell 1, row 3 | cell 2, row 3 | cell 3, row 3 |
First column | Second column | Third column |
---|---|---|
cell 1, row 1 | cell 2, row 1 | cell 3, row 1 |
cell 1, row 2 | cell 2, row 2 | cell 3, row 2 |
cell 1, row 3 | cell 2, row 3 | cell 3, row 3 |
Column alignment is controlled by the second line, the one made of dashes. For a left-aligned column, a colon (:) is added to the left of the dashes; for a right-aligned column, a colon is added to the right; and for a centred column, a colon is added on both sides.
| Left-aligned column | Centred column | Right-aligned column |
| :------------------ | :------------: | -------------------: |
| cell 1, row 1 | cell 2, row 1 | cell 3, row 1 |
| cell 1, row 2 | cell 2, row 2 | cell 3, row 2 |
| cell 1, row 3 | cell 2, row 3 | cell 3, row 3 |
Left-aligned column | Centred column | Right-aligned column |
---|---|---|
cell 1, row 1 | cell 2, row 1 | cell 3, row 1 |
cell 1, row 2 | cell 2, row 2 | cell 3, row 2 |
cell 1, row 3 | cell 2, row 3 | cell 3, row 3 |
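Pipe tables can also be generated from R data rather than typed by hand. In practice, knitr::kable() does this (for example knitr::kable(df, format = "pipe", align = c("l", "c", "r"))). The base-R sketch below, built around a hypothetical pipe_table() function, only shows how the alignment row follows from the colon rules described above.

```r
# Hypothetical helper: build a Markdown pipe table from a data frame.
# align: one of "l", "c", "r" per column, mapped to the colon rules above.
pipe_table <- function(df, align = rep("l", ncol(df))) {
  stopifnot(length(align) == ncol(df), all(align %in% c("l", "c", "r")))
  row_line <- function(cells) paste0("| ", paste(cells, collapse = " | "), " |")
  rule <- c(l = ":---", c = ":---:", r = "---:")[align]
  c(
    row_line(names(df)),                   # header
    row_line(rule),                        # alignment row
    apply(df, 1, function(r) row_line(r))  # body, one line per row
  )
}

cells <- data.frame(
  "Left-aligned column" = c("cell 1, row 1", "cell 1, row 2"),
  "Centred column" = c("cell 2, row 1", "cell 2, row 2"),
  "Right-aligned column" = c("cell 3, row 1", "cell 3, row 2"),
  check.names = FALSE
)
cat(pipe_table(cells, align = c("l", "c", "r")), sep = "\n")
```

The second printed line is | :--- | :---: | ---: |, i.e. left, centred and right alignment, exactly as written by hand in the example above.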
It is possible to use LaTeX to write mathematical equations in Markdown. There are two types of formulas:
- inline formulas, written between single dollar signs ($equation$);
- display formulas, written between double dollar signs ($$equation$$).
Inline formulas, $f(x)=x^2+3x+4, x\in\mathbb{R}$, allow you to insert formulas in a paragraph, while _display_ formulas allow you to insert formulas in a separate block $$X\sim\mathcal{N}(0,1).$$
Inline formulas, \(f(x)=x^2+3x+4, x\in\mathbb{R}\), allow you to insert formulas in a paragraph, while display formulas allow you to insert formulas in a separate block \[X\sim\mathcal{N}(0,1).\]
A cheat sheet by Jim Hefferon for writing formulas in LaTeX is available on the Comprehensive TeX Archive Network (CTAN): http://tug.ctan.org/info/undergradmath/undergradmath.pdf.
There are two ways to present code in RMarkdown: inline code, inserted directly into the text, or code blocks. To write inline code, place it between two backticks (` `, typed with Alt Gr + 7 on an AZERTY keyboard under Windows, or with the ` key under Mac OS). To get syntax highlighting, also indicate the language in curly braces ({}) right after the closing backtick, preceded by a dot ({.language-name}):
The function `mean(x)`{.R} is used to calculate the mean of `x`.
The `SELECT * FROM base`{.SQL} statement retrieves all the columns of `base` in SQL.
The function mean(x) is used to calculate the mean of x.
The SELECT * FROM base statement retrieves all the columns of base in SQL.
To insert blocks of code, three backticks (```) are used before and after the code:
```
x <- mean(ventes$Sales, na.rm=TRUE)
```
Which will produce:
x <- mean(ventes$Sales, na.rm=TRUE)
Again, if the language of the code contained in the block is specified, syntax highlighting is obtained. The name of the language is given immediately after the three opening backticks.
```r
x <- mean(ventes$Sales, na.rm=TRUE)
```
Will give:
x <- mean(ventes$Sales, na.rm=TRUE)
The inline code and code blocks written in the section “Codes and code blocks” are never executed. One of the main advantages of RMarkdown is the ability to evaluate code (we will only run R here, but other languages can be run in an RMarkdown file), display it, and display the results and output of its evaluation.
Again, we can distinguish between two types of code: inline code and code chunks.
Inline code is delimited with a backtick at the beginning and one at the end, as in Markdown. The only difference is that the name of the language appears immediately after the first backtick. The code itself is not shown: only its result appears in the final document. Let's take a simple example.
The average grade is `r mean(c(20, 18, 15, 16))`.
The average grade is 17.25.
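Before being inserted in the sentence, the value of an inline expression is turned into text. The base-R sketch below is a rough approximation of this step: inline_text() is a hypothetical helper, and knitr's actual formatting rules (rounding, scientific notation) are more elaborate. It formats the value and collapses vectors with commas, which is consistent with how a vector such as x is displayed inline (20, 18, 15, 16).

```r
# Hypothetical approximation of how an inline result becomes text:
# format the value, then collapse vectors with ", ".
inline_text <- function(value) {
  paste(format(value, trim = TRUE), collapse = ", ")
}

grades <- c(20, 18, 15, 16)
paste0("The average grade is ", inline_text(mean(grades)), ".")
inline_text(grades)
```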
To insert code chunks, we use the same three backticks before and after the code as in Markdown. The third opening backtick is followed by the name of the language in curly braces ({r}):
```{r}
x <- c(20, 18, 15, 16)
x
```
The result is as follows:
x <- c(20, 18, 15, 16)
x
## [1] 20 18 15 16
Note that by default, the code is present in the final document as well as the result of its evaluation (as displayed in the console).
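The ## [1] 20 18 15 16 line is the console output with a comment marker added in front of it. As a result, code copied from the report together with its output can still be run in R. The base-R sketch below mimics this display; in {knitr} the prefix is controlled by the comment chunk option (default "##"), and mimic_output() is a hypothetical illustration, not a knitr function.

```r
# Mimic the default display of chunk output: capture what print() would
# show in the console, then prefix each line with the comment marker.
mimic_output <- function(value, comment = "##") {
  lines <- capture.output(print(value))
  paste(comment, lines)
}

x <- c(20, 18, 15, 16)
cat(mimic_output(x), sep = "\n")
```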
Note: we have just created the x object in the environment. We can reuse it in new code, either inline (`r x`), which gives 20, 18, 15, 16, or in a chunk (```{r} ```):
```{r}
x
```
which will then give:
x
## [1] 20 18 15 16
Note: to insert a chunk, you can use the shortcut Ctrl + Alt + I under Windows or Cmd + Alt/Option + I under Mac OS.
When defining a chunk, many options can be added to control how it behaves. For example, you can display the code without executing it, display the result while hiding the code, or control the display parameters of the figures the code produces. There are more than 50 options; the list is available on the website of the creator of the {knitr} package: https://yihui.org/knitr/options/.
The syntax is as follows:
```{r, options}
code
```
A good practice is to assign a unique identifier to each chunk.
```{r chunk-identifier, options}
x
```
Here are some practical options.
| Syntax | Effect |
|---|---|
| include = FALSE | The code is executed, but neither the code nor its results appear in the final document. |
| echo = FALSE | The code does not appear in the final document, while its results are visible. |
| eval = FALSE | The code appears in the final document but is not executed. |
| message = FALSE | Messages generated by the execution of the code do not appear in the final document. |
| warning = FALSE | Warning messages generated by the execution of the code do not appear in the final document. |
| fig.cap = "caption" | If the code produces a figure, the text given in quotes is used as its caption. |
| fig.width = 2, fig.height = 4 | If the code produces a figure, sets its width and height, respectively, in inches. |
| out.width = '50%', out.height = '75%' | If the code produces or displays an image, sets its display size, here as percentages. |
Here is an example with an image:
```{r plot-iris, out.width='75%', fig.cap = "Width of sepals as a function of length."}
plot(iris$Sepal.Length, iris$Sepal.Width)
```
which gives:
plot(iris$Sepal.Length, iris$Sepal.Width)
The default chunk settings are stored in the knitr::opts_chunk object. By changing its values as in the example below, and placing this statement in a chunk positioned just after the YAML header, all chunks will adopt the specified behaviour.
In the example below, the code of all chunks will be visible (echo = TRUE), messages and warnings returned during code execution will be hidden (message = FALSE and warning = FALSE), and figures produced in R will be centred (fig.align = "center").
```{r chunks-settings, echo=FALSE, include=TRUE, message=FALSE, warning=FALSE}
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
warning = FALSE,
fig.align = "center")
```
Note: the general chunk settings can then be overridden for specific chunks, simply by adding the desired options to those chunks. For example, if echo = TRUE is set in the knitr::opts_chunk list and you wish to hide the code of a specific chunk, you can simply add the option echo = FALSE to it:
```{r, echo=FALSE}
# chunk code goes here
```
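To check which global settings are in force, knitr::opts_chunk$get() can be called from the console or from a chunk. The sketch below assumes {knitr} is installed; it saves the current settings, changes two of them, and then restores the saved values by passing the whole list back to opts_chunk$set(), which also accepts a single list.

```r
# Sketch: inspect, change and restore the global chunk options.
if (requireNamespace("knitr", quietly = TRUE)) {
  saved <- knitr::opts_chunk$get()        # full list of current settings

  knitr::opts_chunk$set(echo = FALSE, fig.align = "center")
  print(knitr::opts_chunk$get(c("echo", "fig.align")))

  knitr::opts_chunk$set(saved)            # put the previous values back
  print(knitr::opts_chunk$get("echo"))    # back to the saved value
}
```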
When an RMarkdown document contains calculations that take a long time, it is possible to cache the results of all or some of the chunks. To do this, simply specify the cache = TRUE option in the chunks you wish to cache.
When the document is compiled, cached chunks are not executed by {knitr} if they were executed during the previous compilation and have not been modified since. If the code inside a chunk has been modified between two compilations (or if a chunk option other than include has been modified), the chunk is re-evaluated and re-cached.
It is risky to cache all chunks. If the j-th chunk is modified, it will be re-evaluated by {knitr} at compile time. However, if its evaluation depends on variables defined before the j-th chunk but modified after it, the results returned by the new evaluation of the j-th chunk may not be reproducible…
It is sometimes useful to clear the cache to avoid the errors mentioned above. To do so, click on the down arrow to the right of the Knit button, then on the line starting with the broom icon: Clear Knitr Cache....
Note: Caching is especially useful when loading estimation results from models that take a long time to compute (e.g., machine-learning models).
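The cache can also be cleared from the R console by deleting the cache directory. When rmarkdown::render() is used, the cache is, as far as we know, stored in a folder named after the document with a _cache suffix (e.g. first_report_cache/ for first_report.Rmd); this layout is an assumption to check in your own project. clear_knitr_cache() is a hypothetical helper, and the demo runs on a throwaway directory.

```r
# Sketch: remove a knitr cache directory so that every chunk is re-evaluated
# at the next compilation. The "<name>_cache" layout is an assumption.
clear_knitr_cache <- function(rmd_file) {
  cache_dir <- paste0(tools::file_path_sans_ext(rmd_file), "_cache")
  if (dir.exists(cache_dir)) unlink(cache_dir, recursive = TRUE)
  !dir.exists(cache_dir)   # TRUE once the directory is gone
}

# Demo on a throwaway folder standing in for a real project
demo_rmd <- file.path(tempdir(), "first_report.Rmd")
dir.create(file.path(tempdir(), "first_report_cache", "html"), recursive = TRUE)
clear_knitr_cache(demo_rmd)
```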
Citing sources follows standards. One of the standards frequently encountered in economics is that of the American Psychological Association, known by its acronym: APA.
French explanations of these standards can be found on the Université de Montréal libraries website at the following address: https://bib.umontreal.ca/citer/styles-bibliographiques/apa.
There are two ways of citing references: in the text and in the bibliography.
With RMarkdown, we only have to worry about the citations in the text. The cited references are then added to the bibliography automatically, following the chosen standard. To do this, you need to:
1. create a bibliography file in BibTeX format;
2. declare this file in the YAML header;
3. cite the references in the text.
A bibliography file in .bib (BibTeX) format must first be created. This file contains the references, formatted according to the standard expected for bib files. On the bibtex.com website, the following page provides a guide to the BibTeX format: https://www.bibtex.com/g/bibtex-format/.
Each entry in the BibTeX file corresponds to a bibliographic reference. It is composed of three parts: the document type (preceded by @), a unique reference (the citation key), and a list of key-value pairs describing the document.
Let’s take an example:
@book{xie_2018_rmarkdown,
title={R markdown: The definitive guide},
author={Xie, Yihui and Allaire, Joseph J and Grolemund, Garrett},
year={2018},
publisher={Chapman and Hall/CRC}
}
This is a book, so the document type is indicated using @book. The unique reference is xie_2018_rmarkdown, and the key-value pairs are given in a list where each key is written before the equal symbol and each value after it. The pairs are separated by commas.
Your turn: create a new R script file in RStudio. In this file, paste the bibliographic entry of the book by Yihui Xie et al. Save the document as biblio.bib.
You will be asked whether you wish to change the file extension to actually use .bib rather than .R: say yes.
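The same file can also be written from the R console with writeLines(). The sketch below creates biblio.bib in the current working directory (with an RStudio project, the project folder) and reads the first line back to check the entry type and key. Note that writeLines() overwrites any existing file with the same name.

```r
# Write the bibliographic entry to biblio.bib in the working directory.
entry <- c(
  "@book{xie_2018_rmarkdown,",
  "  title={R markdown: The definitive guide},",
  "  author={Xie, Yihui and Allaire, Joseph J and Grolemund, Garrett},",
  "  year={2018},",
  "  publisher={Chapman and Hall/CRC}",
  "}"
)
writeLines(entry, "biblio.bib")

# Check: the first line should start with the entry type and the key
readLines("biblio.bib")[1]
```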
In order to avoid having to write all the entries in the BibTeX file by hand, there are a number of tools available to make the job easier. We will look at some of them.
As a first step, you can check whether the reference exists on Crossref. This site is very useful for obtaining generally clean citations, but it has a drawback: you sometimes have to be patient, as the site frequently returns 500 (Internal Server Error) errors.
Let's try to find the reference of the article by Susan Athey and Guido Imbens, “The State of Applied Econometrics: Causality and Policy Evaluation”, published in the Journal of Economic Perspectives (JEP) in 2017.
After clicking on the “Search metadata” button, paste the title of the article, The State of Applied Econometrics: Causality and Policy Evaluation, then start the search by pressing the Enter key.
Among the proposed references, once you have identified the one you are looking for, retrieve its metadata by clicking on the Actions link and then on Cite.
The bibliographic entry in BibTeX format is then displayed, all that remains is to copy and paste it into our bib file.
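When the DOI of a reference is known, a BibTeX entry can often be retrieved without going through the search page. doi.org supports content negotiation: requesting a DOI with the HTTP header Accept: application/x-bibtex usually returns a BibTeX entry for DOIs registered with Crossref. The sketch below relies on this behaviour (an assumption worth checking if the service changes). It uses base R's url() with its headers argument (R 3.6.0 or later) and needs an Internet connection. doi_url() and fetch_bibtex() are hypothetical helpers.

```r
# Hypothetical helpers: ask doi.org for a BibTeX version of a DOI's metadata.
doi_url <- function(doi) paste0("https://doi.org/", doi)

fetch_bibtex <- function(doi) {
  con <- url(doi_url(doi), headers = c(Accept = "application/x-bibtex"))
  on.exit(close(con))
  paste(readLines(con, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
}

doi_url("10.1257/jep.31.2.3")
# Requires an Internet connection; inspect the result before appending it:
# write(c("", fetch_bibtex("10.1257/jep.31.2.3")), file = "biblio.bib", append = TRUE)
```

As with the Crossref page, the returned entry should be checked before use (citation key, capitalisation of the title, encoding of accented names).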
Your turn: search for the reference of Susan Athey’s article published in JEP in 2017 and paste the entry into your BibTeX file.
Your BibTeX file should now contain two entries (don’t forget to save it):
@book{xie_2018_rmarkdown,
title={R markdown: The definitive guide},
author={Xie, Yihui and Allaire, Joseph J and Grolemund, Garrett},
year={2018},
publisher={Chapman and Hall/CRC}
}
@article{Athey_2017,
doi = {10.1257/jep.31.2.3},
url = {https://doi.org/10.1257%2Fjep.31.2.3},
year = 2017,
month = {may},
publisher = {American Economic Association},
volume = {31},
number = {2},
pages = {3--32},
author = {Susan Athey and Guido W. Imbens},
title = {The State of Applied Econometrics: Causality and Policy Evaluation},
journal = {Journal of Economic Perspectives}
}
Sometimes the desired information cannot be found on Crossref. In that case, it may be worth checking whether the reference exists on Google Scholar, although references there currently seem to be less complete than on Crossref.
Let's try to find the reference of the paper by Olivier Deschênes and Michael Greenstone published in the American Economic Review (AER) in 2007, by searching for its title:
The economic impacts of climate change: evidence from agricultural output and random fluctuations in weather
Once the article is identified in the results, simply click on the Cite link.
A window pops up, giving pre-formatted citations in different styles. To retrieve the metadata in BibTeX format, click on the BibTeX link.
The metadata is displayed in a web page. The only thing left to do is to copy and paste it into the BibTeX file.
Your turn: find the reference of the article by Olivier Deschênes and Michael Greenstone published in the AER in 2007 and paste the entry into your BibTeX file.
Your BibTeX file should now contain three entries (don’t forget to save it):
@book{xie_2018_rmarkdown,
title={R markdown: The definitive guide},
author={Xie, Yihui and Allaire, Joseph J and Grolemund, Garrett},
year={2018},
publisher={Chapman and Hall/CRC}
}
@article{Athey_2017,
doi = {10.1257/jep.31.2.3},
url = {https://doi.org/10.1257%2Fjep.31.2.3},
year = 2017,
month = {may},
publisher = {American Economic Association},
volume = {31},
number = {2},
pages = {3--32},
author = {Susan Athey and Guido W. Imbens},
title = {The State of Applied Econometrics: Causality and Policy Evaluation},
journal = {Journal of Economic Perspectives}
}
@article{deschenes2007economic,
title={The economic impacts of climate change: evidence from agricultural output and random fluctuations in weather},
author={Desch{\^e}nes, Olivier and Greenstone, Michael},
journal={American economic review},
volume={97},
number={1},
pages={354--385},
year={2007}
}
If the reference you are looking for is available neither on the journal's or publisher's site, nor on Crossref, nor on Google Scholar, you can still use a citation generator such as the one offered on this site: https://www.bibme.org/bibtex. It is also possible to enter all the information by hand (but beware of errors!).
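For R itself and for R packages, which are often cited in RMarkdown reports, no search is needed. Base R's citation() returns the reference recommended by the authors, and toBibtex() converts it to BibTeX. The sketch below cites R itself; citation("rmarkdown") works the same way once the package is installed. The generated entry typically comes without a citation key (its first line reads @Manual{,), so one is added before use; the key R_core below is an arbitrary choice.

```r
# Turn the recommended citation of R into a BibTeX entry.
bib <- as.character(toBibtex(citation()))

# Add a key if the entry has none ("@Manual{," -> "@Manual{R_core,")
bib[1] <- sub("^(@[A-Za-z]+\\{),", "\\1R_core,", bib[1])
cat(bib, sep = "\n")

# To add it to the bibliography file:
# write(c("", bib), file = "biblio.bib", append = TRUE)
```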
Once the BibTeX file has been created, the YAML header (metadata header) must be completed: the bibliography key gives the path to the BibTeX file (here simply biblio.bib, since the biblio.bib file is in the same directory as the RMarkdown file).
---
output: html_document
bibliography: biblio.bib
biblio-style: apalike
link-citations: yes
---
We now have a BibTeX file containing 3 entries, and the RMarkdown document header specifies where this file is located. All that remains is to cite the entries in the text.
To cite a document, use its unique identifier, preceded by an at sign (@) and placed between square brackets: [@identifier].
| Description | Syntax | Rendering in text |
|---|---|---|
| Citation in parentheses | `[@xie_2018_rmarkdown]` | (Xie, Allaire, and Grolemund 2018) |
| Suppressing the author's name | `[-@xie_2018_rmarkdown]` | (2018) |
| Adding more information | `[see @xie_2018_rmarkdown, pp. 10--12]` | (see Xie, Allaire, and Grolemund 2018, 10–12) |
| Multiple citations | `[@xie_2018_rmarkdown; @Athey_2017]` | (Xie, Allaire, and Grolemund 2018; Athey and Imbens 2017) |
| Citation without parentheses around the authors | `@xie_2018_rmarkdown` | Xie, Allaire, and Grolemund (2018) |
The bibliography is automatically added at the end of the document. A # References section can be placed at the end of the document to separate it from the previous section. This section will not be numbered.
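A citation only works if its identifier matches a key of the BibTeX file exactly, so it can help to list the available keys and check the ones used in the text. The base-R sketch below does both with regular expressions. bib_keys() and cited_keys() are hypothetical helpers, and the patterns are simplified: keys are assumed to contain only letters, digits, _, : and -, and an @ preceded by a letter, digit or dot, as in an email address, is ignored.

```r
# Hypothetical helpers: compare the keys cited in a text with those of a .bib file.
bib_keys <- function(bib_lines) {
  m <- regmatches(bib_lines,
                  regexec("^[[:space:]]*@[A-Za-z]+\\{([^,]+),", bib_lines))
  vapply(Filter(length, m), `[`, character(1), 2)   # keep the captured key
}

cited_keys <- function(text) {
  hits <- regmatches(text, gregexpr("(?<![[:alnum:].])@[[:alnum:]_:-]+",
                                    text, perl = TRUE))
  unique(sub("^@", "", unlist(hits)))
}

# In a real project: bib_keys(readLines("biblio.bib")) and
# cited_keys(readLines("first_report.Rmd"))
bib_lines <- c("@book{xie_2018_rmarkdown,", "  year={2018}", "}",
               "@article{Athey_2017,", "  year = 2017", "}")
text <- c("As shown by [@xie_2018_rmarkdown; @Athey_2017], see also @Deschenes_2007.",
          "Contact: ewen.gallic@univ-amu.fr")
missing_keys <- setdiff(cited_keys(text), bib_keys(bib_lines))
missing_keys
```

Here the check reports Deschenes_2007, a key that does not exist in the file (the entry added earlier is keyed deschenes2007economic). The email address is not mistaken for a citation.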