2 Types of Data

Python has several built-in data types. In this section we will discuss strings, numerical values, booleans (True/False), the null value, and dates and times.

2.1 Strings

A string is a collection of characters such as letters, numbers, spaces, punctuation marks, etc.

Strings are marked with single, double, or triple quotation marks.

Here is an example:

x = "Hello World"

To display the content of our variable x containing the string in the console, the function print() can be used:

print(x)
## Hello World

As indicated just before, single quotation marks can be used to create a string:

y = 'How are you?'
print(y)
## How are you?

To include apostrophes in a character string created using single quotation marks, one must use an escape character: a backslash (\):

z = 'I\'m fine'
print(z)
## I'm fine

Note that if the string is created using double quotation marks, apostrophes do not need to be escaped. Double quotation marks inside the string, however, must then be escaped:

z = "I'm \"fine\""
print(z)
## I'm "fine"

To specify a line break, we use the escape sequence \n:

x = "Hello, \nWorld"
print(x)
## Hello, 
## World

A string delimited by single or double quotation marks cannot span several lines: doing so raises a syntax error (EOL while scanning string literal, reported as unterminated string literal in recent Python versions), because Python reaches the end of the line before finding the closing quotation mark. To write a string on several lines, use three quotation marks (single or double) at the beginning and at the end of the string:

x = """Hello,
World"""
print(x)
## Hello,
## World

The character \ (backslash) is the escape character. It makes it possible to include certain characters in a string, such as quotation marks in a string delimited by the same quotation marks, or control characters such as tabulations and line breaks. Here are some common examples:

| Code | Description | Code | Description |
|------|-------------|------|-------------|
| \n | New line | \r | Carriage return |
| \t | Tabulation | \b | Backspace |
| \\ | Backslash | \' | Single quotation mark |
| \" | Double quotation mark | | |

To obtain the length of a string, Python offers the function len():

x = "Hello World !"
print(len(x))
## 13
print(x, len(x))
## Hello World ! 13
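
As a quick check of how escape sequences interact with len(), the following sketch (the variable names are illustrative and not part of the original text) shows that each escape sequence counts as a single character:

```python
# Each escape sequence is stored as one character, so it adds 1 to the length.
tabbed = "a\tb"              # 'a', a tabulation, 'b'
two_lines = "Hello,\nWorld"  # the line break counts as one character
quoted = 'I\'m fine'         # the backslash itself is not stored

print(len(tabbed))     # 3
print(len(two_lines))  # 12
print(len(quoted))     # 8
```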

2.1.1 Concatenation of Strings

To concatenate strings, i.e., to put them end to end, Python provides the + operator:

print("Hello" + " World")
## Hello World

The * operator allows us to repeat a string several times:

print( 3 * "Go Habs Go! " + "Woo Hoo!")
## Go Habs Go! Go Habs Go! Go Habs Go! Woo Hoo!

When two string literals are placed side by side, Python concatenates them:

x = ('You shall ' 'not ' "pass!")
print(x)
## You shall not pass!

It is also possible to insert the content of a variable into a string, using curly braces ({}) as placeholders and the method format():

x = "I like to code in {}"
langage_1 = "R"
langage_2 = "Python"
preference_1 = x.format(langage_1)
print(preference_1)
## I like to code in R
preference_2 = x.format(langage_2)
print(preference_2)
## I like to code in Python

Several values can be inserted into the same string, again with curly braces and the method format(). The placeholders are filled in the order of the arguments:

x = "I like to code in {} and in {}"
preference_3 = x.format(langage_1, langage_2)
print(preference_3)
## I like to code in R and in Python
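
The placeholders can also be numbered or named, which helps when a value must be reused or when the order of the arguments differs from the order in the sentence. The following is a minimal sketch; the template strings and the names first and second are chosen for illustration only:

```python
# Placeholders can be numbered (to reorder arguments) or named.
template_indexed = "I like to code in {1} and in {0}"
template_named = "I like to code in {first} and in {second}"

sentence_indexed = template_indexed.format("R", "Python")
sentence_named = template_named.format(first="R", second="Python")

print(sentence_indexed)  # I like to code in Python and in R
print(sentence_named)    # I like to code in R and in Python
```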

2.1.2 Indexing and Extraction

Strings can be indexed. Be careful: the index of the first character is 0.

To obtain the ith character of a string, square brackets are used. The syntax is as follows:

x[i-1]

For example, to display the first character, then the fifth, of the string Hello:

x = "Hello"
print(x[0])
print(x[4])
## H
## o

The extraction can also be done starting from the end of the string, by preceding the value of the index with the minus sign (-). The index -1 designates the last character.

For example, to display the penultimate character of our string x:

print(x[-2])
## l

A substring is also extracted with square brackets, by specifying its start and end positions (implicitly or not): [start:end]. The character at index start is included, the one at index end is not:

x = "You shall not pass!"

# From the fourth character (not included) to the ninth (included)
print(x[4:9])
## shall

When the first value is not specified, the beginning of the string is taken by default; when the second value is not specified, the end of the string is taken by default.

# From the 4th character (not included) to the end of the string
print(x[4:])
# From the beginning of the string to the penultimate character (included)
print(x[:-1])
# From the 5th character before the end (included) to the end
print(x[-5:])
## shall not pass!
## You shall not pass
## pass!

It is possible to add a third value in the brackets, the step: [start:end:step].

# From the 4th character (not included), 
# to the end of the string, in steps of 3
print(x[4::3])
## sln s

To obtain the string in reverse order, a step of -1 is used:

print(x[::-1])
## !ssap ton llahs uoY
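
The slicing rules above can be combined in small utilities. The sketch below is only an illustration: the function is_palindrome() and the variable names are hypothetical and do not come from the original text.

```python
def is_palindrome(text):
    """Return True if text reads the same in both directions (case ignored)."""
    cleaned = text.lower()
    return cleaned == cleaned[::-1]

x = "You shall not pass!"

first_word = x[:3]    # start omitted: from the beginning up to index 3 (excluded)
last_word = x[-5:]    # negative start: the last five characters
every_other = x[::2]  # step of 2 over the whole string

print(first_word, last_word)                     # You pass!
print(every_other)                               # Yusalntps!
print(is_palindrome("Kayak"), is_palindrome(x))  # True False
```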

2.1.3 Available Methods with Strings

Many methods are available for strings. In an interactive environment that supports completion (Jupyter, for example), typing a dot (.) after the name of an object designating a string and then pressing the Tab key displays the available methods in a drop-down menu.

For example, the count() method counts the number of occurrences of a pattern in a string. To count the number of occurrences of in in the following string:

x = "le train de tes injures roule sur le rail de mon indifférence"
print(x.count("in"))
## 3

Once the method call has been written, placing the cursor at the end of the line and pressing the Shift and Tab keys displays its documentation.

Conversion to upper or lower case

The lower() and upper() methods return the string converted to lowercase and to uppercase characters, respectively.

x = "le train de tes injures roule sur le rail de mon indifférence"
print(x.lower())
print(x.upper())
## le train de tes injures roule sur le rail de mon indifférence
## LE TRAIN DE TES INJURES ROULE SUR LE RAIL DE MON INDIFFÉRENCE

When we wish to find a pattern in a string, we can use the method find(). The pattern to search for is given as an argument. The find() method returns the smallest index in the string at which the pattern is found. If the pattern is not found, the returned value is -1.

print(x.find("in"))
print(x.find("bonjour"))
## 6
## -1

The search can be restricted to a part of the string by also giving the start and end indexes:

print(x.find("in", 7, 20))
## 16

Note: the end index can be omitted; in this case, the end of the string is used:

print(x.find("in", 20))
## 49

If one does not need the position of the substring, but only wants to know whether it is present, the operator in can be used:

print("train" in x)
## True
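
Since find() only returns the first match, the start argument can be used to continue the search after each match. The sketch below illustrates this; the helper find_all() is hypothetical (it is not a built-in string method):

```python
def find_all(text, pattern):
    """Return the indices of every non-overlapping occurrence of pattern in text."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume the search just after the occurrence that was found
        start = text.find(pattern, start + len(pattern))
    return positions

x = "le train de tes injures roule sur le rail de mon indifférence"
print(find_all(x, "in"))   # [6, 16, 49]
print(find_all(x, "xyz"))  # []
```

The number of positions returned matches the result of count() for the same pattern.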

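To perform a search without regard to case, the string can first be converted to lowercase with lower() (or casefold()), and the pattern written in lowercase. The method capitalize() is not a reliable choice here, since it leaves the first character of the string in uppercase:

x = "Mademoiselle Deray, il est interdit de manger de la choucroute ici."
print(x.find("deray"))
print(x.lower().find("deray"))
## -1
## 13

Splitting Strings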

To split a string into substrings, based on a pattern used to delimit the substrings (e.g., a comma or a space), the method split() can be used:

print(x.split(" "))
## ['Mademoiselle', 'Deray,', 'il', 'est', 'interdit', 'de', 'manger', 'de', 'la', 'choucroute', 'ici.']

By giving a numerical value as a second argument, it is possible to limit the number of splits performed:

# At most 3 splits are made, so the result has at most 4 elements
print(x.split(" ", 3))
## ['Mademoiselle', 'Deray,', 'il', 'est interdit de manger de la choucroute ici.']
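
Limiting the number of splits is useful when the separator may also appear in the last field. Here is a minimal sketch, assuming a hypothetical "key: value" line (the string and variable names are made up for the illustration):

```python
line = "title: Star Wars: The Empire Strikes Back"

# With a limit of 1, at most one split is made, so the colon inside the value is kept
key, value = line.split(": ", 1)

print(key)               # title
print(value)             # Star Wars: The Empire Strikes Back
print(line.split(": "))  # ['title', 'Star Wars', 'The Empire Strikes Back']
```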

The splitlines() method also splits a string, the separator being an end-of-line character such as a line break or a carriage return.

x = '''"No, I am your Father!
- No... No. It's not true! That's impossible!
- Search your feelings. You know it to be true.
- Noooooooo! Noooo!"'''
print(x.splitlines())
## ['"No, I am your Father!', "- No... No. It's not true! That's impossible!", '- Search your feelings. You know it to be true.', '- Noooooooo! Noooo!"']

Cleaning, completion

To remove whitespace characters (e.g., spaces, line breaks, em spaces, etc.) at the beginning and end of a string, we can use the strip() method, which is very useful for cleaning strings.

x = "\n\n    Pardon, du sucre ?     \n  \n"
print(x.strip())
## Pardon, du sucre ?

It is possible to specify as an argument which characters to remove at the beginning and end of the string. The argument is read as a set of characters, not as a prefix or a suffix:

x = "www.egallic.fr"
print(x.strip("wrf."))
## egallic

Sometimes a string of a given length is needed (when producing a file with fixed-width columns, for example). The rjust() method is then a great help. Given a target length and a fill character, it returns the string right-justified: if the string is shorter than the requested length, the fill character is repeated on the left as many times as necessary; otherwise the string is returned unchanged.

For example, to store a longitude coordinate in a string of length 7, spaces may have to be added:

longitude = "48.11"
print(longitude.rjust(7," "))
##   48.11

Replacements

The replace() method replaces every occurrence of a pattern in a character string:

x = "Criquette ! Vous, ici ? Dans votre propre salle de bain ? Quelle surprise !"
print(x.replace("Criquette", "Ridge"))
## Ridge ! Vous, ici ? Dans votre propre salle de bain ? Quelle surprise !

This method is very convenient for removing spaces for example:

print(x.replace(" ", ""))
## Criquette!Vous,ici?Dansvotrepropresalledebain?Quellesurprise!

Here is a table listing some of the available methods ([exhaustive list in the documentation](https://docs.python.org/3/library/stdtypes.html#string-methods)):

| Method | Description |
|--------|-------------|
| capitalize() | Puts the first character in upper case and the rest in lower case |
| casefold() | Removes case distinctions (useful for comparing strings without regard to case) |
| count() | Counts the number of occurrences (without overlap) of a pattern |
| encode() | Encodes a string of characters in a specific encoding |
| find() | Returns the smallest index at which a substring is found |
| lower() | Returns the string with each alphabetical character in lower case |
| replace() | Replaces one pattern with another |
| split() | Splits the string into substrings according to a pattern |
| title() | Returns the string with the first letter of each word in upper case |
| upper() | Returns the string with each alphabetical character in upper case |
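
A few of the methods in the table that were not demonstrated above can be tried as follows. This is only a short sketch; the example strings are chosen for illustration:

```python
x = "le train de tes injures"

capitalized = x.capitalize()
titled = x.title()
# casefold() is more aggressive than lower(): the German ß becomes ss
same_word = "Straße".casefold() == "STRASSE".casefold()
# encode() returns a bytes object; é takes two bytes in UTF-8
encoded = "indifférence".encode("utf-8")

print(capitalized)  # Le train de tes injures
print(titled)       # Le Train De Tes Injures
print(same_word)    # True
print(encoded)      # b'indiff\xc3\xa9rence'
```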

2.1.4 Conversion to character strings

When we try to concatenate a string with a number, Python raises a TypeError; as a consequence, the variable message below is never created.

nb_followers = 0
message = "He has " + nb_followers + " followers."
print(message)
## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: must be str, not int
## Detailed traceback: 
##   File "<string>", line 1, in <module>
## Error in py_call_impl(callable, dots$args, dots$keywords): NameError: name 'message' is not defined
## Detailed traceback: 
##   File "<string>", line 1, in <module>

We should then convert the object that is not a string into a string beforehand. To do this, Python offers the function str():

message = "He has " + str(nb_followers) + " followers."
print(message)
## He has 0 followers.

2.1.5 Exercise

  1. Create two variables named a and b so that they contain the following strings respectively: 23 to 0 and C'est la piquette, Jack!.
  2. Display the number of characters in a, then in b.
  3. Concatenate a and b in a single string, adding a comma as a separating character.
  4. Same question, using a line break as the separating character.
  5. Using the appropriate method, convert a and b to upper case.
  6. Using the appropriate method, convert a and b to lower case.
  7. Extract the words la and Jack from the string b, using indexes.
  8. Look for the substring piqu in b, then do the same with the substring mauvais.
  9. Return the position (index) of the first character a found in the string b, then try with the character w.
  10. Replace the occurrences of the pattern a by the pattern Z in the string b.
  11. Split the string b using the comma as a separator.
  12. (Bonus) Remove all punctuation characters from string b, then use an appropriate method to remove whitespace characters at the beginning and end of the string. (Use the `re` module for regular expressions.)

2.2 Numerical values

There are three categories of numbers in Python: integers, floating point numbers and complex numbers.

2.2.1 Integers

Integers (ints), in Python, are signed integers.

The type of an object is obtained with the type() function.

x = 2
y = -2
print(type(x))
print(type(y))
## <class 'int'>
## <class 'int'>

2.2.2 Floating Point Numbers

Floats represent real numbers. They are written using a dot to separate the integer part from the decimal part of the number.

x = 2.0
y = 48.15162342
print(type(x))
print(type(y))
## <class 'float'>
## <class 'float'>

Scientific notation can also be used, with E or e indicating a power of 10. For example, to write \(3.2 \times 10^{12}\):

x = 3.2E12
y = 3.2e12
print(x)
print(y)
## 3200000000000.0
## 3200000000000.0

In addition, when the integer part of the number is zero, the leading zero can be omitted:

print(0.35)
print(.35)
## 0.35
## 0.35
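
The exponent of the scientific notation can also be negative, and the result is always a float, even when the value is a whole number. A short sketch (variable names are illustrative):

```python
x = 3.2e12
y = 3.2e-3        # a negative exponent gives a small number
thousand = 1e3    # a float, although the value is a whole number

print(x == 3200000000000)   # True
print(y)                    # 0.0032
print(type(thousand))       # <class 'float'>
print(.35 == 0.35)          # True
```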

2.2.3 Complex numbers

Python allows us to natively manipulate complex numbers, of the form \(z=a+ib\), where \(a\) and \(b\) are floating point numbers, and such that \(i^2=(-i)^2=-1\). The real part of the number, \(\mathfrak{R}(z)\), is \(a\) while its imaginary part, \(\mathfrak{I}(z)\), is \(b\).

In Python, the imaginary unit \(i\) is denoted by the letter j.

z = 1+3j
print(z)
print(type(z))
## (1+3j)
## <class 'complex'>

It is also possible to use the complex() function, which accepts two arguments (the real part and the imaginary part):

z = complex(1, 3)
print(z)
print(type(z))
## (1+3j)
## <class 'complex'>

Several methods are available with complex numbers. For example, to obtain the conjugate, Python provides the method conjugate():

print(z.conjugate())
## (1-3j)

The real part of a complex number and its imaginary part are obtained with the real and imag attributes, respectively.

z = complex(1, 3)
print(z.real)
print(z.imag)
## 1.0
## 3.0
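
The defining property of the imaginary unit, and the link between a complex number and its conjugate, can be checked directly. This is a minimal sketch; the variable product is introduced for the illustration only:

```python
z = complex(1, 3)

# The imaginary unit squared is -1
print(1j * 1j)      # (-1+0j)

# z times its conjugate is real and equals a**2 + b**2
product = z * z.conjugate()
print(product)      # (10+0j)

# abs() returns the modulus; its square is approximately 10 (floating point)
print(abs(z) ** 2)
```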

2.2.4 Conversions

To convert a number to another numeric type, Python has a few functions.

Conversion to Integer

The conversion of a number or a string to an integer is done using the function int():

x = "3"
x_int = int(x)
print(type(x_int))
print(type(x))
## <class 'int'>
## <class 'str'>

Note that the conversion of a floating point number truncates the number to keep only the integer part:

x = 3.6
x_int = int(x)
print(x_int)
## 3

Conversion to Floating Point Number

To convert a number or a string (if its content allows it) to a floating point number, Python provides the function float().

x = "3.6"
x_float = float(x)
print(type(x_float))
## <class 'float'>

With an integer:

x = 3
x_float = float(x)
print(x_float)
## 3.0

Conversion to Complex

The conversion of a number or a string of characters into a complex number is done with the function complex():

x = "2"
x_complex = complex(x)
print(x_complex)
## (2+0j)

With a float:

x = 2.4
x_complex = complex(x)
print(x_complex)
## (2.4+0j)
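
Conversions fail when the content of the string does not match the target type: int("3.6"), for instance, raises a ValueError because the string does not represent an integer. The sketch below shows one possible way to handle this case; the helper to_int() is hypothetical and not part of Python:

```python
def to_int(text):
    """Convert text to an integer, accepting decimal strings such as '3.6'."""
    try:
        return int(text)
    except ValueError:
        # int("3.6") raises ValueError, so go through float() and truncate
        return int(float(text))

print(to_int("3"))     # 3
print(to_int("3.6"))   # 3
print(to_int("-3.6"))  # -3 (truncation toward zero, not rounding down)
```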

2.3 Booleans

Logical data can take two values: True or False. They correspond to the result of a logical condition. Case matters: only the first letter is capitalized.

x = True
y = False
print(x, y)
## True False

True can be automatically converted to 1; False to 0. This can be very convenient, for example, when counting true or false values in the columns of a data table.

res = True + True + False + True*True
print(res)
## 3
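
The same property makes it possible to count the true values of a list with sum(). A minimal sketch, with a made-up list of answers:

```python
answers = [True, False, True, True, False]

nb_true = sum(answers)               # each True counts for 1, each False for 0
nb_false = len(answers) - nb_true
share_true = nb_true / len(answers)

print(nb_true, nb_false, share_true)  # 3 2 0.6
```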

2.4 Empty Object

The empty object, commonly called null, has an equivalent in Python: None. To assign it to a variable, one should be careful with case:

x = None
print(x)
print(type(x))
## None
## <class 'NoneType'>

The None object denotes the absence of a value. It is the only object of type NoneType.

To test if an object is the None object, we proceed as follows (the result is a Boolean):

x = 1
y = None
print(x is None)
## False
print(y is None)
## True
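
A common use of None is to signal that a function has nothing to return. The following sketch illustrates the test with is None; the function first_negative() is hypothetical and only serves the example:

```python
def first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    for value in values:
        if value < 0:
            return value
    return None

result = first_negative([3, 0, 5])
if result is None:
    print("No negative value found")

# 0 is a valid value and must not be confused with None
zero = 0
print(zero is None)  # False
```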

2.5 Dates and Times

There are several modules to manage dates and times in Python. We will explore part of the datetime module.

2.5.1 Module Datetime

Python has a module called datetime which offers the possibility to manipulate dates and durations (dates and times).

The module provides several types of objects:

  • date: a date according to the Gregorian calendar, indicating the year, month and day;
  • time: a given time, without taking into account a particular day, indicating the hour, minute, second (possibly the microsecond and time zone as well);
  • datetime: a combination of a date and a time;
  • timedelta: a duration, i.e., the difference between two objects of type date or datetime;
  • tzinfo: an abstract base class, providing information about time zones;
  • timezone: a class implementing tzinfo as a fixed offset from UTC.

Date

Objects of type date refer to dates in the Gregorian calendar, for which the following characteristics are mentioned: year, month and day.

To create a date object, the syntax is as follows:

date(year, month, day)

For example, to create the date of April 23, 2013:

from datetime import date
debut = date(year = 2013, month = 4, day = 23)
print(debut)
print(type(debut))
## 2013-04-23
## <class 'datetime.date'>

It is not mandatory to name the arguments in the call to date(). When they are not named, however, they must be given in the following order: year, month, day.

The attributes of the created date can then be accessed (they are integers):

print(debut.year) # Extract the year
## 2013
print(debut.month) # Extract the month
## 4
print(debut.day) # Extract the day
## 23

Some methods are available for objects of the type date. We will review some of them.

ctime()

The ctime() method returns the date as a string.

print(debut.ctime())
## Tue Apr 23 00:00:00 2013

weekday()

The weekday() method returns the position of the day of the week (Monday being 0, Sunday 6).

print(debut.weekday())
## 1

This method can be very handy when analyzing data to explore aspects of weekly seasonality.

isoweekday()

In the same vein as weekday(), the isoweekday() method returns the position of the day of the week, this time assigning the value 1 to Monday and 7 to Sunday.

print(debut.isoweekday())
## 2

toordinal()

The toordinal() method returns the day number, taking as a reference the value 1 for the first day of year 1.

print(debut.toordinal())
## 734981

isoformat()

The isoformat() method returns the date in ISO 8601 format (YYYY-MM-DD), as a string.

print(debut.isoformat())
## 2013-04-23

isocalendar()

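The isocalendar() method returns a tuple (cf. Section ??) with three elements: the year, the week number and the day of the week (all three in ISO numbering).

# tuple() gives the same display in all Python versions
print(tuple(debut.isocalendar()))
## (2013, 17, 2)

replace()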

The replace() method returns the date after making a modification.

x = debut.replace(year=2014)
y = debut.replace(month=5)
z = debut.replace(day=24)
print(x, y, z)
## 2014-04-23 2013-05-23 2013-04-24

This has no impact on the original object:

print(debut)
## 2013-04-23
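
A typical use of replace() is to move to the first day of the month. The method also checks that the resulting date exists. A minimal sketch (the variable first_of_month is introduced for the illustration):

```python
from datetime import date

debut = date(2013, 4, 23)

# First day of the same month
first_of_month = debut.replace(day=1)
print(first_of_month)  # 2013-04-01

# replace() validates the result: April has only 30 days
try:
    debut.replace(day=31)
except ValueError as error:
    print("Invalid date:", error)
```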

It is possible to modify several elements at the same time:

x = debut.replace(day=24, month=5)
print(x)
## 2013-05-24

strftime()

The strftime() method returns, as a string, a representation of the date that follows a given format mask.

For example, to have the date represented as DD-MM-YYYY (two-digit day, two-digit month and four-digit year):

print(debut.strftime("%d-%m-%Y"))
## 23-04-2013

In the previous example, two things are noteworthy: the presence of formatting codes (which begin with the percentage symbol) and the presence of other characters (here, hyphens). These other characters are copied as they are, so separating the elements of the date with hyphens is only a choice. Slashes, or even longer character strings, can be used instead:

print(debut.strftime("%d/%m/%Y"))
## 23/04/2013
print(debut.strftime("Jour : %d, Mois : %m, Annee : %Y"))
## Jour : 23, Mois : 04, Annee : 2013
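
The same kind of mask can be used in the other direction, to read a date from a string. This relies on strptime(), a method of the datetime class that is not covered in this section, so the following is only a sketch:

```python
from datetime import date, datetime

debut = date(2013, 4, 23)

# Numeric codes only, so the result does not depend on the locale
text = debut.strftime("%d/%m/%Y")
print(text)             # 23/04/2013

# strptime() parses the string with the same mask and returns a datetime;
# date() keeps only the year, month and day
parsed = datetime.strptime(text, "%d/%m/%Y").date()
print(parsed == debut)  # True
```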

The formatting codes are those defined by the C standard (cf. the Python documentation). Here are some of them:

Table 2.1: Formatting codes

| Code | Description | Example |
|------|-------------|---------|
| %a | Abbreviation of the day of the week (locale-dependent) | Tue |
| %A | Full weekday (locale-dependent) | Tuesday |
| %b | Abbreviation of the month (locale-dependent) | Apr |
| %B | Full name of the month (locale-dependent) | April |
| %c | Date and time (locale-dependent), in format %a %b %e %H:%M:%S %Y | Tue Apr 23 00:00:00 2013 |
| %C | Century (00–99) (integer part of the division of the year by 100) | 20 |
| %d | Day of the month (01–31) | 23 |
| %D | Date in format %m/%d/%y | 04/23/13 |
| %e | Day of the month as a decimal number (1–31) | 23 |
| %F | Date in format %Y-%m-%d | 2013-04-23 |
| %h | Same as %b | Apr |
| %H | Hour (00–23) | 00 |
| %I | Hour (01–12) | 12 |
| %j | Day of the year (001–366) | 113 |
| %m | Month (01–12) | 04 |
| %M | Minute (00–59) | 00 |
| %n | Line break in output, whitespace character in input | \n |
| %r | Time in 12-hour format with AM/PM | 12:00:00 AM |
| %R | Same as %H:%M | 00:00 |
| %S | Second (00–61) | 00 |
| %t | Tabulation in output, whitespace character in input | \t |
| %T | Same as %H:%M:%S | 00:00:00 |
| %u | Day of the week (1–7), starting on Monday | 2 |
| %U | Week of the year (00–53), Sunday being the first day of the week; the first Sunday of the year starts week 1 | 16 |
| %V | Week of the year (01–53). If the week (which begins on a Monday) that contains January 1 has four or more days in the new year, it is week 1. Otherwise, it is the last week of the previous year, and the following week is week 1 (ISO 8601 standard) | 17 |
| %w | Day of the week (0–6), Sunday being equal to 0 | 2 |
| %W | Week of the year (00–53), Monday being the first day of the week; the first Monday of the year starts week 1 (U.K. convention) | 16 |
| %x | Date (locale-dependent) | 04/23/13 |
| %X | Time (locale-dependent) | 00:00:00 |
| %y | Year without the century (00–99) | 13 |
| %Y | Year (in input, only from 0 to 9999) | 2013 |
| %z | Offset in hours and minutes with respect to UTC time | |
| %Z | Abbreviation of the time zone (output only) | CEST |

Time

Time objects refer to specific times without taking into account a particular day. They provide information on the hour, minute, second (possibly the microsecond and time zone as well).

To create a time object, the syntax is as follows:

time(hour, minute, second)

For example, to create the moment 23:04:59 (twenty-three hours, four minutes and fifty-nine seconds):

from datetime import time
moment = time(hour = 23, minute = 4, second = 59)
print(moment)
print(type(moment))
## 23:04:59
## <class 'datetime.time'>

We can add information about the microsecond. Its value must be at least 0 and strictly less than one million.

moment = time(hour = 23, minute = 4, second = 59, microsecond = 230)
print(moment)
print(type(moment))
## 23:04:59.000230
## <class 'datetime.time'>

The attributes of the created time object (they are integers) can then be accessed, including the following:

print(moment.hour) # Extract the hour
## 23
print(moment.minute) # Extract the minute
## 4
print(moment.second) # Extract the second
## 59
print(moment.microsecond) # Extract the microsecond
## 230
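
These attributes can be combined, for instance to express a time of day as a number of seconds elapsed since midnight. A minimal sketch (the variable seconds_since_midnight is introduced for the illustration):

```python
from datetime import time

moment = time(hour=23, minute=4, second=59, microsecond=230)

# Convert each component to seconds and add them up
seconds_since_midnight = (
    moment.hour * 3600
    + moment.minute * 60
    + moment.second
    + moment.microsecond / 1_000_000
)
print(seconds_since_midnight)
```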

Some methods are available for time objects. Their use is similar to that of the methods of the date class (refer to the section on date objects above).

Datetime

The datetime objects combine the elements of the date and time objects. They provide the day in the Gregorian calendar as well as the hour, minute, second (possibly the microsecond and time zone).

To create a datetime object, the syntax is as follows:

datetime(year, month, day, hour, minute, second, microsecond)

For example, to create the date 23-04-2013 at 23:04:59:

from datetime import datetime
x = datetime(year = 2013, month = 4, day = 23,
  hour = 23, minute = 4, second = 59)
print(x)
print(type(x))
## 2013-04-23 23:04:59
## <class 'datetime.datetime'>
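
A datetime can also be assembled from an existing date and an existing time with the class method datetime.combine(). A short sketch, in which the names day, moment and combined are illustrative:

```python
from datetime import date, time, datetime

day = date(2013, 4, 23)
moment = time(23, 4, 59)

# Combine the calendar day and the time of day into a single object
combined = datetime.combine(day, moment)

print(combined)                      # 2013-04-23 23:04:59
print(combined.year, combined.hour)  # 2013 23
```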

The datetime objects have the attributes of both date objects (year, month, day) and time objects (hour, minute, second, microsecond, tzinfo).

As for methods, relatively more are available. We will comment on some of them.

today() and now()

The today() and now() methods return the current datetime, the one at the time the instruction is evaluated:

print(datetime.today())
print(datetime.now())
## 2019-10-08 17:57:26.577058
## 2019-10-08 17:57:26.580141

The distinction between the two lies in the time zone. With today(), the attribute tzinfo is always None, whereas now() accepts an optional tz argument; when it is provided, the result is expressed in that time zone.

timestamp()

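The timestamp() method returns, as a floating point number, the POSIX timestamp corresponding to the datetime object, i.e., the number of seconds elapsed since January 1, 1970, at 00:00:00 UTC. For a datetime without time zone information, such as x, the local time zone of the machine is assumed. The value below corresponds to a local time zone of UTC+2 and may therefore differ on another machine.

print(x.timestamp())
## 1366751099.0

date()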

The date() method returns an object of type date whose year, month and day attributes are identical to those of the datetime object:

x_date = x.date()
print(x_date)
print(type(x_date))
## 2013-04-23
## <class 'datetime.date'>

time()

The time() method returns an object of type time whose hour, minute, second and microsecond attributes are identical to those of the datetime object:

x_time = x.time()
print(x_time)
print(type(x_time))
## 23:04:59
## <class 'datetime.time'>

Timedelta

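The objects of type timedelta represent durations, such as the time elapsed between two dates.

To create an object of type timedelta, the syntax is as follows (since the positional order is not the intuitive one, naming the arguments is recommended):

timedelta(days, seconds, microseconds, milliseconds, minutes, hours, weeks)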

It is not mandatory to provide a value for each argument. When an argument does not receive a value, its default value is 0.

For example, to create an object indicating a duration of 1 day and 30 seconds:

from datetime import timedelta
duree = timedelta(days = 1, seconds = 30)
print(duree)
## 1 day, 0:00:30

The attributes can then be accessed. Only three are stored: days, seconds and microseconds. For example, to access the number of days represented by the duration:

print(duree.days)
## 1
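
Whatever units are given at creation, a timedelta only keeps days, seconds and microseconds: the other units are converted. The following sketch illustrates this normalization (the variable name is illustrative):

```python
from datetime import timedelta

duration = timedelta(hours=20, minutes=90, seconds=30)

# Hours and minutes are converted: 20 h + 90 min + 30 s = 77430 seconds
print(duration.days)              # 0
print(duration.seconds)           # 77430
print(duration)                   # 21:30:30
print(hasattr(duration, "hours")) # False: there is no hours attribute
```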

The total_seconds() method is used to obtain the duration expressed in seconds:

duree = timedelta(days = 1, seconds = 30, hours = 20)
print(duree.total_seconds())
## 158430.0

Time Between Two Objects date or datetime

When subtracting two objects of type date, the number of days between these two dates is obtained, in the form of an object of type timedelta:

from datetime import date
beginning = date(2018, 1, 1)
end = date(2018, 1, 2)
nb_days = end-beginning
print(type(nb_days))
print(nb_days)
## <class 'datetime.timedelta'>
## 1 day, 0:00:00

When subtracting two objects of type datetime, we obtain the number of days, seconds (and microseconds, if entered) separating these two dates, in the form of an object of type timedelta:

beginning = datetime(2018, 1, 1, 12, 26, 30, 230)
end = datetime(2018, 1, 2, 11, 14, 31)
duration = end-beginning
print(type(duration))
print(duration)
## <class 'datetime.timedelta'>
## 22:48:00.999770

It can be noted that the durations take leap years into account. Let us first look at the number of days between February 28 and March 1 for a non-leap year:

beginning = date(2021, 2,28)
end = date(2021, 3, 1)
duration = end - beginning
print(duration)
## 1 day, 0:00:00

Now let’s look at the same thing, but in the case of a leap year:

beginning_leap = date(2020, 2,28)
end_leap = date(2020, 3, 1)
duration_leap = end_leap - beginning_leap
print(duration_leap)
## 2 days, 0:00:00

It is also possible to add durations to a date:

debut = datetime(2018, 12, 31, 23, 59, 59)
print(debut + timedelta(seconds = 1))
## 2019-01-01 00:00:00
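
Adding durations repeatedly gives a simple way to build a sequence of dates. The sketch below generates seven consecutive days across a change of year; the starting date and variable names are chosen for illustration:

```python
from datetime import date, timedelta

start = date(2018, 12, 29)

# Add 0, 1, ..., 6 days to the starting date
week = [start + timedelta(days=i) for i in range(7)]

for day in week:
    # ISO date followed by the ISO day of the week (1 = Monday)
    print(day.isoformat(), day.isoweekday())
```

The list starts on 2018-12-29 and ends on 2019-01-04, the change of month and year being handled automatically.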

2.5.2 pytz Module

When date management is of particular importance, the pytz library goes a little further, especially with regard to time zone management. Many examples are available on [the project web page](https://pypi.org/project/pytz/).
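
For simple cases, the timezone type of the datetime module listed in Section 2.5.1 already handles fixed offsets from UTC, without any extra library. The sketch below uses only the standard library, not pytz; the +2 hour offset is an arbitrary choice for the illustration:

```python
from datetime import datetime, timedelta, timezone

# Fixed offset of +2 hours from UTC (arbitrary choice for this example)
utc_plus_2 = timezone(timedelta(hours=2))

local = datetime(2013, 4, 23, 23, 4, 59, tzinfo=utc_plus_2)
in_utc = local.astimezone(timezone.utc)

print(local)              # 2013-04-23 23:04:59+02:00
print(in_utc)             # 2013-04-23 21:04:59+00:00
# With an explicit time zone, the timestamp no longer depends on the machine
print(local.timestamp())  # 1366751099.0
```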

2.5.3 Exercises

  1. Using the appropriate function, store the date of August 29, 2019 in an object called d then display the type of the object.
  2. Using the appropriate function, display the current date.
  3. Store the following date in an object named d2: “2019-08-29 20:30:56”. Then, display in the console with the print() function the year, minute and second attributes of d2.
  4. Add 2 days, 3 hours and 4 minutes to d2, and store the result in an object called d3.
  5. Display the difference in seconds between d3 and d2.
  6. From the object d2, display the date of d2 as a string so that it follows the following syntax: “Month Day, Year”, with “Month” the name of the month (August), “Day” the two-digit day number (29) and “Year” the year of the date (2019).