2 Types of Data
Many types of data are built into Python. In this section we will discuss strings, numerical values, Booleans (`True`/`False`), the `None` value, dates and times.
2.1 Strings
A string is a collection of characters such as letters, numbers, spaces, punctuation marks, etc.
Strings are marked with single, double, or triple quotation marks.
Here is an example:
To display the content of our variable `x` containing the string in the console, the function `print()` can be used:
## Hello World
As indicated just before, single quotation marks can be used to create a string:
## How are you?
To include apostrophes in a character string created using single quotation marks, one must use an escape character: a backslash (`\`):
## I'm fine
Note that if the string is created using double quotation marks, apostrophes no longer need to be escaped; double quotation marks inside the string must be escaped instead:
## I'm "fine"
To specify a line break, we use the following sequence: `\n`.
## Hello,
## World
In the case of character strings spanning multiple lines, using single or double quotation marks returns an error (EOL while scanning string literal, i.e., a syntax error: Python was expecting something else at the end of the line). To write a string on several lines, Python offers triple quotation marks: the quotation mark (single or double) is repeated three times at the beginning and at the end of the string:
## Hello,
## World
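The outputs above could be produced by code along the following lines. This is a minimal sketch: apart from `x`, the variable names are assumptions chosen for illustration.

```python
x = "Hello World"             # double quotation marks
print(x)

y = 'How are you?'            # single quotation marks
print(y)

z = 'I\'m fine'               # escaped apostrophe inside single quotes
print(z)

w = "I'm \"fine\""            # escaped double quotes inside double quotes
print(w)

two_lines = "Hello,\nWorld"   # \n inserts a line break
print(two_lines)

# Triple quotation marks allow the string to span several lines
multi = """Hello,
World"""
print(multi)
```

Both `two_lines` and `multi` hold the same characters; only the way of writing them differs.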
The character `\` (backslash) is the escape character. It makes it possible to include certain characters in a string, such as quotation marks in a string delimited by quotation marks, or control characters such as tabulations and line breaks. Here are some common examples:
Code | Description | Code | Description |
---|---|---|---|
`\n` | New line | `\r` | Carriage return |
`\t` | Tabulation | `\b` | Backspace |
`\\` | Backslash | `\'` | Single quotation mark |
`\"` | Double quotation mark | `` \` `` | Grave accent (not an escape sequence in Python: the backslash is kept) |
To obtain the length of a string, Python offers the function `len()`:
## 13
## Hello World ! 13
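A short sketch consistent with the two output lines above, assuming the string `"Hello World !"` (its 13 characters include the two spaces):

```python
x = "Hello World !"
print(len(x))       # number of characters, spaces included
print(x, len(x))    # print() accepts several values, separated by a space
```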
2.1.1 Concatenation of Strings
To concatenate strings, i.e., to put them end to end, Python provides the `+` operator:
## Hello World
The `*` operator allows us to repeat a string several times:
## Go Habs Go! Go Habs Go! Go Habs Go! Woo Hoo!
When two string literals are side by side, Python concatenates them:
## You shall not pass!
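The three forms of concatenation described above can be sketched as follows. The variable names and the way the strings are cut into pieces are assumptions; only the resulting strings come from the outputs shown.

```python
first = "Hello"
second = "World"
greeting = first + " " + second           # + puts strings end to end
print(greeting)

cheer = 3 * "Go Habs Go! " + "Woo Hoo!"   # * repeats a string
print(cheer)

quote = "You shall " "not pass!"          # adjacent literals are merged
print(quote)
```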
It is also possible to insert the content of a variable into a string, using braces (`{}`) and the method `format()`:
x = "I like to code in {}"
langage_1 = "R"
langage_2 = "Python"
preference_1 = x.format(langage_1)
print(preference_1)
preference_2 = x.format(langage_2)
print(preference_2)
## I like to code in R
## I like to code in Python
It is possible to insert the content of more than one variable into a string, again with braces and the method `format()`:
x = "I like to code in {} and in {}"
preference_3 = x.format(langage_1, langage_2)
print(preference_3)
## I like to code in R and in Python
2.1.2 Indexing and Extraction
Strings can be indexed. Be careful, the index of the first character is 0.
To obtain the ith character of a string `x`, square brackets are used with the index i-1, i.e., `x[i-1]`.
For example, to display the first character, then the fifth of the `Hello` string:
## H
## o
The extraction can be done starting from the end of the string, by preceding the value of the index with the minus sign (`-`).
For example, to display the penultimate character of our string `x`:
## l
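As a small illustration of positive and negative indexes, assuming `x` holds the string `Hello`:

```python
x = "Hello"
print(x[0])     # first character: index 0
print(x[4])     # fifth character: index 4
print(x[-2])    # penultimate character, counted from the end
```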
The extraction of a substring, by specifying its start and end positions (implicitly or not), is also done with brackets. We just need to specify the two index values, `[start:end]`, where the character at index `start` is included and the one at index `end` is not, as in the following example:
x = "You shall not pass!"
# From the fourth character (not included) to the ninth (included)
print(x[4:9])
## shall
When the first value is not specified, the beginning of the string is taken by default; when the second value is not specified, the end of the string is taken by default.
# From the 4th character (not included) to the end of the string
print(x[4:])
# From the beginning of the string to the penultimate (included)
print(x[:-1])
# From the 5th character before the end (included) to the end
print(x[-5:])
## shall not pass!
## You shall not pass
## pass!
It is possible to add a third argument in the brackets: the step.
## sln s
To obtain the string in reverse order:
## !ssap ton llahs uoY
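The step can be sketched as below. The slice `x[4::3]` is an assumption: it is one slice that reproduces the output `sln s`, not necessarily the one originally used.

```python
x = "You shall not pass!"
print(x[4::3])    # every third character, starting at index 4
print(x[::-1])    # a step of -1 walks through the string backwards
```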
2.1.3 Available Methods with Strings
Many methods are available for strings. By adding a dot (`.`) after the name of an object designating a string and then pressing the tab key, the available methods are displayed in a drop-down menu.
For example, the `count()` method allows us to count the number of occurrences of a pattern in the string. To count the number of occurrences of `in` in the following string:
## 3
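A sketch of such a count, assuming the string is the sentence used in the later examples of this section:

```python
x = "le train de tes injures roule sur le rail de mon indifférence"
n_in = x.count("in")    # occurrences in "train", "injures", "indifférence"
print(n_in)
```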
By pressing the `Shift` and `Tab` keys, explanations can be displayed.
2.1.3.1 Conversion to upper or lower case
The `lower()` and `upper()` methods convert a string to lowercase and uppercase characters, respectively.
x = "le train de tes injures roule sur le rail de mon indifférence"
print(x.lower())
print(x.upper())
## le train de tes injures roule sur le rail de mon indifférence
## LE TRAIN DE TES INJURES ROULE SUR LE RAIL DE MON INDIFFÉRENCE
2.1.3.2 Search Pattern for Strings
When we wish to find a pattern in a string, we can use the method `find()`. The pattern to be searched for is given as an argument. The `find()` method returns the smallest index in the string where the pattern is found. If the pattern is not found, the returned value is `-1`.
## 6
## -1
It is possible to restrict the search to a portion of the string by specifying the start and end indexes:
## 16
Note: the end index can be omitted; in this case, the end of the string is used:
## 49
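The four results above are consistent with the following calls on the sentence used earlier in this section. The absent pattern `"zebra"` and the bounds 7 and 20 are assumptions chosen to reproduce those values.

```python
x = "le train de tes injures roule sur le rail de mon indifférence"
print(x.find("in"))           # first occurrence, inside "train"
print(x.find("zebra"))        # absent pattern
print(x.find("in", 7, 20))    # search restricted to indexes 7 to 19
print(x.find("in", 20))       # from index 20 to the end of the string
```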
To simply test whether a pattern is present in the string, the `in` operator can be used:
print("train" in x)
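To perform a search without regard to case, the string can first be converted with the method `capitalize()`, which lowercases every character after the first (`lower()` or `casefold()` are more direct alternatives):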
## -1
## 13
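A sketch of a case-insensitive search, assuming the sentence used in the splitting examples of the next subsection and the pattern `"deray"`:

```python
x = "Mademoiselle Deray, il est interdit de manger de la choucroute ici."

print(x.find("deray"))                  # -1: find() is case-sensitive
print(x.capitalize().find("deray"))     # works because the rest is lowercased
print(x.lower().find("deray"))          # more direct: lowercase everything

# When the pattern itself may contain capitals, normalise both sides
pattern = "DERAY"
print(x.casefold().find(pattern.casefold()))
```

Normalising both the string and the pattern avoids depending on how the pattern happens to be written.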
2.1.3.3 Splitting Strings
To split a string into substrings, based on a pattern used to delimit the substrings (e.g., a comma or a space), the method `split()` can be used:
## ['Mademoiselle', 'Deray,', 'il', 'est', 'interdit', 'de', 'manger', 'de', 'la', 'choucroute', 'ici.']
By giving a numerical value as a second argument, it is possible to limit the number of splits performed (and therefore the number of substrings returned):
## ['Mademoiselle', 'Deray,', 'il', 'est interdit de manger de la choucroute ici.']
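The two lists above can be obtained as follows; the separator (a space) and the limit of 3 are inferred from the outputs.

```python
x = "Mademoiselle Deray, il est interdit de manger de la choucroute ici."

words = x.split(" ")
print(words)

# At most 3 splits are performed, hence 4 substrings
first_words = x.split(" ", 3)
print(first_words)
```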
The `splitlines()` method also allows us to split a string according to a pattern, this pattern being an end-of-line character, such as a line break or a carriage return.
x = '''"No, I am your Father!
- No... No. It's not true! That's impossible!
- Search your feelings. You know it to be true.
- Noooooooo! Noooo!"'''
print(x.splitlines())
## ['"No, I am your Father!', "- No... No. It's not true! That's impossible!", '- Search your feelings. You know it to be true.', '- Noooooooo! Noooo!"']
2.1.3.4 Cleaning, completion
To remove blank characters (e.g., spaces, line breaks, em spaces, etc.) at the beginning and end of a string, we can use the `strip()` method, which is sometimes very useful for cleaning strings.
## Pardon, du sucre ?
It is possible to specify in arguments which characters to remove at the beginning and end of the string:
## egallic
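A sketch of both uses of `strip()`. The input strings are assumptions, picked so that the results match the outputs shown.

```python
x = "\n \t Pardon, du sucre ? \t \n"
print(x.strip())            # blanks removed at both ends only

y = "www.egallic.fr"
# The argument is a set of characters, not a prefix or a suffix:
# any of "w", "f", "r" and "." is removed from both ends
print(y.strip("wfr."))
```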
Sometimes we have to make sure to obtain a string of a given length (when we have to provide a file with fixed widths for each column, for example). The `rjust()` method is then a great help. Given a target length and a fill character, it returns the string padded on the left with the fill character, repeated as many times as necessary; if the string is already long enough, it is returned unchanged.
For example, to have a longitude coordinate stored in a string of characters of length 7, adding spaces may be necessary:
##   48.11
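A minimal sketch of this padding, assuming the coordinate is stored as the string `"48.11"`; `repr()` is used only to make the added spaces visible.

```python
longitude = "48.11"

padded = longitude.rjust(7, " ")
print(repr(padded))                 # two spaces are added on the left

zero_padded = longitude.rjust(7, "0")
print(zero_padded)

# A string that is already long enough is returned unchanged
print(longitude.rjust(3, " "))
```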
2.1.3.5 Replacements
The `replace()` method replaces patterns in a character string:
x = "Criquette ! Vous, ici ? Dans votre propre salle de bain ? Quelle surprise !"
print(x.replace("Criquette", "Ridge"))
## Ridge ! Vous, ici ? Dans votre propre salle de bain ? Quelle surprise !
This method is very convenient for removing spaces for example:
## Criquette!Vous,ici?Dansvotrepropresalledebain?Quellesurprise!
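The removal of spaces shown above amounts to replacing each space with the empty string. The optional third argument of `replace()`, which limits the number of replacements, is shown here as an extra illustration.

```python
x = "Criquette ! Vous, ici ? Dans votre propre salle de bain ? Quelle surprise !"

no_spaces = x.replace(" ", "")
print(no_spaces)

# Only the first space is removed
first_only = x.replace(" ", "", 1)
print(first_only)
```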
Here is a table listing some of the available methods ([exhaustive list in the documentation](https://docs.python.org/3/library/stdtypes.html#string-methods)):
Method | Description |
---|---|
`capitalize()` | Capitalizes the first character and lowercases the rest |
`casefold()` | Removes case distinctions (useful for comparing strings without regard to case) |
`count()` | Counts the number of occurrences (without overlap) of a pattern |
`encode()` | Encodes a string of characters in a specific encoding |
`find()` | Returns the smallest index where a substring is found |
`lower()` | Returns the string with each alphabetical character converted to lower case |
`replace()` | Replaces one pattern with another |
`split()` | Splits the string into substrings according to a pattern |
`title()` | Returns the string with the first letter of each word capitalized |
`upper()` | Returns the string with each alphabetical character converted to upper case |
2.1.4 Conversion to character strings
When we try to concatenate a string with a number, Python raises an error.
## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: must be str, not int
##
## Detailed traceback:
## File "<string>", line 1, in <module>
## Error in py_call_impl(callable, dots$args, dots$keywords): NameError: name 'message' is not defined
##
## Detailed traceback:
## File "<string>", line 1, in <module>
We should then convert the object that is not a string into a string beforehand. To do this, Python offers the function `str()`:
## He has 0 followers.
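The failure and its fix can be sketched as follows. The variable name `nb_followers` is an assumption, and the exact wording of the `TypeError` message depends on the Python version.

```python
nb_followers = 0

try:
    # str + int is not defined: this line raises a TypeError
    message = "He has " + nb_followers + " followers."
except TypeError as error:
    print("TypeError:", error)

# Converting the number first makes the concatenation valid
message = "He has " + str(nb_followers) + " followers."
print(message)
```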
2.1.5 Exercise
- Create two variables named `a` and `b` so that they contain the following strings respectively: `23 to 0` and `C'est la piquette, Jack!`.
- Display the number of characters in `a`, then in `b`.
- Concatenate `a` and `b` in a single string, adding a comma as a separating character.
- Same question, using a line break as the separator.
- Using the appropriate method, convert `a` and `b` to upper case.
- Using the appropriate method, convert `a` and `b` to lower case.
- Extract the words `la` and `Jack` from the string `b`, using indexes.
- Look for the substring `piqu` in `b`, then do the same with the substring `mauvais`.
- Return the position (index) of the first character `a` found in the string `b`, then try with the character `w`.
- Replace the occurrences of the pattern `a` by the pattern `Z` in the string `b`.
- Separate the string `b` using the comma as a substring separator.
- (Bonus) Remove all punctuation characters from string `b`, then use an appropriate method to remove white space at the beginning and end of the string. (Use the `regex` library).
2.2 Numerical values
There are three categories of numbers in Python: integers, floating point numbers and complex numbers.
2.2.1 Integers
Integers (`int`), in Python, are signed integers.
The type of an object can be obtained with the `type()` function in Python.
## <class 'int'>
## <class 'int'>
2.2.2 Floating Point Numbers
Floats represent real numbers. They are written using a dot to separate the integer part from the decimal part of the number.
## <class 'float'>
## <class 'float'>
Scientific notation can also be used, with `E` or `e` indicating a power of 10. For example, to write \(3.2 \times 10^{12}\):
## 3200000000000.0
## 3200000000000.0
In addition, when the integer part of the number is zero, it is possible to avoid writing the leading zero:
## 0.35
## 0.35
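The integer and float literals discussed so far can be illustrated with a short sketch; the values and variable names are chosen for illustration.

```python
n = 2
m = -2
print(type(n), type(m))    # both are int

a = 3.2E12                 # 3.2 times 10 to the power 12
b = 3.2e12                 # lower-case e is equivalent
print(a)
print(b)

c = 0.35
d = .35                    # the leading zero may be omitted
print(c, d)
```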
2.2.3 Complex numbers
Python allows us to natively manipulate complex numbers, of the form \(z=a+ib\), where \(a\) and \(b\) are floating point numbers, and such that \(i^2=(-i)^2=-1\). The real part of the number, \(\mathfrak{R}(z)\), is \(a\) while its imaginary part, \(\mathfrak{I}(z)\), is \(b\).
In Python, the imaginary unit \(i\) is denoted by the letter `j`.
## (1+3j)
## <class 'complex'>
It is also possible to use the `complex()` function, which takes two arguments (the real part and the imaginary part):
## (1+3j)
## <class 'complex'>
Several methods are available with complex numbers. For example, to access the conjugate, Python provides the method `conjugate()`:
## (1-3j)
The real part of a complex number and its imaginary part are accessed through the `real` and `imag` attributes, respectively.
## 1.0
## 3.0
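The operations on complex numbers described in this subsection can be gathered in one sketch (the variable names are assumptions):

```python
z = 1 + 3j
print(z)
print(type(z))

z_bis = complex(1, 3)    # real part, then imaginary part
print(z_bis)

print(z.conjugate())     # the sign of the imaginary part is flipped
print(z.real)            # attributes, not methods: no parentheses
print(z.imag)

print(1j ** 2)           # the square of the imaginary unit is -1
```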
2.2.4 Conversions
To convert a number to another numeric type, Python provides a few functions.
2.2.4.1 Conversion to Integer
The conversion of a number or a string to an integer is done using the function `int()`:
## <class 'int'>
## <class 'str'>
Note that the conversion of a floating point number truncates the number to keep only the integer part:
## 3
2.2.4.2 Conversion to Floating Point Number
To convert a number or a string to a floating point number (if possible), Python provides the function `float()`.
## <class 'float'>
With an integer:
## 3.0
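A sketch of these conversions; the input values are assumptions consistent with the outputs above.

```python
x = "3"
x_int = int(x)
print(type(x_int))     # the converted value is an int
print(type(x))         # the original object is still a str

print(int(3.6))        # truncation, not rounding
print(int(-3.6))       # truncation goes toward zero

y = float("3.6")
print(type(y))
print(float(3))
```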
2.3 Booleans
Logical data can take two values: `True` or `False`. They correspond to the result of a logical condition. Care must be taken to respect the case.
## True False
`True` can be automatically converted to 1; `False` to 0. This can be very convenient, for example, when counting true or false values in the columns of a data table.
## 3
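Counting true values relies on this conversion. A minimal sketch, with a list of Booleans assumed for illustration:

```python
answers = [True, False, True, True]

n_true = sum(answers)               # each True counts as 1
n_false = len(answers) - n_true
print(n_true)
print(n_false)
```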
2.4 Empty Object
The empty object, commonly called `null`, has an equivalent in Python: `None`. To assign it to a variable, one should be careful with case:
## None
## <class 'NoneType'>
The `None` object is a neutral value, with “null” behavior.
To test if an object is the `None` object, the `is` operator can be used (the result is a Boolean):
## False
## True
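The test can be sketched as follows, with two variables assumed for illustration:

```python
x = 1
y = None

print(x is None)        # x holds a value
print(y is None)        # y is the None object
print(y is not None)    # the negated form reads naturally
```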
2.5 Dates and Times
There are several modules to manage dates and times in Python. We will explore part of the `datetime` module.
2.5.1 Module Datetime
Python has a module called `datetime` which offers the possibility to manipulate dates, times and durations.
There are several types of objects designating dates:
- `date`: a date according to the Gregorian calendar, indicating the year, month and day;
- `time`: a given time, without taking into account a particular day, indicating the hour, minute, second (possibly the microsecond and time zone as well);
- `datetime`: a date combining `date` and `time`;
- `timedelta`: a duration between two objects of the type `date`, `time` or `datetime`;
- `tzinfo`: an abstract base type, providing information about time zones;
- `timezone`: a type implementing the `tzinfo` type as a fixed offset from UTC.
2.5.1.1 Date
Objects of type `date` refer to dates in the Gregorian calendar, described by the following characteristics: year, month and day.
To create a `date` object, the syntax is `date(year, month, day)`.
For example, to create the date of April 23, 2013:
## 2013-04-23
## <class 'datetime.date'>
The argument names may be omitted when calling the `date` function. However, the arguments must then be given in the following order: year, month, day.
The attributes of the created date can then be accessed (they are integers):
## 2013
## 4
## 23
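Creating the date and reading its attributes can be sketched as follows (the variable names are assumptions):

```python
from datetime import date

d = date(year=2013, month=4, day=23)
print(d)
print(type(d))

# Without argument names, the order is year, month, day
same_d = date(2013, 4, 23)
print(d == same_d)

print(d.year)
print(d.month)
print(d.day)
```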
Some methods are available for objects of the type `date`. We will review some of them.
2.5.1.1.1 ctime()
The `ctime()` method returns the date as a string.
## Tue Apr 23 00:00:00 2013
2.5.1.1.2 weekday()
The `weekday()` method returns the position of the day of the week (Monday being 0, Sunday 6).
## 1
2.5.1.1.3 isoweekday()
In the same vein as `weekday()`, the `isoweekday()` method returns the position of the day of the week, this time assigning the value 1 to Monday and 7 to Sunday.
## 2
2.5.1.1.4 toordinal()
The `toordinal()` method returns the day number, taking as a reference the value 1 for the first day of year 1.
## 734981
2.5.1.1.5 isoformat()
The `isoformat()` method returns the date in ISO format, as a string.
## 2013-04-23
2.5.1.1.6 isocalendar()
The `isocalendar()` method returns a tuple (cf. Section ??) with three elements: year, week number and day of week (all three in ISO numbering).
## (2013, 17, 2)
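The methods reviewed so far can be tried together on the date of April 23, 2013. In this sketch, `tuple()` is applied to the result of `isocalendar()` because recent Python versions display it as a named tuple rather than as a plain tuple.

```python
from datetime import date

d = date(2013, 4, 23)

print(d.ctime())                 # date as a string, time set to midnight
print(d.weekday())               # Monday is 0
print(d.isoweekday())            # Monday is 1
print(d.toordinal())             # day number since 0001-01-01
print(d.isoformat())             # YYYY-MM-DD
print(tuple(d.isocalendar()))    # ISO year, ISO week, ISO weekday
```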
2.5.1.1.7 replace()
The `replace()` method returns the date after making a modification.
## 2014-04-23 2013-05-23 2013-04-24
This has no impact on the original object:
## 2013-04-23
It is possible to modify several elements at the same time:
## 2013-05-24
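The three situations above (one change, original left untouched, several changes at once) can be sketched as follows; the variable names are assumptions.

```python
from datetime import date

d = date(2013, 4, 23)

x = d.replace(year=2014)
y = d.replace(month=5)
z = d.replace(day=24)
print(x, y, z)

# replace() returns a new object: d itself is unchanged
print(d)

both = d.replace(month=5, day=24)
print(both)
```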
2.5.1.1.8 strftime()
The `strftime()` method returns, as a string, a representation of the date, depending on the mask used.
For example, to have the date represented as `DD-MM-YYYY` (two-digit day, two-digit month and four-digit year):
## 23-04-2013
In the previous example, two things are noteworthy: the presence of formatting directives (which begin with the percent sign) and the presence of other characters (here, hyphens). These other characters are free: separating the elements of the date with hyphens is only a choice. It is possible to adopt another style, for example with slashes, or even with other character strings:
## 23/04/2013
## Jour : 23, Mois : 04, Annee : 2013
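The three representations shown above correspond to the following masks, applied here to the date of April 23, 2013:

```python
from datetime import date

d = date(2013, 4, 23)

dashes = d.strftime("%d-%m-%Y")
slashes = d.strftime("%d/%m/%Y")
sentence = d.strftime("Jour : %d, Mois : %m, Annee : %Y")

print(dashes)
print(slashes)
print(sentence)
```

Everything that is not a directive starting with `%` is copied to the result as is.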
As for the formatting directives, they correspond to the codes required by the C standard (cf. the Python documentation). Here are some of them:
Code | Description | Example |
---|---|---|
`%a` | Abbreviation of the day of the week (depends on the locale) | Tue |
`%A` | Full weekday name (depends on the locale) | Tuesday |
`%b` | Abbreviation of the month (depends on the locale) | Apr |
`%B` | Full month name (depends on the locale) | April |
`%c` | Date and time (depends on the locale), in format `%a %b %e %H:%M:%S %Y` | Tue Apr 23 00:00:00 2013 |
`%C` | Century (00–99) (integer part of the year’s division by 100) | 20 |
`%d` | Day of the month (01–31) | 23 |
`%D` | Date in format `%m/%d/%y` | 04/23/13 |
`%e` | Day of the month as a decimal number (1–31) | 23 |
`%F` | Date in format `%Y-%m-%d` | 2013-04-23 |
`%h` | Same as `%b` | Apr |
`%H` | Hour (00–23) | 00 |
`%I` | Hour (01–12) | 12 |
`%j` | Day of the year (001–366) | 113 |
`%m` | Month (01–12) | 04 |
`%M` | Minute (00–59) | 00 |
`%n` | Line break in output, white space in input | \n |
`%p` | AM or PM | AM |
`%r` | Time in 12-hour format with AM/PM | 12:00:00 AM |
`%R` | Same as `%H:%M` | 00:00 |
`%S` | Second (00–61) | 00 |
`%t` | Tabulation in output, white space in input | \t |
`%T` | Same as `%H:%M:%S` | 00:00:00 |
`%u` | Day of the week (1–7), starts on Monday | 2 |
`%U` | Week of the year (00–53), Sunday being the first day of the week; the first Sunday of the year starts week 1 | 16 |
`%V` | Week of the year (01–53). If the week (which begins on a Monday) that contains January 1 has four or more days in the new year, it is week 1. Otherwise, it is the last week of the previous year, and the following week is week 1 (ISO 8601 standard) | 17 |
`%w` | Day of the week (0–6), Sunday being equal to 0 | 2 |
`%W` | Week of the year (00–53), Monday being the first day of the week; typically, the first Monday of the year starts week 1 (U.K. convention) | 16 |
`%x` | Date (depends on the locale) | 04/23/13 |
`%X` | Time (depends on the locale) | 00:00:00 |
`%y` | Year without the century (00–99) | 13 |
`%Y` | Year (in input, only from 0 to 9999) | 2013 |
`%z` | Offset in hours and minutes with respect to UTC | |
`%Z` | Abbreviation of the time zone (output only) | CEST |
2.5.1.2 Time
Time objects refer to specific times without taking into account a particular day. They provide information on the hour, minute, second (possibly the microsecond and time zone as well).
To create a `time` object, the syntax is `time(hour, minute, second)`; a microsecond and a time zone can also be provided.
For example, to create the moment 23:04:59 (twenty-three hours, four minutes and fifty-nine seconds):
## 23:04:59
## <class 'datetime.time'>
We can add information about the microsecond. Its value must be between 0 (included) and 1,000,000 (excluded).
## 23:04:59.000230
## <class 'datetime.time'>
The attributes of the created time (they are integers) can then be accessed, including the following:
## 23
## 4
## 59
## 230
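A sketch of the creation of `time` objects and of the access to their attributes; the variable names are assumptions.

```python
from datetime import time

t = time(hour=23, minute=4, second=59)
print(t)
print(type(t))

# The fourth positional argument is the microsecond
t_micro = time(23, 4, 59, 230)
print(t_micro)
print(t_micro.hour, t_micro.minute, t_micro.second, t_micro.microsecond)

# One million microseconds is out of range
try:
    time(23, 4, 59, 1_000_000)
except ValueError as error:
    print("ValueError:", error)
```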
Some methods for `time` objects are available. Their use is similar to that of the methods of objects of the `date` class (refer to Section 2.5.1.1).
2.5.1.3 Datetime
The `datetime` objects combine the elements of the `date` and `time` objects. They provide the day in the Gregorian calendar as well as the hour, minute, second (possibly the microsecond and time zone).
To create a `datetime` object, the syntax is `datetime(year, month, day, hour, minute, second)`; a microsecond and a time zone can also be provided.
For example, to create the date 23-04-2013 at 23:04:59:
from datetime import datetime
x = datetime(year = 2013, month = 4, day = 23,
             hour = 23, minute = 4, second = 59)
print(x)
print(type(x))
## 2013-04-23 23:04:59
## <class 'datetime.datetime'>
The `datetime` objects have the attributes of both the `date` objects (cf. Section 2.5.1.1) and the `time` objects (cf. Section 2.5.1.2).
As for methods, relatively more are available. We will comment on some of them.
2.5.1.3.1 today() and now()
The `today()` and `now()` methods return the current `datetime`, the one at the time the instruction is evaluated:
## 2019-10-08 17:57:26.577058
## 2019-10-08 17:57:26.580141
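Since the values depend on the moment of evaluation, the following sketch only looks at the time zone information carried by the objects. Passing `timezone.utc` to `now()` is one possible choice, made here for illustration.

```python
from datetime import datetime, timezone

naive_today = datetime.today()
naive_now = datetime.now()
aware_now = datetime.now(tz=timezone.utc)   # time zone given explicitly

print(naive_today.tzinfo)   # no time zone attached
print(naive_now.tzinfo)     # same when now() receives no argument
print(aware_now.tzinfo)     # the time zone passed to now()
```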
The distinction between the two lies in the time zone. With `today()`, the attribute `tzinfo` is set to `None`, while with `now()`, a time zone can be given as an argument, in which case the attribute `tzinfo` is set accordingly.
2.5.1.3.2 timestamp()
The `timestamp()` method returns, as a floating point number, the POSIX timestamp corresponding to the `datetime` object. The POSIX timestamp corresponds to POSIX time, i.e., the number of seconds elapsed since January 1, 1970, at 00:00:00 UTC.
## 1366751099.0
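For a `datetime` without time zone information, `timestamp()` interprets the value in the local time zone of the machine, so the result varies from one computer to another. The value shown above is consistent with a clock two hours ahead of UTC; this offset is an inference from the number, not something stated in the text. A sketch that makes the offset explicit:

```python
from datetime import datetime, timedelta, timezone

# Assumed offset: UTC+2
utc_plus_2 = timezone(timedelta(hours=2))

x = datetime(2013, 4, 23, 23, 4, 59, tzinfo=utc_plus_2)
print(x.timestamp())

# Converting the timestamp back, expressed in UTC
back = datetime.fromtimestamp(1366751099.0, tz=timezone.utc)
print(back)
```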
2.5.1.3.3 date()
The `date()` method returns a `date` type object whose year, month and day attributes are identical to those of the object:
## 2013-04-23
## <class 'datetime.date'>
2.5.1.4 Timedelta
The objects of type `timedelta` represent durations between two dates or times.
To create an object of type `timedelta`, the syntax is `timedelta(days, seconds, microseconds, milliseconds, minutes, hours, weeks)`.
It is not mandatory to provide a value for each argument. When an argument does not receive a value, its default value is 0.
For example, to create an object indicating a duration of 1 day and 30 seconds:
## datetime.timedelta(1, 30)
The attributes (having been defined) can then be accessed. For example, to access the number of days represented by the duration:
## 1
The `total_seconds()` method is used to obtain the duration expressed in seconds:
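A sketch of the duration of 1 day and 30 seconds used above (the variable name is an assumption):

```python
from datetime import timedelta

duration = timedelta(days=1, seconds=30)

print(duration.days)              # 1
print(duration.seconds)           # 30: the part of the duration below one day
print(duration.total_seconds())   # 86430.0: 24 * 3600 + 30
```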
2.5.1.4.1 Time Between Two Objects date or datetime
When subtracting two objects of type `date`, the number of days between these two dates is obtained, in the form of an object of type `timedelta`:
from datetime import date, timedelta
beginning = date(2018, 1, 1)
end = date(2018, 1, 2)
nb_days = end-beginning
print(type(nb_days))
print(nb_days)
## <class 'datetime.timedelta'>
## 1 day, 0:00:00
When subtracting two objects of type `datetime`, we obtain the number of days, seconds (and microseconds, if entered) separating these two dates, in the form of an object of type `timedelta`:
beginning = datetime(2018, 1, 1, 12, 26, 30, 230)
end = datetime(2018, 1, 2, 11, 14, 31)
duration = end-beginning
print(type(duration))
print(duration)
## <class 'datetime.timedelta'>
## 22:48:00.999770
It can be noted that the durations given take into account leap years. Let us first look at the number of days between February 28 and March 1 for a non-leap year:
Now let’s look at the same thing, but in the case of a leap year:
beginning_leap = date(2020, 2, 28)
end_leap = date(2020, 3, 1)
duration_leap = end_leap - beginning_leap
print(duration_leap)
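The comparison between the two cases can be sketched as follows. The non-leap year 2019 is an assumption; any non-leap year gives the same result.

```python
from datetime import date

# Non-leap year: February has 28 days
duration_regular = date(2019, 3, 1) - date(2019, 2, 28)

# Leap year: February 29 lies between the two dates
duration_leap = date(2020, 3, 1) - date(2020, 2, 28)

print(duration_regular.days)
print(duration_leap.days)
```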
It is also possible to add durations to a date:
## 2019-01-01 00:00:00
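One way to obtain the result above is to add one day to December 31, 2018. The starting date and the duration are assumptions; other combinations lead to the same date.

```python
from datetime import datetime, timedelta

start = datetime(2018, 12, 31)
next_day = start + timedelta(days=1)
print(next_day)

# Subtracting a duration works the same way
print(next_day - timedelta(hours=12))
```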
2.5.2 pytz Module
If date management is of particular importance, a library goes a little further, especially with regard to time zone management. This library is called `pytz`. Many examples are available on [the project web page](https://pypi.org/project/pytz/).
2.5.3 Exercises
- Using the appropriate function, store the date of August 29, 2019 in an object called `d`, then display the type of the object.
- Using the appropriate function, display the current date.
- Store the following date in an object named `d2`: “2019-08-29 20:30:56”. Then, display in the console with the `print()` function the year, minute and second attributes of `d2`.
- Add 2 days, 3 hours and 4 minutes to `d2`, and store the result in an object called `d3`.
- Display the difference in seconds between `d3` and `d2`.
- From the object `d2`, display the date of `d2` as a string so that it follows the following syntax: “Month Day, Year”, with “Month” the name of the month (August), “Day” the two-digit day number (29) and “Year” the year of the date (2019).