Pythonic Containers
Overview, Objectives, and Key Terms
All the way back in Lecture 1, the simplest of variable types were
presented (namely, int, float, and bool) along with the more complex but
indispensable str type. In Lecture 3, NumPy was introduced, with its
ndarray type serving as the workhorse for a variety of applications,
particularly those with a numerical flavor. In this lecture, the
built-in Python types list, tuple, and dict are presented, with
motivating applications for each.
Objectives
By the end of this lesson, you should be able to
- Define and use list and tuple variables.
- Define and use dict variables.
- Explain the difference between mutable and immutable types.
Key Terms
- list
- tuple
- dict
- mutable
- immutable
- container type
- sequential type
- associative type
- list.append()
- list.count()
- list.copy()
In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
What is a sequential type?
Beyond the simplest types, Python (like other languages) offers a number of built-in types that represent a collection of values. Generically, these are called [container types](https://en.wikipedia.org/wiki/Container_(abstract_data_type)) because they are data structures that contain one or more values. Container types differ in how their values are stored and accessed.
Probably the simplest container type to understand is the sequential
type, the elements of which are arranged one after another (i.e., in a
sequence) and can be accessed individually by knowing only their
location within the sequence. Sound familiar? It should: NumPy’s ndarray
is a sequential type. Remember, given an array (e.g.,
a = np.array([1, 2, 3])), we can access any element by its location
with the [] operator (e.g., the first element a[0]). Similarly, the str
type is a sequential type, the elements of which are limited to
individual characters.
Making Connections: Python’s str and NumPy’s ndarray are sequential types.
Often, a sequential type is the right choice when the relationship
between values to be stored is best represented by their position in the
sequence. A word, like “pandemonium,” loses its meaning if the
individual characters are not accessible in order, and an array like
np.linspace(0, 10) is useless unless it can be used to represent values
of \(x\) from 0 to 10 in order.
The list Type
Lots of computation, numerical or otherwise, needs sequential data, and
Python has more answers. The most versatile of Python’s sequential types
is list. A list variable can be defined using comma-separated values
within square brackets []. For example, a list with the values 1, 2,
and 3 is defined via
In [2]:
a = [1, 2, 3]
a
Out[2]:
[1, 2, 3]
That syntax may look familiar: we used it to help produce NumPy arrays like
In [3]:
import numpy as np
b = np.array([1, 2, 3])
b
Out[3]:
array([1, 2, 3])
With a defined, we could just as well do
In [4]:
c = np.array(a)
c
Out[4]:
array([1, 2, 3])
However, list is more versatile for general programming because its
elements can be arbitrary. In other words, they are not limited to
numerical values. For example, consider this list of very different
values:
In [5]:
d = [1, 3.14, 'hello', np.array([1, 2, 3])]
d
Out[5]:
[1, 3.14, 'hello', array([1, 2, 3])]
Like all sequential types in Python, individual elements of a list are
accessed using [], e.g.,
In [6]:
d[2]
Out[6]:
'hello'
Elements of list variables can also be extracted using the same slicing
technique introduced in Lecture 4. Recall that slicing an array (or,
now, any sequence) takes the form a[start:stop:stride]. Thus, we can get
elements 0 and 2 of d via
In [7]:
d[0:len(d):2]
Out[7]:
[1, 'hello']
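As a quick sketch of the slicing rules just recalled, the following uses a variant of d in which a plain list stands in for the NumPy array (so that the snippet needs no imports). The names every_other, shorthand, first_two, last_one, and prefix are ours, chosen only for illustration.

```python
d = [1, 3.14, 'hello', [1, 2, 3]]   # a plain list stands in for the array

every_other = d[0:len(d):2]   # explicit start, stop, and stride
shorthand = d[::2]            # omitted start and stop span the whole list
first_two = d[:2]             # elements 0 and 1 (stop is exclusive)
last_one = d[-1]              # negative indices count from the end

# The same rules apply to other sequences, such as str.
word = 'pandemonium'
prefix = word[0:3]

print(every_other, shorthand, first_two, last_one, prefix)
```

Omitting start and stop, as in d[::2], is the usual shorthand when the whole sequence is meant.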
In addition to indexing and slicing, the list type provides a number of
useful functions that can be listed using dir.
Exercise: Use dir to list the attributes (functions, etc.) of a list variable.
Remember (as in Lecture 2) that dir can produce a lot of names with
double underscores, which usually represent items that are not of much
interest. Note, though, that dir produces a list, and the elements of
that list can be accessed one by one using a for loop. The interesting
items can be printed by skipping those whose names start with a double
underscore ('__'):
In [8]:
items = dir(list) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
append
clear
copy
count
extend
index
insert
pop
remove
reverse
sort
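The loop above steps through indices and slices each name by hand. A slightly more idiomatic sketch, offered as one possible alternative rather than the required approach, iterates over the names directly and uses the str method startswith. The name public_names is ours, and append is the list function discussed shortly.

```python
# Iterate over the names themselves rather than over their indices.
public_names = []
for name in dir(list):
    if not name.startswith('__'):   # skip the double-underscore names
        public_names.append(name)

print(public_names)
```

Collecting the names in a list, instead of printing them one at a time, makes them available for later use.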
Some of these names may suggest obvious functionality, while some may be less intuitive.
Exercise: Use help to determine what each of these functions does.
Of particular interest is the function append. Consider the empty list,
defined as
In [9]:
e = []
e
Out[9]:
[]
Just like the empty string '', the empty list evaluates to False when
converted to bool:
In [10]:
bool(e)
Out[10]:
False
and has zero length:
In [11]:
len(e)
Out[11]:
0
We can add an element, say the integer 123, to e using append:
In [12]:
e.append(123)
e
Out[12]:
[123]
Another element, this time a float, could be added, e.g.,
In [13]:
e.append(0.111)
e
Out[13]:
[123, 0.111]
The append function places the new element after all of the other
elements. Often, lists are constructed iteratively, and append is the
natural choice. However, one can also use insert to place new elements
anywhere in the list. The last element of a list can be removed (and
returned) using pop, while an element with a given value can be removed
via remove.
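The functions for adding and removing elements can be tried together in a short sketch. The starting list matches e from above, the strings 'last' and 'second' are made-up values, and the del statement is included as an additional, built-in way to remove an element by index.

```python
e = [123, 0.111]

e.append('last')        # add to the end
e.insert(1, 'second')   # add before index 1

removed = e.pop()       # remove and return the last element
e.remove(0.111)         # remove the first element equal to 0.111
del e[0]                # the del statement removes by index

print(e, removed)
```

Removal by value uses remove, while removal by position uses pop (optionally with an index) or del.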
Quick Exercise: Let x = [1, 2, 3, 4, 5]. Now, add a new element 6 to x after 5. Then delete 3 and 4 from x, leaving just [1, 2, 5, 6].
Before moving on, it is worth noting that list variables, like str
variables, may be manipulated using the + and * operators, though the
results may not be what one expects. For instance, it may be reasonable
to assume that 3 * [1, 2, 3] leads to a new list with elements equal to
[3, 6, 9], but that is not the case:
In [14]:
3 * [1, 2, 3]
Out[14]:
[1, 2, 3, 1, 2, 3, 1, 2, 3]
Apparently, the contents of the initial list are simply repeated three
times in sequence. In fact, this behavior is similar to that observed
for multiplication of a str by an int:
In [15]:
3 * '...'
Out[15]:
'.........'
Similarly, the addition of two lists leads to
In [16]:
[1, 2] + [3, 4, 5]
Out[16]:
[1, 2, 3, 4, 5]
In other words, addition and multiplication can be used to join or repeat the contents of lists rather easily. However, these operators do not perform element-wise arithmetic on lists as they do on NumPy arrays.
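If element-wise arithmetic on a plain list is wanted without converting to a NumPy array, one option is an explicit loop with append. The sketch below also shows the more compact list comprehension, a Python feature not otherwise covered in this lecture. All variable names are ours.

```python
x = [1, 2, 3]

repeated = 3 * x                 # repetition, not arithmetic

# Element-wise scaling with a loop and append.
scaled = []
for value in x:
    scaled.append(3 * value)

# The same result written as a list comprehension.
compact = [3 * value for value in x]

print(repeated, scaled, compact)
```

For large numerical data, converting to an ndarray remains the more natural choice.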
Exercise: What is the result of z = 10*[0]?
Exercise: Check whether list(np.array([1, 2, 3])*2) and [1, 2, 3]*2 are equivalent. Can you say so without evaluating those expressions?
Exercise: Given f = [1, 2, 3, 4], what does f[::-1] produce? What are the implied start and stop values?
Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. How would one access the element [1, 2]?
Solution: Recognize that [1, 2] is the first element of M and can, therefore, be accessed as M[0]. If that looks confusing, suppose that you first set row1 = [1, 2] and row2 = [3, 4]. Then M = [row1, row2]. We get row1 from M[0].
Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. How would one access the element 4?
Solution: Recognize that 4 is the second element in [3, 4] and that [3, 4] is the second element of M. We get [3, 4] via M[1]. To get 4, we then can use M[1][1]. We cannot use M[1, 1], which is a special indexing allowed by 2-D NumPy arrays.
Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. Is it possible to obtain the elements [2, 4] by slicing?
Exercise: Given f = [1, 2, 3, 4], how can slicing be used to produce the list [4, 3, 2, 1]?
Mutability
A container type is mutable if the values of individual elements can
be changed. The list type is mutable, as is NumPy’s ndarray, because we
can access an element and assign it a new value. For example, given the
list
In [17]:
g = [1, 2, 3, 4]
the first value 1 can be changed to 99 via
In [18]:
g[0] = 99
Of course, we are able to do exactly the same sort of operation on the
elements of ndarray variables. However, we are not able to change the
elements of variables whose type is immutable. One such type is str.
Given
In [19]:
s = 'hello'
we cannot get 'jello' via
In [20]:
s[0] = 'j'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-a225757de94d> in <module>()
----> 1 s[0] = 'j'
TypeError: 'str' object does not support item assignment
The fact that list and other types are mutable can lead to some
behavior that may be surprising to a new programmer. For example,
consider the following:
In [21]:
list1 = [1, 2]
list2 = list1 # a copy of list1, right?
print('list1 = ', list1)
print('list2 = ', list2) # so far, so good
list2[0] = 99 # change list2
print('list1 = ', list1) # why is list1 changed?
list1 = [1, 2]
list2 = [1, 2]
list1 = [99, 2]
Here, we might have expected list1 to remain as [1, 2] while list2
became [99, 2]. However, the assignment list2 = list1 does not assign a
copy of the values already assigned to list1. Rather, list2 becomes a
second name for the same values. When those values change (by modifying
either list1 or list2), they do so for both names (list1 and list2).
Warning: Given a list variable a, the operation b = a does not make a copy of a’s elements. Rather, b and a are two names for the same data. Changing one necessarily changes the other.
Actually, this behavior is not so uncommon in programming, although it’s
not always the default. Basically, when a variable is created (in any
language), a certain amount of memory is required to store the values.
We access those values by using the name of the variable. What Python
does by default is to assign multiple names to the same memory when we
make assignments like list2 = list1 above. For those who have (or will
have) exposure to C or C++, the same behavior can be observed when
assigning pointer variables to one another.
One way to avoid this issue is to create an explicit copy. Built into
list is the copy function, which can be used like
In [22]:
list2 = list1.copy()
list2[0] = 101 # changes list2, but not list1
list1
list2
Out[22]:
[99, 2]
Out[22]:
[101, 2]
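Two further points may be worth a sketch. First, the is operator reports whether two names refer to the same object, which distinguishes an alias from a copy. Second, copy makes a shallow copy: for a list of lists, the inner lists are still shared. The standard-library function copy.deepcopy is one way around that. The names alias, clone, shallow, and deep are ours.

```python
import copy

list1 = [1, 2]
alias = list1                 # a second name for the same object
clone = list1.copy()          # a new list with the same elements

same_object = alias is list1  # one object, two names
is_distinct = clone is not list1
equal_values = clone == list1

# copy() is shallow: the inner lists of M are shared with the copy.
M = [[1, 2], [3, 4]]
shallow = M.copy()
deep = copy.deepcopy(M)       # copies the inner lists as well

M[0][0] = 99                  # modify an inner list through M
print(shallow[0][0], deep[0][0])
```

The change to M[0][0] shows up in shallow but not in deep, because only deep has its own inner lists.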
The tuple Type
Python has another built-in sequential type, the immutable tuple, that,
like list, allows one to store a sequence of elements with arbitrary
types. The clearest way to define a tuple is via comma-separated values
enclosed in parentheses, e.g.,
In [23]:
A = (1, 2, 3)
A
Out[23]:
(1, 2, 3)
The parentheses make it very clear what is being defined, but they are
not necessary. For example, one could define another tuple via
In [24]:
D = 1, 3.14, 'hello', np.array([1, 2, 3])
D
Out[24]:
(1, 3.14, 'hello', array([1, 2, 3]))
Elements of tuple variables are accessed following the same indexing
and slicing rules as for lists. However, their elements cannot be
reassigned because the tuple type is immutable (so that, e.g.,
D[0] = 99 will fail).
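A small sketch of how that failure looks in practice, together with two tuple conveniences not mentioned above: unpacking a tuple into separate names, and the trailing comma needed for a one-element tuple. The variable names are ours.

```python
point = (3, 4)

# Unpacking assigns each element to its own name.
x, y = point

# Item assignment raises TypeError because tuples are immutable.
try:
    point[0] = 99
except TypeError as err:
    message = str(err)

# A one-element tuple needs a trailing comma; (5) is just the int 5.
single = (5,)
not_a_tuple = (5)

print(x, y, message, type(single), type(not_a_tuple))
```

Catching the TypeError lets the snippet keep running, whereas the bare assignment would stop it.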
So what does a tuple variable have to offer? Again, dir may be used:
In [25]:
items = dir(tuple) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
count
index
That’s not very many functions, and a comparison to list suggests that
tuple is missing anything that would replace, remove, or add to its
elements.
Exercise: See what the functions tuple.count and tuple.index are used to do.
The dict Type
The list and tuple types are the built-in solutions for storing
sequences of arbitrary elements. Remember, a sequential type is the
right choice when the relationship between the stored data is positional
(like the characters of a word). There are circumstances, though, in
which data is not really (or not only) related by order.
Consider a perhaps obvious example: Webster’s dictionary. Although the
entries are often stored in sequence (alphabetically, not numerically),
the data itself is more complex than a single value. Rather, each entry
in the dictionary consists of two things: a key, in this case the word
to be defined, and a value, in this case the definition of the word.
Under the hood, the key:value pairs need not be stored in sequence
(alphabetically or otherwise), but we should be able to get the value
given any key.
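Python’s dict (short for dictionary, of course) provides us with such a
data structure. A Python dict variable pairs keys with values of any
type. The keys themselves may be of any hashable type, which includes
immutable built-in types such as int, str, and tuple (with hashable
elements) but excludes mutable ones such as list. One way to define a
dict variable is by starting with the empty dictionary: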
In [26]:
stuff = {}
stuff
Out[26]:
{}
We can add a new element by indexing with the key and assigning a value:
In [27]:
stuff['some key'] = 'some value'
stuff # not empty anymore
Out[27]:
{'some key': 'some value'}
We can access the element for any key defined using the same syntax:
In [28]:
stuff['some key']
Out[28]:
'some value'
If we try to get a value for a key not in the dictionary, a KeyError
(similar to IndexError) is encountered:
In [29]:
stuff['some other key'] # not a key in our dictionary
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-29-bc5178045979> in <module>()
----> 1 stuff['some other key'] # not a key in our dictionary
KeyError: 'some other key'
An easy way to check whether a dictionary has a key is to use the in
operator. The basic syntax is element in container, and the result is
True or False. The only requirement is that container is iterable,
meaning that each of its elements can be inspected. This is true for
sequential types like list and ndarray. It is also true for dict, for
which in inspects the keys (not the values). So, to check whether
'some key' and 'some other key' are in our dictionary, we can use
In [30]:
'some key' in stuff
'some other key' in stuff
Out[30]:
True
Out[30]:
False
Hence, one can always check that a key exists before accessing a value.
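The check-then-access pattern can be sketched as follows, alongside the dict function get, which returns a fallback (None unless another is supplied) instead of raising KeyError. The dictionary matches stuff from above, and the remaining names and the 'default' string are ours.

```python
stuff = {'some key': 'some value'}
key = 'some other key'

# Check for the key before indexing, avoiding a KeyError.
if key in stuff:
    value = stuff[key]
else:
    value = None

# get() performs the same lookup in a single call.
same_value = stuff.get(key)
fallback = stuff.get(key, 'default')
present = stuff.get('some key')

print(value, same_value, fallback, present)
```

Which form to prefer is largely a matter of taste, though get is convenient when a sensible default exists.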
Dictionaries can also be defined all at once. For example, we can map the first few letters of the alphabet to their position via
In [31]:
alphanum = {'a': 0, 'b': 1, 'c': 2}
alphanum
Out[31]:
{'a': 0, 'b': 1, 'c': 2}
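Keys need not be strings. One caveat, stated here as a general fact about Python rather than something taken from the examples above, is that keys must be hashable: immutable values such as int, str, and tuple work, while a list does not. The dictionary grid and its contents are made up for illustration.

```python
grid = {}
grid[(0, 1)] = 'north'     # a tuple is immutable and works as a key
grid[2] = 'two'            # so does an int

# A list is mutable (unhashable), so using one as a key raises TypeError.
try:
    grid[[0, 1]] = 'oops'
except TypeError as err:
    message = str(err)

print(grid, message)
```

This is one practical reason to reach for a tuple rather than a list when a sequence has to serve as a key.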
Exercise: Do from string import ascii_lowercase as s, which creates a
string s with the entire alphabet. Then, create your own alphanum
dictionary with all 26 letters.
So what does dict have? Let’s see:
In [32]:
items = dir(dict) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
clear
copy
fromkeys
get
items
keys
pop
popitem
setdefault
update
values
Some of these look similar to those found in list, and some are new: go
look them up via help. Two useful ones are keys and values:
In [33]:
alphanum.keys()
list(alphanum.keys())
Out[33]:
dict_keys(['a', 'b', 'c'])
Out[33]:
['a', 'b', 'c']
In [34]:
alphanum.values()
list(alphanum.values())
Out[34]:
dict_values([0, 1, 2])
Out[34]:
[0, 1, 2]
An application requiring just the keys might be one in which student names (and not their student numbers, or grades, or other sensitive items) are required. The values might be needed if just grades were required, perhaps to do a statistical analysis.
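The student example might be sketched as below. The names and grades are made-up data. The function items, which appears in the dir listing above, yields key and value together, and a for loop over the dictionary itself visits its keys.

```python
grades = {'Ada': 91, 'Grace': 88, 'Alan': 79}   # made-up data

names = list(grades.keys())       # just the keys
scores = list(grades.values())    # just the values
average = sum(scores) / len(scores)

# Looping over a dict visits its keys.
for name in grades:
    print(name)

# items() yields (key, value) pairs.
for name, score in grades.items():
    print(name, score)

print(names, average)
```

In current versions of Python, keys and values come back in the order in which the entries were inserted.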
Further Reading
None at this time.