Pythonic Containers¶
Overview, Objectives, and Key Terms¶
All the way back in Lecture 1, the simplest
of variable types were presented (namely, int, float, and
bool) along with the more complex but indispensable str type. In
Lecture 3, NumPy was introduced, with its
ndarray type serving as the workhorse for a variety of applications,
particularly those with a numerical flavor. In this lecture, the
built-in Python types list, tuple, and dict are presented,
with motivating applications for each.
Objectives¶
By the end of this lesson, you should be able to
- Define and use list and tuple variables.
- Define and use dict variables.
- Explain the difference between mutable and immutable types.
Key Terms¶
- list
- tuple
- dict
- mutable
- immutable
- container type
- sequential type
- associative type
- list.append()
- list.count()
- list.copy()
In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
What is a sequential type?¶
Beyond the simplest types, Python (and other languages) offers a number of built-in types that represent a collection of values. Generically, these are called [container types](https://en.wikipedia.org/wiki/Container_(abstract_data_type)) because they are data structures that contain one or more values. Container types differ in how their values are stored and accessed.
Probably the simplest container type to understand is the sequential
type, the elements of which are arranged one after another (i.e., a
sequence) and can be accessed individually by knowing only their
location within the sequence. Sound familiar? It should: NumPy’s
ndarray is a sequential type. Remember, given an array (e.g.,
a = np.array([1, 2, 3])), we can access any element by its location
with the [] operator (e.g., the first element a[0]). Similarly,
the str type is a sequential type the elements of which are limited
to individual characters.
Making Connections: Python’s str and NumPy’s ndarray are sequential types.
Often, a sequential type is the right choice when the relationship
between values to be stored is best represented by their position in the
sequence. A word, like “pandemonium,” loses its meaning if the
individual characters are not accessible in order, and an array like
np.linspace(0, 10) is useless unless it can be used to represent
values of \(x\) from 0 to 10 in order.
The list Type¶
Lots of computation, numerical or otherwise, needs sequential data, and
Python has more answers. The most versatile of Python’s sequential types
is list. A list variable can be defined using comma-separated
values within square brackets []. For example, a list with the
values 1, 2, and 3 is defined via
In [2]:
a = [1, 2, 3]
a
Out[2]:
[1, 2, 3]
That syntax may look familiar: we used it to help produce NumPy arrays like
In [3]:
import numpy as np
b = np.array([1, 2, 3])
b
Out[3]:
array([1, 2, 3])
With a defined, we could just as well do
In [4]:
c = np.array(a)
c
Out[4]:
array([1, 2, 3])
However, list is more versatile for general programming because its
elements can be arbitrary. In other words, they are not limited to
numerical values. For example, consider this list of very different
values:
In [5]:
d = [1, 3.14, 'hello', np.array([1, 2, 3])]
d
Out[5]:
[1, 3.14, 'hello', array([1, 2, 3])]
Like all sequential types in Python, individual elements of a list
are accessed using [], e.g.,
In [6]:
d[2]
Out[6]:
'hello'
Elements of list variables can also be extracted using the same
slicing technique introduced in Lecture 4.
Recall, slicing an array (or, now, any sequence) takes the form
a[start:stop:stride]. Thus, we can get elements 0 and 2 of
d via
In [7]:
d[0:len(d):2]
Out[7]:
[1, 'hello']
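As a small sketch of how start, stop, and stride interact, consider the following (the list name letters is made up for illustration and is not part of the lecture). Omitted values fall back to their defaults, and the stop index itself is never included.

```python
# Hypothetical list used only for illustration
letters = ['a', 'b', 'c', 'd', 'e', 'f']

every_other = letters[0:len(letters):2]  # explicit start, stop, and stride
same_thing = letters[::2]                # omitted start/stop default to the ends
middle = letters[1:4]                    # stride defaults to 1; stop is excluded

print(every_other)  # ['a', 'c', 'e']
print(same_thing)   # ['a', 'c', 'e']
print(middle)       # ['b', 'c', 'd']
```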
In addition to indexing and slicing, the list type provides a number
of useful functions that can be listed using dir.
Exercise: Use dir to list the attributes (functions, etc.) of a list variable.
Remember (as in Lecture 2) that dir can lead
to lots of names with double underscores, which usually represent
items that are not of much interest. Note, though, that dir produces
a list, and the elements of that list can be accessed one by one using
a for loop. The interesting items can be printed by skipping those
whose name starts with a double underscore ('__'):
In [8]:
items = dir(list) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
append
clear
copy
count
extend
index
insert
pop
remove
reverse
sort
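The same filtering can be written more compactly. The sketch below is one possible alternative, not the approach used in the lecture: it loops over the names directly instead of over indices and uses the str method startswith. The name public_names is made up for illustration.

```python
# Names of list attributes that do not begin with a double underscore
public_names = [name for name in dir(list) if not name.startswith('__')]

for name in public_names:
    print(name)
```

Looping over the elements themselves avoids the bookkeeping of range(len(items)) when the index is not otherwise needed.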
Some of these names may suggest obvious functionality, while some may be less intuitive.
Exercise: Use help to define what each of these functions does.
Of particular interest is the function append. Consider the empty
list, defined as
In [9]:
e = []
e
Out[9]:
[]
Just like the empty string '', the empty list evaluates to
False when converted to bool:
In [10]:
bool(e)
Out[10]:
False
and has zero length:
In [11]:
len(e)
Out[11]:
0
We can add an element, say the integer 123, to e using append:
In [12]:
e.append(123)
e
Out[12]:
[123]
Another element, this time a float, could be added, e.g.,
In [13]:
e.append(0.111)
e
Out[13]:
[123, 0.111]
The append function places the new element after all of the other
elements. Often, lists are constructed iteratively, and append is
the natural choice. However, one can also use insert to place new
elements anywhere in the list. Alternatively, the last element of a
list can be removed using pop, while arbitrary elements can be
removed by value via remove or by index via the del statement.
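A brief sketch of these operations on a throwaway list follows (the names q and last are made up for illustration). Note that the list type offers remove, which deletes the first matching value, and that the del statement deletes by index.

```python
# Hypothetical list used only for illustration
q = ['a', 'b', 'c']

q.append('d')        # add to the end:             ['a', 'b', 'c', 'd']
q.insert(1, 'x')     # place 'x' at index 1:       ['a', 'x', 'b', 'c', 'd']
last = q.pop()       # remove and return the last: 'd'
q.remove('x')        # remove the first 'x' found: ['a', 'b', 'c']
del q[0]             # remove by index:            ['b', 'c']

print(q)     # ['b', 'c']
print(last)  # d
```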
Quick Exercise: Let x = [1, 2, 3, 4, 5]. Now, add a new element 6 to x after 5. Then delete 3 and 4 from x, leaving just [1, 2, 5, 6].
Before moving on, it is worth noting that list variables, like
str variables, may be manipulated using + and * operators,
though the results may not be what one expects. For instance, it may be
reasonable to assume that 3 * [1, 2, 3] leads to a new list with
elements equal to [3, 6, 9], but that is not the case:
In [14]:
3 * [1, 2, 3]
Out[14]:
[1, 2, 3, 1, 2, 3, 1, 2, 3]
Apparently, the contents of the initial list are simply repeated three
times in sequence. In fact, this behavior is similar to that observed
for multiplication of an int by a str:
In [15]:
3 * '...'
Out[15]:
'.........'
Similarly, the addition of two lists leads to
In [16]:
[1, 2] + [3, 4, 5]
Out[16]:
[1, 2, 3, 4, 5]
In other words, addition and multiplication can be used to join the contents of lists rather easily. However, arithmetic operations cannot be used with lists in the same way possible for NumPy arrays.
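If elementwise arithmetic is wanted without NumPy, one option is to build a new list element by element. The sketch below uses a list comprehension and the built-in zip, neither of which is covered in this lecture, so treat it as an illustration rather than the recommended approach. The names v, repeated, scaled, and summed are made up.

```python
# Hypothetical list used only for illustration
v = [1, 2, 3]

repeated = 3 * v                                   # repetition, not arithmetic
scaled = [3 * x for x in v]                        # elementwise product
summed = [x + y for x, y in zip(v, [10, 20, 30])]  # elementwise sum

print(repeated)  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(scaled)    # [3, 6, 9]
print(summed)    # [11, 22, 33]
```

For anything beyond small lists, converting to an ndarray is usually the simpler route.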
Exercise: What is the result of z = 10*[0]?
Exercise: Check whether list(np.array([1, 2, 3])*2) and [1, 2, 3]*2 are equivalent. Can you say so without evaluating those expressions?
Exercise: Given f = [1, 2, 3, 4], what does f[::-1] produce? What are the implied start and stop values?
Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. How would one access the element [3, 4]?
Solution: Recognize that [1, 2] is the first element of M and can, therefore, be accessed as M[0]. If that looks confusing, suppose that you first set row1 = [1, 2] and row2 = [3, 4]. Then M = [row1, row2]. We get row1 from M[0] and, in the same way, row2 (i.e., [3, 4]) from M[1].
Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. How would one access the element 4?
Solution: Recognize that 4 is the second element in [3, 4] and that [3, 4] is the second element of M. We get [3, 4] via M[1]. To get 4, we then can use M[1][1]. We cannot use M[1, 1], which is a special indexing allowed by 2-D NumPy arrays.
Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. Is it possible to obtain the element [2, 4] by slicing?
Exercise: Given f = [1, 2, 3, 4], how can slicing be used to produce the list [4, 3, 2, 1]?
Mutability¶
A container type is mutable if the values of individual elements can
be changed. The list type is mutable, as is NumPy’s ndarray,
because an individual element can be accessed and assigned a new value. For
example, given the list
In [17]:
g = [1, 2, 3, 4]
the first value 1 can be changed to 99 via
In [18]:
g[0] = 99
Of course, we are able to do exactly the same sort of operation to
ndarray variable elements. However, we are not able to change the
elements of variables whose type is immutable. One such type is
str. Given
In [19]:
s = 'hello'
we cannot get jello via
In [20]:
s[0] = 'j'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-a225757de94d> in <module>()
----> 1 s[0] = 'j'
TypeError: 'str' object does not support item assignment
The fact that list and other types are mutable can lead to some
behavior that may be surprising to a new programmer. For example, consider
the following:
In [21]:
list1 = [1, 2]
list2 = list1 # a copy of list1, right?
print('list1 = ', list1)
print('list2 = ', list2) # so far, so good
list2[0] = 99 # change list2
print('list1 = ', list1) # why is list1 changed?
list1 = [1, 2]
list2 = [1, 2]
list1 = [99, 2]
Here, we might have expected list1 to remain as [1, 2] while
list2 became [99, 2]. However, the assignment list2 = list1 does
not assign a copy of the values already assigned to list1.
Rather, list2 becomes a second name for the same values. When those values
change (by modifying either list1 or list2), they do so for both
names (list1 and list2).
Warning: Given a list variable a, the operation b = a does not make a copy of a’s elements. Rather, b and a are two names for the same data. Changing one necessarily changes the other.
Actually, this behavior is not so uncommon in programming, although it’s
not always the default. Basically, when a variable is created (in any
language), a certain amount of memory is required to store the values.
We access those values by using the name of the variable. What Python
does by default is to assign multiple names to the same memory when we
make assignments like list2 = list1 above. For those who have (or
will have) exposure to C or C++, the same behavior can be observed when
assigning pointer variables to one another.
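Whether two names refer to the same memory can be checked with the is operator, which is different from the == comparison of values. The following is a minimal sketch; the name list3 is made up for illustration.

```python
list1 = [1, 2]
list2 = list1            # a second name for the same list object
list3 = [1, 2]           # a separate list that happens to hold equal values

print(list2 is list1)    # True: same object in memory
print(list3 is list1)    # False: different object
print(list3 == list1)    # True: equal values

list2.append(3)          # modifies the one shared object
print(list1)             # [1, 2, 3]
print(list3)             # [1, 2]
```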
One way to avoid this issue is to create an explicit copy. Built into
list is the copy function, which can be used like
In [22]:
list2 = list1.copy()
list2[0] = 101 # changes list2, but not list1
list1
list2
Out[22]:
[99, 2]
Out[22]:
[101, 2]
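The copy function is not the only way to obtain an independent list. The sketch below shows two other common idioms and one caveat: all three produce a shallow copy, so elements that are themselves lists are still shared. The variable names are made up for illustration.

```python
original = [1, 2, 3]

copy_a = original.copy()   # the list method
copy_b = list(original)    # building a new list from the old one
copy_c = original[:]       # a slice covering the whole list

original[0] = 99           # only the original changes
print(original)                # [99, 2, 3]
print(copy_a, copy_b, copy_c)  # [1, 2, 3] [1, 2, 3] [1, 2, 3]

# Caveat: these copies are shallow, so nested lists are still shared.
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
shallow[0][0] = 99
print(nested)              # [[99, 2], [3, 4]]
```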
The tuple Type¶
Python has another built-in sequential type called tuple that, like
list, allows one to store a sequence of elements with arbitrary
types but, unlike list, is immutable. The clearest way to define a
tuple is via comma-separated values enclosed in parentheses, e.g.,
In [23]:
A = (1, 2, 3)
A
Out[23]:
(1, 2, 3)
The parentheses make it very clear what is being defined, but they are
not necessary. For example, one could define another tuple via
In [24]:
D = 1, 3.14, 'hello', np.array([1, 2, 3])
D
Out[24]:
(1, 3.14, 'hello', array([1, 2, 3]))
Elements of tuple variables are accessed following the same indexing
and slicing rules as for lists. However, their elements cannot be
reassigned because the tuple type is immutable (so that, e.g.,
D[0] = 99 will fail).
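A short sketch of what that failure looks like, and of one way to work around it, is given below. The names point, message, and moved are made up for illustration; the try/except construct is used here only to keep the example running after the error.

```python
point = (1.0, 2.0, 3.0)    # hypothetical tuple used only for illustration

print(point[0])            # 1.0
print(point[1:])           # (2.0, 3.0)

try:
    point[0] = 99          # item assignment is not allowed for tuples
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# A "changed" tuple has to be built as a new object instead
moved = (99,) + point[1:]
print(moved)               # (99, 2.0, 3.0)
```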
So what does a tuple variable have to offer? Again, dir may be
used:
In [25]:
items = dir(tuple) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
count
index
That’s not very many functions, and a comparison to list suggests
that tuple is missing anything that would replace, remove, or add to
its elements.
Exercise: See what the functions tuple.count and tuple.index are used to do.
The dict Type¶
The list and tuple types are the built-in solutions for storing
sequences of arbitrary elements. Remember, a sequential type is the
right choice when the relationship between the data stored is positional
(like word characters, etc.). There are circumstances, though, in which
data is not really (or only) related by order.
Consider a perhaps obvious example: Webster’s
dictionary. Although the entries
are often stored in sequence (alphabetically, not numerically), the data
itself is more complex than a single value. Rather, each entry in the
dictionary consists of two things: a key, in this case the word to be
defined, and the value, in this case, the definition of the word.
Under the hood, the key:value pairs need not be stored in sequence
(alphabetically or otherwise), but we should be able to get the
value given any key.
Python’s dict (short for dictionary, of course) provides us with
such a data structure. A Python dict variable can have keys of any
hashable type (for example, int, str, or tuple, but not list) paired
with values of any type. One way to define a dict
variable is by starting with the empty dictionary:
In [26]:
stuff = {}
stuff
Out[26]:
{}
We can add a new element by indexing with the key and assigning a value to it:
In [27]:
stuff['some key'] = 'some value'
stuff # not empty anymore
Out[27]:
{'some key': 'some value'}
We can access the element for any key defined using the same syntax:
In [28]:
stuff['some key']
Out[28]:
'some value'
If we try to get a value for a key not in the dictionary, a KeyError
(similar to IndexError) is encountered:
In [29]:
stuff['some other key'] # not a key in our dictionary
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-29-bc5178045979> in <module>()
----> 1 stuff['some other key'] # not a key in our dictionary
KeyError: 'some other key'
An easy way to check whether a dictionary has a key is to use the in
operator. The basic syntax is element in container, and the result
is True or False. The only requirement is that container is
iterable, meaning that each of its elements can be inspected. This is
true for sequential types like list and ndarray. It is also true
for dict (and we’ll see how momentarily). So, to check whether
'some key' and 'some other key' are in our dictionary, we can
use
In [30]:
'some key' in stuff
'some other key' in stuff
Out[30]:
True
Out[30]:
False
Hence, one can always check that a key exists before accessing a value.
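Both styles of safe access can be sketched as follows. The second uses dict.get, which appears in the dir(dict) listing and returns a default (None unless another is given) instead of raising KeyError. The names value, found, and missing are made up for illustration.

```python
stuff = {'some key': 'some value'}

# Option 1: test membership first
if 'some other key' in stuff:
    value = stuff['some other key']
else:
    value = None

# Option 2: dict.get returns a default for missing keys
found = stuff.get('some key')
missing = stuff.get('some other key', 'no such key')

print(value)    # None
print(found)    # some value
print(missing)  # no such key
```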
Dictionaries can also be defined all at once. For example, we can map the first few letters of the alphabet to their position via
In [31]:
alphanum = {'a': 0, 'b': 1, 'c': 2}
alphanum
Out[31]:
{'a': 0, 'b': 1, 'c': 2}
Exercise: Do from string import ascii_lowercase as s, which
creates a string s with the entire alphabet. Then, create your own
alphanum dictionary with all 26 letters.
So what does dict have? Let’s see:
In [32]:
items = dir(dict) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
clear
copy
fromkeys
get
items
keys
pop
popitem
setdefault
update
values
Some of these look similar to those found in list, and some are new:
go look them up via help. Two useful ones are keys and
values:
In [33]:
alphanum.keys()
list(alphanum.keys())
Out[33]:
dict_keys(['a', 'b', 'c'])
Out[33]:
['a', 'b', 'c']
In [34]:
alphanum.values()
list(alphanum.values())
Out[34]:
dict_values([0, 1, 2])
Out[34]:
[0, 1, 2]
An application requiring just the keys might be one in which student names (and not their student numbers, or grades, or other sensitive items) are required. The values might be needed if just grades were required, perhaps to do a statistical analysis.
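The grade-book scenario might look something like the sketch below. The student names and grades are invented for illustration. The items function, which also appears in the dir(dict) listing, pairs each key with its value.

```python
# Hypothetical records: student name -> grade
grades = {'Ada': 91, 'Grace': 88, 'Alan': 79}

names = list(grades.keys())        # just the names
scores = list(grades.values())     # just the grades
average = sum(scores) / len(scores)

print(names)    # ['Ada', 'Grace', 'Alan']
print(average)  # 86.0

# items() pairs each key with its value
for name, grade in grades.items():
    print(name, grade)
```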
Further Reading¶
None at this time.