Pythonic Containers

Overview, Objectives, and Key Terms

All the way back in Lecture 1, the simplest of variable types were presented (namely, int, float, and bool) along with the more complex but indispensable str type. In Lecture 3, NumPy was introduced, with its ndarray type serving as the workhorse for a variety of applications, particularly those with a numerical flavor. In this lecture, the built-in Python types list, tuple, and dict are presented, with motivating applications for each.

Objectives

By the end of this lesson, you should be able to

  • Define and use list and tuple variables.
  • Define and use dict variables.
  • Explain the difference between mutable and immutable types.

Key Terms

  • list
  • tuple
  • dict
  • mutable
  • immutable
  • container type
  • sequential type
  • associative type
  • list.append()
  • list.count()
  • list.copy()
In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

What is a sequential type?

Beyond the simplest types, Python (and other languages) offers a number of built-in types that represent a collection of values. Generically, these are called container types (see https://en.wikipedia.org/wiki/Container_(abstract_data_type)) because they are data structures that contain one or more values. Container types differ in how their values are stored and accessed.

Probably the simplest container type to understand is the sequential type, whose elements are arranged one after another (i.e., in a sequence) and can be accessed individually by knowing only their location within the sequence. Sound familiar? It should: NumPy's ndarray is a sequential type. Remember, given an array (e.g., a = np.array([1, 2, 3])), we can access any element by its location with the [] operator (e.g., the first element is a[0]). Similarly, the str type is a sequential type whose elements are individual characters.

Making Connections: Python’s str and NumPy’s ndarray are sequential types.

Often, a sequential type is the right choice when the relationship between values to be stored is best represented by their position in the sequence. A word, like “pandemonium,” loses its meaning if the individual characters are not accessible in order, and an array like np.linspace(0, 10) is useless unless it can be used to represent values of \(x\) from 0 to 10 in order.
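A minimal sketch of positional access (the names word and x are chosen here for illustration): the same [] operator retrieves elements from both a str and an ndarray.

```python
import numpy as np

word = 'pandemonium'        # a str: a sequence of characters
x = np.linspace(0, 10, 5)   # an ndarray: 5 evenly spaced values from 0 to 10

print(word[0], word[4])     # first and fifth characters: p e
print(x[0], x[2])           # first and third values: 0.0 5.0
```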

The list Type

Lots of computation, numerical or otherwise, needs sequential data, and Python offers more than just str and ndarray. The most versatile of Python's sequential types is list. A list variable can be defined using comma-separated values within square brackets []. For example, a list with the values 1, 2, and 3 is defined via

In [2]:
a = [1, 2, 3]
a
Out[2]:
[1, 2, 3]

That syntax may look familiar: we used it to help produce NumPy arrays like

In [3]:
import numpy as np
b = np.array([1, 2, 3])
b
Out[3]:
array([1, 2, 3])

With a defined, we could just as well do

In [4]:
c = np.array(a)
c
Out[4]:
array([1, 2, 3])

However, list is more versatile for general programming because its elements can be arbitrary. In other words, they are not limited to numerical values. For example, consider this list of very different values:

In [5]:
d = [1, 3.14, 'hello', np.array([1, 2, 3])]
d
Out[5]:
[1, 3.14, 'hello', array([1, 2, 3])]

Like all sequential types in Python, individual elements of a list are accessed using [], e.g.,

In [6]:
d[2]
Out[6]:
'hello'

Elements of list variables can also be extracted using the same slicing technique introduced in Lecture 4. Recall that slicing an array (or, now, any sequence) takes the form a[start:stop:stride]. Thus, we can get elements 0 and 2 of d via

In [7]:
d[0:len(d):2]
Out[7]:
[1, 'hello']
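For a positive stride, start and stop may also be omitted, in which case they default to the beginning and the end of the sequence; an omitted stride defaults to 1. A short sketch, redefining d so the snippet runs on its own (the names evens, middle, and head are illustrative):

```python
import numpy as np

d = [1, 3.14, 'hello', np.array([1, 2, 3])]

evens = d[::2]    # same as d[0:len(d):2] -> [1, 'hello']
middle = d[1:3]   # elements 1 and 2 (stop is excluded) -> [3.14, 'hello']
head = d[:2]      # first two elements -> [1, 3.14]
print(evens, middle, head)
```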

In addition to indexing and slicing, the list type provides a number of useful functions that can be listed using dir.

Exercise: Use dir to list the attributes (functions, etc.) of a list variable.

Remember (as in Lecture 2) that dir can produce lots of names with double underscores, which usually represent items that are not of much interest. Note, though, that dir produces a list, and the elements of that list can be accessed one by one using a for loop. The interesting items can be printed by skipping those whose name starts with a double underscore ('__'):

In [8]:
items = dir(list) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
append
clear
copy
count
extend
index
insert
pop
remove
reverse
sort
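The loop above indexes items with range(len(items)). An equivalent, arguably more idiomatic sketch iterates over the names directly and uses the str method startswith; collecting the names with append (discussed below) gives a list rather than printed lines. The name public is chosen here for illustration:

```python
# Collect the names in dir(list) that do not start with a double underscore
public = []
for name in dir(list):
    if not name.startswith('__'):
        public.append(name)
print(public)
```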

Some of these names may suggest obvious functionality, while some may be less intuitive.

Exercise: Use help to determine what each of these functions does.

Of particular interest is the function append. Consider the empty list, defined as

In [9]:
e = []
e
Out[9]:
[]

Just like the empty string '', the empty list evaluates to False:

In [10]:
bool(e)
Out[10]:
False

and has zero length:

In [11]:
len(e)
Out[11]:
0
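Because an empty list evaluates to False, a common pattern is to test a list directly in an if statement rather than comparing its length to zero. A minimal sketch; note that what matters is emptiness, not the values stored, so [0] is True even though 0 alone is False:

```python
e = []
if not e:                           # True when e has no elements
    print('e is empty')

print(bool(''), bool([]), bool([0]))   # False False True
```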

We can add an element, say the integer 123, to e using append:

In [12]:
e.append(123)
e
Out[12]:
[123]

Another element, this time a float, could be added, e.g.,

In [13]:
e.append(0.111)
e
Out[13]:
[123, 0.111]

The append function places the new element after all of the other elements. Often, lists are constructed iteratively, and append is the natural choice. However, insert can place a new element at any position in the list. To remove elements, pop removes (and returns) the last element, or the element at a given index if one is supplied; remove deletes the first element equal to a given value; and the del statement deletes the element at a given index (e.g., del x[0]).
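A short sketch of these operations on a hypothetical list of letters (chosen so as not to give away the exercise below). It uses insert, pop with and without an index, remove (which deletes the first element equal to a value), and the del statement (which deletes by index):

```python
letters = ['a', 'b', 'c']
letters.append('d')          # ['a', 'b', 'c', 'd']
letters.insert(1, 'x')       # insert 'x' before index 1: ['a', 'x', 'b', 'c', 'd']
last = letters.pop()         # removes and returns the last element, 'd'
first = letters.pop(0)       # removes and returns the element at index 0, 'a'
letters.remove('x')          # removes the first element equal to 'x'
del letters[0]               # removes the element at index 0, 'b'
print(letters, last, first)  # ['c'] d a
```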

Quick Exercise: Let x = [1, 2, 3, 4, 5]. Now, add a new element 6 to x after 5. Then delete 3 and 4 from x, leaving just [1, 2, 5, 6].

Before moving on, it is worth noting that list variables, like str variables, may be manipulated using + and * operators, though the results may not be what one expects. For instance, it may be reasonable to assume that 3 * [1, 2, 3] leads to a new list with elements equal to [3, 6, 9], but that is not the case:

In [14]:
3 * [1, 2, 3]

Out[14]:
[1, 2, 3, 1, 2, 3, 1, 2, 3]

Apparently, the contents of the initial list are simply repeated three times in sequence. In fact, this behavior is similar to that observed for multiplication of an int by a str:

In [15]:
3 * '...'
Out[15]:
'.........'

Similarly, the addition of two lists leads to

In [16]:
[1, 2] + [3, 4, 5]
Out[16]:
[1, 2, 3, 4, 5]

In other words, addition and multiplication can be used to join and repeat the contents of lists rather easily. However, arithmetic operators do not act element by element on lists the way they do on NumPy arrays.
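To get element-wise arithmetic from a list, one option is to loop over it and build a new list with append; another is to convert it to an ndarray. A minimal sketch (the names nums, plus_ten, and arr are illustrative; tolist converts the array back to a list of Python numbers):

```python
import numpy as np

nums = [1, 2, 3]

# Element-wise addition with a plain loop and append
plus_ten = []
for value in nums:
    plus_ten.append(value + 10)
print(plus_ten)               # [11, 12, 13]

# The same result via NumPy, converted back to a list
arr = np.array(nums) + 10
print(arr.tolist())           # [11, 12, 13]

# Adding an int to a list is not defined and raises a TypeError
try:
    nums + 10
except TypeError as err:
    print('TypeError:', err)
```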

Exercise: What is the result of z = 10*[0]?

Exercise: Check whether list(np.array([1, 2, 3])*2) and [1, 2, 3]*2 are equivalent. Can you say so without evaluating those expressions?

Exercise: Given f = [1, 2, 3, 4], what does f[::-1] produce? What are the implied start and stop values?

Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. How would one access the element [3, 4]?

Solution: Recognize that [3, 4] is the second element of M and can, therefore, be accessed as M[1]. If that looks confusing, suppose that you first set row1 = [1, 2] and row2 = [3, 4]. Then M = [row1, row2], and we get row2 from M[1].

Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. How would one access the element 4?

Solution: Recognize that 4 is the second element of [3, 4] and that [3, 4] is the second element of M. We get [3, 4] via M[1], and then 4 via M[1][1]. We cannot use M[1, 1] with a list of lists (it raises a TypeError); that form of indexing is specific to 2-D NumPy arrays.
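The two-step indexing can be checked directly. A sketch comparing the list of lists with a 2-D ndarray built from it (the name A2 is chosen here for illustration):

```python
import numpy as np

M = [[1, 2], [3, 4]]
print(M[1])        # [3, 4]: the second inner list
print(M[1][1])     # 4: second element of the second inner list

A2 = np.array(M)   # 2-D array with the same values
print(A2[1, 1])    # 4: NumPy accepts a (row, column) index

try:
    M[1, 1]        # lists do not accept a tuple of indices
except TypeError as err:
    print('TypeError:', err)
```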

Exercise: Consider the list of lists M = [[1, 2], [3, 4]]. Is it possible to obtain the list [2, 4] (the second "column" of M) by slicing?

Exercise: Given f = [1, 2, 3, 4], how can slicing be used to produce the list [4, 3, 2, 1]?

Mutability

A container type is mutable if the values of its individual elements can be changed. The list type is mutable, as is NumPy's ndarray, because any element can be accessed and assigned a new value. For example, given the list

In [17]:
g = [1, 2, 3, 4]

the first value, 1, can be changed to 99 via

In [18]:
g[0] = 99

Of course, exactly the same sort of operation works on the elements of an ndarray. However, we cannot change the elements of a variable whose type is immutable. One such type is str. Given

In [19]:
s = 'hello'

we cannot change it to 'jello' via

In [20]:
s[0] = 'j'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-a225757de94d> in <module>()
----> 1 s[0] = 'j'

TypeError: 'str' object does not support item assignment

The fact that list and other types are mutable can lead to behavior that may surprise a new programmer. For example, consider the following:

In [21]:
list1 = [1, 2]
list2 = list1            # a copy of list1, right?
print('list1 = ', list1)
print('list2 = ', list2) # so far, so good
list2[0] = 99            # change list2
print('list1 = ', list1) # why is list1 changed?
list1 =  [1, 2]
list2 =  [1, 2]
list1 =  [99, 2]

Here, we might have expected list1 to remain [1, 2] while list2 became [99, 2]. However, the assignment list2 = list1 does not copy the values stored in list1. Rather, list2 becomes a second name for the same list. When that list changes (through either list1 or list2), the change is visible through both names.

Warning: Given a list variable a, the operation b = a does not make a copy of a's elements. Rather, b and a are two names for the same list, so modifying its elements through one name changes what the other name sees.

Actually, this behavior is not so uncommon in programming, although it is not always the default. Basically, when a variable is created (in any language), a certain amount of memory is required to store its values, and we access those values through the variable's name. For an assignment like list2 = list1 above, Python does not set aside new memory; it simply binds a second name to the same object. For those who have (or will have) exposure to C or C++, similar behavior can be observed when assigning pointer variables to one another.
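Python's is operator reports whether two names refer to the same object, which makes this sharing visible. A minimal sketch; note that rebinding a name to a brand-new list, rather than modifying the existing list's elements, breaks the link:

```python
list1 = [1, 2]
list2 = list1          # a second name for the same list
print(list2 is list1)  # True: one object, two names

list2[0] = 99          # modifying the shared list...
print(list1)           # [99, 2]: ...is visible through both names

list2 = [5, 6]         # rebinding list2 to a new list
print(list2 is list1)  # False
print(list1)           # [99, 2]: list1 is unaffected by the rebinding
```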

One way to avoid this issue is to create an explicit copy. Built into list is the copy function, which can be used like

In [22]:
list2 = list1.copy()
list2[0] = 101 # changes list2, but not list1
list1
list2
Out[22]:
[99, 2]
Out[22]:
[101, 2]
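One caveat: list.copy makes a shallow copy, so for a list of lists (like M in the exercises above) the inner lists are still shared between the original and the copy. The standard-library copy module provides deepcopy for a fully independent copy. A sketch (the names shallow and deep are illustrative):

```python
import copy

M = [[1, 2], [3, 4]]
shallow = M.copy()          # new outer list, same inner lists
deep = copy.deepcopy(M)     # new outer list and new inner lists

shallow[0][0] = 99          # modifies the inner list shared with M
print(M)                    # [[99, 2], [3, 4]]
print(deep)                 # [[1, 2], [3, 4]]
```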

The tuple Type

Python has another built-in sequential type, tuple, which, like list, stores a sequence of elements of arbitrary types but, unlike list, is immutable. The clearest way to define a tuple is via comma-separated values enclosed in parentheses, e.g.,

In [23]:
A = (1, 2, 3)
A
Out[23]:
(1, 2, 3)

The parentheses make it very clear what is being defined, but they are not necessary. For example, one could define another tuple via

In [24]:
D = 1, 3.14, 'hello', np.array([1, 2, 3])
D
Out[24]:
(1, 3.14, 'hello', array([1, 2, 3]))
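Because it is the commas, not the parentheses, that create a tuple, a tuple with a single element needs a trailing comma. A short sketch (variable names are illustrative):

```python
not_a_tuple = (1)      # just the int 1 in parentheses
one_tuple = (1,)       # a tuple with one element
also_tuple = 1,        # the same, without parentheses
print(type(not_a_tuple), type(one_tuple), type(also_tuple))
# <class 'int'> <class 'tuple'> <class 'tuple'>
```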

Elements of tuple variables are accessed following the same indexing and slicing rules as for lists. However, their elements cannot be reassigned because the tuple type is immutable (e.g., D[0] = 99 raises a TypeError).
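Attempting item assignment on a tuple fails just as it does for a str. A sketch redefining D from above; note that immutability applies to the tuple's own slots, so a mutable element such as the ndarray stored in D can still be modified in place:

```python
import numpy as np

D = 1, 3.14, 'hello', np.array([1, 2, 3])

try:
    D[0] = 99                # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

D[3][0] = 99                 # allowed: modifies the array, not the tuple
print(D[3])                  # [99  2  3]
```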

So what does a tuple variable have to offer? Again, dir may be used:

In [25]:
items = dir(tuple) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
count
index

That's not very many functions, and a comparison with list shows that tuple lacks every function that would replace, remove, or add elements.

Exercise: Find out what the functions tuple.count and tuple.index do.

The dict Type

The list and tuple types are the built-in solutions for storing sequences of arbitrary elements. Remember, a sequential type is the right choice when the relationship among the stored values is positional (like the characters of a word). There are circumstances, though, in which data is not (or not only) related by order.

Consider a perhaps obvious example: Webster's dictionary. Although the entries are usually stored in sequence (alphabetically, not numerically), the data itself is more complex than a single value. Rather, each entry in the dictionary consists of two things: a key, in this case the word to be defined, and a value, in this case the definition of the word. Under the hood, the key:value pairs need not be stored in sequence (alphabetically or otherwise), but we should be able to get the value given any key.

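Python's dict (short for dictionary, of course) provides us with such a data structure. The values in a dict can be of any type. The keys can be of many types (e.g., int, float, str, or tuple), but they must be hashable, which, for the built-in types covered so far, roughly means immutable; a list, for example, cannot be a key. One way to define a dict variable is by starting with the empty dictionary: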
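A minimal sketch showing that a tuple can serve as a key while a list cannot (a list is mutable and therefore unhashable), although a list is perfectly fine as a value. The dict name coords is hypothetical:

```python
coords = {}
coords[(0, 0)] = 'origin'     # a tuple of ints is hashable, so it can be a key
coords['label'] = [1, 2, 3]   # any type, including list, can be a value

try:
    coords[[1, 1]] = 'point'  # a list cannot be a key
except TypeError as err:
    print('TypeError:', err)

print(coords)
```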

In [26]:
stuff = {}
stuff
Out[26]:
{}

We can add a new element by indexing with a key and assigning a value to it:

In [27]:
stuff['some key'] = 'some value'
stuff # not empty anymore
Out[27]:
{'some key': 'some value'}

The value for any existing key can be accessed using the same syntax:

In [28]:
stuff['some key']
Out[28]:
'some value'

If we try to get a value for a key not in the dictionary, a KeyError (similar to IndexError) is raised:

In [29]:
stuff['some other key'] # not a key in our dictionary
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-29-bc5178045979> in <module>()
----> 1 stuff['some other key'] # not a key in our dictionary

KeyError: 'some other key'

An easy way to check whether a dictionary has a key is to use the in operator. The basic syntax is element in container, and the result is True or False. The container must support membership tests, which is true for any iterable, i.e., any container whose elements can be inspected one by one. This includes sequential types like list and ndarray. It also includes dict, for which in checks the keys, not the values. So, to check whether 'some key' and 'some other key' are in our dictionary, we can use

In [30]:
'some key' in stuff
'some other key' in stuff
Out[30]:
True
Out[30]:
False

Hence, one can always check that a key exists before accessing a value.
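A minimal sketch of both approaches, redefining stuff so the snippet stands alone: an explicit in check, and the dict function get (one of the names in the dir listing for dict), which returns a default instead of raising a KeyError:

```python
stuff = {'some key': 'some value'}

print('some key' in stuff)      # True: 'some key' is a key
print('some value' in stuff)    # False: in checks keys, not values

# Check before accessing...
if 'some other key' in stuff:
    print(stuff['some other key'])
else:
    print('key not found')

# ...or use get, which returns a default instead of raising KeyError
print(stuff.get('some key', 'missing'))        # some value
print(stuff.get('some other key', 'missing'))  # missing
```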

Dictionaries can also be defined all at once. For example, we can map the first few letters of the alphabet to their (zero-based) positions via

In [31]:
alphanum = {'a': 0, 'b': 1, 'c': 2}
alphanum
Out[31]:
{'a': 0, 'b': 1, 'c': 2}

Exercise: Do from string import ascii_lowercase as s, which creates a string s with the entire alphabet. Then, create your own alphanum dictionary with all 26 letters.

So what does dict have? Let’s see:

In [32]:
items = dir(dict) # a list of str names
for i in range(len(items)):
    if not items[i][0:2] == '__':
        print(items[i])
clear
copy
fromkeys
get
items
keys
pop
popitem
setdefault
update
values

Some of these look similar to those found in list, and some are new: go look them up via help. Two useful ones are keys and values:

In [33]:
alphanum.keys()
list(alphanum.keys())
Out[33]:
dict_keys(['a', 'b', 'c'])
Out[33]:
['a', 'b', 'c']
In [34]:
alphanum.values()
list(alphanum.values())
Out[34]:
dict_values([0, 1, 2])
Out[34]:
[0, 1, 2]

For example, suppose a dict maps student names to grades. An application requiring just the names (and not their student numbers, grades, or other sensitive items) would use the keys, while a statistical analysis of the grades alone would use the values.
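A sketch of that scenario, assuming a hypothetical dict that maps student names to numerical grades (names and values are made up for illustration). The items function, also in the dir listing above, yields (key, value) pairs as tuples when both are needed:

```python
import numpy as np

# Hypothetical data: student name -> grade
grades = {'Ada': 91, 'Grace': 85, 'Alan': 78}

names = list(grades.keys())       # just the names
scores = list(grades.values())    # just the grades
print(names)                      # ['Ada', 'Grace', 'Alan']
print(np.mean(scores))            # average of the grades

# items() yields (key, value) tuples
for pair in grades.items():
    print(pair)                   # e.g., ('Ada', 91)
```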

Further Reading

None at this time.