More on NumPy Arrays: Slicing and np.linalg
¶
Overview, Objectives, and Key Terms¶
In this lesson, we’ll continue our investigation started in Lesson 3 and look multidimensional arrays and how to access multiple elements via slicing.
Objectives¶
By the end of this lesson, you should be able to
- define and manipulate two-dimensional NumPy arrays
- visualize two-dimensional arrays
- slicing and other indexing of one- and two-dimensional arrays
Key Terms¶
np.ones
(for 2-D arrays)np.zeros
(for 2-D arrays)np.array
(for 2-D arrays)np.meshgrid
plt.contour
plt.contourf
plt.colorbar
- colormap
- slice
- stride
np.reshape
np.random.rand
- matrix-vector multiplication
np.dot
(for 2-D arrays)np.matmul
In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
Making Two-Dimensional Arrays¶
A lot of data lives in tabulated structures that are logically
equivalent to two-dimensional arrays. We actually saw that in Lesson
3 with our time, velocity, and acceleration
example. When loaded in via np.loadtxt
, that data was stored as an
array having a shape of (3, 10)
.
We can make such two-dimensional arrays. The easiest ways are the
np.ones
and np.zeros
functions, e.g.,
In [2]:
import numpy as np
A = np.ones((3, 3))
A
Out[2]:
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
In [3]:
B = np.zeros((3, 3))
B
Out[3]:
array([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
We can access and modify individual elements of these arrays just like
we can their one-dimensional cousins. For example, let’s change the
lower-right element of B
to 99:
In [4]:
B[2, 0] = 99
B
Out[4]:
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[99., 0., 0.]])
When indexing a two-dimensional array, the syntax is always like
B[i, j]
, where i
is the row and j
is the column.
Two-dimensional arrays can also be made directly from existing data, e.g.,
In [5]:
C = np.array([[1, 2],
[3, 4]]) # By splitting this line, the structure is much easier to see.
C
Out[5]:
array([[1, 2],
[3, 4]])
Here, the input to np.array
is [[1, 2], [3, 4]]
, which is
actually a list
with elements that are themselves list
‘s. The
key is that we need each row to have the form [x, y, ...]
, and all
rows need to be separated by commas and surrounded by an additional pair
of []
‘s.
Recall that often we need to evaluate a function \(f(x)\) at evenly-spaced points in some range. The same is true in two and three dimensions. We’ll stick in 2-D for now and consider evaluation of the two-dimensional function
which is known as Booth’s
function. Here, we want to
evaluate \(f(x, y)\) for \(x, y \in [-10, 10]\). First, we’ll
define the arrays x
and y
using evenly-spaced points in the
range \([-10, 10]\).
In [6]:
x = np.linspace(-10, 10, 5)
y = np.linspace(-10, 10, 5)
Now, imagine these points as defining a grid in the \(xy\) plane.
How can we evaluate \(f(x, y)\) at each possible pair of points
\((x_i, y_j)\)? The “programming” approach—and one we’ll learn
later—would be to employ loops. However, NumPy provides an easy and
efficient way to do this for us in its np.meshgrid
function.
In [7]:
xx, yy = np.meshgrid(x, y)
xx
Out[7]:
array([[-10., -5., 0., 5., 10.],
[-10., -5., 0., 5., 10.],
[-10., -5., 0., 5., 10.],
[-10., -5., 0., 5., 10.],
[-10., -5., 0., 5., 10.]])
In [8]:
yy
Out[8]:
array([[-10., -10., -10., -10., -10.],
[ -5., -5., -5., -5., -5.],
[ 0., 0., 0., 0., 0.],
[ 5., 5., 5., 5., 5.],
[ 10., 10., 10., 10., 10.]])
Notice that these new arrays (named xx
and yy
so that we don’t
overwrite x
and y
) are two dimensional. However, by marching
through, for example, the top rows, we see that the resulting pairs of
numbers are \((-10, -10)\), \((-5, -10)\), \((0, -10)\) and
so on. Hence, all of the 5 possible values of \(x\) are paired with
the value \(y = -10\) (first row) and then \(y = -5\) (second
row), etc.
We can now evaluate the desired function.
In [9]:
f = (xx + 2*yy + 7)**2 + (2*xx + yy - 5)**2
f
Out[9]:
array([[1754., 949., 394., 89., 34.],
[1069., 464., 109., 4., 149.],
[ 634., 229., 74., 169., 514.],
[ 449., 244., 289., 584., 1129.],
[ 514., 509., 754., 1249., 1994.]])
Visualizing 2-D Arrays¶
We saw in Lesson 3 how easily one can produce a plot of 1-D data (in the form of NumPy arrays) by using Matplotlib. We can also visualize 2-D data. First, let’s give ourselves a somewhat richer set of data to visualize by increasing the number of \(x\) and \(y\) points we use to evaluate \(f(x, y)\).
In [10]:
x, y = np.linspace(-10, 10, 100), np.linspace(-10, 10, 100)
# note how two assignments can be written in one line
xx, yy = np.meshgrid(x, y)
f = (xx + 2*yy + 7)**2 + (2*xx + yy - 5)**2
One way to visualize \(f(x, y)\) is through use of
`contour <http://www.itl.nist.gov/div898/handbook/eda/section3/contour.htm>`__
plots. Matplotlib offers two versions of a contour plot: plt.contour
and plt.contourf
, where the latter is a “filled” version. Here they
are for our data:
In [11]:
import matplotlib.pyplot as plt
plt.contour(x, y, f)
plt.xlabel('x')
plt.ylabel('y')
plt.title("Booth's function via plt.contour") # Sometimes, a title is useful if no caption can be provided.
plt.colorbar() # The colorbar function produces the color legend at the right.
plt.show()
Out[11]:
<matplotlib.contour.QuadContourSet at 0x7f34134cf208>
Out[11]:
Text(0.5,0,'x')
Out[11]:
Text(0,0.5,'y')
Out[11]:
Text(0.5,1,"Booth's function via plt.contour")
Out[11]:
<matplotlib.colorbar.Colorbar at 0x7f34134a0208>
<Figure size 640x480 with 2 Axes>
In [12]:
plt.contourf(x, y, f, cmap=plt.cm.inferno)
plt.xlabel('x')
plt.ylabel('y')
plt.title("Booth's function via plt.contourf")
plt.colorbar()
plt.show()
Out[12]:
<matplotlib.contour.QuadContourSet at 0x7f341341fa90>
Out[12]:
Text(0.5,0,'x')
Out[12]:
Text(0,0.5,'y')
Out[12]:
Text(0.5,1,"Booth's function via plt.contourf")
Out[12]:
<matplotlib.colorbar.Colorbar at 0x7f34133dcbe0>
Here, I’ve added the cmap
argument to plt.contourf
, which
changes the colormap (i.e., color scheme) of the image. Matplotlib
provides dozens of
options,
but it’s strongly recommended to use one of the “perceptually uniform
sequential” colormaps, which provide good contrast in black and white
and are more easily interpreted by folks with colorblindness than more
traditional colormaps like “jet.”
Other functions of interest include plt.pcolor
and plt.imshow
,
which are left to the reader to explore using help
and online
documentation.
Slicing¶
Now we turn to a useful operation known as slicing, a process by which
we can access more than one element of an array. We’ll find that slicing
is also applicable to other data types, including the container types
list
and tuple
that will be explored later on.
Slicing in 1-D¶
For starters, let’s create a one dimensional array of the numbers \(0, 1, \ldots, 9\):
In [13]:
a = np.arange(10)
We already saw that individual elements of an array like a
can be
accessed via the []
operator, e.g., a[2]
gives us the third
element (because numbering starts at zero). Suppose we want to define a
new array b
that has the first three elements of a
. We can
slice a
by doing
In [14]:
b = a[0:3]
Here, the :
is the key, and the syntax 0:3
can be read as
from 0 up to but not including 3
. That’s key: we get a[0]
,
a[1]
, and a[2]
, but not a[3]
.
Note. Basic slicing of an arraya
has the forma[start:end]
, where the second numberend
is one plus the location of the last element desired.
The slicing syntax also allows a third number called the stride. For
example, a stride of two lets one select only every other element. Here,
we can get let c
be all the even-indexed elements of a
:
In [15]:
c = a[0:10:2]
Again, the syntax 0:10:2
can be read as
from 0 up to 10, skipping every other element
.
There are a couple of shortcuts one can use in slicing. For example,
each of the three numbers has a default value. The starting value
defaults to the beginning of the array, i.e., element zero, while the
ending value defaults to the length of the array. The stride, by
default, is one, which means no elements are skipped. When the stride is
not defined, the second :
can be omitted. Because these defaults are
defined, the following are equivalent:
In [16]:
a[0:10:1]
a[0:10]
a[:10:1]
a[0::1]
a[0::]
a[:10:]
a[::1]
a[::]
a[:]
a
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Out[16]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Note. The complete slicing syntax for an arraya
isa[start:end:stride]
. By default,start = 0
,end = len(a)
, andstride = 1
. Hence, any or all ofstart
,end
, andstride
may be omitted. Whenstride
is omitted, the second:
may also be omitted.
Slicing of NumPy arrays provides not just access to selected elements: it also lets one modify those elements. Given the zeros array
In [17]:
z = np.zeros(10)
we can set all the odd elements to unity
In [18]:
z[1::2] = 1.0
We could then set the even elements to \(1, 2, 3, 4, 5\) via
In [19]:
z[::2] = np.arange(1, 6)
because z[::2]
and np.arange(5)
have the same length—we’d get
an error otherwise!
Slicing in 2-D¶
Slicing applies equally to two-dimensional arrays. Consider the
following code, in which the 1-D array of numbers 0, 1, 2, \ldots 24
is reshaped into a square 2-D array:
In [20]:
D = np.arange(25).reshape((5, 5))
D
Out[20]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
Just as we can access individual elements of a 1-D array, so too can we access an element of a 2-D array. Here, to get the first element of the first row, we would use
In [21]:
D[0, 0]
Out[21]:
0
If we want the whole first row, we can replace the column index by a
:
, i.e.,
In [22]:
D[0, :]
Out[22]:
array([0, 1, 2, 3, 4])
and similarly for the first column,
In [23]:
D[:, 0]
Out[23]:
array([ 0, 5, 10, 15, 20])
The same syntax used in 1-D, i.e., start:end:stride
works for the
row and column indices of a 2-D array independently. For example, we can
double the elements of the odd columns (i.e., columns 1 and 3) of D
by using
In [24]:
D[:, 1::2] *= 2
Other Indexing¶
Slicing is extremely powerful, but there are other ways to access elements of an array that are particularly well suited to special applications. One method that is useful when processing tabulated data is based on conditional selection of array elements.
Consider the following two-dimensional array, in which the elements are random numbers distributed evenly between 0 and 1:
In [25]:
E = np.random.rand(5, 5)
E
Out[25]:
array([[0.94186834, 0.64547139, 0.15322779, 0.16304291, 0.4661734 ],
[0.87379668, 0.81482831, 0.28383103, 0.59685898, 0.89252992],
[0.52517944, 0.79366051, 0.55013028, 0.84990704, 0.81530478],
[0.32832184, 0.33208004, 0.26212632, 0.61609737, 0.89364675],
[0.29166547, 0.47505599, 0.02964755, 0.27334342, 0.40288735]])
The array E
will, by default, be different every time this line is
executed, but on the average, half the elements will be greater than
0.5. How can we select those elements? We can do so with the comparison
operator >
. For example, we can get a boolean map of the elements of
E
that are larger than 0.5 via
In [26]:
E > 0.5
Out[26]:
array([[ True, True, False, False, False],
[ True, True, False, True, True],
[ True, True, True, True, True],
[False, False, False, True, True],
[False, False, False, False, False]])
The actual elements that satisfy that criterion can be accessed with
In [27]:
E[E > 0.5]
Out[27]:
array([0.94186834, 0.64547139, 0.87379668, 0.81482831, 0.59685898,
0.89252992, 0.52517944, 0.79366051, 0.55013028, 0.84990704,
0.81530478, 0.61609737, 0.89364675])
If, for example, we want to change all of those elements to 1.0, we could do
In [28]:
E[E > 0.5] = 1.0
E
Out[28]:
array([[1. , 1. , 0.15322779, 0.16304291, 0.4661734 ],
[1. , 1. , 0.28383103, 1. , 1. ],
[1. , 1. , 1. , 1. , 1. ],
[0.32832184, 0.33208004, 0.26212632, 1. , 1. ],
[0.29166547, 0.47505599, 0.02964755, 0.27334342, 0.40288735]])
The basic ingredients for this approach are an array of interest (here,
that’s E
) and an array of the same size that has elements of the
bool
type. Most often, the array of bool
‘s comes from a
conditional statement involving the original array (e.g., E > 0.5
).
However, we can also creat them explicitly. To get the second and fifth
element of np.arange(5)
, we could do
In [29]:
np.arange(5)[np.array([False, True, False, False, True])]
Out[29]:
array([1, 4])
Matrix-Vector and Matrix-Matrix Operations¶
As a final topic for our introduction to Python “as a calculator” is its
use—via NumPy–for handling matrix and vector operations. We’ve not
seen ways to create and manipulate one- and two-dimensional arrays. In
Lesson 3, np.dot
was introduced for dot
(sometimes scalar or inner) products of vectors (represented by
ndarray
‘s).
Matrix-Vector Multiplication¶
In NumPy, np.dot
can also be used for matrix-vector multiplication.
First, recall that the multiplication of an \(m \times n\) matrix
\(\mathbf{A}\) by a \(n \times 1\) vector \(\mathbf{v}\) is
defined formally as
where the indices start at zero and end with \(m-1\) or \(n-1\) for consistency with Python indexing. Note that the result on the right-hand side is an \(m \times 1\) vector. Its \(i\)th element has the form \(\sum_{j=0}^{n-1} A_{i, j} v_j\), which is just the dot product of the vector \(\mathbf{v}\) and the vector formed by \(i\)th row of \(\mathbf{A}\). In other words, matrix-vector multiplication is just a sequence of dot products.
Let’s try an example. First, let us construct a two-dimensional array to represent the following matrix:
In NumPy, one pretty nifty way to do that is
In [30]:
A = 2*np.diag(np.ones(5)) - np.diag(np.ones(4), -1) - np.diag(np.ones(4), 1)
A
Out[30]:
array([[ 2., -1., 0., 0., 0.],
[-1., 2., -1., 0., 0.],
[ 0., -1., 2., -1., 0.],
[ 0., 0., -1., 2., -1.],
[ 0., 0., 0., -1., 2.]])
although there are other ways that would work, too. We’ll multiply \(\mathbf{A}\) by \(\mathbf{v}\), a \(5 \times 1\) vector of ones, which we can represent as the NumPy array
In [31]:
v = np.ones(5)
The matrix-vector product can be computed via
In [32]:
A.dot(v)
Out[32]:
array([1., 0., 0., 0., 1.])
Admittedly, use of dot
for matrix-vector multiplication is somewhat
unintuitive. For many years, that’s all NumPy offered. In recent
versions, the matmul
function was added, which lets us do the same
thing:
In [33]:
np.matmul(A, v)
Out[33]:
array([1., 0., 0., 0., 1.])
Matrix-Matrix Multiplication¶
Both approached work just fine, and both can also be used to compute a matrix-matrix product (which is defined in much the same way as the matrix-vector product). For example, \(A \times A\) can be computed using
In [34]:
A.dot(A)
Out[34]:
array([[ 5., -4., 1., 0., 0.],
[-4., 6., -4., 1., 0.],
[ 1., -4., 6., -4., 1.],
[ 0., 1., -4., 6., -4.],
[ 0., 0., 1., -4., 5.]])
or
In [35]:
np.matmul(A, A)
Out[35]:
array([[ 5., -4., 1., 0., 0.],
[-4., 6., -4., 1., 0.],
[ 1., -4., 6., -4., 1.],
[ 0., 1., -4., 6., -4.],
[ 0., 0., 1., -4., 5.]])
Linear Systems¶
Beyond dot products and matrix-vector products, a key task involving matrices and vectors is the solution of linear systems. Such systems are essential to much engineering analysis, and often, they are best represented using matrices and vectors. Consider the particular set of equations
In matrix-vector form, these equations can be written
Using NumPy, we can solve this system via just a few lines of code:
In [36]:
# define the system matrix
A = np.array([[2, 3], [3, 1]])
# define the right-hand side
b = np.array([1, 2])
# solve the system
sol = np.linalg.solve(A, b)
# extract the elements that represent x and y and display
x, y = sol
x
y
Out[36]:
0.7142857142857143
Out[36]:
-0.14285714285714282
As a final comment, one will readily find that NumPy does include a
matrix
type. Although matrix
does offer some nice features
(e.g., *
can be used in place of dot
or matmul
), one loses
some of the expected element-wise and other behavior of ndarray
functions.
Note: Be aware thatnp.matrix
exists, but feel encouraged to usenp.ndarray
for both one- and two-dimensional array, vector, and/or matrix operations.
Further Reading¶
More details on array indexing can be found in the NumPy documentation.