Linear Systems

Overview, Objectives, and Key Terms

In this lecture, we consider the linear system \(\mathbf{Ax}=\mathbf{b}\), where \(\mathbf{A}\) is a square \(n\times n\) matrix, while both \(\mathbf{x}\) and \(\mathbf{b}\) are vectors of \(n\) elements. We’ll examine the basic elimination algorithm and review how to set up and solve such systems with NumPy.

Objectives

By the end of this lesson, you should be able to

  • Explain what it means for \(\mathbf{Ax}=\mathbf{b}\) to have a solution.
  • Define \(\mathbf{Ax}=\mathbf{b}\) from engineering or other inputs.
  • Solve linear systems using basic Gaussian elimination.
  • Solve linear systems using NumPy.

Key Terms

  • matrix
  • vector
  • matrix-vector product
  • column space
  • Gaussian elimination
  • np.linalg.solve

What Is A Linear System, and When Can It Be Solved?

In an earlier reading, the basic mechanics of vectors, matrices, and their products were reviewed using NumPy arrays, along with a very quick glance at solving small systems. Here, it helps to review some fundamentals in a bit more depth before moving on to solving systems by elimination.

Consider the matrix-vector product \(\mathbf{w}=\mathbf{Av}\). To illustrate, let element \((i, j)\) of \(\mathbf{A}\) be \(a_{ij}\) and element \(i\) of the vector \(\mathbf{v}\) be \(v_i\). Then, for the \(3\times 3\) case, we have

\[\begin{split}\mathbf{Av} = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \\ \end{bmatrix} \times \left[ \begin{array}{c} v_0 \\ v_1 \\ v_2 \end{array} \right] = \left[ \begin{array}{c} a_{00} v_0 + a_{01} v_1 + a_{02} v_2 \\ a_{10} v_0 + a_{11} v_1 + a_{12} v_2 \\ a_{20} v_0 + a_{21} v_1 + a_{22} v_2 \\ \end{array} \right] = \mathbf{w} \, .\end{split}\]

There are two ways to interpret the matrix-vector product \(\mathbf{w}\). The common way is to note that

\[w_i = a_{i0} v_0 + a_{i1} v_1 + a_{i2} v_2 = \mathbf{a}_{i:}^T \mathbf{v} = \mathbf{a}_{i:}\cdot \mathbf{v} \, .\]

Here, I’m leveraging the idea of slicing, so that \(\mathbf{a}_{i:}\) represents the \(i\)th row of the original matrix \(\mathbf{A}\). Viewed this way, the matrix-vector product is actually a series of dot products. Specifically, each element of \(\mathbf{w}\) is the dot product of one row of \(\mathbf{A}\) with the vector \(\mathbf{v}\), just like in the previous reading.

However, note the following way to represent \(\mathbf{w}\):

\[\begin{split} \mathbf{w} = v_0 \left[ \begin{array}{c} a_{00} \\ a_{10} \\ a_{20} \end{array} \right] + v_1 \left[ \begin{array}{c} a_{01} \\ a_{11} \\ a_{21} \end{array} \right] + v_2 \left[ \begin{array}{c} a_{02} \\ a_{12} \\ a_{22} \end{array} \right] = v_0 \mathbf{a}_{:0} + v_1\mathbf{a}_{:1} + v_2\mathbf{a}_{:2} \, .\end{split}\]

Again, a slicing-like notation is used in which \(\mathbf{a}_{:j}\) represents the \(j\)th column of the original matrix \(\mathbf{A}\). With this view, I hope you see that \(\mathbf{w}\) is a linear combination of the columns of \(\mathbf{A}\). The set of all possible linear combinations of \(\mathbf{A}\)‘s columns has a special name: the column space of \(\mathbf{A}\).

The column space of a matrix \(\mathbf{A}\) represents all possible linear combinations of its columns.
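
As a quick numerical illustration (a sketch with made-up numbers), both views produce the same \(\mathbf{w}\), and both agree with NumPy’s own matrix-vector product:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
v = np.array([1.0, 2.0, 3.0])

# Row view: each element of w is a dot product of a row of A with v.
w_rows = np.array([A[i, :].dot(v) for i in range(3)])
# Column view: w is a linear combination of the columns of A.
w_cols = v[0]*A[:, 0] + v[1]*A[:, 1] + v[2]*A[:, 2]

print(w_rows, w_cols, A.dot(v))   # all three are identical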

Now, what does the column space have to do with linear systems? A lot, it turns out. Consider the linear system \(\mathbf{Ax} = \mathbf{b}\). We’ll take \(\mathbf{A}\) to be a square \(n\times n\) matrix, with \(\mathbf{x}\) and \(\mathbf{b}\) each having \(n\) elements. That’s \(n\) equations for \(n\) unknowns, which in principle can be solved. However, let’s view the system explicitly:

\[\begin{split}\begin{split} \mathbf{Ax} & = \begin{bmatrix} a_{00} & a_{01} & \cdots & a_{0,n-1} \\ a_{10} & a_{11} & \cdots & a_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,n-1} \end{bmatrix} \times \left[ \begin{array}{c} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{array} \right] \\ & = x_0 \mathbf{a}_{:, 0} + x_1 \mathbf{a}_{:, 1} + \ldots + x_{n-1} \mathbf{a}_{:,n-1} \\ & = \mathbf{b} \end{split}\end{split}\]

In other words, the left and right sides can only be equal if \(\mathbf{b}\) is a linear combination of the columns of \(\mathbf{A}\), i.e., \(\mathbf{b}\) must be in the column space of \(\mathbf{A}\).

If it turns out that \(\mathbf{b}\) is not in the column space of \(\mathbf{A}\), then that column space is not big enough, which (for a square matrix) can only happen if \(\mathbf{A}\) has at least one column that is a linear combination of its other columns. That is to say, the columns of such an \(\mathbf{A}\) are dependent. Moreover, \(\mathbf{A}\) is not invertible, and \(\mathbf{Ax}=\mathbf{b}\) cannot be solved. Let’s hope for good \(\mathbf{A}\)‘s, and we’ll handle cases where that’s not true as needed.


Exercise: Why does

\[\begin{split}\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ \end{bmatrix} \left[ \begin{array}{c} x_0 \\ x_1 \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right]\end{split}\]

have a solution but

\[\begin{split}\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ \end{bmatrix} \left[ \begin{array}{c} x_0 \\ x_1 \end{array} \right] = \left[ \begin{array}{c} 1 \\ 0 \end{array} \right]\end{split}\]

does not?

Solution: In the first, the vector \(\mathbf{b}=[1, 1]^T\) (where \(T\) means transpose) is exactly equal to the first column of the matrix, so \(x_0\) could be 1 and \(x_1\) could be zero. If you look closely, \(x_0 = 0\) and \(x_1 = 1/2\) is also a solution, and there are infinitely more. The important thing to note is that the second column of \(\mathbf{A}\) is equal to two times its first column—they are dependent, and in such cases, there are either an infinite number of solutions (as for \(\mathbf{b}=[1, 1]^T\)) or no solution (as for \(\mathbf{b}=[1, 0]^T\)). For the latter case, no combination of \([1, 1]^T\) and \([2, 2]^T\) can yield \([1, 0]^T\)!
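
NumPy agrees with this diagnosis. Because the columns of the matrix are dependent, np.linalg.solve (which we will meet again below) refuses to solve with it for any right-hand side; a quick sketch:

import numpy as np

A_dep = np.array([[1.0, 2.0],
                  [1.0, 2.0]])
try:
    np.linalg.solve(A_dep, np.array([1.0, 0.0]))
except np.linalg.LinAlgError as e:
    print(e)   # reports a singular matrix -- the columns are dependent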

Solving Linear Systems by Elimination

Would you agree that solving the following system is straightforward?

\[\begin{split}\mathbf{Ux} = \begin{bmatrix} u_{00} & u_{01} & u_{02} \\ 0 & u_{11} & u_{12} \\ 0 & 0 & u_{22} \\ \end{bmatrix} \times \left[ \begin{array}{c} x_0 \\ x_1 \\ x_2 \end{array} \right] = \left[ \begin{array}{c} c_0 \\ c_1 \\ c_2 \end{array} \right]\end{split}\]

If not, note that we can immediately find \(x_2 = c_2 / u_{22}\), from which

\[x_1 = (c_1 - u_{12} x_2)/u_{11} \, ,\]

and

\[x_0 = (c_0 - u_{01} x_1 - u_{02} x_2)/u_{00} \, .\]

This was all made easy because \(\mathbf{U}\) is an upper triangular matrix, i.e., it has zeros below the main diagonal (the entries for which \(i = j\)). The goal of elimination is to turn \(\mathbf{A}\) into such a \(\mathbf{U}\) by eliminating entries below the main diagonal using linear operations.
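
To make the pattern concrete, here is a small sketch with made-up numbers that applies those three formulas directly and checks the result against np.linalg.solve:

import numpy as np

U = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 4.0]])
c = np.array([5.0, 7.0, 8.0])

x2 = c[2]/U[2, 2]                               # = 2
x1 = (c[1] - U[1, 2]*x2)/U[1, 1]                # = 1
x0 = (c[0] - U[0, 1]*x1 - U[0, 2]*x2)/U[0, 0]   # = 1

print(x0, x1, x2)
print(np.linalg.solve(U, c))                    # should agree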

Remember, one can modify a linear system by adding multiples of one row to another without changing the solution. If that doesn’t ring a bell, consider the following system,

\[\begin{split}\begin{bmatrix} 2 & 4 \\ 3 & 5 \\ \end{bmatrix} \left[ \begin{array}{c} x_0 \\ x_1 \end{array} \right] = \left[ \begin{array}{c} 6 \\ 8 \end{array} \right]\end{split}\]

for which the solution is \(\mathbf{x}=[1, 1]^T\). This is easily confirmed by noting \(2\cdot 1 + 4\cdot 1=6\) and \(3\cdot 1+5 \cdot 1 = 8\).

Now, if we add the first row of the system to the second row (the addition must include the right-hand side of the equation!), we find

\[\begin{split}\begin{bmatrix} 2 & 4 \\ 3+2 & 5+4 \\ \end{bmatrix} \left[ \begin{array}{c} x_0 \\ x_1 \end{array} \right] = \left[ \begin{array}{c} 6 \\ 8+6 \end{array} \right]\end{split}\]

which has the same solution: \(2\cdot 1 + 4\cdot 1\) is still \(6\), and now \((3+2)\cdot 1 + (5+4)\cdot 1 = 5+9 = 14\). The same \(\mathbf{x}\) works. This is not a proof by any means, but a good sanity check. Of course, the right modification would be to subtract \(3/2\) times that first row from the second, which leads to

\[\begin{split}\begin{bmatrix} 2 & 4 \\ 3+(-3/2)2 & 5+(-3/2)4 \\ \end{bmatrix} \left[ \begin{array}{c} x_0 \\ x_1 \end{array} \right] = \left[ \begin{array}{c} 6 \\ 8+(-3/2)6 \end{array} \right]\end{split}\]

or

\[\begin{split}\begin{bmatrix} 2 & 4 \\ 0 & -1 \\ \end{bmatrix} \left[ \begin{array}{c} x_0 \\ x_1 \end{array} \right] = \left[ \begin{array}{c} 6 \\ -1 \end{array} \right]\end{split}\]

Note that \(\mathbf{A}\) turned into \(\mathbf{U}\), which is called the row echelon form in fancy linear algebra circles. From this form, we find immediately that \(x_1 = (-1)/(-1) = 1\) and then \(x_0 = (6 - 4x_1)/2 = 1\). The key was the multiple of the first row we selected, i.e., \(-3/2\). Why \(-3/2\)? Here, 3 is the value we want to knock out, and 2 is the value that sits in the same column but in the row being added to the others. That value 2 is called the pivot. More generally, the pivot is on the main diagonal, and everything below it in the same column is to be eliminated. Things go bad when the pivot is zero, because division by zero is undefined.
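
A quick numerical sanity check (a sketch) confirms that the original \(2\times 2\) system and its eliminated form share the same solution \(\mathbf{x}=[1, 1]^T\):

import numpy as np

A = np.array([[2.0, 4.0],
              [3.0, 5.0]])
b = np.array([6.0, 8.0])
U = np.array([[2.0, 4.0],
              [0.0, -1.0]])
c = np.array([6.0, -1.0])

print(np.linalg.solve(A, b))   # [1. 1.]
print(np.linalg.solve(U, c))   # [1. 1.]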

Let’s formalize this elimination process using some rather loose pseudocode for a square system of any size. Here, the output will be \(\mathbf{U}\) (from \(\mathbf{A}\)) and the adjusted right-hand side vector \(\mathbf{c}\) (from \(\mathbf{b}\)):

"""Algorithm to turn A into U and b into c by elimination"""
Input: A, b # matrix A and right-hand side b
Set U to A
Set c to b
For all columns j in U
    # Set the pivot, below which we want to eliminate all values.
    Set pivot to U[j, j]
    If pivot is zero then
        Scream for help!
    # Eliminate/adjust values below and to the right of the pivot.
    For all rows i > j in U
        Set multiple to U[i, j]/pivot
        # Knock out the element to be eliminated
        Set U[i, j] = U[i, j] - U[j, j]*multiple
        # Adjust the right-hand side and the elements to the right
        Set c[i] = c[i] - c[j]*multiple
        For all columns k > j in U
            Set U[i, k] = U[i, k] - U[j, k]*multiple
Output: U and c

Exercise: Trace the elimination algorithm by hand for the following linear system:

\[\begin{split}\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \\ \end{bmatrix} \times \left[ \begin{array}{c} x_0 \\ x_1 \\ x_2 \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \\ 1 \\ \end{array} \right]\end{split}\]

Exercise: Trace the elimination algorithm by hand for the following linear system:

\[\begin{split}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ \end{bmatrix} \times \left[ \begin{array}{c} x_0 \\ x_1 \\ x_2 \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \\ 1 \\ \end{array} \right]\end{split}\]

In Python, the following function implements the same elimination algorithm:

In [1]:
import numpy as np

def eliminate(A, b):
    """Reduce Ax = b to the upper-triangular system Ux = c by elimination."""
    # Work on float copies so the caller's A and b are left untouched
    # (and so integer inputs don't suffer silent truncation).
    U, c = np.copy(A).astype(float), np.copy(b).astype(float)
    for j in range(len(U)):
        pivot = U[j, j]                  # diagonal entry used to eliminate below
        assert pivot != 0.0, "arggggh!"  # zero pivot: scream for help
        for i in range(j+1, len(A)):
            multiple = U[i, j]/pivot     # multiple of row j that zeros U[i, j]
            c[i] = c[i] - c[j]*multiple  # adjust the right-hand side
            for k in range(j, len(A)):
                U[i, k] = U[i, k] - U[j, k]*multiple
    return U, c

Exercise: Use the function eliminate to reduce the linear system

\[\begin{split}\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \\ \end{bmatrix} \times \left[ \begin{array}{c} x_0 \\ x_1 \\ x_2 \end{array} \right] = \left[ \begin{array}{c} 1 \\ 1 \\ 1 \\ \end{array} \right]\end{split}\]

Solution:

In [2]:
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
b = np.array([1.0, 1.0, 1.0])
U, c = eliminate(A, b)
U
Out[2]:
array([[ 2.        , -1.        ,  0.        ],
       [ 0.        ,  1.5       , -1.        ],
       [ 0.        ,  0.        ,  1.33333333]])

Is this value for \(\mathbf{U}\) what you found by hand for the first exercise above? If not, go back and try it (again).


Exercise: Perform a numerical experiment to determine the order of the elimination algorithm.

Solution (Partial): Remember, the elapsed time of a function call can be computed using, e.g.,

from time import time
t0 = time()      # initial time
some_function_call()
te = time() - t0 # elapsed time (in seconds)

Moreover, the algorithm does not care about the values of \(\mathbf{A}\) (assuming no zero pivots), just how many there are. Hence, a good way to make a matrix of size \(n\) is to use np.random.rand(n, n). That should get you moving in the right direction.
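
For instance, a minimal timing sketch might look like the following (the sizes are arbitrary; with three nested loops, we expect the elapsed time to grow roughly as \(n^3\), so doubling \(n\) should increase the time by roughly a factor of eight):

from time import time

for n in [100, 200, 400]:        # arbitrary sizes, doubled each time
    A = np.random.rand(n, n)     # random matrices rarely hit a zero pivot
    b = np.random.rand(n)
    t0 = time()
    eliminate(A, b)
    print(n, time() - t0)        # expect roughly an 8x jump per row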


Elimination is the hard part, but we do not yet have the solution \(\mathbf{x}\). However, from \(\mathbf{U}\) and \(\mathbf{c}\) we immediately know the last element of \(\mathbf{x}\), and then we work backward by substituting each known value into the equation for the next unknown value. Here is that approach in pseudocode, where \(\mathbf{U}\) and \(\mathbf{c}\) are, e.g., outputs from the elimination process:

"""Algorithm for back substitution of an eliminated system"""
Input: U, c
Set m to the size of c
Set x to an array with m zeros
Set i = m - 1
While i >= 0
    # Add up all the pieces that go on the right-hand side
    Set rhs = c[i]
    Set j = i + 1
    While j < m
        Set rhs = rhs - U[i, j]*x[j] # Recall that x[j] is known!
        Set j = j + 1
    Set x[i] = rhs / U[i, i] # Solve for the next unknown
    Set i = i - 1
Output: x

Exercise: Trace the back-substitution algorithm by hand for the following system:

\[\begin{split}\begin{bmatrix} 2 & -1 & 0 \\ 0 & 3/2 & -1 \\ 0 & 0 & 4/3 \\ \end{bmatrix} \times \left[ \begin{array}{c} x_0 \\ x_1 \\ x_2 \end{array} \right] = \left[ \begin{array}{c} 1 \\ 3/2 \\ 2 \\ \end{array} \right]\end{split}\]

For your trace, record the values of i, j, rhs, x[0], x[1], and x[2].


Exercise: Implement the back-substitution algorithm as the Python function back_substitution(U, c).


Exercise: Perform a numerical experiment to determine the order of the back-substitution algorithm.


Together, the elimination and back-substitution algorithms form what is traditionally called Gaussian elimination for the solution of \(\mathbf{Ax}=\mathbf{b}\).
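
For completeness, here is a sketch of one way the back-substitution pseudocode might be written in Python (one possible answer to the exercise above), combined with eliminate to solve the earlier \(3\times 3\) system:

def back_substitution(U, c):
    """Solve the upper-triangular system Ux = c by back substitution."""
    m = len(c)
    x = np.zeros(m)
    for i in range(m - 1, -1, -1):    # work from the last row upward
        rhs = c[i]
        for j in range(i + 1, m):     # move the known values to the right side
            rhs = rhs - U[i, j]*x[j]
        x[i] = rhs / U[i, i]          # solve for the next unknown
    return x

# Gaussian elimination = eliminate + back substitution
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
b = np.array([1.0, 1.0, 1.0])
U, c = eliminate(A, b)
print(back_substitution(U, c))   # expect roughly [1.5, 2.0, 1.5]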

Elimination via NumPy and SciPy

NumPy has a built-in capability to solve \(\mathbf{Ax}=\mathbf{b}\) directly:

In [3]:
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
b = np.array([1.0, 1.0, 1.0])
np.linalg.solve(A, b)
Out[3]:
array([ 1.5,  2. ,  1.5])

Alternatively, we can get the LU factorization of \(\mathbf{A}\) by using SciPy. Recall that the LU factorization of \(\mathbf{A}\) is

\[\mathbf{A} = \mathbf{LU} \, ,\]

where \(\mathbf{U}\) is an upper triangular matrix and \(\mathbf{L}\) is a lower triangular matrix with ones on the main diagonal. It turns out that the \(\mathbf{U}\) we’ve computed all along during elimination is exactly this factor, while the multiple used in elimination to knock out element \((i, j)\) is precisely \(l_{ij}\). Hence, we could build \(\mathbf{L}\) as part of the elimination algorithm if desired.

Using SciPy, we find

In [4]:
import scipy as sp
import scipy.linalg
P, L, U = sp.linalg.lu(A)
print(P)
print(L)
print(U)
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
[[ 1.          0.          0.        ]
 [-0.5         1.          0.        ]
 [ 0.         -0.66666667  1.        ]]
[[ 2.         -1.          0.        ]
 [ 0.          1.5        -1.        ]
 [ 0.          0.          1.33333333]]

Note that \(\mathbf{U}\) is identical to what was computed above using our own eliminate function. Moreover, the product \(\mathbf{LU}\) gives us back the matrix \(\mathbf{A}\):

In [5]:
L.dot(U)
Out[5]:
array([[ 2., -1.,  0.],
       [-1.,  2., -1.],
       [ 0., -1.,  2.]])

What’s that matrix \(\mathbf{P}\)? It’s a permutation matrix. Here, it’s just the identity, but watch what happens when we compute the LU factorization of this matrix:

In [6]:
B = np.array([[0, -1, 2],
              [-1, 2 , -1],
              [2, -1, 0]])
In [7]:
P, L, U = sp.linalg.lu(B)
print(P)
print(L)
print(U)
[[ 0.  0.  1.]
 [ 0.  1.  0.]
 [ 1.  0.  0.]]
[[ 1.          0.          0.        ]
 [-0.5         1.          0.        ]
 [ 0.         -0.66666667  1.        ]]
[[ 2.         -1.          0.        ]
 [ 0.          1.5        -1.        ]
 [ 0.          0.          1.33333333]]

Now, \(\mathbf{P}\) is not the identity, but \(\mathbf{L}\) and \(\mathbf{U}\) are the same as they were for \(\mathbf{A}\). Hence, we still have \(\mathbf{A}=\mathbf{LU}\), and \(\mathbf{P}\) reorders the rows of \(\mathbf{A}\) to give \(\mathbf{B}\):

In [8]:
P.dot(L.dot(U))
Out[8]:
array([[ 0., -1.,  2.],
       [-1.,  2., -1.],
       [ 2., -1.,  0.]])

You can see why \(\mathbf{B}\) needed permutation: look at its very first element, a zero pivot. (Try it yourself by modifying the code from above!) The trick is to find a better row. If there is a nonzero element below the zero pivot, then that row can be swapped with the row in question. If no such row can be found, the game is over! No more pivots means the matrix is not invertible.
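
Indeed, feeding \(\mathbf{B}\) to our own eliminate function, which has no row-swapping logic, trips the zero-pivot assertion right away; a quick sketch:

try:
    eliminate(B, np.array([1.0, 1.0, 1.0]))
except AssertionError as e:
    print("zero pivot encountered:", e)   # prints the "arggggh!" message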

Even when a nonzero pivot exists, it’s not always the best one possible. Imagine a pivot equal to \(10^{-12}\). That means dividing by a very small number, which produces a very large one. Adding a large number to much smaller numbers leads to round-off error, through which precision is lost. Hence, a different pivot might be better, and choosing the one of largest magnitude in the column is the process called partial pivoting, which is pretty common in production-level solvers (like SciPy’s LU implementation).
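
To see the round-off problem in action, here is a sketch (reusing eliminate and the back_substitution sketched above) with a pivot of \(10^{-17}\); the exact solution is very nearly \([1, 1]^T\):

A_tiny = np.array([[1.0e-17, 1.0],
                   [1.0,     1.0]])
b_tiny = np.array([1.0, 2.0])

U, c = eliminate(A_tiny, b_tiny)
print(back_substitution(U, c))          # roughly [0., 1.] -- x0 is lost to round-off
print(np.linalg.solve(A_tiny, b_tiny))  # roughly [1., 1.], thanks to pivoting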

Further Reading

None for now.