Searching

Overview, Objectives, and Key Terms

In this lecture and Lecture 15, we tackle two of the most important practical problems in computing: searching and sorting. We’ll start with searching in this lecture because it is the simpler problem, but efficient searching depends on sorted values. Along the way, algorithms will be classified by their order, a way to describe how good (or bad) an algorithm is for problems of different sizes.

Objectives

By the end of this lesson, you should be able to

  • Search an array of numbers using a linear search.
  • Search an array of numbers using a binary search.
  • Describe what is meant by order and use it to compare algorithms
  • Perform simple, numerical experiments to confirm the order of an algorithm

Key Terms

  • linear search
  • binary search
  • time.time()
  • numerical experiment
  • order
  • \(\mathcal{O}\) (big-O) notation

A Bit About Order

Why call linear search linear? One reason might be that the process is pretty linear in that each element is checked one after another (i.e., no skipping around the elements to find the proverbial needle). Another reason is that the number of times elements have to be compared (i.e., the number of times a[i] > v is evaluated) is proportional (and, in the worst, case equal) to the number of elements. Any time we have a proportional relationship, that relationship is linear. The familiar \(y = ax + b\) is linear, because \(y\) varies linearly with \(x\). For an array of \(n\) elements, the number of comparisons is linear with \(n\). The exact number of comparisons depends on the value for which one is searching and the sequence of values.

Generically, this linear relationship is given the fancy name order :math:`n`, written compactly in the big O notation \(\mathcal{O}(n)\). Quite frequently, the computational cost of an algorithm (i.e., how long it takes to run) is directly proportional to its order. Hence, the time it takes for linear search to find its match should grow, roughly linearly, with the number of elements being searched.

This fact may be intuitive, but it’s always nice to see things demonstrated. Here’s what we’ll do. We’ll write a little loop that goes from \(n = 10\) to \(n = 10^6\). At each size of \(n\), we’ll draw a random number from 1 to \(10^6\) and search for it in the np.arange(1, n+1), a sorted sequence for sure. For each step, we’ll time things. All together, these steps form the basis for a numerical experiment. Here it all is:

In [7]:
"""Program to show time as a function of size for linear search."""
from time import time
import numpy as np
import matplotlib.pyplot as plt

# Set the seed to keep the output fixed
np.random.seed(12345)

# Set the n values and initialize a list for times
n_values = np.logspace(1, 6, 10)
t_values = []

for n in n_values:
    n = int(n)
    # Generate the array 1, 2, ..., n
    a = np.arange(1, n+1)
    # Get a random int from [1, n]
    v = np.random.randint(1,n+1)
    # Start the timer
    t_start = time()
    # Do the search
    location = linear_search(a, v)
    # Determine the elapsed time (in seconds) and save it
    t_elapsed = time() - t_start
    t_values.append(t_elapsed)

# Plot the times vs values on a log log plot.
# Also, an expected "trend" line is included.
plt.figure(1)
plt.loglog(n_values, t_values, 'k-o', label='times')
plt.loglog(n_values, 1e-7*n_values, 'r--', label='$\mathcal{O}(n)$')
plt.xlabel('n')
plt.ylabel('time (s)')
plt.legend()
plt.show()
../_images/lectures_Searching_13_0.png

Based on the figure, the times are not perfectly linear with \(n\), but then again, the experiment is based on searching for a random value (which could be at the very beginning or the very end of the sorted array). A more complete approach would be to do the search at each :math:`n` several times and average the resulting elapsed times.

Exercise: Modify the program above so that five different values of v are selected for each value of n. Then, average the times and plot the results.

Such numerical experiments based on randome numbers are great ways to explore the behavior of algorithms for a variety of inputs.

A Comment on Built-In Functions

Given a = [1, 3, 7, 9, 11], Python already has a way to find the location of v = 9:

In [9]:
a = [1, 3, 7, 9, 11]
a.index(9)
Out[9]:
3
In [10]:
help(a.index)
Help on built-in function index:

index(...) method of builtins.list instance
    L.index(value, [start, [stop]]) -> integer -- return first index of value.
    Raises ValueError if the value is not present.

In [11]:
a.index(2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-8e455ff4af22> in <module>()
----> 1 a.index(2)

ValueError: 2 is not in list

The index function does not require that the elements of a be sorted:

In [12]:
[3, 11, 7, 9, 1].index(9)
Out[12]:
3

Based on the documentation, it’s not clear what sort of search is powering the index function. You could dig in to the underlying code, or you could try a numerical experiment!

Exercise: Adapt the numerical experiment above to determine whether the index function is linear or binary. Does its behavior change if a is unsorted?

Further Reading

None at this time.