{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 15 - Sorting\n", "\n", "## Overview, Objectives, and Key Terms\n", " \n", "In this lecture and [Lecture 14](Searching.ipynb), we tackle two of the most important practical problems in computing: *searching* and *sorting*. In this lecture, we turn to *sorting*, the more challenging problem and one that was a focus of much research in the early years of computing for making searching easier. We'll flex our knowledge of functions to help write clean and clear sorting programs.\n", "\n", "### Objectives\n", "\n", "By the end of this lesson, you should be able to\n", "\n", "- Sort an array of numbers using brute-force, $\\mathcal{O}(n^2)$ schemes like *selection sort*.\n", "- Sort an array of numbers using divide-and-conquer, $\\mathcal{O}(n\\log n)$ schemes like *merge sort*.\n", "- Apply built-in Python functions to sort sequences.\n", "\n", "### Key Terms\n", "\n", "- selection sort\n", "- stable sorting algorithm (*hint*: see the exercises)\n", "- merge sort\n", "- merge \n", "- divide and conquer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sorting is Hard Work\n", "\n", "Sorting a sequence of randomly arranged items is no easy task, often resulting in a lot of repetive motions and paper cuts (if the items are, e.g., student exams). However, computers are great for such repetition, and they don't seem to suffer those paper cuts. The question is: *how to sort*? Sorting, it turns out, presents a rich opportunity for integrating much of the material covered so far.\n", "\n", "The simplest sorting algorithm is based on brute force and leverages the solution to a problem encountered as early as [Lecture 5](Algorithms_Flowcharts_and_Pseudocode.ipynb): finding the minimum value of a sequence of numbers. The basic idea is simple. Given a sequence of $n$ numbers to sort in increasing order, we \n", "\n", "1. Find the smallest value (just as we have previously done).\n", "2. Swap it with the first element in the sequence.\n", "3. Repeat steps (1) and (2) for the last $n-1$ items in the sequence.\n", "\n", "This algorithm is known as [selection sort](https://en.wikipedia.org/wiki/Selection_sort). In more detailed pseudocode, **selection sort** can be written as" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "'''The selection sort algorithm to sort an array'''\n", "Input: a, n # sequence of numbers and its length\n", "# Loop through each element a[i] \n", "Set i = 0\n", "While i < n do\n", " # Find the location k of the smallest element after a[i]\n", " Set j = i + 1\n", " Set k = i\n", " While j < n do\n", " If a[j] < a[k] then\n", " Set k = j\n", " Set j = j + 1\n", " # Switch the values of a[i] and a[k], putting them in order\n", " Swap a[i] and a[k]\n", " i = i + 1\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take some time and digest this algorithm. Then try the following exercises:\n", "\n", "***\n", "\n", "**Exercise**: Apply this algorithm to the sequence [5, 3, 7]. For each value of `i` and `j`, write out, using pen and paper, the values of `i`, `j`, `k`, `a[i]`, `a[j]`, `a[k]`, `a[j] < a[k]`, and `a` *before* the conditional statement.\n", "\n", "*Solution*: \n", "\n", "```\n", "i j k a[i] a[j] a[k] a[j] < a[k] a\n", "-------------------------------------------------------------\n", "0 1 0 5 3 5 True [5, 3, 7]\n", "0 2 1 5 7 3 False [5, 3, 7] \n", "1 2 1 5 7 5 False [3, 5, 7]\n", "2 3 2 7 N/A 7 N/A [3, 5, 7] \n", "```\n", "\n", "***\n", "\n", "**Exercise**: Repeat the above exercise for the sequence [4, 1, 8, 2, 3].\n", "\n", "***\n", "\n", "**Exercise**: What happens to `a[i]` if `a[j]` is never less than `a[k]`? \n", "\n", "***\n", "\n", "**Exercise**: Produce a flowchart for this algorithm.\n", "\n", "***\n", "\n", "**Exercise**: Apply this algorithm to the sequence [3, $1_0$, 4, $1_1$, 2]. Here, the subscript is used to count the number of times a 1 appears in the sequence. Does the algorithm produce [$1_0$, $1_1$, 2, 3, 4] or [$1_1$, $1_0$, 2, 3, 4]? A sorting algorithm is **stable** if it preserves the original order of equal values (i.e., $1_0$ and then $1_1$).\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's put selection sort into practice using Python. Because sorting is something to be done for any array, it makes sense to implement the algorithm as a function:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "def selection_sort(a):\n", " \"\"\"Applies the selection sort algorithm to sort the sequence a.\"\"\"\n", " i = 0\n", " while i < len(a):\n", " j = i + 1\n", " k = i\n", " while j < len(a):\n", " if a[j] < a[k]:\n", " k = j\n", " j += 1\n", " a[i], a[k] = a[k], a[i]\n", " i += 1\n", " return a" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3, 5, 7]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "selection_sort([5, 3, 7])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3, 4, 5]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "selection_sort([5, 4, 3, 2, 1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download this notebook or copy the function above into your own Python file. Then tackle the exercises that follow.\n", "\n", "***\n", "\n", "**Exercise**: Define `x = [5, 3, 7]` and execute `y = selection_sort(x)`. Print both `x` and `y`. Why are they the same? In other words, why is `x` also sorted?\n", "\n", "*Solution*: Remember, `y = x` does not produce a copy of `x`. Rather, `x` and `y` are now just two names for the same data. Similarly, when `x` is passed to `selection_sort`, it is given the new name `a` within the function, and any changes to `a` lead to changes in `x`. Because `a` is returned and assigned to `y`, `y` and `x` are two names for the same (now sorted) `list`.\n", "\n", "***\n", "\n", "**Exercise**: Define `x = (5, 3, 7)` and execute `y = selection_sort(x)`. Why does it fail?\n", "\n", "***\n", "\n", "**Exercise**: Modify `selection_sort` so that it sorts and returns a *copy* of `a`, making it suitable for sorting `tuple` variables.\n", "\n", "***\n", "\n", "**Exercise**: Modify `selection_sort` so that it optionally sorts a sequence in decreasing order. The function `def` statement should look like `def selection_sort(a, increasing=True):`.\n", "\n", "*Solution*: A one-line change is to switch `a[j] < a[k]` with `a[j] < a[k] if increasing else a[k] < a[j]`. Otherwise, an `if/elif` structure would work just as well. Revisit [Lecture 6](Conditional_Statements_and_the_Structure_of_Python_Code) for more tertiary `if` statements of the form `a if b else c`. Verify this works!\n", "\n", "***\n", "\n", "**Exercise**: Modify `selection_sort` so that it uses `for` loops instead of `while` loops. Check your result against the version provided above.\n", "\n", "***\n", "\n", "**Exercise**: Could `while i < len(a)` be changed to `while i < len(a) - 1`? Why or why not?\n", "\n", "*Solution*. Yes! Doing so would skip the loop when `a` has a single element, for which sorting is not required anyway.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quantifying that Hard Work\n", "\n", "So what's the *order* of this algorithm? In other words, given a sequence of $n$ elements, how many comparisons must be made? \n", "\n", "Let's break it down a bit. The algorithm is comprised of two loops. The first one must go through all the elements of `a`. For each value of `i`, the second loop makes `n - i` comparisons. In other words, we have something like \n", "```\n", "i # comparisons\n", "-------------------\n", "0 n\n", "1 n - 1\n", "2 n - 2\n", "... ...\n", "n - 3 3\n", "n - 2 2\n", "n - 1 1\n", "```\n", "\n", "That total looks an awful lot like $1 + 2 + \\ldots + (n - 1) + n$, and that sum is $n(n+1)/2 = n^2/2 + n/2$. Because $n \\ll n^2$ for large $n$, the algorithm is $\\mathcal{O}(n^2)$. That's expensive, and for large sequences, we could wait a long, long time for the sort to finish. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's verify that selection sort is $\\mathcal{O}(n^2)$ by performing a little experiment. As done in [Lecture 14](Searching.ipynb) to assess the linear search, we can generate random sequences of different sizes using NumPy. Then, we can sort them with selection sort and time how long it takes to do so for each size." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEOCAYAAACetPCkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3Xt8zvX/x/HH28xmCDl+o236OsQsYg7RgZJTJEI0h+JrP6dQyeFL540ckvIlbTJfWUlR6ICEDg4lh5xPX2xGIYU0p9n798eFZja22bXPdu15v92uG9fn+lyf67nx2Wvvz/vwMdZaRERE0iuf0wFERCR3UeEQEZEMUeEQEZEMUeEQEZEMUeEQEZEMUeEQEZEMUeEQEZEMUeEQEZEMUeEQEZEMUeEQEZEMye90AHcoWbKkDQwMdDqGR/rrr78oVKiQ0zFEJAPSe96uW7fuN2ttqevt55GFIzAwkJ9++snpGB5pxYoVNGrUyOkYIpIB6T1vjTGx6TmeLlWJiEiGqHCIiEiGqHCIiEiGeGQfR2rOnz9PfHw8Z86ccTpKtvP19aV8+fJ4e3s7HUVEPECOLxzGmNuAEUBRa237zB4nPj6eIkWKEBgYiDEm6wLmcNZajh07Rnx8PBUqVHA6joi4QUxMDCNGjCAuLg5/f38iIiIIDQ112+e59VKVMWa6MeaIMWZLiu3NjTE7jTF7jDHDrnUMa+1ea23PG81y5swZSpQokaeKBoAxhhIlSuTJlpZIXhATE0NYWBixsbFYa4mNjSUsLIyYmBi3faa7+zhmAM2TbzDGeAGTgRZANaCzMaaaMSbYGPNZikfprAyT14rGJXn16xbxRDExMQQGBpIvXz4CAwMZOHAgCQkJV+yTkJDAiBEj3JbBrYXDWvst8HuKzXWBPRdbEueA2UAba+1ma22rFI8j7syXnY4fP86UKVMAOHToEO3bZ/qqm4jkUam1Lo4dO5bqvnFxcW7L4UQfRzngQLLn8UC9tHY2xpQAIoA7jTHDrbWj09gvDAgDKFOmDCtWrLji9aJFi/Lnn3+mO+ScOXN4+eWXiY+Pp3z58rz44ot07Ngx3e9PKT4+nv/85z907dqVIkWKEB0dnaE8N+rMmTNXfU8y49SpU1lyHBG5tqVLlzJt2jSOHDlC6dKl+de//kVkZORVrYu0lC5d+vK5muXnrbXWrQ8gENiS7HkHYFqy512BSVn5mbVr17Ypbdu27aptaZk1a5b18/OzwOWHn5+fnTVrVrqPkdJjjz1mfX19bY0aNWz79u1tUFCQtdba6Oho26ZNG9uqVSsbGBhoJ02aZF9//XVbs2ZNW69ePXvs2DFrrbV79uyxzZo1s7Vq1bJ333233b59u7XW2jlz5tigoCB7xx132HvuuSfNz8/I138ty5cvz5LjiEjaUvsZlD9//iueX+uR8udVes9b4Cebjp+xTrQ44oFbkz0vDxzKzgCDBg1i48aNab6+Zs0azp49e8W2hIQEevbsSVRUVKrvqVmzJhMnTkzzmK+99hpbtmxh48aN7N+/n1atWl1+bcuWLWzYsIEzZ85QsWJFxowZw4YNG3j66aeZOXMmgwYNIiwsjKlTp1KpUiV++OEH+vbty7Jly3jllVdYvHgx5cqV4/jx4xn8TohITjRixIirWhaJiYkYYy79wn2FEiVKUNHXly0HD1IyIMDto6qcKBxrgUrGmArAQaAT8HhWHNgY0xpoXbFixRs6Tsqicb3tN6px48YUKVKEIkWKULRoUVq3bg1AcHAwmzZt4tSpU6xatYoOHTpclaVhw4Y88cQTdOzYkXbt2rkln4hkr7T6J6y1+Pn5XVFUChUsyJetWlFn3jwYMgTGjHF7PrcWDmPMB0AjoKQxJh540Vr7rjGmP7AY8AKmW2u3ZsXnWWsXAgtDQkJ6XWu/a7UMwLVIYmzs1Wt9BQQEuOX6vo+Pz+W/58uX7/LzfPnykZiYSFJSEsWKFUu1lTR16lR++OEHPv/8c2rWrMnGjRspUaJElmcUkexx/vx5ChUqxKlTp656LeBiayL5nI0PH3iAOtOnQ9Om0Lt3tmR096iqztbaf1hrva215a21717c/oW1trK19p/W2gh3ZsiMiIgI/Pz8rtjm5+dHRETmoxYpUiTTneE33XQTFSpU4KOPPgJcv3X8/PPPAPzvf/+jXr16vPLKK5QsWZIDBw5c61AikoMdP36cli1bcurUKfLnv/L3+ks/g0JDQ9m/YwdJW7eyf/9+6k2ZAh9/DIsWQTZN8tVaVakIDQ0lMjKSgIAAjDEEBAQQGRl5Q9cMS5QoQcOGDalevTrPPfdcht8fExPDu+++S40aNQgKCmL+/PkAPPfccwQHB1O9enXuvfdeatSokemMIpL9ks/LKFWqFMuWLSM6OpoZM2ak/jPou++gZk1o1gzOnAEfH3j0UcjO+Vrp6UHPLQ+gNRBZsWLFq0YLZNWootxKo6pEcp7URk/5+PikPoLzjz+sDQuzFqwNDLR20aJ0f05Wj6ryqBaHtXahtTasaNGiTkcREbmu1EZPnT179upZ37GxUK0aTJsGgwfDli2uFodDcvwihyIiniqt0VOXt587BwUKgL8/tG8P3btD7drZmDB1HtXiEBHJLebPn5/qnAyAgFtvhcmT4bbbID7e1X/x1ls5omiACoeISLb79NNP6dChA//85z8pWLDgFa/V9vXlRx8f6N8fqlaFpCSHUqbNowqHMaa1MSbyxIkTTkcREbks+cip0qVL065dO2rXrs26deuIiopyjZ4CJhYtyg/nz1Pq2DH4739hyRLXZaocxqMKhzrHRSSnSbmi7dGjRzHG0LNnT4oWLeqal7F/P0nWMrBdO7w6dYIdO6Bbt+wdYpsBHlU4RERymtRGTiUlJREeHg7Hj0OfPrB+veuFyEiYNQtKlXIgafppVJWIiBulNXKq9qUhtocPQ1AQ1KoF+XPHj2S1OHKhTz/9lF69etGmTRuWLFnidBwRSYW1lg8//PCqO3DeAswD5gKUKQM//ODqCM9FPKpw5PbO8dOnT3Pfffdx4cKFa+73yCOPEBUVxYwZM/jwww85d+4c9957L4mJidmUVERSSt4BXr58eUJCQujUqRMBAQH4+vpe3u8JXPfT3tCpE/z4I4SEOBU50zyqcOSWzvHp06ezbNkywsPDeeihh5g+ffrl7e3atcPLyytdxwkPD6dfv34UKFCABx54gA8//NCdsUUkDSk7wA8ePMj69evp3Lkzu3btYs7LL9O+TBmMMXzk78+SCRO484MPwNvb6eiZ4lGFIzcYMmQIpUqV4ssvv6RLly6MHz+e/v3788svvxATE0ObNm0u79u2bVtGjhzJPffcQ9myZVm6dCngagIPHTqUFi1aUKtWLcDVComJiXHkaxLJ61LrAAf4aeVK8kdE0Pr55/mobFmSLlxgV2wsbZ5+2oGUWSd39MS4Q6NGV2/r2BH69oWEBGjZ8urXn3jC9fjtN9f0/+TScZ+OrVu3smzZMsaOHcuXX37J008/zSeffEKpUqWIi4tj7969BAYGXt5/y5YtNGzYkO+++4558+YRExNDkyZNmDRpEkuXLuXEiRPs2bOH3r17U716ddauXZuBb4CIZIWTJ0+mev+eBkBUXBy89BJ07gwTJ+bY4bUZlXcLhwPmzZvH/fffD8CUKVMAOHfuHEeOHKF8+fIUK1bs8r4JCQmcOHGCpy/+ZpKYmHj59QEDBjBgwIArju3l5UWBAgX4888/KVKkSHZ8OSJ5SkxMzBU3UAoPD+fChQsMHTr0qn3rASuBeC8vWLgQWrTI9rzulHcLx7VaCH5+1369ZMl0tTBSOnjw4BUtCoD33nuPDh064Ofnx5kzZy5v37p1K7Vr177c37Fp0yaqV69+zeOfPXv2ik44Eckal/owLl2Oio2NpXv37iQlJVG/fn369evHa6+9RsmEBOKAH4Cnvb2p//bbPOZhRQM8rI8jp4+qCg4OZtWqVZefr1q1iujoaCZOnEjx4sW5cOHC5eKxZcsWataseXnfTZs2cccdd6R57GPHjlGqVCm8c2lnm0hOltYkvhIlSrBy5Uqe79mT7dWqsdUYyuO6xWtIdDSP9ezpTGA386gWh03nPced0qtXL9auXUvLli3x9/enUKFCLFy4kOLFiwPQtGlTvv/+e5o0acLmzZupV6/e5fdu2bLlmi2O5cuX0zK1fhkRuWFpTeL749gx8kVFwZAh+J87B6NGceDZZ3PtaKn08qjCkdMVKFCAGTNmpPl6//79mTBhAk2aNGHChAlXvLZ3795rHvv9999n9OjRWRFTRJI5dOgQvr6+nD59+ortBYBvfXygd2+4/3545x2oWNGZkNnMoy5V5XZ33nknjRs3vu4EwJTOnTvHI488QpUqVdyUTCTvsdYSHR1NtWrVOH/+/FWXgfP7+VHkgQdg+nRYujTPFA1Q4chxevToke4JgJcUKFCAbt26uSmRSN6QcuZ3jRo16NGjB3fccQfbtm0jOjqatmXLshFoWbYskZGRVPv8c3jySY8ZZpteulQlInleylFTBw8e5ODBg3Tr1o3o6GjynTpFpVWrCD18GPz9+XzmTLjvPodTO0ctDhHJ89Ka+f3NN9+Q7/PPXavYvv02DBgAW7fm6aIBHtbiMMa0BlpXTONao7X2qpUq84K07mssIq7zI7WZ33BxNNWGDXDzzTBvHtStm83pciaPanFca5FDX19fjh07lud+iFprOXbsmCYGiqTiyJEjtG3b9optBvgX0ALw9/eHYcNg3ToVjWQ8qsVxLeXLlyc+Pp6jR486HSXb+fr6Ur58eadjiOQon3zyCWFhYfz555907tyZ+fPnUz4hgUjgPmC2lxcXIiKgQAGno+Y4eaZweHt7U6FCBadjiIgDkq8zVa5cOQICAli5ciW1atVi5syZBFWqxM+PP87tc+eSADxXogQ1J04kNDTU6eg5Up4pHCKSN6UcMRUfH098fDxt27blww8/dM3P+PhjasydCx074vPmm4wrW9bh1DmbR/VxiIiklNaIqZ0//YT3pbXjHn0UvvsOPvwQVDSuS4VDRDyWtTbVdaZaAYsOHIA2beDkSdcEvrvvzv6AuZQKh4h4pD179tC8efMrRlKWAT4EFgJ/eXvDokVw001ORcy1VDhExKOcPXuWV199lerVq7N69Wq6du2Kn58fJYFtwMPAS97ebIiKgvr1HU6bO6lwiEiulXx9qcDAQP79739To0YNXnjhBdq0acOOHTuYOXEikZGRFAoIIBxoccstVIqOpnP37k7Hz7U8alTV9WaOi4jnSO2ufKNHj6ZUqVJ8+eWXNL//fhg3DkaNIvTbbwndv9/ZwB7Eo1oc15o5LiKeJa3RUr6+vjQvXhxq14aRI+Ghh6BcOQcSei6PanGISN6R1l35njpwAO66y1UsFiyA1q2zOZnnU+EQkVwlKSmJyZMnp71DsWLQpQtERGjElJt41KUqEfFsu3bt4r777mPAgAEEBwdTsGBBSgMf4Bot5efnxy3/+Q9MmqSi4UYqHCKS4124cIHx48dTo0YNtmzZwowZM9i4YQNfd+nCznz5aAsE33wzkZGRWl8qG6hwiEiOknKI7ZgxY2jQoAHPPfcczZo1Y9u2bXRv0ADTpAl3RUVR7O678dmxg/Bjx1Q0son6OEQkx0htiO2wYcMoXLgws2fPpmPHjq6bsc2cCevXQ2Qk9OwJ+fQ7cHbSd1tEcoy0htgWK1aMxypUwMyZ49rQtSvs3g29eqloOEAtDhHJMVIbYlsIGBwf71oepHJl10q2+fNDqVLZH1AAtThEJIf4+uuv8fLyumJbC2Ar8BRAnz7www+uoiGOUuEQEUf9/vvv9OjRgyZNmlCiRAl8fHwAqAp8AfxlDF+98AJMngxaFSJHUOEQEUdYa5k9ezZVq1blvffeY/jw4ezbu5dP/v1vAgIC2GEM/ypVip+nT6fZyy87HVeS8ag2nxY5FMkd4uLi6Nu3L59//jl16tRhyZIl1ChUCFq3psWKFez/+WeoXt3pmJIGj2pxaJFDkZwn+byMgIAAunXrRlBQEMuXL+eNN95g9bffUuPLLyE4GH76CaZMgWrVnI4t1+BRLQ4RyVlSzsuIi4vjvffeIzg4mAULFhB4663QsKGr07tdO9dSIbfc4nBquR6PanGISM6S1ryMM8ePExgYCF5erjkZ8+bB3LkqGrmECoeIuE1q8zKaA0sOHID5810b+vWDtm2zN5jcEBUOEclyp0+fZsiQIVhrL28rBcQAXwLnvb2hTBmn4skNUuEQkSz13XffUaNGDcaNG0fjxo0pWLAgHYHtQHsg3NubnyIjXTPBJVdS4RCRLHHq1Cmeeuop7r33XhITE1m6dCnLli0jKiqKW0qUYDvQ4pZbqBAdTecnnnA6rtwAjaoSkRu2dOlSevXqRWxsLAMGDCDi5ZcpHBkJu3cT2rs3oY8/DtbytRYk9Aj6VxSRTDtx4gS9evXiwQcfpECBAnz33Xe82a0bhRs3hqFDYeVK147GaBVbD6J/SRFJl5Q3WBo8eDDVqlVj+vTpDB06lI0rV9Jw3jyoWxcOH3YNr33vPadjixvoUpWIXFdqN1h6/fXXKV++PD/88AMhISHw3XfwxhsQFgavvQbFijmcWtxFhUNEriutiXwlrSVk924ICYF77oFdu0BrxXk8XaoSketKbSJfV+CrgwehRw/XpSlQ0cgjVDhE5JoWLFjgus/3RRWAxcBMINbHx7UwoSbz5SkqHCKSqj/++IPu3bvTpk0bypUrh6+vL4WAn4D6wCBvb3ZERUFQkMNJJbupcIjIVRYtWkT16tWJiYnh+eefZ8/ixUybNo2SAQGEAU3LlaNOdDShXbs6HVUcoMIhIpedPHmSXr160aJFC4oXL87aFSt4JSGBAtWrE1qkCPv37+dja1kTH09oaKjTccUhGlUlIgB8/fXX9OjRg/j4eIYNG8bLDRtSoFs32LfPNcT23nudjig5RI5vcRhjHjHGRBlj5htjmjqdR8TTnDp1ir59+9KkSRMKFizIypUrGX3mDAVatwZvb/jmG3jnHc3LkMvcWjiMMdONMUeMMVtSbG9ujNlpjNljjBl2rWNYaz+11vYCngAec2NckTzn22+/pUaNGkydOpVnnn6aDevWUb9+fahTB0aOhJ9/VktDruLuFscMXPdtucwY4wVMBloA1YDOxphqxphgY8xnKR6lk7115MX3iUgmJF8yxN/fn+bNm9OoUSOMMfwwezavb9tGwWnTXDs//ji8+ir4+jobWnIkt/ZxWGu/NcYEpthcF9hjrd0LYIyZDbSx1o4GWqU8hnENIH8N+NJau96deUU8VcolQw4cOMCBAwdo9sADzH/gAXyefNK1COGjjzqcVHIDJzrHywEHkj2PB+pdY/+ngCZAUWNMRWvt1NR2MsaEAWEAZcqUYcWKFVmTVq5w6tQpfW9zoWefffaqJUOCgTHffIPP11/zW4MG7B44kLOlS4P+fT1OVp+3ThQOk8o2m8o21wvWvgW8db2DWmsjgUiAkJAQ26hRo8zmk2tYsWIF+t7mPkeOHLlqW3GgdGIizJlDyfbtKWlSOzXFE2T1eevEqKp44NZkz8sDhxzIIeLxrLW88847l58/AAy++PdvgUb+/tChg+t+GSLp5EThWAtUMsZUMMYUADoBC7LiwMaY1saYyBMnTmTF4URytaNHj/LII4/Qu3dvGlSpwnteXiwFngR8AD8/P14YNcrhlJIbuXs47gfAaqCKMSbeGNPTWpsI9Me1Ttp2YI61dmtWfJ61dqG1Nqxo0aJZcTiRXGvRokXccccdLF60iC+6dOG7337jcWDSTTdRGygbEEBkZKRmf0umXLePwxhTHler4B7gFuA0sAX4HNdIp6S03mut7ZzG9i+ALzITWETSdvr0aYYNG8Zbb71F9erVWT5zJre3bg01a2KiongqOJinnA4pud41C4cxJhrXKKjPgDHAEcAXqIxrfsYIY8wwa+237g4qIte2adMmHn/8cXZs3cq0Vq0I/egjfH194fvv4c47wcvL6YjiIa7X4njdWrslle1bgHkX+yj8sz5W5hhjWgOtK+pmMpKHJCUl8eabbzJs2DDuKVKElZUqUfSzz1z3ybj7btfd+USy0DX7OFIrGsaY4saYOy6+fs5au8dd4TJKfRyS1xw6dIhmzZox4pln+MDfn6+OH6foiRMwezY0bOh0PPFQ6eocN8asMMbcZIy5GfgZiDbGTHBvNBG5lk8++YTg4GBWrVzJ3ttuo92ePZju3WH7dnjsMQ2xFbdJ7wTAotbak8aYfwHR1toXjTGb3BlMRFJ36tQpBg0axLx336VirVq89/77lN21CwoXhsaNnY4neUB6h+PmN8b8A+iIq6M8R9I8DvE0yRcmDAwM5JVXXuHOmjVJePdd4vz8WNW5M1WqVIHWrVU0JNukt3C8gmvexR5r7VpjzG3AbvfFyhz1cYgnubQwYWxsLNZaYmNjeffFF5kSF8f7QOGgIPI31S1qJPul61KVtfYj4KNkz/cCWkZTxI1GjBhxxcKEjwPvACYxESZOhP79NcRWHHHNFocxZuTFDvG0Xr/fGHPVUugicuPi4uKufA6sAIKshYEDVTTEMddrcWwGFhpjzgDrgaO4JgBWAmoCSwEtdiOSxXbu3MlN3t4MPXcOL2Ao8P3FR0BAgLPhJM+73jyO+dbahkBvYCvgBZwEZgF1rbVPW2uPuj9m+qhzXHK7xMRExo4dy8DgYH46f57hQPImv5+fHxEREU7FEwHS38exmxzYGZ6StXYhsDAkJKSX01lEMmrz5s0M6taN0I0bWQQkBgaytHNnwt9/HxMXh7+/PxEREVqYUBznxI2cRCSZc+fO8dprrxEeHs6dhQvT1dcXO3Ag+V98kSYFC7JfS59LDqPCIeKgdevW8e8uXbhzxw7ad+7Mm2++ibeXF9yc5pgUEcepcIg44MyZM7z60kskjB3LPMDH15f84eFQqpTT0USuK71rVVU2xnxtjNly8fkdxpiR7o0m4pnWrFnDY9Wq8fCYMbxhLQUeeID8O3bAbbc5HU0kXdI7czwKGA6cB7DWbsJ1cycRSaeEhASeffZZ7rvrLt6Ji6Nm0aLw/vt4L1kCGmIruUh6L1X5WWt/NFeutpnohjw3RPfjkJzqm2++YVJoKJ8cPMj/9elDkbZt8alVC0qUcDqaSIalt3D8Zoz5J2ABjDHtgV/cliqTNBxXcpo///yTlwcN4vbp0/kY2Pn001SZoDsSSO6W3sLRD4gEbjfGHAT2AV3clkrEAyxZvJhPQ0N54dgxShnDuUGDqBIe7nQskRuWrj4Oa+1ea20ToBRwu7X2bmvtfrcmE8klUi59HhUVRc+ePdnWvDlTjh2jcJUqeK1bR4EJE8DPz+m4IjcsXS0OY0wxoBsQiOveHABYawe4LZlILnBp6fNLq9geiI1lQFgYZ4CpoaGcDw6m8LPPQn6NfBfPkd7/zV8Aa3AtepjkvjgiuUvypc+DcA0//AEYU7Ys/zdrlpPRRNwmvYXD11r7jFuTiORCcXFx+AAjgGHAcWAScPjwYUdzibhTeudxvGeM6WWM+Ycx5uZLD7cmywStjivZacOGDdT18uJn4HngA6DqxT/9/f0dzSbiTuktHOeAccBqYN3Fx0/uCpVZunWsZIcTJ04wYMAAQkJCOOvjwwVjaAp0B46hpc/F86W3cDwDVLTWBlprK1x8aH0EyVOstcTMmsXggACqT5pEnz59WB4fz4b//pddAQEYYwgICCAyMlJLn4tHS28fx1Yg4bp7iXio7du382KPHnRZs4Yo4K/bbycsIgKKFiW0a1dCu3Z1OqJItklv4bgAbDTGLAfOXtqo4bji6f766y8iXnmFv8aP592kJHy9vUmKiKDQ009riK3kWen9n//pxYdInmCtZcGCBQwYMICTcXHE+vpSoH59vKdPhwoVnI4n4qj03jr2v+4OIpJT7Nu3j2f79eMfX35J8aAgZn37LTf5+4O/P1y50KdInnTNwmGMmWOt7WiM2czFBQ6Ts9be4bZkItns7NmzjBs3jhWvvsqU8+epDCSOGkX+e+5xOppIjnK9FsfAi3+2cncQESd99dVXDOvdm7C9e1kKJN56K0ybRv6mTZ2OJpLjXHM4rrX20tLpfa21sckfQF/3x8sYTQCUjDp48CCdOnWiadOmTPrlF8Ly5YNnnyX/9u2goiGSqvTO43gwlW0tsjJIVtAEQEmvxMRE3njjDRpXrszSTz7h5ZdfpvaSJZgff4Tx46FQIacjiuRY1+vj6IOrZXGbMWZTspeKACvdGUzEXVauXEm/Pn1osHkzG728ONetG8VeeMHpWCK5xvX6ON4HvgRG41rD7ZI/rbW/uy2ViBv89ttvDB06lNXTpzOzQAFCANu4MX4jRjgdTSRXuWbhsNaeAE4AnbMnjkjWS0pKYtq0aQwfPpxWx4+zKV8+vAoVgqgoTNeuGmIrkkHp7eMQyRVS3o0vIiKCBg0a0Of//o/g4GBGfvYZ+bt1w+zcCd26qWiIZILWTBCPkfJufLGxsYwdOZIJ3t7MqV2bW5ctw+TLBy1y3LgOkVxFhUM8RvK78QG0Bf4DlDl/Hq/77oOkJMinRrbIjdJZJB4jLi4OgNLAvIuPw0B9gNdf16KEIllEhUNyvb/++ovhw4djrWtVnETgTmAIUBc4GhDgYDoRz6NfwSRXmz9/PgMGDMAvLo5Pypal6/Hj/H7mDJWB8+hufCLuoBaH5Er79u3j4YcfpuMjjzDkzBm2envzyLlzfPDiiwQEBJCou/GJuI1aHJKrnD17ltdff53w8HAaWMuh0qUpceQIdO4MEyfSqnRpWg0bdv0DiUimqXBIrvH111/Tr18/du7cSYdHHyVm82a8z5yBzz+Hli2djieSZ3jUpSqtjuuZfvnlFx5//HGaNGnCPSdOsGTuXOZ8/DHeCxfC1q0qGiLZzKMKh1bH9SyJiYm89dZbVKlShdVz57K1alWifv2VB3fudO1QuTIULuxsSJE8yKMKh3iONWvWUKdOHQYNHEj4rbeyx8eHavv2wejRMHiw0/FE8jQVDslRjh07RlhYGHfddRdHjx5le+vWDNi2Da+QENgJ6oBmAAAPi0lEQVS0CYYNA29vp2OK5GnqHJccISkpiRkzZjBkyBD++uMPRvbty5DXXqPI4cPQti088YQWJBTJIVQ4xHGbNm2iT58+rFq1it533MEbxYrhGx8PRYq4HhUrOh1RRJLRpSpxzMmTJ3nmmWeoVasWh3bsYPv99zNl82Z8z52DsDCn44lIGlQ4JNtZa5kzZw5Vq1Zl4sSJvNqmDf/z8eH25csxTz3lGmL70ENOxxSRNKhwSLbatWsXzZo147HHHqNsmTKsWbOG4VFR5AsKgtWr4c03XZenRCTHUh+HZIvTp08zevRoxowZQ0EfH5Z16kSjw4cxtWq5ljv/6iunI4pIOqnFIW73xRdfEBQUxKuvvsqAZs04EhRE49mzMdbC8eNOxxORDFLhELeJi4ujXbt2PPTQQxQqUID/9ejBuCVLKLBjB0ybBsuWQcmSTscUkQxS4ZAsd+7cOcaOHUvVqlVZtGgRo0ePZt3atdy2ejW0aQPbt0PPnpqXIZJLqY9DstQ333xD37592bZtG4+1bMk7FSpQtF8/V4f36tWgdcREcj21OCRLHD58mG7dutGoUSMSEhL4YeRIZm/aRNEpU2DxYtdOKhoiHkGFQ27IhQsXmDJlClWqVGH27NmMHjiQ3bVqUTc83FUoVq2C9u2djikiWUiFQzJt7dq11KtXj379+hESEsLmzZsZduAA+T/7DF59Fdavh/r1nY4pIllMfRySLjExMYwYMYK4uDjKlStHlSpVWLZsGWXLluWzCRNo2aEDpnx5GD8eRo2CKlWcjiwibqLCIdcVExNDWFgYCQkJAMTHxxMfH0+rpk35qG5dfIcPh7Vr4f33oUIFh9OKiLvl+MJhjKkKDARKAl9ba992OFKeM2LEiMtF45K6wLjly/FdssTVh/H6686EE5Fs59Y+DmPMdGPMEWPMlhTbmxtjdhpj9hhjhl3rGNba7dba3kBHIMSdeeVqR48eJTY29optjwGrgcLnz8P8+fDRR/CPfziST0Syn7s7x2cAzZNvMMZ4AZOBFkA1oLMxppoxJtgY81mKR+mL73kY+B742s155aLz58/z5ptvUrly5cvbCl38cwkwDmh+663w8MNOxBMRB7n1UpW19ltjTGCKzXWBPdbavQDGmNlAG2vtaKBVGsdZACwwxnwOvJ/aPsaYMCAMoEyZMqxYsSIrvoQ86ccff2Ty5MnExcUREhLCfbffTt333+efSUnUA/4AXvTxYXC3bvo+i+QCp06dytJz1Yk+jnLAgWTP44F6ae1sjGkEtAN8gC/S2s9aGwlEAoSEhNhGjRplQdS8Zffu3TzzzDN89tlnVKxYkYULFvDQkSOY557jQr58TCxShHwnTlA+IICIiAhCQ0Odjiwi6bBixQqy8meiE4UjtQWKbFo7W2tXACvcFUZcd+ILDw9n4sSJ+Pr6MnbsWAZ06oRP9+6wfDnccw9ekZE8e/vt1M7i/4Aikvs4UTjigVuTPS8PHHIgR56XlJTEjBkzGD58OEePHuXJJ58kIiKCsmXLwrlzcPYsvPMO/OtfkE9zRUXExYnCsRaoZIypABwEOgGPZ8WBjTGtgdYVK1bMisN5tFWrVjFgwADWrVtHgwYN+Pzzzwmx1rVq7QcfwE03wfffawVbEbmKu4fjfoBr5GYVY0y8MaantTYR6A8sBrYDc6y1W7Pi86y1C621YUW1mF6a4uPjCQ0NpWHDhvz666/ExMTw/aJFhMya5VoeZONG+N//XDuraIhIKtw9qqpzGtu/4Bod3ZL1Tp8+zfjx43nttde4cOECI0eOZNiwYRT65huoXh3i4qBPHxg9WqvYisg15fiZ43JjrLXMnTuXwYMHExsbS/v27Rk3bhyBgYFgLbzxBhQu7Los1bCh03FFJBfwqB5PY0xrY0zkiRMnnI6SI/z888/cf//9dOjQgaJFi7J8+XI+mjOHwBUr4MAB16WomBjXKrYqGiKSTh5VONTH4fLbb7/Rp08fatWqxebNm5k6dSrr16+nUfny8OCD8OST8PbFJb9KlwYfH2cDi0iu4lGFI6+7tExIpUqViIqK4qmnnmL37t38X48eeI0fD8HBrlVsp06F8HCn44pILqU+Dg+xePFiBg0axI4dO2jatClvvPEG1apVc7348svw0kvQrh1MmgS33OJoVhHJ3TyqxZEX+zh2797Nww8/TPPmzUlMTGTBggUsWrSIav7+sG+fa6cBA+DTT2HuXBUNEblhHlU48lIfx8mTJxkyZAhBQUGsWLGCsWPHsmXLFlq3bo1ZtMg1xLZ9e9fIqeLFoU0bpyOLiIfwqMKRFyQlJREdHU3lypUZN24cXbp0YdeuXTz33HP4nDgBoaHQsiUULAhvvqlJfCKS5dTHkYskXybkrrvuYuHChdSpU8f14s8/w/33w59/wosvwvDhGi0lIm6hFkcukHyZkF9++YVZs2axcuVKV9E4f961U9Wqrpsqbdzo6ghX0RARN/GowuFpneOnT58mPDycKlWqMHfuXEaOHMnOnTsJDQ3FXLgAY8dCUBCcPAkFCkB0NFwaSSUi4iYeVTg8pXP80jIh1apV4/nnn6dly5bs2LGDV199lcKFC8O6dVCnDgwd6ioUZ844HVlE8hCPKhyeYNOmTdx///20b9+em266ybVMyEcfudaWOn8enn0W6taFX3+Fjz+GTz5xzf4WEckmKhw5xKVlQu688042b97M22+/zbp16668217+/LBli+vGStu3w6OPatSUiGQ7FQ6HpbVMSO/evcmfPz8cPeoqFHFxriLx2Weuu/IVK+Z0dBHJo1Q4HLRkyRJq1KjBoEGDqFOnDps2bWLixIkUL17cNXFv5kzXaKmZM2HVKtebvL2dDS0ieZ5HFY7cMqrq0jIhzZo149y5cyxYsIDFixf/vbbU3r3QrBl07w6VK8OGDdCpk7OhRUQu8qjCkdNHVZ08eZKhQ4cSFBTE8uXLGTNmDFu3bnUtE5K8r2LcOFizBiZPdt1gKSjIudAiIilo5ng2SEpK4r///S/Dhw/n8OHDPPnkk4waNYqyZcv+vdP69eDlBTVqwKhRMGIElC/vXGgRkTR4VIsjJ1q9ejX16tWjR48e3Hbbbfz4449Mnz7976KRkADPPecaYjtsmGtb8eIqGiKSY6lwuEl8fDxdunShQYMGHDp06MplQi756ivXKrbjx0PPnvDBB84FFhFJJ12qymKnT59mwoQJjBo1igsXLjBy5EiGDh3qmvGd3KefQtu2rs7vb76Be+91JrCISAapcGQRay3z5s1j8ODB7N+/n0cffZRx48ZRoUKF5DvBoUNQrpxr6fM33oDevcHX17ngIiIZpEtVWSD5MiFFihRh2bJlfPzxx1cWjX37oEULqFfv70UJBw1S0RCRXMejCkd2z+NIbZmQ9evX07hx4793SkyECRNcfRkrV7o6wAsVypZ8IiLu4FGFI7vmcZw/f5633nrr8jIh/fv3Z9euXX8vE3LJ779D/fquhQkfeAC2bYP+/V3DbkVEcin1cWTQkiVLGDRoENu3b+fBBx9k4sSJf8/4vsRa17pSxYu7lj0fOtR1/28tSCgiHsCjWhzutGfPHtq0aXN5mZD58+dfuUzIJUuXwp13/r0o4cyZ0KGDioaIeAwVjuv4888/GTp0KNWqVWPZsmWXlwl5+OGHr1wm5Ngx19pSDz7omtT322/OhRYRcSNdqkpDUlISM2fOZPjw4fz666+pLxNyyfvvw8CBcPy4a6mQkSM1WkpEPJYKx0UxMTGMGDGCuLg4SpcujZ+fH/v27aN+/fosWLDgyhnfKS1fDv/8J0RFQXBw9oUWEXGACgeuohEWFkZCQgIAhw8fxhhDnz59mDx58pWXpMA1xHbSJLjnHggJgTffBB8fjZYSkTxBfRzAiBEjLheNS6y1fPHFF1cXjZ9/hrvugmeegdmzXdv8/FQ0RCTP8KjCkdkJgHFxcdfffvq0a/Je7dquEVOzZ7vumyEiksd4VOHI7ARAf3//62+PjIQxY1wjp7Zvh8ce0xBbEcmTPKpwZFZERAR+fn5XbPPz82PcsGHw00+uDX36uO7G9+67cPPNDqQUEckZ1DkOhIaGAlweVeV/663MatWKu194wbWu1O7drkUJGzZ0OKmIiPPU4rgoNDSU/fv3k7RvH/uDgrh7yhSoUAHmz4f8qq8iIpfoJ2JyO3e6Or8BJk7UgoQiIqlQ4UiucmUYPBiefBICApxOIyKSI6lwJGcMvPSS0ylERHI09XGIiEiGqHCIiEiGqHCIiEiGqHCIiEiGqHCIiEiGeFThyOwihyIikn4eVTgyu8ihiIikn0cVDhERcT9jrXU6Q5YzxhwFYjP59qJAVl3rutFjZfb9GX1fRvYvCfyW4UR5Q1b+33Enp3K683N13l5bes/bAGttqevuZa3VI9kDiMwpx8rs+zP6vozsD/zk9L9RTn1k5f8dT8zpzs/VeXvdfbP0vNWlqqstzEHHyuz7M/q+rPya87Lc8n10Kqc7P1fnbTbyyEtV4j7GmJ+stSFO5xCR9Mvq81YtDsmoSKcDiEiGZel5qxaHiIhkiFocIiKSISocIiKSISocIiKSISockmnGmEeMMVHGmPnGmKZO5xGR6zPGVDXGTDXGfGyM6ZOZY6hwyBWMMdONMUeMMVtSbG9ujNlpjNljjBkGYK391FrbC3gCeMyBuCJChs/b7dba3kBHIFNDdFU4JKUZQPPkG4wxXsBkoAVQDehsjKmWbJeRF18XEWfMIAPnrTHmYeB74OvMfJgKh1zBWvst8HuKzXWBPdbavdbac8BsoI1xGQN8aa1dn91ZRcQlI+ftxf0XWGsbAKGZ+bz8NxJW8oxywIFkz+OBesBTQBOgqDGmorV2qhPhRCRVqZ63xphGQDvAB/giMwdW4ZD0MKlss9bat4C3sjuMiKRLWuftCmDFjRxYl6okPeKBW5M9Lw8cciiLiKSP285bFQ5Jj7VAJWNMBWNMAaATsMDhTCJybW47b1U45ArGmA+A1UAVY0y8MaantTYR6A8sBrYDc6y1W53MKSJ/y+7zVosciohIhqjFISIiGaLCISIiGaLCISIiGaLCISIiGaLCISIiGaLCISIiGaLCISIiGaLCISIiGaLCIZINjDGBxpjtF++YuNUYs8QYU9DpXCKZocIhkn0qAZOttUHAceBRh/OIZIoKh0j22Wet3Xjx7+uAQAeziGSaCodI9jmb7O8X0P1wJJdS4RARkQxR4RARkQzRsuoiIpIhanGIiEiGqHCIiEiGqHCIiEiGqHCIiEiGqHCIiEiGqHCIiEiGqHCIiEiGqHCIiEiG/D+OH91PTfSKhgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "\"\"\"Program to show time as a function of size for selection sort.\"\"\"\n", "from time import time\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline \n", "\n", "# Set the seed to keep the output fixed\n", "np.random.seed(1234)\n", "\n", "# Set the n values and initialize a list for times\n", "n_values = np.arange(50, 1000, 50)\n", "t_values = []\n", "\n", "for n in n_values:\n", " # Generate an array of random numbers\n", " a = np.random.rand(n)\n", " # Start the timer\n", " t_start = time()\n", " # Do the search\n", " selection_sort(a)\n", " # Determine the elapsed time (in seconds) and save it\n", " t_elapsed = time() - t_start\n", " t_values.append(t_elapsed)\n", " \n", "# Plot the times vs values on a log log plot.\n", "# Also, an expected \"trend\" line is included.\n", "plt.figure(1)\n", "plt.loglog(n_values, t_values, 'k-o', label='times') \n", "plt.loglog(n_values, 1e-7*n_values**2, 'r--', label='$\\mathcal{O}(n^2)$')\n", "plt.grid(True)\n", "plt.xlabel('n')\n", "plt.ylabel('time (s)')\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It appears that the order is confirmed because the two lines are approximately parallel (for large enough $n$), and that the impact on time is as expected. Basically, each comparison (which requires accessing two elements of `a`) requires a certain amount of time. If the number of comparisons is goes as $n^2$, and the time goes with the number of comparisons, then selection sort is **100 times more expensive** for an array of size $10n$ than for an array of size $n$. That's huge, and better methods are surely needed for the massive data sets out in the real world that must be sorted (think Google searches and other everyday lists with thousands of entries)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "**Exercise**: The results from the numerical experiment suggest that an array of 100 elements takes about 0.001 s to sort using the selection sort algorithm. About how many seconds would you expect the same algorithm to take for 1 million elements?\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Better Sorting for Bigger Data\n", "\n", "We saw in [Lecture 14](ME400_Lecture_14.ipynb) that the task of searching a pre-sorted array could be done in as few as $\\mathcal{O}(n \\log n)$ operations. The trick was to split identify which half of a sequence contained the element, and to keep splitting the domain in half at every iteration until the element in question was found. It was even suggested that such a **binary search** is a form of divide and conquer: split the work into ever smaller chunks and solve the smaller problems more quickly. \n", "\n", "The same principle is the foundation of many of the most successive algorithms ever, including [quicksort](https://en.wikipedia.org/wiki/Quicksort), probably the most widely search algorithms, and the [fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform), which drives much of signal processing across multiple domains. Often, and as true for these algorithms, divide and conquer takes one from $\\mathcal{O}(n^2)$ to $\\mathcal{O}(n \\log n)$, a huge savings.\n", "\n", "For our purposes, it is sufficient to consider the simplest of the divide and conquer approaches to sorting: [merge sort](https://en.wikipedia.org/wiki/Merge_sort). The algorithm combines two very simple ideas: (1) divide a sequence into smaller chunks that can be sorted directly and (2) merge two sorted (sub)sequences. \n", "\n", "### Merging Sorted Sequences\n", "\n", "The fundamental work done by mergesort is to merge two smaller, *sorted* sequences into one larger, *sorted* sequence. For example, if one starts with `a = [1, 8]` and `b = [5, 7]`, the result of *merging* `a` and `b` should be `c = [1, 5, 7, 8]`. \n", "\n", "How is this accomplished? We know the total number of elements to be merged (here, that's 4). Each element of the new, sorted sequence `c` can be defined by choosing the lowest of the remaining elements in `a` and `b`. Because `a` and `b` are sorted, we can keep track of which elements have already been selected with two counters (one for `a` and one for `b`). If the counter for `a` is equal to the length of `a` (i.e., all elements of `a` have been merged), then all of the remaining elements of `c` must come from `b`. The same goes if the `b` counter equals the length of `b`.\n", "\n", "That's all a mouthful, for sure. The idea can be shown more succinctly in pseudocode:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "'''Merge: an algorithm to merge two sorted sequences'''\n", "Input: a, b \n", "Set m to the length of a\n", "Set n to the length of b\n", "# Initialize the sorted sequence\n", "Set c to be a sequence of length (m+n) \n", "# Initialize the counter for a, b, and c\n", "Set i = 0\n", "Set j = 0\n", "Set k = 0\n", "While k < m + n do\n", " If i < m and j < n then\n", " # Both a and b still have elements to merge\n", " If a[i] <= b[j] then\n", " Set c[k] = a[i] \n", " Set i = i + 1\n", " Otherwise\n", " Set c[k] = b[j]\n", " Set j = j + 1\n", " Otherwise, if i == m then\n", " # Only b has elements left to merge\n", " Set c[k] = b[j]\n", " Set j = j + 1\n", " Otherwise,\n", " # Only a has elements left to merge\n", " Set c[k] = a[i]\n", " Set i = i + 1\n", " Set k = k + 1\n", "Output: c\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take some time and digest this algorithm. Then try the following exercises:\n", "\n", "***\n", "\n", "**Exercise**: Apply this algorithm to `a = [1, 3, 8]` and `b = [2, 5, 7]`. Trace it by hand using the tecniques employed earlier in the course.\n", "\n", "***\n", "\n", "**Exercise**: Implement this algorithm as the Python function `merge(a, b)`.\n", "\n", "***\n", "\n", "**Exercise**: Is merge sort stable? Try it on [3, 1$_0$, 4, 1$_1$, 2].\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Divide and Conquer\n", "\n", "Merging is half thee battle: we need to dive down and get the smaller arrays to sort. A natural approach is to use recursion. Here is a function that will divide a sequence and print out subsequences. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--------> [8, 7, 6, 5, 4, 3, 2, 1]\n", "----> [8, 7, 6, 5]\n", "--> [8, 7]\n", "--> [6, 5]\n", "----> [4, 3, 2, 1]\n", "--> [4, 3]\n", "--> [2, 1]\n" ] } ], "source": [ "def divide_and_print(a):\n", " print('-'*len(a)+'>', a)\n", " if len(a) <= 2:\n", " return\n", " else:\n", " divide_and_print(a[:len(a)//2])\n", " divide_and_print(a[len(a)//2:])\n", " return\n", "divide_and_print([8,7,6,5,4,3,2,1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output of this function clearly shows how the original sequence is divided at each level of recursion. Like all recursive functions, there needs to be a termination criterion (or guard). Here, if the sequence has two or fewer elements, the function returns without further dividing `a`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To apply the basic idea of `divide_and_print` to sorting, the sequence must be sorted at the bottom level (i.e., when `len(a) <= 2`) and merged at successive levels. For example, the very first chunk of `a` is the subsequence `[8, 7]`. That is easily sorted by simple comparison: since `8 > 7`, swap the elements to yield (and return) `[7, 8]`. The next chunk is `[6, 5]`, which is also reversed to `[5, 6]`. These sorted subsequences of two elements can then be merged.\n", "\n", "Here is the full algorithm in pseudocode, where `Call` indicates that an algorithm (think a function, here) is to be supplied some input and is to provide some output:\n", "\n", "```\n", "'''Merge Sort: an algorithm to sort the sequence a'''\n", "Input: a\n", "Set n to the length of a\n", "If n == 2 and a[0] > a[1] then\n", " Swap a[0] and a[1]\n", "Otherwise, if n > 2 then\n", " Set a_L to be the first half of a \n", " Set a_R to be the second half of a\n", " Call Merge Sort to sort a_L and assign \n", " its output to a_L\n", " Call Merge Sort to sort a_R and assign\n", " its output to a_R\n", " Call Merge to combine a_L and a_R, and assign\n", " its output to a\n", "# Note, nothing is done when n is 1\n", "Output: a\n", "```\n", "\n", "Although perhaps not obvious, the merge sort algorithm is $\\mathcal{O}(n\\log n)$. Given $n$ elements in a sequence, the algorithm divides the array into smaller chunks $\\log_2 n - 1$ times. For example, if one starts with 16 elements, those are then divided into subsequences with 8 elements, and then 4, and then 2. That is $\\log_2 16 - 1 = 3$ levels. At the last (deepest) level with subsequences of 1 or 2 elements, up to one comparison is made to decide whether to swap the elements or not. For all other levels, the merge algorithm is applied. To merge two sequences each of length $m$ requires at least $m$ comparisons (and up to $2m$ comparisons). Because all $n$ elements of the original sequence are merged at every level (albeit as part of smaller subsequences), up to $2n$ comparisons are therefore required per level. Hence, there are up to approximately $(\\log_2 n - 1)\\cdot 2n = \\mathcal{O}(n\\log n)$ comparisons.\n", "\n", "Spend some time and digest this alorithm. Then tackle the following exercises.\n", "\n", "\n", "***\n", "\n", "**Exercise**: Apply this algorithm to `[6, 5, 4, 3, 2, 1]` and trace its progress by hand, writing down the value of `a` at the beginning of each call to Merge Sort. Number these calls in the order in which they are made. (In other words, predict what would happen if `a` were printed after `Input`.)\n", "\n", "*Solution*: \n", "\n", "```\n", "0 [6, 5, 4, 3, 2, 1]\n", "1 [6, 5, 4]\n", "2 [6] \n", "3 [5, 4] \n", "4 [3, 2, 1] \n", "5 [3] \n", "6 [2, 1] \n", "```\n", "\n", "***\n", "\n", "**Exercise**: Implement this algorithm as the Python function `merge_sort(a)`. \n", "\n", "***\n", "\n", "**Exercise**: How many comparisons are made for `a = [6, 5, 4, 3, 2, 1]`? Make sure to include the comparisons made when merging. How does this compare to selection sort?\n", "\n", "***\n", "\n", "**Exercise**: Suppose you implement merge sort (correctly) in Python and use it to sort a list of 100 elements. If the time required to sort 100 elements is about 0.001 s, how long would it take to sort a list with 1 million elements?\n", "\n", "*Solution*: $t \\approx \\frac{10^6 \\log 10^6}{100 \\log 100} \\times 0.001~\\text{s}= 30~\\text{s}$.\n", "\n", "***\n", "\n", "**Exercise**: Develop a modified merge sort algorithm that does not use recursion. *Hint*: start simple and reproduce the behavior of `divide_and_print` using only loops.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Built-in Sorting\n", "\n", "Sorting is so fundamental that algorithms are often implemented by default in high-level languages (like Python, MATLAB, and C++). In Python, the built-in `sorted` function accepts as input a sequence `a` and returns the sorted results:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted([3, 2, 1])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function sorted in module builtins:\n", "\n", "sorted(iterable, /, *, key=None, reverse=False)\n", " Return a new list containing all items from the iterable in ascending order.\n", " \n", " A custom key function can be supplied to customize the sort order, and the\n", " reverse flag can be set to request the result in descending order.\n", "\n" ] } ], "source": [ "help(sorted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As `help` indicates, two optional arguments `key` and `reverse` can be provided. By default, `reverse` is `False`, and a sequence is sorted in increasing order. By setting `reverse=True`, the opposite order can be achieve:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted([1, 2, 3])" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3, 2, 1]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted([1, 2, 3], reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `key` argument is `None` by default. Any other value passed must be a callable function that is given an item of the sequence and returns a value to be used when comparing that item to others. \n", "\n", "Consider, for example, the list `a = [1, 2, 3]`. The value of `a[0]` is 1, while `a[1]` is 2. Hence, `a[0] < a[1]` is `1 < 2` is `True`. The result of this comparison (`True`) can be modified by using the `key` function to change the value of each item compared. For example, if both `a[0]` and `a[1]` are *negated* before their comparison, the result is `-1 < -2`, i.e., `False`. Such a change produces a sorted array in reverse order:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3, 2, 1]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted([1, 2, 3], key=lambda x: -x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can also define a key function using a standard `def` statement, or" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3, 2, 1]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def my_key_function(x):\n", " return -x # each item of the list is negated before comparing\n", "sorted([1, 2, 3], key=my_key_function)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `key` function can make sorting items with many components easier. Those items could be the rows of a spread sheet whose components are the valiues in various columns. Suppose such data were read from `.csv` file to produce a list of each row of data, e.g.," ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "data = [(\"Julia\", \"Roberts\", 2, 5), \n", " (\"Harrison\", \"Ford\", 4, 1), \n", " (\"Meryl\", \"Streep\", 3, 6)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perhaps obviously, these are famous names coupled with arbitrary numbers. We can print out these items by last name via" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('Harrison', 'Ford', 4, 1)\n", "('Julia', 'Roberts', 2, 5)\n", "('Meryl', 'Streep', 3, 6)\n" ] } ], "source": [ "for item in sorted(data, key=lambda x: x[1]):\n", " print(item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `key` function returns `x[1]`, with is `Ford`, `Roberts`, or `Streep` for the given data. Certainly, more complex data types (dictionaries, nested lists, etc.) may be similarly *keyed* and sorted." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "**Exercise** Take this famous-person data and sort the three items in order based on the number of `e`'s in the last name.\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python's built-in `list` type has its own `sort` function. Like `sorted`, it accepts optional `key` and `reverse` arguments:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method_descriptor:\n", "\n", "sort(...)\n", " L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*\n", "\n" ] } ], "source": [ "help(list.sort)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The major difference between `list.sort` and `sorted` is that `list.sort` sorts the elements *in place*. In other words, `a.sort()` changes `a` directly, while `b = sorted(a)` produced a copy of `a`, sorts that copy, and then returns the sorted sequence, which is assigned to `b`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, `NumPy` provides its own `np.sort(a)` function to sort an array `a`. It also accepts three optional arguments: `axis`, `kind`, and `order`. By setting the axis (e.g., 0 for rows and 1 for columns in a 2-D array), the elements are sorted along that axis. By default, the data is sorted along the last axis. \n", "\n", "For example, consider the following:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 3, 2, 1],\n", " [ 2, 4, 6],\n", " [ 1, 5, 10]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([[3, 2, 1],[2,4,6],[1, 5, 10]])\n", "A" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 2, 1],\n", " [ 2, 4, 6],\n", " [ 3, 5, 10]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sort(A, axis=0)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 2, 3],\n", " [ 2, 4, 6],\n", " [ 1, 5, 10]])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sort(A, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For sorting NumPy arrays, the `np.sort` function is most likely the fastest option. \n", "\n", "***\n", "\n", "**Exercise**: Perform a numerical experiment like the one above for selection sort using `np.sort` with the optional argument `kind='mergesort'`. Does its order appear to be $\\mathcal{O}(n\\log n)$?\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Reading\n", "\n", "None at this time, but *sorting* is a rich topic for which many resources are available. We've seen $\\mathcal{O}(n^2)$ and $\\mathcal{O}(n\\log n)$ sorting. Nothing is better (for general data), but there exist other algorithms (insertion, bubble, heap, and other sorts). There's even one out there that of order $\\mathcal{O}(n!)$. Would you want to use it?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.5" } }, "nbformat": 4, "nbformat_minor": 2 }