{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Modules and Packages\n", "\n", "## Overview, Objectives, and Key Terms\n", " \n", "Python modules offer a unique way to organize your code, especially as it evolves into several functions. Moreover, any of the external packages you've already used, like NumPy and Matplotlib, are organized as modules within larger *packages*. By the end of this lesson, you'll understand how to organize your create and \n", "use your own modules and packages.\n", "\n", "\n", "### Objectives\n", "\n", "By the end of this lesson, you should be able to\n", "\n", "- Produce a module of functions\n", "- Set appropriate paths in order to import modules you create\n", "\n", "### Key Terms\n", "\n", "- `module`\n", "- `import`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Modules\n", "\n", "From nearly the beginning, we've imported modules like NumPy and used the functions they provide. In fact, a `module`\n", "is an actual Python type:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "module" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy\n", "type(numpy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Modules are easy to create: they can be as simple as a single Python script. Suppose we write a few functions for analyzing a vector $\\mathbf{e}$ that represents errors (perhaps due to measurement uncertainty or\n", "numerical approximation). Some useful error metrics include \n", "\n", " 1. *mean, absolute error*, $\\sum^n_i |e_i|/n$\n", " 2. *root-mean-square error*, $\\sqrt{\\sum^n_i e_i^2/n}$\n", " 3. *maximum, absolute error*, $\\max(|e_i|) \\, \\forall i \\in [1, n]$\n", "\n", "If $\\mathbf{e}$ is a *sequential* type, e.g., a `list`, `tuple`, or `np.ndarray`, then the following functions represent possible implementations of each metric:\n", "\n", "```python\n", "def mean_abs_error(e):\n", " \"\"\"Mean, absolute error.\"\"\"\n", " v = 0\n", " for i in range(len(e)) :\n", " v += abs(e[i])\n", " return v/len(e)\n", " \n", "def rms_error(e) :\n", " \"\"\"Root-mean-square error.\"\"\"\n", " v = 0\n", " for i in range(len(e)) :\n", " v += e[i]**2\n", " return v**0.5\n", " \n", "def max_abs_error(e) :\n", " \"\"\"Maximum, absolute error.\"\"\"\n", " v = 0\n", " for i in range(len(e)) :\n", " if abs(e[i]) > v:\n", " v = abs(e[i])\n", " return v\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although we can define these (and any other function) basically anywhere we want in a Python script, it can be helpful to collect these functions in a separate file, say `error_metrics.py`. Now, if you navigate in the command line to the directory that contains this file, you can list the directory contents and see something similar to the following:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```PowerShell\n", " C:\\Users\\robertsj\\Documents\\PythonForEngineers>dir\n", " Volume in drive C has no label.\n", " Volume Serial Number is CP-1\n", "\n", " Directory of C:\\Users\\robertsj\\Documents\\PythonForEngineers\n", "\n", " 12/02/1942 03:25 PM .\n", " 12/02/1942 03:25 PM ..\n", " 12/02/1942 03:25 PM 572 error_metrics.py\n", " 5 File(s) 3,141 bytes\n", " 3 Dir(s) 1,123,581,321 bytes free\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also navigate to this directory within Spyder (see the upper right-hand corner). As long as you are executing Python *from within the directory containing* the file, it should be possible to do the following (in Spyder, IPython, or IDLE):" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['__builtins__',\n", " '__cached__',\n", " '__doc__',\n", " '__file__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '__spec__',\n", " 'max_abs_error',\n", " 'mean_abs_error',\n", " 'pi',\n", " 'rms_error']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import error_metrics\n", "dir(error_metrics)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.15000000000000002" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "e = [0.1, -0.2, 0.25, -0.05]\n", "error_metrics.mean_abs_error(e)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.33911649915626346" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "error_metrics.rms_error(e)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.25" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "error_metrics.max_abs_error(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, the module is imported directly, and its name must therefore be used to access the functions it contains, e.g., `error_metrics.mean_abs_error`. We can use an abbreviation using the same approach we've used NumPy:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.15000000000000002" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import error_metrics as em\n", "em.mean_abs_error(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like other names in Python, we are free to reassign new names for modules." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.15000000000000002" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "em2 = em\n", "em2.mean_abs_error(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we are sure that the names of functions, variables, and other things defined in the module won't clash with anything, we can simply import everything:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.15000000000000002" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from error_metrics import *\n", "mean_abs_error(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When do things clash? Suppose we add the following line to our `error_metrics.py` file\n", "\n", "```python\n", "pi = 1.618033988749895\n", "```\n", "\n", "and then do" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.618033988749895\n" ] } ], "source": [ "from math import *\n", "from error_metrics import *\n", "print(pi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, that's not right. Although we hope $\\pi$ would never be so carelessly mis-defined (recognize the number we errantly used? ), this sort of conflict makes the \"import it all!\" approach a potential source of hard-to-diagnose bugs. \n", "\n", "> **Warning**: Avoid `from some_module import *` because it can lead to unintended consequences like redefining names.\n", "\n", "If you want just a name (or several names) from a module, you use import statements like the following:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from error_metrics import mean_abs_error, rms_error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can selectively import functions (or attributes like `pi`) and assign them new names via" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.618033988749895\n" ] } ], "source": [ "from error_metrics import rms_error as re\n", "from error_metrics import pi as not_the_real_pi\n", "print(not_the_real_pi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, one cannot functions (or attributes) selectively using " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "ename": "ModuleNotFoundError", "evalue": "No module named 'error_metrics.rms_error'; 'error_metrics' is not a package", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0merror_metrics\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrms_error\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mrms_error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'error_metrics.rms_error'; 'error_metrics' is not a package" ] } ], "source": [ "import error_metrics.rms_error as rms_error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It might appear to be valid (we did use `import matplotlib.pyplot as plt` afterall). However, the basic rule is that anything immediately after `import` must be a `module` (like `matplotlib`) or a \"submodule\" (like `matplotlib.pyplot`) and not a name defined in the module unless `import` follows `from`.\n", "\n", "> **Note**: Use `from some_module import some_function` and **not** `import some_module.some_function as some_function` to import `some_function` all by itself." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running Modules as Scripts\n", "\n", "\n", "Because any Python script can be imported as a module, you've already\n", "been running modules as scripts! However, any variables defined\n", "in the script (e.g., the errant `pi` we added above) become\n", "accessible upon import, and that's not always desired. Selective \n", "imports can help, but sometimes the extra code is for quick function testing.\n", "As an alternative, a special if-statement can be added to the bottom\n", "of scripts to separate the functions (and other things actually desired\n", "in the module) and some demonstration code. Here's an example:\n", "\n", "```python\n", "\n", " if __name__ == \"__main__\" :\n", " e = [0.1, 0.2, 0.3]\n", " print mean_abs_error(e)\n", " print rms_error(e)\n", " print max_abs_error(e)\n", "```\n", "\n", "\n", "Running this leads to \n", "\n", "```\n", " C:\\Users\\robertsj\\Documents\\PythonForEngineers>python error_metrics.py\n", " 0.2\n", " 0.374165738677\n", " 0.3\n", "```\n", "\n", "If the same four lines under the `if` were included in `error_metrics.py` **without** the `if`, then `e` would be defined (e.g., `error_metrics.e` would exist) and the print statements would be executed.\n", "\n", "> **Exercise**: Go and put the lines of code above into `error_metrics.py` and see for yourself that they are printed when the file is executed (either via `python error_metrics.py` or in Spyder). Show also that `import error_metrics` does not lead to the print statements when the `if __name__` line is included.\n", "\n", "Basically, the special variable `__name__` has the value `\"__main__\"` only if the script is executed and not if\n", "it is imported. Actually, any (built-in) name in Python that starts and ends with two underscores is usually special and not often needed. Use of `__name__` here is one of the exceptions to that rule of thumb." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Using Modules on the Moon\n", "\n", "### `PYTHONPATH`\n", "\n", "Now that you've done all that hard work to create a module, wouldn't \n", "it be nice to be able to use it anywhere you use the Python interpreter\n", "or an IDE like Spyder? To do so requires that we tell Python where to \n", "find the module by modifying the `PYTHONPATH`:\n", "\n", "```\n", " C:\\Users\\robertsj>cd Documents\\PythonForEngineers\\BackOnEarth\n", "\n", " C:\\Users\\robertsj\\Documents\\PythonForEngineers\\BackOnEarth>set PYTHONPATH=%PYTHONPATH%;C:\\Users\\robertsj\\Documents\\PythonForEngineers\\OnTheMoon\n", "\n", " C:\\Users\\robertsj\\Documents\\PythonForEngineers\\BackOnEarth>dir\n", " Volume in drive C has no label.\n", " Volume Serial Number is CP-1\n", "\n", " Directory of C:\\Users\\robertsj\\Documents\\PythonForEngineers\\BackOnEarth\n", "\n", " 12/02/1942 03:25 PM .\n", " 12/02/1942 03:25 PM ..\n", " 0 File(s) 0 bytes\n", " 2 Dir(s) 1,123,581,321 bytes free\n", "\n", "\n", " C:\\Users\\robertsj\\Documents\\PythonForEngineers\\BackOnEarth>dir ..\\OnTheMoon\n", " Volume in drive C has no label.\n", " Volume Serial Number is CP-1\n", "\n", " Directory of C:\\Users\\robertsj\\Documents\\PythonForEngineers\\OnTheMoon\n", "\n", " 12/02/1942 03:25 PM .\n", " 12/02/1942 03:25 PM ..\n", " 12/02/1942 03:25 PM 572 error_metrics.py\n", " 1 File(s) 572 bytes\n", " 2 Dir(s) 1,123,581,321 bytes free\n", "```\n", "\n", "Once that's done, we can use Python from the same directory \n", "while also using the module:\n", "\n", "```python\n", " >>> import error_metrics as em\n", " >>> em.rms_error([0.1, -0.2, 0.3])\n", " 0.37416573867739417\n", "```\n", "\n", "The `PYTHONPATH` can also be changed permanently. In Windows,\n", "this can be done by accessing the *Start* > *Control Panel* >\n", "*System and Security* > *System* menu. Then, click *Advanced System Settings*,\n", "which brings up a window with an *Environment Variables* button.\n", "By default, `PYTHONPATH` is not defined, so you'll need to \n", "create it first. \n", "\n", "On OS X and Linux machines, `PYTHONPATH` can be set using `export` in the terminal (or permanently in the`.bashrc` file), e.g., `export PYTHONPATH=$PYTHONPATH:/home/robertsj/Documents/PythonForEngineers\\OnTheMoon`." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### `sys.path`\n", "\n", "An alternative way to set the path is by modifying `sys.path`.\n", "\n", "> **Exercise**: Go to Python, `import sys`, and print `sys.path`. What type is `sys.path`? \n", "\n", "If `sys.path` is appended to include the location of the module file **before** the module is imported, then that module will be available in other Python files or within the console. \n", "\n", "```python\n", "import sys\n", "sys.path.append('C:\\Users\\robertsj\\Documents\\PythonForEngineers\\OnTheMoon')\n", "import error_metrics\n", "```\n", "\n", "This approach is quick and easy. It's major drawback is that it is very specific to the user and her machine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Packages\n", "\n", "\n", "Modules are great ways to collect functions and other things in a single Python file. As programs grow even larger, it may be useful to have several files accessible by a single import statement. These collections of modules are known as *packages* (though \"package\" is not a Python type as is `module`).\n", "\n", "There are many ways to organize packages, but let's illustrate with a simple example. Let's make a new directory called `my_package` and place our previous module (i.e., `error_metrics.py`) in it. Now, in that same directory, \n", "create a new Python file with the special name `__init__.py`. For now, it can be an empty file. Its existence is enough to tell Python that `my_package` can be imported as long as the directory containing it is in the `PYTHONPATH`. In other words, if `my_package` were put in `C:\\Users\\robertsj\\Documents`, then `C:\\Users\\robertsj\\Documents` would need to be added to the path (*not* `C:\\Users\\robertsj\\Documents\\my_package`).\n", "\n", "\n", "Once that's done, the following is possible from anywhere:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['__builtins__',\n", " '__cached__',\n", " '__doc__',\n", " '__file__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '__path__',\n", " '__spec__']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import my_package\n", "dir(my_package)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note, `error_metrics` (nor its functions) appear. Python does not by default import all modules within a package. One or more modules can be imported explicitly in the `__init__.py` file if desired. However, we *can* get the `error_metrics` module by importing it explicitly via" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n" ] } ], "source": [ "import my_package.error_metrics\n", "print(type(my_package))\n", "print(type(my_package.error_metrics))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although `my_package` is nothing more than a directory on our filesystem, it appears as a `module` in Python, as does `my_package.error_metrics`. Usually, the modules accessed by the \"dot\" operation are called *submodules* but that is not an actual Python type.\n", "\n", "With `my_module.error_metrics` imported, the contents of `my_package` are now extended:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['__builtins__',\n", " '__cached__',\n", " '__doc__',\n", " '__file__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '__path__',\n", " '__spec__',\n", " 'error_metrics']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dir(my_package)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hence, we can now execute, e.g.," ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.37416573867739417" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_package.error_metrics.rms_error([0.1, -0.2, 0.3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Reading\n", "\n", "Python's [documentation](https://docs.python.org/3/tutorial/modules.html) is a good resource for those interested in more details on how to create and use modules and packages." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }