Modules and Packages

Overview, Objectives, and Key Terms

Python modules offer a convenient way to organize your code, especially as it grows to include several functions. Moreover, the external packages you’ve already used, like NumPy and Matplotlib, are themselves organized as modules within larger packages. By the end of this lesson, you’ll understand how to create, organize, and use your own modules and packages.

Objectives

By the end of this lesson, you should be able to

  • Produce a module of functions
  • Set appropriate paths in order to import modules you create

Key Terms

  • module
  • import

Creating Modules

From nearly the beginning, we’ve imported modules like NumPy and used the functions they provide. In fact, a module is an actual Python type:

In [1]:
import numpy
type(numpy)
Out[1]:
module
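The same fact can be checked with isinstance and the ModuleType class from the standard-library types module. A minimal sketch, using math (which ships with every Python installation) in place of NumPy:

```python
import math
import types

# A module is an object whose type is types.ModuleType
print(type(math))                          # <class 'module'>
print(type(math) is types.ModuleType)      # True
print(isinstance(math, types.ModuleType))  # True
```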

Modules are easy to create: they can be as simple as a single Python script. Suppose we write a few functions for analyzing a vector \(\mathbf{e}\) that represents errors (perhaps due to measurement uncertainty or numerical approximation). Some useful error metrics include

  1. mean absolute error, \(\frac{1}{n}\sum_{i=1}^{n} |e_i|\)
  2. root-mean-square error, \(\sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}\)
  3. maximum absolute error, \(\max_{1 \le i \le n} |e_i|\)
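Before writing explicit loops, it can help to sanity-check the three formulas with built-ins. The sketch below uses a hypothetical helper, error_summary (not part of the lesson’s module), on a vector whose metrics are easy to verify by hand:

```python
import math

def error_summary(e):
    """Hypothetical helper: (mean absolute, RMS, max absolute) error of e."""
    n = len(e)
    mean_abs = sum(abs(x) for x in e) / n
    rms = math.sqrt(sum(x**2 for x in e) / n)   # note the division by n
    max_abs = max(abs(x) for x in e)
    return mean_abs, rms, max_abs

print(error_summary([3.0, -4.0, 0.0, 0.0]))  # (1.75, 2.5, 4.0)
```

Here the mean absolute error is 7/4, the RMS error is the square root of 25/4, and the maximum absolute error is 4.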

If \(\mathbf{e}\) is a sequence type, e.g., a list, tuple, or np.ndarray, then the following functions are possible implementations of each metric:

def mean_abs_error(e):
    """Mean absolute error."""
    v = 0
    for x in e:
        v += abs(x)
    return v/len(e)

def rms_error(e):
    """Root-mean-square error."""
    v = 0
    for x in e:
        v += x**2
    return (v/len(e))**0.5

def max_abs_error(e):
    """Maximum absolute error."""
    v = 0
    for x in e:
        if abs(x) > v:
            v = abs(x)
    return v

Although we can define these (and any other functions) almost anywhere in a Python script, it can be helpful to collect them in a separate file, say error_metrics.py. Now, if you navigate in the command line to the directory that contains this file, you can list the directory contents and see something similar to the following:

C:\Users\robertsj\Documents\PythonForEngineers>dir
 Volume in drive C has no label.
 Volume Serial Number is CP-1

 Directory of C:\Users\robertsj\Documents\PythonForEngineers

12/02/1942  03:25 PM    <DIR>          .
12/02/1942  03:25 PM    <DIR>          ..
12/02/1942  03:25 PM               572 error_metrics.py
               5 File(s)          3,141 bytes
               3 Dir(s)   1,123,581,321 bytes free

You can also navigate to this directory within Spyder (see the upper right-hand corner). As long as you are executing Python from within the directory containing the file, it should be possible to do the following (in Spyder, IPython, or IDLE):

In [2]:
import error_metrics
dir(error_metrics)
Out[2]:
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'max_abs_error',
 'mean_abs_error',
 'rms_error']
In [3]:
e = [0.1, -0.2, 0.25, -0.05]
error_metrics.mean_abs_error(e)
Out[3]:
0.15000000000000002
In [4]:
error_metrics.rms_error(e)
Out[4]:
0.16955824957813173
In [5]:
error_metrics.max_abs_error(e)
Out[5]:
0.25

In this example, the module is imported directly, so its name must be used to access the functions it contains, e.g., error_metrics.mean_abs_error. We can use an abbreviation, just as we’ve done with NumPy:

In [6]:
import error_metrics as em
em.mean_abs_error(e)
Out[6]:
0.15000000000000002

Like any other object in Python, a module can be bound to additional names:

In [7]:
em2 = em
em2.mean_abs_error(e)
Out[7]:
0.15000000000000002
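Both names are bound to the very same module object. Python also caches each imported module in the sys.modules dictionary, so importing it again (under any alias) does not re-run the file. A quick check, using math as a stand-in for error_metrics:

```python
import math
import math as m2
import sys

alias = math                         # rebinding, just like em2 = em
print(alias is math)                 # True
print(m2 is math)                    # True
print(sys.modules["math"] is math)   # True: cached after the first import
```

One practical consequence: if you edit error_metrics.py after importing it, an open session keeps using the cached version until you restart the interpreter or call importlib.reload(error_metrics).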

If we are sure that the names of functions, variables, and other things defined in the module won’t clash with anything, we can simply import everything:

In [8]:
from error_metrics import *
mean_abs_error(e)
Out[8]:
0.15000000000000002

When do things clash? Suppose we add the following line to our error_metrics.py file:

pi = 1.618033988749895

and then do

In [9]:
from math import *
from error_metrics import *
print(pi)
1.618033988749895

Well, that’s not right. Although we hope \(\pi\) would never be so carelessly misdefined (recognize the number we errantly used?), this sort of conflict makes the “import it all!” approach a potential source of hard-to-diagnose bugs.

Warning: Avoid from some_module import * because it can lead to unintended consequences like redefining names.
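To see the clash without editing any files, the sketch below builds a stand-in module in memory with types.ModuleType and registers it in sys.modules so that import can find it (the module name golden_metrics is hypothetical). Whichever star-import runs last wins, while module-qualified names keep both values available:

```python
import math
import sys
import types

# Hypothetical in-memory module standing in for error_metrics.py
golden_metrics = types.ModuleType("golden_metrics")
golden_metrics.pi = 1.618033988749895
sys.modules["golden_metrics"] = golden_metrics

from math import *
from golden_metrics import *   # silently rebinds pi
print(pi)                      # 1.618033988749895

# Qualified names avoid the ambiguity entirely
print(math.pi)                 # 3.141592653589793
print(golden_metrics.pi)       # 1.618033988749895
```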

If you want just one name (or a few names) from a module, use an import statement like the following:

In [10]:
from error_metrics import mean_abs_error, rms_error

Similarly, we can selectively import functions (or attributes like pi) and assign them new names via

In [11]:
from error_metrics import rms_error as re
from error_metrics import pi as not_the_real_pi
print(not_the_real_pi)
1.618033988749895

However, one cannot import functions (or attributes) selectively using

In [12]:
import error_metrics.rms_error as rms_error
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-12-f511b4e62f30> in <module>()
----> 1 import error_metrics.rms_error as rms_error

ModuleNotFoundError: No module named 'error_metrics.rms_error'; 'error_metrics' is not a package

It might appear to be valid (we did use import matplotlib.pyplot as plt, after all). However, the basic rule is that anything immediately after import must be a module (like matplotlib) or a “submodule” (like matplotlib.pyplot), not a name defined inside a module. A name defined in a module can be imported by itself only with the from some_module import some_name form.

Note: Use from some_module import some_function and not import some_module.some_function as some_function to import some_function all by itself.
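The rule can be checked from Python with importlib.import_module, which resolves dotted names the same way the import statement does: os.path is a genuine module, so it imports, whereas math.sqrt is a function inside the math module, so the lookup fails with the same kind of error shown above.

```python
import importlib

# os.path is itself a module, so a dotted import of it succeeds
path_module = importlib.import_module("os.path")
print(type(path_module).__name__)  # module

# math.sqrt is a function defined in math, not a submodule
try:
    importlib.import_module("math.sqrt")
    failed = False
except ModuleNotFoundError as err:
    failed = True
    print(err)
```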

Running Modules as Scripts

Because any Python script can be imported as a module, you’ve already been running modules as scripts! However, importing a module executes all of its top-level code: any variables it defines (e.g., the errant pi we added above) become accessible upon import, and any print calls run, which is not always desired. Selective imports limit which names you receive, but the whole file still runs, and sometimes the extra code exists only for quick testing of the functions. As an alternative, a special if-statement can be added to the bottom of a script to separate the functions (and other things actually desired in the module) from some demonstration code. Here’s an example:

if __name__ == "__main__":
    e = [0.1, 0.2, 0.3]
    print(mean_abs_error(e))
    print(rms_error(e))
    print(max_abs_error(e))

Running this leads to

C:\Users\robertsj\Documents\PythonForEngineers>python error_metrics.py
0.20000000000000004
0.21602468994692867
0.3

If the same four lines under the if were included in error_metrics.py without the if, then importing error_metrics would define e (i.e., error_metrics.e would exist) and execute the print statements.

Exercise: Put the lines of code above into error_metrics.py and see for yourself that the results are printed when the file is executed (either via python error_metrics.py or in Spyder). Show also that import error_metrics does not print anything when the if __name__ line is included.

Basically, the special variable __name__ has the value "__main__" only if the script is executed directly; when the file is imported, __name__ is instead the module’s name (here, "error_metrics"). More generally, any (built-in) name in Python that starts and ends with two underscores is special and rarely needed in everyday code. Use of __name__ here is one of the exceptions to that rule of thumb.
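The exercise above can also be automated. The sketch below writes a tiny hypothetical module, demo_mod.py, to a temporary directory, imports it (the guarded block stays silent), and then runs the same file with runpy.run_path, whose run_name="__main__" argument makes __name__ equal "__main__" just as python demo_mod.py would:

```python
import contextlib
import importlib
import io
import pathlib
import runpy
import sys
import tempfile

# Hypothetical module: one function plus a guarded demo block
source = (
    "def double(x):\n"
    "    return 2 * x\n"
    "\n"
    'if __name__ == "__main__":\n'
    '    print("running as a script:", double(21))\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "demo_mod.py"
    path.write_text(source)

    # 1. Import: __name__ is "demo_mod", so the guarded block is skipped
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        demo_mod = importlib.import_module("demo_mod")
    sys.path.remove(tmp)
    import_output = captured.getvalue()

    # 2. Run as a script: run_name="__main__" makes the guard true
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        runpy.run_path(str(path), run_name="__main__")
    script_output = captured.getvalue()

print(demo_mod.__name__)     # demo_mod
print(repr(import_output))   # ''
print(repr(script_output))   # 'running as a script: 42\n'
```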

Using Modules on the Moon

PYTHONPATH

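Now that you’ve done all that hard work to create a module, wouldn’t it be nice to be able to use it anywhere you use the Python interpreter or an IDE like Spyder? Doing so requires telling Python where to find the module by modifying the PYTHONPATH environment variable. Below, the module lives in an OnTheMoon directory while we work in a sibling BackOnEarth directory; note that set changes PYTHONPATH only for the current Command Prompt window: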

C:\Users\robertsj>cd Documents\PythonForEngineers\BackOnEarth

C:\Users\robertsj\Documents\PythonForEngineers\BackOnEarth>set PYTHONPATH=%PYTHONPATH%;C:\Users\robertsj\Documents\PythonForEngineers\OnTheMoon

C:\Users\robertsj\Documents\PythonForEngineers\BackOnEarth>dir
 Volume in drive C has no label.
 Volume Serial Number is CP-1

 Directory of C:\Users\robertsj\Documents\PythonForEngineers\BackOnEarth

12/02/1942  03:25 PM    <DIR>          .
12/02/1942  03:25 PM    <DIR>          ..
               0 File(s)              0 bytes
               2 Dir(s)   1,123,581,321 bytes free


C:\Users\robertsj\Documents\PythonForEngineers\BackOnEarth>dir ..\OnTheMoon
 Volume in drive C has no label.
 Volume Serial Number is CP-1

 Directory of C:\Users\robertsj\Documents\PythonForEngineers\OnTheMoon

12/02/1942  03:25 PM    <DIR>          .
12/02/1942  03:25 PM    <DIR>          ..
12/02/1942  03:25 PM               572 error_metrics.py
               1 File(s)            572 bytes
               2 Dir(s)   1,123,581,321 bytes free

Once that’s done, we can start Python from BackOnEarth and still import the module stored in OnTheMoon:

>>> import error_metrics as em
>>> em.rms_error([0.1, -0.2, 0.3])
0.21602468994692867

The PYTHONPATH can also be changed permanently. In Windows, this can be done by accessing the Start > Control Panel > System and Security > System menu. Then, click Advanced System Settings, which brings up a window with an Environment Variables button. By default, PYTHONPATH is not defined, so you’ll need to create it first.

On OS X and Linux machines, PYTHONPATH can be set using export in the terminal (or permanently in the .bashrc file), e.g., export PYTHONPATH=$PYTHONPATH:/home/robertsj/Documents/PythonForEngineers/OnTheMoon.
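The same experiment can be reproduced on any operating system from Python itself. The sketch below writes a hypothetical module, moon_metrics.py, into a temporary OnTheMoon folder, then launches a fresh interpreter (sys.executable) whose PYTHONPATH has that folder appended; os.pathsep supplies the separator (; on Windows, : on OS X and Linux), and the parent process’s environment is left untouched:

```python
import os
import pathlib
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    moon = pathlib.Path(tmp) / "OnTheMoon"
    moon.mkdir()
    # Hypothetical module stored "on the moon"
    (moon / "moon_metrics.py").write_text(
        "def max_abs_error(e):\n"
        "    return max(abs(x) for x in e)\n"
    )

    # Append the folder to PYTHONPATH for the child process only
    env = os.environ.copy()
    old = env.get("PYTHONPATH")
    env["PYTHONPATH"] = old + os.pathsep + str(moon) if old else str(moon)

    # Start a fresh interpreter from the parent folder (not OnTheMoon)
    result = subprocess.run(
        [sys.executable, "-c",
         "import moon_metrics; print(moon_metrics.max_abs_error([0.1, -0.2, 0.3]))"],
        cwd=tmp, env=env, capture_output=True, text=True, check=True,
    )

print(result.stdout.strip())  # 0.3
```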

sys.path

An alternative way to set the path is by modifying sys.path.

Exercise: Go to Python, import sys, and print sys.path. What type is sys.path?

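If the directory containing the module file is appended to sys.path before the module is imported, then any code running in that same Python session (a script, or commands typed at the console) can import the module. Unlike a permanent PYTHONPATH setting, this change lasts only until the interpreter exits.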

import sys
# A raw string (r'...') keeps backslashes such as \U from being read as escapes
sys.path.append(r'C:\Users\robertsj\Documents\PythonForEngineers\OnTheMoon')
import error_metrics

This approach is quick and easy. Its major drawback is that the hard-coded path is specific to one user and her machine.
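One way to soften that drawback, sketched below under stated assumptions, is to build the path from a base directory with pathlib instead of hard-coding it, and to add it only once. The helper add_to_path and the module lunar_greetings are hypothetical; in a real script the base would typically be Path(__file__).resolve().parent, but a temporary directory stands in here so the example runs anywhere:

```python
import importlib
import pathlib
import sys
import tempfile

def add_to_path(directory):
    """Hypothetical helper: append a directory to sys.path only once."""
    directory = str(pathlib.Path(directory).resolve())
    if directory not in sys.path:
        sys.path.append(directory)
    return directory

with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)                # stand-in for Path(__file__).parent
    moon = base / "OnTheMoon"
    moon.mkdir()
    (moon / "lunar_greetings.py").write_text(
        "def greet():\n    return 'hello from the Moon'\n"
    )

    entry = add_to_path(moon)
    add_to_path(moon)                       # second call changes nothing
    importlib.invalidate_caches()           # module file was created at runtime
    import lunar_greetings

    message = lunar_greetings.greet()
    count = sys.path.count(entry)
    sys.path.remove(entry)                  # tidy up before the folder is deleted

print(message)  # hello from the Moon
print(count)    # 1
```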

Packages

Modules are a great way to collect functions and other things in a single Python file. As programs grow even larger, it may be useful to have several files accessible through a single import statement. These collections of modules are known as packages (though “package” is not a Python type, as module is).

There are many ways to organize packages, but let’s illustrate with a simple example. Let’s make a new directory called my_package and place our previous module (i.e., error_metrics.py) in it. Now, in that same directory, create a new Python file with the special name __init__.py. For now, it can be an empty file. Its existence is enough to tell Python that my_package can be imported as long as the directory containing it is in the PYTHONPATH. In other words, if my_package were put in C:\Users\robertsj\Documents, then C:\Users\robertsj\Documents would need to be added to the path (not C:\Users\robertsj\Documents\my_package).

Once that’s done, the following is possible from anywhere:

In [13]:
import my_package
dir(my_package)
Out[13]:
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__']

Note that neither error_metrics nor its functions appear: Python does not, by default, import all the modules within a package. One or more modules can be imported explicitly in the __init__.py file if desired. Otherwise, we can get the error_metrics module by importing it explicitly via
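The __init__.py route mentioned above can be sketched as follows: a one-line __init__.py containing from . import error_metrics (a relative import of the sibling file) loads the submodule whenever the package itself is imported. The package name eager_package is hypothetical, and a temporary directory stands in for C:\Users\robertsj\Documents; note that it is the package’s parent directory that goes on the path:

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg = pathlib.Path(tmp) / "eager_package"   # hypothetical package name
    pkg.mkdir()
    # Submodule with one function
    (pkg / "error_metrics.py").write_text(
        "def max_abs_error(e):\n"
        "    return max(abs(x) for x in e)\n"
    )
    # __init__.py imports the submodule when the package is imported
    (pkg / "__init__.py").write_text("from . import error_metrics\n")

    sys.path.insert(0, tmp)          # the PARENT of the package goes on the path
    importlib.invalidate_caches()
    import eager_package
    sys.path.remove(tmp)

    has_submodule = "error_metrics" in dir(eager_package)
    result = eager_package.error_metrics.max_abs_error([0.1, -0.2, 0.3])

print(has_submodule)  # True
print(result)         # 0.3
```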

In [14]:
import my_package.error_metrics
print(type(my_package))
print(type(my_package.error_metrics))
<class 'module'>
<class 'module'>

Although my_package is nothing more than a directory on our filesystem, it appears as a module in Python, as does my_package.error_metrics. Usually, the modules accessed by the “dot” operation are called submodules, but that is not an actual Python type.

With my_package.error_metrics imported, the contents of my_package are now extended:

In [15]:
dir(my_package)
Out[15]:
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'error_metrics']

Hence, we can now execute, e.g.,

In [16]:
my_package.error_metrics.rms_error([0.1, -0.2, 0.3])
Out[16]:
0.21602468994692867

Further Reading

Python’s documentation is a good resource for those interested in more details on how to create and use modules and packages.