Modules and Packages¶
Overview, Objectives, and Key Terms¶
Python modules offer a convenient way to organize your code, especially as it grows to include many functions. Moreover, the external packages you’ve already used, like NumPy and Matplotlib, are organized as modules within larger packages. By the end of this lesson, you’ll understand how to create, organize, and use your own modules and packages.
Objectives¶
By the end of this lesson, you should be able to
- Produce a module of functions
- Set appropriate paths in order to import modules you create
Key Terms¶
module
import
Creating Modules¶
From nearly the beginning, we’ve imported modules like NumPy and used the functions they provide. In fact, a module is an actual Python type:
In [1]:
import numpy
type(numpy)
Out[1]:
module
Modules are easy to create: they can be as simple as a single Python script. Suppose we write a few functions for analyzing a vector \(\mathbf{e}\) that represents errors (perhaps due to measurement uncertainty or numerical approximation). Some useful error metrics include
- mean, absolute error, \(\sum_{i=1}^n |e_i|/n\)
- root-mean-square error, \(\sqrt{\sum_{i=1}^n e_i^2/n}\)
- maximum, absolute error, \(\max_{1 \le i \le n} |e_i|\)
If \(\mathbf{e}\) is a sequential type, e.g., a list, tuple, or np.ndarray, then the following functions represent possible implementations of each metric:
def mean_abs_error(e):
    """Mean, absolute error."""
    v = 0
    for i in range(len(e)):
        v += abs(e[i])
    return v/len(e)

def rms_error(e):
    """Root-mean-square error."""
    v = 0
    for i in range(len(e)):
        v += e[i]**2
    # divide by n before taking the square root, per the definition
    return (v/len(e))**0.5

def max_abs_error(e):
    """Maximum, absolute error."""
    v = 0
    for i in range(len(e)):
        if abs(e[i]) > v:
            v = abs(e[i])
    return v
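For comparison, the three formulas listed above can also be written more compactly with the built-in `sum` and `max` functions and `math.sqrt`. This is only a sketch of one alternative, not the implementation used in the rest of the lesson; the `_compact` names and the two-element test vector are hypothetical and chosen here to keep the arithmetic easy to check by hand.

```python
import math

def mean_abs_error_compact(e):
    """Mean absolute error: sum(|e_i|) / n."""
    return sum(abs(x) for x in e) / len(e)

def rms_error_compact(e):
    """Root-mean-square error: sqrt(sum(e_i**2) / n)."""
    return math.sqrt(sum(x**2 for x in e) / len(e))

def max_abs_error_compact(e):
    """Maximum absolute error: max(|e_i|)."""
    return max(abs(x) for x in e)

# Hypothetical test vector: |3| + |-4| = 7 and 3**2 + (-4)**2 = 25.
e_demo = [3.0, -4.0]
print(mean_abs_error_compact(e_demo))  # 7/2 = 3.5
print(rms_error_compact(e_demo))       # sqrt(25/2), roughly 3.54
print(max_abs_error_compact(e_demo))   # 4.0
```

Each function accepts any iterable with a length, so lists, tuples, and one-dimensional NumPy arrays should all work.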
Although we can define these (and any other function) basically anywhere we want in a Python script, it can be helpful to collect these functions in a separate file, say error_metrics.py. Now, if you navigate in the command line to the directory that contains this file, you can list the directory contents and see something similar to the following:
C:\Users\robertsj\Documents\PythonForEngineers>dir
Volume in drive C has no label.
Volume Serial Number is CP-1
Directory of C:\Users\robertsj\Documents\PythonForEngineers
12/02/1942 03:25 PM <DIR> .
12/02/1942 03:25 PM <DIR> ..
12/02/1942 03:25 PM 572 error_metrics.py
1 File(s) 572 bytes
2 Dir(s) 1,123,581,321 bytes free
You can also navigate to this directory within Spyder (see the upper right-hand corner). As long as you are executing Python from within the directory containing the file, it should be possible to do the following (in Spyder, IPython, or IDLE):
In [2]:
import error_metrics
dir(error_metrics)
Out[2]:
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'max_abs_error',
'mean_abs_error',
'rms_error']
In [3]:
e = [0.1, -0.2, 0.25, -0.05]
error_metrics.mean_abs_error(e)
Out[3]:
0.15000000000000002
In [4]:
error_metrics.rms_error(e)
Out[4]:
0.16955824957813173
In [5]:
error_metrics.max_abs_error(e)
Out[5]:
0.25
In this example, the module is imported directly, and its name must therefore be used to access the functions it contains, e.g., error_metrics.mean_abs_error. We can abbreviate the module name using the same approach we’ve used with NumPy:
In [6]:
import error_metrics as em
em.mean_abs_error(e)
Out[6]:
0.15000000000000002
Like other objects in Python, a module can be bound to additional names:
In [7]:
em2 = em
em2.mean_abs_error(e)
Out[7]:
0.15000000000000002
If we are sure that the names of functions, variables, and other things defined in the module won’t clash with anything, we can simply import everything:
In [8]:
from error_metrics import *
mean_abs_error(e)
Out[8]:
0.15000000000000002
When do things clash? Suppose we add the following line to our error_metrics.py file
pi = 1.618033988749895
and then do
In [9]:
from math import *
from error_metrics import *
print(pi)
1.618033988749895
Well, that’s not right. Although we hope \(\pi\) would never be so carelessly misdefined (recognize the number we errantly used?), this sort of conflict makes the “import it all!” approach a potential source of hard-to-diagnose bugs.
Warning: Avoid from some_module import * because it can lead to unintended consequences like redefining names.
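Keeping the module name in front of each function avoids such clashes entirely. As a small sketch, the example below uses two standard-library modules (chosen here for illustration, not the lesson's error_metrics) that both define a function named `sqrt`:

```python
import math
import cmath

# Both modules define sqrt, but the qualified names keep them apart.
real_root = math.sqrt(4.0)      # real-valued square root
complex_root = cmath.sqrt(4.0)  # complex-valued square root

print(real_root)     # 2.0
print(complex_root)  # (2+0j)
```

Had both modules been imported with `*`, whichever import came second would have silently decided what the bare name `sqrt` means.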
If you want just a name (or several names) from a module, you use import statements like the following:
In [10]:
from error_metrics import mean_abs_error, rms_error
Similarly, we can selectively import functions (or attributes like pi) and assign them new names via
In [11]:
from error_metrics import rms_error as re
from error_metrics import pi as not_the_real_pi
print(not_the_real_pi)
1.618033988749895
However, one cannot import functions (or attributes) selectively using
In [12]:
import error_metrics.rms_error as rms_error
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-12-f511b4e62f30> in <module>()
----> 1 import error_metrics.rms_error as rms_error
ModuleNotFoundError: No module named 'error_metrics.rms_error'; 'error_metrics' is not a package
It might appear to be valid (we did use import matplotlib.pyplot as plt, after all). However, the basic rule is that anything immediately after import must be a module (like matplotlib) or a “submodule” (like matplotlib.pyplot) and not a name defined in the module, unless import follows from.
Note: Use from some_module import some_function and not import some_module.some_function as some_function to import some_function all by itself.
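The same rule can be checked with a standard-library module. The sketch below uses `math` (an illustrative choice, not part of the lesson's files) and `importlib.import_module`, which behaves like a bare `import` statement but takes the dotted name as a string so that the failure can be caught at run time:

```python
import importlib

# Works: after "from ... import", the name may be a function.
from math import sqrt
print(sqrt(9.0))  # 3.0

# Fails: a dotted name given to a bare import must refer to a module.
failed = False
try:
    importlib.import_module("math.sqrt")  # like: import math.sqrt
except ModuleNotFoundError as err:
    failed = True
    print("import failed:", err)
```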
Running Modules as Scripts¶
Because any Python script can be imported as a module, you’ve already been running modules as scripts! However, any variables defined in the script (e.g., the errant pi we added above) become accessible upon import, and that’s not always desired. Selective imports can help, but sometimes the extra code is for quick function testing. As an alternative, a special if-statement can be added to the bottom of scripts to separate the functions (and other things actually desired in the module) from some demonstration code. Here’s an example:
if __name__ == "__main__":
    e = [0.1, 0.2, 0.3]
    print(mean_abs_error(e))
    print(rms_error(e))
    print(max_abs_error(e))
Running this leads to
C:\Users\robertsj\Documents\PythonForEngineers>python error_metrics.py
0.20000000000000004
0.21602468994692867
0.3
If the same four lines under the if were included in error_metrics.py without the if, then e would be defined (e.g., error_metrics.e would exist) and the print statements would be executed every time the module is imported.
Exercise: Go and put the lines of code above into error_metrics.py and see for yourself that they are printed when the file is executed (either via python error_metrics.py or in Spyder). Show also that import error_metrics does not lead to the print statements when the if __name__ line is included.
Basically, the special variable __name__ has the value "__main__" only if the script is executed and not if it is imported; on import, it holds the module’s own name (here, "error_metrics"). Actually, any (built-in) name in Python that starts and ends with two underscores is usually special and not often needed. Use of __name__ here is one of the exceptions to that rule of thumb.
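The two values of `__name__` can be observed directly. The following sketch writes a one-line file to a temporary directory, runs it as a script in a fresh interpreter, and then imports it; the file name `name_demo.py` is hypothetical and used only for this illustration.

```python
import importlib
import os
import subprocess
import sys
import tempfile

# A tiny, hypothetical module that only reports its own __name__.
source = 'print("__name__ is", __name__)\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "name_demo.py")
    with open(path, "w") as f:
        f.write(source)

    # Case 1: execute the file as a script in a fresh interpreter.
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, check=True)
    as_script = result.stdout.strip()

    # Case 2: import the same file as a module.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        name_demo = importlib.import_module("name_demo")
    finally:
        sys.path.remove(tmp)
    as_module = name_demo.__name__

print(as_script)  # what the script printed when executed
print(as_module)  # the module's name when imported
```

When executed, the file reports `__main__`; when imported, it reports `name_demo`, which is why the `if __name__ == "__main__"` test can tell the two situations apart.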
Using Modules on the Moon¶
PYTHONPATH¶
Now that you’ve done all that hard work to create a module, wouldn’t it be nice to be able to use it anywhere you use the Python interpreter or an IDE like Spyder? To do so requires that we tell Python where to find the module by modifying the PYTHONPATH:
C:\Users\robertsj>cd Documents\PythonForEngineers\BackOnEarth
C:\Users\robertsj\Documents\PythonForEngineers\BackOnEarth>set PYTHONPATH=%PYTHONPATH%;C:\Users\robertsj\Documents\PythonForEngineers\OnTheMoon
C:\Users\robertsj\Documents\PythonForEngineers\BackOnEarth>dir
Volume in drive C has no label.
Volume Serial Number is CP-1
Directory of C:\Users\robertsj\Documents\PythonForEngineers\BackOnEarth
12/02/1942 03:25 PM <DIR> .
12/02/1942 03:25 PM <DIR> ..
0 File(s) 0 bytes
2 Dir(s) 1,123,581,321 bytes free
C:\Users\robertsj\Documents\PythonForEngineers\BackOnEarth>dir ..\OnTheMoon
Volume in drive C has no label.
Volume Serial Number is CP-1
Directory of C:\Users\robertsj\Documents\PythonForEngineers\OnTheMoon
12/02/1942 03:25 PM <DIR> .
12/02/1942 03:25 PM <DIR> ..
12/02/1942 03:25 PM 572 error_metrics.py
1 File(s) 572 bytes
2 Dir(s) 1,123,581,321 bytes free
Once that’s done, we can start Python in BackOnEarth, which does not contain error_metrics.py, and still import the module:
>>> import error_metrics as em
>>> em.rms_error([0.1, -0.2, 0.3])
0.21602468994692867
The PYTHONPATH can also be changed permanently. In Windows, this can be done by accessing the Start > Control Panel > System and Security > System menu. Then, click Advanced System Settings, which brings up a window with an Environment Variables button. By default, PYTHONPATH is not defined, so you’ll need to create it first.
On OS X and Linux machines, PYTHONPATH can be set using export in the terminal (or permanently in the .bashrc file), e.g.,
export PYTHONPATH=$PYTHONPATH:/home/robertsj/Documents/PythonForEngineers/OnTheMoon
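The effect of PYTHONPATH can also be tested from Python itself without changing any system settings. The sketch below is a minimal illustration under assumed names: it creates a temporary directory holding a hypothetical module `moon_metrics.py`, then starts a fresh interpreter whose environment has that directory on PYTHONPATH.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as moon:
    # Hypothetical module stored away from the working directory.
    with open(os.path.join(moon, "moon_metrics.py"), "w") as f:
        f.write("def max_abs_error(e):\n"
                "    return max(abs(x) for x in e)\n")

    # Copy the current environment and add the directory to PYTHONPATH.
    # os.pathsep is ';' on Windows and ':' on OS X and Linux.
    env = dict(os.environ)
    old = env.get("PYTHONPATH")
    env["PYTHONPATH"] = moon if not old else moon + os.pathsep + old

    # A fresh interpreter started with this environment finds the module.
    code = ("import moon_metrics; "
            "print(moon_metrics.max_abs_error([0.1, -0.2, 0.25]))")
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    output = result.stdout.strip()

print(output)  # 0.25
```

Only the child interpreter sees the modified variable, so the experiment leaves the current session and the system configuration untouched.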
sys.path¶
An alternative way to set the path is by modifying sys.path.
Exercise: Go to Python, import sys, and print sys.path. What type is sys.path?
If the location of the module file is appended to sys.path before the module is imported, then that module can be imported in the current script or console session. Unlike a permanent change to PYTHONPATH, the change to sys.path lasts only for that session.
import sys
# raw string: otherwise "\U" in the path is read as an escape sequence
sys.path.append(r'C:\Users\robertsj\Documents\PythonForEngineers\OnTheMoon')
import error_metrics
This approach is quick and easy. Its major drawback is that it is very specific to the user and her machine.
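As a self-contained sketch of the same idea that does not depend on any particular user's folders, the example below creates a temporary directory, places a hypothetical module `path_demo_metrics.py` in it, and appends that directory to `sys.path` before importing. All names here are assumptions made for illustration.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as moon:
    # Hypothetical module in a directory Python does not search by default.
    with open(os.path.join(moon, "path_demo_metrics.py"), "w") as f:
        f.write("def max_abs_error(e):\n"
                "    return max(abs(x) for x in e)\n")

    sys.path.append(moon)  # must happen before the import
    try:
        importlib.invalidate_caches()
        metrics = importlib.import_module("path_demo_metrics")
        value = metrics.max_abs_error([0.1, -0.2, 0.25])
    finally:
        sys.path.remove(moon)  # undo the change once the module is loaded

print(value)  # 0.25
```

Removing the entry afterward is optional; it is done here only so the sketch leaves `sys.path` as it found it.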
Packages¶
Modules are great ways to collect functions and other things in a single Python file. As programs grow even larger, it may be useful to have several files accessible by a single import statement. These collections of modules are known as packages (though “package” is not a Python type as is module).
There are many ways to organize packages, but let’s illustrate with a simple example. Let’s make a new directory called my_package and place our previous module (i.e., error_metrics.py) in it. Now, in that same directory, create a new Python file with the special name __init__.py. For now, it can be an empty file. Its existence is enough to tell Python that my_package can be imported as long as the directory containing it is in the PYTHONPATH. In other words, if my_package were put in C:\Users\robertsj\Documents, then C:\Users\robertsj\Documents would need to be added to the path (not C:\Users\robertsj\Documents\my_package).
Once that’s done, the following is possible from anywhere:
In [13]:
import my_package
dir(my_package)
Out[13]:
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__path__',
'__spec__']
Note that neither error_metrics nor its functions appear. Python does not by default import all modules within a package. One or more modules can be imported explicitly in the __init__.py file if desired. However, we can get the error_metrics module by importing it explicitly via
In [14]:
import my_package.error_metrics
print(type(my_package))
print(type(my_package.error_metrics))
<class 'module'>
<class 'module'>
Although my_package is nothing more than a directory on our filesystem, it appears as a module in Python, as does my_package.error_metrics. Usually, the modules accessed by the “dot” operation are called submodules, but that is not an actual Python type.
With my_package.error_metrics imported, the contents of my_package are now extended:
In [15]:
dir(my_package)
Out[15]:
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__path__',
'__spec__',
'error_metrics']
Hence, we can now execute, e.g.,
In [16]:
my_package.error_metrics.rms_error([0.1, -0.2, 0.3])
Out[16]:
0.21602468994692867
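If the submodule should be available as soon as the package itself is imported, the __init__.py file can import it. The sketch below builds a throwaway package in a temporary directory to show the effect; the package name `demo_package` and its one-function submodule are hypothetical stand-ins for my_package and error_metrics.py.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as parent:
    pkg_dir = os.path.join(parent, "demo_package")
    os.mkdir(pkg_dir)

    # __init__.py imports the submodule, so "import demo_package" suffices.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("from . import error_metrics\n")
    with open(os.path.join(pkg_dir, "error_metrics.py"), "w") as f:
        f.write("def max_abs_error(e):\n"
                "    return max(abs(x) for x in e)\n")

    # Add the directory that CONTAINS the package, not the package itself.
    sys.path.insert(0, parent)
    try:
        importlib.invalidate_caches()
        demo_package = importlib.import_module("demo_package")
    finally:
        sys.path.remove(parent)

has_submodule = "error_metrics" in dir(demo_package)
value = demo_package.error_metrics.max_abs_error([0.1, -0.2, 0.3])
print(has_submodule, value)  # True 0.3
```

With an empty __init__.py, the same `dir` check would not list `error_metrics` until the submodule is imported explicitly.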
Further Reading¶
Python’s documentation is a good resource for those interested in more details on how to create and use modules and packages.