Modules and Packages

Written by Alex Guyer | guyera@oregonstate.edu

This lecture is about modules and packages. We'll cover the following:

Defining custom modules
Defining custom packages
__pycache__

Defining custom modules

So far, all of the programs that we've written have been written in one gigantic Python file. As you start writing larger, more complex programs, it becomes important to organize your code into several smaller files rather than one giant file. Every programming language worth its salt provides some way of doing this. In Python, it's done through packages and modules. We'll start with modules.

For the most part, every Python file (i.e., any file ending in .py) is a Python module. Python modules (and packages) can be imported into other Python modules via the import keyword. When one module A imports another module B, A gets access to the various things (functions, classes, module-scope variables, module-scope constants, etc.) that are defined within B. You've actually already done this several times to import modules that are provided by the Python standard library (e.g., import math, and from typing import TextIO). But yes, the import keyword can be used to import your own custom modules and packages as well.

Within a Python module (i.e., within a file ending in .py), you can import another Python module via the following syntax:

import <name of other module>

Replace <name of other module> with the name of the other Python module that you'd like to import. However, when writing out the name of the other module, leave out the .py extension (for example, it should be import math, not import math.py).

When Mypy or the interpreter encounters an import statement, it searches for the specified module. If the interpreter fails to locate the module, it raises a ModuleNotFoundError; if Mypy fails to locate it, Mypy reports an error saying that it cannot find the module.
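As a small illustration of what a failed import looks like at run time, here's a minimal sketch. The module name definitely_not_a_real_module is made up for this example, and I'm assuming nothing by that name exists anywhere on your system. The interpreter raises ModuleNotFoundError, which the code catches so that it can print a message instead of crashing:

```python
# Attempt to import a module that (we assume) does not exist anywhere
# on the module search path. The name is made up for this example.
try:
    import definitely_not_a_real_module  # hypothetical module name
    found = True
except ModuleNotFoundError as error:
    found = False
    # error.name holds the name of the module that couldn't be found
    print(f'Could not import: {error.name}')
```

In a real program, you usually wouldn't catch this error; you'd fix the import (or put the missing file where Python can find it).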

When Mypy or the interpreter is looking for a module to be imported, it will look in various directories. These directories are listed together in a special (somewhat configurable) directory list known as the module search path. By default, the first directory in the module search path, and therefore the first directory in which Python will look for imported modules, is the directory containing the Python script that's currently being executed. If the imported module cannot be found in that directory, then Mypy or the interpreter will move on to check other directories in the module search path, including system-level and environment-level directories (e.g., directories containing the Python standard library, which provides the math module, the typing module, and so on).
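If you're curious what the interpreter's module search path looks like on your own machine, the standard library exposes it as sys.path. Here's a minimal sketch that prints each entry in search order. The exact entries depend on your system and environment, so I won't show an example output:

```python
import sys

# sys.path is a list of strings: the locations that the interpreter
# searches, in order, when it looks for an imported module
search_path = sys.path

for position, location in enumerate(search_path):
    print(position, location)
```

When you run a script via python main.py, the first entry should be the directory containing main.py, which is why modules sitting next to main.py can be imported by name.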

For example, suppose you have a Python module (file) named main.py that looks like this:

main.py

import hello

def main() -> None:
    # TODO Write some interesting code here...
    pass

if __name__ == '__main__':
    main()

When you run the above program through Mypy, or when you execute the above program through the Python interpreter, Mypy or the interpreter will encounter the import hello statement on line 1 and immediately look for a file named hello.py within the same directory that contains main.py (the above program itself). If there is no file named hello.py within that directory, it will move on and check other system-level directories (e.g., in case the hello module or package is provided by a system-installed / environment-installed library).

Moving on. Suppose hello.py is, indeed, present in the same directory as the one containing main.py, and suppose that hello.py looks like this:

hello.py

class Dog:
    name: str
    birth_year: int

def print_dog(dog: Dog) -> None:
    print(f'{dog.name} was born in {dog.birth_year}')

The hello.py module above defines a class, Dog, and a function, print_dog(). Suppose we want to use these things within some other Python module, such as main.py. First, we of course have to import the hello.py module via import hello, as I showed you just a moment ago. Once you've done that, you can access the things that are defined within the hello module via the dot operator, similar to how you can access attributes within a class instance via the dot operator. For example, within main.py, we could access the Dog class via hello.Dog, and we could access the print_dog function via hello.print_dog:

main.py

import hello

def main() -> None:
    # Rather than my_dog = Dog(), which is how we'd create a Dog
    # instance if the Dog class was defined within this module,
    # we instead use my_dog = hello.Dog() because the Dog class is
    # defined in the hello module (hello.py), imported above.
    spot = hello.Dog()

    # You can use the spot variable like normal
    spot.name = 'Spot'
    spot.birth_year = 2022
    
    # We have a neat function named print_dog() that prints a given
    # Dog instance, but it's defined in the hello module. So, to
    # access it, we have to write 'hello.print_dog' instead of
    # simply 'print_dog'
    hello.print_dog(spot)

if __name__ == '__main__':
    main()

To run the above program, simply type python main.py into the terminal. Yes, there are multiple Python modules involved now, but main.py is the one that contains the main() function and the code to execute it (i.e., the if statement at the very bottom of main.py), so it's the one that we should execute if our goal is to run the program. You can, technically, run python hello.py, but it wouldn't do anything interesting; it would define a class and a function, but it wouldn't actually use those things in any way. In contrast, main.py defines the main() function and calls it.
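To see why the if statement at the bottom only fires in the file that you execute directly, here's a minimal sketch. It creates a throwaway module in a temporary directory (the file name name_demo.py is made up for this example), imports it, and prints the value that __name__ had inside that module:

```python
import importlib
import sys
import tempfile

from pathlib import Path

# Create a throwaway directory containing a tiny module. The file name
# name_demo.py is made up for this example.
with tempfile.TemporaryDirectory() as directory:
    module_file = Path(directory) / 'name_demo.py'
    module_file.write_text('seen_name = __name__\n')

    # Put the directory on the module search path so the import succeeds
    sys.path.insert(0, directory)
    try:
        name_demo = importlib.import_module('name_demo')
    finally:
        sys.path.remove(directory)

# When a module is imported, its __name__ is the module's own name
# rather than '__main__', so an if __name__ == '__main__' block inside
# it would be skipped
print(name_demo.seen_name)
```

This prints name_demo. Only the file passed to the python command gets '__main__' as its __name__, which is why main() runs when you execute main.py but not when some other module imports it.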

Here's an example run:

(env) $ python main.py
Spot was born in 2022

You might also be wondering how to run the above program through Mypy. If you just execute mypy main.py, that will only tell Mypy to analyze main.py—it will largely ignore hello.py. You don't want that. In our case, the simplest way to tell Mypy to analyze every file in the entire program is to give it a directory rather than a specific file. When Mypy is given a directory, it will analyze every file in that directory, as well as every file within every directory within that directory, and so on.

Suppose that main.py and hello.py are both in your working directory. Then, to get Mypy to analyze both of these files, simply specify your working directory as the argument to the mypy shell command. Now would be a good time to remind you that . (a single period) refers to a directory itself (similar to how .. refers to a directory's parent), so when used on its own as a relative path, it refers to the working directory.

That's all to say, you can tell Mypy to analyze every file in your working directory (and, recursively, in its subdirectories) like so:

mypy .

And here's what it looks like if we do that right now:

(env) $ mypy .
Success: no issues found in 2 source files

Notice that it says 2 source files, indicating that it analyzed both main.py and hello.py.

Now, the name hello.py is not very accurate. Given that I've put a Dog class and a print_dog function in it, I probably should rename it to something like dog.py. Let's do that, and let's update our main.py file accordingly:

main.py

import dog

def main() -> None:
    # Rather than my_dog = Dog(), which is how we'd create a Dog
    # instance if the Dog class was defined within this module,
    # we instead use my_dog = dog.Dog() because the Dog class is
    # defined in the dog module (dog.py), imported above.
    spot = dog.Dog()

    # You can use the spot variable like normal
    spot.name = 'Spot'
    spot.birth_year = 2022
    
    # We have a neat function named print_dog() that prints a given
    # Dog instance, but it's defined in the dog module. So, to
    # access it, we have to write 'dog.print_dog' instead of
    # simply 'print_dog'
    dog.print_dog(spot)

if __name__ == '__main__':
    main()

Line 8 in particular might seem a bit confusing, but it's correct. We're instantiating the class named Dog, which is defined in a module named dog, which is imported on line 1 (and the module is named dog because it's provided by dog.py). That's a lot of dogs! (And that's why I initially named it hello.py instead of dog.py—to avoid confusing you with the word "dog" being used in so many different places for different reasons. But you'll have to get used to it!)

As you might recall, it's possible to import individual things from a module rather than importing the entire module itself. To do this, use the from <module> import <thing> syntax, where <module> is the name of the module from which you want to import something, and <thing> is the thing that you want to import. For example:

from dog import Dog

Or:

from dog import print_dog

You can also import multiple things from a single module at once by writing all of them out in a comma-separated list:

from dog import Dog, print_dog

And if you need to import so many things that you can't write them all in one line of code without violating the style guidelines, you can separate each imported thing into its own line, provided that each line (except for the last one) ends in a backslash:

from dog import Dog,\
    print_dog
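Python also lets you wrap the imported names in parentheses, in which case the list can span several lines without any backslashes. Here's a minimal sketch. It imports from the standard library's math module rather than from dog, purely so that the example runs on its own:

```python
# Parentheses let the list of imported names span multiple lines
# without a backslash at the end of each line
from math import (
    ceil,
    floor,
    sqrt,
)

print(floor(2.7), ceil(2.2), sqrt(16))
```

This prints 2 3 4.0. Which form you use is a matter of style; check your style guidelines if they state a preference.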

Whenever something is imported using the from <module> import <thing> syntax, you can use the imported thing without prefixing it with the name of the module and the dot operator. For example:

main.py

from dog import Dog, print_dog

def main() -> None:
    # Notice: It's now just Dog(), as opposed to dog.Dog(), because of
    # the way that we imported it
    spot = Dog()

    spot.name = 'Spot'
    spot.birth_year = 2022
    
    # Notice: It's now just print_dog, as opposed to dog.print_dog,
    # because of the way that we imported it
    print_dog(spot)

if __name__ == '__main__':
    main()

Lastly, as you import things (or entire modules), you can essentially "rename" them as you do so. You do this using the as keyword. Consider the following example:

import dog as d
from dog import print_dog as pd

Line 1 in the above example code imports the entire dog module (dog.py), but as it imports it, it essentially renames it to d. From that point on, to access something within the dog module, you would write d instead of dog (e.g., d.Dog, as opposed to dog.Dog). Line 2 in the above example code imports the print_dog function from the dog module, but as it imports it, it essentially renames it to pd. From that point on, to use the print_dog function, you would write pd instead of print_dog (e.g., pd(spot), as opposed to print_dog(spot)).

To be clear, the imported thing is only "renamed" in the context of the module that's doing the importing. If main.py has a line that says import dog as d, but some other file coolfile.py has a line that says import dog, then in the context of main.py, d should be used to access things defined in dog.py, but in the context of coolfile.py, dog should be used to access things defined in dog.py.
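Here's a minimal sketch showing that the as keyword only introduces a different name for the same thing. It uses the standard library's math module in place of dog so that it runs on its own, and the aliases m and root are made up for this example:

```python
import math
import math as m
from math import sqrt as root

# Both names refer to the very same module object; 'as' only changes
# the name that's used within this file
print(m is math)

# Likewise, root is just another name for math.sqrt
print(root is math.sqrt)
print(root(9))
```

This prints True, True, and 3.0. Nothing about the math module itself has changed; this file simply has extra names for it.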

(Some very popular Python libraries actually have conventions about how you should import them. For example, the numpy library is almost always imported via import numpy as np, renaming it to "np" as it's imported. Similarly, the pandas library is almost always imported via import pandas as pd.)

Defining custom packages

You now know how to separate your code into multiple files, but what if you want to separate it into multiple directories? The simplest way to do this is via Python packages.

The Python import system is somewhat complicated, and this isn't meant to be a Python course (Python is just the tool through which we're learning about the fundamentals of computer science), so I'll try to keep it simple. For our purposes, a Python package is basically just a directory that contains one or more importable Python modules, along with a special file named __init__.py. This file, __init__.py (and yes, it must have that exact name), can even be completely empty if you want it to be. Its mere presence tells Mypy and the Python interpreter that the directory is not merely a directory, but rather a package, and therefore the Python modules contained within it can be imported into other Python modules that are not within that same directory.

For example, suppose we want to update our program and introduce a Cat class and a print_cat() function. We might put those definitions in cat.py (the exact contents of this file are irrelevant for this demonstration; just assume that it defines Cat and print_cat(), similar to how dog.py defines Dog and print_dog()). Since dogs and cats are both animals, it might make sense to put dog.py and cat.py together in, say, an animals/ directory. Our updated directory structure will look like this:

animals/
    dog.py
    cat.py
main.py

But, as I implied a moment ago, in order for main.py to reliably import the dog and cat modules despite the fact that they're nested in a separate animals/ directory, the animals/ directory should be converted into a regular Python package. (Strictly speaking, since Python 3.3 the interpreter can import from a directory that lacks __init__.py by treating it as a so-called "implicit namespace package", but namespace packages behave differently from regular packages and are beyond the scope of this course.)

To tell Mypy and the Python interpreter that animals/ should be treated specifically as a package rather than a regular directory, we must provide a file named __init__.py within the animals/ directory. For basic use cases, __init__.py can be left empty, and its entire purpose is to mark the directory as a package (but it must exist nonetheless). Our updated directory structure looks like this:

animals/
    __init__.py
    dog.py
    cat.py
main.py

We can now import the dog and cat modules within main.py again. However, the syntax is slightly different. To import the dog module, rather than simply typing import dog as we did before, we now have to type import animals.dog. To clarify, this imports the module named dog (i.e., dog.py), which can be found in the package named animals (i.e., the animals/ directory). Similarly, rather than writing from dog import Dog, we would now have to write from animals.dog import Dog. If we wanted to import the Cat class from animals/cat.py, we'd have to write from animals.cat import Cat. And so on.

Let's update main.py accordingly:

main.py

from animals.dog import Dog, print_dog

def main() -> None:
    # Notice: It's now just Dog(), as opposed to dog.Dog(), because of
    # the way that we imported it
    spot = Dog()

    spot.name = 'Spot'
    spot.birth_year = 2022

    # Notice: It's now just print_dog, as opposed to dog.print_dog,
    # because of the way that we imported it
    print_dog(spot)

if __name__ == '__main__':
    main()

Running python main.py does the same thing as before.

It's also possible to import an entire module that's nested within a package. But from that point on, in order to use the module, you must write out its fully qualified name. For example:

import animals.dog # Import the entire dog module from the animals package

# And then later, you can do this. Notice: it's animals.dog.Dog instead of
# simply dog.Dog
spot = animals.dog.Dog()
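If you'd like to experiment with this without rearranging your own files, here's a minimal sketch that builds the animals package inside a temporary directory and then imports from it. The temporary directory stands in for the project directory that would normally contain main.py, and adding it to sys.path by hand is only needed because this script doesn't actually live in that directory:

```python
import sys
import tempfile

from pathlib import Path

# The contents of dog.py, copied from earlier in this lecture
DOG_SOURCE = '''
class Dog:
    name: str
    birth_year: int

def print_dog(dog: Dog) -> None:
    print(f'{dog.name} was born in {dog.birth_year}')
'''

with tempfile.TemporaryDirectory() as project:
    # Build animals/__init__.py (empty) and animals/dog.py
    package_dir = Path(project) / 'animals'
    package_dir.mkdir()
    (package_dir / '__init__.py').write_text('')
    (package_dir / 'dog.py').write_text(DOG_SOURCE)

    # Stand in for "main.py lives in the project directory"
    sys.path.insert(0, project)
    try:
        import animals.dog
    finally:
        sys.path.remove(project)

# Use the module via its fully qualified name
spot = animals.dog.Dog()
spot.name = 'Spot'
spot.birth_year = 2022
animals.dog.print_dog(spot)

# The module's name includes the package that it belongs to
print(animals.dog.__name__)
```

This prints Spot was born in 2022, followed by animals.dog.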

Suppose you want to import the Dog class and the Cat class all at once in a single line of code. As it stands, there's no way to do that—the Dog class is provided by animals.dog, and the Cat class is provided by animals.cat. Yes, they're in the same package, but separate modules, so they must be imported separately:

from animals.dog import Dog
from animals.cat import Cat

However, there is technically a way to import them both in a single line of code: you can import them directly from the package rather than importing them from their respective modules. Although this is possible, it'd require more configuration. Currently, our __init__.py file is empty, but it doesn't have to be. The __init__.py file can be used to define package-level importables. That's to say, anything that is defined within a package's __init__.py file can then be imported from the package directly. For example, if main.py had a line of code written as from animals import xyz, that would import xyz directly from animals/__init__.py.

Moreover, anything that's imported within a package's __init__.py file can then be imported from that package within another module. For example, it's possible to have animals/__init__.py import the Dog and Cat classes from animals.dog and animals.cat (respectively), and then have main.py import these classes directly from the animals package (e.g., from animals import Dog, Cat).

However, getting this to work would require discussing several more details about Python's package import system, and that's really not the point of this class, so we'll stop here. If you're curious, I encourage you to research the purpose of __init__.py, including the __all__ symbol.
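For the curious, here's a minimal sketch of what that might look like, assuming you run it as a fresh script. As before, it builds the package inside a temporary directory, and the Dog and Cat classes are empty stand-ins for the real ones. Treat it as one possible arrangement rather than the full story; for instance, Mypy in its stricter modes may expect such re-exports to be marked explicitly (which is where __all__ comes in), and I'm not covering that here:

```python
import sys
import tempfile

from pathlib import Path

# What animals/__init__.py might contain: it imports the two classes
# so that they become importable from the package itself
INIT_SOURCE = '''
from animals.dog import Dog
from animals.cat import Cat
'''

with tempfile.TemporaryDirectory() as project:
    package_dir = Path(project) / 'animals'
    package_dir.mkdir()
    (package_dir / '__init__.py').write_text(INIT_SOURCE)

    # Minimal stand-ins for the real dog.py and cat.py
    (package_dir / 'dog.py').write_text('class Dog:\n    pass\n')
    (package_dir / 'cat.py').write_text('class Cat:\n    pass\n')

    # Stand in for "main.py lives in the project directory"
    sys.path.insert(0, project)
    try:
        # One line, two classes, both imported from the package
        from animals import Dog, Cat
    finally:
        sys.path.remove(project)

# Each class still remembers the module that actually defines it
print(Dog.__module__, Cat.__module__)
```

This prints animals.dog animals.cat, showing that the classes are still defined in their own modules; the package's __init__.py merely makes them available under a shorter path.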

__pycache__

Once your code is organized into several modules and packages, you might notice a __pycache__ directory automatically appear inside your project directory and contained packages. This is normal. The __pycache__ directory is autogenerated by the Python interpreter and contains cached bytecode for compiled Python modules and packages (indeed, CPython compiles Python code into bytecode before interpreting said bytecode—it's a sort of hybrid between a compiled and an interpreted language implementation).
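If you want to know where the cached bytecode for a given module would end up, the standard library can tell you. Here's a minimal sketch using importlib.util.cache_from_source. It only computes a path, so dog.py doesn't need to exist for it to run:

```python
import importlib.util

# Ask the import system where the cached bytecode for dog.py would be
# stored. This only computes a path; no files are read or written.
cache_path = importlib.util.cache_from_source('dog.py')
print(cache_path)
```

On a typical CPython installation, the result is a .pyc file inside a __pycache__ directory, with the interpreter's name and version embedded in the file name. The exact name depends on your Python version and configuration, so yours may differ from a classmate's.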

In general, you can ignore the __pycache__ directory. In practice, it's best to avoid committing it to version control, meaning you shouldn't stage it via git add to be included in subsequent commits. If you'd like, you can create a .gitignore file and add the __pycache__ directory to it. But that's beyond the scope of this course, and if you accidentally commit a __pycache__ directory in a lab or homework assignment, that's okay—you won't be penalized for it.