Generics

Written by Alex Guyer | guyera@oregonstate.edu

Here's the outline for this lecture:

Setting the stage
Generics
Data structures as generics
Other details

Setting the stage

Suppose you're developing an ecommerce platform where registered vendors can sell products to customers around the world. To separate concerns and keep reviews cleanly organized, each vendor is registered to sell products from a single, select category. For example, one vendor might sell furniture, and another vendor might sell automobiles, but a single vendor may not sell both of these things (if a vendor would like to sell both of these things, they must register two vendor accounts—one per product category).

Somehow, this needs to be represented in code. Suppose that one of the product categories is furniture. Then we need a FurnitureItem class to represent an item of furniture:

furnitureitem.py

class FurnitureItem:
    _name: str # Name of furniture item
    _price: float # Price in dollars
    
    def __init__(self, name: str, price: float) -> None:
        self._name = name
        self._price = price

    def get_price(self) -> float:
        return self._price

    def print(self) -> None:
        print(f'  - Item: {self._name}')
        # Note: {x:.2f} prints the value of x to 2 decimal places
        print(f'    Price: ${self._price:.2f}')

If our software platform has furniture, then surely it also has vendors who are registered to sell furniture. Then we need a FurnitureVendor class to represent these vendors:

furniturevendor.py

from furnitureitem import FurnitureItem

class FurnitureVendor:
    _name: str # The vendor's name
    _profit: float # Total profit made
    _furniture: list[FurnitureItem] # Inventory

    def __init__(self, name: str) -> None:
        self._name = name
        self._profit = 0.0
        self._furniture = []

    def add_to_stock(self, furniture_item: FurnitureItem) -> None:
        self._furniture.append(furniture_item)

    def sell(self, idx: int) -> None:
        # Increase the vendor's total profit based on the price of
        # the sold furniture item
        self._profit += self._furniture[idx].get_price()

        # Remove the sold furniture item
        del self._furniture[idx]

    # Prints the vendor's information to the terminal
    def print(self) -> None:
        print(f'Name: {self._name}')
        # Note: '{x:.2f}' Prints x to 2 decimal places
        print(f'Total profit: ${self._profit:.2f}')
        print(f'Inventory:')
        for item in self._furniture:
            item.print()

To get a sense of what's going on, here's a very simple example program:

main.py

from furniturevendor import FurnitureVendor
from furnitureitem import FurnitureItem

def main() -> None:
    john = FurnitureVendor('John')
    john.add_to_stock(FurnitureItem('Couch', 100.00))
    
    print("John's information prior to selling his couch:")
    john.print()
    print()

    john.sell(0)

    print("John's information after selling his couch:")
    john.print()


if __name__ == '__main__':
    main()

Running the above program produces the following output:

(env) $ python main.py 
John's information prior to selling his couch:
Name: John
Total profit: $0.00
Inventory:
  - Item: Couch
    Price: $100.00

John's information after selling his couch:
Name: John
Total profit: $100.00
Inventory:

So far so good, but our platform is meant to be a general purpose ecommerce platform. Currently, furniture is the only kind of product, and furniture vendors are the only kinds of vendors. Let's add one more kind of product and associated vendor: automobiles. First, the Automobile class:

automobile.py

class Automobile:
    _manufacturer: str # Manufacturer that made the car
    _year: int # Year in which the car was built
    _model: str # The name of the car model
    _price: float # Price in dollars
    
    def __init__(
            self,
            manufacturer: str,
            year: int,
            model: str,
            price: float) -> None:
        self._manufacturer = manufacturer
        self._year = year
        self._model = model
        self._price = price

    def get_price(self) -> float:
        return self._price

    def print(self) -> None:
        print(f'  - Item: {self._year} {self._manufacturer} '
            f'{self._model}')
        # Note: {x:.2f} prints the value of x to 2 decimal places
        print(f'    Price: ${self._price:.2f}')

Next, the AutomobileVendor class:

automobilevendor.py

from automobile import Automobile

class AutomobileVendor:
    _name: str # The vendor's name
    _profit: float # Total profit made
    _automobiles: list[Automobile] # Inventory

    def __init__(self, name: str) -> None:
        self._name = name
        self._profit = 0.0
        self._automobiles = []

    def add_to_stock(self, automobile: Automobile) -> None:
        self._automobiles.append(automobile)

    def sell(self, idx: int) -> None:
        # Increase the vendor's total profit based on the price of
        # the sold automobile
        self._profit += self._automobiles[idx].get_price()

        # Remove the sold automobile
        del self._automobiles[idx]

    # Prints the vendor's information to the terminal
    def print(self) -> None:
        print(f'Name: {self._name}')
        # Note: '{x:.2f}' Prints x to 2 decimal places
        print(f'Total profit: ${self._profit:.2f}')
        print(f'Inventory:')
        for item in self._automobiles:
            item.print()

Finally, I'll update the demonstration so that you can see these new classes in action:

main.py

from furniturevendor import FurnitureVendor
from furnitureitem import FurnitureItem
from automobilevendor import AutomobileVendor
from automobile import Automobile

def main() -> None:
    #################################
    #### Furniture demonstration ####
    #################################
    
    john = FurnitureVendor('John')
    john.add_to_stock(FurnitureItem('Couch', 100.00))
    
    print("John's information prior to selling his couch:")
    john.print()
    print()

    john.sell(0)

    print("John's information after selling his couch:")
    john.print()

    print()
    print()

    ##################################
    #### Automobile demonstration ####
    ##################################

    samantha = AutomobileVendor('Samantha')
    samantha.add_to_stock(Automobile('Ford', 2001, 'Taurus', 2000.00))

    print("Samantha's information prior to selling her Ford Taurus:")
    samantha.print()
    print()
    
    samantha.sell(0)

    print("Samantha's information after selling her Ford Taurus:")
    samantha.print()

if __name__ == '__main__':
    main()

Running the above program produces the following output:

(env) $ python main.py 
John's information prior to selling his couch:
Name: John
Total profit: $0.00
Inventory:
  - Item: Couch
    Price: $100.00

John's information after selling his couch:
Name: John
Total profit: $100.00
Inventory:


Samantha's information prior to selling her Ford Taurus:
Name: Samantha
Total profit: $0.00
Inventory:
  - Item: 2001 Ford Taurus
    Price: $2000.00

Samantha's information after selling her Ford Taurus:
Name: Samantha
Total profit: $2000.00
Inventory:

The above program works just fine, but if you look carefully at the FurnitureVendor and AutomobileVendor classes, you'll notice something peculiar: they're extremely similar to one another. Besides the fact that some of their variables' names are different (which is immaterial, since names are just names), the only difference between these two classes is that the FurnitureVendor class has an attribute that stores a list of FurnitureItem elements (_furniture), whereas the AutomobileVendor class has an attribute that stores a list of Automobile elements (_automobiles). Of course, this also means that some of the classes' corresponding methods may have different parameter types or return types from one another (e.g., the FurnitureVendor class's add_to_stock() method has a parameter of type FurnitureItem, whereas the AutomobileVendor class's corresponding add_to_stock() method has a parameter of type Automobile). But they use those parameters and return values in the exact same ways.

In fact, if we took the FurnitureVendor class and used Vim's find-and-replace feature to replace all instances of the word FurnitureItem with the word Automobile (and updated the import statements at the top of the file appropriately), the resulting class would essentially be identical to the AutomobileVendor class (again, except for some variable names).

If that sounds like a lot of unnecessary code replication, that's because it is. Imagine how much worse it'd be if our platform supported 100 different categories of products instead of just two.

So let's try to eliminate that code replication. A naive idea might be to use polymorphism. Rather than having a FurnitureVendor class and an AutomobileVendor class, we might have a single Vendor class. And rather than each Vendor object having a list of FurnitureItem objects (_furniture) or a list of Automobile objects (_automobiles), it could just have a list of Item objects (e.g., _inventory). Finally, the FurnitureItem and Automobile classes could inherit from the Item class, which could in turn serve as an abstract interface (e.g., with an abstract print() method).

Let's give this a try, starting with the Item class:

item.py

from abc import ABC, abstractmethod

class Item(ABC): # Deriving from ABC makes @abstractmethod take effect
    _price: float # Every item has a price

    def __init__(self, price: float) -> None:
        self._price = price

    def get_price(self) -> float:
        return self._price

    # Every item can be printed to the terminal (but different items
    # have different information that needs to be printed in different
    # formats, so we make this an abstract method, which we'll override
    # in the derived classes)
    @abstractmethod
    def print(self) -> None:
        pass
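A note on the Item class: @abstractmethod only takes effect when the class that uses it derives from ABC (or otherwise has ABCMeta as its metaclass). The sketch below uses a hypothetical PricedThing base class and Lamp subclass, with a describe() method that returns a string in place of print(), to show what that buys: Python refuses to instantiate a class that still has unimplemented abstract methods, while a subclass that overrides them works normally.

```python
from abc import ABC, abstractmethod


class PricedThing(ABC):
    """Hypothetical stand-in for the lecture's Item class."""

    def __init__(self, price: float) -> None:
        self._price = price

    def get_price(self) -> float:
        return self._price

    @abstractmethod
    def describe(self) -> str:
        ...


class Lamp(PricedThing):
    # Overrides the abstract method, so Lamp can be instantiated
    def describe(self) -> str:
        return f'Lamp (${self.get_price():.2f})'


def try_instantiate_base() -> str:
    # Instantiating the abstract base class raises TypeError at runtime
    try:
        PricedThing(10.0)  # type: ignore[abstract]
    except TypeError as error:
        return f'TypeError: {error}'
    return 'no error'


print(Lamp(25.0).describe())  # Lamp ($25.00)
print(try_instantiate_base())
```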

We want the FurnitureItem and Automobile classes to inherit from the Item class and override the print() method. This will require rewriting these two classes. First, the FurnitureItem class:

furnitureitem.py

from item import Item

class FurnitureItem(Item):
    _name: str # Name of furniture item
    
    def __init__(self, name: str, price: float) -> None:
        super().__init__(price)
        self._name = name

    def print(self) -> None:
        print(f'  - Item: {self._name}')
        # Note: {x:.2f} prints the value of x to 2 decimal places
        print(f'    Price: ${self.get_price():.2f}')

Next, the Automobile class:

automobile.py

from item import Item

class Automobile(Item):
    _manufacturer: str # Manufacturer that made the car
    _year: int # Year in which the car was built
    _model: str # The name of the car model
    
    def __init__(
            self,
            manufacturer: str,
            year: int,
            model: str,
            price: float) -> None:
        super().__init__(price)
        self._manufacturer = manufacturer
        self._year = year
        self._model = model

    def print(self) -> None:
        print(f'  - Item: {self._year} {self._manufacturer} '
            f'{self._model}')
        # Note: {x:.2f} prints the value of x to 2 decimal places
        print(f'    Price: ${self.get_price():.2f}')

Finally, we can fulfill our objective: we can remove the FurnitureVendor and AutomobileVendor classes entirely, replacing them with a single Vendor class. Each Vendor object simply has an inventory of (upcasted) Item objects. Here it is:

vendor.py

from item import Item

class Vendor:
    _name: str # The vendor's name
    _profit: float # Total profit made
    _inventory: list[Item] # Inventory

    def __init__(self, name: str) -> None:
        self._name = name
        self._profit = 0.0
        self._inventory = []

    def add_to_stock(self, item: Item) -> None:
        self._inventory.append(item)

    def sell(self, idx: int) -> None:
        # Increase the vendor's total profit based on the price of
        # the sold item
        self._profit += self._inventory[idx].get_price()

        # Remove the sold item
        del self._inventory[idx]

    # Prints the vendor's information to the terminal
    def print(self) -> None:
        print(f'Name: {self._name}')
        # Note: '{x:.2f}' Prints x to 2 decimal places
        print(f'Total profit: ${self._profit:.2f}')
        print(f'Inventory:')
        for item in self._inventory:
            item.print()

Let's update our demonstration again:

main.py

from vendor import Vendor
from furnitureitem import FurnitureItem
from automobile import Automobile

def main() -> None:
    #################################
    #### Furniture demonstration ####
    #################################
    
    john = Vendor('John')
    john.add_to_stock(FurnitureItem('Couch', 100.00))
    
    print("John's information prior to selling his couch:")
    john.print()
    print()

    john.sell(0)

    print("John's information after selling his couch:")
    john.print()

    print()
    print()

    ##################################
    #### Automobile demonstration ####
    ##################################

    samantha = Vendor('Samantha')
    samantha.add_to_stock(Automobile('Ford', 2001, 'Taurus', 2000.00))

    print("Samantha's information prior to selling her Ford Taurus:")
    samantha.print()
    print()
    
    samantha.sell(0)

    print("Samantha's information after selling her Ford Taurus:")
    samantha.print()

if __name__ == '__main__':
    main()

Running the above program produces the same output as before. But now, we only have a single Vendor class rather than having a separate class for each kind of vendor.

This seems like progress, but there's an issue with how we're representing vendors in our program now. Consider: There's absolutely nothing stopping us from adding an Automobile to john's inventory, nor from adding a FurnitureItem to samantha's inventory. For example, nothing is stopping us from doing this in main():

samantha.add_to_stock(FurnitureItem('Dining Table', 300.00))

That's a problem because, as I said in the very beginning of this lecture, we would like our platform to only support vendors who are registered to sell products from a single, select category. But the way the code is written, there's nothing stopping Samantha or John from selling both furniture and automobiles. That's to say, the above line of code is legal, but it shouldn't be because it represents a bug in the program: Samantha should not be allowed to sell furniture if she also sells automobiles.

Back when we had two separate classes for vendors (FurnitureVendor and AutomobileVendor), there was no problem. Back then, had we tried to add a furniture item to Samantha's stock (or an automobile to John's stock), Mypy would have reported a type error. So we've reduced our total amount of code, but in doing so, we've sacrificed specificity in our representation, opening up our codebase to new kinds of potential mistakes.

Generics

Luckily, there's a "best of both worlds" solution: generics. Generics are a metaprogramming technique, meaning they allow us to write programs that operate on programs (in the particular case of generics, they allow us to write programs that analyze and operate on their own code). Specifically, generics allow us to create classes that have "placeholders" in them (generic classes). Those "placeholders" can even be names of types. Later on, we can instantiate those generic classes by filling in the blanks.

Remember how I said that the only meaningful difference between the FurnitureVendor class and the AutomobileVendor class was in the type of their inventory? Imagine if we could create a single class that has an inventory of some type T, without having to specify the exact type of T up front. In other words, imagine if T could be a sort of placeholder for the name of a type that will be specified later on. Then we could create a single generic class Vendor that has an inventory of type list[T]. It would have a generic method called add_to_stock() that accepts an argument of type T, adding it to the inventory. Finally, when we want to create john within main(), we just specify that, in the case of john, T should be replaced with FurnitureItem (because John sells furniture). Similarly, when we want to create samantha, we just specify that, in the case of samantha, T should be replaced with Automobile (because Samantha sells automobiles).

To do this, we first have to create T, which will serve as a placeholder for the kind of product that a given vendor will sell. We can do this like so:

vendor.py

from item import Item

class Vendor[T]:
    ... # Class body is the same as before; omitted for brevity

In our case, we want T to be used as a placeholder for the type of product sold by a given vendor. To do this, we make the following changes to vendor.py:

vendor.py

from item import Item

class Vendor[T]:
    _name: str # The vendor's name
    _profit: float # Total profit made
    _inventory: list[T] # Inventory

    def __init__(self, name: str) -> None:
        self._name = name
        self._profit = 0.0
        self._inventory = []

    def add_to_stock(self, item: T) -> None:
        self._inventory.append(item)

    def sell(self, idx: int) -> None:
        # Increase the vendor's total profit based on the price of
        # the sold item
        self._profit += self._inventory[idx].get_price()

        # Remove the sold item
        del self._inventory[idx]

    # Prints the vendor's information to the terminal
    def print(self) -> None:
        print(f'Name: {self._name}')
        # Note: '{x:.2f}' Prints x to 2 decimal places
        print(f'Total profit: ${self._profit:.2f}')
        print(f'Inventory:')
        for item in self._inventory:
            item.print()

First, we modify the Vendor class definition to declare a type parameter named T, written in square brackets after the class name. This is new syntax (introduced in Python 3.12); it means that the Vendor class is now a generic class, and that it uses T as a placeholder for the type of one or more objects throughout its definition (again, a generic class is just a class that's allowed to have type placeholders). Then, we modify the body of the Vendor class to do just that: throughout its definition, whenever we want to refer to the type of product that a vendor might sell, we use T to represent that type. Again: T is a placeholder at this stage. It does not represent a specific type of product that might be sold, but rather a placeholder for a type of product that might be sold.

Since T is a placeholder, we must, at some point, specify what exactly should be filled into that placeholder. We do not do this in the generic Vendor class. The entire point is that the Vendor class is "generic" rather than "specific"; it represents a "generic kind of vendor"—not a "specific kind of vendor", such as a furniture vendor or an automobile vendor.

So, where do we fill in that placeholder? Well, we do it when we need to: when creating vendors. For example, when I create john, I would like to somehow specify that John is a furniture vendor, and when I create samantha, I would like to somehow specify that Samantha is an automobile vendor. That's to say, in the case of john, I'd like T to be filled in with the type FurnitureItem, but in the case of samantha, I'd like T to be filled in with the type Automobile. I do this in the main() function like so:

main.py

from vendor import Vendor
from furnitureitem import FurnitureItem
from automobile import Automobile

def main() -> None:
    #################################
    #### Furniture demonstration ####
    #################################
    
    john = Vendor[FurnitureItem]('John')
    john.add_to_stock(FurnitureItem('Couch', 100.00))
    
    print("John's information prior to selling his couch:")
    john.print()
    print()

    john.sell(0)

    print("John's information after selling his couch:")
    john.print()

    print()
    print()

    ##################################
    #### Automobile demonstration ####
    ##################################

    samantha = Vendor[Automobile]('Samantha')
    samantha.add_to_stock(Automobile('Ford', 2001, 'Taurus', 2000.00))

    print("Samantha's information prior to selling her Ford Taurus:")
    samantha.print()
    print()
    
    samantha.sell(0)

    print("Samantha's information after selling her Ford Taurus:")
    samantha.print()

if __name__ == '__main__':
    main()

Indeed, the moment I call the generic Vendor class's constructor to create a specific Vendor object is the moment I specify what kind of products that specific vendor will sell. In other words, that's the moment I specify what type T should be replaced with. I do this by writing that type in square brackets between the class's name (Vendor) and the parentheses of the constructor call. For example, Vendor[FurnitureItem]('John') constructs a vendor named John who sells furniture; Vendor[Automobile]('Samantha') constructs a vendor named Samantha who sells automobiles; Vendor[Television]('Jill') would construct a vendor named Jill who sells televisions (assuming Television is an existing class that has been imported); and so on.

Technically, the main.py program above runs just fine, but Mypy reports errors at this stage:

(env) $ mypy .
vendor.py:22: error: "T" has no attribute "get_price"  [attr-defined]
vendor.py:34: error: "T" has no attribute "print"  [attr-defined]
Found 2 errors in 1 file (checked 5 source files)

If we take a closer look at the specified lines of code in vendor.py, we might be able to figure out what it's talking about:

vendor.py

...

class Vendor[T]:
    ...

    def sell(self, idx: int) -> None:
        # Increase the vendor's total profit based on the price of
        # the sold item
        self._profit += self._inventory[idx].get_price()

        # Remove the sold item
        del self._inventory[idx]

    ...

    def print(self) -> None:
        print(f'Name: {self._name}')
        # Note: '{x:.2f}' Prints x to 2 decimal places
        print(f'Total profit: ${self._profit:.2f}')
        print(f'Inventory:')
        for item in self._inventory:
            item.print()

Consider: when Mypy is analyzing vendor.py, it knows that _inventory is a list of something, but it doesn't know exactly what it's a list of. That's the point: it's of type list[T], and T is itself a placeholder. But that's a bit problematic because Mypy is supposed to verify that our code doesn't contain type errors, and it can't do that if it knows nothing about the types that will be used to fill in the placeholder T.

For example, at runtime, if we create john and specify that T should be replaced with FurnitureItem, then the above code will work just fine: the FurnitureItem class defines a print() method, and it inherits the get_price() method from the Item class. But suppose that we create a vendor such as jim = Vendor[int]('Jim'). That is, we create a vendor named Jim who (somehow) sells integers. The problem is that integers have neither a get_price() method nor a print() method (int does have methods, such as bit_length(), but not these two). So, once Jim has some integers in stock, calling jim.sell() or jim.print() will make the lines flagged by Mypy fail with an AttributeError.

It's Mypy's job to detect these sorts of errors, and that's exactly what it's doing. Since Mypy knows nothing about the types that will be used to fill in the placeholder T, it has no reason to believe that those types will necessarily provide a get_price() method or a print() method.

The solution to this problem, then, is to tell Mypy a little bit about the types that will be used to fill in the placeholder T. Of course, we can't tell Mypy exactly what types might be used in place of T—it's supposed to be a generic placeholder. However, that doesn't stop us from saying, for example, that T will only ever be filled in by types that inherit from the Item class. We can do this by making the following change in vendor.py:

from item import Item

class Vendor[T: Item]:
    ... # Class body is the same as before; omitted for brevity

We can restrict a type parameter by giving it an upper bound, written after a colon, much like an ordinary type annotation. If a bound is provided, then T may only be "filled in" by the bound type itself or by classes that inherit from it. So the above change tells Mypy that our program will only ever fill in the placeholder T with the Item class or with classes that inherit from it. The Item class does, indeed, provide a get_price() method as well as an (abstract) print() method. So now, Mypy can be confident that self._inventory[idx].get_price() and item.print() will work just fine, regardless of what type is substituted for T. Mypy now produces no errors, and the program works just fine.

Suppose we try to fill in the placeholder T with a class that does not inherit from the Item class:

badtypeargument.py

from vendor import Vendor

def main() -> None:
    # Jim sells integers??? But 'int' is not a class that inherits from
    # the Item class, so Mypy will report an error.
    jim = Vendor[int]('Jim')

if __name__ == '__main__':
    main()

Then Mypy will detect this mistake and report an error on the line that constructs jim. The exact line number and the count of checked files depend on your project, but the report looks like this:

(env) $ mypy .
badtypeargument.py:6: error: Type argument "int" of "Vendor" must be a subtype of "Item"  [type-var]
Found 1 error in 1 file (checked 6 source files)

If you actually wanted to support both int and Item types, you would bound T with the standard union syntax, as shown below. Note that this particular version would again produce Mypy errors, since int does not implement get_price() or print():

class Vendor[T: Item | int]:
    ... # Class body is the same as before; omitted for brevity

It's sometimes a bit difficult to understand the difference between generics and polymorphism. To highlight the difference, recall that our earlier polymorphic solution had a problem: there was no way to prevent a single vendor from selling both furniture and automobiles. For example, samantha.add_to_stock(FurnitureItem('Couch', 200.00)) was a perfectly legal line of code (in that it would not result in any errors reported by Mypy or the interpreter), even though it shouldn't have been legal given that it reflects a bug in the program. Now, with generics, that problem is solved. Suppose we attempt to add furniture to Samantha's inventory:

badusage.py

from vendor import Vendor
from furnitureitem import FurnitureItem
from automobile import Automobile

def main() -> None:
    # Samantha is an automobile vendor
    samantha = Vendor[Automobile]('Samantha')

    # Suppose we accidentally add a couch to her stock
    samantha.add_to_stock(FurnitureItem('Couch', 200.00))

    # And then we try to sell the couch
    print("Samantha's information prior to selling her couch:")
    samantha.print()
    print()
    
    samantha.sell(0)

    print("Samantha's information after selling her couch:")
    samantha.print()

if __name__ == '__main__':
    main()

Although the above program technically runs, Mypy detects our mistake and prints an error:

(env) $ mypy .
badusage.py:10: error: Argument 1 to "add_to_stock" of "Vendor" has incompatible type "FurnitureItem"; expected "Automobile"  [arg-type]
Found 1 error in 1 file (checked 5 source files)

Classes are not the only things that can be generic. You can also declare type parameters for a function, using similar syntax: place them in square brackets between the function's name and its parameter list:

def print_value[T](value: T) -> None:
    # 'value' avoids shadowing the built-in input() function
    print(value)

Data structures as generics

You've actually worked with a generic class before: list. Indeed, essentially all data structure types (i.e., collection types), such as list, dict, and set, are generic classes.

Think about it: when you specify the type of a list parameter in a function, you must write out the element type within square brackets after the word list. For example, if a parameter numbers is meant to accept a list of integers, it might be annotated as numbers: list[int]. If those square brackets look similar to the ones you see in Vendor[Automobile], that's because they are—list is a generic class with a placeholder type, and the int in list[int] is saying "I'd like to fill in the placeholder with the type int".

(Similarly, if you wanted a function to strictly accept furniture vendors as an argument, the parameter might be annotated some_vendor: Vendor[FurnitureItem].)
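The built-in collections can be inspected just like the Vendor class: subscripting them produces a generic alias, and some of them, such as dict, take more than one type argument. A short sketch using only the standard library (the total() function is a hypothetical example):

```python
from typing import get_args, get_origin

# list takes one type argument: the element type
print(get_origin(list[int]) is list)  # True
print(get_args(list[int]))            # (<class 'int'>,)

# dict takes two: the key type and the value type
print(get_args(dict[str, float]))     # (<class 'str'>, <class 'float'>)


def total(numbers: list[int]) -> int:
    # Mypy rejects calls such as total(['1', '2']) because the element
    # type str does not match the filled-in placeholder int
    return sum(numbers)


print(total([1, 2, 3]))               # 6
```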

In fact, data structures / collection types are the classic and most obvious use case of generics. This means that it's very important to understand generics if you ever want to implement your own data structure / collection types (e.g., as you might learn how to do in a data structures course).

Other details

This lecture barely scratched the surface of generics. They support many more advanced features that we don't have time to cover, such as generic classes with more than one type parameter; covariant generic classes (e.g., Sequence); generics that are contravariant in some of their type parameters (e.g., Callable, which is contravariant in its parameter types); and more. I encourage you to research these topics if you're curious, and feel free to stop by my office hours if you'd like to chat about them.