This lecture covers the following contents:
Most of our past lectures have served to teach you new language features that can be used to solve different kinds of computational problems. This lecture isn't so much about the technical details in solving computational problems as it is about the philosophy of software design.
(Yes, this means that this lecture will be less code-heavy and more theory-heavy. Apologies in advance.)
When you set out to write some code to solve a problem, there are often countless different approaches that would all work. Just take our labs and homework assignments, for example. Even in a class with hundreds of students, it's incredibly unlikely that two students would write the exact same code to solve one of these problems.
When considering all these different approaches, how do we choose between them? What makes one approach "better" or "worse" than another? This is a natural question to ask, and software engineers have been thinking about it for decades. But it's a very complicated question with a very complicated answer; there are many properties that distinguish "good code" from "bad code" (so to speak). Sometimes, these properties can even run counter to one another in some sort of tradeoff, and the software engineer has to prioritize between them.
But this isn't a software engineering course, so we don't have time to discuss the many properties of quality software design. Instead, we'll just focus on one of them: maintainability. As you might have guessed, maintainability describes how easy or hard it is to maintain a codebase, such as adding, removing, or changing features.
Consider a video game that has two kinds of monsters: zombies and vampires. These monsters take turns attacking the player. Suppose the video game is wildly successful, so the developer decides to roll out a free update to the game. In this update, the developer would like to introduce a third kind of monster to the game: werewolves. Thinking about maintainability, an interesting question to ask is: how difficult is it for the developer to implement werewolves into the game? Does it require modifying hundreds of lines of code sprawled across tens of source code files? Or does it require modifying just ten lines of code across two source files? The latter case, which seems much more maintainable in this example, would likely be preferred, assuming the benefits from this maintainability outweigh the initial costs of designing and implementing the codebase to be maintainable to begin with (not to skip over YAGNI; it has a place in this conversation as well).
To be clear, maintainability is about much more than the number of lines of code that need to be modified in order to introduce a feature. It's also not the only important property of good software design. You might learn about these things in greater detail if you take some software engineering courses.
There are many tools and techniques that software engineers can use to improve a codebase's maintainability. One such tool is known as encapsulation. If you ask 100 software engineers exactly what "encapsulation" is, you'll get 100 different answers (partly for historical reasons—encapsulation has often been entangled with some other techniques, such as information hiding and message passing, so the definitions of these terms often get jumbled together in messy ways). But the definition I'm going to go with is as follows: encapsluation means co-location, or bundling together, of data with the behavior that operates on that data.
The canonical example of encapsulation is classes and objects (but this certainly isn't the only example). Classes and objects are such a classic example of encapsulation that the term object-oriented programming is sometimes equated to encapsulation (depending on who you ask). As we've discussed, one common definition of objects, especially in the context of object-oriented programming, is a thing that has both state ("data"; e.g., attributes) and behavior (e.g., methods). The object's methods "operate on" (i.e., do things with) the object's attributes. Importantly, this data and behavior—the attributes and methods—are all defined in the same place: inside the class definition. Hence, the data (the attributes) are co-located (bundled) with the behaviors (the methods) that operate on that data. That's encapsulation.
Take our Dog class, for example:
dog.pyclass Dog:
name: str
birth_year: int
def __init__(self, n: str, b: int) -> None:
# n is the dog's name, and b is the dog's birth year.
# Store them in self.name and self.birth_year
self.name = n
self.birth_year = b
def print(self) -> None:
print(f'{self.name} was born in {self.birth_year}')
Every Dog object has two attributes: name and birth_year. Suppose we have a Dog object called spot, and we want to "operate on" spot's attributes. This is what spot's methods are for. For example, when we first create spot, we need to assign values to spot's attributes. We do this with the constructor. And afterwards, we might want to print spot's attributes to the terminal. We do this with spot's print() method (i.e., spot.print()). These methods are defined next to (co-located with) the attributes that they operate on, so this is an example of encapsulation.
We generally prefer encapsulation of data which belongs together. Data which is highly related should be grouped together, and data which is not very similar could potentially be part of a different class. The degree to which related data is encapsulated with each other and separate from unrelated data is called cohesion. We generally prefer code which is higly cohesive.
So, how can encapsulation be used to improve code maintainability? Well, it actually helps in a couple different ways:
Encapsulation can reduce coupling
Encapsulation can enable class invariants
Let's focus on coupling for now. We'll cover class invariants in a moment.
Coupling, like encapsulation, is a bit subjective. However, there is a theme among the various definitions: coupling, in some sense or another, refers to the case where changing one component of code requires changing one or more other components of code in turn. This is a somewhat vague definition, but it's good enough for our use case.
If you ask 100 software engineers, you'll get 100 different definitions. (I once watched an entire podcast about coupling and cohesion consisting of various highly regarded software engineers, including the likes of Kent Beck, Jim Weirich, and Ron Jeffries, and, even by the end of the podcast, they had mostly failed to agree on a unified definition of what these terms mean).
Not all coupling is the same. For one, there are different degrees of coupling. If two components of code are tightly coupled, that means one is highly dependent on the other (and possibly vice-versa as well), so changing one will almost always require changing the other. In contrast, if two components of code are loosely coupled, that means they are only loosely dependent on one another, so changing one may or may not require changing the other.
We generally prefer loose coupling over tight coupling.
Coupling can also be local or pervasive. These are not widely accepted terms (I think I just made them up), but I think they're useful, so I'll use them anyways. Under my definitions, local coupling is when a small, controlled number of adjacent code components are coupled together, whereas pervasive coupling is when a given code component is coupled to countless other code components throughout the entire codebase (and, perhaps more importantly, pervasive coupling tends to get worse over time as the codebase gets more complex, but local coupling is mostly fixed; this is a result of Bertrand Meyer's open-closed principle of object-oriented software design).
In many cases, coupling can make code hard to maintain (e.g., hard to change, hard to add to, hard to remove, etc). For example, suppose we have a class named Person, and every person has a name and an age:
person.pyclass Person:
name: str
age: int
def __init__(self, name: str, age: int) -> None:
self.name = name
self.age = age
Suppose we also have a class named PersonDatabase, and every person database has a list of people (which is initially empty):
persondatabase.pyfrom person import Person
class PersonDatabase:
people: list[Person]
def __init__(self) -> None:
self.people = []
Now suppose that, throughout our program, we often find ourselves needing to add Person objects to one or more databases. For example, suppose we have several lines of code that look something like these:
joe = Person("Joe", 42)
my_database.people.append(joe)
liang = Person("Liang", 23)
my_database.people.append(liang)
# And so on...And perhaps, throughout our program, we occasionally find ourselves needing to retrieve the age of a person with a given name from the database. For example, suppose we have several blocks of code that look something like this:
for p in my_database.people:
if p.name == "Joe":
print(f"Joe's age is {p.age}")
break
And lastly, perhaps, throughout our program, we occasionally find ourselves needing to retrieve the name of a person with a given age. For example, suppose we have several blocks of code that look something like this:
print('People who are 42 years of age:')
for p in my_database.people:
if p.age == 42:
print(f'\t{p.name}')
Perhaps all the above code works, but consider: as this program grows in complexity (and perhaps becomes more successful / popular, garnering more users, etc), the size of the person databases will likely grow as well. In each of the above examples, my_database might have, say, millions of people in it. At a certain point, simply using a for loop to iterate over every single person in the database, looking for all the people with a given name or age, might no longer be a good idea; such for loops can be extremely slow when iterating over a gigantic list.
So, when that day comes, how do we speed up the program? There are many techniques you will learn in your computer science education. In order to implement any optimizations, we'll almost surely have to rewrite most if not all of the above code.
For example, one way of speeding up a searching process is to use a binary search rather than a linear search. We might discuss the binary search algorithm later on in the term, but to give you a very general idea, it's a searching algorithm that searches through a sorted list to find a given value. It's much faster than a naive for loop (i.e., a linear search). However, it requires that the list be sorted ahead of time. To use a binary search to find a person with a certain name, you'd need a list of people sorted (e.g., alphabetically) by their names. And to use a binary search to find the people with a given age, you'd need another separate list of people sorted (e.g., in ascending order) by their ages. To implement this idea, we'd have to rewrite both of the above for loops, replacing them with a binary search implementation. But beyond that, we'd also have to change how we add people to the database: 1) there would now be two lists of people instead of one (one sorted by name, and one sorted by age), so whenever we want to add a person to the database, we'd have to add them to both of these lists; and 2) we'd have to make sure to insert people in their correct places within the respective lists so that the lists remain sorted (as opposed to simply appending them to the list, as we do above). Indeed, this would require rewriting all of the above code.
(There are other solutions as well, such as replacing our list with one or more dictionaries / hash tables, but that would have the exact same problem with regards to coupling.)
But suppose that there are hundreds of places throughout our codebase where we add people to one or more person databases (i.e., lines of code that look something like my_database.people.append(joe)). If we need to change how we add people to databases (e.g., because we now have two lists instead of one, and because those lists need to remain sorted), then we need to make that change everywhere. That could require changing hundreds of lines of code, then. To a novice programmer, that might seem surprising. Indeed, to the untrained eye, my_database.people.append(joe) might look fairly innocuous, but it can actually lead to a maintainability nightmare.
This is all a result of unchecked coupling. The lines of code such as my_database.people.append(joe) are tightly coupled to the representation of people within the PersonDatabase class. Hence, if we want to change that representation (e.g., to replace the unsorted people list with two separate sorted lists), we would also have to change those hundreds of lines of code that are coupled to it. Similarly, all our for loops that search through a PersonDatabase's people list to find people with a certain name or age are also tightly coupled to this representation. So changing the representation would require rewriting all those for loops as well.
In particular, this is an example of tight, pervasive coupling—many components of code sprawled throughout our entire codebase are tightly coupled to the representation of people within the PersonDatabase class. That makes it extremely difficult to change that representation; changing it would require changing the hundreds of other lines of code that are all coupled to it.
You might also notice that the PersonDatabase class is not very well encapsulated. There are countless lines of code sprawled throughout the codebase that directly operate on the people attribute (e.g., my_database.people.append(joe)), and none of those lines of code are co-located (bundled) with the people attribute (i.e., none of those lines of code are in methods of the PersonDatabase class). We'll explore this more in the next section.
I said that encapsulation can help "reduce" coupling, but what does that mean?
First, understand that some coupling is inevitable. Your functions, data, and classes exist to interact with each other. At the very least, coupling occurs between all direct dependencies. A dependency is just a component of code that another component of code depends / relies on. Dependencies are everywhere. Whenever one function A calls another function B, that's a dependency—A depends on B. Whenever a class A defines an attribute of some other class type B, that's a dependency (again, A depends on B). Even within a single function, there are plenty of dependencies. If a line of code prints the value of the variable x, then that print statement depends on x having previously been defined—that's a dependency between two lines of code. This clearly cannot be avoided.
Dependencies inevitably introduce coupling because in order to depend on something, you must interact with it. In particular, code components interact with one another through their interfaces. Definitionally, an interface is simply the part of a code component that other components need to know about in order to interact with it. Take this function, for example:
def foo(a: int, b: int) -> int:
return a + b * (a - b) ** 2.5In order to call the foo function (i.e., in order to interact with it), you need to know its name, its parameter list, its return type, the meanings of its parameters and return value, the types of any exceptions it might throw, and so on. All of these things make up the foo function's interface. This is in contrast to the foo function's implementation. An implementation is simply the parts of a code component that aren't part of its interface. In the case of functions, the implementation is given by the function body (you don't need to know how foo works in order to call it—you only need to know what it does).
Because dependencies interact with each other through their interfaces, changing a code component's interface requires changing how other components (specifically its dependents) interact with it (in contrast, changing a component's implementation generally does not require changing how other components interact with it). Keeping with our example, if I wanted to change the name of foo to bar, I'd have to change how I reference it in each and every foo function call. Or if I wanted to add a third parameter, I'd also have to add a third argument in each and every foo function call. (But if I wanted to change its body—its implementation—I would not need to change how I call it). That's all to say, coupling inherently occurs at interfaces, and dependencies inherently interact with each other through their interfaces. Therefore, dependencies inherently create coupling, and this is inevitable.
However, some coupling is essentially "unnecessary" and can be mitigated with better software design. For example, if a single block of code is copied and pasted many times over, then the coupling between it and its external dependencies is essentially replicated many times over as well. Moving the shared logic into a function and simply calling that function many times over can help reduce much of that unnecessary coupling.
Next, understand that coupling is not inherently evil; it only makes code harder to change. But remember that there are various forms and degrees of coupling. A component of code that's subject to tight, pervasive coupling is going to be much harder to change than a component of code that's subject to loose, local couplng.
With all that in mind, encapsulation can help us "reduce", or "manage" coupling in various ways:
It groups related things together (particularly, data and the behavior operating on that data), which keeps much of the relevant coupling contained in one place. This makes it easier to find all the coupled code that needs to be changed.
By grouping related things together, it makes unnecessary coupling more apparent / easier to notice, which allows us to eliminate or reduce it.
It can be enforced (strictly, in some programming languages) via private attributes and methods, solidifying all the above points.
I think the first point is fairly clear, and we'll discuss the third point in a bit, so let's focus on the second point. I think it'll make more sense with an example, so let's consider our PersonDatabase class. I said that it wasn't very well-encapsulated since it doesn't expose methods to operate on its people attribute. Instead, we simply operate on the people attribute directly from various places throughout our codebase (e.g., my_database.people.append(joe)). That's not encapsulation. Let's rewrite this class in a way that practices encapsulation. We'll start with an extremely naive rewriting of it; we'll take all the operations in our entire codebase that we're currently doing with the people attribute, and we'll move them each into their own method within the PersonDatabase class (yes, this will be painful, but bear with me):
persondatabase.pyclass PersonDatabase:
people: list[Person]
def __init__(self) -> None:
self.people = [] # Initially empty
# Create a person named "Joe", aged 42, and add them to the
# list of people
def add_joe(self) -> None:
joe = Person("Joe", 42)
self.people.append(joe)
# Create a person named "Liang", aged 23, and add them to the
# list of people
def add_liang(self) -> None:
liang = Person("Liang", 23)
self.people.append(liang)
# ... And so on
# Search for joe and print their age
def print_joes_age(self) -> None:
for p in self.people:
if p.name == "Joe":
print(f"Joe's age is {p.age}")
return
# Search for liang and print their age
def print_liangs_age(self) -> None:
for p in self.people:
if p.name == "Liang":
print(f"Liang's age is {p.age}")
return
# ... And so on
# Search for people who are 42 years old and print their names
def print_people_who_are_42(self) -> None:
print('People who are 42 years of age')
for p in self.people:
if p.age == 42:
print(f'\t{p.name}')
Now, suppose we want to create and add Joe, age 42, to my_database. Prevously, we did it like this:
joe = Person('Joe', 42)
my_database.people.append(joe)Practicing encapsulation, we'll now do it like this instead:
my_database.add_joe()Similarly, if we want to add Liang, age 23, to a person database, we can call the add_liang() method. If we want to find and print the age of Joe, we can call the print_joes_age() method. And so on.
Again, I know the above class seems ridiculous. I'll get to that in just a second. For now, the goal is simply to avoid ever operating on (using) the people attribute from anywhere in our codebase that isn't a method of the PersonDatabase class. That is, the point is to practice encapsulation—when we want to operate on the people attribute, we do so from within a co-located (bundled) method of the very same class.
Okay, let's address the elephant in the room: the above class definition is clearly ridiculous. Functions are supposed to be modular and reusable. The add_joe method is clearly not very modular nor reusable. It's only useful for one extremely specific thing: adding a person named Joe, age 42, to a given person database. The add_liang method suffers from the same problem, as do all of the print_XYZ_age methods as well as the print_people_who_are_42 method.
Indeed, these methods are ridiculous because they're not reusable, which requires us to copy and paste their internal logic whenever we want to accomplish a similar goal. But that's exactly my point: encapsulation makes all this repeated code extremely obvious. Again, to the untrained eye, my_database.people.append(joe) and my_database.people.append(liang) might not seem very problematic, even though they are (as we dicussed previously). But even a novice programmer should have no trouble recognizing how silly the print_people_who_are_42 method is.
The natural thing to do from here is refactor the above class implementation, making its methods much more modular. Let's do that:
persondatabase.pyfrom person import Person
class PersonDatabase:
people: list[Person]
def __init__(self) -> None:
self.people = [] # Initially empty
# Add any given person to the database (as opposed to always
# adding one hyperspecific person, such as Joe, age 42)
def add_person(self, p: Person) -> None:
self.people.append(p)
# Print the age of a person with a given name (as opposed to always
# searching for one hyperpsecific person)
def print_age_of_person(self, name: str) -> None:
for p in self.people:
if p.name == name:
print(f"{p.name}'s age is {p.age}")
return
# Search for people whose age matches a given value (as opposed to
# always searching for people who are one hyperspecific age, like
# 42).
def print_people_with_age(self, age: int) -> None:
print(f'People who are {age} years of age')
for p in self.people:
if p.age == age:
print(f'\t{p.name}')
Now, suppose we want to create and add Joe, age 42, to my_database. We could do that like so:
joe = Person('Joe', 42)
my_database.add_person(joe)Similarly, we could find and print the age of someone named Liang via my_database.print_age_of_person('Liang'); we could print the names of every aged 42 via my_database.print_people_with_age(42); and so on.
Now all I have to do is convince you that this program design is better than what we started with, and hopefully you'll have a newfound appreciation of encapsulation. And I think that's pretty straightforward—let's revisit our previous example wherein we suddenly need to reimplement our searching strategy, again replacing the naive for loops with some sort of binary search, replacing the people list with two separate sorted lists, changing how we add people to a person database, and so on. Recall: in order to change how people are represented in PersonDatabase, we have to change all of the code throughout our codebase that's coupled to the people attribute. In our original program design, this would have meant changing hundreds of components of code sprawled throughout or codebase—components of code such as my_database.people.append(joe), and various duplicated for loops that each searches through a database to find a person with a certain name or age.
But now, things are much simpler. Now that everything's well-encapsulated, we can be confident that the only lines of code that are directly coupled to the people attribute must exist within the PersonDatabase class. After all, that's the entire point of encapsulation: behaviors that operate on data should be co-located (bundled) with that data, meaning that all functions that operate on the person attribute should be methods of the PersonDatabase class. And look—there's only four of them. Not hundreds of them. So if we update these four methods, that's sufficient.
To be clear, encapsulation is not an inherent property of good software design. Rather, encapsulation is a tool that, in certain cases, can help manage coupling, and well-managed coupling is, in turn, a property of good software design. We'll revisit this idea at the end of the lecture.
Encapsulation can be helpful toward managing coupling, but what's to stop us (or perhaps our naive coworker, or the newly hired intern) from breaking it? The whole point is that lines of code such as my_database.people.append(joe) can be problematic because they're tightly coupled to the representation of people within the PersonDatabase class (i.e., they're tightly coupled to the people attribute). And if such lines of code are sprawled throughout the entire codebase, then the representation of people suffers from tight, pervasive coupling, making it hard to change that representation should we ever need to (e.g., to support binary search, speeding up queries). Creating modular methods within the PersonDatabase class does not solve the problem—only removing those problematic lines of code will solve the problem. Yes, my_database.add_person(joe) serves as an alternative to my_database.people.append(joe), but how do prevent us (or our coworker, or the intern) from writing my_database.people.append(joe) anyway? That is, how do we enforce encapsulation?
Enter private attributes. A private attribute is an attribute of a class that may only be accessed by methods of that very same class. The alternative to a private attribute is a public attribute, which is an attribute that may be accessed from anywhere in the codebase.
For example, if the people attribute was a private attribute of the PersonDatabase class (rather than a public attribute, as it is now), then it would only be accessible from within the four methods of the PersonDatabase class itself (__init__, add_person, print_age_of_person, and print_people_with_age). Hence, there would be no risk of statements such as my_database.people.append(joe) being sprawled across our entire codebase, breaking the encapsulation. That's to say, private attributes represent data (attributes) that can only be operated on by co-located behaviors (methods of the same class), thereby enforcing encapsulation.
In Python, a private attribute is simply an attribute whose name starts with an underscore (_). Let's update the PersonDatabase class, making the people attribute private by renaming it to _people:
persondatabase.pyfrom person import Person
class PersonDatabase:
people: list[Person]
def __init__(self) -> None:
self.people = [] # Initially empty
# Add any given person to the database (as opposed to always
# adding one hyperspecific person, such as Joe, age 42)
def add_person(self, p: Person) -> None:
self.people.append(p)
# Print the age of a person with a given name (as opposed to always
# searching for one hyperpsecific person)
def print_age_of_person(self, name: str) -> None:
for p in self.people:
if p.name == name:
print(f"{p.name}'s age is {p.age}")
return
# Search for people whose age matches a given value (as opposed to
# always searching for people who are one hyperspecific age, like
# 42).
def print_people_with_age(self, age: int) -> None:
print(f'People who are {age} years of age')
for p in self.people:
if p.age == age:
print(f'\t{p.name}')
Now, I've sort of just lied to you. In Python, there's technically no such thing as a "private attribute". However, it is well understood by the Python community, and indeed stated as guidance by the official style guide for Python code (PEP 8), that all attributes whose names start with an underscore are meant to be treated as private attributes and therefore should not be accessed from anywhere other than a method of the class that defines it (unless you're keen on getting into trouble). For example:
main.pyfrom person import Person
from persondatabase import PersonDatabase
def main() -> None:
my_database = PersonDatabase()
joe = Person('Joe', 42)
# This is technically legal but extremely ill-advised:
my_database._people.append(joe)
# This is what you should do instead:
my_database.add_person(joe)
if __name__ == '__main__':
main()
Heed the comments: you should generally avoid accessing private attributes (attributes whose names start with an underscore) from anywhere other than within a method of the class that defines those attributes. And if you go against this advice and access those attributes anyway, understand that this couples your code to those attributes' representations, and should those representations ever change, your code will break. This is especially important when interacting with code that is not your own (e.g., code from a library, or code written by a coworker). If an attribute is private, then the person who wrote that attribute does not intend for you to access it directly. If you do, and then they change the attribute's representation, breaking your code in the process, that's your fault—not theirs (it also opens you up to accidentally breaking class invariants, which would also be your fault). That's to say, ignore this advice at your own peril.
In many other object-oriented programming languages (and programming languages that support encapsulation by other means), access to private attributes is strictly protected. Indeed, in C++, Java, C#, and countless other examples, attempting to access a private attribute from anywhere other than within a method of the class defining said attribute is considered to be a syntax error. But Python's philosophy with regards to encapsulation is a bit less "strict" (a common expression in the Python community is "we're all consenting adults here", meaning that you can access private attributes if you so choose, but you should know what you're getting yourself into).
Just as a leading underscore in an attribute name indicates that it it's a private attribute, the same goes for methods: a method whose name starts with an underscore is meant to be treated as a private method. The rules for private methods are the same as those for private attributes: they should not be accessed from anywhere other than within other methods of the very same class.
"Accessed", in this case, basically means "called". Indeed, methods can be called by other methods. Hopefully that isn't too surprising given that I've never told you anything that would suggest otherwise. But here's an example just in case:
dog.pyclass Dog:
name: str
birth_year: int
def __init__(self, n: str, b: int) -> None:
# n is the dog's name, and b is the dog's birth year.
# Store them in self.name and self.birth_year
self.name = n
self.birth_year = b
def bark(self) -> None:
print('Bark! Bark!')
def print(self) -> None:
print(f'{self.name} was born in {self.birth_year}')
# Call the bark() method on the calling object. To clarify: the
# calling object is a Dog object (it must be, given that this
# print() method is a method of the Dog class). All Dog objects
# additionally have a bark() method. That's the method that
# we're calling here.
self.bark()
Given the above class, creating a Dog object, say spot, and executing spot.print() would print Spot's name and birth year to the terminal followed by "Bark! Bark!" as a result of the self.bark() call at the bottom of the Dog class's print method.
As I was saying, methods can be made private in the same way that attributes can be made private:
dog.pyclass Dog:
name: str
birth_year: int
def __init__(self, n: str, b: int) -> None:
# n is the dog's name, and b is the dog's birth year.
# Store them in self.name and self.birth_year
self.name = n
self.birth_year = b
def _bark(self) -> None:
print('Bark! Bark!')
def print(self) -> None:
print(f'{self.name} was born in {self.birth_year}')
self._bark()
This indicates that the _bark method should not be called from anywhere other than within another method of the Dog class. For example, it's okay for the Dog class's print method to in turn call the _bark method, and it's okay for, say, the main() function to call spot.print(), but it would not be okay for the main() function to directly call spot._bark().
The purpose of private methods is the same as that of private attributes. It enforces encapsulation, allowing you to change the representations of private methods (e.g., changing their names, parameter types, return types, etc) while being confident that such changes will not break anything outside the class because, outside the class, those methods should not be used directly. It can also helps with the protection of class invariants, as we'll discuss momentarily.
Important: In this course, you should never access a private attribute or method of a class from anywhere other than within a method of the very same class. Even though Python and even Mypy technically allow it, it's ill-advised, and doing so may result in a grade penalty.
Besides reigning in coupling, encapsulation, supported by private attributes and methods, helps with something else as well: it allows us to establish class invariants. To be invariant means to not change. A class invariant, then, is a property of a class—or rather, of an object—that never changes.
Consider a simple video game. The player might have a certain amount of hitpoints (HP; e.g., when it drops to zero, the player loses). Perhaps the player's HP can go up and down—up when they acquire a healing item, and down when they're attacked by an enemy. But one property should remain constant: the player's HP may never exceed some maximum value. For example, perhaps they begin the game with 10 HP, and it can never exceed that starting value. Or maybe there are opportunities for the player's max HP to be increased (e.g., by leveling up), but their HP should still not exceed their max HP at any give point in time.
To implement such a system, you'd surely need two separate variables: one to store the player's HP, and one to store their max HP. But how do you prevent the value of the HP variable from ever exceeding value of the max HP variable? Well, you have to be very careful: whenever you increase the player's HP, make sure to clip it down to the max if it would otherwise exceed that max. Fine, but if the player's HP variable is accessible from everywhere in the entire codebase, you—or your coworker, or the intern—are likely to mess up at some point. Eventually, someone, somewhere will write a line of code that says (for example) player.hp += 1, forgetting to clip it down to player.max_hp after the fact.
But encapsulation, and enforcement thereof, helps prevent such bugs by establishing class invariants. Consider this simple example of a Player class implementation:
player.py# This class should have two class invariants:
# 1. The player's HP should never exceed their max HP
# 2. The player's HP should never be negative
class Player:
_max_hp: int
_hp: int
def __init__(self, max_hp: int) -> None:
# The player should start out with a positive amount of HP.
# This helps preserve the second class invariant above.
if max_hp <= 0:
raise ValueError('max_hp must be positive!')
self._max_hp = max_hp
self._hp = max_hp # The player starts out with maximum HP
# Adjusts the player's health by the specified amount (positive
# amount heals the player, negative amount damages the player)
def adjust_health(self, amount: int) -> None:
self._hp += amount
if self._hp > self._max_hp:
# Preserve the first class invariant
self._hp = self._max_hp
elif self._hp < 0:
# Preserve the second class invariant
self._hp = 0
The above Player class has two class invariants: 1) the player's HP cannot exceed their maximum HP, and 2) the player's HP cannot be negative. And, indeed, looking at the class's methods, it seems that those invariants hold true. When a Player object is first created, the constructor requires that their starting HP (which is also their max HP) be positive. From there, the only way to modify the player's HP is via the adjust_health method, which has some if statements to ensure that their HP is not adjusted to be larger than their max HP nor smaller than 0. This means that you can be fairly confident that the program will never have any bugs that cause the player to have an invalid amount of HP.
Well, that's almost true. There is of course one way to break this class invariant: by directly accessing and modifying a player object's private _hp and / or _max_hp attributes from some other, external part of the codebase. But assuming that doesn't happen (and it shouldn't, because directly accessing private attributes from outside the class's methods is extremely ill-advised), such bugs cannot occur.
Protecting your whole codebase, including parts that will be written by other people, from an entire category of possible bugs is an extremely powerful idea. This is, again, only possible by making use of encapsulation and supporting it with private attributes and methods.
I'd be remiss if I didn't mention and extremely common "trick" when working with encapsulation and private attributes: getters and setters. A getter is simply a method that returns ("gets") the value of a private attribute contained within an object. A setter is simply a method that allows you to modify ("sets") the value of a private attribute contained within an object. In many cases, getters start with the prefix get_, and setters start with the prefix set_. But this is by no means a requirement.
Consider the following example:
circle.pyclass Circle:
_radius: float
def __init__(self, radius: float) -> None:
self._radius = radius
# Getter for the _radius attribute
def get_radius(self) -> float:
return self._radius
# Setter for the _radius attribute
def set_radius(self, value: float) -> None:
self._radius = value
The _radius attribute is private to the Circle class, meaning that we're not supposed to directly access it from anywhere other than within methods of the Circle class itself. But what if we want to be able to access it in such places, perhaps for arbitrary / flexible use cases? Then we can use the Circle class's getters and setters (they're public methods—not private ones—so we can call them from anywhere):
main.pyfrom circle import Circle
def main() -> None:
# We create a circle with radius 5
c = Circle(5.0)
# Later, suppose we want to change the circle's radius. We
# can't access c._radius directly, but we can call c.set_radius()
# to modify it
c.set_radius(10.0)
# Later, suppose we want to get the circle's radius, such as to
# print it. We can't access c._radius directly, but we can call
# c.get_radius() to retrieve its current value
print(c.get_radius()) # Prints 10.0
if __name__ == '__main__':
main()
A common question is, "what's the difference between making an attribute private that's accessible via getters and setters versus simply making it public to begin with?" It's an important question. Indeed, getters and setters can undo some of the benefits of encapsulation (some people even say they "break" encapsulation outright). That shouldn't be surprising; the idea of encapsulation is to only access attributes from within co-located methods, and making the attributes private supports that goal. Meanwhile, getters and setters allow near arbitrary access to an otherwise private attribute from anywhere in the entire codebase—that sounds like the opposite of encapsulation.
However, making an attribute private and accessing it via getters and setters isn't quite the same thing as making it public. For example, suppose that, one day, I decide that I want to modify the Circle class, replacing its _radius attribute with a _diameter attribute. This seems like an innocuous change, but once again, this will break every line of code in the codebase that directly accesses the _radius attribute. If the _radius member were public and accessed from all over the place, that could break hundreds or thousands of lines of code. But in this case, it just breaks three lines of code: one in each of the three Circle class methods. And we can fix those lines of code easily:
circle.pyclass Circle:
_diameter: float
def __init__(self, radius: float) -> None:
self._diameter = radius * 2 # Diameter = 2 * specified radius
# Getter for the radius
def get_radius(self) -> float:
return self._diameter / 2 # Radius = diameter / 2
# Setter for the radius
def set_radius(self, value: float) -> None:
self._diameter = value * 2 # Diameter = 2 * specified radius
Indeed, although the Circle class no longer has an attribute to store its radius, that's not to say that a circle's radius can't be computed from its other attributes, or that its other attributes can't be computed from a given radius (in this case, its _diameter attribute can be converted to or from a radius value). Hence, the get_radius and set_radius methods can be kept around, and we didn't even need to change their headers (their names, parameter lists, return types, etc); we just needed to reimplement their bodies.
In this small example, this means that main.py doesn't need to be updated whatsoever. It still works exactly as written before. Here it is again for your convenience:
main.pyfrom circle import Circle
def main() -> None:
# We create a circle with radius 5
c = Circle(5.0)
# Later, suppose we want to change the circle's radius. We
# can't access c._radius directly, but we can call c.set_radius()
# to modify it
c.set_radius(10.0)
# Later, suppose we want to get the circle's radius, such as to
# print it. We can't access c._radius directly, but we can call
# c.get_radius() to retrieve its current value
print(c.get_radius()) # Prints 10.0
if __name__ == '__main__':
main()
That may seem like a small deal. But in a much larger program, it's a huge deal—only needing to make changes to the Circle class, and being able to leave the entire rest of our codebase intact, is hugely beneficial for maintainability.
This is proof that using private attributes and accessing them via getters and setters can be better than using public attributes when it comes to coupling. However, again, getters and setters do still reduce the benefits of encapsulation because, in general, they provide nearly the same kind of unrestricted access as simply making the attributes public. This can, in some cases, open up the attribute to loose but pervasive coupling with the rest of the codebase. We didn't run into any issues in the above example, but there's a simple reason for that: when we changed the getters and setters, we only changed their implementations—not their interfaces. As we discussed earlier, coupling occurs at interfaces, so changing an implementation while leaving the interface alone typically does not require making any changes to external dependents.
It's not too hard to construct alternative examples where getters and setters can cause major problems, though. As a simple one, suppose that you want to remove an attribute from a class altogether, perhaps because you've decided that you no longer need to store its information anywhere. In such a case, if you have a getter for the attribute, then the getter cannot simply be reimplemented—it'd have to be removed entirely, which clearly constitutes an interface change (information cannot be retrieved, or "gotten", if it does not exist). But removing the getter would break all lines of code that call it, and since getters are (typically) public methods, they might be called from hundreds or thousands of places sprawled throughout the codebase. Indeed, this is a kind of coupling issue that can be caused by getters (setters are often even worse).
(Getters and setters can also break a class's invariants, especially if you're not careful about how you implement them.)
Still, getters and setters can be useful tools; don't treat them as if they're inherently evil. As in our Circle example, they can occasionally offer some useful protection against coupling (when compared to public attributes), even while providing nearly arbitrary access to an object's internal state. Nearly arbitrary access can be useful in some contexts (e.g., when implementing custom data structures; getters and setters are basically unavoidable in such cases since arbitrary access to a data structure's internal contents is necessary by definition).
Given their complicated tradeoffs, exactly when to define getters and setters (and how to design "good classes" in general) is a complicated question, and the answer is largely a matter of opinion, so that discussion is beyond the scope of this course. But if you're curious, I have tons of thoughts on the matter, and I'd be thrilled to have a conversation with you in my office hours. And if you like to research on your own, I recommend starting with: 1) the difference between interface and implementation; 2) "Tell Dont Ask" (e.g., see Martin Fowler's article, which I'm partial to); and 3) SOLID.