Python's Basic Data Structures

Written by Alex Guyer | guyera@oregonstate.edu

This lecture will teach you about some of Python's basic built-in collections and data structures (beyond lists). We'll cover the following:

Tuples
Sets
Dictionaries
Type errors with collections

Tuples

A tuple is a group of values. In Python, tuples are immutable and fixed in size, meaning that once a tuple has been created, you cannot modify the values in it, nor can you add or remove values to / from it. This is in contrast to lists, which are both mutable (modifiable) and resizable.

Because tuples are immutable and fixed in size, they're usually only used to store a small number of (e.g., two or three) related values. Lists, in contrast, are often used to store hundreds, thousands, or even millions of values. However, this is not a hard and fast rule. Tuples can technically be as large as you want them to be, and lists can even be explicitly type-cast into tuples (and vice-versa).
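As a small sketch of the list / tuple conversions mentioned above: the built-in tuple() and list() functions each build a new collection from an existing one. The function names and the scores data below are made up for illustration, and the tuple[int, ...] annotation (meaning "a tuple of any number of integers") is an assumption that goes slightly beyond the annotation syntax covered in this lecture.

```python
def list_to_tuple(values: list[int]) -> tuple[int, ...]:
    # tuple() builds a new tuple containing the list's values, in order
    return tuple(values)

def tuple_to_list(values: tuple[int, ...]) -> list[int]:
    # list() builds a new list containing the tuple's values, in order
    return list(values)

def main() -> None:
    scores = [90, 85, 77]  # Hypothetical example data
    scores_as_tuple = list_to_tuple(scores)
    print(scores_as_tuple)  # Prints (90, 85, 77)

    # Converting back produces a regular (mutable, resizable) list
    scores_as_list = tuple_to_list(scores_as_tuple)
    scores_as_list.append(100)
    print(scores_as_list)  # Prints [90, 85, 77, 100]

if __name__ == '__main__':
    main()
```

Note that neither conversion modifies the original collection; each one creates a new object.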

There are various ways to create tuples in Python. A simple way is to write out a comma-separated list of values in between a pair of parentheses. The interpreter will automatically convert the entire grouping into a single tuple containing all of the provided values. For example:

tuples.py

def main() -> None:
    my_tuple = ('James', 25, 'Strawberries')

if __name__ == '__main__':
    main()

Once a tuple has been created, you can access the values within it using square brackets and an index, just like you would with a list:

tuples.py

def main() -> None:
    my_tuple = ('James', 25, 'Strawberries')
    print(my_tuple[0]) # Prints James
    print(my_tuple[1]) # Prints 25
    print(my_tuple[2]) # Prints Strawberries

if __name__ == '__main__':
    main()

Running the above program produces the following output:

(env) $ python tuples.py 
James
25
Strawberries

You might have noticed that my_tuple from the above example is heterogeneous, meaning that it contains values of different types (it contains a str value, an int value, and another str value). Indeed, although it's uncommon to find a good use case for a heterogeneous list, heterogeneous tuples are incredibly common. This is because a tuple is usually used to group a few pieces of related data—not to construct an ordered sequence of potentially thousands of different instances of the same kind of thing. For example, a tuple might store a few pieces of information about a person (e.g., their name, age, and favorite fruit, as in the above example), and those pieces of information might not be of the same type. A list, in contrast, might store an ordered sequence of a thousand people, or a thousand people's names, or a thousand people's ages, etc—every element of the list is of the same type.

As with lists, attempting to index a tuple with an out-of-bounds index results in an exception being thrown (specifically an IndexError), causing the program to crash if the exception isn't caught.
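Here's a minimal sketch of what catching that IndexError might look like. The lookup() function is hypothetical (it isn't part of the lecture's code samples), and it assumes you're comfortable with try / except. The index is passed in as a parameter rather than written as a literal so that the out-of-bounds access happens at runtime.

```python
def lookup(index: int) -> str:
    my_tuple = ('James', 25, 'Strawberries')
    try:
        # Valid indices for a three-element tuple are 0, 1, and 2
        return f'Value: {my_tuple[index]}'
    except IndexError:
        # Reached only when the index is out of bounds
        return f'Index {index} is out of bounds'

def main() -> None:
    print(lookup(1))  # Prints Value: 25
    print(lookup(3))  # Prints Index 3 is out of bounds

if __name__ == '__main__':
    main()
```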

Again, although you can access the values within a tuple, you cannot modify the values within a tuple. Attempting to do so also results in an exception being thrown (specifically a TypeError). And, like all other exceptions, this causes your program to crash if the exception isn't caught:

cant_modify_tuple.py

def main() -> None:
    my_tuple = ('James', 25, 'Strawberries')
    print(my_tuple[0]) # Prints James
    print(my_tuple[1]) # Prints 25
    print(my_tuple[2]) # Prints Strawberries

    # Try to change 'James' to 'Jesse'. This isn't allowed. Crashes the
    # program
    my_tuple[0] = 'Jesse' 

if __name__ == '__main__':
    main()

Running the above program produces the following output:

(env) $ python cant_modify_tuple.py 
James
25
Strawberries
Traceback (most recent call last):
  File "/home/alex/instructor/static-content/guyera.github.io/code-samples/pythons-basic-data-structures/cant_modify_tuple.py", line 12, in <module>
    main()
    ~~~~^^
  File "/home/alex/instructor/static-content/guyera.github.io/code-samples/pythons-basic-data-structures/cant_modify_tuple.py", line 9, in main
    my_tuple[0] = 'Jesse'
    ~~~~~~~~^^^
TypeError: 'tuple' object does not support item assignment

A tuple can be unpacked into a set of individual variables. Those variables then refer to the respective values within the tuple. There are various contexts in which a tuple can be unpacked. The simplest is via assignment. Suppose you have a tuple called my_tuple with three elements (values) inside it. Then those three values can be unpacked into three individual variables like so:

my_first_variable, my_second_variable, my_third_variable = my_tuple

Here's a complete example:

unpacking.py

def main() -> None:
    my_tuple = ('James', 25, 'Strawberries')
    print(my_tuple[0]) # Prints James
    print(my_tuple[1]) # Prints 25
    print(my_tuple[2]) # Prints Strawberries

    print('-------------------------')
    print('Unpacking tuple correctly:')
    
    # Unpack the three tuple values into three variables: name, age,
    # and favorite_fruit.
    name, age, favorite_fruit = my_tuple
    print(f'name: {name}') # Prints James
    print(f'age: {age}') # Prints 25
    print(f'favorite_fruit: {favorite_fruit}') # Prints Strawberries

    print('-------------------------')
    print('Unpacking tuple incorrectly:')

    # Tuples are always unpacked left-to-right. my_tuple has three
    # elements: 'James', 25, and 'Strawberries'. So those three
    # values will be unpacked into the three respective variables
    # in that exact order. If you mess up the order of your variables,
    # the wrong values will be unpacked into them:
    favorite_fruit, age, name = my_tuple
    print(f'name: {name}') # Prints Strawberries
    print(f'age: {age}') # Prints 25
    print(f'favorite_fruit: {favorite_fruit}') # Prints James


if __name__ == '__main__':
    main()

Running the above program produces the following output:

(env) $ python unpacking.py 
James
25
Strawberries
-------------------------
Unpacking tuple correctly:
name: James
age: 25
favorite_fruit: Strawberries
-------------------------
Unpacking tuple incorrectly:
name: Strawberries
age: 25
favorite_fruit: James

As demonstrated in the above example, it's easy to mess up when unpacking tuples. The values are always unpacked from the tuple in the order of their indices, and those unpacked values are stored in the respective variables in left-to-right order. Listing the variables in the wrong order, then, can result in those variables storing the wrong values. If you're lucky, it can sometimes introduce type errors, which Mypy is capable of detecting (e.g., if name is already defined as a string variable, but you accidentally list it as the second variable when unpacking my_tuple, that would try to store the value 25 in it, which would change the type of name from str to int—Mypy doesn't allow that). But in some cases, Mypy is not capable of detecting these kinds of mistakes, especially if two or more values in the tuple are of the same type.

Besides unpacking values from a tuple in the wrong order, another common mistake is unpacking the wrong number of values. For example, suppose we have a tuple of three values, but we try to unpack it into just two variables:

bad_unpacking.py

def main() -> None:
    my_tuple = ('James', 25, 'Strawberries')
    print(my_tuple[0]) # Prints James
    print(my_tuple[1]) # Prints 25
    print(my_tuple[2]) # Prints Strawberries

    name, age = my_tuple


if __name__ == '__main__':
    main()

Luckily, Mypy is capable of catching the above mistake:

(env) $ mypy bad_unpacking.py 
bad_unpacking.py:7: error: Too many values to unpack (2 expected, 3 provided)  [misc]
Found 1 error in 1 file (checked 1 source file)

If you ignore Mypy's errors and try to run the above program anyways, it throws an exception (specifically a ValueError) and crashes, assuming the exception isn't caught:

(env) $ python bad_unpacking.py 
James
25
Strawberries
Traceback (most recent call last):
  File "/home/alex/instructor/static-content/guyera.github.io/code-samples/pythons-basic-data-structures/bad_unpacking.py", line 11, in <module>
    main()
    ~~~~^^
  File "/home/alex/instructor/static-content/guyera.github.io/code-samples/pythons-basic-data-structures/bad_unpacking.py", line 7, in main
    name, age = my_tuple
    ^^^^^^^^^
ValueError: too many values to unpack (expected 2)

A similar exception is thrown if you try to unpack a tuple into too many variables.
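To illustrate the "too many variables" case, here's a hedged sketch that tries to unpack a three-element tuple into four variables and reports the resulting ValueError. The try_unpack_four() function is hypothetical, and its parameter is annotated with a bare tuple (an assumption made so that the mistake only shows up at runtime instead of being flagged by Mypy first).

```python
def try_unpack_four(values: tuple) -> str:
    try:
        # Four variables on the left, so the tuple must contain exactly
        # four values for this to succeed
        first, second, third, fourth = values
    except ValueError as error:
        return f'Unpacking failed: {error}'
    return 'Unpacked successfully'

def main() -> None:
    my_tuple = ('James', 25, 'Strawberries')
    # my_tuple only has three values, so this reports a failure
    print(try_unpack_four(my_tuple))

if __name__ == '__main__':
    main()
```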

Suppose you want to write a function that accepts a tuple as a parameter or returns a tuple. In that case, you'll need to know how to explicitly type-annotate tuples so that they're accepted by Mypy. The syntax for a tuple type annotation is as follows:

tuple[<type1>, <type2>, <type3>, ..., <typeN>]

Replace each of the <typeX> instances with the type of the element in the tuple at the corresponding position.

For example, suppose you want to write a function named foo() that has a single parameter x, which is in turn a tuple that consists of a string, followed by an integer, followed by another string. You could define foo() like so:

def foo(x: tuple[str, int, str]) -> None:
    # Do something interesting with x...
    pass

Here's a more complete example that defines a function named quadratic_formula that returns the two roots of a quadratic equation in a single tuple:

tuple_typing.py

from math import sqrt

def quadratic_formula(
        a: float,
        b: float,
        c: float) -> tuple[float, float]:
    first_root = (-b - sqrt(b**2 - 4*a*c)) / (2 * a)
    second_root = (-b + sqrt(b**2 - 4*a*c)) / (2 * a)
    return (first_root, second_root) # Return the roots as a tuple

def main() -> None:
    # Consider the quadratic equation 4x^2 + 2x - 3 = 0. Under the
    # standard pattern (ax^2 + bx + c), that means:
    # a = 4, b = 2, c = -3.

    # Compute the roots using the quadratic formula:
    roots = quadratic_formula(4, 2, -3)

    # Unpack the tuple
    first_root, second_root = roots

    print('The roots of the equation "4x^2 + 2x - 3 = 0" are:')
    print(f'x1: {first_root}')
    print(f'x2: {second_root}')

if __name__ == '__main__':
    main()

Running the above program produces the following output:

(env) $ python tuple_typing.py 
The roots of the equation "4x^2 + 2x - 3 = 0" are:
x1: -1.1513878188659974
x2: 0.6513878188659973

We can actually clean up the above code in a couple of ways:

When a tuple is used as a return value, you can optionally leave out the parentheses. For example, to return a tuple consisting of the variables a, b, and c, you can simply write return a, b, c as opposed to return (a, b, c). Leaving out the parentheses in such a case is more idiomatic / conventional, so we should do that.
We're currently storing the returned tuple in a variable named roots and then unpacking it into first_root and second_root. But, as you might have surmised, there's no reason that we couldn't directly unpack the return value into first_root and second_root, skipping the need for the roots variable altogether. This is also more idiomatic, so we should do this as well.

Here's the updated code taking into account the above changes:

tuple_typing.py

from math import sqrt

def quadratic_formula(
        a: float,
        b: float,
        c: float) -> tuple[float, float]:
    first_root = (-b - sqrt(b**2 - 4*a*c)) / (2 * a)
    second_root = (-b + sqrt(b**2 - 4*a*c)) / (2 * a)
    return first_root, second_root # Return the roots as a tuple

def main() -> None:
    # Consider the quadratic equation 4x^2 + 2x - 3 = 0. Under the
    # standard pattern (ax^2 + bx + c), that means:
    # a = 4, b = 2, c = -3.

    # Compute the roots using the quadratic formula:
    x1, x2 = quadratic_formula(4, 2, -3)

    print('The roots of the equation "4x^2 + 2x - 3 = 0" are:')
    print(f'x1: {x1}')
    print(f'x2: {x2}')

if __name__ == '__main__':
    main()

(Running it does the same thing as before)

Lastly, it's technically possible to iterate over the elements of a tuple using a for loop in much the same way that you can iterate over the elements of a list:

tuple_iteration.py

def main() -> None:
    my_tuple = ('James', 25, 'Strawberries')
    for elem in my_tuple:
        print(elem)

if __name__ == '__main__':
    main()

Running the above program produces the following output:

James
25
Strawberries

While the above program passes Mypy's type checks and runs just fine, there is something about it that might surprise you: the type of the variable named elem changes throughout the for loop. In the first iteration, it's a string ('James'). In the second iteration, it's an integer (25). In the third iteration, it's a string again ('Strawberries'). I've previously told you that Mypy generally doesn't allow a variable's type to change throughout a program. Although that's generally true, there are some nuances surrounding this rule. For now, you do not need to understand exactly why Mypy allows this in this case. It will likely make more sense later on in the term once we've covered dynamic types and polymorphism.

(For the curious reader: Mypy infers elem's static type to be some sort of union type, such as Union[str, int], which allows its static type to be fixed even as its dynamic type changes. Indeed, Mypy is a static type checker; it forbids a variable's static type from changing, but a variable's dynamic type is allowed to change so long as it remains compatible with its fixed static type).

Sets

A set is a collection of unique values. If you'd like, you can think of a set as being similar to a list, except 1) a given value may only appear at most once within a set (as opposed to lists wherein a given value may appear arbitrarily many times), and 2) the values within a set do not have user-specified positions (e.g., there are no indices, so there is no "element 0", nor "element 1", nor "element 2", etc). Some people describe sets as "bags of unique data". The "bag of data" description helps illustrate the nonpositional nature of sets.

Since the values within a set do not have indices, you might be wondering how to access a value within a set. Well, you usually don't. There are three operations that are commonly done on sets:

Add a value to the set
Remove a value from the set
Check whether a given value is already present in the set

Notice that "Access the Nth element in the set" is not one of the above operations. Again, sets are nonpositional; there is no "Nth element".

Moreover, while sets themselves are mutable (i.e., they can be changed, such as by adding values to them and removing values from them), the values within a set are immutable. If you want to change a value within a set, you must instead remove the value and add a new one.

Lists can do all of the above operations as well and more. However, sets are particularly efficient at the third operation—checking whether a given value is present in the set. They're much faster at this than a list is, especially if you're dealing with an extremely large collection of values.

Sets have various purposes. They can be used to keep track of all the unique values that are observed throughout a task; they can be used to remove duplicates from a list; and so on.

To create a set in Python, simply type out a comma-separated list of values enclosed in curly braces ({}) as opposed to square brackets (which would create a list instead). For example:

sets.py

def main() -> None:
    # Create a set of integers denoting all of the recent leap years
    recent_leap_years = {1996, 2000, 2004, 2008, 2012, 2016, 2020}

    print(recent_leap_years)

if __name__ == '__main__':
    main()

Running the above program produces the following output (or similar):

(env) $ python sets.py 
{2016, 2000, 2020, 2004, 2008, 2012, 1996}

Notice that the values were not printed in the same order that they were specified when the set was created. Again, this is because sets are nonpositional. They are simply "bags of unique data". In fact, there's no guarantee as to what order the above values will be printed in whatsoever.

You can add a value to a set using the .add() method. You can remove a value from a set using the .remove() method. You can check if a value is present in a set in the same way you do with a list: using the in operator. Let's update our example:

sets.py

def main() -> None:
    # Create a set of integers denoting all of the recent leap years
    recent_leap_years = {1996, 2000, 2004, 2008, 2012, 2016, 2020}

    # Add 2024 to the set
    recent_leap_years.add(2024)

    # Remove 1996 from the set
    recent_leap_years.remove(1996)

    # Print the set
    print(recent_leap_years)

    # Ask user for a year:
    chosen_year = int(input('Specify a year later than 1999: '))

    # Check if the user's specified year is in the set
    if (chosen_year in recent_leap_years):
        print('The specified year was a recent leap year')
    else:
        print('The specified year was NOT a recent leap year')

if __name__ == '__main__':
    main()

Here's one example run of the above program:

(env) $ python sets.py 
{2016, 2000, 2020, 2004, 2008, 2024, 2012}
Specify a year later than 1999: 1996
The specified year was NOT a recent leap year

Here's another:

(env) $ python sets.py 
{2016, 2000, 2020, 2004, 2008, 2024, 2012}
Specify a year later than 1999: 2024
The specified year was a recent leap year

Because the values in a set must be unique (i.e., each value can appear at most once), if you try to add a value to a set that's already present in the set, nothing happens:

sets.py

def main() -> None:
    # Create a set of integers denoting all of the recent leap years
    recent_leap_years = {1996, 2000, 2004, 2008, 2012, 2016, 2020}

    # Add 2024 to the set
    recent_leap_years.add(2024)

    # Try to add 2024 to the set again. This does NOTHING.
    recent_leap_years.add(2024)

    # Remove 1996 from the set
    recent_leap_years.remove(1996)

    # Print the set
    print(recent_leap_years)

    # Ask user for a year:
    chosen_year = int(input('Specify a year later than 1999: '))

    # Check if the user's specified year is in the set
    if (chosen_year in recent_leap_years):
        print('The specified year was a recent leap year')
    else:
        print('The specified year was NOT a recent leap year')

if __name__ == '__main__':
    main()

However, if you try to remove a value from a set that's not present in the set, an exception is thrown (specifically a KeyError), which causes the program to crash if it's not caught:

sets.py

def main() -> None:
    # Create a set of integers denoting all of the recent leap years
    recent_leap_years = {1996, 2000, 2004, 2008, 2012, 2016, 2020}

    # Add 2024 to the set
    recent_leap_years.add(2024)

    # Try to add 2024 to the set again. This does NOTHING.
    recent_leap_years.add(2024)

    # Remove 1996 from the set
    recent_leap_years.remove(1996)

    # Try to remove 1996 from the set again. This throws a KeyError,
    # causing the program to crash if it's not caught.
    recent_leap_years.remove(1996)

    # Print the set
    print(recent_leap_years)

    # Ask user for a year:
    chosen_year = int(input('Specify a year later than 1999: '))

    # Check if the user's specified year is in the set
    if (chosen_year in recent_leap_years):
        print('The specified year was a recent leap year')
    else:
        print('The specified year was NOT a recent leap year')

if __name__ == '__main__':
    main()

Running the above program produces the following output:

(env) $ python sets.py 
Traceback (most recent call last):
  File "/home/alex/instructor/static-content/guyera.github.io/code-samples/pythons-basic-data-structures/sets.py", line 31, in <module>
    main()
    ~~~~^^
  File "/home/alex/instructor/static-content/guyera.github.io/code-samples/pythons-basic-data-structures/sets.py", line 16, in main
    recent_leap_years.remove(1996)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
KeyError: 1996
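One way to avoid that KeyError is to check whether the value is present (using the in operator) before calling .remove(). The sketch below wraps that check in a hypothetical helper named remove_if_present(); the name and the boolean return value are assumptions made for illustration.

```python
def remove_if_present(years: set[int], year: int) -> bool:
    # Only call .remove() when the value is actually in the set, so
    # that no KeyError can be raised
    if year in years:
        years.remove(year)
        return True
    return False

def main() -> None:
    recent_leap_years = {1996, 2000, 2004}

    print(remove_if_present(recent_leap_years, 1996))  # Prints True
    # 1996 is no longer in the set, so this does nothing
    print(remove_if_present(recent_leap_years, 1996))  # Prints False

if __name__ == '__main__':
    main()
```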

If you want to create an empty set, you unfortunately can't just use an empty pair of curly braces. Python does accept an empty pair of curly braces, but it creates an empty dictionary (discussed later in this lecture) rather than an empty set. To create an empty set, use the built-in set() function instead, passing in no arguments. For example:

my_cool_set = set()

You could then proceed to add values to my_cool_set using the .add() method, as per usual.
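As a quick check, the following sketch prints the types of the objects produced by {} and set(). The variable name braces and the values added to each collection are made up for illustration.

```python
def main() -> None:
    # An empty pair of curly braces creates an empty dictionary
    braces = {}
    braces['example'] = 1

    # set() with no arguments creates an empty set
    my_cool_set = set()
    my_cool_set.add(2024)

    print(type(braces))       # Prints <class 'dict'>
    print(type(my_cool_set))  # Prints <class 'set'>
    print(my_cool_set)        # Prints {2024}

if __name__ == '__main__':
    main()
```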

To type-annotate a set, use the following syntax:

set[<type>]

Replace <type> with the type of value that you want to put in the set. For example, the recent_leap_years set in the previous examples could be type-annotated as set[int] because it's a set that contains integers. This type annotation syntax implies that sets are generally homogeneous. And, indeed, they are. Much like lists, Mypy expects sets to be homogeneous even though Python technically allows them to be heterogeneous. Creating and using heterogeneous sets requires more complicated type annotation syntax, and it's usually ill-advised.

Here's an example program that uses a set type annotation for a function return type:

set_type_annotation.py

# Given a list of names, find the UNIQUE names within that
# list and return them as a set
def get_unique_names(names: list[str]) -> set[str]:
    result = set() # Initially empty
    for name in names:
        # Add each name to the set, or do nothing if it's already in
        # the set
        result.add(name)

    return result
    

def main() -> None:
    # Notice: Mahatma is present in the list twice
    names = ['Mahatma', 'Aditya', 'Mohammad', 'Samantha', 'Richard',
             'Mahatma', 'John']

    unique_names = get_unique_names(names)

    # Mahatma will only be present in this set once
    print(unique_names)


if __name__ == '__main__':
    main()

Running the above program produces the following output (or similar):

(env) $ python set_type_annotation.py 
{'Mohammad', 'Aditya', 'Mahatma', 'Richard', 'John', 'Samantha'}

Python supports type-casting a list into a set and vice-versa. Type-casting a list into a set creates a set containing all of the unique values from the list, and type-casting a set into a list creates a list containing all of the same values as the set. Here's the syntax:

my_cool_list = list(my_set) # Creates a list from my_set
my_cool_set = set(my_list) # Creates a set from my_list

As you might have noticed, this makes the get_unique_names() function in the previous example pretty silly. We could accomplish the same objective with a simple type cast:

set_type_casting.py

def main() -> None:
    # Notice: Mahatma is present in the list twice
    names = ['Mahatma', 'Aditya', 'Mohammad', 'Samantha', 'Richard',
             'Mahatma', 'John']

    unique_names = set(names)

    # Mahatma will only be present in this set once
    print(unique_names)


if __name__ == '__main__':
    main()

The above program does the same thing as the previous example, but notice that it no longer needs a get_unique_names() function.

You can iterate over the values of a set using a for loop, just like you can with a list or a tuple. However, because sets are nonpositional, there's no guarantee as to the order in which the values will be iterated.
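Here's a short sketch of iterating over a set. If you need a predictable order, one option is the built-in sorted() function, which returns a new list containing the set's values in ascending order (sorted() isn't covered elsewhere in this lecture, and the years_in_order() helper is hypothetical).

```python
def years_in_order(years: set[int]) -> list[int]:
    # sorted() returns a new, sorted list; the set itself is unchanged
    return sorted(years)

def main() -> None:
    recent_leap_years = {2016, 2000, 2020, 2004}

    # The order of this loop is not guaranteed
    for year in recent_leap_years:
        print(year)

    print('-------------------------')

    # This loop always goes from the smallest year to the largest
    for year in years_in_order(recent_leap_years):
        print(year)

if __name__ == '__main__':
    main()
```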

Lastly, understand that everything that can be done with a set can also be done with a list, but sets are better at certain things than lists are (particularly at checking whether a given value is present).

Dictionaries

A dictionary is a map from keys to values. There are four common operations done on dictionaries:

Insert a key-value pair into the dictionary
Access the value associated with a given key in the dictionary (either to modify or simply to retrieve it)
Check whether a given key is present in the dictionary
Remove a key-value pair from the dictionary

The purpose of a dictionary is to build an association, or mapping, that makes it easy to perform lookups. For example, suppose your program commonly needs to look up the age of a person given their name. A simple way to accomplish this would be to create a dictionary that maps names to ages. In that case, the names would be the keys, and the ages would be the values.

The syntax for creating a dictionary in Python is as follows:

my_dictionary = {<key1>: <value1>, <key2>: <value2>, ..., <keyN>: <valueN>}

Replace each <keyX> with a key and <valueX> with the value associated with that key.

Here's my previous example written out in code:

dictionaries.py

def main() -> None:
    ages_of_people = {
        'John': 46,
        'Mahatma': 72,
        'Aditya': 34
    }

if __name__ == '__main__':
    main()

To retrieve a value associated with a given key within a dictionary, type the name of the dictionary followed by the given key enclosed in square brackets. Notice that this is the same syntax for indexing a list, except rather than putting an index in between the square brackets, you put a key (a key is actually sometimes referred to as an index for this reason). Let's update our example:

dictionaries.py

def main() -> None:
    ages_of_people = {
        'John': 46,
        'Mahatma': 72,
        'Aditya': 34
    }

    chosen_name = input('Whose age would you like to look up?: ')

    # Retrieve the age of the person with the specified name from the
    # dictionary
    associated_age = ages_of_people[chosen_name]

    # Print their age
    print(f"That person's age is {associated_age}")

if __name__ == '__main__':
    main()

Here's an example run:

(env) $ python dictionaries.py 
Whose age would you like to look up?: John
That person's age is 46

Attempting to look up a key that's not present within the dictionary results in an exception being thrown (specifically a KeyError), causing the program to crash if it's not caught:

(env) $ python dictionaries.py 
Whose age would you like to look up?: Joseph
Traceback (most recent call last):
  File "/home/alex/instructor/static-content/guyera.github.io/code-samples/pythons-basic-data-structures/dictionaries.py", line 18, in <module>
    main()
    ~~~~^^
  File "/home/alex/instructor/static-content/guyera.github.io/code-samples/pythons-basic-data-structures/dictionaries.py", line 12, in main
    associated_age = ages_of_people[chosen_name]
                     ~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'Joseph'

Key-value lookups are case-sensitive, meaning that if the user typed in 'john' (with a lowercase 'j' instead of an uppercase 'J'), the same KeyError would have been raised. This is actually the case with lookups using the in operator as well, including with lists, sets and other built-in collection types.
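The sketch below demonstrates that case-sensitivity using the in operator on a list of names. The is_known() function is a hypothetical helper written just for this illustration.

```python
def is_known(names: list[str], name: str) -> bool:
    # The in operator compares strings exactly, including their case
    return name in names

def main() -> None:
    names = ['John', 'Mahatma', 'Aditya']
    print(is_known(names, 'John'))  # Prints True
    print(is_known(names, 'john'))  # Prints False

if __name__ == '__main__':
    main()
```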

The syntax for adding a new key-value pair to a dictionary is as follows:

my_dictionary[<key>] = <value>

Replace <key> with the desired key and <value> with the value that you'd like to associate with that key. Notice that this is exactly the same syntax as is used for key-value lookups, except that the dictionary and its key appear on the left side of an assignment operator. This indicates to Python that you're trying to store a key-value pair in the dictionary rather than retrieve a value from it. Hence, it does not raise a KeyError, even if the key isn't already present.

Let's update our example:

dictionaries.py

def main() -> None:
    ages_of_people = {
        'John': 46,
        'Mahatma': 72,
        'Aditya': 34
    }

    # Add another person and their age to the dictionary:
    ages_of_people['Mohammad'] = 21

    chosen_name = input('Whose age would you like to look up?: ')

    # Retrieve the age of the person with the specified name from the
    # dictionary
    associated_age = ages_of_people[chosen_name]

    # Print their age
    print(f"That person's age is {associated_age}")

if __name__ == '__main__':
    main()

Here's an example run:

(env) $ python dictionaries.py 
Whose age would you like to look up?: Mohammad
That person's age is 21

The keys in a dictionary are immutable, but the values in a dictionary are not. That's to say, you cannot modify a key once it has been added to a dictionary, but you can modify the value in a dictionary associated with a given key. Funny enough, the syntax for modifying a value associated with a given key is identical to the syntax for adding a new key-value pair to the dictionary. In other words, my_dictionary[<key>] = <value> will either a) change the value associated with the given key (<key>) to the specified value (<value>), or b) create a new key-value pair. If the dictionary already contains a key-value pair with the specified key, it does the former. Otherwise, it does the latter. Here's an updated example:

dictionaries.py

def main() -> None:
    ages_of_people = {
        'John': 46,
        'Mahatma': 72,
        'Aditya': 34
    }

    # Add another person and their age to the dictionary:
    ages_of_people['Mohammad'] = 21

    # Change Mohammad's age to 22 (notice---it's exactly the same
    # syntax as above!)
    ages_of_people['Mohammad'] = 22

    chosen_name = input('Whose age would you like to look up?: ')

    # Retrieve the age of the person with the specified name from the
    # dictionary
    associated_age = ages_of_people[chosen_name]

    # Print their age
    print(f"That person's age is {associated_age}")

if __name__ == '__main__':
    main()

You often need to check whether a key is present in a given dictionary before attempting to look up the associated value (otherwise, if the key isn't present, attempting to look up the key-value pair results in a KeyError being raised, as we saw a few moments ago). The syntax for checking whether a key is present in a dictionary is identical to the syntax for checking whether a value is present in a set (or list, or tuple). Simply use the in operator:

dictionaries.py

def main() -> None:
    ages_of_people = {
        'John': 46,
        'Mahatma': 72,
        'Aditya': 34
    }

    # Add another person and their age to the dictionary:
    ages_of_people['Mohammad'] = 21

    # Change Mohammad's age to 22 (notice---it's exactly the same
    # syntax as above!)
    ages_of_people['Mohammad'] = 22

    chosen_name = input('Whose age would you like to look up?: ')

    # When used on a dictionary, the 'in' operator checks whether
    # the given KEY is present
    if chosen_name in ages_of_people:
        associated_age = ages_of_people[chosen_name]

        print(f"That person's age is {associated_age}")
    else:
        print(f"Sorry! I don't know the age of {chosen_name}")

if __name__ == '__main__':
    main()

Specifying an unknown name at runtime no longer results in a KeyError being raised:

(env) $ python dictionaries.py 
Whose age would you like to look up?: Joseph
Sorry! I don't know the age of Joseph

You can create an empty dictionary by simply leaving the curly braces empty (e.g., my_dictionary = {}). You can then proceed to add key-value pairs to it as per usual. (Actually, this syntax is precisely why you can't use an empty pair of curly braces to create an empty set: Python interprets an empty pair of curly braces as an empty dictionary.)

The syntax for removing a key-value pair from a dictionary is the same as the syntax for removing a value from a list, except you replace the index with the key:

del my_dictionary[<key>]

Replace <key> with the key of the key-value pair that you'd like to remove from the dictionary. For example:

dictionaries.py

def main() -> None:
    ages_of_people = {
        'John': 46,
        'Mahatma': 72,
        'Aditya': 34
    }

    # Add another person and their age to the dictionary:
    ages_of_people['Mohammad'] = 21

    # Change Mohammad's age to 22 (notice---it's exactly the same
    # syntax as above!)
    ages_of_people['Mohammad'] = 22

    # Remove John and his age from the dictionary
    del ages_of_people['John']

    chosen_name = input('Whose age would you like to look up?: ')

    # When used on a dictionary, the 'in' operator checks whether
    # the given KEY is present
    if chosen_name in ages_of_people:
        associated_age = ages_of_people[chosen_name]

        print(f"That person's age is {associated_age}")
    else:
        print(f"Sorry! I don't know the age of {chosen_name}")

if __name__ == '__main__':
    main()

And an example run:

(env) $ python dictionaries.py 
Whose age would you like to look up?: John
Sorry! I don't know the age of John

To type-annotate a dictionary, use the following syntax:

dict[<key type>, <value type>]

Replace <key type> with the type of the keys, and replace <value type> with the type of the values. For example, if we wanted to pass ages_of_people into a function as an argument, the corresponding parameter should be type-annotated as dict[str, int] because the keys are people's names (which are strings), and the values are people's ages (which are integers).
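As a rough sketch of what such a function might look like, the hypothetical celebrate_birthday() function below accepts a dict[str, int] parameter and increases one person's age by one. The function name and behavior are assumptions made for illustration.

```python
def celebrate_birthday(ages_of_people: dict[str, int], name: str) -> None:
    # Only update the age if the person is actually in the dictionary;
    # otherwise, the lookup would raise a KeyError
    if name in ages_of_people:
        ages_of_people[name] = ages_of_people[name] + 1

def main() -> None:
    ages_of_people = {
        'John': 46,
        'Mahatma': 72,
        'Aditya': 34
    }

    celebrate_birthday(ages_of_people, 'John')
    print(ages_of_people['John'])  # Prints 47

    # Joseph isn't in the dictionary, so this does nothing
    celebrate_birthday(ages_of_people, 'Joseph')

if __name__ == '__main__':
    main()
```

Because dictionaries are mutable, the change made inside the function is visible to the caller afterward.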

Lastly, you can iterate through a dictionary using a for loop, but if done in the standard way (e.g., for k in my_dictionary), it only iterates through the keys of the dictionary—not the values (though, of course, you can easily look up the values associated with those keys using the syntax that we've discussed). There are ways of directly iterating over the values of a dictionary, as well as ways of iterating over the key-value pairs in the form of tuples, but we won't discuss them (for the curious reader, look up the .values() and .items() dictionary methods).
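For the curious reader, here's a brief sketch of the three styles of iteration mentioned above: over keys, over values with .values(), and over key-value pairs with .items(). The total_age() and describe_people() helpers are hypothetical.

```python
def total_age(ages_of_people: dict[str, int]) -> int:
    total = 0
    # .values() iterates over the values only (here, the ages)
    for age in ages_of_people.values():
        total += age
    return total

def describe_people(ages_of_people: dict[str, int]) -> list[str]:
    lines: list[str] = []
    # .items() iterates over (key, value) tuples, which can be
    # unpacked directly in the for loop
    for name, age in ages_of_people.items():
        lines.append(f'{name} is {age} years old')
    return lines

def main() -> None:
    ages_of_people = {'John': 46, 'Mahatma': 72, 'Aditya': 34}

    # The standard for loop iterates over the keys only
    for name in ages_of_people:
        print(name)

    print(total_age(ages_of_people))  # Prints 152

    for line in describe_people(ages_of_people):
        print(line)

if __name__ == '__main__':
    main()
```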

Type errors with collections

As with lists, Mypy can get confused about the type of a set or dictionary if you initialize it to be empty and never add any elements to it anywhere in your code (and possibly in some other niche cases). For example, consider the following code:

ambiguous_type_set.py

def main() -> None:
    # Mypy doesn't know what kind of set this is. Is it a set of
    # integers? A set of strings? It has no way of knowing since we
    # never actually put anything in it.
    my_set = set()

if __name__ == '__main__':
    main()

Running the above program through Mypy produces the following output:

(env) $ mypy ambiguous_type_set.py 
ambiguous_type_set.py:5: error: Need type annotation for "my_set" (hint: "my_set: set[<type>] = ...")  [var-annotated]
Found 1 error in 1 file (checked 1 source file)

A similar error occurs when you do the same thing with a dictionary.

Mypy is telling us that it can't infer the type of my_set based on the context, so it needs us to explicitly type-annotate it. As discussed in a previous lecture, we can do this like so:

unambiguous_type_set.py

def main() -> None:
    # If we want this to be a set of integers, but Mypy isn't able
    # to figure that out on its own, we can explicitly annotate its
    # type as set[int] (for example)
    my_set: set[int] = set()

if __name__ == '__main__':
    main()

The above code passes through Mypy with no errors.
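The same fix should work for an empty dictionary. Here's a minimal sketch, assuming we want a dictionary that maps names (strings) to ages (integers); the make_empty_age_lookup() function is hypothetical.

```python
def make_empty_age_lookup() -> dict[str, int]:
    # Without the explicit annotation, Mypy would have no way of
    # inferring the key and value types of an empty dictionary that
    # never has anything added to it
    ages_of_people: dict[str, int] = {}
    return ages_of_people

def main() -> None:
    print(make_empty_age_lookup())  # Prints {}

if __name__ == '__main__':
    main()
```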

Also, as you know, Mypy does not allow a variable's (static) type to change in the middle of a program. For this reason, whenever you create a collection (e.g., a list, a set, a dictionary, etc), you must immediately decide what type of data you plan on storing in that collection, and then you must make sure to never add any other kind of data to it. If my_set is meant to be a set of strings, you cannot add anything other than strings to it. If my_dictionary is meant to map integers to booleans, then all key-value pairs should have an integer key and a boolean value. And so on. Otherwise, Mypy will raise various type errors.
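To make this concrete, here's a hedged sketch of a set of strings. The offending line is commented out so that the program still passes Mypy's checks; if you uncomment it, Python itself would run it without complaint, but Mypy would be expected to reject it because 5 is not a string. The build_greetings() function is hypothetical.

```python
def build_greetings() -> set[str]:
    # This set is meant to hold strings, and only strings
    my_set: set[str] = set()
    my_set.add('hello')
    my_set.add('goodbye')

    # Uncommenting the next line should cause Mypy to report a type
    # error, because an int is being added to a set[str]:
    # my_set.add(5)

    return my_set

def main() -> None:
    print(len(build_greetings()))  # Prints 2

if __name__ == '__main__':
    main()
```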