Top-down Design

Written by Alex Guyer | guyera@oregonstate.edu

Here's the outline for this lecture:

Top-down design
Example

Top-down design

As you begin to engineer larger and more complex software products, it eventually becomes impractical to simply jump into the codebase and start writing code. At a certain point, you have to start spending some time thinking about the code components that you're going to create and how they're going to interface with one another before you actually implement them. This process of designing a codebase prior to implementation is referred to as software design.

The practice of software design is its own field of study. We don't have time to cover it in detail. This lecture will focus on some of the basics. You may learn about some more formal software design principles if you take a software engineering course.

The goal of software design is to produce a model, or representation, of the codebase up front without having to go through all the effort of actually writing the code. This model might be written in the form of a design document, for example, perhaps shared amongst and updated by the members of the software team.

Software design models (e.g., design documents) are useful because analyzing them often reveals large, high-level mistakes. These mistakes are much easier to fix if they're discovered early on in the software development lifecycle, such as by analyzing the software design model rather than by encountering them after implementing thousands of lines of actual code.

For example, perhaps you've found yourself deep into implementing a class or function when you suddenly realized that, in order to get it to fit into the rest of the application, you have to scrap the whole thing and start over. One of the main goals of software design is to work out these sorts of high-level issues before investing too much time into implementation.

There are a lot of different strategies surrounding software design, but we'll focus on top-down design. Top-down design means to start with a very zoomed-out, "high-level" view of the software product and its objectives, and, as you work on the design, slowly work your way downward toward the "lower-level" details. Once you get to a "low enough" level that you can reason about the solutions easily, you might feel comfortable terminating the design process and jumping into the code.

(There also exist bottom-up design strategies, but they're beyond the scope of this lecture. We will, however, cover bottom-up implementation strategies in the next lecture.)

Example

Discussion of software design can often seem very abstract, so let's make it concrete with an example.

Suppose you're tasked with creating a program that computes the number of days between two dates (i.e., a program that "subtracts" one date from another date). This is a fairly complicated task given that different months have different numbers of days, and February in particular has a different number of days depending on whether it's a leap year (and leap year conditions are more complicated than you probably think they are). Because this task is complicated, it probably wouldn't be a good idea to just jump in and start writing code. It'd be a better idea to think about things and design a solution carefully up front, allowing you to work out the high-level logic before investing a ton of effort into implementation.

The top-down design philosophy suggests that we should start by analyzing the problem at an extremely high level. Basically, this means breaking down the problem into just a couple slightly simpler problems. One possible design might start out like this:

# Computes the distance in days between two given Dates,
# date1 and date2. Assumes that date1 comes before date2.
days_between_dates(date1, date2):
    if date1 is in the same year as date2:
        return days_between_dates_in_year(date1, date2)
    
    otherwise:
        answer = days_between_dates_in_year(date1, December 31 of same year as date1)
        answer = answer + 1 # To get to January 1 of the next year
        for each year y between date1's year and date2's year (exclusive):
            answer = answer + number_of_days_in_year(y)
        answer = answer + days_between_dates_in_year(January 1 of same year as date2, date2)
        return answer

The above represents a design component, meaning that it's a component of a complete software design model, but it isn't, itself, a complete software design model. More on this in a moment.

It might look a bit like code to you. That's because I've chosen to model my software components using pseudocode. Pseudocode is exactly what it sounds like: a sort of low-effort, informal, code-like description of a computational process. Importantly, pseudocode is not "real code". Software design models should usually be language-agnostic, meaning that they shouldn't be expressed in a specific programming language. Assuming your design is written in English, it should be readable to any English-speaking programmer, regardless of what programming languages they're familiar with. This allows treating the programming language of choice as less of a design decision and more of an implementation decision (as it often should be... though there are cases where it shouldn't be, and in those cases it's usually okay for your design to conform more closely with some particular programming language).

Pseudocode does not have a strict, universally agreed-upon syntax. It's simply a "code-like" written description of a computational process, and it shouldn't have the hallmarks of a particular programming language. For example, notice that I write "January 1 of same year as date2" to express a certain date without relying on language-specific syntax. In Python, this expression might look something like Date('Jan', 1, date2.year), or similar, depending on how the Date type is represented. But that's not how I expressed it—I used English. It can be easily understood regardless of what programming languages the reader is familiar with. Moreover, I didn't have to think carefully about the strict syntax of a particular programming language when writing it; I only had to think about the high-level logic.

Pseudocode is not the only way to describe a software design model. In many cases, a visual model consisting of diagrams and charts (e.g., flowcharts) may be preferred. In fact, there's an entire family of visual diagrams that's collectively referred to as the Unified Modeling Language (UML). UML diagrams are more formalized than pseudocode, and they're generally understood among the software engineering community, so they're a common way of modeling software systems. There are many different kinds of UML diagrams, and they're far beyond the scope of this course. For this course, we'll just stick with pseudocode (this will make the TAs' lives easier).

Before moving on, we should probably describe what a "date" consists of in the context of our system. After all, the above component receives two dates (date1 and date2) as inputs, and it implies that dates somehow contain days, months, and years. Let's formalize that:

type Date:
    day (integer)
    month (string)
    year (integer)

Again, the exact syntax is not strict. I've chosen to write it as I did because I think it clearly communicates that 1) a "Date" is some sort of data type in our system, and 2) Date instances contain (public-facing) day, month, and year attributes. It also explains the types of those attributes.
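For illustration only, here's one way the Date type might eventually be translated to Python. This is a sketch rather than part of the design itself: the use of a dataclass, the three-letter month abbreviations, and the helper name january_first_of_same_year are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Date:
    """One possible Python rendering of the Date type from the design."""
    day: int
    month: str  # assumed to be a three-letter abbreviation such as 'Jan'
    year: int


def january_first_of_same_year(date: Date) -> Date:
    """Hypothetical helper for "January 1 of same year as date"."""
    return Date(day=1, month='Jan', year=date.year)


some_date = Date(day=15, month='Mar', year=2024)
print(january_first_of_same_year(some_date))
```

Notice how many decisions the Python version forces (field order, mutability, how months are spelled) that the pseudocode was free to leave open.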

Moving on. As I said, top-down design refers to a design philosophy wherein the design process starts at a zoomed-out, "high-level" view of the system and, as the design process goes on, slowly works down toward a zoomed-in, "lower-level" view. Take a look back at the days_between_dates component. In theory, that one component solves the entire problem. After all, the two dates must either be in the same year or different years. The days_between_dates component checks which of those two scenarios is the case, and then it handles the case accordingly.

However, even though that single component theoretically solves the entire problem, it does not describe the entire solution. Indeed, many important details are left out. For one, it seems to rely on another component referred to as days_between_dates_in_year. Presumably, this component somehow computes the number of days between two dates that take place in the same year. However, it does not explain how that component works. Second, it seems to rely on another component referred to as number_of_days_in_year. Presumably, this component somehow computes the number of days in a given year. But again, it does not explain how that value will be computed (some years have 365 days, but others have 366).

This is what I meant by "high-level". The days_between_dates component solves the entire problem, but it does not describe all the low-level details about the solution. It's only a "zoomed-out" view of a few high-level steps that, if implemented correctly, will indeed solve the problem. That is, it describes what some of the high-level steps are, but it does not describe how those high-level steps will be completed.

Practicing proper top-down design requires a strong understanding of abstraction. Recall that abstraction refers to the concept of substituting complicated low-level details with simpler high-level ideas. In programming, functions and classes are common kinds of abstractions. For example, if you want to compute the square root of a value x in Python, you simply write math.sqrt(x). You don't have to think about all the messy low-level details involved in actually computing that square root (the "implementation", or the "how")—you just have to think about the function's name, arguments, and return value (the "interface", or the "what").
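To make the interface / implementation distinction concrete, here's a small Python sketch. The function newton_sqrt is a hypothetical hand-written square root (using Newton's method, which is not necessarily what math.sqrt does internally); it's only here to show the kind of detail that an abstraction lets the caller ignore.

```python
import math


def newton_sqrt(x: float, tolerance: float = 1e-12) -> float:
    """Hypothetical implementation: approximates sqrt(x) via Newton's method."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0  # start at or above the true root
    # Repeatedly average the guess with x / guess until guess^2 is close to x
    while abs(guess * guess - x) > tolerance * x:
        guess = (guess + x / guess) / 2
    return guess


# The caller only needs the interface: a name, an argument, a return value.
print(math.sqrt(2.0))
print(newton_sqrt(2.0))
```

Both calls look the same from the outside. The loop, the starting guess, and the tolerance are all "how" details hidden behind the "what".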

Top-down design is all about abstraction, especially in the earlier stages. When you're working at a zoomed-out, high-level view, the components that you design will need to refer to some other components that will, eventually, describe some lower-level details (e.g., like how days_between_dates refers to days_between_dates_in_year and number_of_days_in_year). But the higher-level components themselves should not describe those lower-level details—they should abstract away those details.

Now that we've designed our highest-level component, which describes the entire system at an extremely zoomed-out level, we can start working our way down closer to the lower-level details. Let's proceed with the number_of_days_in_year component:

# Computes the number of days in the given year y
number_of_days_in_year(y):
    if is_leap_year(y):
        return 366
    otherwise:
        return 365

Again, this component is still fairly high-level. It makes it very clear that all leap years have 366 days, and all non-leap years have 365 days. It checks whether y is a leap year and returns the correct value. However, it does not explain how it will be determined whether y is a leap year or not. Instead, it delegates that responsibility to another component through an abstraction: is_leap_year.
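As a rough Python sketch of this delegation, the component can be written against the is_leap_year interface before that component has been designed. Here I'm assuming the standard library's calendar.isleap as a temporary stand-in for is_leap_year; that substitution is my own choice for this example and not part of the design.

```python
# Stand-in for the not-yet-designed is_leap_year component (an assumption
# for this sketch): calendar.isleap takes a year and returns True or False.
from calendar import isleap as is_leap_year


def number_of_days_in_year(y: int) -> int:
    """Computes the number of days in the given year y."""
    if is_leap_year(y):
        return 366
    return 365


for year in (1900, 2000, 2023, 2024):
    print(year, number_of_days_in_year(year))
```

The body of number_of_days_in_year wouldn't need to change if the stand-in were later swapped for our own is_leap_year, as long as the interface (one year in, true or false out) stays the same.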

The idea is to continue in this manner, working our way closer and closer to the low-level details one component at a time.

A natural question is: at what point do you stop designing and start coding? Well, a truly "complete" design should work out all the "logic" involved in the solution, leaving only trivial implementation details. That's to say, a truly complete design should be easy to convert to actual code because doing so should just be a menial translation effort, rewriting the pseudocode in a specific programming language.

However, truly complete designs are extremely uncommon in practice. Rather, most software engineers will pause the design process (perhaps returning to it later—design documents are living documents) and move onto implementation once they feel confident that the remaining logic is sufficiently simple that they can figure it out during the implementation process without making large, high-level mistakes (e.g., the kinds of mistakes that would require scrapping large system components and rewriting them from scratch).

For software designs that you write in this class, the rubric(s) will explain the expected degree of specificity / "completeness" in your submission.

Let's finish up our design. First, the is_leap_year component:

# Determines whether a given year is a leap year
is_leap_year(y):
    # A leap year occurs every year that's a multiple of 4,
    # except for years that are divisible by 100 but not 400.
    # (Yes, this is a true fact. See the below reference.)
    # https://en.wikipedia.org/wiki/Leap_year

    if y is divisible by 4:
        if y is divisible by 100 and y is not divisible by 400:
            return false
        otherwise:
            return true
    otherwise:
        return false # Not divisible by 4

My design could go a little further into detail (e.g., replacing "y is divisible by 4" with "y modulo 4 is 0"), but I think this is good enough. Any reasonably decent programmer should know how to write an expression that determines whether one number is divisible by another, so I don't think I need to spell that out.
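To show how mechanical the translation can be at this level of detail, here's a sketch of is_leap_year in Python. The structure mirrors the pseudocode line by line. The comparison against calendar.isleap is my own addition, used only as a sanity check.

```python
import calendar


def is_leap_year(y: int) -> bool:
    """Determines whether a given year is a leap year."""
    if y % 4 == 0:
        if y % 100 == 0 and y % 400 != 0:
            return False  # Century year that isn't a multiple of 400
        return True
    return False  # Not divisible by 4


# Sanity check against the standard library for years 1 through 3000
mismatches = [y for y in range(1, 3001) if is_leap_year(y) != calendar.isleap(y)]
print(mismatches)
```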

Next, the days_between_dates_in_year component:

# Computes the number of days between two Dates that take place
# in the same year as each other. Assumes that date1 comes before date2.
days_between_dates_in_year(date1, date2):
    if date1's month is the same as date2's month:
        return date2's day - date1's day
    otherwise:
        answer = days_in_month(date1's month, date1's year) - date1's day # To get to the end of date1's month
        answer = answer + 1 # To get to the first of the NEXT month
        for each month m between date1's month and date2's month (exclusive):
            answer = answer + days_in_month(m, date1's year) # To get past month m
        answer = answer + date2's day - 1 # To get to date2's day
        return answer

Here we refer to another new component: days_in_month. This component will compute the number of days in the given month (for the given year). It accepts the year as an input because the number of days in a given month might vary by year (as is the case with February, in particular). Indeed, this means that, while designing one high-level component, you may need to think about some of the lower-level details of its dependencies (lower-level components) insofar as you need to determine what information will need to be passed to those components (i.e., you need to at least figure out what their interfaces will look like).

Finally, here's the days_in_month component:

# Computes the number of days in the given month m, for the given year y
days_in_month(m, y):
    if m is in ['Sep', 'Apr', 'Jun', 'Nov']:
        return 30 # These months all have 30 days
    otherwise if m is NOT 'Feb':
        return 31 # These months all have 31 days
    otherwise:
        # m must be February. If it's a leap year, it has 29 days.
        # Otherwise, it has 28 days.
        if is_leap_year(y):
            return 29
        otherwise:
            return 28

And at this point, all our components are fleshed out. Again, we could go into slightly more detail here and there, but the level of detail at this point is sufficient that most reasonably decent programmers (including every student in this class, hopefully!) should be able to translate this design to actual code without too much effort (we'll actually do some of the implementation in the next lecture). So I'll stop here.
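As a rough check that the design hangs together, here's a sketch of how the components might translate to Python. Treat it as one possible translation and not the official implementation. The MONTHS list, the three-letter month abbreviations, the dataclass representation of Date, and the to_std helper (which converts to the standard library's datetime.date purely for cross-checking) are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date as StdDate  # used only to cross-check results

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


@dataclass(frozen=True)
class Date:
    day: int
    month: str  # one of MONTHS (an assumed representation)
    year: int


def is_leap_year(y: int) -> bool:
    if y % 4 == 0:
        return not (y % 100 == 0 and y % 400 != 0)
    return False


def number_of_days_in_year(y: int) -> int:
    return 366 if is_leap_year(y) else 365


def days_in_month(m: str, y: int) -> int:
    if m in ('Sep', 'Apr', 'Jun', 'Nov'):
        return 30
    if m != 'Feb':
        return 31
    return 29 if is_leap_year(y) else 28


def days_between_dates_in_year(date1: Date, date2: Date) -> int:
    if date1.month == date2.month:
        return date2.day - date1.day
    # Days left in date1's month, plus 1 to reach the first of the next month
    answer = days_in_month(date1.month, date1.year) - date1.day + 1
    first = MONTHS.index(date1.month) + 1
    last = MONTHS.index(date2.month)
    for m in MONTHS[first:last]:  # whole months strictly between the two
        answer += days_in_month(m, date1.year)
    return answer + date2.day - 1  # from the first of date2's month to date2


def days_between_dates(date1: Date, date2: Date) -> int:
    if date1.year == date2.year:
        return days_between_dates_in_year(date1, date2)
    answer = days_between_dates_in_year(date1, Date(31, 'Dec', date1.year))
    answer += 1  # to reach January 1 of the next year
    for y in range(date1.year + 1, date2.year):  # whole years in between
        answer += number_of_days_in_year(y)
    return answer + days_between_dates_in_year(Date(1, 'Jan', date2.year), date2)


def to_std(d: Date) -> StdDate:
    """Hypothetical helper: converts our Date to datetime.date for checking."""
    return StdDate(d.year, MONTHS.index(d.month) + 1, d.day)


start, end = Date(28, 'Feb', 2023), Date(1, 'Mar', 2024)
print(days_between_dates(start, end), (to_std(end) - to_std(start)).days)
```

Each function corresponds to exactly one design component, which is why the translation is mostly mechanical. The only real decisions left were representational ones, such as how to iterate over the months "between" two dates.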