Recursion

Written by Alex Guyer | guyera@oregonstate.edu

Here's the outline for this lecture:

Recursion

Recursion describes a computational process, such as a function, that calls itself. It's a relatively simple concept to explain but very difficult to master, so we'll practice with several examples.

The classic example of recursion is the fibonacci sequence. It's the sequence of numbers that starts with 0 and then 1, and then every subsequent number in the sequence is the sum of the previous two numbers. To illustrate, the first ten numbers in the fibonacci sequence are: 0, 1, 1, 2, 3, 5, 8, 13, 21, and 34. If you look at any three adjacent numbers in that subsequence, you'll notice that the third number is the sum of the first two numbers; that's how the fibonacci sequence is defined.

Recursion is not purely a programmatic concept. In fact, the fibonacci sequence is a mathematical function. Its definition looks like this:

fib(1) = 0

fib(2) = 1

fib(x) = fib(x - 1) + fib(x - 2), for all x > 2

The argument to the fib() function specifies which number in the sequence should be computed. If the argument is 1, then the first number should be computed. If it's 10, then the 10th number should be computed. The function then computes that number of the fibonacci sequence, which serves as the function's output. The first two numbers in the fibonacci sequence are 0 and 1, hence fib(1) = 0 and fib(2) = 1. Each subsequent number in the fibonacci sequence is the sum of the previous two numbers, hence fib(x) = fib(x - 1) + fib(x - 2).

A Python representation of this mathematical function might look like something like this:

def fib(x: int) -> int:
    if x <= 0:
        raise ValueError('x must be positive')

    if x == 1:
        return 0
    elif x == 2:
        return 1
    return fib(x - 1) + fib(x - 2)

Recursion can sometimes feel like black magic. I think that tracing a recursive function can be helpful in getting over that feeling. Suppose that the main() function has a single line of code: print(fib(4)). What value would be printed, and how would it be computed? Let's go through the process step by step:

  1. When the computer encounters print(fib(4)), it must compute fib(4) in order to print it, so it will jump up to the fib() function's definition, passing the argument 4 to the parameter x. 4 is positive, but it's not 1 nor 2, so the recursive statement will execute: return fib(x - 1) + fib(x - 2). In this moment, x is 4, so this statement is equivalent to return fib(3) + fib(2).

    But how does the computer compute those values? Well, it pauses what it's doing and restarts the same process two more timesonce for fib(3), and once for fib(2):

    1. Let's start with fib(3). The computer once again jumps up to the top of the fib() function, but in this particular function call, x is given the argument 3 instead of 4. Again, 3 is positive, but it's not 1 nor 2, so the recursive statement will execute: return fib(x - 1) + fib(x - 2). But since x is 3 in this moment, this statement is equivalent to return fib(2) + fib(1). Again, computing this statement requires restarting the process two more times:

      1. To compute fib(2), the computer jumps to the top of the fib() function, passing 2 as the argument to x. This time, the second base case's if statement will trigger, returning 1 and immediately ending the function call. To be clear, only this particular (fib(2)) function call is endedthe return statement does not end the fib(3) function call that in turn called fib(2), nor the fib(4) function call that in turn called fib(3).

      2. To compute fib(1), the computer jumps to the top of the fib() function, passing 1 as the argument to x. This time, the first base case's if statement will trigger, returning 0 and immediately ending the function call.

    2. Now that fib(2) and fib(1) have been computed as 1 and 0, respectively, the sum can be computed within the fib(3) function call as 1 + 0 = 1. That sum is returned from the fib(3) function call (i.e., return fib(2) + fib(1) -> return 1 + 0 -> return 1).

    3. But remember: the fib(4) function call in turn called fib(3) and fib(2). Everything I've described up to this point has been regarding the fib(3) computation. So, once it's done computing fib(3) to be 1, it moves on to compute fib(2). Again, the computer jumps to the top of the fib() function, passing 2 as the argument to x. The second base case's if statement triggers, returning 1 and ending the function.

  2. Within the fib(4) function call, fib(3) has been computed as 1, and fib(2) has also been computed as 1. All that's left is to add them up and return the sum (2).

The main() function then receives that sum and prints it. So it prints 2.

Stop and ask yourself: why is it necessary to explicitly spell out the fact that fib(1) = 0 and fib(2) = 1? That is, why do we need those if statements in our code? Well, suppose we omitted them. In such a case, what would happen when the computer needs to compute fib(2)? Without those if statements, the recursive statement would execute, in turn computing fib(1) and fib(0). The fib(0) call would then raise a ValueError, propagating down the entire call stack and crashing the program.

Now, what if we got rid of all three if statements, omitting the requirement that x be positive? Well, then we'd run into a different problem: fib(2) would recurse to fib(1) + fib(0); fib(0) would recurse to fib(-1) + fib(-2); fib(-2) would recurse to fib(-3) + fib(-4); and so on, forever and ever. Mathematically, this would correspond to an infinite expansion of terms, never resolving to an actual value. Programatically, this would mean the function would call itself forever and evera sort of infinite loop (this is sometimes referred to as infinite recursion).

(Technically, infinite recursion usually causes a stack overflow, which crashes the program and is, therefore, not really infinite. More on this later.)

In the case of the fibonacci sequence, both fib(1) and fib(2) must be explicit defined as the hard-coded (non-recursive) values 0 and 1, respectively. That is, even leaving out just one of these two if statements would break the entire function. I encourage you to think about why that's the case.

fib(1) = 0 and fib(2) = 1 are known as base cases. A base case of a recursive function is a special set of argument values for which the function does not recurse (i.e., does not call itself). If x is equal to 1 or 2, then the fib() function does not call itself. Hence, x == 1 and x == 2 are base cases of the fib() function.

Every recursive function inherently needs at least one base case, or else nothing will stop it from either a) calling itself forever, resulting in infinite expansion of terms / infinite recursion, or b) producing undefined values / raising an error. Some recursive functions, such as fib(), need more than one base case, but every well-defined recursive function will have at least one.

Why recursion?

As you might have already figured out, it's possible to rewrite the fib() function without any recursion whatsoever. Indeed, even though the fibonacci sequence is defined recursively in mathematics, it can be defined without recursion in an imperative programming paradigm:

def fib(x: int) -> int:
    if x <= 0:
        raise ValueError('x must be positive')

    if x == 1:
        return 0
    elif x == 2:
        return 1
    
    val1 = 0
    val2 = 1
    for _ in range(x - 2):
        val3 = val1 + val2 # Compute sum of previous two numbers
        # Update previous two numbers
        val1 = val2
        val2 = val3

    return val3

Since recursion involves a function calling itself, it behaves a bit like a sort of loop, so it shouldn't be surprising that, at least in some cases, it can be replaced with an actual loop. In this case, we replaced the recursion with a for loop. When comparing a loop-based solution with a recursive one, the loop-based solution is often referred to as an iterative solution.

Believe it or not, it's not too hard to prove that all recursive solutions can be rewritten in an iterative form, and all iterative solutions can be rewritten in a recursive form. That's to say, there's no such thing as a problem that can be solved with recursion but not a loop, and there's no such thing as a problem that can be solved with a loop but not recursion (in fact, some "functional-heavy" programming languages don't even offer loop-based mechanisms, resorting purely to recursion and higher-order functions to accomplish repetition).

But if that's the case, then how do you choose between recursion and iteration?

The answer is simple: recursion and iteration are just two different ways of thinking about computational problems and their solutions, and, in most cases, you should just use whichever one makes the most sense. In the case of the fibonacci sequence, it's easy to solve the problem using either a loop or recursion. But there exist many more complex problems for which the recursive solution is intuitive and elegant, but the iterative solution is messy and complicated. By the same token, there exist problems for which the iterative solution is intuitive and elegant, but the recursive solution is messy and complicated. That's to say, some problems lend themselves well to recursion, and some lend themselves well to iteration.

As a general rule of thumb, the kinds of problems that lend themselves well to recursion are problems that can be solved by 1) breaking them down into smaller instances of themselves, then 2) computing and combining the solutions of those smaller problems to formulate a solution for the larger, original problem. For example, computing the 100th number in the fibonacci sequence sounds difficult, but what if you could compute the 99th and 98th values? Those values are slightly smaller and simpler, so maybe they're easier to compute. If you could compute them, then you could simply add them up to compute the 100th value in the fibonacci sequence. I've just described a problem that can be solved by 1) breaking it down into two smaller instances of the same problem, then 2) combining the solutions of those two smaller problems to solve the larger, original problem. Hence, recursion might be reasonable for this problem.

In the above description, the process of breaking the problem down into smaller versions of the same problem is, itself, recursion. In our fib() function, this process is given by return fib(x - 1) + fib(x - 2)the original problem, fib(x), is broken down into two smaller problems, fib(x - 1) and fib(x - 2), and then their solutions are combined to produce the final answer (via summation, in this case).

Now, where do base cases fit into this line of thinking? Well, as you break the problem down into smaller and smaller instances of the same kind of problem, it will eventually be so small that you either can't break it down any further, or you simply have no need to break it down any further (e.g., because the solution is "obvious"). In the case of the fib() function, when x == 1 or x == 2, we're trying to compute the first or second (resp.) number in the fibonacci sequence. Since these are the first two numbers in the sequence, they can't possibly be defined as the sum of the two numbers that come before them (because there aren't two numbers that come before them). Indeed, these are the first and smallest numbers in the sequence, so the process of computing them can't be broken down into smaller steps. That is, fib(1) and fib(2) are "obvious"they are trivially defined as 0 and 1, respectively, and that's that. Hence, they represent the base cases.

Palindromes

Recursion gets easier with practice, so let's practice.

You're tasked with writing a function that accepts a string s and returns a boolean: True if s is a palindrome, and False otherwise. A palindrome is a string that's the same forwards as it is backwards. For example, "mom", "dad", and "racecar" are all palindromes because each of them remains the same when written in reverse. For our purposes, an empty string ("") should be considered a palindrome.

This particular problem, like the fibonacci sequence, has a relatively simple iterative solution as well as a relatively simple recursive solution. Here's a recursive solution:

palindrome.py
def palindrome(s: str) -> bool:
    # A string is a palindrome if and only if:
    #   1. The first character is equal to the last character
    #   AND
    #   2. The smaller string consisting of everything in between the
    #      first and last characters is, itself, a palindrome. This
    #      is the recursive part of our solution.

    # As we recurse, we'll be dealing with smaller and smaller strings.
    # The size will decrease by 2 at each recursive step. The base
    # cases are strings of length 0 and strings of length 1 (because,
    # in either of those two cases, there is no "inner" substring
    # on which to recurse). Let's handle these base cases up front:
    if len(s) <= 1:
        # s is of length 0 or 1. All strings of length 0 or 1 are
        # inherently palindromes. So just return true.
        return True

    # In all other cases, we have to check both conditions. First,
    # verify that the first and last character are equal to each other:
    if s[0] != s[len(s) - 1]:
        # The first and last character are NOT the same, so s must
        # not be a palindrome. Return False
        return False

    # If the function is still going, then the first and last character
    # must be the same. Check the second condition: verify that
    # the "inner" substring is a palindrome. We can use Python's
    # slice indexing to get the inner substring: use 1 for the start
    # index and len(s) - 1 as the (exclusive) end index.
    inner_substring = s[1 : len(s) - 1]
    
    # Now check whether inner_substring is a palindrome
    if not palindrome(inner_substring):
        # It's not a palindrome, so return False
        return False

    # If the function is still going, then s passed both conditions,
    # so it must be a palindrome. Return True
    return True

def main() -> None:
    word = input('Enter a word: ')
    if palindrome(word):
        print(f'"{word}" is a palindrome')
    else:
        print(f'"{word}" is not a palindrome')

if __name__ == '__main__':
    main()

Here are some example runs:

(env) $ python palindrome.py 
Enter a word: racecar
"racecar" is a palindrome
(env) $ python palindrome.py 
Enter a word: antarctica
"antarctica" is not a palindrome

Greatest common divisor

Let's try a harder example. You're tasked with writing a function that accepts two positive integer arguments, a and b, and returns the greatest common divisor of those two arguments. A divisor is a number by which another number is divisible. A common divisor of two numbers a and b, then, is a third number by which both a and b are divisible. The greatest common divisor is exactly thatit's the greatest possible value that is a common divisor of a and b.

For example, suppose a is 56 and b is 24. The greatest common divisor of these two numbers is 8. 56 is divisible by 8 (8 * 7 = 56), and 24 is divisible by 8 (8 * 3 = 24). There are some other common divisors (4, 2, and 1), but 8 is the greatest of all of them. Hence, it's the greatest common divisor.

This sounds like a hard problem, so let's think about it some more. To make things simpler, let's first assume that b is always smaller than or equal to a (if it isn't, we can just swap the two values with each other to ensure that it is).

In the simplest case, suppose a is divisible by b. The answer, in such a case, is simply b. The greatest possible divisor (factor) of any number is itself, meaning that the largest divisor of b is itself b. And since a is divisible by b, it's also a common divisor of the two arguments. Hence, it's the greatest common divisor.

Now for the more complicated case. Suppose that a is not divisible by b. Then we need to find a third number, which I'll call x, that's the greatest common divisor of a and b. What do we know about x? Obviously it's a factor (divisor) of a and a factor (divisor) of b, but there's another small piece of information that we can glean with a bit of careful analysis. Since x is a factor of b, it must also be a factor of all multiples of b (e.g., suppose b is 24 and x is 8; consider that 8 is also a factor of all multiples of 24, such as 48, 72, etc).

Consider the largest multiple of b that's still smaller than a. Let's refer to this number as y (e.g., if a is 56 and b is 24, then y would be 48). Since y is a multiple of b, x (the answer that we're looking forthe greatest common divisor of a and b) must also be a factor of y.

Now, if x is a factor of y and a factor of a (which it is, as I just explained), then consider that it must also be a factor of a - y. I'll use z to denote this difference. If you're not convienced that x is a factor of z, think about it: if x is a factor of y, then if you count upward from 0 by increments of x, you'll eventually reach y (this is precisely the definition of a "factor", or "divisor"). But if you keep going from there, you'll eventually also reach a (since x is also a factor of a). This means that, if you were to count upward from y by increments of x, you'd eventually reach a. More generally, if you count upward by increments of x, you'll eventually reach a value that is z larger than your starting value, where z is the distance between y and a (i.e., z = a - y). By definition, this means that x is a factor of z.

If you think carefully, you might realize that z is, by definition, equal to a % b. So all this analysis tells us something very important: if a given number x is a common divisor of a and b, then it's also a divisor of a % b (because it's a divisor of z, and z = a % b).

But there's an inverse statement that's also true: if x is a common divisor of b and a % b, then it must also be a divisor of a. This is also fairly easy to show (again, think in terms of "counting upward by increments of x"you'll eventually reach y, and then from there you'll eventually close the distance to a).

So, the goal is to find the greatest common divisor x of a and b. But as we just proved, any common divisor of a and b is also a divisor of a % b, and any common divisor of b and a % b is also a divisor of a. That's to say, finding the greatest common divisor of a and b is equivalent to finding the greatest common divisor of b and a % b.

Why is that useful? Well, of the three values a, b, and a % b, the latter two are the smallest (z is necessarily smaller than b, which, as per our assumption, is necessarily smaller than a). This means that our function, gcd(a, b), is equivalent to gcd(b, a % b). The latter expression is "simpler" in the sense that the arguments are smaller.

Remember: the goal of recursion is often to break problems down into smaller versions of themselves. I'm saying that we can compute the greatest common divisor of two numbers by instead computing the greatest common divisor of a smaller pair of numbers. That sounds like recursion.

I think that's enough analysis. Here's the code:

gcd.py
def gcd(a: int, b: int) -> int:
    # Require that a and b both be positive
    if a <= 0 or b <= 0:
        raise ValueError('a and b must both be positive!')

    # Make sure that b <= a. If not, swap them to make it so.
    if b > a:
        temp = b
        b = a
        a = temp

    # Now, in the simple case, if a is divisible by b, simply return
    # b
    if a % b == 0:
        # Remainder of 0 after division means a is divisible by b
        return b

    # If the function is still running, then this is the more
    # complicated case. As we proved, in this case, gcd(a, b) is
    # equivalent to gcd(b, a % b). And that's a simpler / smaller
    # version of the same problem, so we compute the answer via a
    # recursive call and return it
    return gcd(b, a % b)

def main() -> None:
    val1 = int(input('Enter an integer: '))
    val2 = int(input('Enter another integer: '))

    answer = gcd(val1, val2)

    print(f'The GCD if {val1} and {val2} is {answer}')

if __name__ == '__main__':
    main()

Here are some example runs:

(env) $ python gcd.py 
Enter an integer: 56
Enter another integer: 24
The GCD if 56 and 24 is 8
(env) $ python gcd.py 
Enter an integer: 7
Enter another integer: 14
The GCD if 7 and 14 is 7

By the way, this is actually a famous algorithm. It's known as Euclid's Algorithm, and it's an extremely efficient way of computing the greatest common divisor of two numbers.

Performance of recursion

In some programming languages, including Python, recursion can often have a noticeable impact on memory consumption (and sometimes runtime, but we'll focus on memory consumption).

Think back to our fibonacci example. In order to compute fib(4) so that it can be printed to the terminal, the fib(4) function call must in turn compute fib(3) and fib(2). Think carefully about how function calls actually work for a moment: before fib(4) can return a value back to the main() function, it must call fib(2) and fib(3). This means that these recursive calls must run to completion before the fib(4) function call can return.

In essence, when one function A calls another function B, A "pauses" until B terminates (either by returning or by raising an error). But during that timewhen A is "paused" so that B can executeall of A's variables, including its local variables and parameters, must be retained (i.e., not deleted from memory) so that, when B eventually terminates and A is "unpaused", it still has access to those variables in case it needs them.

Put in context, when fib(4) reaches the point that it needs to know the values of fib(3) and fib(2), it must "pause" to compute those values by calling itself twice (one at a timeit pauses to call fib(3), unpauses when fib(3) terminates, pauses again to call fib(2), and then unpauses again when fib(2) terminates). This means that when fib(3) is executing, fib(4) is essentially "paused", but its variables, including its parameter(s), still exist in memory.

This means that, for example, when fib(3) is executing, there are actually two parameters named x in memory: there's the parameter named x that exists within the context of the fib(3) call, and there's the parameter named x that exists within the context of the (currently paused, but still alive) fib(4) call.

Let's take this reasoning to the logical conclusion: as a function calls itself, and then that call results in more recursive calls, and then those calls result in more recursive calls, and so on, your computer may eventually reach a very "deeply" nested recursive call (e.g., a fib(2) call, which was executed by fib(3), which was executed by fib(4), and so on, perhaps all the way up to fib(n) for some large, initial value n). When your computer is executing that very deeply nested recursive call, at that very moment, there are many function calls that are all "paused", and each of those function calls has its own local variables and parameters. At that moment, all of those local variables and parameters exist in the computer's memory simultaneously, being kept around for when their respective call "unpauses" so that it may resume using those variables if it needs to.

That's to say, deeply nested recursive calls can have a noticeable impact on your program's memory consumption (actually, deeply nested calls in general can have this effect, but recursion is a natural reason for deeply nested calls).

In Python, all objects are stored in a special place in memory known as the heap, but all references to those objects (e.g., as represented by variables) are stored in a separate place in memory known as the stack (more on these terms in a future lecture). Since all variables are references, including local variables and parameters, deeply nested recursive calls can cause too much memory to be allocated and stored on the stack. What's worse, the stack is typically restricted to be fairly small. This means that deeply nested recursive calls can easily cause the stack to run out of available space. This is known as a stack overflow (the infamous error after which the popular programming stack exchange is named).

(A true stack overflow is unrecoverable and will always result in the program crashing because there's no space in memory left to even run an exception handling mechanism. However, Python has its own "softer" recursion limit. When this limit is exceeded, which usually happens before a true stack overflow occurs, it automatically raises a RecursionError. This can be caught like any other exception. Still, it's possible for a true stack overflow to occur if the recursion limit is too small, either because it has been configured to be small (yes, it's configurable) or because each function call uses a large amount of memory.)

This does not mean that you should avoid recursion. It only means that you should avoid recursion when 1) you've determined that the memory consumption of a recursive solution would be significant enough to matter for your use case, and 2) there's an alternative iterative solution that doesn't have such a significant memory footprint. For example, the fibonacci sequence can be computed via a simple iterative solution that's part of a family of algorithms known as dynamic programming. This solution is more efficient than our naive recursive implementation in terms of both memory consumption and runtime (I won't show you the algorithm since that's not the point of this course; just understand the potential performance implications of recursion).

Now, in some other programming languages, the interpreter / compiler can use certain "tricks" to alleviate this issue. A common trick employed by many "functional-heavy" programming languages (e.g., Haskell) is tail recursion, which allows a recursive call to "reuse" some of the space in memory that was used by its caller. But this is only possible in certain cases (e.g., when the compiler / interpreter can determine with certainty that the caller will not need to refer back to that memory when it "unpauses", such as when the recursive call appears as the very last statement in the function (the "tail")), and it's only supported by programming languages that have tight restrictions on scope and control flow. Python's control flow and object reference model are too flexible, so it could never support tail recursion in a robust manner.

Whatever programming language you're working in, you should be aware of the performance implications of deeply nested recursive calls.