Recursive Data Structures#

Just as a function can call itself, a collection data structure like a dictionary, list, or tuple can contain other collections of the same kind. We refer to these as nested structures.

We have already seen nested lists to represent matrices,the grid of letters in our Boggle project. Those nested structures had a fixed dimension, e.g., our Boggle boards used a single level of nesting to represent two-dimensional grids. We never had to determine whether one of the lists contained individual tiles or another level of nested list. Now we will consider structures in which the level of nesting may vary.

A computer file system is a familiar nested collection. Directories (also called “folders”) are collections that may contain other directories, individual files, or both. The operating system of your computer can display parts of this nested structure through some kind of “Finder” or “Explorer” application:

Finder display of directory structure

Like recursive algorithms, recursive data structures have one or more base cases and one or more recursive cases. We process recursive data structures with recursive functions. A base case in the data structure will be handled by a base case in a function, and a recursive case in the data structure will be handled by a recursive function call.

We will need a way to distinguish the base cases from recursive cases. For a directory structure, Python provides functions os.isfile and os.isdir to identify the base case (a regular file) and the recursive case (a directory), respectively, as well as functions to combine and extract parts of paths and to find the current working directory. A function to recursively print the directory tree would use these to choose between the base and recursive cases of the algorithm.

import os

def print_directory_tree(path: str, level: int):
    """Represent nesting by indentation"""
    name = os.path.basename(path)  # Just the last part of the path
    if os.path.isfile(path):
        # The base case ...
        # no recursive call
        print(f"{leader(level)}{name}")
    elif os.path.isdir(path):
        # The recursive case ...
        # a directory that may contain other files and directories
        print(f"{leader(level)}{name}")
        for content in os.listdir(path):
            print_directory_tree(os.path.join(path, content), level + 1)
    else:
        # "Hidden" files may be identified as neither files nor directories
        print(f"{leader(level)}[Hidden: {name}]")

def leader(level: int) -> str:
    """A leader in typography is a series of characters
    that are used as a visual aid to connect items on a page.
    [per Wikipedia]
    """
    if level == 0:
        return ""
    return "|   " * (level - 1) + "|–– "

# Print directory (folder) tree from current working directory
print_directory_tree(os.getcwd(), level=0)
09-Recursive-Data
|–– 09-05-Exercises.md
|–– 09-03-Treemap-Project.md
|–– img
|   |–– directory-tree-vertical.png
|   |–– major-codes.png
|   |–– coffee-tree.svg
|   |–– coffee-tree.graffle
|   |–– chap08.svg
|   |–– directory-tree-horizontal.png
|   |–– roster.jpg
|–– Samples
|   |–– dir_tree_printer.py
|   |–– dir_tree.py
|–– 09-01-Recursive-Data-Structures.md
|–– 09-04-Shape-of-Data.md
|–– 09-02-Nested-Dicts.md

Note

You needn’t remember the functions for manipulating file paths.
The point of the example is to see how containment of a file or directory within another directory is exactly mirrored by recursive calls to the print_directory_tree function. This exact mirroring between the structure of data and recursion in computation will be same when our recursive data structures are lists, dictionaries, tuples, or any other kind of nested collection.

Using isinstance to distinguish types#

Suppose we wanted to sum all the integers in a nested list like [[1, 2], 3, [4, [5, 6]]]. We’ll call that a deep sum, because it sums even integers that are deeply nested in the list. Our recursive algorithm should be something like this:

  • (Base case) The deep sum of a single integer is that integer itself.

  • (Recursive case) The deep sum of a nested list of integers is the sum of the deep sums of each element of the list.

To distinguish the base case from the recursive case, we need to determine whether a value is a list or an integer. The Python function isinstance can tell us that.

def what_is_it(value):
    """Distinguish possibly nested list from integer"""
    if isinstance(value, int):
        print(f"{value} is an int")
    elif isinstance(value, list):
        print(f"{value} is a list")
    else:
        print(f"{value} is neither int nor list")

what_is_it([1, 2, 3])
what_is_it([[1, 2], 3, [4, [5, 6]]])
what_is_it(42)
what_is_it([])
[1, 2, 3] is a list
[[1, 2], 3, [4, [5, 6]]] is a list
42 is an int
[] is a list

Using isinstance, we can write a deep_sum function with the base case and recursive case as outlined above:

def deep_sum(nest) -> int:
    """Return a sum of all integers in a nested list of int"""
    if isinstance(nest, int):
        return nest
    elif isinstance(nest, list):
        total = 0
        for el in nest:
            total += deep_sum(el)
        return total
    else:
        assert False, f"Wait, what is this thing? {nest}"

print(f"Deep sum of [1, 2, 3] is {deep_sum([1, 2, 3])}")
print(f"Deep sum of [[1, 2], 3, [4, [5, 6]]] is {deep_sum([[1, 2], 3, [4, [5, 6]]])}")
print(f"Deep sum of 42 is {deep_sum(42)}")
print(f"Deep sum of [] is {deep_sum([])}")
Deep sum of [1, 2, 3] is 6
Deep sum of [[1, 2], 3, [4, [5, 6]]] is 21
Deep sum of 42 is 42
Deep sum of [] is 0

Type declarations for recursive collection types#

As you may have noticed, I did not annotate the formal argument value with a type. If I annotated it as value: int, it would be wrong when value was a list. If I annotated as value: list, it would be wrong when value was an int. How can I give a clear, concise, and correct type declaration for a recursive collection, such as a nested list?

We will use two features of Python type annotations to solve this conundrum. First, Python allows us to give names to data types. For example, suppose I were using list[list[int]] in many places to represent a grid of integers. I could introduce a new type name Grid as a more readable and descriptive substitute:

Grid = list[list[int]]

I could then use the name Grid as a type annotation, e.g.,

def grid_sum(grid: Grid) -> int: 
    """Total of all cells in the grid"""
    total = 0
    for row_num in range(len(grid)):
        for col_num in range(len(grid[row_num])):
            total += grid[row_num][col_num]
    return total
     
g = [[1, 2, 3], [4, 5, 6]]
print(f"grid_sum of g is {grid_sum(g)}")
grid_sum of g is 21

The second new feature we will use is to indicate that a value can be of one type or another. In Python we use the vertical bar symbol |, pronounced “or”, to indicate that a variable could hold either of two or more different types of value. For example, if we wanted to indicate that x is either an int or a str, we could write int | str. If an argument to a function could be either an integer or a list of integers, we could annotate it as int | list[int].

Putting these together, we can create a description of a recursive data type:

Nest = int | list['Nest']

Annoyingly, we had to put Nest in quotes in list['Nest'], because the type Nest technically does not exist until the whole statement is processed. In return for this small aggravation, though, we now have a powerful way of annotating recursive data structures. Our deep_sum function could use be written like this:

def deep_sum(nest: Nest) -> int:
    """Return a sum of all integers in a nested list of int"""
    if isinstance(nest, int):
        return nest
    elif isinstance(nest, list):
        total = 0
        for el in nest:
            total += deep_sum(el)
        return total
    else:
        assert False, f"Wait, what is this thing? {nest}"

print(f"Deep sum of [1, 2, 3] is {deep_sum([1, 2, 3])}")
print(f"Deep sum of [[1, 2], 3, [4, [5, 6]]] is {deep_sum([[1, 2], 3, [4, [5, 6]]])}")
print(f"Deep sum of 42 is {deep_sum(42)}")
print(f"Deep sum of [] is {deep_sum([])}")
Deep sum of [1, 2, 3] is 6
Deep sum of [[1, 2], 3, [4, [5, 6]]] is 21
Deep sum of 42 is 42
Deep sum of [] is 0

Features and terms#

We’ve used several features of Python for the first time, and introduced some new terminology. Let’s review them quickly before moving on to more nested structures.

  • When the elements of one collection contain elements of the same or a different kind of collection, we say the collections are nested. We could have tuples nested in lists, for example, but the most important kind of nested collection is a recursive data structure, when a collection can contain elements of the same kind of collection.

  • When we process a nested collection, the computation is a mirror image of the data structure: Base cases are elements that are not collections (at least not nested collections), and recursive cases are collections in which further collections may be nested.

  • We can check whether a value is of a certain type using the Python isinstance function, e.g., isinstance(v, list).

  • We can describe a nested collection using a named, recursive type, using the | symbol (pronounced “or”) to separate cases.
    For example, a nested list of strings might be described by defining StringNest = str | list['StringNest'].

The next two sections provide additional examples of nested data structures.