9. Trees and directed acyclic graphs

The abstract data types that we met in Chapter 5 were all fairly simple sequences of objects that were extensible in different ways. If that were all the sum total of abstract data types then the reader might reasonably wonder what all the fuss is about. In this chapter we’ll look at trees and directed acyclic graphs (DAGs), which are abstract data types which look very different from the ones we’ve met so far.

Trees and DAGs provide some great examples of inheritance and give us the chance to look at some new algorithm types. They are also the core data types underpinning computer algebra systems, so studying trees and DAGs will enable us to gain a little insight into how systems such as SymPy, Maple and Mathematica actually work. As we come to the latter parts of the course, we’ll also step back a little from laying out a lot of code, and instead focus on the mathematical structure of the objects in question. When we come to the exercises, you will then take a little more responsibility for translating the maths into code.

9.1. The splat and double splat operators

Video: splat and double splat.

Imperial students can also watch this video on Panopto.

Before we go on to write code for trees and their traversal, we need to digress ever so slightly in order to explain a new piece of syntax. We’re going to want to write functions that can take a variable number of arguments. For example, we’re going to want the constructor of a tree node object to be able to take a variable number of children. We can do this by writing the relevant parameter as *children. The character * in this case is the argument packing operator, also known as the splat operator [2]. When used in the parameter list of a function, splat takes all of the remaining positional arguments provided by the caller and packs them up in a tuple. In this case, this enables any number of child nodes to be specified for each node.

The splat operator can also be used when calling a function. In that case it acts as a sequence unpacking operator, turning a sequence into separate arguments. For example:

In [1]: a = (1, 2, 3)

In [2]: print(*a)
1 2 3

which is identical to:

In [3]: print(1, 2, 3)
1 2 3

but different from:

In [4]: print(a)
(1, 2, 3)

The double splat operator, ** plays a similar role to the single splat operator, but packs and unpacks keyword arguments instead of positional arguments. When used in the parameter list of a function, ** gathers all of the keyword arguments that the caller passes, other than any which are explicitly named in the interface. The result is a dict whose keys are the argument names, and whose values are the arguments. Listing 9.1 demonstrates the argument packing function of **, while Listing 9.2 shows the unpacking function.

Listing 9.1 An illustration of keyword argument packing. All of the keyword arguments are packed into the dictionary kwargs, except for b, because that explicitly appears in the parameter list of fn().
In [1]: def fn(a, b, **kwargs):
   ...:     print("a:", a)
   ...:     print("b:", b)
   ...:     print("kwargs:", kwargs)

In [2]: fn(1, f=3, b=2, g="hello")
a: 1
b: 2
kwargs: {'f': 3, 'g': 'hello'}
Listing 9.2 Keyword argument unpacking. Notice that the arguments matching the explicitly named keywords are unpacked, while the remainder are repacked into the **kwargs parameter.
In [3]: kw = {"a": "mary", "b": "had", "c": "a", "d": "little",
              "e": "lamb"}

In [4]: fn(**kw)
a: mary
b: had
kwargs: {'c': 'a', 'd': 'little', 'e': 'lamb'}

Combining the splat and double splat operators, it is possible to write a function that will accept any combination of positional and keyword arguments. This is often useful if the function is intended to pass these arguments through to another function, without knowing anything about that inner function. For example:

def fn(*args, **kwargs):
    a =  inner_fn(*args, **kwargs)

The names *args and **kwargs are the conventional names in cases where nothing more specific is known about the parameters in question.

9.2. Graph and tree definitions

Trees and DAGs are examples of graphs, a type of mathematical object that you may have met in previous courses. Before moving on to define data structures and algorithms that work with them, it’s helpful to state the relevant definitions.

Definition 9.1 (Graph)

A graph \((V, E)\) is a set \(V\) known as the vertices or nodes, and a set of pairs of vertices, \(E\), known as the edges.

A graph describes connections between its vertices, and can be used to model a huge range of interconnected networks. Fig. 9.1 illustrates a simple example.

strict graph{ a -- b b -- c b -- d c -- f f -- b d -- a f -- a e -- c e -- d }

Fig. 9.1 A graphical representation of the graph \((\{a, b, c, d, e, f\},\) \(\{(a, b), (a, d), (a, f), (b, c), (b, d), (b, f), (c, e), (c, f), (d, e) \})\)

strict digraph{ a -> b [color=red] b -> c b -> d [color=red] c -> f f -> b d -> a [color=red] f -> a e -> c e -> d }

Fig. 9.2 A directed graph. The edges in red depict a cycle.

Definition 9.2 (Directed graph)

A directed graph is a graph in which the pair of nodes forming each edge is ordered. In other words each edge points from one node (the source) and to another (the target).

Fig. 9.2 shows a directed graph with similar topology to the previous example.

Definition 9.3 (Cycle)

A cycle in a graph is a sequence of edges such that the target of each edge is the source of the next, and the target of the last edge is the source of the first.

Definition 9.4 (Directed acyclic graph.)

A directed acyclic graph (DAG) is a directed graph in which there are no cycles.

Fig. 9.3 shows a directed acyclic graph, or DAG. The acyclic nature of the graph imposes a certain form of hierarchy. For example the graph formed by the inheritance relationship of classes is a DAG. The hierarchy implied by a DAG also lends itself to similar nomenclature to that which we use for class hierarchies: the source node of an edge is also referred to as the parent node and the target nodes of the edges emerging from a node are referred to as its child nodes or children.

strict digraph{ a -> b b -> c b -> d c -> f b -> f a -> f e -> c e -> d }

Fig. 9.3 A directed acyclic graph, formed by reversing edges in Fig. 9.2 so that no cycles remain.

strict digraph{ a -> b b -> d b -> e b -> f a -> c c -> g }

Fig. 9.4 A tree.

Definition 9.5 (Tree)

A tree is a directed acyclic graph in which each node is the target of exactly one edge, except for one node (the root node) which is not the target of any edges [1].

Tree nodes with no children are called leaf nodes.

9.3. Data structures for trees

Video: Tree data structures.

Imperial students can also watch this video on Panopto.

Unlike the sequence types we have previously met, trees are not linear objects. If we wish to iterate through every node in the tree then we have choices to make about the order in which we do so. In particular, there are two obvious classes of traversal order:

preorder traversal

A traversal in which each node is always visited before its children.

postorder traversal

A traversal order in which each node is always visited after its children.

Neither order is unique: a node can have any number of children and the definitions are silent on the order in which these are visited. There is furthermore no guarantee that the children of a node will be visited immediately before or after their parent, and once we look at visitors for DAGs it will become apparent that it is not always possible to do so.

Listing 9.3 A basic tree implementation. This code is available as the example_code.graphs.TreeNode class.
 1class TreeNode:
 2    """A basic tree implementation.
 4    Observe that a tree is simply a collection of connected TreeNodes.
 6    Parameters
 7    ----------
 8    value:
 9        An arbitrary value associated with this node.
10    children:
11        The TreeNodes which are the children of this node.
12    """
14    def __init__(self, value, *children):
15        self.value = value
16        self.children = children
18    def __repr__(self):
19        """Return the canonical string representation."""
20        return f"{type(self).__name__}{(self.value,) + self.children}"
22    def __str__(self):
23        """Serialise the tree recursively as parent -> (children)."""
24        childstring = ", ".join(map(str, self.children))
25        return f"{self.value!s} -> ({childstring})"

Listing 9.3 shows a very simple class which implements a tree. For example, we can represent the tree in Fig. 9.4 using:

In [1]: from example_code.graphs import TreeNode

In [2]: tree = TreeNode(
   ...:     "a",
   ...:     TreeNode("b", TreeNode("d"), TreeNode("e"), TreeNode("f")),
   ...:     TreeNode("c", TreeNode("g"))
   ...: )

In [3]: print(tree)
a -> (b -> (d -> (), e -> (), f -> ()), c -> (g -> ()))

The reader might immediately observe that serialised trees can be a little hard to read! This is the reason that trees are often represented by diagrams.

9.3.1. Traversing TreeNode

Video: tree traversal.

Imperial students can also watch this video on Panopto.

A function which traverses a tree is often called a tree visitor, because it visits every node in the tree. What does it do when it visits? Well it could do just about anything that one can do on the basis of the data available at that node, and the results of visiting whichever nodes have already been visited. The way that such a wide range of options can be accommodated is by the tree traversal function taking another function as an argument. This enables the caller to specify what should happen when each node is visited. An approach like this is a good example of a separation of concerns: the process of visiting a tree in the correct order is separated from the question of what to do when we get there.

We’ll consider postorder traversal first, as it’s the easier to implement.

Listing 9.4 A basic postorder tree visitor. This code is also available as example_code.graphs.postvisitor().
 1def postvisitor(tree, fn):
 2    """Traverse tree in postorder applying a function to every node.
 4    Parameters
 5    ----------
 6    tree: TreeNode
 7        The tree to be visited.
 8    fn: function(node, *fn_children)
 9        A function to be applied at each node. The function should take
10        the node to be visited as its first argument, and the results of
11        visiting its children as any further arguments.
12    """
13    return fn(tree, *(postvisitor(c, fn) for c in tree.children))

Listing 9.4 implements this visitor. Notice that there is only one line of executable code, at line 13. This recursively calls postvisitor() on all of the children of the current node, before calling fn() on the current node. As a trivial example, Listing 9.5 prints out the nodes of the graph in Fig. 9.4 in postorder.

Listing 9.5 A trivial postorder tree traversal which simply prints the node values in order. Observe that d, e, and f are printed before b; g is printed before c; and both b and c are printed before a.
 1In [1]: from example_code.graphs import TreeNode, postvisitor
 3In [2]: tree = TreeNode("a", TreeNode("b", TreeNode("d"), TreeNode("e"),
 4   ...:                                    TreeNode("f")),
 5   ...:                      TreeNode("c", TreeNode("g")))
 7In [3]: fn = lambda n, *c: print(n.value)
 9In [4]: postvisitor(tree, fn)

The preceding example is possibly a little too trivial, because we didn’t at all use the result of visiting the child nodes in visiting the parent node. For a marginally less trivial case, let’s count the number of nodes in the tree:

In [5]: fn = lambda n, *c: sum(c) + 1

In [6]: postvisitor(tree, fn)
Out[6]: 7

This time the visitor sums the results from its children, and adds one for itself.

What about preorder traversal? This time we need a little more code (not much) as Listing 9.6 shows. This time, we call fn() on the current tree node first, and then pass this result through as we recursively call previsitor() on the child nodes.

Listing 9.6 The simple preorder visitor from example_code.graphs.previsitor().
 1def previsitor(tree, fn, fn_parent=None):
 2    """Traverse tree in preorder applying a function to every node.
 4    Parameters
 5    ----------
 6    tree: TreeNode
 7        The tree to be visited.
 8    fn: function(node, fn_parent)
 9        A function to be applied at each node. The function should take
10        the node to be visited as its first argument, and the result of
11        visiting its parent as the second.
12    """
13    fn_out = fn(tree, fn_parent)
15    for child in tree.children:
16        previsitor(child, fn, fn_out)

What can we do with a preorder traversal? As an example, we can measure the depth in the tree of every node:

In [7]: from example_code.graphs import previsitor

In [8]: def fn(node, p):
   ...:     depth = p + 1 if p else 1
   ...:     print(f"{node.value}: {depth}")
   ...:     return depth

In [9]: previsitor(tree, fn)
a: 1
b: 2
d: 3
e: 3
f: 3
c: 2
g: 3

Observe here that the node visitor order is different from the postvisitor, and that we successfully diagnosed the depth of each node.

9.4. Expression trees

One important application of trees is in representing arithmetic expressions. Consider the expression \(2y + 4^{(5 + x)}\). Suppose, further, that we want to represent this on a computer in such a way that we can perform mathematical operations: evaluation, differentiation, expansion, and simplification. How would we do this? Well, thanks to the rules for order of operations, this expression forms a hierarchy from the operators to be evaluated first, down to the last one. Fig. 9.5 shows a tree representation for our mathematical expression. The evaluation rule for trees of this type is a postorder traversal, first the leaf nodes are evaluated, then their parents and so on up until finally the addition at the root node is evaluated.

strict digraph{ a [label="+"]; b [label="⨉"]; c [label="^"]; 2; y [font="italic"]; a->b a->c b->{2 y} c->4 d[label="+"] c->d d->5 x [font="italic"] d->x }

Fig. 9.5 Expression tree for the expression \(2y + 4^{(5 + x)}\).

We will first consider how to construct trees like this, then consider the question of the operations we could implement on them.

9.4.1. An expression tree class hierarchy

The nodes of an expression tree don’t just have different values, they have different type. That is to say, the meaning of operations changes between, say \(+\) and \(2\). For example the evaluation rule for these nodes will be different, as will the differentiation rule. At the same time, all the nodes are still expressions and will share many common features. This is a textbook case of inheritance. There should be a most general class, covering all types of expression nodes, and then more specialised node types should inherit from this. The most basic distinction is between operators, which have at least one operand (represented by a child node), and terminals, which have no children. In practice, it will result in simpler, more elegant code if terminals actually have an empty tuple of operands rather than none at all. This facilitates writing, for example, tree visitors which loop over all of the children of a node.

strict digraph{ node [ shape = "record" ] edge [ arrowtail = "empty"; dir = "back"; ] Expression -> Terminal Terminal -> Number Terminal -> Symbol Expression -> Operator Operator -> Add Operator -> Mul Operator -> Sub Operator -> Div Operator -> Pow }

Fig. 9.6 Inheritance diagram for a very basic symbolic language. Each box represents a class, with the arrows showing inheritance relationships. Note that the edges in such diagrams conventionally point from the child class to the parent class, because it’s the child class that refers to the parent. The parent does not refer to the child.

Let’s consider what would be needed at each layer of the hierarchy. Expression should implement everything that is the same for all nodes. What will that comprise?


The constructor will take a tuple of operands, since every expression has operands (even if terminals have zero operands).

__add__(), __sub__(), __mul__(), __truediv__(), __pow__()

Implementing the special methods for arithmetic is necessary for expressions to exhibit the correct symbolic mathematical behaviour. We met this idea already in Section 3.3.6. Arithmetic operations involving symbolic expressions return other symbolic expressions. For example if a and b are expressions then a + b is simply Add(a, b). The fact that these rules are the same for all expressions indicates that they should be implemented on the base class Expression.

Just as was the case when we implemented the Polynomial class, it will be necessary to do something special when one of the operands is a number. In this case, the right thing to do is to turn the number into an expression by instantiating a Number with it as a value. Once this has been done, the number is just another Expression obeying the same arithmetic rules as other expressions. The need to accommodate operations between symbolic expressions and numbers also implies that it will also be necessary to implement the special methods for reversed arithmetic operations.

Let’s now consider Operator. The operations for creating string representations can be implemented here, because they will be the same for all operators but different for terminals.


Remember that this is the canonical string representation, and is usually the code that could be passed to the Python interpreter to construct the object. Something like the following would work:

def __repr__(self):
    return type(self).__name__ + repr(self.operands)

This approach is valid because the string representation of a tuple is a pair of round brackets containing the string representation of each item in the tuple.


This is the human-readable string output, using infix operators, so in the example above we would expect to see 2 * y + 4 ^ (5 + x) This looks sort of straightforward, simply associate the correct symbol with each operator class as a class attribute and place the string representation of the operands on either side.

The challenge is to correctly include the brackets. In order to do this, it is necessary to associate with every expression class a class attribute whose value is a number giving an operator precedence to that class. For example, the precedence of Mul should be higher than Add. A full list of operators in precedence order is available in the official Python documentation. An operand \(a\) of an operator \(o\) needs to be placed in brackets if the precedence of \(a\) is lower than the precedence of \(o\).

Individual operator classes therefore need to define very little, just two class attributes, one for the operator precedence, and one to set the symbol when printing the operator.

Let’s now consider Terminal. What does it need to set?


The constructor for Expression assumes that an expression is defined by a series of operands. Terminals have an empty list of operands but do have something that other expressions lack: a value. In the case of Number, this is a number, while for Symbol the value is a string (usually a single character). Terminal therefore needs its own __init__() which will take a value argument. Terminal.__init__ still has to call the superclass constructor in order to ensure that the operands tuple is initialised.

__repr__() and __str__()

The string representations of Terminal are straightforward, simply return repr(self.value) and str(self.value) respectively. Note that in order for Operator.__str__() to function correctly, Terminal will need to advertise its operator precedence. The reader should think carefully about what the precedence of a Terminal should be.

The two Terminal subclasses need to do very little other than identify themselves. The only functionality they might provide would be to override __init__() to check that their value is a numbers.Number in the case of Number and a str in the case of Symbol.

9.4.2. Operations on expression trees

Video: evaluating expressions.

Imperial students can also watch this video on Panopto.

Many operations on expression trees can be implemented using tree visitors, most frequently by visiting the tree in postorder. An example is expression evaluation. The user provides a dict mapping symbol names to numerical values, and we proceed from leaf nodes upwards. Every Symbol is replaced by a numerical value from the dictionary, every Number stands for itself, and every Operator performs the appropriate computation on its operands (which are now guaranteed to be numbers).

The leaf-first order of execution makes this a postorder tree visitor, but what is the visitor function? It seems we need a different function for every type of expression node we encounter. It turns out that this is exactly what is required, and Python provides this functionality in the form of the single dispatch function.

9.4.3. Single dispatch functions

In all of the cases we have encountered so far, there is a unique mapping from the name of a function to the code implementing that function. That is, no matter what arguments are passed to a function, the same code will execute (so long as the right number of arguments are passed). A single dispatch function is not like this. Instead, calling a single function name causes different function code to execute, depending on the type of the first argument [3].

Listing 9.7 shows a single dispatch function for a visitor function which evaluates a Expression. Start with lines 6-24. These define a function evaluate() which will be used in the default case, that is, in the case where the type of the first argument doesn’t match any of the other implementations of evaluate(). In this case, the first argument is the expression that we’re evaluating, so if the type doesn’t match then this means that we don’t know how to evaluate this object, and the only course of action available is to throw an exception.

The new feature that we haven’t met before appears on line 5. functools.singledispatch() turns a function into a single dispatch function. The @ symbol marks singledispatch() as a decorator. We’ll return to them in Section 10.1. For the moment, we just need to know that @singledispatch turns the function it precedes into a single dispatch function.

Listing 9.7 A single dispatch function implementing the evaluation of a single Expression node. The implementation of the expressions language itself, in the expressions module, is left as an exercise.
 1from functools import singledispatch
 2import expressions
 6def evaluate(expr, *o, **kwargs):
 7    """Evaluate an expression node.
 9    Parameters
10    ----------
11    expr: Expression
12        The expression node to be evaluated.
13    *o: numbers.Number
14        The results of evaluating the operands of expr.
15    **kwargs:
16        Any keyword arguments required to evaluate specific types of
17        expression.
18    symbol_map: dict
19        A dictionary mapping Symbol names to numerical values, for
20        example:
22        {'x': 1}
23    """
24    raise NotImplementedError(
25        f"Cannot evaluate a {type(expr).__name__}")
29def _(expr, *o, **kwargs):
30    return expr.value
34def _(expr, *o, symbol_map, **kwargs):
35    return symbol_map[expr.value]
39def _(expr, *o, **kwargs):
40    return o[0] + o[1]
44def _(expr, *o, **kwargs):
45    return o[0] - o[1]
49def _(expr, *o, **kwargs):
50    return o[0] * o[1]
54def _(expr, *o, **kwargs):
55    return o[0] / o[1]
59def _(expr, *o, **kwargs):
60    return o[0] ** o[1]

Next we turn our attention to the implementation of evaluation for the different expression types. Look first at lines 28-30, which provide the evaluation of Number nodes. The function body is trivial: the evaluation of a Number is simply its value. The function interface is more interesting. Notice that the function name is given as _. This is the Python convention for a name which will never be used. This function will never be called by its declared name. Instead, look at the decorator on line 28. The single dispatch function evaluate() has a method register(). When used as a decorator, the register() method of a single dispatch function registers the function that follows as implementation for the class given as an argument to register(). On this occasion, this is expressions.Number.

Now look at lines 33-35. These contain the implementation of evaluate() for expressions.Symbol. In order to evaluate a symbol, we depend on the mapping from symbol names to numerical values that has been passed in.

Finally, look at lines 38-40. These define the evaluation visitor for addition. This works simply by adding the results of evaluating the two operands of expressions.Add. The evaluation visitors for the other operators follow in an analogous manner.

9.4.4. An expanded tree visitor

The need to provide the symbol_map parameter to the expressions.Symbol evaluation visitor means that the postorder visitor in Listing 9.4 is not quite up to the task. Listing 9.8 extends the tree visitor using the double splat operator to pass arbitrary keyword arguments through to the visitor function.

Listing 9.8 A recursive tree visitor that passes any keyword arguments through to the visitor function. This is available as example_code.expression_tools.post_visitor(). We also account for the name changes between TreeNode and Expression.
 1def postvisitor(expr, fn, **kwargs):
 2    '''Visit an Expression in postorder applying a function to every node.
 4    Parameters
 5    ----------
 6    expr: Expression
 7        The expression to be visited.
 8    fn: function(node, *o, **kwargs)
 9        A function to be applied at each node. The function should take
10        the node to be visited as its first argument, and the results of
11        visiting its operands as any further positional arguments. Any
12        additional information that the visitor requires can be passed in
13        as keyword arguments.
14    **kwargs:
15        Any additional keyword arguments to be passed to fn.
16    '''
17    return fn(expr,
18              *(postvisitor(c, fn, **kwargs) for c in expr.operands),
19              **kwargs)

Assuming we have an implementation of our simple expression language, we are now in a position to try out our expression evaluator:

In [1]: from expressions import Symbol

In [2]: from example_code.expression_tools import evaluate, postvisitor

In [3]: x = Symbol('x')

In [4]: y = Symbol('y')

In [5]: expr = 3*x + 2**(y/5) - 1

In [6]: print(expr)
3 * x + 2 ^ (y / 5) - 1

In [7]: postvisitor(expr, evaluate, symbol_map={'x': 1.5, 'y': 10})
Out[7]: 7.5

9.5. Avoiding recursion

The recursive tree visitors we have written require very few lines of code, and very succinctly express the algorithm they represent. However, in most programming languages (including Python), recursion is a relatively inefficient process. The reason for this is that it creates a deep call stack requiring a new stack frame for every level of recursion. In extreme cases, this can exceed Python’s limit on recursion depth and result in a RecursionError.

In order to avoid this, we can think a little more about what a recursive function actually does. In fact, a recursive function is using the call stack to control the order in which operations are evaluated. We could do the same using a stack to store which tree nodes still need processing. There are a number of ways to do this, but one particular algorithm emerges if we wish to be able to represent expressions not only as trees but as more general directed acyclic graphs.

9.6. Representing expressions as DAGs

If we treat an expression as a tree, then any repeated subexpressions will be duplicated in the tree. Consider, for example, \(x^2 + 3/x^2\). If we create a tree of this expression, then \(x^2\) will occur twice, and any operation that we perform on \(x^2\) will have to be done twice. If, on the other hand, we treat the expression as a more general directed acyclic graph, then the single subexpression \(x^2\) can have multiple parents, and so can appear as an operand more than once. Fig. 9.7 illustrates this distinction.

strict digraph{ a1 [label="+"] pow1 [label="^"] x1 [label="x"] n1 [label="2"] a1 -> pow1 pow1 -> x1 pow1 -> n1 m1 [label="/"] n0 [label=3] pow2 [label="^"] x2 [label="x"] n2 [label="2"] a1 -> m1 m1 -> n0 m1 -> pow2 pow2 -> x2 pow2 -> n2 a3 [label="+", ordering="out"] pow3 [label="^"] x3 [label="x"] n3 [label="2"] a3 -> pow3 pow3 -> x3 pow3 -> n3 m3 [label="/", ordering="out"] n4 [label=3] a3 -> m3 m3 -> n4 m3 -> pow3 }

Fig. 9.7 Tree (left) and DAG (right) representations of the expression \(x^2 + 3/x^2\). Notice that the DAG representation avoids the duplication of the \(x^2\) term.

The difference between a tree and a DAG may seem small in the tiny examples we can print on our page, but realistic applications of computer algebra can easily create expressions with thousands or tens of thousands of terms, in which larger common subexpressions themselves contain multiple instances of smaller subexpressions. The repetition of common terms, and therefore data size and computational cost, induced by the tree representation is exponential in the depth of nesting. This can easily make the difference between a computation that completes in a fraction of a second, and one which takes hours to complete or which exhausts the computer’s memory and therefore fails to complete at all.

9.6.1. Building expression DAGs

Using somewhat more complex data structures, it is possible to create expressions that automatically result in DAGs rather than trees. That is beyond our scope, but we can construct expression DAGs using our existing tools, at the expense of a little extra code. Take as an example the expression we used above: \(x^2 + 3/x^2\). If we write:

In [1]: from expressions import Symbol

In [2]: x = Symbol('x')

In [3]: expr = x**2 + 3/x**2

then each occurrence of x**2 creates a separate Expression object and we have a tree. However, if we instead write:

In [1]: from expressions import Symbol

In [2]: x = Symbol('x')

In [3]: x2 = x**2

In [4]: expr = x2 + 3/x2

then we now have a DAG in which the two occurrences of the \(x^2\) term refer to the same x**2 object.

9.6.2. DAG visitors

Even if we represent an expression as a DAG rather than a tree, the simple recursive tree visitors we have used thus far will undo all of our good work, because common subexpressions will be visited via each parent expression, rather than just once. This compounds the disadvantages of recursive visitors that we discussed above. Instead, we can construct a postorder DAG visitor using a stack to replace the recursion in keeping track of what to evaluate next, and a dict to record the nodes we have already visited, and the results of visiting them. Listing 9.9 illustrates one such algorithm.

Listing 9.9 Pythonic pseudocode for a non-recursive postorder DAG visitor.
 1def visit(expr, visitor):
 2    stack = []
 3    visited = {}
 4    push expr onto stack
 5    while stack:
 6        e = pop from stack
 7        unvisited_children = []
 8        for o in e.operands:
 9            if o not in visited:
10                push o onto unvisited_children
12        if unvisited_children:
13            push e onto stack # Not ready to visit this node yet.
14            # Need to visit children before e.
15            push all unvisited_children onto stack
16        else:
17            # Any children of e have been visited, so we can visit it.
18            visited[e] = visitor(e, *(visited(o) for o in e.operands))
20    # When the stack is empty, we have visited every subexpression,
21    # including expr itself.
22    return visited[expr]


Every operation we have defined on a symbolic mathematical expression defined as a tree or a DAG has produced a new tree or DAG as a result. This matches mathematical convention: operations produce new expressions, they don’t change their inputs. This is an important design principle of symbolic mathematical software. The confusion that frequently results from modifying symbolic expressions in-place far outweighs any possible advantage of avoiding the creating of new objects.

9.7. Differentiation as an expression visitor

In Section 9.4 we showed how a tree visitor could implement the evaluation of a symbolic expression. You might very well protest that if the only thing you wanted to do was evaluate arithmetic expressions then you could have just written a Python function and avoided a lot of code. In fact, almost any algebraic manipulation that you could conduct with pen and paper can be automated using expression visitors. We will illustrate this by sketching how differentiation can be achieved using a visitor function.

Let’s first consider terminals. Numbers are easy: the derivative of any number with respect to any symbol is simply 0. Symbols are nearly as straightforward. The derivative of a symbol with respect to itself is 1, while the derivative of a symbol with respect to any other symbol is 0. Because terminals have no operands, the implementation of differentiation when visiting a terminal is particularly easy. Note that the symbol with respect to which we are differentiating will need to be passed in to the visitor. This can be achieved with a keyword argument in a manner analogous to symbol_map in Listing 9.7.

The differentiation of operators is achieved by an applying the chain rule. For a binary operator \(\odot\), with operands \(o_0\) and \(o_1\), the chain rule is given by:

(9.1)\[\frac{\partial\, (o_0 \odot o_1)}{\partial x} = \frac{\partial o_0}{\partial x} \frac{\partial\, (o_0 \odot o_1)}{\partial o_0} + \frac{\partial o_1}{\partial x} \frac{\partial\, (o_0 \odot o_1)}{\partial o_1}\]

For example if the operator is multiplication, then:

(9.2)\[\frac{\partial\, o_0 o_1}{\partial o_0} = o_1\]

and the product rule follows immediately. Similarly, the sum and quotient rules for differentiation are simply special cases of the chain rule. This means that the particular implementation of differentiation for a given Operator subclass simply encodes the version of the chain rule for that operator. This will require the original operands to the operator, which are available from the operator object itself, and the results of differentiating the operands, which are given by the *o argument to the visitor function.

9.8. Glossary

child node

The children of a node in a DAG are the targets of the edges emerging from that node. In this context, the node from which these edges emanate is called the parent node.

directed acyclic graph

A graph in which the edges are directed (i.e. the edges point from one vertex to another) and where there are no cycles of edges.


A connection between two nodes in a graph. In a directed graph, the edges have an orientation, from a source node to a target node.


An abstract data type comprising a set of nodes or vertices \(V\) and a set of edges \(E\) each of which connects two vertices.

graph visitor

A function which iterates over all of the nodes of a graph, calling calling a visitor function for each node.

leaf node

A node in a DAG with no children.

parent node

From the perspective of a child node, the source of an incoming edge.

postorder traversal

A visitor for a DAG in which the each parent node is visited after its children.

preorder traversal

A visitor for a DAG in which each parent node is visited before its children.

root node

A node in a DAG with no parent.

single dispatch function

A function with a single name but multiple implementations. The implementation which executes when the function is called is chosen based on the type of the first argument to the function. The functools.singledispatch() function facilitates the creation of single dispatch functions.


A DAG in which every vertex is the target of at most one edge.

9.9. Exercises

Using the information on the book website obtain the skeleton code for these exercises. The exercises also make use of the book repository which you installed in in Chapter 2.

Exercise 9.6

In the exercise repository, create a package expressions which implements the class hierarchy in Section 9.4.1. When implementing __str__(), use the symbols +, -, *, /, and ^. The names Symbol, Number, Add, Sub, Mul, Div, and Pow should be directly importable in the expressions namespace.

Exercise 9.7

Write a function importable as expressions.postvisitor() with the same interface as Listing 9.8. Your implementation, however, should not be recursive, and should only visit repeated subexpressions once, no matter how many times they occur in the expression.

Exercise 9.8

Write a single dispatch function importable as expressions.differentiate() which has the correct interface to be passed to expressions.postvisitor() or example_code.expression_tools.postvisitor() and which differentiates the expression provided with respect to a symbol whose name is passed as the string keyword argument var.

As a simplification, the tests will assume that var does not appear in an exponent. As an extension, you could consider that case too, but you’d probably need to extend your symbolic language to include the natural logarithm as a symbolic function.