Recursion Schemes in Haskell

<aside> ⚠️

WARNING: This page discusses recursion schemes in Haskell

</aside>

Introduction

For any recursive datatype, such as

data Tree a = Leaf a | Branch [Tree a]

we can factor out self-reference into a type parameter, as in

data TreeF a s = LeafF a | BranchF [s]

This new type represents a “single layer” of a tree, where the type of leaves is given by a and the type of sublayers is given by s.

The type TreeF is called the base functor for Tree. We can recover the original datatype as

-- Infinite self-application of a functor: Fix f ≅ f (Fix f)
data Fix f = Fix (f (Fix f))

Fix (TreeF a) ≅ Tree a

These gymnastics are useful because we can now write a function encapsulating recursion:

-- Collapse a structure by folding it bottom-up
fold :: Functor f => (f r -> r) -> Fix f -> r
fold f (Fix whole) = f (fold f <$> whole)

The input f r -> r specifies how to collapse a single layer of our structure, granted that sublayers have already been collapsed. It is sometimes called an f-algebra. fold then handles the work of applying that function to each layer of the structure, bottom-up.

For instance, we can specify a recursive sum-of-leaves like this:

type Tree a = Fix (TreeF a)

sumTree :: Tree Int -> Int
sumTree = fold \\case
	LeafF a -> a
	BranchF (as :: [Int]) -> sum as

The function fold (also known as cata) is a recursion scheme — it gives a strategy for walking over a recursive structure, without tying itself to any particular structure.

There are loads of recursion schemes; this is just one. Read on!

Related Resources

The Haskell data-fix package provides a basic collection of recursion schemes over the Fix datatype. It also works with the Mu and Nu combinators, which I know less about.
The Haskell recursion-schemes package is a more powerful option, providing more generality, better ergonomics, and a greater variety of recursion schemes, at the cost of more complexity.