Note: This is not really a thorough treatment of the Catalan Numbers, just a small one, motivated by the "Uniscraper" problem in this week's SRM 115 (Level-3, Division I).

If you haven't read the problem statement yet, you might want to do so now. But here's what I need for the rest of this page: The problem deals with a card game, where cards are dealt in a triangle, namely like this:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36Those are the 36 positions where cards can be (zero or one cards on each position). A card distribution is "valid" if for every position,

- the position is empty (no card lies on it)
or- it does contain a card and the positions directly above (upper left and upper right) also contain cards.
For example, this distribution (where a black circle represents a position with a card and a white circle is an empty position)

is valid, but this one is not:

A question of interest is "How many valid distributions are there?". That's because this number can make the difference between the need for strong optimization and the possibility of more or less stupid brute force. It turned out that for the side length 8 used in the problem statement, this number is 4862, a fairly small number. Also surprising is that this is the 9'th so-called "Catalan Number". In fact, for a board of side length N, the number of valid card distributions is the (N+1)th Catalan Number.

Note: As you will see in the following, Catalan Numbers appear in a lot of places (even many more than I will mention here). That's why they're very interesting. First of all, it might decide between optimization or brute force. Secondly, they bring together things that at first sight you wouldn't think have much to do with each other...

Now what exactly are these numbers? Here are the first few:

N 1 2 3 4 5 6 7 8 9 10 N-th Catalan Number 1 2 5 14 42 132 429 1430 4862 16796 For the general case, I could for example give you one of these definitions (where C(N) denotes the N-th Catalan number):

C(N) := choose(2N,N) / (n+1)

C(0) := 1, and C(N+1) := Sum(i=0 to n) C(i)*C(n-i)

C(N) := The number of valid Uniscraper distributions of a triangle with side length N-1.

C(N) := The number of binary trees that can be built with N nodes.

C(N) := The number of possible triangulations (with N triangles) of a convex polygon with N+2 sides.

It turns out that all these definitions are equivalent. But there are big differences in quality:

1. and 2. give a definition without any meaning. Without meaning, they're senseless. These might be helpful when we need to calculate the values, but they're bad definitions in a deeper sense.

Definition 3. is bad because:

For C(0) we talk about negative side lengths.

This is where we are coming from. When we want to find a connection between Uniscraper and Catalan Numbers, we shouldn't *define* them to be equal. That's cheating ;-)

4. and 5. are great, and the binary tree problem is probably the problem where most of us first saw the Catalan Numbers. Unfortunately, these don't come with mathematical formulas to calculate the values. But it's fairly easy to prove that the number of binary trees is equivalent to the recursive 2. definition.

On this page, I will therefore use the 4. definition (the one with binary trees).

Sorry, I haven't done this section yet. It should contain an example (the C(3)=5 binary trees built from 3 nodes) and a proof that the 2. and 4. definition from above are equivalent. However, you don't really need this section to go on. Well, here's at least the example:

Btw, the proof of equivalence between definitions 2. and 4. is extremely similar to the proof that our 2. definition also describes the number of correct terms of parentheses (you'll find this below). So if you want to, maybe try to prove the connection yourself...

Consider our example distribution again:

Now let's add a grid:

And then draw a red line on the grid from the left to the right that separates the black circles from the white ones:

Let's call this red line a "mountain". More precisely, a mountain is such a red line on the grid that starts on the left end, ends on the right end, and always goes right up or right down. It should be easy to see that the number of possible mountains is exactly the number of valid card distributions. That's because every card distribution can be transformed to exactly one mountain and vice versa.

Ok, now we project the mountain to the plane, converting each "up" part (when going from left to right) to an opening parenthesis and each "down" part to a closing parenthesis:

Again, it should be fairly easy to see that the number of mountains is equal to the number of "correct" terms of parentheses (for example, "())(" is not correct, because the second closing parenthesis is not closing any preceding opening parenthesis).

Ok, the final step will be to show that the number of correct terms of parentheses is equivalent to the 2. definition of C(N). But wait a second, what N are we talking about? Well, it turns out that a triangle of side length N will result in N+1 pairs of parentheses (check it if you want to). What I mentioned earlier is that there are C(N+1) valid card distributions for a triangle of size N. To finish the chain and prove that statement, we only have to prove now that the number of correct terms of N parentheses pairs is exactly C(N). Well, we'll at least show that it's equivalent to our 2. definition for C(N).

Base case (N=0): There's exactly one way to write down zero pairs of parentheses on a sheet of paper, namely to leave the sheet empty. Good, so far we say the same as our 2. definition.

Step case (N+1): The first parenthesis must be an opening one. Now consider where we put the corresponding closing parentheses. More precisesly, consider how many other pairs we put between them. Well, we can put any number of pairs in there from zero (none) to N (all). And when we put i pairs inside, then the remaining n-i pairs must be appended after the closing parentheses (for example

). These two parts can be any correct parentheses term with i (respectively n-i) pairs. That's where we get the

"((()()))()((())())" C(N+1) := Sum(i=0 to n) C(i)*C(n-i)

from: For each choice of i, we have C(i) ways to fill the first part and C(n-i) ways to fill the second one. Multiply these and add all possibilities together.

In case you want to know more about Catalan Numbers, you might want to look here:

- http://mathworld.wolfram.com/CatalanNumber.html
- http://www-gap.dcs.st-and.ac.uk/~history/Miscellaneous/CatalanNumbers/catalan.html
One thing you can learn there is that C(N) is approximately

4^N ------------------ (about 5477 for N=9) sqrt(Pi) * N^(3/2)Btw, the standard text book Introduction to Algorithms also covers the Catalan Numbers in a nice problem (well, the problem uses only formal math, no visualization, so it's not the nicest problem I've ever seen). And from The Book of Numbers I got the idea of the mapping between mountains and parentheses terms. It also has some more examples where Catalan Numbers occur and connects all of them graphically, so that you get a better understanding of the connections.

Stefan Pochmann Last modified: Fri Oct 4 02:19:23 MET DST 2002