Data Structures, Algorithms, & Applications in C++
Chapter 18, Exercise 47

(a)
We do this by using induction on n.

Induction Base
When n = 0 the binary tree has no internal node and 1 external node. For this tree E = I = n = 0. Therefore, E = I + 2n.

Induction Hypothesis
Let m be any integer >= 0. Assume that E = I + 2m for all binary trees that have m internal nodes.

Induction Step
We will show that E = I + 2n for all binary trees that have m + 1 internal nodes. Consider any binary tree T that has m + 1 internal nodes. Remove any one of the internal nodes that is a leaf. The resulting tree, T' has m internal nodes. From the induction hypothesis it follows that E' = I' + 2m where E' and I' are, respectively, the external and internal path lengths of T'.

Suppose that the removed leaf was at level level of T. It follows that E = E' + level + 2 and that I = I' + level where E and I are, respectively, the external and internal path lengths of T. Therefore,
E = E' + level + 2
= I' + 2m + level + 2
= I - level + 2m + level + 2
= I + 2(m + 1)


(b)
The internal path length of a binary tree that has n internl nodes is sum (i=1 to n) leveli where leveli is the length of the path from the root to internal node i. In a complete binary tree one leveli is 0, two levelis are 1, four levelis are 2, eight levelis are 4, and so on. Further, there is no binary tree that has more levelis equal to j for any nonnegative integer j. Therefore, a complete binary tree minimizes the internal path length.

(c)
We will prove this equality by using induction on n.

Induction Base
When n = 0 the right side evaluates to 1*0 - 21 + 2 = 0, which equals the internal path length of a binary tree that has no internal nodes.

Induction Hypothesis
Let m be any integer >= 0. Assume that the equality is correct for all binary trees that have n = m internal nodes.

Induction Step
We will show that I = (n+1)p - 2p+1 + 2 for all binary trees that have n = m + 1 internal nodes. Consider any binary tree T that has m + 1 internal nodes. Remove any one of the internal nodes that is a leaf. The resulting tree, T' has m internal nodes. We consider the two cases (1) p = floor(log2(m+2)) = floor(log2(m+1)) and (2) p = floor(log2(m+2)) = floor(log2(m+1)) + 1.

Case 1:
From the induction hypothesis it follows that the internal path length of T' is (m+1)p - 2p+1 + 2. The internal path length of T is p more than that of T'. Therefore, the internal path length of T is
(m+1)p - 2p+1 + 2 + p
= (m+2)p - 2p+1 + 2
.

Case 2:
From the induction hypothesis it follows that the internal path length of T' is (m+1)(p-1) - 2p + 2. The internal path length of T is p-1 more than that T'. Furthermore, 2p = m + 2. Therefore, the internal path length of T is
(m+1)(p-1) - 2p + 2 + p - 1
= (m+1)p - (m+1) - (m+2) + 2 + p - 1
= (m+2)p - 2(m+2) + 2
= (m+2)p - 2p+1 + 2


(d)
From part (a) we see that a binary tree than has minimum internal path length also has minimum external path length. Therefore, part (b) implies that complete binary trees have minimum external path length also. Now from part (c) we get (n+1)p - 2p+1 + 2n + 2 as the minimum value of E.

(e)
From parts (a) and (b) it follows that a complete binary tree has the smallest average external path. From part (d) we see that this smallest average external path length is ((n+1)p - 2p+1 + 2n + 2)/(n + 1) =; p - 2p+1/(n + 1) + 2 = Omega(log n).

Therefore the average external path length of every binary tree that has n internal nodes is Omega(log n).