We prove: A sequence of MAKE-SET, LINK, FIND-SET operations, of which are MAKE-SET operations, can be performed on a disjoint-set forest with union by rank and path compression in worst-case time .
In general, we use the variable names from the course book unless otherwise defined. Note that - we use different braces for the ease of reading.
Assign
# operations in total
# MAKE-SET operations (hence )
Set of the nodes in the forest
is not a root and
arbitrary parts of trees (we use in diagrams)
Using the potential method of amortized analysis, our goal is to show the following inequality chain
which especially holds, if
We can arbitrarily set := 0 ( will follow from the definition of ). Hence our goal is to show that any of the operations (MAKE-SET, LINK, FIND-SET) at any position in the sequence of operations has amortized cost
.
Important properties of the function are
for a fixed node can only increase during execution
if
if
Let us define:
We refer to the table on the last page of the handout where we give some values for the two functions where and .
More than once in the proof we will need the properties
for a fixed node can only increase during execution
when level(x) does not change, , for a fixed node , can only increase.
Now we are set to define the potential function:
It is imporant to observe, that only depends on and , to be more specific, only depends on and . To avoid the overuse of indicees, we do not assign to the rank, level and iter functions, although they may vary over the sequence of operations!
Lemma 21.9a: Let . Then after the -th operation the potential of does not change if and do not change, otherwise drops at least by one.
Proof: Follows from the properties of the function level and iter and their image sets. For example in the case that increases by one, can only drop by .
Lemma 21.9b: Let but is not a root then after the -th operation the potential of does not change. Proof follows from definition of , since does not increase.
Amortized cost of MAKE-SET: Assume that the -th operation is a MAKE-SET() operation, where denotes the new node. The actual cost of this operation is = 1. Let us diagram the evolution of the datastructure:
Since we assign to the new node the potential of is . From the definition of MAKE-SET we get that the rank values of the existing nodes do not change. Hence their potential does not change and the computation of the amortized cost yields
Amortized cost of LINK: Assume that the -th operation is a LINK() operation, where and are roots of two trees. W.l.o.g. shall become a node of the remaining root . The actual cost of this operation is = 1. Let us diagram the evolution of the datastructure:
The amortized cost can then be written as
The sums over cancel immediately. The potential of decreases considering the image sets of the level and the iter function:
Either is left alone or it is increased. In either case we have:
So the potential of increases at most by .
From Lemma 21.9a and Lemma 21.9b we conclude that
.
All together, it follows that .
Amortized cost of FIND-SET: Assume that the -th operation is a FIND-SET() operation. The actual cost of this operation is where is the path length from to the root the specific tree. Let us diagram the evolution of the datastructure:
The Find-path is illustrated as a thick line. We regard as the first node on the Find-path. Allow us the remark that in the following lies the very magic of the function and the only justification for its use. We define
lies on the Find-path
for
The amortized cost can then be written as
Again, the sums over the nodes of cancel immediately. Note that if for each set there is a node, (listen to more Bach music) which is has no successor on the find path in . Hence, in total there are at most such nodes. Let us define the set
Note that
By Lemma 21.9a the potential of nodes from does not increase. If we can show that for all nodes the potential , e.g. decreases at least by one, then the amortized cost computation resolves in
To see for all nodes the potential , choose where succeeds in the find path. Note that . Then prior to path compression the following inequalities hold:
Putting these together yields
.
So increases and by Lemma 21.9a we have for all nodes .
We want to conclude with the remark, that we did not say it is going to be easy, and that the are actually and , but we can overcome the constant hidden behind the actual costs by scaling up the 's.
References:
1. CLRS (mostly green cover).
2. Galil et al. Data Structures and Algorithms for Disjoint Set Union Problems. ACM Computing Surveys, Vol. 23, No. 3, September 1991