Introduction
Create a graph
Run the algorithm
Description of the algorithm
Exercise
More

Minimum spanning forest (MSF) shown in green!

The Minimum Spanning Tree Algorithm

A telecommunication company wants to connect all the blocks in a new neighborhood. However, the easiest possibility to install new cables is to bury them along roads. So the company decides to use hubs which are placed at road junctions. How can the cost be minimized if the price for connecting two hubs corresponds to the length of the cable attaching them?

Considering the roads as a graph, the above example is an instance of the Minimum Spanning Tree problem. Prim's and Kruskal's algorithms are two notable algorithms which can be used to find the minimum subset of edges in a weighted undirected graph connecting all nodes.

This tutorial presents Kruskal's algorithm which calculates the minimum spanning tree (MST) of a connected weighted graphs. If the graph is not connected the algorithm will find a minimum spannig forest (MSF). For a comparison you can also find an introduction to Prim's algorithm.

What do you want to do first?

SVG Download

Legend

	Node
	Edge with weight 50

Legend

Which graph do you want to execute the algorithm on?

Start with an example graph:

Select

Modify it to your desire:

To create a node, make a double-click in the drawing area.
To create an edge, first click on the output node and then click on the destination node.
The edge weight can be changed by double clicking on the edge.
Right-clicking deletes edges and nodes.

Download the modified graph:

Download

Upload an existing graph:

occured when reading from file:

the contents:

What next?

SVG Download

Legend

	Node that is currently being processed
	Edge used for building Minimum Spanning Tree.

Legend

Algorithm status

rewind fast forward

Explanation
Pseudocode

Algorithm status will appear here

BEGIN

T ← ∅

queue ← sort {u, v} edges of E using l.

FOREACH v in G.V

make-tree(v);

WHILE queue ≠ ∅ AND trees-count > 1 DO

{u, v} ← queue.extractMin()

IF !(T ∪ {{u, v}} has cycle)

T.add({u, v})

merge(tree-of(u), tree-of(v))

END

If you switch tabs, the execution will be terminated.

You can open another browser window to read the description in parallel.

What is the cheapest way to connect all vertices?

Minimum Spanning Tree

Given a connected, undirected graph, a spanning tree of that graph is an acyclic subgraph that connects all vertices. Obviously, a graph may have many spanning trees. Weights may be assigned to each edge of the graph, then the total weight of a subgraph is the sum of its edge weights. The spanning tree with the weight less than or equal to all other spanning trees is called the minimum spanning tree (MST). More generally, any undirected graph has a minimum spanning forest (MSF), which is a union of minimum trees for its connected components.

Kruskal's algorithm is a greedy algorithm (a problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum) that efficiently finds the minimum spanning tree for any connected weighted undirected graph. In the case that the graph is not connected this algorithm will calculate minimum spanning forest without need for any further modifications.

Idea of the algorithm

Kante vor Update — An edge may be connecting two vertices in the same tree or in different trees. If the later is the case, then the trees are merged.

The algorithm initializes a forest consisting of trees with single nodes. The edges are sorted in a queue based on their weights. In each round an edge is removed from the queue. If the edge's endpoints belong to different trees then the trees are merged using the edge. The process continues until all the nodes are in the same tree or the edge-queue is empty.

The most efficient way to implement the algorithm is using a Union-Find data structure which provides efficient set operations such as the union of two sets and checking the membership of an element. When an edge is removed from the queue this data structure makes it possible to easily check if two endpoints of the edge belong to the same tree. If this is the case, the trees, which are presented as sets, can be easily merged.

The algorithm may be described step-by-step. First is the initialization part:

All the edges are sorted based on their weight, and they are added to a queue (a priority queue may be used instead).
Each vertex of the graph is added to the Union-Find data structure. The data structure assigns an ID to each vertex of the graph. This ID represents the tree which the node belongs to. These IDs were represented by the colors during the execution of the algorithm.

Next, the algorithm iterates over the queue. In each round, the edge in front of the queue is extracted. Then the two endpoints are compared. If their IDs are not equal (they belong to different trees), then the trees which contain these endpoint are merged. A Union-Find data structure merges two trees by assigning a new ID to all vertices contained in both trees. In this case the edge is also added to the resulting tree. Otherwise, the endpoints are in the same tree, so adding the edge would result in a circle in the tree.

The above iteration continues until no more edges are included in the queue, or all vertices are contained in the same tree (their IDs are equal).

Construction of the MSF

When two endpoints of an edge are checked during the algorithm, this edge is added to the result if it is connecting two different trees. The resulting minimum spanning forest may be represented as the union of all such edges.

What now?

Create a graph and play through the algorithm

Test your knowledge in the exercises

SVG Download

Legend

	Node that is currently being processed
	Edge used for building Minimum Spanning Tree.

Legend

Test your knowledge: How does the algorithm decide?

rewind Forward: Next question

Explanation
Pseudocode

BEGIN

T ← ∅

queue ← sort {u, v} edges of E using l.

FOREACH v in G.V

make-tree(v);

WHILE queue ≠ ∅ AND trees-count > 1 DO

{u, v} ← queue.extractMin()

IF !(T ∪ {{u, v}} has cycle)

T.add({u, v})

merge(tree-of(u), tree-of(v))

END

If you switch tabs, the execution will be terminated.

You can open another browser window to read the description in parallel.

What is the pseudocode of the algorithm?

Input: Weighted, undirected graph G = (V, E) with weight function l.
Output: A list T of edges representing the MST (or MSF if G is not connected).

BEGIN
  T ← ∅
  queue ← sort {u, v} edges of E using l.
  FOREACH v in G.V
    make-tree(v);
  WHILE queue ≠ ∅ AND trees-count > 1 DO
    {u, v} ← queue.extractMin()
    IF !(T ∪ {{u, v}} has cycle)
      T.add({u, v})
      merge(tree-of(u), tree-of(v))	
END

How fast is the algorithm?

Speed of the algorithms

The Speed of an algorithm is the total number of individual steps which are performed during the execution. These steps are for example:

Assignments – Set distance of a node to 20.
Comparisons – Is 20 greater than 23?
Comparison and assignment – If 20 is greater than 15, set variable n to 20
Simple Arithmetic Operations – What is 5 + 5?

Since it can be very difficult to count all individual steps, it is desirable to only count the approximate magnitude of the number of steps. This is also called the running time of an algorithm. Particularly, it is intersting to know the running time of an algorithm based on the size of the input (in this case the number of the vertices and the edges of the graph).

Running time of Kruskal's algorithm

First, sorting the edges using comparison sorts, or adding them to a priority queue can be done proportional to m · log(m) steps (this allows to perform the extractMin() operation in constant time). As m ≤ n * n this time is at most m · log(n^2) = 2 * m · log(n)

Next, a disjoint-set data structure (Union&Find) is used to keep track of which vertices are in which tree. Even a simple disjoint-set data structure can perform operations proportional to log(size).

In each iteration, two find-set() operation and possibly one union() operation are performed. Considering all the edges, the total time required for processing the loop will be a multiply of m · log(n). Adding this value to the time required for sorting the edges, the total time needed for running the algorithm is proportional to m · log(n).

Provided that edges are already sorted or can be sorted in linear time (i.e using counting sort or radix sort), the algorithm can use more sophisticated disjoint-set data structure to run proportional to m · α(n), where α is the extremely slowly growing inverse of the single-valued Ackermann function.

Does the algorithm work correctly?

The proof consists of two parts. First, it is proved that the algorithm produces a spanning tree. Second, it is proved that the constructed spanning tree is of minimal weight.

Spanning tree

Let G=(V, E) be a connected, weighted graph and let T be the subgraph of G produced by the algorithm. T cannot have a cycle because the algorithm only chooses those edges which are connecting two different trees. T cannot be disconnected, since the first encountered edge that joins two components of T would have been added by the algorithm. Thus, T is a spanning tree of G.

Minimality

We show that the following proposition P is true by induction: If E1 is the set of edges chosen at any stage of the algorithm, then there is some minimum spanning tree that contains E1.

Clearly P is true at the beginning, when E1 is empty: any minimum spanning tree will do, and there exists one because a weighted connected graph always has a minimum spanning tree.

T1 shown green.
Now assume P is true for some non-final edge set E1 and let T1 be a minimum spanning tree that contains E1. If the next chosen edge e is also in T1, then P is true for the union of E1 and e. Otherwise, this union T1 + {e} has a cycle C and there is another edge f that is in C but not E1. (If there were no such edge f, then e could not have been added to E1, since doing so would have created the cycle C.) Then T1 − f + e is a tree, and it has the same weight as T1, since T1 has minimum weight and the weight of f cannot be less than the weight of e, otherwise the algorithm would have chosen f instead of e. So T1 − f + e is a minimum spanning tree containing E1 + e and again P holds.
Therefore, by the principle of induction, P holds when E1 has become a spanning tree, which is only possible if E1 is a minimum spanning tree itself.

Where can I find more information about graph algorithms?

Other graph algorithms are explained on the Website of Chair M9 of the TU München.

Furthermore there is an interesting book about shortest paths: Das Geheimnis des kürzesten Weges

Studying mathematics at the TU München answers all questions about graph theory (if an answer is known).

A last remark about this page's content, goal and citations

Chair M9 of Technische Universität München does research in the fields of discrete mathematics, applied geometry and the mathematical optimization of applied problems. The algorithms presented on the pages at hand are very basic examples for methods of discrete mathematics (the daily research conducted at the chair reaches far beyond that point). These pages shall provide pupils and students with the possibility to (better) understand and fully comprehend the algorithms, which are often of importance in daily life. Therefore, the presentation concentrates on the algorithms' ideas, and often explains them with just minimal or no mathematical notation at all.

Please be advised that the pages presented here have been created within the scope of student theses, supervised by Chair M9. The code and corresponding presentation could only be tested selectively, which is why we cannot guarantee the complete correctness of the pages and the implemented algorithms.

Naturally, we are looking forward to your feedback concerning the page as well as possible inaccuracies or errors. Please use the suggestions link also found in the footer.

To cite this page, please use the following information:

Title: Kruskal's Algorithm
Authors: Wolfgang F. Riedl, Reza Sefidgar; Technische Universität München
Link: https://www-m9.ma.tum.de/graph-algorithms/mst-kruskal