Introducing the Huffman Coding data structure – Arrays, collections and data structures


119. Introducing the Huffman Coding data structure

The Huffman Coding algorithm was developed by David A. Huffman in 1950 and it can be easily understood via an example. Let’s assume that we have the string from the following figure.

Figure 5.40 – Initial string

Let’s assume that each character needs 8 bits to be represented. Since we have 14 characters we can say that we need 8*14=112 bits to send this string over a network.

Encode the string

The idea of Huffman Coding is to compress (shrink) such strings to a smaller size. For this, we create a tree of character frequencies. A Node of this tree can be shaped as follows:

public class Huffman {
  private Node root;
  private String str;
  private StringBuilder encodedStr;
  private StringBuilder decodedStr;
  private final class Node {
      
    private Node left;
    private Node right;
    private final Character character;
    private final Integer frequency;
    //  constructors
  }
  …
}

For instance, the following figure calculates the frequency of each character from our string in ascending order:

Figure 5.41 – Calculating the frequency of each character

After sorting, these characters are stored in a priority queue (PQ). Each character will become a leaf node in a tree by following several steps:

Step 1: Create a node with two children (a partial tree). The left child holds the minimum frequency, and the right child holds the next minimum frequency. The node itself holds the sum of its left and right children.

Step 2: Remove these two frequencies from PQ.

Step 3: Insert in PQ this partial tree.

Step 4: Repeat steps 1-3 until PQ is empty and we obtain a single tree from these partial trees.

If we apply Steps 1-3 twice we obtain the following figure:

Figure 5.42 – Apply Steps 1-3 once

In code lines these steps are coded as follows:

public void tree(String str) {
  this.str = str;
  this.root = null;
  this.encodedStr = null;
  this.decodedStr = null;
  Map<Character, Integer> frequency = new HashMap<>();
  for (char character : str.toCharArray()) {
    frequency.put(character,
           frequency.getOrDefault(character, 0) + 1);
  }
  PriorityQueue<Node> queue = new PriorityQueue<>(
  Comparator.comparingInt(ch -> ch.frequency));
  for (Entry<Character, Integer> entry:frequency.entrySet()) {
    queue.add(new Node(entry.getKey(), entry.getValue()));
  }
  while (queue.size() != 1) {
    Node left = queue.poll();
    Node right = queue.poll();
    int sum = left.frequency + right.frequency;
    queue.add(new Node(null, sum, left, right));      
  }
  this.root = queue.peek();
}

Repeating these steps until PQ is empty we obtain the final tree. Next, for each node of this tree that is not a leaf, we assign the value 0 to the left edge and 1 to the right edge. This is the encoding step that can be coded as follows:

public String encode() {
  Map<Character, String> codes = new HashMap<>();
  encode(this.root, “”, codes);
  this.encodedStr = new StringBuilder();
  for (char character : this.str.toCharArray()) {
    this.encodedStr.append(codes.get(character));
  }      
  return this.encodedStr.toString();
}
private void encode(Node root, String str,
                    Map<Character, String> codes) {
  if (root == null) {
    return;
  }
  if (isLeaf(root)) {
    codes.put(root.character, str.length() > 0 ? str : “1”);
  }
  encode(root.left, str + ‘0’, codes);
  encode(root.right, str + ‘1’, codes);
}

The final result looks like this:

Figure 5.43 – The final tree

Now, sending this tree over the network will send the compressed string. The next figure gives us the new size of this string:

Figure 5.44 – The size of the compressed string

So, we reduce the size of the string from 112 bits to 41 bits. This is the compressed or encoded string.

Decode the string

Decoding the string is a simple step. We take each code and traverse the tree to find the assigned character. For instance, we take 0111 and we find d, we take 110 and we find a, and so on. Decoding can be implemented as follows:

public String decode() {
  this.decodedStr = new StringBuilder();
  if (isLeaf(this.root)) {         
    int copyFrequency = this.root.frequency;
    while (copyFrequency– > 0) {
      decodedStr.append(root.character);
    }
  } else {
    int index = -1;
    while (index < this.encodedStr.length() – 1) {
      index = decode(this.root, index);
    }
  }
  return decodedStr.toString();
}
private int decode(Node root, int index) {
  if (root == null) {
    return index;
  }
  if (isLeaf(root)) {
    decodedStr.append(root.character);
    return index;
  }
  index++;
  root = (this.encodedStr.charAt(index) == ‘0’)
    ? root.left : root.right;
  index = decode(root, index);
  return index;
}
private boolean isLeaf(Node root) {
  return root.left == null && root.right == null;
}

After processing all codes we should obtain the decoded string.

Leave a Reply

Your email address will not be published. Required fields are marked *