top of page

ALGORYTHM | Compress Like a Pro: Huffman Coding

Have you ever wondered how your phone fits gigabytes of photos and videos? Or how websites load lightning fast despite mountains of data?

One unsung hero behind this magic is Huffman coding, a clever algorithm that squeezes the most out of digital information. Let's dive into the fascinating world of Huffman coding and discover how it compresses data without losing a single bit!

Imagine Morse code: Each letter has a unique sequence of dots and dashes. Similarly, computers represent text, images,and everything digital using binary codes (0s and 1s). But some characters appear more frequently than others. Wouldn't it be smart to assign shorter codes to frequently used characters, making the overall data size smaller?


This is the core idea behind Huffman coding. It analyzes the frequency of each symbol in your data and builds a tree:


  • Characters with higher frequencies get shorter codes, like the most common letter in Morse code having fewer dots and dashes.

  • Less frequent characters get longer codes, ensuring efficient representation for all.


Building this tree isn't chaotic – it uses a greedy approach, always merging the least frequent symbols first. This guarantees an optimal code, meaning no other arrangement can achieve smaller overall data size.


The benefits of Huffman coding are numerous:

  • Smaller file sizes: Imagine squeezing an entire book into a smaller email attachment!

  • Faster transmission: Smaller files transfer quicker, improving website loading speeds and reducing download times.

  • Efficient storage: Compressing data saves precious storage space on your devices and servers.

import heapq
class Node:
def init(self, char, freq):
self.char = char
self.freq = freq
self.left = None
self.right = None
def lt(self, other):
return self.freq < other.freq
def build_huffman_tree(text):
"""Builds a Huffman tree from the given text."""
freq = {} # Frequency of each character
for char in text:
freq[char] = freq.get(char, 0) + 1
heap = []
for char, f in freq.items():
heapq.heappush(heap, Node(char, f))
while len(heap) > 1:
node1 = heapq.heappop(heap)
node2 = heapq.heappop(heap)
new_node = Node(None, node1.freq + node2.freq)
new_node.left = node1
new_node.right = node2
heapq.heappush(heap, new_node)
return heapq.heappop(heap)
def generate_codes(node, code=""):
"""Generates Huffman codes for each character."""
codes = {}
if node:
if node.char:
codes[node.char] = code
generate_codes(node.left, code + "0")
generate_codes(node.right, code + "1")
return codes
def encode(text, codes):
"""Encodes the text using the given Huffman codes."""
encoded_text = ""
for char in text:
encoded_text += codes[char]
return encoded_text
def decode(encoded_text, tree):
"""Decodes the encoded text using the Huffman tree."""
decoded_text = ""
current_node = tree
for bit in encoded_text:
if bit == "0":
current_node = current_node.left
elif bit == "1":
current_node = current_node.right
if current_node.char:
decoded_text += current_node.char
current_node = tree
return decoded_text
# Example usage:
text = "AAABBAAC"
tree = build_huffman_tree(text)
codes = generate_codes(tree)
encoded_text = encode(text, codes)
decoded_text = decode(encoded_text, tree)
print("Huffman Codes:", codes)
print("Encoded Text:", encoded_text)
print("Decoded Text:", decoded_text)






But it's not all sunshine and rainbows:

  • Decompression overhead: Unpacking the compressed data requires extra processing power, which might be negligible for small files but significant for larger ones.

  • Universal format needed: Both sender and receiver need to agree on the Huffman code used, limiting its application in some scenarios.


Despite these limitations, Huffman coding remains a fundamental algorithm with diverse applications. It's used in:

  • File compression formats: ZIP, RAR, and PNG files often utilize Huffman coding.

  • Image and video compression: JPEG and MPEG standards leverage this technique for efficient storage and transmission.

  • Data transmission protocols: Modems and communication protocols sometimes employ Huffman coding for faster data exchange.

So, the next time you send a compressed file or enjoy a speedy website, remember the invisible hero working behind the scenes – the ingenious Huffman coding algorithm, efficiently managing your digital world!


Pied Piper anyone?


Comments


bottom of page