String Algorithms
Rabin-Karp Substring Search Algorithm
Given two strings:
Text (T) of length
nPattern (P) of length
m
The goal is to find all occurrences of the pattern P in the text T.
Brute-force substring search compares the pattern at every position in the text. Its worst-case time is O(n * m).
Rabin-Karp improves this by using hashing. It compares hashes instead of characters, so multiple character comparisons are replaced by simple number comparisons.
Convert the pattern and substrings of the text to numbers using a hash function.
Instead of comparing strings, compare their hashes.
Only do full character comparison if the hashes match (to confirm the match, in case of hash collisions).
This gives us average-case linear time, and it's especially useful when:
We are searching for multiple patterns.
We want a fast filtering technique.
How the Hash Works
The pattern and substrings are treated as numbers in a base (commonly base 256, for ASCII).
Example (simplified):
Assume base = 10.
String "abc" → hash = a × 10² + b × 10¹ + c × 10⁰
If a = 1, b = 2, c = 3 → hash = 1×100 + 2×10 + 3 = 123
We use a rolling hash to compute the next substring's hash in constant time, using the previous hash:
A prime number is used to mod the hash to prevent overflow and reduce collisions.
Step-by-Step Algorithm
Choose:
A base (like 256)
A large prime number (say, 101)
Compute the hash of the pattern.
Compute the hash of the first window (first m characters) of the text.
Slide the window:
At each step, compare the hash of the window with the pattern hash.
If they match, compare the actual substrings (to avoid false matches due to collisions).
Use rolling hash to compute next window’s hash efficiently.
Implementation
Find all occurrences of a pattern string in a text string.
Text: "abedabcabcabcde"
Pattern: "abc"
1. Brute Force Approach
Slide the pattern over the text one character at a time.
At each position, compare the entire pattern with the substring from the text.
If all characters match, we found an occurrence.
Output
Rabin-Karp Approach
Convert the pattern and substrings of the text into hash values.
Slide a window and compare hash values.
If hashes match, do a character-level comparison (to handle collisions).
Use rolling hash to compute the next window hash in
O(1)time.
Output
Time and Space Complexity
Best/Average
O(n + m)
Because of rolling hash and few hash matches
Worst
O(n * m)
Too many hash collisions lead to full comparisons
Space
O(1) or O(m)
For hash calculation
Last updated