String Algorithms

Rabin-Karp Substring Search Algorithm

Given two strings:

Text (T) of length n
Pattern (P) of length m

The goal is to find all occurrences of the pattern P in the text T.

Brute-force substring search compares the pattern at every position in the text. Its worst-case time is O(n * m).

Rabin-Karp improves this by using hashing. It compares hashes instead of characters, so multiple character comparisons are replaced by simple number comparisons.

Convert the pattern and substrings of the text to numbers using a hash function.
Instead of comparing strings, compare their hashes.
Only do full character comparison if the hashes match (to confirm the match, in case of hash collisions).

This gives us average-case linear time, and it's especially useful when:

We are searching for multiple patterns.
We want a fast filtering technique.

How the Hash Works

The pattern and substrings are treated as numbers in a base (commonly base 256, for ASCII).

Example (simplified):

Assume base = 10.

String "abc" → hash = a × 10² + b × 10¹ + c × 10⁰
If a = 1, b = 2, c = 3 → hash = 1×100 + 2×10 + 3 = 123

We use a rolling hash to compute the next substring's hash in constant time, using the previous hash:

bashCopyEdithash(i+1) = (base × (hash(i) - T[i] × base^(m-1)) + T[i+m]) % prime

A prime number is used to mod the hash to prevent overflow and reduce collisions.

Step-by-Step Algorithm

Choose:
- A base (like 256)
- A large prime number (say, 101)
Compute the hash of the pattern.
Compute the hash of the first window (first m characters) of the text.
Slide the window:
- At each step, compare the hash of the window with the pattern hash.
- If they match, compare the actual substrings (to avoid false matches due to collisions).
- Use rolling hash to compute next window’s hash efficiently.

Implementation

Find all occurrences of a pattern string in a text string.

Text: "abedabcabcabcde" Pattern: "abc"

1. Brute Force Approach

Slide the pattern over the text one character at a time.
At each position, compare the entire pattern with the substring from the text.
If all characters match, we found an occurrence.

public class BruteForceSearch {
    public static void search(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();

        for (int i = 0; i <= n - m; i++) {
            int j = 0;
            // Compare character by character
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }
            // If full pattern matched
            if (j == m) {
                System.out.println("Pattern found at index " + i);
            }
        }
    }

    public static void main(String[] args) {
        String text = "abedabcabcabcde";
        String pattern = "abc";
        search(text, pattern);
    }
}

Output

Pattern found at index 4  
Pattern found at index 7  
Pattern found at index 10

Rabin-Karp Approach

Convert the pattern and substrings of the text into hash values.
Slide a window and compare hash values.
If hashes match, do a character-level comparison (to handle collisions).
Use rolling hash to compute the next window hash in O(1) time.

public class RabinKarpSearch {
    public static void search(String text, String pattern) {
        int base = 256;             // For ASCII characters
        int prime = 101;            // A prime number to reduce hash collisions
        int n = text.length();
        int m = pattern.length();

        int patternHash = 0;        // Hash of the pattern
        int windowHash = 0;         // Hash of current window in text
        int h = 1;

        // Compute base^(m-1) % prime, used in removing the first digit
        for (int i = 0; i < m - 1; i++) {
            h = (h * base) % prime;
        }

        // Calculate hash for pattern and first window
        for (int i = 0; i < m; i++) {
            patternHash = (base * patternHash + pattern.charAt(i)) % prime;
            windowHash = (base * windowHash + text.charAt(i)) % prime;
        }

        // Slide the window
        for (int i = 0; i <= n - m; i++) {
            // If hash matches, check characters one by one
            if (patternHash == windowHash) {
                boolean match = true;
                for (int j = 0; j < m; j++) {
                    if (text.charAt(i + j) != pattern.charAt(j)) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    System.out.println("Pattern found at index " + i);
                }
            }

            // Compute hash for next window
            if (i < n - m) {
                windowHash = (base * (windowHash - text.charAt(i) * h) + text.charAt(i + m)) % prime;

                // Handle negative hash
                if (windowHash < 0) {
                    windowHash += prime;
                }
            }
        }
    }

    public static void main(String[] args) {
        String text = "abedabcabcabcde";
        String pattern = "abc";
        search(text, pattern);
    }
}

Output

Pattern found at index 4  
Pattern found at index 7  
Pattern found at index 10

Time and Space Complexity

Case

Time Complexity

Why

Best/Average

O(n + m)

Because of rolling hash and few hash matches

Worst

O(n * m)

Too many hash collisions lead to full comparisons

Space

O(1) or O(m)

For hash calculation

PreviousArray Algorithms NextProblems - Set 1

Last updated 4 months ago