Example
Web Log Analysis – Count 404 Errors
Parse a large web server log file and count how many times the server returned a 404 (Not Found) HTTP status code.
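Viewed as MapReduce, this task maps each log line to its status code, then reduces by counting occurrences per code. The 404 count is then a single lookup. The sketch below is a minimal in-memory version that assumes lines follow the Common Log Format used in the sample. It parses with a regular expression rather than splitting on spaces, so a request field containing an unexpected space does not shift the other fields. The class and method names are hypothetical, and the code needs Java 9+ for `List.of` and `Optional.stream`.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StatusHistogram {

    // Common Log Format: host ident authuser [timestamp] "request" status bytes
    // Group 6 is the three-digit status code.
    private static final Pattern CLF = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    // Map: one log line -> its status code, or empty if the line doesn't parse
    static Optional<String> status(String line) {
        Matcher m = CLF.matcher(line);
        return m.find() ? Optional.of(m.group(6)) : Optional.empty();
    }

    // Shuffle + reduce: group identical status codes and count each group
    static Map<String, Long> histogram(List<String> lines) {
        return lines.stream()
                .map(StatusHistogram::status)
                .flatMap(Optional::stream) // drop unparseable lines
                .collect(Collectors.groupingBy(code -> code, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "127.0.0.1 - - [23/Jul/2024:10:00:00 +0000] \"GET /index.html HTTP/1.1\" 200 1024",
                "127.0.0.1 - - [23/Jul/2024:10:01:00 +0000] \"GET /notfound.html HTTP/1.1\" 404 512",
                "127.0.0.1 - - [23/Jul/2024:10:02:00 +0000] \"GET /page.html HTTP/1.1\" 404 0");
        Map<String, Long> counts = histogram(lines);
        System.out.println(counts); // {200=1, 404=2}
        System.out.println("Total 404 Errors: " + counts.getOrDefault("404", 0L));
    }
}
```

Because the reduce step counts every key, the same pass also reports the other status codes, which a single 404 counter discards.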
Input Sample (webserver.log)
127.0.0.1 - - [23/Jul/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024
127.0.0.1 - - [23/Jul/2024:10:01:00 +0000] "GET /notfound.html HTTP/1.1" 404 512
127.0.0.1 - - [23/Jul/2024:10:02:00 +0000] "GET /page.html HTTP/1.1" 404 0
Solution 1: Using Spring libraries
Place this file in src/main/resources/ as webserver.log.
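The Spring code reads the status code as `tokens[8]`. Splitting one sample line on single spaces shows why. The bracketed timestamp and the quoted request each break into several tokens, which leaves the status at index 8 and the byte count at index 9. This minimal sketch (the class name is hypothetical) prints each token with its index. It assumes every line has exactly this shape, because any extra or missing space shifts the indices.

```java
public class StatusTokenDemo {

    // Split on single spaces, as the Spring example does with line.split(" ")
    static String[] tokens(String line) {
        return line.split(" ");
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [23/Jul/2024:10:01:00 +0000] "
                + "\"GET /notfound.html HTTP/1.1\" 404 512";
        String[] t = tokens(line);
        // The timestamp splits into tokens 3-4 and the quoted request into 5-7,
        // so the status code lands at index 8 and the byte count at index 9.
        for (int i = 0; i < t.length; i++) {
            System.out.println(i + ": " + t[i]);
        }
    }
}
```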
Java Code: Simulating MapReduce
package com.example.logprocessor;

import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

@SpringBootApplication
public class WebLog404CounterApp implements CommandLineRunner {

    public static void main(String[] args) {
        SpringApplication.run(WebLog404CounterApp.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        // Load webserver.log from the classpath (src/main/resources); fail clearly if missing
        InputStream in = getClass().getClassLoader().getResourceAsStream("webserver.log");
        if (in == null) {
            throw new FileNotFoundException("webserver.log not found on the classpath");
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            // Mapper: each line -> 1 if its status field (token 8) is "404", else 0
            // Reducer: sum the 1s; mapToInt(...).sum() needs no shared counter under parallel()
            int count404 = reader.lines()
                    .parallel()
                    .mapToInt(line -> {
                        String[] tokens = line.split(" ");
                        return (tokens.length > 8 && "404".equals(tokens[8])) ? 1 : 0;
                    })
                    .sum();
            System.out.println("Total 404 Errors: " + count404);
        }
    }
}
Output
Total 404 Errors: 2
Sales Aggregation – Total Sales per Product
Given a file containing sales data (product,price), calculate the total revenue per product.
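On small inputs, a single-JVM reference implementation is a handy check against the Hadoop job's output. This minimal sketch makes three assumptions: each line is one `product,price` record, there is no header row, and there are no quoted commas. It uses `BigDecimal` so that money totals do not pick up binary floating-point rounding. The class name `SalesTotals` and the sample records are hypothetical.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SalesTotals {

    // Sum prices per product. Lines without a comma are skipped;
    // a non-numeric price throws NumberFormatException.
    static Map<String, BigDecimal> totals(List<String> lines) {
        Map<String, BigDecimal> totals = new TreeMap<>(); // sorted by product name
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            if (parts.length != 2) {
                continue;
            }
            totals.merge(parts[0].trim(), new BigDecimal(parts[1].trim()), BigDecimal::add);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical records, for illustration only
        List<String> lines = List.of("apple,1.20", "banana,0.50", "apple,2.30");
        System.out.println(totals(lines)); // {apple=3.50, banana=0.50}
    }
}
```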
Input Sample (sales.txt)
Solution 1: Using Hadoop
1. Mapper Class
2. Reducer Class
3. Driver Class
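The three classes split the job into a map phase, a framework-run shuffle/sort phase, and a reduce phase. The plain-Java sketch below simulates those phases in one JVM to show what each role does. It is not Hadoop's API: in an actual Hadoop job the mapper would typically extend `org.apache.hadoop.mapreduce.Mapper` and emit Writable pairs (for example `Text` keys and `DoubleWritable` values), and the framework performs the shuffle itself. The class and method names, the `product,price` line format, and the sample records are assumptions.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SalesMapReduceSimulation {

    // Map phase (stand-in for the Mapper class):
    // one input line -> zero or one (product, price) pair.
    static List<Map.Entry<String, BigDecimal>> map(String line) {
        String[] parts = line.split(",", 2);
        if (parts.length != 2) {
            return List.of(); // skip records without a comma
        }
        return List.of(Map.entry(parts[0].trim(), new BigDecimal(parts[1].trim())));
    }

    // Shuffle/sort phase (performed by the framework in Hadoop):
    // collect every value under its key, with keys in sorted order.
    static TreeMap<String, List<BigDecimal>> shuffle(List<Map.Entry<String, BigDecimal>> pairs) {
        TreeMap<String, List<BigDecimal>> grouped = new TreeMap<>();
        for (Map.Entry<String, BigDecimal> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase (stand-in for the Reducer class):
    // one key plus all of its values -> one total. The key is unused here
    // but kept to mirror the reducer's (key, values) shape.
    static BigDecimal reduce(String product, List<BigDecimal> prices) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal price : prices) {
            total = total.add(price);
        }
        return total;
    }

    // Driver role: run map over every line, shuffle, then reduce each group.
    static Map<String, BigDecimal> run(List<String> input) {
        List<Map.Entry<String, BigDecimal>> pairs = new ArrayList<>();
        for (String line : input) {
            pairs.addAll(map(line));
        }
        Map<String, BigDecimal> output = new TreeMap<>();
        shuffle(pairs).forEach((product, prices) -> output.put(product, reduce(product, prices)));
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical input records, for illustration only
        List<String> input = List.of("apple,1.20", "banana,0.50", "apple,2.30");
        // Tab-separated key/value lines, like Hadoop's default text output
        run(input).forEach((product, total) -> System.out.println(product + "\t" + total));
    }
}
```

Keeping map and reduce as pure functions of their inputs is what lets the real framework run many copies of each in parallel on different machines.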
Running the MapReduce Job
hadoop jar sales-aggregation.jar com.example.SalesDriver /input /output
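This does the following:
Sets up the Hadoop environment with the cluster configuration and puts the job JAR on the classpath.
Runs the job either on YARN or in the LocalJobRunner, depending on mapreduce.framework.name (yarn or local).
Submits the job to the YARN ResourceManager (or to the JobTracker on legacy MRv1 clusters).
Launches the Mapper and Reducer tasks in containers.
Reads input from the HDFS /input directory and writes results to /output. The output directory must not already exist, or the job fails.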
Output