Streams
About
Java Streams, introduced in Java 8, revolutionized how developers process collections and sequences of data. They provide a modern, functional approach to handling data manipulation tasks, making code more concise, readable, and expressive.
What Are Streams?
Java Streams API is a powerful abstraction introduced in Java 8 that allows functional-style operations on collections, arrays, or I/O resources. It enables declarative and parallel processing of data, making it easier to work with large datasets efficiently.
A Stream is a sequence of elements that supports various operations such as filtering, mapping, and reducing, without modifying the original data source.
Streams reduce boilerplate code and improve readability
Example: Traditional Loop vs Stream Processing
Without Streams (Imperative Approach)
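The original example code was lost in extraction; a minimal sketch of the imperative version, filtering even numbers from a list (the data and the filtering criterion are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class ImperativeExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6);
        // Manually iterate, test each element, and accumulate results
        List<Integer> evens = new ArrayList<>();
        for (Integer n : numbers) {
            if (n % 2 == 0) {
                evens.add(n);
            }
        }
        System.out.println(evens); // [2, 4, 6]
    }
}
```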
With Streams (Declarative Approach)
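The same task expressed declaratively with a stream pipeline (a sketch; the data matches the imperative example above):

```java
import java.util.List;
import java.util.stream.Collectors;

public class DeclarativeExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6);
        // Declare what we want: keep the even numbers
        List<Integer> evens = numbers.stream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
        System.out.println(evens); // [2, 4, 6]
    }
}
```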
Why use Streams?
Streams offer several advantages over traditional iterative approaches (e.g., `for` loops):
Readability: Streams allow us to write code in a declarative style, focusing on what needs to be done rather than how to do it.
Conciseness: Streams reduce boilerplate code, making programs shorter and easier to maintain.
Functional Programming: Streams support functional programming constructs like lambda expressions and method references, enabling cleaner and more modular code.
Parallel Processing: Streams can easily be parallelized using the `parallelStream()` method, allowing efficient utilization of multi-core processors for large datasets.
Lazy Evaluation: Intermediate operations (e.g., `filter`, `map`) are only executed when a terminal operation (e.g., `collect`, `forEach`) is invoked. This improves performance by avoiding unnecessary computations.
Immutability: Streams do not modify the source data, promoting immutability and reducing side effects.
Key Characteristics of Streams
1. Streams Do Not Store Data
Streams operate on a source (Collection, Array, or I/O resource) and process data without storing it.
No additional memory overhead since it processes elements on-demand.
2. Streams Are Functional in Nature
Streams allow functional transformations using methods like `map()`, `filter()`, and `reduce()`, without modifying the original data.
The original list remains unchanged.
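A small sketch of this point: the transformation produces a new list while the source list is untouched (data illustrative):

```java
import java.util.List;

public class FunctionalNature {
    public static void main(String[] args) {
        List<String> names = List.of("alice", "bob");
        List<String> upper = names.stream()
                .map(String::toUpperCase)
                .toList();
        System.out.println(upper); // [ALICE, BOB]
        System.out.println(names); // [alice, bob] -- the source is unchanged
    }
}
```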
3. Streams Are Lazy (Lazy Evaluation)
Intermediate operations (`filter()`, `map()`) are executed only when a terminal operation (`collect()`, `forEach()`) is called.
Reduces unnecessary computations.
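A sketch demonstrating laziness: building the pipeline prints nothing; the `peek()` side effect only runs once the terminal operation is invoked.

```java
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        // Building the pipeline runs nothing: peek/filter are not executed yet
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .peek(n -> System.out.println("visiting " + n))
                .filter(n -> n % 2 == 1);
        System.out.println("pipeline built, nothing visited yet");
        // Only the terminal operation triggers the traversal
        long oddCount = pipeline.count();
        System.out.println("odd count = " + oddCount); // odd count = 2
    }
}
```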
4. Streams Support Parallel Execution
Streams support parallel execution via `parallelStream()`, allowing tasks to be executed concurrently.
Utilizes multiple CPU cores for faster processing.
5. Streams Support Pipeline Processing
Streams allow chained operations where the output of one method is passed as input to the next.
Makes processing clear and structured.
6. Streams Have Two Types of Operations
Intermediate Operations (return a Stream, lazily executed): `filter()`, `map()`, `flatMap()`, `distinct()`, `sorted()`, `peek()`
Terminal Operations (trigger execution and consume the Stream): `collect()`, `count()`, `forEach()`, `reduce()`, `min()`, `max()`
Memory Usage
Memory usage by Java Streams is an important consideration, especially when dealing with large datasets or performance-critical applications. Streams are designed to be efficient, but their memory usage depends on several factors, including the data source, intermediate operations, and terminal operations.
Overview
Streams themselves do not store data; they operate on a data source (e.g., a collection, array, or I/O channel). However, memory is used in the following ways:
Data Source: The memory usage of the data source (e.g., a collection or array) remains unchanged. Streams do not copy the data but instead provide a view or pipeline to process it.
Intermediate Operations: Intermediate operations (e.g., `filter`, `map`, `sorted`) create new streams but do not immediately process the data. They are lazily evaluated, meaning they only define the pipeline and do not consume memory until a terminal operation is invoked.
Terminal Operations: Terminal operations (e.g., `collect`, `forEach`, `reduce`) trigger the processing of the stream and may consume memory depending on the operation:
Operations like `collect` may store results in a new collection.
Operations like `reduce` or `forEach` process elements one at a time and typically use minimal additional memory.
Parallel Streams: Parallel streams divide the data into multiple chunks for concurrent processing, which may increase memory usage due to the overhead of managing multiple threads and intermediate results.
Key Factors Affecting Memory Usage in Streams
1. Lazy Evaluation (Efficient Memory Usage)
Streams process elements only when needed, reducing memory consumption compared to eager execution.
Example: Lazy Execution (Efficient)
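The example code was lost in extraction; a sketch of a lazy, short-circuiting pipeline. `findFirst()` stops as soon as one match is found, so later elements are never examined or held in memory.

```java
import java.util.List;

public class LazyExecution {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);
        // findFirst() short-circuits: elements after the first match
        // are never processed
        int firstEven = numbers.stream()
                .filter(n -> n % 2 == 0)
                .findFirst()
                .orElseThrow();
        System.out.println(firstEven); // 2
    }
}
```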
Example: Eager Execution (Memory-Intensive)
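One way the eager, memory-intensive variant might look: each stage is materialized into its own list instead of flowing through a single lazy pipeline, so every intermediate result occupies memory at once.

```java
import java.util.List;
import java.util.stream.Collectors;

public class EagerExecution {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);
        // Materializing every intermediate step allocates a full list each time
        List<Integer> evens = numbers.stream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());   // first full list in memory
        List<Integer> doubled = evens.stream()
                .map(n -> n * 2)
                .collect(Collectors.toList());   // second full list in memory
        System.out.println(doubled); // [4, 8]
    }
}
```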
2. Intermediate Operations and Memory Impact
Intermediate operations (`map()`, `filter()`, `distinct()`, `sorted()`) do not store elements but may require extra memory under certain conditions.
Operations with Minimal Memory Usage
`map()`, `filter()`, `peek()` → process elements one by one, with no additional memory overhead.
Operations That Require More Memory
`sorted()`, `distinct()`, `flatMap()` → require extra memory for processing: `sorted()` must buffer the entire stream before emitting anything, and `distinct()` must track the elements it has already seen.
3. Parallel Streams and Memory Consumption
Parallel Streams split data into multiple threads for faster execution but can increase memory consumption due to:
Thread creation overhead
Higher temporary memory usage for merging results
Example: Memory Overhead in Parallel Streams
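The example code was lost in extraction; a sketch of a parallel pipeline whose collection step forces each worker thread to build a partial result that is later merged (range size illustrative):

```java
import java.util.List;
import java.util.stream.IntStream;

public class ParallelOverhead {
    public static void main(String[] args) {
        // Each worker thread accumulates a partial result list; the partial
        // lists are merged at the end -- temporary memory beyond the source data
        List<Long> squares = IntStream.rangeClosed(1, 1_000_000)
                .parallel()
                .mapToObj(n -> (long) n * n)  // widen to long to avoid int overflow
                .toList();
        System.out.println(squares.size()); // 1000000
    }
}
```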
Memory Usage Considerations:
Each thread holds a portion of the dataset in memory.
More CPU threads = more memory required.
Avoid parallel streams for small datasets (overhead is higher than benefit).
4. Collecting Data (`collect()` and Memory Allocation)
Using `collect()` stores all stream elements in memory, which can be problematic for large datasets.
Example: Collecting Large Data
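The example code was lost in extraction; a sketch of the problem. The entire result set is held in memory at once (the element count is illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CollectLargeData {
    public static void main(String[] args) {
        // All 1,000,000 strings end up in memory at once inside the list
        List<String> labels = IntStream.rangeClosed(1, 1_000_000)
                .mapToObj(i -> "item-" + i)
                .collect(Collectors.toList());
        System.out.println(labels.size()); // 1000000
    }
}
```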
Better Approach: Use `forEach()` Instead
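A sketch of the streaming alternative: elements are processed one at a time and become eligible for garbage collection immediately, so no million-entry list is ever built.

```java
import java.util.stream.IntStream;

public class StreamingForEach {
    public static void main(String[] args) {
        // Process each element as it flows through; nothing is accumulated
        IntStream.rangeClosed(1, 1_000_000)
                .mapToObj(i -> "item-" + i)
                .filter(s -> s.endsWith("000000"))
                .forEach(System.out::println); // prints item-1000000
    }
}
```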
5. Large Data Streams (Handling Gigabytes of Data)
For large datasets (e.g., processing files, databases), avoid materializing the entire dataset into memory.
1. Using `Stream.generate()` (Infinite Streams)
Streams can generate infinite sequences, consuming memory if not terminated.
Fix: Use `limit()` to Avoid Memory Overload
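A sketch of the fix: without `limit()`, this `generate()` call would never terminate; with it, the stream stays finite and memory stays bounded.

```java
import java.util.List;
import java.util.stream.Stream;

public class LimitedGenerate {
    public static void main(String[] args) {
        // limit(5) caps the otherwise-infinite stream at five elements
        List<String> five = Stream.generate(() -> "ping")
                .limit(5)
                .toList();
        System.out.println(five); // [ping, ping, ping, ping, ping]
    }
}
```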
2. Streaming Large Files with BufferedReader
Reading large files into a list causes high memory usage.
Solution: Use BufferedReader.lines()
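A runnable sketch (it writes a small temp file as a stand-in for a huge log file): `BufferedReader.lines()` streams the file lazily, holding roughly one line in memory at a time instead of the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StreamLargeFile {
    public static void main(String[] args) throws IOException {
        // Stand-in for a multi-gigabyte log file (content is illustrative)
        Path file = Files.createTempFile("demo", ".log");
        Files.write(file, List.of("ok", "ERROR disk full", "ok", "ERROR timeout"));

        // lines() is lazy: lines flow through the pipeline one at a time
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            long errors = reader.lines()
                    .filter(line -> line.contains("ERROR"))
                    .count();
            System.out.println(errors + " error lines"); // 2 error lines
        }
        Files.delete(file);
    }
}
```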
6. Memory Allocation When Creating Multiple Streams
Each time we create a new Stream, memory is allocated for:
Stream object itself (small overhead)
Pipeline of operations (intermediate and terminal operations)
Data source reference (list, array, file, etc.)
Stream Operations Overview
Java Streams API consists of Intermediate and Terminal operations that work together to process data efficiently. Understanding these operations and the Stream Pipeline is essential for writing clean, functional, and performant code.
1. What Are Intermediate and Terminal Operations?
Stream operations are categorized into two types:
Intermediate Operations – Transform a stream and return a new Stream. They are lazy (executed only when a terminal operation is called).
Terminal Operations – Consume the stream and produce a result (such as a collection, count, or boolean value). Terminal operations trigger execution of intermediate operations.
Intermediate Operations (Lazy and Return a Stream)
These operations do not process elements immediately; instead, they build up a pipeline and execute only when a terminal operation is encountered.
| Method | Description |
| --- | --- |
| `filter(Predicate<T>)` | Filters elements based on a condition. |
| `map(Function<T, R>)` | Transforms each element in the stream. |
| `flatMap(Function<T, Stream<R>>)` | Flattens multiple nested streams into a single stream. |
| `distinct()` | Removes duplicate elements. |
| `sorted(Comparator<T>)` | Sorts elements in natural or custom order. |
| `peek(Consumer<T>)` | Debugging tool; applies an action to each element. |
| `limit(n)` | Limits the number of elements in the stream. |
| `skip(n)` | Skips the first `n` elements. |
Terminal Operations (Trigger Execution and Produce a Result)
Once a terminal operation is called, the stream pipeline is executed in one pass and cannot be reused.
| Method | Description |
| --- | --- |
| `forEach(Consumer<T>)` | Iterates over each element. |
| `collect(Collector<T, A, R>)` | Converts stream elements into a collection (List, Set, Map). |
| `count()` | Returns the total number of elements. |
| `reduce(BinaryOperator<T>)` | Aggregates elements into a single result (sum, max, etc.). |
| `min(Comparator<T>)` | Finds the minimum element. |
| `max(Comparator<T>)` | Finds the maximum element. |
| `anyMatch(Predicate<T>)` | Checks if at least one element matches the condition. |
| `allMatch(Predicate<T>)` | Checks if all elements match the condition. |
| `noneMatch(Predicate<T>)` | Checks if no elements match the condition. |
| `toArray()` | Converts a stream into an array. |
2. Understanding the Stream Pipeline
A Stream Pipeline consists of three stages:
1. Data Source
A stream is created from a data source like a Collection, Array, or I/O Channel.
2. Intermediate Operations (Lazy)
Intermediate operations transform the data but do not execute immediately.
No execution happens yet, because streams are lazy.
3. Terminal Operation (Triggers Execution)
Once a terminal operation is called, the pipeline is executed in a single pass.
Now execution happens, and the terminal operation produces the result.
Complete Stream Pipeline Example
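The example code was lost in extraction; a sketch of a full pipeline showing all three stages (the data is illustrative):

```java
import java.util.List;

public class CompletePipeline {
    public static void main(String[] args) {
        // Source -> intermediate operations -> terminal operation, in one pass
        List<String> result = List.of("java", "streams", "are", "lazy")
                .stream()
                .filter(s -> s.length() > 3)  // intermediate (lazy)
                .map(String::toUpperCase)     // intermediate (lazy)
                .sorted()                     // intermediate (lazy)
                .toList();                    // terminal (triggers execution)
        System.out.println(result); // [JAVA, LAZY, STREAMS]
    }
}
```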
Pipeline Execution Order (Optimization)
Streams process data in one pass, applying operations only to elements that reach the terminal operation.
Creating Streams in Java
Java provides multiple ways to create streams from different data sources, such as Collections, Arrays, and Generators.
1. Creating Streams from Collections
Java Collections (like `List`, `Set`) have a built-in `stream()` method that allows easy stream creation.
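A minimal sketch (collections and contents are illustrative):

```java
import java.util.List;
import java.util.Set;

public class StreamFromCollection {
    public static void main(String[] args) {
        List<String> list = List.of("a", "b", "c");
        Set<String> set = Set.of("x", "y");
        // Every Collection exposes stream()
        System.out.println(list.stream().count()); // 3
        System.out.println(set.stream().count());  // 2
    }
}
```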
Parallel Stream from a Collection
If we want to process elements in parallel, use `parallelStream()`. Parallel streams are useful for large datasets but can have overhead for small ones.
2. Creating Streams from Arrays
We can create a stream from an array using `Arrays.stream()` or `Stream.of()`.
Stream from a Primitive Array
Use `IntStream`, `LongStream`, or `DoubleStream` for primitives:
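A sketch using a primitive `int[]` (values illustrative); `Arrays.stream(int[])` yields an `IntStream`, so no boxing occurs:

```java
import java.util.Arrays;

public class PrimitiveArrayStream {
    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4};
        // Arrays.stream(int[]) returns an IntStream -- no boxing involved
        int sum = Arrays.stream(numbers).sum();
        System.out.println(sum); // 10
    }
}
```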
3. Using `Stream.of()`, `Stream.generate()`, and `Stream.iterate()`
`Stream.of()` – Creating Streams from Values
The `Stream.of()` method can be used to create a stream from multiple values.
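A minimal sketch (values illustrative):

```java
import java.util.stream.Stream;

public class StreamOfValues {
    public static void main(String[] args) {
        // The varargs overload builds one stream from the listed values
        Stream.of("red", "green", "blue")
              .forEach(System.out::println);
    }
}
```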
`Stream.generate()` – Infinite Stream with Supplier
`Stream.generate()` produces an infinite stream using a `Supplier<T>`.
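A minimal sketch; the supplier runs once per element, and `limit()` keeps the stream finite:

```java
import java.util.List;
import java.util.stream.Stream;

public class GenerateDemo {
    public static void main(String[] args) {
        // Math::random is the Supplier; limit(3) makes the stream finite
        List<Double> randoms = Stream.generate(Math::random)
                .limit(3)
                .toList();
        System.out.println(randoms.size()); // 3
    }
}
```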
`Stream.iterate()` – Infinite Stream with Iteration
`Stream.iterate()` generates an infinite stream using a function and an initial value.
🔹 Java 9+ introduced a predicate-based `Stream.iterate()` overload that stops once the predicate returns false.
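A sketch of both forms (seed and step function are illustrative):

```java
import java.util.stream.Stream;

public class IterateDemo {
    public static void main(String[] args) {
        // Classic form: seed + function, made finite with limit()
        Stream.iterate(1, n -> n * 2)
              .limit(5)
              .forEach(System.out::println); // 1 2 4 8 16

        // Java 9+ form: seed + hasNext predicate + function
        Stream.iterate(1, n -> n <= 16, n -> n * 2)
              .forEach(System.out::println); // 1 2 4 8 16
    }
}
```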
Intermediate Operations in Java Streams
Intermediate operations transform a stream and return another stream. They are lazy—executing only when a terminal operation is called.
1. Filtering with `filter()`
Used to retain elements that satisfy a condition.
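A minimal sketch (names and predicate are illustrative):

```java
import java.util.List;

public class FilterDemo {
    public static void main(String[] args) {
        List<String> shortNames = List.of("Tom", "Jerry", "Spike", "Sam")
                .stream()
                .filter(name -> name.length() <= 3)
                .toList();
        System.out.println(shortNames); // [Tom, Sam]
    }
}
```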
2. Transforming with `map()`
Used to transform each element in a stream.
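A minimal sketch, mapping each string to its length (data illustrative):

```java
import java.util.List;

public class MapDemo {
    public static void main(String[] args) {
        List<Integer> lengths = List.of("one", "three", "five")
                .stream()
                .map(String::length)
                .toList();
        System.out.println(lengths); // [3, 5, 4]
    }
}
```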
3. Flattening with `flatMap()`
Used when elements themselves contain collections; it flattens them into a single stream.
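A minimal sketch flattening a list of lists (data illustrative):

```java
import java.util.List;

public class FlatMapDemo {
    public static void main(String[] args) {
        List<List<Integer>> nested = List.of(List.of(1, 2), List.of(3), List.of(4, 5));
        // flatMap turns Stream<List<Integer>> into a single Stream<Integer>
        List<Integer> flat = nested.stream()
                .flatMap(List::stream)
                .toList();
        System.out.println(flat); // [1, 2, 3, 4, 5]
    }
}
```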
4. Removing Duplicates with `distinct()`
Removes duplicate elements based on `equals()`.
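A minimal sketch (data illustrative):

```java
import java.util.List;

public class DistinctDemo {
    public static void main(String[] args) {
        List<Integer> unique = List.of(1, 2, 2, 3, 3, 3)
                .stream()
                .distinct()
                .toList();
        System.out.println(unique); // [1, 2, 3]
    }
}
```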
5. Sorting Elements with `sorted()`
Sorts elements naturally or using a custom comparator.
Custom Sorting
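A sketch covering both natural order and a custom comparator (names illustrative):

```java
import java.util.Comparator;
import java.util.List;

public class SortedDemo {
    public static void main(String[] args) {
        List<String> names = List.of("Dave", "Al", "Carmen");
        // Natural (alphabetical) order
        System.out.println(names.stream().sorted().toList());
        // [Al, Carmen, Dave]

        // Custom comparator: shortest name first
        System.out.println(names.stream()
                .sorted(Comparator.comparingInt(String::length))
                .toList());
        // [Al, Dave, Carmen]
    }
}
```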
6. Debugging with `peek()`
Useful for debugging; it allows inspecting elements without modifying them.
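A minimal sketch: `peek()` observes each element as it flows past, before and after the transformation.

```java
import java.util.List;

public class PeekDemo {
    public static void main(String[] args) {
        List<Integer> result = List.of(1, 2, 3)
                .stream()
                .peek(n -> System.out.println("before map: " + n))
                .map(n -> n * 10)
                .peek(n -> System.out.println("after map: " + n))
                .toList();
        System.out.println(result); // [10, 20, 30]
    }
}
```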
Terminal Operations in Java Streams
Terminal operations consume the stream and produce a result (e.g., a collection, a value, or a side effect). After a terminal operation, the stream cannot be reused.
1. Iterating with `forEach()`
Executes an action for each element in the stream.
Note: Avoid using `forEach()` to modify elements; stream pipelines are meant to be free of side effects.
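A minimal sketch (data illustrative):

```java
import java.util.List;

public class ForEachDemo {
    public static void main(String[] args) {
        List.of("apple", "banana", "cherry")
            .stream()
            .forEach(fruit -> System.out.println("I like " + fruit));
    }
}
```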
2. Collecting with `collect()`
Converts the stream into a collection (List, Set, Map) or another structure.
Grouping elements
The `Collectors.groupingBy()` method groups elements of a stream based on a classifier function and returns a `Map<K, List<T>>`, where:
K → the grouping key (e.g., the length of a string).
List<T> → the list of elements sharing the same key.
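A minimal sketch grouping strings by length (data illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingDemo {
    public static void main(String[] args) {
        Map<Integer, List<String>> byLength = List.of("cat", "dog", "horse")
                .stream()
                .collect(Collectors.groupingBy(String::length));
        System.out.println(byLength.get(3)); // [cat, dog]
        System.out.println(byLength.get(5)); // [horse]
    }
}
```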
3. Counting with `count()`
Counts the number of elements in a stream.
4. Finding Min/Max with `min()` and `max()`
Finds the smallest or largest element based on a comparator.
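A minimal sketch; both methods return an `Optional`, since the stream might be empty:

```java
import java.util.Comparator;
import java.util.List;

public class MinMaxDemo {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(7, 2, 9, 4);
        numbers.stream().min(Comparator.naturalOrder())
               .ifPresent(min -> System.out.println("min = " + min)); // min = 2
        numbers.stream().max(Comparator.naturalOrder())
               .ifPresent(max -> System.out.println("max = " + max)); // max = 9
    }
}
```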
5. Reducing with `reduce()`
Combines the elements of a stream into a single result. It performs the reduction using an accumulator function, optionally with an identity value and/or a combiner function.
Sum of all elements
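A minimal sketch using an identity value and an accumulator:

```java
import java.util.stream.Stream;

public class ReduceSum {
    public static void main(String[] args) {
        // identity = 0, accumulator = Integer::sum
        int sum = Stream.of(1, 2, 3, 4)
                .reduce(0, Integer::sum);
        System.out.println(sum); // 10
    }
}
```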
Concatenating Strings
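A minimal sketch of string concatenation via `reduce()` (the space-joining logic is illustrative):

```java
import java.util.stream.Stream;

public class ReduceConcat {
    public static void main(String[] args) {
        String sentence = Stream.of("Streams", "are", "lazy")
                .reduce("", (a, b) -> a.isEmpty() ? b : a + " " + b);
        System.out.println(sentence); // Streams are lazy
    }
}
```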
6. Matching Elements with `anyMatch()`, `allMatch()`, `noneMatch()`
Used to test conditions on elements.
`anyMatch()` – at least one element matches.
`allMatch()` – all elements match.
`noneMatch()` – no elements match.
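A minimal sketch covering all three (data and predicates illustrative):

```java
import java.util.List;

public class MatchingDemo {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(2, 4, 6, 7);
        System.out.println(numbers.stream().anyMatch(n -> n % 2 != 0)); // true  (7 is odd)
        System.out.println(numbers.stream().allMatch(n -> n > 0));      // true  (all positive)
        System.out.println(numbers.stream().noneMatch(n -> n > 100));   // true  (none above 100)
    }
}
```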
Parallel Streams in Java
Parallel Streams in Java allow for concurrent processing of data by utilizing multiple CPU cores. This enables faster execution for large datasets by dividing the workload across threads in the Fork/Join framework.
1. What are Parallel Streams?
A parallel stream processes elements simultaneously in multiple threads rather than sequentially. It splits the data into smaller chunks and processes them in parallel using Java's ForkJoinPool.
How to Create a Parallel Stream?
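The example code was lost in extraction; a sketch of the two common ways to obtain a parallel stream:

```java
import java.util.List;

public class CreateParallelStream {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);
        // Option 1: straight from the collection
        long c1 = numbers.parallelStream().count();
        // Option 2: make an existing sequential stream parallel
        long c2 = numbers.stream().parallel().count();
        System.out.println(c1 + " " + c2); // 5 5
    }
}
```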
2. When to Use Parallel Streams?
Parallel streams are useful when:
Large datasets → parallelism benefits large collections.
Independent tasks → operations should not depend on each other.
CPU-intensive tasks → parallel execution benefits complex computations.
Multi-core processors → takes advantage of multi-threading.
Example: Using Parallel Stream for Sum Calculation
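The example code was lost in extraction; a sketch of a parallel sum. Summation is associative, so splitting the range across cores is safe (range size illustrative):

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) {
        // The range is split across worker threads; partial sums are combined
        long sum = LongStream.rangeClosed(1, 1_000_000)
                .parallel()
                .sum();
        System.out.println(sum); // 500000500000
    }
}
```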
3. Performance Considerations for Parallel Execution
While parallel streams improve performance, they have overhead costs. Consider:
When Parallel Streams are Beneficial:
✔ CPU-bound operations → complex computations (e.g., matrix multiplication).
✔ Large collections → overhead is negligible when processing thousands of elements.
✔ Stateless operations → operations do not modify shared data.
When NOT to Use Parallel Streams:
✖ Small datasets → thread management overhead outweighs the benefits.
✖ I/O-bound tasks → parallel execution does not speed up database or network calls.
✖ Mutable shared state → can cause race conditions and inconsistent results.
Example: Incorrect Use of Parallel Streams (Race Condition)
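The example code was lost in extraction; a sketch of the classic mistake: writing to a non-thread-safe `ArrayList` from a parallel `forEach`, contrasted with letting the stream accumulate the result itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class RaceConditionDemo {
    public static void main(String[] args) {
        // BROKEN: ArrayList is not thread-safe; concurrent add() calls can
        // lose elements or throw at runtime
        List<Integer> unsafe = new ArrayList<>();
        try {
            IntStream.rangeClosed(1, 10_000).parallel().forEach(unsafe::add);
            System.out.println("unsafe size: " + unsafe.size()); // often != 10000
        } catch (RuntimeException e) {
            System.out.println("concurrent modification failed: " + e);
        }

        // SAFE: let the stream build the result; no shared mutable state
        List<Integer> safe = IntStream.rangeClosed(1, 10_000).parallel()
                .boxed().toList();
        System.out.println("safe size: " + safe.size()); // safe size: 10000
    }
}
```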
Primitive Streams (IntStream, LongStream, DoubleStream)
Java provides specialized primitive streams (`IntStream`, `LongStream`, `DoubleStream`) to efficiently process numerical data without the overhead of boxing/unboxing found in `Stream<Integer>`, `Stream<Long>`, and `Stream<Double>`.
1. Why Use Primitive Streams?
Using `Stream<Integer>` creates unnecessary autoboxing (conversion from `int` to `Integer`), which impacts performance. Instead of using `Stream<Integer>`, we can directly use `IntStream`. `IntStream` avoids creating `Integer` objects, reducing memory usage.
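A sketch contrasting the boxed and primitive forms of the same sum:

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class BoxingComparison {
    public static void main(String[] args) {
        // Boxed: every value exists as an Integer object before summing
        int boxedSum = Stream.of(1, 2, 3, 4).mapToInt(Integer::intValue).sum();
        // Primitive: the values stay as int the whole way through
        int primitiveSum = IntStream.of(1, 2, 3, 4).sum();
        System.out.println(boxedSum + " " + primitiveSum); // 10 10
    }
}
```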
2. Creating and Processing Primitive Streams
2.1 Creating Primitive Streams
Primitive streams can be created from arrays, ranges, or generators.
(a) From Arrays
(b) Using `IntStream.of()` and `range()`
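A minimal sketch of the factory methods, noting the exclusive/inclusive difference between `range()` and `rangeClosed()`:

```java
import java.util.stream.IntStream;

public class RangeDemo {
    public static void main(String[] args) {
        System.out.println(IntStream.of(5, 10, 15).sum());      // 30
        System.out.println(IntStream.range(1, 5).sum());        // 10 (1..4, end exclusive)
        System.out.println(IntStream.rangeClosed(1, 5).sum());  // 15 (1..5, end inclusive)
    }
}
```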
(c) Using `generate()` and `iterate()`
🔹 `generate()` produces infinite values (hence `limit(5)`).
🔹 `iterate()` applies a function (`n -> n + 2`) to generate values.
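The original example was lost in extraction; a sketch matching the notes above (`limit(5)` and the step `n -> n + 2`):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class PrimitiveGenerateIterate {
    public static void main(String[] args) {
        // generate(): same supplier for every element; limit(5) keeps it finite
        int[] sevens = IntStream.generate(() -> 7).limit(5).toArray();
        System.out.println(Arrays.toString(sevens)); // [7, 7, 7, 7, 7]

        // iterate(): start at 0, keep applying n -> n + 2
        int[] evens = IntStream.iterate(0, n -> n + 2).limit(5).toArray();
        System.out.println(Arrays.toString(evens)); // [0, 2, 4, 6, 8]
    }
}
```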
3. Specialized Operations for Numeric Streams
Primitive streams provide specialized numeric operations that are not available in a normal `Stream<T>`.
3.1 Summing Elements (`sum()`)
3.2 Finding Min and Max (`min()`, `max()`)
3.3 Finding the Average (`average()`)
3.4 Collecting Statistics (`summaryStatistics()`)
3.5 Boxing Back to `Stream<Integer>` (`boxed()`)
The `boxed()` method converts a primitive stream (`IntStream`, `LongStream`, or `DoubleStream`) into a stream of wrapper objects (`Stream<Integer>`, `Stream<Long>`, `Stream<Double>`). Primitive streams provide specialized operations like `sum()`, `min()`, `average()`, and `summaryStatistics()`, but they cannot be used with collectors like `Collectors.toList()` because collectors expect a stream of objects (`Stream<T>`).
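The per-operation examples were lost in extraction; a single sketch covering all of the numeric operations above (data illustrative):

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NumericOps {
    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5};

        System.out.println(IntStream.of(data).sum());                   // 14
        System.out.println(IntStream.of(data).min().getAsInt());        // 1
        System.out.println(IntStream.of(data).max().getAsInt());        // 5
        System.out.println(IntStream.of(data).average().getAsDouble()); // 2.8

        // summaryStatistics(): count, sum, min, max, and average in one pass
        IntSummaryStatistics stats = IntStream.of(data).summaryStatistics();
        System.out.println(stats.getCount() + " " + stats.getSum()); // 5 14

        // boxed(): back to Stream<Integer> so collectors can be applied
        List<Integer> list = IntStream.of(data).boxed().collect(Collectors.toList());
        System.out.println(list); // [3, 1, 4, 1, 5]
    }
}
```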
Working with `Collectors` (Collectors Utility Class)
The `Collectors` class in Java (`java.util.stream.Collectors`) provides a set of predefined collectors that can be used to accumulate elements from a stream into various data structures such as List, Set, Map, or even summary statistics. It is widely used with the `collect()` terminal operation.
1. Collecting to `List`, `Set`, and `Map`
Collecting Stream Elements into a `List`
The `Collectors.toList()` method collects elements into a `List`.
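A minimal sketch (data illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToListDemo {
    public static void main(String[] args) {
        List<String> upper = Stream.of("a", "b", "c")
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        System.out.println(upper); // [A, B, C]
    }
}
```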
Collecting Stream Elements into a `Set`
The `Collectors.toSet()` method collects elements into a `Set`, which removes duplicates and does not guarantee order.
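A minimal sketch showing the duplicate removal (data illustrative):

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToSetDemo {
    public static void main(String[] args) {
        Set<String> unique = Stream.of("red", "red", "blue")
                .collect(Collectors.toSet());
        System.out.println(unique.size()); // 2 -- the duplicate "red" is removed
    }
}
```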
Collecting Stream Elements into a `Map`
The `Collectors.toMap()` method collects elements into a `Map` using key and value mapping functions.
Key = the string length, Value = the string itself. If duplicate keys exist, `toMap()` throws an exception unless a merge function is provided.
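A sketch using length as the key, with a merge function to resolve the inevitable key clash between same-length strings (data illustrative):

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToMapDemo {
    public static void main(String[] args) {
        // "cat" and "dog" share key 3; the merge function keeps the first one
        Map<Integer, String> byLength = Stream.of("cat", "horse", "dog")
                .collect(Collectors.toMap(
                        String::length,
                        word -> word,
                        (first, second) -> first));
        System.out.println(byLength.get(3)); // cat
        System.out.println(byLength.get(5)); // horse
    }
}
```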
2. Grouping and Partitioning Data
Grouping Elements using `groupingBy()`
The `Collectors.groupingBy()` method groups elements by a classifier function.
Groups names based on length, returning a `Map<Integer, List<String>>` where the key is the length and the value is a list of names.
Partitioning Elements using `partitioningBy()`
The `Collectors.partitioningBy()` method divides elements into two groups (true/false) based on a predicate. It is used when data needs to be divided into two categories.
Two partitions:
`true` → even numbers
`false` → odd numbers
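A minimal sketch partitioning numbers into even and odd (range illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PartitionDemo {
    public static void main(String[] args) {
        Map<Boolean, List<Integer>> partitions = IntStream.rangeClosed(1, 6)
                .boxed()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0));
        System.out.println(partitions.get(true));  // [2, 4, 6]
        System.out.println(partitions.get(false)); // [1, 3, 5]
    }
}
```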
3. Summarizing Data with `Collectors`
Summing Elements using `summingInt()`
Calculating Average using `averagingInt()`
Getting Summary Statistics using `summarizingInt()`
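The per-collector examples were lost in extraction; a single sketch covering all three (data illustrative):

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class SummarizingDemo {
    public static void main(String[] args) {
        List<String> words = List.of("tiny", "medium", "elephant");

        int totalChars = words.stream()
                .collect(Collectors.summingInt(String::length));
        System.out.println(totalChars); // 18

        double avgChars = words.stream()
                .collect(Collectors.averagingInt(String::length));
        System.out.println(avgChars); // 6.0

        IntSummaryStatistics stats = words.stream()
                .collect(Collectors.summarizingInt(String::length));
        System.out.println(stats.getMin() + " " + stats.getMax()); // 4 8
    }
}
```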