Working with collections in Java used to involve a lot of loops and verbose code. That changed significantly with the introduction of the Stream API in Java 8. It introduced a functional approach to data processing, resulting in cleaner, more concise, and easier-to-read code.
This blog walks through the basics of the Stream API and dives into some of the newer enhancements introduced in Java 9 and beyond. Along the way, real-life examples help make the concepts more practical and relatable.
What is the Stream API All About?
Think of a stream as a pipeline of data. It does not store data, but rather processes elements from a source, such as a list or array. Operations are chained in a fluent style, making complex data handling tasks much more straightforward.
Creating Streams
Streams can be created from many different sources:
```java
List<String> names = List.of("Alice", "Bob", "Charlie");
Stream<String> nameStream = names.stream();

Stream<Integer> numberStream = Stream.of(1, 2, 3);

String[] array = {"A", "B", "C"};
Stream<String> arrayStream = Arrays.stream(array);
```
Use Case: Processing Employee Data
Here is a familiar example processing a list of employees:
```java
List<Employee> employees = Arrays.asList(
        new Employee("Alice", 70000),
        new Employee("Bob", 50000),
        new Employee("Charlie", 120000),
        new Employee("David", 90000)
);

List<Employee> highEarners = employees.stream()
        .filter(e -> e.salary > 80000)
        .map(e -> new Employee(e.name.toUpperCase(), e.salary))
        .sorted((e1, e2) -> Double.compare(e2.salary, e1.salary))
        .collect(Collectors.toList());
```
This snippet filters employees earning above 80k, transforms their names to uppercase, sorts them by salary in descending order, and collects the result into a new list.
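The snippet above assumes an Employee type with accessible name and salary fields. Here is a minimal sketch of such a class; the field and getter names match their usage in this post, while the department field and its default value are assumptions added so the later grouping examples also work:

```java
// Minimal Employee class assumed by the examples in this post.
// Fields match the `e.salary` / `e.name` access used above; getters
// match the later `Employee::getSalary` method references.
class Employee {
    final String name;
    final double salary;
    final String department;

    Employee(String name, double salary) {
        this(name, salary, "General"); // default department is an assumption
    }

    Employee(String name, double salary, String department) {
        this.name = name;
        this.salary = salary;
        this.department = department;
    }

    String getName() { return name; }
    double getSalary() { return salary; }
    String getDepartment() { return department; }
}
```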
Common Stream Operations
Streams typically use two types of operations:
- Intermediate (like filter, map, sorted) — These are lazy and build up the processing pipeline.
- Terminal (like collect, reduce, forEach) — These trigger execution and produce a result.
A combination of these can handle most common data transformations in Java applications.
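The lazy/terminal split is easy to observe. In this small sketch, the filter's predicate runs only once the terminal `collect()` fires, and `limit()` short-circuits the pipeline so the predicate never even sees the last elements:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Count how many times the intermediate filter actually runs.
AtomicInteger calls = new AtomicInteger();

List<Integer> firstTwoEvens = Stream.of(1, 2, 3, 4, 5, 6)
        .filter(n -> { calls.incrementAndGet(); return n % 2 == 0; })
        .limit(2)                      // short-circuits the pipeline
        .collect(Collectors.toList()); // terminal op triggers execution

// firstTwoEvens is [2, 4]; the predicate ran only 4 times (for 1..4),
// never for 5 and 6, because limit(2) was already satisfied.
```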
Stream API Enhancements (Java 9+)
New features made streams even more powerful:
takeWhile() and dropWhile()
Great for slicing a stream based on a condition:
```java
Stream.of(100, 90, 80, 70, 60)
        .takeWhile(n -> n >= 80)
        .forEach(System.out::println);
// Outputs: 100, 90, 80
```
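dropWhile() is the mirror image: it discards the leading elements that match the predicate and keeps everything from the first non-match onward. A short sketch on the same data:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// dropWhile() skips the leading run of elements >= 80,
// then passes the remainder through unchanged.
List<Integer> rest = Stream.of(100, 90, 80, 70, 60)
        .dropWhile(n -> n >= 80)
        .collect(Collectors.toList());
// rest is [70, 60]
```

Note that both methods only slice the *leading* run of the stream; once the predicate fails, later matching elements are unaffected.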
Stream.ofNullable()
Helps avoid null checks by returning an empty stream if the value is null.
```java
Stream.ofNullable(null).count(); // Returns 0
```
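A more realistic use is flat-mapping a lookup that may return null, so missing entries simply vanish from the stream instead of forcing an explicit null check. A small sketch (the lookup map here is hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical lookup table; "Charlie" has no entry, so get() returns null.
Map<String, String> emailByName = Map.of("Alice", "alice@example.com");

List<String> emails = Stream.of("Alice", "Charlie")
        .flatMap(name -> Stream.ofNullable(emailByName.get(name)))
        .collect(Collectors.toList());
// Null lookups produce an empty stream, so no NullPointerException
// and no manual `!= null` filtering.
```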
Collectors Enhancements
Grouping and filtering in one go:
```java
Map<String, List<Employee>> grouped = employees.stream()
        .collect(Collectors.groupingBy(
                e -> e.department,
                Collectors.filtering(e -> e.salary > 80000,
                                     Collectors.toList())
        ));
```
Also helpful is Collectors.flatMapping() when flattening nested data structures during grouping.
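To make that concrete, here is a sketch of Collectors.flatMapping() that groups people by department while flattening each person's skills list into one combined list per department. The record and its skills field are illustrative assumptions (a local record requires Java 16+):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical type for this example: each person carries a nested
// list of skills that we want flattened during grouping.
record Emp(String name, String department, List<String> skills) {}

List<Emp> staff = List.of(
        new Emp("Alice", "Eng", List.of("Java", "SQL")),
        new Emp("Bob",   "Eng", List.of("Kotlin")),
        new Emp("Carol", "HR",  List.of("Recruiting"))
);

// flatMapping() streams each nested list and merges the results
// into a single downstream list per group.
Map<String, List<String>> skillsByDept = staff.stream()
        .collect(Collectors.groupingBy(
                Emp::department,
                Collectors.flatMapping(e -> e.skills().stream(),
                                       Collectors.toList())));
// skillsByDept: {Eng=[Java, SQL, Kotlin], HR=[Recruiting]}
```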
Parallel Streams in Java
Parallel streams in Java are a powerful feature that enables concurrent data processing, leveraging multiple threads to enhance performance, particularly with large datasets. Here is a closer look at how you can use parallel streams effectively.
Leveraging Parallel Streams
For large datasets, parallelStream() allows you to split work across multiple threads, which can significantly improve performance:
```java
double totalHighSalary = employees.parallelStream()
        .filter(e -> e.getSalary() > 80000)
        .mapToDouble(Employee::getSalary)
        .sum();
```
This approach can speed up processing, but it is essential to exercise caution with shared resources due to potential concurrency issues.
Use Case: Grouping by Department
Parallel streams can also be useful for grouping data, which is particularly helpful in applications like payroll, HR, or dashboard services:
```java
Map<String, List<Employee>> departmentWise = employees.parallelStream()
        .collect(Collectors.groupingBy(Employee::getDepartment));
```
Grouping employees by department in parallel can make reporting and analysis more efficient, especially with large datasets.
Best Practices for Using Parallel Streams
To get the most out of parallel streams, keep these tips in mind:
- Use for Large, CPU-Intensive Tasks: Ideal for processing large datasets with intensive computations.
- Avoid Shared Mutable Data: Ensure operations are thread-safe and don’t modify shared state.
- Measure Performance: Always benchmark to confirm that parallelism is improving speed.
- Use Concurrent Collectors: When collecting results, use thread-safe collectors like toConcurrentMap().
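As a sketch of the concurrent-collector tip, Collectors.groupingByConcurrent() lets a parallel stream accumulate into a single ConcurrentMap rather than merging per-thread maps, sidestepping shared-mutable-state pitfalls. Word lengths stand in for real domain data here:

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

List<String> words = List.of("stream", "map", "filter", "sum");

// A concurrent, unordered collector: safe for parallel pipelines,
// but element order within each group is not guaranteed.
ConcurrentMap<Integer, List<String>> byLength = words.parallelStream()
        .collect(Collectors.groupingByConcurrent(String::length));
// e.g. byLength.get(3) contains "map" and "sum"
```

The trade-off is ordering: the concurrent variants are unordered, so use them only when group-internal order does not matter.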
When Not to Use Parallel Streams
- For Small Datasets: The overhead of managing multiple threads can outweigh the benefits when working with small collections, making sequential streams more efficient.
- In I/O-Heavy Operations: Tasks involving file access, database queries, or network calls don’t benefit much from parallelism and may even perform worse due to thread blocking.
Conclusion
The Java Stream API streamlines data processing by replacing boilerplate-heavy loops with expressive, functional patterns. The enhancements introduced in Java 9 and beyond, including advanced collectors and conditional stream slicing, provide even more powerful ways to handle data. With a little practice, working with streams becomes second nature, and the code ends up both cleaner and faster to write.