[javaone 2025] stream gatherers: the architect’s cut

Speaker: Viktor Klang

Oracle has sample code so I didn’t take notes on all the code

General

  • Reviewed the vocabulary: source, intermediate operations, terminal operations
  • Imagine if we could have any intermediate Stream operation; then the API can grow
  • Features needed (collectors don’t meet all the needs) – consume/produce ratios, finite/infinite, stateful/stateless, frugal/greedy, sequential/parallelizable, whether to react to end of stream
  • Stream gatherers were a preview feature in Java 22 and 23 and were finalized in Java 24

New interface

  • Gatherer<T, A, R> – T is the input element type, A is the state type, R is what goes to the next step
  • Supplier<A> initializer()
  • Integrator<A, T, R> integrator() – single abstract method: boolean integrate(A state, T element, Downstream<? super R> downstream). Downstream also has a single abstract method: boolean push(R element)
  • BinaryOperator<A> combiner()
  • BiConsumer<A, Downstream<? super R>> finisher()

Basic Examples

  • Showed code to implement map()
  • Gatherer.of() to create
  • Call as .gather(map(i -> i + 1))
  • Other examples: mapMulti(), limit()
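Since I didn't write down the slide code, here is a minimal sketch of what a map() gatherer can look like. It assumes Java 24 or later; the class name MapGathererSketch and the demo() helper are made up for illustration and are not from the talk.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Gatherer;

// Requires Java 24+ (Gatherer is a preview API in Java 22/23).
final class MapGathererSketch {

    // Stateless, one-in/one-out gatherer that behaves like Stream.map().
    static <T, R> Gatherer<T, ?, R> map(Function<? super T, ? extends R> mapper) {
        // The lambda is the integrator. Returning push()'s result passes on
        // whether downstream still wants more elements.
        return Gatherer.of(
                (unused, element, downstream) -> downstream.push(mapper.apply(element)));
    }

    // Hypothetical helper so the result is easy to inspect.
    static List<Integer> demo() {
        return List.of(1, 2, 3).stream()
                .gather(map(i -> i + 1))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [2, 3, 4]
    }
}
```

Gatherer.of() marks the gatherer as parallelizable; Gatherer.ofSequential() is the alternative when the logic only makes sense in encounter order.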

Named Gatherers

  • Progression – start as inline code and then refactor to be own class for reuse.

Parallel vs Sequential

  • For sequential, start with evaluate() and call integrator.integrate() in a loop while source.hasNext() and integrate() returns true
  • For parallel, recursively split the upstream until the chunks are small. (Split/fork into distinct parts)
  • For takeWhile(), need to deal with short-circuiting/infinite streams. Can cancel() or propagate a short-circuit symbol
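The sequential loop described above can be sketched roughly like this. This is a simplified, hypothetical driver written against the public Gatherer API (Java 24+), not the JDK's actual evaluate() code; the names SequentialEvaluationSketch, evaluate() and limit() are assumptions for illustration, and the parallel split/fork path is left out entirely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Gatherer;
import java.util.stream.Gatherers;
import java.util.stream.Stream;

// Requires Java 24+. A simplified sketch, not the JDK implementation.
final class SequentialEvaluationSketch {

    static <T, A, R> List<R> evaluate(Iterator<T> source, Gatherer<T, A, R> gatherer) {
        List<R> output = new ArrayList<>();
        // This downstream accepts every element and simply records it.
        Gatherer.Downstream<R> downstream = element -> {
            output.add(element);
            return true;
        };

        A state = gatherer.initializer().get();
        Gatherer.Integrator<A, T, R> integrator = gatherer.integrator();

        // Keep pulling while the source has elements and the integrator wants more.
        boolean wantsMore = true;
        while (wantsMore && source.hasNext()) {
            wantsMore = integrator.integrate(state, source.next(), downstream);
        }

        // React to end of stream, e.g. flush a partially filled window.
        gatherer.finisher().accept(state, downstream);
        return output;
    }

    // Hypothetical limit(): returning false from integrate() short-circuits the loop.
    static <T> Gatherer<T, int[], T> limit(int max) {
        return Gatherer.ofSequential(
                () -> new int[1],
                (count, element, downstream) -> {
                    if (count[0] >= max) {
                        return false;
                    }
                    count[0]++;
                    return downstream.push(element) && count[0] < max;
                });
    }

    public static void main(String[] args) {
        System.out.println(evaluate(List.of(1, 2, 3, 4, 5).iterator(), Gatherers.windowFixed(2)));
        // [[1, 2], [3, 4], [5]]

        Iterator<Integer> infinite = Stream.iterate(1, i -> i + 1).iterator();
        System.out.println(evaluate(infinite, limit(3)));
        // [1, 2, 3]
    }
}
```

The second call shows why the boolean return matters: the source is infinite, and the loop only ends because integrate() returns false.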

Other built in Gatherers

  • scan() – kind of like an incremental add/accumulator
  • windowFixed() – get unmodifiable lists of a fixed size; the last window may contain fewer elements
  • mapConcurrent() – specify maximum concurrency level
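To make the three built-ins concrete, here is a small sketch using java.util.stream.Gatherers on Java 24+. The class and method names are made up for illustration; the inputs are arbitrary.

```java
import java.util.List;
import java.util.stream.Gatherers;
import java.util.stream.Stream;

// Requires Java 24+.
final class BuiltInGatherersSketch {

    // scan(): emits the running accumulation after each element.
    static List<Integer> runningSum() {
        return Stream.of(1, 2, 3, 4)
                .gather(Gatherers.scan(() -> 0, (sum, n) -> sum + n))
                .toList();
    }

    // windowFixed(): groups elements into lists of the given size.
    static List<List<Integer>> windows() {
        return Stream.of(1, 2, 3, 4, 5)
                .gather(Gatherers.windowFixed(2))
                .toList();
    }

    // mapConcurrent(): runs the mapper with at most 2 concurrent calls,
    // while keeping the stream's encounter order.
    static List<Integer> lengths() {
        return Stream.of("a", "bb", "ccc")
                .gather(Gatherers.mapConcurrent(2, String::length))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(runningSum()); // [1, 3, 6, 10]
        System.out.println(windows());    // [[1, 2], [3, 4], [5]]
        System.out.println(lengths());    // [1, 2, 3]
    }
}
```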

Other notes

  • Can compose
  • Stream pipeline – spliterator + Gatherer? + Collector
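Composition goes through Gatherer.andThen(). A minimal sketch, assuming Java 24+; the class name and the choice of scan() plus windowFixed() are just for illustration.

```java
import java.util.List;
import java.util.stream.Gatherer;
import java.util.stream.Gatherers;
import java.util.stream.Stream;

// Requires Java 24+.
final class ComposeGatherersSketch {

    static List<List<Integer>> pairedRunningSums() {
        Gatherer<Integer, ?, Integer> runningSum = Gatherers.scan(() -> 0, (sum, n) -> sum + n);
        Gatherer<Integer, ?, List<Integer>> pairs = Gatherers.windowFixed(2);

        // One composed gatherer: running sums first, then grouped in pairs.
        return Stream.of(1, 2, 3, 4, 5)
                .gather(runningSum.andThen(pairs))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(pairedRunningSums()); // [[1, 3], [6, 10], [15]]
    }
}
```

Calling .gather(runningSum).gather(pairs) gives the same result; andThen() is handy when the combination should be a reusable, named gatherer.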

My take

This is the first time I’ve seen a presentation on this topic. It was great hearing the explanation and seeing a bunch of examples. The font for the code was a little smaller than I’d like but I was able to make it out. Only a bit blurry. Most made sense. A few parts I’m going to need to absorb. He did say “it’s a bit tricky” so I don’t feel bad it wasn’t immediately obvious! The diagrams for parallel were helpful.

toList() vs collect(Collectors.toList())

I had some extra time this week so went through a bunch of Sonar findings. One was interesting – in Java 17 you can use .toList() instead of .collect(Collectors.toList()) on a stream.

[Yes, I know this was introduced in Java 16. I live in a world where only LTS releases matter]

Cool. I can fix a lot of these without thinking. It’s a search and replace on the project level after all. I then ran the JUnit regression tests and got failures. That was puzzling to me because I’ve been using .toList() in code I write for a good while without incident.

After looking into it, I found the problem. .toList() guarantees the returned List is immutable. However, Collectors.toList() makes no promises about immutability. The result might be immutable. Or you can change it freely. Surprise?

That’s according to the spec. On the JDK I’m using (and Jenkins is using), Collectors.toList() was returning an ArrayList. So people were treating the returned List as mutable and it was working. I added a bunch of “let’s make this explicitly mutable” and then I was able to commit.

Here’s an example that illustrates the difference:

import java.util.*;
import java.util.stream.*;
 
public class PlayTest {
 
    public static void main(String[] args) {
 
        var list = List.of("a", "b", "c");
        var collectorReturned = collector(list);
        var toListReturned = toList(list);
         
        System.out.println(collectorReturned.getClass());  // ArrayList (but doesn't have to be)
        System.out.println(toListReturned.getClass());  // class java.util.ImmutableCollections$ListN
         
        collectorReturned.add("x");
        System.out.println(collectorReturned);  // [bb, cc, x]
        toListReturned.add("x");  // throws UnsupportedOperationException
 
    }
 
    private static List<String> toList(List<String> list) {
        return list.stream()
                .filter(s -> ! s.equals("a"))
                .map(s -> s + s)
                .toList();
    }
 
    private static List<String> collector(List<String> list) {
        return list.stream()
                .filter(s -> ! s.equals("a"))
                .map(s -> s + s)
                .collect(Collectors.toList());
    }
}

Collectors.toList() also makes no promises about serializability or thread safety but I wasn’t expecting it to.
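For the “let’s make this explicitly mutable” fix, one option is to ask for an ArrayList by name. This is a sketch of that idea, not necessarily the exact change made in the project; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class ExplicitlyMutableSketch {

    // Same pipeline as the example above, but the result type is chosen explicitly.
    static List<String> doubledExceptA(List<String> list) {
        return list.stream()
                .filter(s -> !s.equals("a"))
                .map(s -> s + s)
                .collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        List<String> result = doubledExceptA(List.of("a", "b", "c"));
        result.add("x"); // safe: the list is an ArrayList by construction
        System.out.println(result); // [bb, cc, x]
    }
}
```

Wrapping the result as new ArrayList<>(stream.toList()) works too; either way the mutability no longer depends on what a particular JDK happens to return.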

Multi statement lambda and for each anti patterns

When I do a code review of lambda/stream code, I am immediately suspicious of two things – block statement lambdas and forEach().

What’s wrong with this? It’s functional programming, right?

List<Integer> list = Arrays.asList(1,2,3,4,5,6,7,8,9,10);
		
AtomicInteger sum = new AtomicInteger();
List<Integer> odds = new ArrayList<>();
List<Integer> evens = new ArrayList<>();
		
list.forEach(n -> {
	sum.addAndGet(n);
	if (n % 2 == 0) {
		evens.add(n);
	} else {
		odds.add(n);
	}
});
		
odds.forEach(System.out::println);
System.out.println();
evens.forEach(System.out::println);
System.out.println();
System.out.println(sum);

Well? Not really. It does have a lambda. It doesn’t have a stream, but that’s easy enough to fix: list.stream().forEach(…).

All better? No. Just because you are using a stream doesn’t mean you are doing functional programming. I would much rather see this code as:

List<Integer> list = Arrays.asList(1,2,3,4,5,6,7,8,9,10);

list.stream()
   .filter(x -> x % 2 == 1)
   .forEach(System.out::println);

System.out.println();

list.stream()
   .filter(x -> x % 2 == 0)
   .forEach(System.out::println);

System.out.println();

int sum = list.stream()
   .mapToInt(x -> x)
   .sum();
System.out.println(sum);

Yes, I’m still using forEach(). But now I’m using it for one purpose (printing) rather than sticking logic in it.

Whenever I see a forEach() or lambda with more than one statement, my first thought is “could this be clearer or more functional?” Often the answer is yes. filter(), map() and collect() are your friends.

And if I did need that List?

list.stream()
   .filter(x -> x % 2 == 1)
   .collect(Collectors.toList());
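And if both lists were needed at once, Collectors.partitioningBy() is one way to get them from a single collect() call. A minimal sketch; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PartitionSketch {

    // Key true holds the evens, key false holds the odds.
    static Map<Boolean, List<Integer>> byParity(List<Integer> list) {
        return list.stream()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Map<Boolean, List<Integer>> parts = byParity(list);

        System.out.println(parts.get(false)); // [1, 3, 5, 7, 9]
        System.out.println(parts.get(true));  // [2, 4, 6, 8, 10]
    }
}
```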