“Make your CPU cores sweat with Parallel Streams”
Speaker: Lukasz Pater
For more blog posts from JavaOne, see the table of contents
Started with the canonical Person/Car/age example to show what streams are [I think everyone at JavaOne knows that]
Then showed code that finds all the primes under 10 million that are palindromes when expressed in binary. Moore's law now relies on multi-core architectures. Running this example in parallel took 1.6 seconds versus 8.6 seconds sequentially, roughly five times faster.
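Here's a rough sketch of that kind of benchmark; the class name, helper methods, and naive prime check are my own, not his exact code:

import java.util.stream.IntStream;

public class BinaryPalindromePrimes {

    // Naive trial-division prime test; good enough for a demo
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    // True if the binary representation reads the same forwards and backwards
    static boolean isBinaryPalindrome(int n) {
        String bits = Integer.toBinaryString(n);
        return new StringBuilder(bits).reverse().toString().equals(bits);
    }

    public static void main(String[] args) {
        long count = IntStream.rangeClosed(2, 10_000_000)
                .parallel()   // remove this line to compare the sequential timing
                .filter(BinaryPalindromePrimes::isPrime)
                .filter(BinaryPalindromePrimes::isBinaryPalindrome)
                .count();
        System.out.println(count + " binary-palindromic primes under 10 million");
    }
}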
Good analogy: If told to wash the fork, then the knife, then the… it is slow. If told to wash the dishes, you can optimize internally.
History of threads
- JDK 1 – Threads – still good for small background tasks
- JDK 5 – ExecutorService, concurrent objects
- JDK 7 – fork/join framework – recursively decompose tasks into subtasks and then combine results (rough sketch after this list)
- JDK 8 – parallel streams – use fork/join framework and Spliterator behind the scenes
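A quick sketch of the fork/join decompose-and-combine idea; this summing task is my own example, not the speaker's:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sum an array by splitting it in half until chunks are small enough to sum directly
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] numbers;
    private final int from, to;

    SumTask(long[] numbers, int from, int to) {
        this.numbers = numbers;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {          // small enough: just loop
            long sum = 0;
            for (int i = from; i < to; i++) sum += numbers[i];
            return sum;
        }
        int mid = (from + to) / 2;
        SumTask left = new SumTask(numbers, from, mid);
        SumTask right = new SumTask(numbers, mid, to);
        left.fork();                           // run the left half asynchronously
        return right.compute() + left.join();  // do the right half here, then combine
    }

    public static void main(String[] args) {
        long[] numbers = new long[1_000_000];
        for (int i = 0; i < numbers.length; i++) numbers[i] = i + 1;
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(numbers, 0, numbers.length));
        System.out.println(sum);  // 500000500000
    }
}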
Making a parallel stream
- parallelStream() – called directly on the stream source (e.g. a collection)
- parallel() – as an intermediate operation anywhere in the stream pipeline
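A minimal example of both options (my own code, assuming a small list as the source):

import java.util.Arrays;
import java.util.List;

public class ParallelCreation {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ada", "Linus", "Grace");

        // parallelStream() directly on the source collection
        long n1 = names.parallelStream().filter(s -> s.length() > 3).count();

        // parallel() as an intermediate operation in an existing pipeline
        long n2 = names.stream().parallel().filter(s -> s.length() > 3).count();

        System.out.println(n1 + " " + n2);  // same result either way
    }
}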
Fork Join Pool
- Uses work stealing to balance tasks amongst workers in pool
- All parallel streams use one common pool instance with # worker threads = # CPU cores – 1. The thread that invokes the terminal operation also pitches in on the work, which accounts for the remaining core.
- Can change this by setting the system property java.util.concurrent.ForkJoinPool.common.parallelism. Must be passed on the command line because the property is only read when the common pool is created, so setting it in code after the first parallel stream has run has no effect
- If you want a custom fork join pool, create one and submit your stream task to it. He does not recommend doing this. One reason you might want to is to add a timeout to the stream
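Here's a rough sketch of the custom-pool-with-timeout idea; the pool size and timeout values are my own choices:

import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.LongStream;

public class CustomPoolDemo {
    public static void main(String[] args) throws Exception {
        // Common-pool size can be changed reliably only on the command line, e.g.:
        //   java -Djava.util.concurrent.ForkJoinPool.common.parallelism=8 CustomPoolDemo

        // A pipeline submitted from inside a fork/join pool runs in that pool,
        // which lets us attach a timeout (again, he advises against custom pools)
        ForkJoinPool pool = new ForkJoinPool(4);
        Callable<Long> work = () -> LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        try {
            long sum = pool.submit(work).get(5, TimeUnit.SECONDS);  // give up after 5 seconds
            System.out.println(sum);
        } catch (TimeoutException e) {
            System.out.println("stream pipeline took too long");
        } finally {
            pool.shutdown();
        }
    }
}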
Warnings
- Avoid I/O – the pool's threads end up waiting on I/O/network instead of burning CPU on real work
- Use only for CPU intensive tasks
- Be careful with nested parallel streams
- Having many smaller tasks in the pool will better balance the workload
- Don’t create your own fork join pool
Spliterator
- splittable iterator
- to traverse elements of a source in parallel
- tryAdvance(Consumer) – do something if an element exists
- trySplit() – partitions off some elements into another spliterator, leaving fewer elements in the original – repeated splitting gives a tree of spliterators until the elements run out
- characteristics()
- estimateSize()
- StreamSupport.stream(mySpliterator, true) – creates a parallel stream from a spliterator – you shouldn't normally need to do this (see the sketch after this list)
- ArrayList decomposes into equal-sized chunks. LinkedList splits off a smaller percentage of the elements because getting to them is linear and you want to minimize how long other workers wait
- ArrayList and IntStream.range decompose well
- LinkedList and Stream.iterate() decompose poorly – could even run out of memory
- HashSet and TreeSet decompose in between
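A small sketch playing with the Spliterator API and StreamSupport.stream(); the data and variable names are mine:

import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.StreamSupport;

public class SpliteratorDemo {
    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        Spliterator<Integer> original = data.spliterator();
        Spliterator<Integer> half = original.trySplit();  // takes roughly half (null if too small to split)

        System.out.println("split off: " + half.estimateSize()
                + ", remaining: " + original.estimateSize());

        // tryAdvance: process one element if there is one
        original.tryAdvance(e -> System.out.println("first remaining element: " + e));

        // StreamSupport.stream(spliterator, parallel) builds a stream over a spliterator;
        // as noted above, you rarely need to do this yourself
        long evens = StreamSupport.stream(data.spliterator(), true)  // true = parallel
                .filter(n -> n % 2 == 0)
                .count();
        System.out.println(evens + " even numbers");
    }
}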
Other tips
- Avoiding autoboxing also saves time. iterate() creates boxed objects whereas range() creates primitives
- Parallel streams perform better where order doesn't matter. findAny() or unordered().limit() [he missed the terminal operation in the limit example – see the sketch after this list]
- Avoid shared state
- If have multiple calls to sequential() and parallel(), the last one wins and takes effect for the entire stream pipeline
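A rough sketch of the boxing and unordered().limit() tips; the collect() at the end is my guess at the terminal operation he left out:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class OtherTipsDemo {
    public static void main(String[] args) {
        // Boxed: Stream.iterate produces Integer objects
        long boxedSum = Stream.iterate(1, i -> i + 1)
                .limit(1_000_000)
                .mapToLong(Integer::longValue)
                .sum();

        // Primitive: IntStream.range avoids autoboxing entirely
        long primitiveSum = IntStream.range(1, 1_000_001).asLongStream().sum();

        System.out.println(boxedSum == primitiveSum);  // true, but range() is cheaper

        // unordered() lets limit() keep any 10 matches instead of the first 10
        // in encounter order, which parallelizes much better
        List<Integer> anyTen = IntStream.range(0, 1_000_000)
                .parallel()
                .unordered()
                .filter(n -> n % 7 == 0)
                .limit(10)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(anyTen);
    }
}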
My take:
Good discussion of performance and things to beware of. My blog wasn't live because I couldn't get internet in the room. I typed it live though! A couple of typos like findFist(), but nothing significant