JavaOne – Make your CPU cores sweat with Parallel Streams

Posted on October 4, 2017 by Jeanne Boyarsky

“Make your CPU cores sweat with Parallel Streams”

Speaker: Lukasz Pater

For more blog posts from JavaOne, see the table of contents

Started with the canonical Person/Car/age example to show what streams are [I think everyone at JavaOne knows that]

Then showed code that finds all the primes under 10 million that are palindromes when expressed in binary. Moore’s law is now relying on multi-core architecture. This example takes 8.6 seconds sequentially vs 1.6 seconds in parallel. About five times faster.
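The speaker's code isn't reproduced in these notes, so here is a minimal sketch of what such a benchmark could look like. The class name, the trial-division primality test and the timing approach are all assumptions for illustration; timings will vary by machine and won't necessarily match the numbers from the talk.

```java
import java.util.stream.IntStream;

class PalindromicPrimes {

    // Trial division: deliberately CPU-heavy so there is real work to spread across cores.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    // True if the binary representation reads the same in both directions.
    static boolean isBinaryPalindrome(int n) {
        String bits = Integer.toBinaryString(n);
        return new StringBuilder(bits).reverse().toString().equals(bits);
    }

    // Counts primes below limit that are palindromes in binary.
    static long count(int limit, boolean parallel) {
        IntStream numbers = IntStream.range(0, limit);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.filter(PalindromicPrimes::isPrime)
                      .filter(PalindromicPrimes::isBinaryPalindrome)
                      .count();
    }

    public static void main(String[] args) {
        int limit = 10_000_000;
        long start = System.nanoTime();
        long sequential = count(limit, false);
        long middle = System.nanoTime();
        long parallel = count(limit, true);
        long end = System.nanoTime();
        System.out.printf("sequential: %d found in %d ms%n", sequential, (middle - start) / 1_000_000);
        System.out.printf("parallel:   %d found in %d ms%n", parallel, (end - middle) / 1_000_000);
    }
}
```

The only difference between the two runs is the call to parallel(); the filters themselves are unchanged.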

Good analogy: If told to wash the fork, then the knife, then the… it is slow. If told to wash the dishes, you can optimize internally.

History of threads

JDK 1 – Threads – still good for small background task
JDK 5 – ExecutorService, concurrent objects
JDK 7 – fork/join framework – recursively decompose tasks into subtasks and then combine results
JDK 8 – parallel streams – use fork/join framework and Spliterator behind the scenes

Making a parallel stream

parallelStream() – for source
parallel() – anywhere in stream pipeline intermediate operation list
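A small sketch of the two options, with made-up method names. Both pipelines run in parallel and give the same result; the only difference is where parallelism is requested.

```java
import java.util.List;
import java.util.stream.Collectors;

class MakingParallelStreams {

    // Ask the source for a parallel stream directly.
    static List<Integer> squaresFromSource(List<Integer> input) {
        return input.parallelStream()
                    .map(n -> n * n)
                    .collect(Collectors.toList());
    }

    // Start sequential and switch with parallel() somewhere in the pipeline.
    static List<Integer> squaresMidPipeline(List<Integer> input) {
        return input.stream()
                    .map(n -> n * n)
                    .parallel()
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        System.out.println(squaresFromSource(input));
        System.out.println(squaresMidPipeline(input));
    }
}
```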

Fork Join Pool

Uses work stealing to balance tasks amongst workers in pool
All parallel streams use one common pool instance with # threads = # CPU cores – 1. The remaining core is left for the calling thread, which hands out the work and also takes part in it.
Can change by setting system property java.util.concurrent.ForkJoinPool.common.parallelism. Must pass on the command line because the first call to a parallel stream resets it if set in code
If want custom fork join pool, create one and submit your stream to it. Does not recommend doing this. One reason you might want to is to add a timeout to the stream
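For reference, a rough sketch of the custom pool with a timeout. The class and method names are invented for illustration. Note that a parallel stream started from inside a ForkJoinPool task running in that pool is an implementation behavior rather than a documented guarantee, which fits the speaker's advice not to rely on it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.LongStream;

class ForkJoinPoolDemo {

    // Runs a parallel stream inside a dedicated pool and gives up after the timeout.
    static long sumInCustomPool(long upTo, int threads, long timeoutSeconds) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            Callable<Long> task = () -> LongStream.rangeClosed(1, upTo).parallel().sum();
            return pool.submit(task).get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException("stream did not finish", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // To change the common pool size, pass the property on the command line, e.g.
        //   java -Djava.util.concurrent.ForkJoinPool.common.parallelism=4 ForkJoinPoolDemo
        System.out.println("cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("sum: " + sumInCustomPool(1_000_000, 2, 10));
    }
}
```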

Warnings

Avoid IO – burn CPU cycles waiting for IO/network
Use only for CPU intensive tasks
Be careful with nested parallel streams
Having many smaller tasks in the pool will better balance the workload
Don’t create your own fork join pool

Spliterator

splittable iterator
to traverse elements of a source in parallel
tryAdvance(Consumer) – do something if an element exists
trySplit() – partition off some elements to another spliterator leaving fewer elements in the original – fork so have tree of spliterators until run out of elements
characteristics()
estimateSize()
StreamSupport.stream(mySpliterator, true) – creates parallel stream from spliterator – shouldn’t need to do this
ArrayList decomposes into equal sizes. LinkedList gives a smaller % of the elements because linear to get elements and want to minimize wait time
ArrayList and IntStream.range decompose well
LinkedList and Stream.iterate() decompose poorly – could even run out of memory
HashSet and TreeSet decompose in between
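To see the splitting behavior for yourself, something like the following sketch works. The class and helper names are made up; the exact sizes a LinkedList hands off are an implementation detail, so the code only prints them rather than assuming a value.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.StreamSupport;

class SpliteratorDemo {

    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    // Splits once and returns { size of the split-off part, size left in the original }.
    static long[] splitOnce(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    // tryAdvance hands the next element to the consumer if one exists.
    static int firstElement(List<Integer> source) {
        int[] holder = { -1 };
        source.spliterator().tryAdvance(x -> holder[0] = x);
        return holder[0];
    }

    // Builds a parallel stream straight from a spliterator (rarely needed in practice).
    static long parallelSum(List<Integer> source) {
        return StreamSupport.stream(source.spliterator(), true)
                            .mapToLong(Integer::longValue)
                            .sum();
    }

    public static void main(String[] args) {
        long[] array = splitOnce(fill(new ArrayList<>(), 10_000));
        long[] linked = splitOnce(fill(new LinkedList<>(), 10_000));
        System.out.println("ArrayList split:  " + array[0] + " / " + array[1]);
        System.out.println("LinkedList split: " + linked[0] + " / " + linked[1]);
    }
}
```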

Other tips

Avoiding autoboxing also saves time. iterate() creates boxed objects where range() creates primitives
Parallel streams perform better where order doesn’t matter. findAny() or unordered().limit() [he missed the terminal operation in the limit example]
Avoid shared state
If have multiple calls to sequential() and parallel(), the last one wins and takes effect for the entire stream pipeline
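A few of these tips as a sketch, with invented method names. The iterate() version is included only for comparison; it boxes every element and splits poorly, so keep n small if you try it.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ParallelTips {

    // range() produces primitives and splits well.
    static long sumWithRange(int n) {
        return IntStream.range(0, n).parallel().asLongStream().sum();
    }

    // iterate() produces boxed Integers and splits poorly.
    static long sumWithIterate(int n) {
        return Stream.iterate(0, i -> i + 1)
                     .limit(n)
                     .parallel()
                     .mapToLong(Integer::longValue)
                     .sum();
    }

    // The last of parallel()/sequential() wins for the whole pipeline.
    static boolean isParallelAfterBothCalls() {
        return IntStream.range(0, 10).parallel().sequential().isParallel();
    }

    // findAny() may return whichever match a worker finds first.
    static OptionalInt anyEven(int n) {
        return IntStream.range(0, n).parallel().filter(i -> i % 2 == 0).findAny();
    }

    // unordered().limit() with a terminal operation so the pipeline actually runs.
    static long countOfUnorderedLimit(int n, int limit) {
        return IntStream.range(0, n).parallel().unordered().limit(limit).count();
    }

    public static void main(String[] args) {
        System.out.println(sumWithRange(1_000));
        System.out.println(sumWithIterate(1_000));
        System.out.println(isParallelAfterBothCalls());
        System.out.println(anyEven(100));
        System.out.println(countOfUnorderedLimit(100, 5));
    }
}
```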

My take:
Good discussion of performance and things to beware of. My blog wasn’t live because I couldn’t get internet in the room. I typed it live though! A couple of typos like findFist() but nothing significant

JavaOne – Streams in JDK 8

Posted on October 4, 2017 by Jeanne Boyarsky

“Streams in Java 8 – The good, the bad & the ugly”

Speaker: Simon Ritter & Stuart Marks
[Simon has the Twitter handle @speakjava; very cool]

For more blog posts from JavaOne, see the table of contents

Need to think differently. We are used to imperative programming with loops and variables.

Dealing with exceptions
ugly code – three lines of code and hard to tell what it does

Problems:

looks like Perl
returns null (vs Optional or empty string)
split is called twice so wasted work
skipped URLDecoder.decode() because didn’t want to deal with a checked exception – but lost functionality. Problem caused by a missing API in Java so have to use decode.

Better approach:

use a method with a try/catch block; call that method from the stream
use Map.entry to simulate a tuple
Use single char (vs regex) in split. If only pass one character to split, far faster
split() is overloaded to take a numeric limit to how many are returned
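The slide code isn't in these notes, so the sketch below assumes the example was parsing a URL query string (the URLDecoder mention suggests it). Class and method names are invented. The checked exception is handled in a small helper, Map.entry (Java 9+) stands in for a tuple, and split uses a single character with a limit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

class QueryParams {

    // Keeps the try/catch out of the stream pipeline.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be supported", e);
        }
    }

    // Splits once on '=' (limit 2), so any further '=' stays in the value.
    static Map.Entry<String, String> toEntry(String pair) {
        String[] parts = pair.split("=", 2);
        String value = parts.length > 1 ? parts[1] : "";
        return Map.entry(decode(parts[0]), decode(value));
    }

    // On duplicate keys the later value wins (an arbitrary choice for this sketch).
    static Map<String, String> parse(String query) {
        return Arrays.stream(query.split("&"))
                     .map(QueryParams::toEntry)
                     .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> b));
    }

    public static void main(String[] args) {
        System.out.println(parse("a=1&b=hello%20world&c=x%3Dy"));
    }
}
```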

Imperative streams
inside the for each is a print, an if statement and a LongAdder variable (good for frequent writes and infrequent reads)

then refactored to use mapToInt, a println and an if statement and a local variable. more complicated and still not functional

then switched to peek and no variable but still an if statement (well a ternary)

finally switched to use a filter and count instead of sum

still not 100% functional because println is a side effect. ok for debugging
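Roughly, the start and end of that evolution might look like the sketch below. The word-length condition and the names are assumptions, not the speakers' code, and the println is left out to keep it short.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

class CountingLongWords {

    // Starting point: forEach with an if statement and a LongAdder mutated as a side effect.
    static long imperativeStyle(List<String> words, int minLength) {
        LongAdder counter = new LongAdder();
        words.stream().forEach(w -> {
            if (w.length() >= minLength) {
                counter.increment();
            }
        });
        return counter.sum();
    }

    // End point: the if becomes a filter and the counter becomes count().
    static long functionalStyle(List<String> words, int minLength) {
        return words.stream()
                    .filter(w -> w.length() >= minLength)
                    .count();
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "stream", "lambda", "to");
        System.out.println(imperativeStyle(words, 5));
        System.out.println(functionalStyle(words, 5));
    }
}
```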

[good showing evolution to get functional]

Problems

Easy to misuse forEach() because feels familiar. But easily leads to side effects
Imperative thinking “for each of these I want to..”
Pause to consider if should use for each

Mixing internal and external iteration
for loop running 12 times and then getting data for each month with filter checking Month.of(x) – doesn’t work because x isn’t effectively final

“solve” effectively final by setting to different interim variable

IntStream.range(0,12).forEach – uses internal iteration but forEach. Marginally better as don’t need interim variable

Instead return a nested map of Month to Map with nested grouping by so only need one iteration – the data stream
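A sketch of the nested grouping, using a hypothetical Sale type since the notes don't say what the data was. One pass over the data produces a Month to (category to total) map.

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MonthlyTotals {

    // Hypothetical data type for illustration only.
    static final class Sale {
        private final LocalDate date;
        private final String category;
        private final int amount;

        Sale(LocalDate date, String category, int amount) {
            this.date = date;
            this.category = category;
            this.amount = amount;
        }

        Month getMonth() { return date.getMonth(); }
        String getCategory() { return category; }
        int getAmount() { return amount; }
    }

    // A single iteration over the data; no loop over the 12 months is needed.
    static Map<Month, Map<String, Integer>> totals(List<Sale> data) {
        return data.stream()
                   .collect(Collectors.groupingBy(
                           Sale::getMonth,
                           Collectors.groupingBy(
                                   Sale::getCategory,
                                   Collectors.summingInt(Sale::getAmount))));
    }

    public static void main(String[] args) {
        List<Sale> data = List.of(
                new Sale(LocalDate.of(2017, 1, 5), "books", 10),
                new Sale(LocalDate.of(2017, 1, 20), "books", 20),
                new Sale(LocalDate.of(2017, 10, 4), "coffee", 3));
        System.out.println(totals(data));
    }
}
```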

Problems

Going through data.stream() 12 times
forEach cheat
array not right data structure; it’s really a map of month to value

Hands on lab question
reduce (“”, (a,b) -> a+b) – works but inefficient because String concatenation

reduce(a,b) -> sb.append(b) – fails because ignores the first letter.

next attempt uses an if statement in reduce

then tried a custom collector. works but more complicated than necessary
Collector.of(StringBuilder::new, StringBuilder::append, StringBuilder::append, StringBuilder::toString)

or just use Collectors.joining()
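The three working variants side by side, as a sketch with made-up method names. All give the same result; joining() is the one to reach for.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class JoiningLetters {

    // Works, but creates a new String for every concatenation.
    static String withReduce(List<String> letters) {
        return letters.stream().reduce("", (a, b) -> a + b);
    }

    // Works, but is more machinery than the problem needs.
    static String withCustomCollector(List<String> letters) {
        Collector<String, StringBuilder, String> collector = Collector.of(
                StringBuilder::new,      // supplier
                StringBuilder::append,   // accumulator
                StringBuilder::append,   // combiner for parallel runs
                StringBuilder::toString  // finisher
        );
        return letters.stream().collect(collector);
    }

    // The built-in collector does the same job.
    static String withJoining(List<String> letters) {
        return letters.stream().collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c");
        System.out.println(withReduce(letters));
        System.out.println(withCustomCollector(letters));
        System.out.println(withJoining(letters));
    }
}
```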

Problems

If not using a parameter, it is probably wrong
Side effects
if statement version is not associative so would fail when run in parallel

Misc

can’t use same stream multiple times
method references are slightly more efficient than lambdas because lambda gets added into a method in bytecode. Saves a level of indirection by using method reference. But only slightly
Calling .sorted() multiple times vs chaining comparing.thenComparing – the latter is better [also works because preserves sort :)]
parallel streams do more work. might or might not complete faster. uses fork-join pool. number of threads defaults to number of CPUs. In Java 9, this is # CPUs for container. On Java 8, it was for physical machine
Nested parallel streams is bad idea because using same threads so performance is worse. Can create ForkJoinPool if must. Buyer beware; this is an implementation specific behavior and tied to the profile of the machine you write it for.
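For the sorting tip, a sketch of the chained comparator with a hypothetical Person type. One sorted() call with comparing(...).thenComparing(...) replaces several sorted() calls.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortingOnce {

    // Hypothetical type for illustration only.
    static final class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // Sorts by last name, then by first name, in a single sorted() call.
    static List<String> sortedNames(List<Person> people) {
        return people.stream()
                     .sorted(Comparator.comparing(Person::getLastName)
                                       .thenComparing(Person::getFirstName))
                     .map(p -> p.getLastName() + ", " + p.getFirstName())
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Grace", "Hopper"),
                new Person("Ada", "Lovelace"),
                new Person("Alan", "Hopper"));
        System.out.println(sortedNames(people));
    }
}
```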

My take: Fun start to the morning. I like that they covered common things in an entertaining way, and also less common things. Something to learn for everyone!

JavaOne – Maven BOF

Posted on October 3, 2017 by Jeanne Boyarsky

“Maven 5 BOF”

Speaker: Brian Fox, Manfred Moser & Robert Scholte

For more blog posts from JavaOne, see the table of contents

[I was late because we talked more about JUnit 5 after the BOF]

Only 26.4% of Maven Central traffic is from Maven. Nothing else is more than 10% though; not even Ivy or Gradle

Some projects don’t have snapshots; instead every commit is a release

Talked about version ranges. Depends on proximity to your project rather than the latest version. Important to clean up pom dependencies before Java 9 so not in module path. Use Maven dependency plugin (analyze) to find unused ones. Make sure to use latest version of dependency plugin.

Maven won’t generate module descriptor. Different purpose. Not all modules are dependencies. More info in module descriptor. What to export is a decision that needs to be decided by developer. jdeps can generate a rough descriptor to get started based on binaries.

Can have .mvn directory inside projects with preferences starting in Maven 3.3.5. For example, you can specify to provide more memory.

Shouldn’t be issues going from 3.3.5 to 3.5.9

Maven (dependency) resolver is now a standalone project


Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky

Java/J2EE Software Development and Technology Discussion Blog

Category Archives: Conferences
