java 8 stream performance – maurice naftalin – qcon

This is part of my live blogging from QCon 2015. See my QCon table of contents for other posts.

See http://www.lambdafaq.org

Background
He started with background on streams. (This is old news by now, but I’m still taking some notes). The goals were to bring a functional style to Java and “explicit but unobtrusive” hardware parallelism. The former is more important than performance.

The intention is to replace loops with aggregate operations. [I like that he picked an example that required three operations and not an oversimplified example]. The result is more concise and readable, and easy to change to parallelize.
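To make the loop-versus-aggregate-operations point concrete, here is a minimal sketch. It is not the example from the talk; the word list and the method names are my own assumptions. It shows the same three steps (filter, map, sum) as a loop, as a stream, and as a parallel stream.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example, not the code shown in the session.
class LoopVsStream {
    // Loop version: filtering, mapping and summing are interleaved in one body.
    static int totalLengthLoop(List<String> words) {
        int total = 0;
        for (String w : words) {
            if (w.length() > 3) {
                total += w.length();
            }
        }
        return total;
    }

    // Stream version: the same three steps as separate aggregate operations.
    static int totalLengthStream(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)
                .mapToInt(String::length)
                .sum();
    }

    // Parallel version: only the stream source changes.
    static int totalLengthParallel(List<String> words) {
        return words.parallelStream()
                .filter(w -> w.length() > 3)
                .mapToInt(String::length)
                .sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "map", "filter", "sum", "reduce");
        System.out.println(totalLengthLoop(words));     // 18
        System.out.println(totalLengthStream(words));   // 18
        System.out.println(totalLengthParallel(words)); // 18
    }
}
```

For a list this small the parallel version will not be faster; the point is only that the switch is a one-word change.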

Reduction == terminal operation == sink

Performance Notes
Free lunch is over. Chips don’t magically get faster over time. Instead, they add cores. The goal of parallel streams is to run the intermediate operations in parallel and then bring the results together in the reduction.
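A rough sketch of that split-then-combine idea, using the three-argument form of `Stream.reduce`. The class name, method name and the sum-of-squares task are assumptions I picked for illustration. The accumulator folds elements into a partial result on each thread, and the combiner brings the partial results together.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example of a parallel reduction.
class ParallelReduceDemo {
    static long sumOfSquares(List<Integer> numbers) {
        return numbers.parallelStream()
                .reduce(0L,                                              // identity for each partial result
                        (partial, n) -> partial + n.longValue() * n,     // accumulator: fold one element in
                        Long::sum);                                      // combiner: merge two partial results
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(sumOfSquares(numbers)); // 333833500
    }
}
```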

What to measure?

  • We want to know how code changes affect system performance in prod. That’s not feasible though, because we would need to do a controlled experiment in prod conditions. Instead, we do a controlled experiment in lab conditions and hope we aren’t answering an oversimplified question.
  • It’s hard to microbenchmark because of inaccuracy, garbage collection, optimization over time, etc. There are benchmarking libraries: Caliper or JMH. [or better if you don’t need to microbenchmark]
  • Don’t optimize code if you don’t have a problem. What’s your performance requirement? [and is it the bottleneck]. Similarly, don’t optimize the code if the problem lies in the OS or somewhere else.

Case study
This was a live demo. First we saw that not using a BufferedReader makes a file slow to read. [not about streams]. Then we watched JMeter not work on the first try. [the danger of a live demo]. Then he showed how messing with the GC size and making it too small is bad for performance as well [still not on streams]. He is trying to show the process of performance tuning overall, which is valid info, just not what I expected this session to be about.
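For reference, here is a small sketch of the buffered versus unbuffered difference. It is my own illustration, not the demo code: the temp file, the class name and the method names are assumptions. The unbuffered version calls `FileInputStream.read()` once per byte, while the buffered version lets `BufferedReader` fetch large chunks at a time.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

// Hypothetical example, not the code from the live demo.
class BufferedReadDemo {
    // Writes a temp file with the given number of identical lines.
    static Path writeSampleFile(int lines) {
        try {
            Path file = Files.createTempFile("buffered-demo", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, Collections.nCopies(lines, "a line of sample text"),
                    StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unbuffered: every read() asks the underlying file for a single byte.
    static long countLinesUnbuffered(Path file) {
        long count = 0;
        try (InputStream in = new FileInputStream(file.toFile())) {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Buffered: the reader pulls in large chunks and serves lines from memory.
    static long countLinesBuffered(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.lines().count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSampleFile(100_000);
        // Crude wall-clock timing for illustration only; not a proper benchmark.
        long start = System.nanoTime();
        long unbuffered = countLinesUnbuffered(file);
        long middle = System.nanoTime();
        long buffered = countLinesBuffered(file);
        long end = System.nanoTime();
        System.out.println(unbuffered + " lines unbuffered in " + (middle - start) / 1_000_000 + " ms");
        System.out.println(buffered + " lines buffered in " + (end - middle) / 1_000_000 + " ms");
    }
}
```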

Then [after I didn’t see the stream logic being a problem in the first place], he showed how to solve subproblems and merge them. [oddly not calling it map reduce]

8 minutes before the end of the talk, we finally see the non-parallel code for the case study. It’s interesting code because it uses two terminal operations and two streams. At least reading in the file is done normally. Finally, we see that the combiner is O(n), which prevents speeding it up.
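I don’t have the case study code, so here is a hypothetical collector that shows what an O(n) combiner looks like. The names are my own. Each worker thread fills its own list cheaply, but merging two partial lists copies every element of one into the other, so the combine step grows with the data size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical example, not the case study from the talk.
class LinearCombinerDemo {
    static <T> Collector<T, List<T>, List<T>> toListWithLinearCombiner() {
        return Collector.of(
                ArrayList::new,               // supplier: one list per worker
                List::add,                    // accumulator: cheap append per element
                (left, right) -> {            // combiner: copies all of right, so O(n)
                    left.addAll(right);
                    return left;
                });
    }

    static List<Integer> collectParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .boxed()
                .collect(toListWithLinearCombiner());
    }

    public static void main(String[] args) {
        System.out.println(collectParallel(10)); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

The result still comes back in encounter order because the collector is not marked as unordered.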

Some rules

  • The workload of the intermediate operations must be great enough to outweigh the overheads. Often quoted as size of data set * processing cost per element
  • sorted() is worse
  • Collectors cost extra. toMap(): merging maps is slow. toList() and toSet() are dominated by the accumulator.
  • In the real world, the fork/join pool doesn’t operate in isolation
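To picture the toMap() rule, here is a small word count sketch with names I made up. In the parallel case each worker builds its own map, and those partial maps then have to be merged key by key, which is where the extra cost comes from.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example of collecting to a map sequentially or in parallel.
class ToMapMergeDemo {
    static Map<String, Integer> countWords(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        // Integer::sum resolves duplicate keys, both within one map and when merging maps.
        return stream.collect(Collectors.toMap(w -> w, w -> 1, Integer::sum));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        Map<String, Integer> sequential = countWords(words, false);
        Map<String, Integer> parallel = countWords(words, true);
        System.out.println(sequential.equals(parallel)); // true
    }
}
```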

My impressions: A large amount of this presentation wasn’t stream performance. Then the case study shows that reading without a BufferedReader is slow. [no kidding]. I feel like the example was contrived and we “learned” that poorly written code behaves poorly. I was hoping the talk would actually be about parallelization. When parallelStream() saves time and when it doesn’t, for example. What I learned was that for this particular scenario, parallelization wasn’t helpful. And then right at the end, the generic rules. Which felt rushed and thrown at us.

qcon – live blog table of contents

I’m attending QCon New York which is run by InfoQ.com. At the end, I’ll update this post to be a table of contents of my blog posts from the conference.

My live blog posts

Wednesday

Thursday

Friday

That’s 9742 words live blogged not counting this post (which gets it to 10K) and an average blog post size of 487. The “Too Big To Fail” session was an outlier at 827; must have liked it a lot.

My overall impressions
The conference in general seems well set up, with 25 minutes between talks along with an open space by area at the end of the day (not presentations; discussions). For lunch they have tables designed for discussion: large normal conference tables, 4-person discussion tables and “loner” tables. I also like the intro about usability, including the big names on the badge.

The intro also had each track lead give an overview of the talks in their track. This felt like overkill as this was online and most people think about what they want to attend before showing up.

Logistically, I really like that you give feedback by putting a green, yellow or red paper in as you walk out the door of the session. Low overhead, low time commitment, and asked while you still remember the details.

How to Study for the Java Foundations Junior Associate exam

Oracle’s Java Foundations Junior Associate exam is brand new (in beta at the time of this post). Since this isn’t an upgraded version of an existing exam, it’s going to take time for books to come out on the topic. [Update: If they do at all. This isn’t a popular exam]

I took the beta today to see if Scott and my OCA 8 (or 11) book could be used to study. The answer is yes supplemented by a few other things. (In the interest of disclosure, this is true of any OCA book you might have access to as well.)

Before you read any further, see my other post on why I think you shouldn’t take the Junior Associate exam.

Still reading? Ok. If you have your mind set on taking the Junior Associate exam, here’s what you need to know.

Objectives Mapping

This cross references the objectives between the OCA 8 and Junior Associate exam. As you can see, reading an OCA book will put you in good shape. Then there are the “new” objectives. See the table at the bottom of our Java Foundations page for links to blog posts with sample questions.