[2019 oracle code one] Lambda, Streams and Collectors Lab

Lambda, Streams and Collectors Lab

Speakers: Stuart Marks, Maurice Naftalin, Jose Paumard & Gustavo Durand

For more blog posts, see The Oracle Code One table of contents


The lab is self paced

https://github.com/JosePaumard/code-one-2019-lambda-stream-collector-lab

I like that it is organized by topic so you can pick what you want to learn. Since “someone” believed I didn’t need to be here, I decided to blog about what I did and learned.

  • O_SimpleCollectors
    • I forgot Comparator.naturalOrder() exists because I hardly use it.
    • I forgot you can’t use Function.identity() with primitives and instead have to write a lambda: ex: x -> x.
  • P_HarderCollectors
    • I almost never use flatMap. I didn’t think to use it combined with splitAsStream to read words from a file. I do a lot of file processing though so this is definitely an idiom I need to remember! I shall type it in for each exercise in this lab that uses it (vs copy/paste) in order to ingrain it in my fingers! reader.lines().flatMap(SPLIT_PATTERN::splitAsStream) [edit: I think I’ve typed this enough times to remember it forever!]
    • The extra challenge to write a groupingBy using toMap. I knew I needed to use a merge function, but the types didn’t match my expectations. I learned that:
      • if you write a value instead of a lambda for the value function, you get a compiler error on the merge function (about the + operator being invalid ) and not the value function for not being a lambda
      • if you write a value function that returns an int instead of a long, you get a compiler error on the merge function and not the value function (because the merge function result is what gets set to the return value)
    • If you try to read from a reader that has already called reader.lines(), you don’t get any lines. Doh!
    • flatMap(String::chars) doesn’t work because chars() returns an IntStream. flatMapToInt(String::chars) does work
    • Entry has comparingByKey() and comparingByValue() methods
    • Remember to use groupingBy when aggregating and toMap when one to one
    • Need to call boxed() on an IntStream to be able to use partitioning by. An IntStream doesn’t have a collect method that takes a Collector as a parameter
    • Partitioning by can take a nested collector (ex: summing int)
  • L_HarderStreams
    • Convert type using mapToInt() before calling max
    • Forgot about Comparator.comparing() – I knew about this one earlier today! I think I’m getting tired :). I’m a morning person. Coding at 6:30pm is less than ideal for me.
    • Character.toString(Int) exists in Java 11. This means you can call mapToInt(String::chars).mapToObj(Character::toString)
    • The concept of using IntStream for an index and referring to a separate list (I would use a for loop for this since I need the index, but it’s a good tool to have in the toolbox)

My take

The lab is great. I like that it can be as easy or as hard as you want. I like they support multiple versions of Java and multiple IDEs. I completed the hard stream and collectors hard exercises

The room is terrible. It’s not really a room. It’s a grid of pipe/drape separated areas. I can hear every word in the room next to us. It got better. Once I got into the lab I was able to tune out the surroundings.

Also, my back hurts. Live blogging was fine. Different angle and I hardly look at the laptop. Coding for two hours (and I did take a break and stretch) was an awkward neck/back angle. How do people code on a laptop full time?

JavaOne – Streams in JDK 8

“Streams in Java 8 – The good, the bad & the ugly”

Speaker: Simon Ritter & Stuart Marks
[Simon has the Twtter handle @speakjava; very cool]

For more blog posts from JavaOne, see the table of contents


Need to think differently. We are used to imperative programming with loops and variables.

Dealing wih exceptions
ugly code – three lines of code and hard to tell what it does

Problems:

  • looks like Perl
  • returns null (vs Optional or empty string)
  • split is called twice so wasted work
  • skipped URLDecoder.decode() because didn’t want to deal with a checked exception – but lost functionality. Problem caused by a missing API in Java so have to use decode.

Better approach:

  • use a method with a try/catch block; call that method from the stream
  • use Map.entry to simulate a tuple
  • Use single char (vs regex) in split. If only pass one character to split, far faster
  • split() is overloaded to take a numeric limit to how many are returned

Imperative streams
inside the for each is a print, and if statement and a LongAdder variable (good for frequent writes and infrequent reads)

then refactored to use mapToInt, a println and an if statement and a local variable. more complicated and still not functional

then switched to peek and no variabe but still an if statement (well a ternary)

finally switched to use a filter and count instead of sum

still not 100% functional because println is a side effect. ok for debugging

[good showing evolution to get functional]

Problems

  • Easy to misuse forEach() because feels familiar. But easily leads to side effects
  • Imperative thinking “for each of these I want to..”
  • Pause to consider if should use for each

Mixing internal and external iteration
for loop running 12 times and then getting data for each month with filter checking Month.of(x) – doesn’t work because x isn’t effectively final

“solve” effectively final by setting to different interim variable

IntStream.range(0,12).forEach – uses internal iteration but forEach. Marginaly better as don’t need interim variable

Instead return a nested map of Month to Map with nested grouping by so only need one iteration – the data stream

Problems

  • Going through data.stream() 12 times
  • forEach cheat
  • array not right data structure; it’s really a map of month to value

Hands on lab question
reduce (“”, (a,b) -> a+b) – works but inefficient because String concatenation

reduce(a,b) -> sb.append(b) – fails because ignores the first letter.

next attempt uses an if statement in reduce

then tried a custom collector. works but more complicated than necessary
Collector.of(StringBuilder::new, StringBuilder::append, StringBuilder::append, StringBuilder::toString

or just use Collectors.joining()

Problems

  • If not using a parameter, it is probably wrong
  • Side effects
  • if stateent version not associative so would fail when run in parallel

Misc

  • can’t use same stream multiple times
  • method references are slightly more efficient than lambdas because lambda gets added into a method in bytecode. Saves a level of indirection by using method reference. But only slightly
  • Calling .sorted() multiple times vs chaining comparing.thenComparing – the later is better [also works because preserves sort :)]
  • parallel streams do more work. might or might not complete faster. uses fork-join pool. number of threads defaults to number of CPUs. In Java9, this is # CPUs for container. On Jaa 8, it was for physical machine
  • Nested parallel streams is bad idea because using same threads so performance is worse. Can create ForkJoinPool if must. Buyer beware; this is an implementation specific behavior and tied to the profile of the machine you write it for.

My take: Fun start to he morning. I like that they covered common things in an entertaining way and not common things. Something to learn for everyone!

refactoring legacy code for java 8

I asked a teammate at CodeRanch if I could use a method he wrote to show how it could be written more clearly/succinctly in Java 8. He said yes. This is the original method. It is 95 lines and has a cyclomatic complexity of 27.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
public static Map<Category, List<Forum>> getBetaTopForums(BestTopicsDAO bestTopicDAO, int userId, List<Topic> bestThisYear, List<Topic> secondaryBestList){
        Map<Integer, Integer> ratings = bestTopicDAO.selectForumRatingsByUserId(userId);
         
        List<Integer> forumIds = new ArrayList<>();
        List<Integer> ratedForumIds = CollectionCommon.getKeysSortedByValues(ratings, true);
        final int defaultUserId = SystemGlobals.getIntValue(ConfigKeys.BEST_TOPICS_DEFAULT_USER_ID);
         
        // Let's first add the forums that are rated above 5/10
        for(Integer forumId: ratedForumIds){
            Integer r = ratings.get(forumId);
            if(r != null && r > 5){
                forumIds.add(forumId);
            }
        }
         
        // Now let's add the forums found in best topics.
        if(bestThisYear != null){
            for(Topic topic: bestThisYear){
                int forumId = topic.getForumId();
                if(!forumIds.contains(forumId)){
                    forumIds.add(forumId);
                }
            }
        }
        if(secondaryBestList != null){
            for(Topic topic: secondaryBestList){
                int forumId = topic.getForumId();
                if(!forumIds.contains(forumId)){
                    forumIds.add(forumId);
                }
            }
        }
         
        // If this user is not the default-rating-user, let's add the top forums
        // rated by the default-rating-user.
        if(userId != defaultUserId) {
            Collection<Integer> fids = getTopRatedForumIds(bestTopicDAO);
            for(Integer fid: fids) {
                if(fid != null && !forumIds.contains(fid)) {
                    forumIds.add(fid);
                }
            }
        }
         
         
        // Let's also add some forums found in hot topics.
        List<Topic> tmpTopics = TopicRepository.getHottestTopics();
        for(Topic topic: tmpTopics){
            int forumId = topic.getForumId();
            if(!forumIds.contains(forumId)){
                forumIds.add(forumId);
            }
        }
         
        // Finally, let's add the forums that were rated below 6/10
        for(Integer forumId: ratedForumIds){
            Integer r = ratings.get(forumId);
            if(r != null && r < 6 && !forumIds.contains(forumId)){
                forumIds.add(forumId);
            }
        }
         
        Map<Category, List<Forum>> forums = new LinkedHashMap<>();
         
        PermissionControl pc = SecurityRepository.get(userId);
         
        final int maxTopForums = SystemGlobals.getIntValue(ConfigKeys.BETA_VIEW_MAX_TOP_FORUMS_IN_HOME);
        int forumsToShow = 0;
         
        // We need to remove the forums that are not visible, or rated 0/10
        for(Integer forumId: forumIds){
            if(pc.canAccess(SecurityConstants.PERM_FORUM, String.valueOf(forumId))){
                Forum f = ForumRepository.getForum(forumId);
                Category c = f != null ? ForumRepository.getCategory(f.getCategoryId()) : null;
                if(f != null && c != null && f.isShowInForumList()){
                     
                    if(++forumsToShow > maxTopForums){
                        break;
                    }
                     
                    List<Forum> fs;
                    if(forums.containsKey(c)){
                        fs = forums.get(c);
                    }else{
                        fs = new ArrayList<>();
                        forums.put(c, fs);
                    }
                    fs.add(f);
                }
            }
        }
        return forums;
         
    }

First step – write unit tests

First, I wrote unit tests for the existing code to ensure I didn’t break anything. While doing that, I did some minor refactorings. (I didn’t do extract method to make the method shorter since I knew I’d be updating that later). What I did change:

  1. Removed BestTopicsDAO as a parameter. It comes from a factory and we have a mock framework setup for that factory already. So there was no reason it had to be a parameter
  2. Renamed bestThisYear to primaryBestList. (It doesn’t always represent the best for the year)

I couldn’t get 100% test coverage because a few of the null checks were unreachable due to early logic (in the helper methods.) I left them in to see the effect they’d have on conversion.

Refactoring to Java 8 – Getting started

There’s a lot of for loops here. And I suspect they are very similar. Let’s start with the first one:

1
2
3
4
5
6
for(Integer forumId: ratedForumIds){
  Integer r = ratings.get(forumId);
  if(r != null && r > 5){
    forumIds.add(forumId);
  }
}

I converted this to a helper method and a stream:

1
2
3
4
forumIds.addAll(ratedForumIds.stream().filter(id -> {
  Integer r = ratings.get(id);
  return r != null && r > 5;
}).collect(Collectors.toList()));

I’ll grant you that this code isn’t shorter than the original code. It is more similar in form to the code I expect to write next though.  Also, it’s not easier to read given I didn’t extract the two line lambda. I’m going to do that after I get rid of the similar for loops.

The next refactoring

The next one is interesting. It’s just a for loop and if statement which is easy to rewrite.

1
2
3
4
5
6
for(Topic topic: primaryBestList){
  int forumId = topic.getForumId();
  if(!forumIds.contains(forumId)){
    forumIds.add(forumId);
  }
}

which is equivalent to

1
2
3
4
forumIds.addAll(primaryBestList.stream()
  .map(Topic::getForumId)
  .filter(f -> ! forumIds.contains(f))
  .collect(Collectors.toList()));

However, it is interesting because it check if the forum is already in forumIds. This shouldn’t be necessary as it is something a data structure could do. We need a data structure that is both a list (preserves order) and a set (checks for uniqueness.) Enter LinkedHashSet. It’s a set and preserves the insertion order. The JavaDoc even specifies that if you add the same element again, the order doesn’t change. Perfect. Switching forumIds to a LinkedHashSet allows for getting rid of the filter intermediate operation. And the unit tests still work.  I used the same techniques for the next five for loops.

Time to use methods

I converted two methods that had to do with ratings and three that were straight conversions.  I could have one common method they all use with identities for the filter/map that don’t apply. Two methods would be clearer. Giving me:

1
2
3
4
5
6
7
8
9
10
11
12
13
private static void addForumIds(Collection<Integer> forumIds, List<Topic> candidates) {
  if (candidates != null) {
    List<Integer> toAdd = candidates.stream().map(Topic::getForumId).collect(Collectors.toList());
    forumIds.addAll(toAdd);
  }
}
 
private static void addForumIds(Collection<Integer> forumIds, Collection<Integer> candidates, Predicate<Integer> predicate) {
  if (candidates != null) {
    List<Integer> toAdd = candidates.stream().filter(predicate).collect(Collectors.toList());
    forumIds.addAll(toAdd);
  }
}

The last one

The last chunk of code needed more refactoring to take care of all the forum filtering logic first. It also needed a mutable object for the counter rather than a primitive so it could be updated inside a lambda.

Conclusion

I could refactor this more. There’s duplication in the two lambdas. And I could extract more code into separate methods. Interestingly, the total code base is about the same as before. (It’s actually three lines longer, but I have more than three lines of comments.) But the method with the main logic is shorter and far less complicated:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
public static Map<Category, List<Forum>> getBetaTopForums(int userId, List<Topic> primaryBestList, List<Topic> secondaryBestList){
        BestTopicsDAO bestTopicDAO = DataAccessDriver.newBestTopicDAO();
        Map<Integer, Integer> ratings = bestTopicDAO.selectForumRatingsByUserId(userId);
         
        Collection<Integer> forumIds = new LinkedHashSet<>();
        List<Integer> ratedForumIds = CollectionCommon.getKeysSortedByValues(ratings, true);
        final int defaultUserId = SystemGlobals.getIntValue(ConfigKeys.BEST_TOPICS_DEFAULT_USER_ID);
         
        // Let's first add the forums that are rated above 5/10
         addForumIds(forumIds, ratedForumIds, id -> {
            Integer r = ratings.get(id);
            return r != null && r > 5;
        });
         
        // Now let's add the forums found in best topics.
        addForumIds(forumIds, primaryBestList);
        addForumIds(forumIds, secondaryBestList);
           
        // If this user is not the default-rating-user, let's add the top forums
        // rated by the default-rating-user.
        if(userId != defaultUserId) {
            Collection<Integer> fids = getTopRatedForumIds(bestTopicDAO);
            addForumIds(forumIds, fids, id -> id != null);
        }
         
        // Let's also add some forums found in hot topics.
        List<Topic> tmpTopics = TopicRepository.getHottestTopics();
        addForumIds(forumIds, tmpTopics);
 
        // Finally, let's add the forums that were rated below 6/10
       addForumIds(forumIds, ratedForumIds, id -> {
            Integer r = ratings.get(id);
            return r != null && r < 6;
        });
         
        Map<Category, List<Forum>> forums = new LinkedHashMap<>();
         
        PermissionControl pc = SecurityRepository.get(userId);
         
        final int maxTopForums = SystemGlobals.getIntValue(ConfigKeys.BETA_VIEW_MAX_TOP_FORUMS_IN_HOME);
        AtomicInteger countForumsToShow = new AtomicInteger(0);
         
        // We need to remove the forums that are not visible, or rated 0/10
        forumIds.stream()
             // skip if can't access forum
            .filter(id -> pc.canAccess(SecurityConstants.PERM_FORUM, String.valueOf(id)))
            // get forum
            .map(ForumRepository::getForum)
            // skip if forum not found or can't access
            .filter(f -> f != null && f.isShowInForumList())
            // get category
            .forEach(f -> addForumToCategory(forums, f, maxTopForums, countForumsToShow));
         
        return forums;
         
    }
     
    private static void addForumToCategory(Map<Category, List<Forum>> forums, Forum f, int maxTopForums,
            AtomicInteger countForumsToShow) {
        Category c = ForumRepository.getCategory(f.getCategoryId());
        if (c != null && countForumsToShow.incrementAndGet() <= maxTopForums) {
            List<Forum> fs;
            if (forums.containsKey(c)) {
                fs = forums.get(c);
            } else {
                fs = new ArrayList<>();
                forums.put(c, fs);
            }
            fs.add(f);
        }
    }
 
    private static void addForumIds(Collection<Integer> forumIds, List<Topic> candidates) {
        if (candidates != null) {
            List<Integer> toAdd = candidates.stream().map(Topic::getForumId).collect(Collectors.toList());
            forumIds.addAll(toAdd);
        }
    }
     
    private static void addForumIds(Collection<Integer> forumIds, Collection<Integer> candidates, Predicate<Integer> predicate) {
        if (candidates != null) {
            List<Integer> toAdd = candidates.stream().filter(predicate).collect(Collectors.toList());
            forumIds.addAll(toAdd);
        }   
    }