[2019 oracle code one] Machine Learning

Machine Learning for Java Developers in 45 Minutes

Speakers: Zoran Sevarac & Frank Greco – @zsevarac & @frankgreco

For more blog posts, see The Oracle Code One table of contents


General

  • “AI is the new electricity” – Andrew Ng (societies with AI were above those without
  • For many tasks, algorithms are well known
  • Other algorithms harder – image recognition. Rule based. Constantly add rules. Large number of rules. Complex.
  • When complexity goes up, bells should go off. Avoid complexity.
  • When complexity index is too big, it isn’t scalable. Breading ground for bugs.
  • Not all use cases are not good for ML
  • Core of ML – recognizing patterns in data and making predictions against the data
  • Learn language by understanding all the rules (algorithm) or observing patterns (ML)

Terms

  • AI – type of algorithm where machine emulates aspects of human behavior
  • ML – subset of AI. Allows machine to learn from experience/data
  • Deep learning. Subset of ML. Uses powerful computing and advanced nueral networks

Deep learning

  • Accuracy grows with more data.
  • Older learning algorithms get outperformed after a certain amount of data.
  • Think of deep learning as a graph. Each node performs computation. Computation can be reconfigured by tweaking coefficients on edges
  • Layer – groups of nodes

Examples

  • Image recognition
  • Spam classification
  • Data classification
  • Identifying handwritten characters/image transformation

Data

  • Training data
  • Try to minimize differences as go thru
  • Once goes below a certain threshold, training stops
  • Determine whether false positives or false negatives are worse for your use case

JSR381 – Visual Recognition API

  • Standard API for computer vision tasks using machine learning
  • Provides generic ML API design to support other libraries
  • Next phase is to figure out who/what get wider support/adoption
  • Brings ML closer to general Java dev audience
  • App programmers need to know this. Don’t need to become a data scientist to use.

Why matters

  • Patterns
  • Can change data structures
  • The case for Learned Index Structures – https://arxiv.org/abs/1712.01208
  • New hardware for API
  • What happens to countries that host call centers and their economy?

Issues

  • Need clean data
  • Privacy and ethics
  • Correlation vs causality
  • Data hacking/poisoning
  • DeepFakes – can create people that don’t exist
  • Interpretability
  • AI/ML talent is scarce

My take

This was a great way to get started. There were a bunch of code samples as well using Java APIs.

[2019 oracle code one] JPMS Modules with Gradle

Building JPMS Modules with Gradle

Speakers: Paul Bakker @pbakker

For more blog posts, see The Oracle Code One table of contents


General

  • Nobody in room yet using module system
  • Often takes several years after a feature release to be adopted.
  • Took 2-3 years after lambda use was common
  • Most open source provides a real module or at least a module name in the meta-inf

Plugin

  • Maven supported module system early on and still does.
  • Gradle did not so developed a plugin
  • https://github.com/java9-modularity/gradle-modules-plugin

IntelliJ

  • Shows modules as folders

Gradle

  • Each module has own build.gradle
  • Root build.gradle file is where the modular plugin is applied
  • external plugin so set classpath and apply to all java projects
  • Dependencies are defined in both build.gradle and module-info.java. Have to do this in Maven too.
  • Plugin adds gradle configuration when compile/test.

Flags

  • Showed exports – plugin has addExports option when compile
  • Showed opens – plugin has addOpens option when compile
  • Also supports AddModules and addReads like the Java options

Application plugin

  • Creates tar/zip with lib and bin folder
  • Supports module path

Packages

  • Split packages
  • “The classpath always works – as long as you don’t have expectations”
  • Always a problem; now in your face
  • patch-module as workaround. Puts library in patchlibs instead of being on module path

Multiple start scripts

  • Out of box, can create one zip file/one starting point
  • Module plugin allows creating custom startup scripts

Testing

  • For whitebox testing, ok to break encapsulation.
    • Classpath is ok
    • Plugin provides moduleOptions on test to choose
  • Black box testing – want module path to test how module works from outside module boundaries.
    • Use module path
    • Need additional module for this approach for tests

Java versions

  • plugin has modularity.mixedJavaReleases
  • compiler runs in two steps
  • compiles everything module-info.java using recent java version (presumably Java 9) and pulls out module-info.java
  • compiles other sources using java 8
  • the module-info.java is ignored by Java 8

My take

I found the gradle files hard/impossible to read (red/purple/green) on dark background. That made it hard for me to follow the parts I didn’t already know. I did learn some though so it was worth going. And it’s great that gradle has a JPMS plugin! I like how he mapped the command line to Gradle code.

[2019 oracle code one] exploring collectors

Exploring Collectors

Speakers: Venkat Subramaniam

For more blog posts, see The Oracle Code One table of contents


General

  • Common operations; filter, map, reduce
  • Filter – like a coin sorter. Will let some values through and discard others
  • Reduce – go from a stream to a non stream
  • Collect is a reduce operation
  • Functions should be pure – doesn’t change anything *and* doesn’t depend on anything that could change

Collectors – concepts

  • Don’t call add() in a forEach. Should be using a Collector
  • Can’t parallelize code when have shared mutability (ex: add() in forEach)
  • Can’t say “the code worked”. Can say “the code behaved”
  • If use ConcurrentList, it’s a ticking time bomb for when someone changes the list type.
  • Should write collect(Collectors.toList()) or collect(toList()). Already handles concurrency (when running with a parallel stream)
  • Venkat prefers using Collectors as a static import and just calling toList()
  • Collectors are recursive data structures. The second parameter is another Collector
  • Often need to chain collectors to do what want.
  • Ok to write code “the long way” and then refactor once have passing tests

Collectors – Code

  • Java 10+: toUnmodifiableList() – immutable list
  • partitioningBy – when need both the matching and non matching results. Avoids needing two passes of data to get result.
  • joining(“, “) – comma separated
  • groupingBy(Person::getName) – create map with key as name and value as list of Person objects. Conceptualize as buckets. Put items in bucket by key
  • groupingBy(Person::getName, mapping(Person::getAge, toList())) – map after group. Perform mapping right before throw data into bucket.
  • groupingBy(Person::getName, counting()) – value is # matching values
  • groupingBy(Person::getName, collectingAndThen(counting(), Long::intValue)) – transform the result of a collector to a different type

My take

I like that Venkat talked about how to write code “the long way” to explain the power of collectors. This was a good review. And good motivation as we update our OCP book (I have the streams chapter). I like the bucket analogy for groupingBy(). I didn’t know about collectingAndThen() The 45 minutes flew!