Machine Learning for Java Developers in 45 Minutes
Speakers: Zoran Sevarac & Frank Greco – @zsevarac & @frankgreco
For more blog posts, see The Oracle Code One table of contents
General
- “AI is the new electricity” – Andrew Ng (societies with AI were above those without)
- For many tasks, algorithms are well known
- Other problems are harder, such as image recognition. A rule-based approach means constantly adding rules, ending up with a large, complex rule set.
- When complexity goes up, bells should go off. Avoid complexity.
- When complexity index is too big, it isn’t scalable. Breeding ground for bugs.
- Not all use cases are good for ML
- Core of ML – recognizing patterns in data and making predictions against the data
- Learn language by understanding all the rules (algorithm) or observing patterns (ML)
Terms
- AI – type of algorithm where machine emulates aspects of human behavior
- ML – subset of AI. Allows machine to learn from experience/data
- Deep learning – subset of ML. Uses powerful computing and advanced neural networks
Deep learning
- Accuracy grows with more data.
- Older learning algorithms get outperformed after a certain amount of data.
- Think of deep learning as a graph. Each node performs computation. Computation can be reconfigured by tweaking coefficients on edges
- Layer – groups of nodes
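The graph description above can be sketched in a few lines of plain Java. This is only an illustration, not code from the talk: the class name `TinyNetwork`, the sigmoid activation, and all the weight values are assumptions made up for the example. Each node sums its weighted inputs, and changing the numbers in the weight arrays is the "tweaking coefficients on edges" part.

```java
import java.util.Arrays;

// Hypothetical illustration: a two-layer network as nested arrays.
class TinyNetwork {

    // One layer is a group of nodes. weights[j][i] is the coefficient on the
    // edge from input i to node j; biases[j] is an offset for node j.
    static double[] layer(double[] inputs, double[][] weights, double[] biases) {
        double[] outputs = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = biases[j];
            for (int i = 0; i < inputs.length; i++) {
                sum += weights[j][i] * inputs[i];
            }
            // Each node performs a computation: weighted sum, then activation.
            outputs[j] = sigmoid(sum);
        }
        return outputs;
    }

    // Sigmoid squashes any value into the range (0, 1). Assumed activation.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double[] input = {1.0, 0.0};

        // Made-up coefficients; training would adjust these values.
        double[][] hiddenWeights = {{0.5, -0.5}, {-1.0, 1.0}};
        double[] hiddenBiases = {0.0, 0.0};
        double[][] outputWeights = {{1.0, 1.0}};
        double[] outputBiases = {0.0};

        double[] hidden = layer(input, hiddenWeights, hiddenBiases);
        double[] output = layer(hidden, outputWeights, outputBiases);
        System.out.println(Arrays.toString(output));
    }
}
```

The structure (which nodes connect to which) stays fixed; only the numbers in the arrays change when the network learns.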
Examples
- Image recognition
- Spam classification
- Data classification
- Identifying handwritten characters/image transformation
Data
- Training data
- Try to minimize the difference between predicted and expected output as training goes through the data
- Once the error goes below a certain threshold, training stops
- Determine whether false positives or false negatives are worse for your use case
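A minimal sketch of the "minimize the difference, stop below a threshold" idea from the training bullets above. It assumes a very simple model (a straight line fitted with gradient descent on mean squared error) rather than a neural network; the class name `ThresholdTraining`, the data, the learning rate, and the threshold are all made up for the example.

```java
// Hypothetical illustration: training stops once the error is small enough.
class ThresholdTraining {

    // Fits y ≈ w * x + b with gradient descent. Returns {w, b}.
    static double[] train(double[] xs, double[] ys, double learningRate,
                          double errorThreshold, int maxEpochs) {
        double w = 0.0;
        double b = 0.0;
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            double gradW = 0.0;
            double gradB = 0.0;
            double squaredError = 0.0;
            for (int i = 0; i < xs.length; i++) {
                // Difference between predicted and expected output.
                double diff = (w * xs[i] + b) - ys[i];
                squaredError += diff * diff;
                gradW += 2 * diff * xs[i];
                gradB += 2 * diff;
            }
            double meanSquaredError = squaredError / xs.length;
            if (meanSquaredError < errorThreshold) {
                break; // error is below the threshold, so training stops
            }
            // Nudge the coefficients in the direction that reduces the error.
            w -= learningRate * gradW / xs.length;
            b -= learningRate * gradB / xs.length;
        }
        return new double[] {w, b};
    }

    public static void main(String[] args) {
        // Made-up training data generated from y = 2x + 1.
        double[] xs = {0, 1, 2, 3};
        double[] ys = {1, 3, 5, 7};
        double[] model = train(xs, ys, 0.05, 1e-6, 10_000);
        System.out.println("w = " + model[0] + ", b = " + model[1]);
    }
}
```

The `maxEpochs` cap is a safety net for the case where the error never drops below the threshold.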
JSR381 – Visual Recognition API
- Standard API for computer vision tasks using machine learning
- Provides generic ML API design to support other libraries
- Next phase is to figure out who/what can get it wider support/adoption
- Brings ML closer to general Java dev audience
- App programmers need to know this. Don’t need to become a data scientist to use.
Why it matters
- Patterns
- Can change data structures
- The case for Learned Index Structures – https://arxiv.org/abs/1712.01208
- New hardware for AI
- What happens to countries that host call centers and their economy?
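To make the "can change data structures" bullet more concrete, here is a rough sketch of the idea behind the learned index paper linked above: a model predicts where a key sits in a sorted array, and a short search around the prediction finds the exact slot. This is a heavy simplification under stated assumptions (a single least-squares line instead of the paper's staged models), and the class name `LearnedIndexSketch` and the sample keys are invented for the example.

```java
import java.util.Arrays;

// Hypothetical illustration: a one-model "learned index" over sorted keys.
class LearnedIndexSketch {
    private final long[] keys;      // must be sorted ascending
    private final double slope;
    private final double intercept;
    private final int maxError;     // worst prediction miss seen on the keys

    LearnedIndexSketch(long[] sortedKeys) {
        keys = sortedKeys;
        int n = keys.length;

        // Least-squares fit of array position against key value.
        double meanKey = 0;
        double meanPos = 0;
        for (int i = 0; i < n; i++) {
            meanKey += keys[i];
            meanPos += i;
        }
        meanKey /= n;
        meanPos /= n;
        double cov = 0;
        double var = 0;
        for (int i = 0; i < n; i++) {
            double dk = keys[i] - meanKey;
            cov += dk * (i - meanPos);
            var += dk * dk;
        }
        slope = var == 0 ? 0 : cov / var;
        intercept = meanPos - slope * meanKey;

        // Record how far off the model can be, so lookups know where to search.
        int worst = 0;
        for (int i = 0; i < n; i++) {
            worst = Math.max(worst, Math.abs(predict(keys[i]) - i));
        }
        maxError = worst;
    }

    // Predicted position for a key, clamped to the valid index range.
    private int predict(long key) {
        long guess = Math.round(slope * key + intercept);
        return (int) Math.max(0, Math.min(keys.length - 1, guess));
    }

    // Returns the position of the key, or -1 if it is not present.
    int indexOf(long key) {
        int guess = predict(key);
        int from = Math.max(0, guess - maxError);
        int to = Math.min(keys.length, guess + maxError + 1);
        int pos = Arrays.binarySearch(keys, from, to, key);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        LearnedIndexSketch index =
                new LearnedIndexSketch(new long[] {10, 20, 30, 45, 70, 100});
        System.out.println(index.indexOf(45)); // 3
        System.out.println(index.indexOf(50)); // -1
    }
}
```

The closer the keys follow the fitted line, the smaller `maxError` becomes and the less searching each lookup needs.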
Issues
- Need clean data
- Privacy and ethics
- Correlation vs causality
- Data hacking/poisoning
- DeepFakes – can create people that don’t exist
- Interpretability
- AI/ML talent is scarce
My take
This was a great way to get started. There were a bunch of code samples as well using Java APIs.