[2024 dev2next] kafka

Speaker: Daniel Hinojosa (mastodon.social/@dhinojosa)

For more see the table of contents


Related tech

  • Pinot – OLAP
  • Input to Kafka – Kafka Streams, Akka Streams, Flink, Spark Streaming
  • Connectors – ex: to database

Kafka

  • publish/subscribe queue
  • producer can also be a consumer

How it looks inside

  • messages sharded
  • immutable data store
  • message gets an offset number in the partition.
  • data is temporary – specify retention size or time
  • offset numbers aren’t reused, even after messages are deleted
  • can only read/write to leader, not the replicas/followers
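The bullets above can be sketched as a toy partition (a hypothetical, much-simplified model, not the real broker implementation): an append-only log where every message gets a monotonically increasing offset, retention deletes old messages, and deleted offsets are never reused.

```python
# Toy model of a single Kafka partition: an append-only log where offsets
# are never reused, even after old messages are deleted by retention.
class Partition:
    def __init__(self, retention_size=3):
        self.retention_size = retention_size
        self.log = {}          # offset -> message bytes
        self.next_offset = 0   # monotonically increasing, never rewinds

    def append(self, message: bytes) -> int:
        offset = self.next_offset
        self.log[offset] = message
        self.next_offset += 1
        # Retention: drop the oldest messages, but never reuse their offsets.
        while len(self.log) > self.retention_size:
            del self.log[min(self.log)]
        return offset

p = Partition(retention_size=2)
offsets = [p.append(b"m%d" % i) for i in range(4)]
# offsets 0 and 1 were deleted by retention, yet new appends continue from 4
```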

Message

  • Similar to a row/record
  • Just an array of bytes; format doesn’t matter
  • Message key is also an array of bytes. Same-key messages are the only thing guaranteed to be in order: the partitioner hashes the key and maps it to a partition, so the same key always lands in the same partition.
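The same-key ordering guarantee comes from the key-to-partition mapping. A simplified sketch (Kafka's default partitioner murmur2-hashes the key bytes; md5 here is just a deterministic stand-in):

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's default partitioner: hash the key bytes and map
    # the hash to a partition. (Kafka uses murmur2; md5 is used here only
    # because it is deterministic and in the standard library.)
    h = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return h % num_partitions

# Messages with the same key always map to the same partition,
# which is what gives per-key ordering.
same = partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
```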

Batch

  • Group of messages
  • Every batch knows which partition it is going to
  • Uses murmur2 for hashing
  • Can set batch size
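The batching idea above can be sketched as grouping messages by target partition and flushing when a batch fills (hypothetical; real producers hash keys with murmur2 and also flush on time limits — the trivial sum-of-bytes hash below is only for determinism):

```python
from collections import defaultdict

def build_batches(messages, num_partitions, batch_size=2):
    """Group (key, value) messages into per-partition batches.

    Simplified sketch of producer batching: each batch is bound to one
    partition, and a batch is flushed once it reaches batch_size.
    """
    def part(key: bytes) -> int:
        return sum(key) % num_partitions  # toy hash, not murmur2

    batches = defaultdict(list)  # partition -> batch being filled
    ready = []                   # flushed batches, tagged with partition
    for key, value in messages:
        p = part(key)
        batches[p].append((key, value))
        if len(batches[p]) >= batch_size:
            ready.append((p, batches.pop(p)))
    return ready, dict(batches)

ready, pending = build_batches([(b"a", 1), (b"a", 2), (b"b", 3)], 3)
```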

Acknowledgements (Ack)

  • 0 – no ack; assume all is well; lowest latency
  • 1 – only the leader must ack
  • all – all replicas must ack. Higher latency; safest. Ex: bank transactions
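The three ack levels map to one producer setting. A sketch using the Java-client property name `acks` (values written as strings, per that client's convention):

```python
# Producer reliability settings, using the Java-client property name "acks".
# Trade-off: lower acks = lower latency, higher acks = stronger durability.
fire_and_forget = {"acks": "0"}    # no ack; assume success; lowest latency
leader_only     = {"acks": "1"}    # leader has written it; replicas may lag
safest          = {"acks": "all"}  # all in-sync replicas must ack; e.g. bank transactions
```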

Consumer

  • goal: scale to a large number of different consumers without affecting performance
  • Consumers are not thread-safe
  • Consumer rebalance – mitigate when consumers go down
  • Settings: Isolation level
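A sketch of the consumer settings mentioned above, using Java-client property names; the group id is a hypothetical example:

```python
# Consumer settings sketch (Java-client property names).
# isolation.level controls whether records from uncommitted transactions
# are visible to this consumer.
consumer_config = {
    "group.id": "my-group",               # hypothetical; consumers in a group share partitions
    "isolation.level": "read_committed",  # skip records from aborted/in-flight transactions
    "enable.auto.commit": "false",        # commit offsets manually after processing
}
```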

Producer

  • Settings: idempotent, transactions
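Likewise for the producer settings; property names follow the Java client, and the transactional id is a hypothetical example:

```python
# Producer settings sketch (Java-client property names).
producer_config = {
    "enable.idempotence": "true",            # broker de-duplicates retried sends per partition
    "transactional.id": "orders-producer-1", # hypothetical; a stable id enables transactions
    "acks": "all",                           # required when idempotence is enabled
}
```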

Compaction

  • Compaction retains only the latest message for each key.
  • Cleaner thread does compaction
  • Can treat as events or tables
  • Tables treat Kafka as key/value database
  • Likely don’t care about the past with respect to a table – you care about the current/end state, not everything that happened along the way.
  • Dirty – extra records that compaction hasn’t cleaned yet
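Compaction can be sketched as "keep the newest record per key" (a toy model of what the cleaner thread does; records superseded by a later one are the "dirty" ones):

```python
def compact(log):
    """Log-compaction sketch: for each key, keep only the latest record.

    `log` is a list of (offset, key, value); returns the compacted list in
    offset order. Records superseded by a later record with the same key
    are the "dirty" ones the cleaner thread removes.
    """
    latest = {}  # key -> offset of the newest record for that key
    for offset, key, value in log:
        latest[key] = offset
    return [(o, k, v) for o, k, v in log if latest[k] == o]

log = [(0, "k1", "a"), (1, "k2", "b"), (2, "k1", "c")]
# k1's old value at offset 0 was dirty; only the latest per key survives
```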

Stream processing

  • Everything is a consumer/producer; everything else is just a higher-level abstraction
  • Stream groups
  • Java-style stream methods – peek, forEach, groupByKey
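Kafka Streams is Java, but the peek/forEach/groupByKey style can be mimicked in a few lines (a hypothetical sketch, not the Streams API):

```python
from collections import defaultdict

def group_by_key_count(records, peek=None):
    """Mimics a Kafka-Streams-style pipeline: peek() -> groupByKey() -> count().

    `records` is an iterable of (key, value); `peek` is an optional
    side-effect callback, like Streams' peek().
    """
    counts = defaultdict(int)
    for key, value in records:
        if peek:
            peek(key, value)  # observe the record without modifying the stream
        counts[key] += 1      # groupByKey().count()
    return dict(counts)

clicks = [("alice", "p1"), ("bob", "p2"), ("alice", "p3")]
```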

My take

I used to know some of this but had forgotten it, so it was an excellent review. And the new stuff was good too! I wish the screen had a different resolution rather than relying completely on zooming in. That would have let us see some things while the demo was running: the live code changes and the web page (which weren’t magnified). The extended demo was great though! The gitpod “sales pitch” was a nice side effect.

[2024 dev2next] Shaping your hiring process

Full title: Shape Your Hiring Process to Attract & Engage Better Tech Talent

Speaker: Erica Woods

For more see the table of contents


Notes

  • Motivation for talk: 8 candidates declined in a row

Top 10 reasons managers lose good candidates

  • Lack of flexibility/WFH
  • Lengthy process
  • Rigorous process/bad experience/over interviewed
  • Poor engagement
  • Not “sold” (poor marketing)
  • Inaccurate/incomplete job details
  • Unrealistic requirements list
  • “Too many cooks” issue
  • Not enough money
  • Counter offers/better offers

Think about

  • How many steps?
  • How long between?
  • How long total? – ideally 2-3 weeks, or 1 week for a great candidate
  • Where can we lessen?
  • Personal mentality?
  • What else can we do with candidates we like?

Other problems

  • Rescheduling interviews so many times
  • Communicate roadmap if more appealing
  • Titles that don’t have meaning – “I don’t work there; I don’t know what that means” – ex “Solution Engineer 3”. Add “we are looking for someone to act as <common role name>”
  • 456 different BA titles across clients

Position attractors – 10P model. Include some in job description. Can divide and conquer so different interviewers cover different ones. Also good checklist if you are interviewing.

  • Purpose
  • Project
  • Problems – ex: what do in next 6 months
  • Priorities
  • Place – ex: location vs remote and place within organization, size of team
  • People
  • Perks
  • Pay
  • Path/Potential
  • Pain Points

80/20 rule

  • Identify candidates with 80% of the requirements
  • Make rest nice to have
  • Identify growth opportunities
  • Communicate training opportunities

Rapport

  • Chit chat
  • 92-93% of interviewees are nervous
  • 2-minute rule to start out
  • Other specifics about yourself
  • On video, eye contact (with camera) and show hands
  • Commonality via resume/LinkedIn
  • Understand motivators and align responsibilities and tech stacks
  • Help visualize working there
  • Physical or virtual background as icebreaker
  • Bring up stuff from LinkedIn – shared connections, info

Candidate scorecard

  • Points given for various skills. Also text areas
  • Ask the team what is important for all positions with respect to soft skills
  • Differentiate between culture fit vs role/tech skills
  • Share with hiring partners so you get better candidates over time
  • In Germany, can’t keep the data unless candidates consent. Not all candidates will give it.

AI

  • Market research (skills, salaries, insights)
  • Job descriptions – ask to make more attractive for role
  • Candidate vetting/skills identification
  • Interview questions
  • Communications (offer or rejection letters)

Other notes

  • Review process annually
  • Remember your own interview experience and what you care about as a candidate. Would you apply for this job?
  • Copilot can help clean up job descriptions; it uses good language
  • All Apex recruiters are skillset focused.
  • Requirement “what I need; list of skills” vs opportunity “here’s how this will be a fulfilling career move”
  • Candidates now taught to ask AI to learn about person
  • If intimidating because of social media, have recruiter humanize you by telling a story
  • Candidates often felt they bombed even when they did fine. Managers ask an escalating line of questioning; you aren’t going to know everything

My take

I was a little tired. A few sentences in, this little boy walked in. Erica invited him to sit down. He did, and then said his mom might worry where he was because he had been in the bathroom. Then he left. So cute, and it woke me right up.

Erica warned us several times up front that it is a marketing-heavy presentation. That was fine; different perspectives are nice. Overall good, though I do wish it had been targeted more to the audience. Some of the stuff is things developers/architects/teams can’t just change. Similarly, a Project Manager position as the job description example isn’t the best fit for the attendees of this conference. But there were also lots of things that the actual interviewer can do. Audience interactivity was great.

[2024 dev2next] improving llm results with rag

Speaker: Brian Sletten (bsletten@mastodon.social)

For more see the table of contents

PDF of deck on dropbox


Notes

  • Problem: info in our language but models insufficient to extract it
  • Important to capture sequences – ex: context window
  • Problems with word2vec and other embedding approaches: sequences lost impact if they got too long. “New York” and “Soviet Union” are useful pairs since the words are near each other; words farther apart are harder to predict
  • Next, the transformer architecture used levels of “attention” to get more detailed views between/across sentences
  • Encode in a lower dimensional space and decode into higher dimensional space
  • Positional encoding of words in sentences picks up some nuance – some of it involves quadratic calculations, but it can be parallelized, so it’s fast
  • Expensive to create a model. Still expensive but less so to tune it
  • Types of RAG: Naive, Advanced, Modular
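The positional-encoding bullet above can be illustrated with the sinusoidal scheme from the original transformer paper (a sketch of the formula, not any particular library's implementation):

```python
import math

def positional_encoding(position: int, dim: int) -> list:
    """Sinusoidal positional encoding from the original transformer paper:
    even dimensions use sin, odd use cos, with wavelengths forming a
    geometric progression. Each position gets a unique, comparable code,
    and all positions can be computed independently (hence in parallel)."""
    pe = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Position 0 encodes as alternating sin(0)=0 and cos(0)=1.
pe0 = positional_encoding(0, 4)
```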

Emergent behavior

  • Not magic/sentience
  • Avoids need to have to retrain all the time
  • Use linguistic skills, not knowledge skills
  • Chain of thought prompting

Causes of hallucinations

  • No logic engine/no way to evaluate correctness
  • Language engine with a stochastic element to avoid memorizing and encourage novelty

Example

  • Showed how it can access the web.
  • Allows you to summarize current news stories
  • Note: can include the output format in the prompt: as JSON, as CSV, in bulleted list format, etc.
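Including the output format in the prompt is just string construction; a hypothetical sketch:

```python
def build_prompt(question: str, output_format: str = "JSON") -> str:
    """Append an output-format instruction to a prompt, per the note above.
    The exact wording is a hypothetical example, not a required syntax."""
    return f"{question}\n\nRespond as {output_format}."

prompt = build_prompt("Summarize today's top news stories.", "a bulleted list")
```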

Options

  • Basic model
  • Fine tuning – setting parameters
  • Prompt engineering

RAG

  • Allows getting a lot of custom data
  • Work with vector databases

Searching

  • Find a portion of the data, then do a k-d tree and nearest-neighbor search
  • Inverted index
  • Hierarchical Navigable Small Worlds (HNSW) – start in high dimensional space then detailed search
  • Like express to local train in a city
  • Can find docs that mention a keyword and then use those docs to answer questions
  • Want to minimize long contexts because they cost lots of tokens
  • Chunking makes docs smaller so you pay less for search – llama provides an API to chunk
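Chunking plus nearest-neighbor search can be sketched in a few lines (fixed-size chunking with overlap, and brute-force cosine similarity; HNSW and k-d trees exist precisely to avoid this brute-force scan — function names here are hypothetical):

```python
import math

def chunk(text: str, size: int = 100, overlap: int = 20) -> list:
    """Fixed-size chunking with overlap, a common baseline; chunking
    libraries offer richer strategies (sentence-aware, token-aware)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query_vec, chunk_vecs):
    """Brute-force nearest neighbor over chunk embeddings. HNSW starts
    coarse and refines (like an express train switching to local stops)
    instead of scanning every vector like this does."""
    return max(range(len(chunk_vecs)), key=lambda i: cosine(query_vec, chunk_vecs[i]))

text = "".join(str(i % 10) for i in range(250))
chunks = chunk(text)
```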

Limitations of Naive RAG Models

  • Issues with precision and recall: misaligned chunks, irrelevant or missing chunks
  • Can still hallucinate if not backed by the used chunks
  • Still have toxicity and bias problems

Chaining

  • Initial response
  • Constitutional principle – showed how to add ethics/legality, and it rewrites
  • Constitutional principle – added a rewrite for a 7th grader, and it rewrites
  • That gives final response
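The chaining steps above can be sketched as repeated rewrites, one per constitutional principle (the prompt wording and the stub model are hypothetical):

```python
def chain(model, question, principles):
    """Constitutional-AI-style chaining sketch: get an initial response,
    then ask the model to rewrite it once per principle. `model` is any
    callable prompt -> text."""
    response = model(question)
    for principle in principles:
        response = model(f"Rewrite this to satisfy: {principle}\n\n{response}")
    return response

# Stub model for demonstration only: tags each rewrite so the chain is visible.
def stub_model(prompt):
    return prompt.splitlines()[-1] + " [rewritten]" if "Rewrite" in prompt else "initial answer"

final = chain(stub_model, "Explain X",
              ["be legal and ethical", "readable by a 7th grader"])
```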

Security

  • Easy to poison data
  • Need data cleansing but cleverer
  • http://berryvilleiml.com – machine learning security

Reference: https://www.louisbouchard.ai/top-rag-techniques/

My take

I learned a bunch and the examples were fun. Good to see the code and output. My brain filled up during the session; I needed to switch to an easier talk for the final 5:30 session, as I didn’t have enough focus left. Cool how the answer to security was a different deck!