[javaone 2026] Look Inside a Large Language Model to Become a Better Java Developer

Speaker: Barry Burd

See the table of contents


Opener

  • Imagine you have to find the lowest point on a line but can only see a few steps ahead of or behind you. That is a one-dimensional problem.
  • On a mountain, you are finding the lowest point in two dimensions.
  • LLM does that but with many more dimensions – ex: billions
  • Uses a lot of tricks, not just extending the one dimensional problem

Problem

  • Training GPT-3 required 10K NVIDIA GPUs
  • PyTorch is highly optimized – built-in libraries, deep integration with GPU hardware (NVIDIA CUDA)
  • Apple has GPU stack
  • Want to do with Java

Solution

  • HAT (Heterogeneous Acceleration Toolkit)
  • Work in progress
  • Part of project Babylon
  • Code models/reflection
  • Barry’s goal: algorithms to run on these

Deeplearning4j (ND4j)

  • CUDA support
  • No MPS (Apple Metal Performance Shaders) support
  • Arrays stored off-heap (outside JVM)
  • Several arrays can point to several subarrays of same data.

What LLM does

  • After analyzing a possibly incomplete string, the LLM decides which token to add next.
  • Predicting whole words doesn't work well because there are too many words.
  • Characters are too granular because they carry no meaning.

Tokens

  • “I’ve grokked Heinlein’s works” as tokens:
    • I / 've / gro / k / ked / Hein / lein / 's / work / s (“gro” and “Hein” start with a leading space)
  • Token id is a number that goes with the tokens
  • A token is a sequence of characters that occur together frequently enough, found using byte-pair encoding.
  • Supposing the string is “a b r a c a d a b r a”, the starting tokens are: a, b, c, d, r. Then observe that the token pair ab appears frequently, so ab is also a token. Now have “ab r a c a d ab r a” with “ab” added to the vocabulary. Then repeat and see that ab and r appear next to each other, so “abr a c a d abr a” with “abr” added to the token list. Then “abra”.
  • Python library tiktoken
  • Java library JTokkit
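
The merge steps in the “a b r a c a d a b r a” example can be sketched in plain Java. This is only a minimal illustration under my own assumptions: the class and method names are hypothetical, ties are broken by whichever pair reaches the top count first, and it is not how tiktoken or JTokkit are implemented (they use pretrained merge tables over bytes).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of byte-pair-encoding merges; not the tiktoken or JTokkit API.
class BpeSketch {

    // Returns the most frequent adjacent pair, or null when no pair occurs at least twice.
    static String[] mostFrequentPair(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        String[] best = null;
        int bestCount = 1;
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String key = tokens.get(i) + "\t" + tokens.get(i + 1);
            int count = counts.merge(key, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = new String[] {tokens.get(i), tokens.get(i + 1)};
            }
        }
        return best;
    }

    // Replaces every occurrence of the pair (left, right) with the single token left+right.
    static List<String> merge(List<String> tokens, String left, String right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean match = i + 1 < tokens.size()
                    && tokens.get(i).equals(left)
                    && tokens.get(i + 1).equals(right);
            out.add(match ? left + right : tokens.get(i));
            i += match ? 2 : 1;
        }
        return out;
    }

    // Starts from single characters and keeps merging until no pair repeats.
    // Returns the new tokens in the order they were added to the vocabulary.
    static List<String> learnMerges(String text) {
        List<String> tokens = new ArrayList<>();
        for (char ch : text.toCharArray()) {
            tokens.add(String.valueOf(ch));
        }
        List<String> learned = new ArrayList<>();
        String[] pair;
        while ((pair = mostFrequentPair(tokens)) != null) {
            tokens = merge(tokens, pair[0], pair[1]);
            learned.add(pair[0] + pair[1]);
        }
        return learned;
    }

    public static void main(String[] args) {
        System.out.println(learnMerges("abracadabra")); // prints [ab, abr, abra]
    }
}
```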

Brain

  • 86 billion neurons in human brain
  • Dendrite – input from another cell
  • Soma – cell body
  • Axon – output to other cells
  • Oversimplified: Imagine cell body multiplies each input by a certain weight (different per cell) and adds them. That’s like multiplying a vector and a matrix

Math terms

  • Vector – array/list of numbers. Can represent a point in n-dimensional space. Usually visualized as an arrow from the origin to that point. ex: 1526 dimensions means 1526 numbers in the vector
  • Matrix – rectangular array of numbers (a stack of vectors). Turns one vector into another
  • Tensor – stack of matrices. Array of arrays of matrices. Not important here.
  • Dot product of two vectors – multiply elements in same spot in each vector and add them up.
  • Matrix multiplication – had nice animation
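
To make the dot product and the "matrix turns one vector into another" idea concrete, here is a small plain-Java sketch. The class name and the sample numbers are my own; ND4J does this kind of math for you (and far faster), so treat this as an illustration of the arithmetic only.

```java
import java.util.Arrays;

// Hypothetical helper class; plain arrays stand in for ND4J's off-heap arrays.
class VectorMathSketch {

    // Multiply the elements in the same spot of each vector and add them up.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Each row of the matrix is dotted with the input vector, producing a new vector.
    // This is the "weights times inputs, then add" step from the neuron analogy.
    static double[] multiply(double[][] matrix, double[] vector) {
        double[] out = new double[matrix.length];
        for (int row = 0; row < matrix.length; row++) {
            out[row] = dot(matrix[row], vector);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[] {1, 2, 3}, new double[] {4, 5, 6})); // prints 32.0
        double[][] weights = {{1, 0}, {0, 2}, {1, 1}};
        System.out.println(Arrays.toString(multiply(weights, new double[] {3, 4}))); // prints [3.0, 8.0, 7.0]
    }
}
```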

ND4j

  • N dimensions for J.
  • Knows how to do vector/matrix math

Embedding

  • Each token gets assigned to an arbitrary vector at first. This is the token embedding
  • Picture adjusting a bunny ears antenna and how it only works while you are touching it. Walking away breaks the reception.
  • Each number in the initial arbitrary vector is like a dial that needs to be tuned

Gradient Descent

  • Normally there are millions of minimum points, and it is easy to get stuck in a local minimum: it looks like the lowest point in every direction, but there is a lower one elsewhere. LLM training is meant to avoid that pitfall
  • Eclipse DeepLearning4J – can configure neural network and make a model
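
A one-dimensional sketch of gradient descent shows the local minimum problem. The curve, learning rate and step count below are my own choices for illustration, not from the talk, and real LLM training uses more elaborate optimizers over billions of dimensions.

```java
// Hypothetical example: walk downhill on f(x) = x^4 - 3x^2 + x, which has two valleys.
class GradientDescentSketch {

    static double f(double x) {
        return x * x * x * x - 3 * x * x + x;
    }

    // Derivative of f: tells us which way is downhill from x.
    static double slope(double x) {
        return 4 * x * x * x - 6 * x + 1;
    }

    // Repeatedly take a small step against the slope.
    static double descend(double start, double learningRate, int steps) {
        double x = start;
        for (int i = 0; i < steps; i++) {
            x -= learningRate * slope(x);
        }
        return x;
    }

    public static void main(String[] args) {
        double fromRight = descend(2.0, 0.01, 1000);  // settles near x = 1.13, a local minimum
        double fromLeft = descend(-2.0, 0.01, 1000);  // settles near x = -1.30, the lower valley
        System.out.println(fromRight + " -> " + f(fromRight));
        System.out.println(fromLeft + " -> " + f(fromLeft));
    }
}
```

Both runs stop where the slope is zero, but only the one starting on the left finds the lower valley. Which minimum you reach depends on where you start.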

Vector meaning

  • Similar vectors are similar semantics
  • Applying related vectors to others should have consistent semantic meaning
  • RGB for colors are vectors with three points representing the colors.
  • Dot product of Cyan (0/255/255) and Red (255/0/0) is 0 because they have nothing in common
  • Add positional embedding to token embedding so know where in sentence. (Add to each element with different scale so know which part goes with which). These combined are in the input embedding
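
The color comparison and the "add positional embedding to token embedding" step are both simple element-by-element arithmetic. A minimal sketch, assuming made-up embedding values and a hypothetical class name:

```java
import java.util.Arrays;

// Hypothetical illustration; the embedding numbers are invented, not from a real model.
class EmbeddingSketch {

    static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Input embedding = token embedding + positional embedding, element by element.
    static double[] inputEmbedding(double[] token, double[] position) {
        double[] out = new double[token.length];
        for (int i = 0; i < token.length; i++) {
            out[i] = token[i] + position[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] cyan = {0, 255, 255};
        double[] red = {255, 0, 0};
        double[] blue = {0, 0, 255};
        System.out.println(dot(cyan, red));  // prints 0.0 (nothing in common)
        System.out.println(dot(cyan, blue)); // prints 65025.0 (they share the blue channel)

        double[] token = {0.5, -1.0};
        double[] position = {0.25, 0.5};
        System.out.println(Arrays.toString(inputEmbedding(token, position))); // prints [0.75, -0.5]
    }
}
```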

Attention

  • Attention is all you need https://arxiv.org/abs/1706.03762
  • Attention examples: grammatical structure, meaning, word order
  • Long range dependency – like a pronoun that refers to something many words away
  • Attention helps focus on the important parts, ex: “The cat sat on the mat”. What’s on the mat? (The cat.) At that point the model has to know.
  • Apply key matrix to cat and a query matrix to mat. Key matrix offers info. Query matrix is what you want to know.
  • Start with random values. Then multiply by tokens. Then take dot product and get a number to see what predicts about next word. See how far prediction off from what it actually is. Apply to surrounding neighbors and go in direction that makes error less. Repeat a very large number of times
  • A lot of this can be done in parallel
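
The query/key step can be sketched as follows. This covers only the scoring part, not the training loop that tunes the matrices. The token vectors and the identity matrices are placeholders I made up, and the scaling by the square root of the dimension plus the softmax come from the "Attention Is All You Need" paper rather than from my notes.

```java
import java.util.Arrays;

// Hypothetical sketch of attention weights for one query token; all values are invented.
class AttentionSketch {

    static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static double[] apply(double[][] matrix, double[] vector) {
        double[] out = new double[matrix.length];
        for (int row = 0; row < matrix.length; row++) {
            out[row] = dot(matrix[row], vector);
        }
        return out;
    }

    // Turns raw scores into positive weights that add up to 1.
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0);
        double[] out = new double[scores.length];
        double total = 0;
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            total += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    // The query matrix says what the current token wants to know;
    // the key matrix says what each token has to offer.
    static double[] attentionWeights(double[] queryToken, double[][] tokens,
                                     double[][] queryMatrix, double[][] keyMatrix) {
        double[] query = apply(queryMatrix, queryToken);
        double[] scores = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            double[] key = apply(keyMatrix, tokens[i]);
            scores[i] = dot(query, key) / Math.sqrt(query.length);
        }
        return softmax(scores);
    }

    public static void main(String[] args) {
        double[][] identity = {{1, 0}, {0, 1}};     // untrained placeholder matrices
        double[] cat = {1.0, 0.0};
        double[] sat = {0.0, 1.0};
        double[] mat = {0.9, 0.1};
        double[] weights = attentionWeights(mat, new double[][] {cat, sat, mat}, identity, identity);
        System.out.println(Arrays.toString(weights)); // "cat" gets the largest weight
    }
}
```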

Feed forward

  • Linear – straight – ex: 2x + 3y (2 and 3 are the knobs to tune)
  • Non-linear – wavy – more dimensions
  • Language isn’t linear by nature
  • “great, terrific meal” – can add up
  • “not good” – not flips meaning of sentence. So can’t just predict the next word.

Universal approximation theorem

  • Imagine a wavy line as a series of bumps from the ground to that line
  • The more granular the bump, the more accurate the result.
  • Each bump can be represented by a linear formula
  • Apply the GELU (non-linear) function to make the bumps
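
Here is a small sketch of GELU and of how non-linear pieces can form a bump. It uses the common tanh approximation of GELU. The `bump` method is my own construction to illustrate the idea, not something shown in the talk.

```java
// Hypothetical illustration of a non-linear activation and a "bump" built from it.
class GeluSketch {

    // Widely used tanh approximation of GELU.
    static double gelu(double x) {
        double inner = Math.sqrt(2.0 / Math.PI) * (x + 0.044715 * x * x * x);
        return 0.5 * x * (1.0 + Math.tanh(inner));
    }

    // Three shifted copies of GELU, each fed by a linear formula (x + 1, x, x - 1),
    // combine into a bump that is near zero far from the origin and raised in the middle.
    static double bump(double x) {
        return gelu(x + 1) - 2 * gelu(x) + gelu(x - 1);
    }

    public static void main(String[] args) {
        for (double x = -6; x <= 6; x += 2) {
            System.out.printf("x=%5.1f  gelu=%8.4f  bump=%8.4f%n", x, gelu(x), bump(x));
        }
    }
}
```

Adding more bumps of different heights and positions is the intuition behind approximating any wavy line.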

Experiment

  • Translated Karpathy’s 253 line LLM into Java with a list of baby names
  • Took about a minute to train.
    • Generated new names. A couple legit but most random looking

My take

This was great. Tokens and other critical concepts often get used without being defined, and we take them for granted. So being able to think about them was informative and helpful. I like that Barry showed the math but said it was OK not to understand it. [it was understandable]

[javaone 2026] community and intellij keynotes

See the live blog table of contents


Sharat Chander

  • Started by throwing two stuffed Dukes to the audience
  • 30 years of Java
  • 25th edition of JavaOne
  • People first, technology second.
  • 360+ Java User Groups
  • 400+ Java Champions
  • 1M+ through Oracle University

Heather Stevens (Oracle education)

  • Brought up two people from College Board; education group
  • learn.java – for beginners, teachers and students
  • Had an education summit on Monday at JavaOne
  • Monday field trip to see FIRST Robotics Competition team at Oracle Design Tech High School (BREAD). [I got to go. It was nice seeing another team’s space]

Student Panel

  • 4 students on panel and 1 student as moderator
  • Point of education is to learn how to learn and work within context
  • Cybersecurity is almost as broad as computer science itself.

AI Tools Adoption – Lize Raes/Ana-Maria Mihalceanu/Paul Bakker/Simon Martinelli

  • Audience poll: AI most useful for reading and understanding code
  • “I find it most useful for everything” – Simon
  • Move faster
  • AI created migrations
  • Audience poll: unsurprisingly most people said much more productivity with AI. About 5% put less productive and about 10% said about the same though.
  • When code generation is a commodity, the skills that matter: maintainable code, scalable to future systems

Chad Arimura/Colt McNealy/Mandeep Gill/Zoran Sevarac

  • Panel for startups
  • Gave example where had to switch from Python to Java
  • Java has technical advantage for business critical scaling and also business ecosystem that exists
  • Colt made joke about doing Java because of his dad [co-founder of Sun]
  • exciting features: data oriented programming, virtual threads
  • More time helping customers use AI than doing it by self – Colt
  • Docs mainly written by AI
  • Most important thing is talking to users and understanding what their problem is
  • Understand business value and logic
  • Hire people smarter than you

Jim Grisanzio/Bruno Souza/Mala Gupta/Brian Vermeer

  • Panel of community leaders
  • Only about half the audience are members of a JUG (or didn’t want to raise their hands). More hands went up for having learned something or watched a video from a JUG
  • Knowledge flows between juniors and seniors in both directions
  • Some user groups run conferences with thousands of attendees. Ex: Netherlands JUG runs two conferences a year. Helps with growth and getting a broader range of people in. Country-wide and local (city level) JUGs. Delhi had over 1000 participants every year.
  • Bruno doing tour of 10 JUGs on this trip
  • Venkat did tour of 12 JUGs in Brazil
  • Mentorship Hub at this conference
  • Community gives you relationships; AI can’t do that

Anton Arhipov – IntelliJ

  • IntelliJ is 25 years old
  • Product originally called IntelliJ Renamer – a standalone Java refactoring tool
  • Still built on Swing
  • 2009 – open sourced the core
  • Goal is to support every Java release on day 1
  • Support for preview features
  • More than just the language. Ex: Maven 4 supported even though it hasn’t shipped yet
  • Includes Spring debugger
  • Probabilistic AI + Deterministic tooling – reliable software
  • In December 2025, merged Community and Ultimate into one edition. Subscription adds features
  • Just launched Koog for Java – enterprise agentic framework for Java

Sharat & Duke

  • dev.java
  • inside.java
  • youtube.com/java
  • Shout out to Java Champions
  • Logos from 16 Java conferences on screen

My take

The community keynote is always great and this year was no exception. I like the variety and that it comes from many parts of the Java community. I missed the community keynote last year (had to fly out Wednesday on the red eye for a robotics competition) so I’m super appreciating it this year. Cool seeing the original IntelliJ screen. Love that Sharat brought Duke on stage for the end of the keynote. Duke doesn’t speak but made great gestures to participate.

[javaone 2026] Evolving the Java Language: An Inside Perspective

Speaker: Brian Goetz

See the live blog table of contents


General

  • Need to balance teaching Java new tricks with making them feel like they have been part of the language all along
  • Like living on the Ship of Theseus

Anti patterns

  • Didn’t want to be like Perl and cram in features. Its niche was taken over by Ruby and Python
  • Someone wants a feature and extrapolates from one data point. The view is hyperlocalized and doesn’t fit cleanly into the rest of the language. Starts with presupposing a solution vs developing a deeper, shared understanding of the problem.
  • Hero’s journey archetype. With language features, they don’t always return home victorious. Like a marble maze game with many holes to fall into

Many bad ideas

  • Undeclare statement. “undeclare x” would mean taking x out of scope. Presumably useful when writing thousand-line methods, as a way to reuse a variable name in the same scope.
  • Easy to dismiss an obviously bad idea
  • Other ideas aren’t bad ideas for some language, but not a good choice for Java. Or might have been a good idea for Java in the past, but decisions since then conflict with it.
  • Make parameters final by default – backward compatibility issues. Could do it for new syntax like lambdas or pattern matching, but that adds complexity to what developers need to learn about when variables are mutable. Would have to be a language historian and know when a feature was added to know how it behaves. Living with the mistakes of the past

Complaints

  • “Just do what languageX does”. Features live in the context of a language. For example, dynamic typing fits in Python. Taking inspiration from other languages isn’t much of a shortcut.
  • “Why do you have to reinvent everything?” Needs to fit Java. Shouldn’t look nailed onto the side.

Ideas

  • Ideas are not a scarce resource
  • Having the idea is a very small percentage of the work
  • “Genius is 1% inspiration and 99% perspiration” – Thomas Edison
  • Steve Jobs said it is a disease thinking the idea is 90% of the work.
  • Person with idea gets the credit so get married to idea.
  • First idea is never as good as you think it is
  • Value of first idea is that it gets you to the next idea. And so on.

Solutions vs problems

  • Problems are better than solutions
  • Ideas often come dressed as solutions. Feels helpful. Not just complaining, have an answer.
  • Need context
  • What new problems will it cause
  • Will benefit exceed cost
  • Have to reverse engineer problem from solution
  • Maybe don’t understand problem. But usually there is some problem.

Syntax

  • Amateurs discuss syntax.
  • Suck up all the oxygen in the room and start again in the room next door.
  • Subjective
  • Critically important
  • Avoid pitfalls like people thinking the method reference :: works like it does in C++
  • Don’t want new features to stand out “because new”, since they soon become old features that still look bold

Semantics

  • Thinking is hard
  • There are no writing problems, only thinking problems
  • Trying to write something down clearly tells you if you understand it
  • Interactions with existing features and with new ones
  • Interactions with already complex features
  • Solid theoretical basis or ad hoc

Translation (to bytecode)

  • Most developers don’t think about, and shouldn’t have to think about, bytecode
  • Important part of design process.
  • Affects performance
  • Default methods – no easy way to implement so had to do it the hard way
  • Symbols matter. Anonymous classes and stable names. Could affect reflection, deserialization, etc. Need to decide if ok for name to change

Constraints

  • User perception – can’t ask too much of users. Building on things already in language makes this better. Using similar rules to other parts of the language keeps intuition.
  • Past decisions – existing code written with a set of assumptions. Backward compatibility. Code that already means something else is a constraint of a past decision. Breaking changes undermine investment in code and trust in Java. Also think about future changes so not making a change that will cause future compatibility problems. Language features must go together
  • Future decisions – what paths are closed in future by decision. Can’t use up degrees of semantic freedom because then no way to implement better ideas. For example, can’t use up all delimiters
  • Ecosystem and incentives – avoid dividing ecosystem into libraries that do things in the new/old way. Create incentives to write good code
  • Budget – ex: complexity

Get more out of concepts/features that exist

  • Records as tuples
  • Sealed classes as generalization of final classes
  • Virtual threads are a type of thread
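
A quick sketch of the first two bullets, with type names I made up. It assumes Java 21 or later for the pattern matching switch. A record gives tuple-like data with a name, and a sealed interface generalizes "final" from "no subtypes" to "only these subtypes".

```java
// Hypothetical types for illustration; assumes Java 21+.
class RecordsAndSealedSketch {

    // A record behaves like a named tuple: accessors, equals, hashCode and toString come for free.
    record Point(int x, int y) {}

    // Only Circle and Square may implement Shape, so the compiler knows the full set.
    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    // No default branch is needed because the switch covers every permitted subtype.
    static double area(Shape shape) {
        return switch (shape) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Square s -> s.side() * s.side();
        };
    }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2).equals(new Point(1, 2))); // prints true
        System.out.println(area(new Square(2)));                     // prints 4.0
    }
}
```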

Evolving

  • Pattern matching introduced a bit at a time
  • Simple so easier to get
  • Replaced existing idiom with more concise idiom so people like. Marketing
  • Flow based scoping
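
For reference, this is the kind of idiom replacement and flow-based scoping the bullets above describe, using `instanceof` patterns (Java 16+). The method names are mine.

```java
// Hypothetical example comparing the old cast idiom with an instanceof pattern.
class PatternMatchingSketch {

    // Old idiom: test, then cast, then use.
    static int lengthOld(Object o) {
        if (o instanceof String) {
            String s = (String) o;
            return s.length();
        }
        return -1;
    }

    // Pattern matching: the test and the binding happen together.
    // Flow-based scoping: s is in scope after the if because the method
    // only reaches that point when the pattern matched.
    static int lengthNew(Object o) {
        if (!(o instanceof String s)) {
            return -1;
        }
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(lengthOld("duke")); // prints 4
        System.out.println(lengthNew("duke")); // prints 4
        System.out.println(lengthNew(42));     // prints -1
    }
}
```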

Summary

  • A program is a means to an end
  • Language design is like writing all possible programs simultaneously
  • Have to understand how a feature can be used or misused

My take

I enjoyed hearing about the “whys”. The marble maze analogy was fun. And the examples are great. It was nice getting a peek behind the curtain.