[2024 dev2next] virtual threads

Speaker: Piotr Przybyl @piotrprz and @piotrprz@mstdn.social

For more see the table of contents


Note: some things in this talk are in preview features. Might change/get better

Demo

  • Traditional demo: start a thread and have it wait 10 seconds.
  • Ran 1K platform threads in just over 10 seconds. 30K took 15 seconds. 40K crashed due to “not enough space”.
  • Switched to virtual threads and the resource constraints disappeared: just over 10 seconds for 40K virtual threads, and about 15 seconds for a million.
  • Awesome, until you see how the demo fools you.
  • The demo is only sleeping, not actually using any resources.
  • Redid the demo with CPU-bound work so there was no blocking.
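The sleeping demo can be sketched like this (a minimal reconstruction assuming JDK 21+; the counts, sleep time, and names are mine, not the talk's exact code):

```java
// Launch many virtual threads that each just sleep, then time joining them all.
public class VirtualThreadDemo {

    // Starts `count` virtual threads that each sleep `sleepMillis`,
    // waits for all of them, and returns the elapsed wall time in ms.
    static long runDemo(int count, long sleepMillis) {
        Thread[] threads = new Thread[count];
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            threads[i] = Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Sleeping virtual threads finish in roughly the sleep time itself;
        // the same count of platform threads would exhaust memory far sooner.
        System.out.println(runDemo(10_000, 1_000) + " ms");
    }
}
```

Swapping `Thread.ofVirtual()` for `Thread.ofPlatform()` reproduces the crash from the demo at high counts.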

Releases

  • virtual threads – previewed in Java 19–20, released in Java 21
  • structured concurrency – preview in 23
  • Scoped values – preview in 22

Virtual threads

  • Great for I/O and waiting
  • No way to switch on and off globally in JDK
  • Don’t make threads faster; make the app scale better
  • NYC example: drive to a restaurant and park, and the hardware resource (car) sits unused. Vs. a taxi/Uber, where you don’t care where the car is while you’re eating.
  • If all cars were taxis, we could move more people. But one person doesn’t move faster just by taking a taxi.
  • Said it’s not his metaphor but he likes it [I like it too]
  • Virtual threads will not squeeze more juice from your CPU

Don’t Reuse

  • Virtual threads are cheap to start, so create a new one each time instead of reusing
  • Pooling implies reuse, so don’t pool virtual threads

Don’t Pin

  • Pinning is like telling the car to “keep the engine running”: it wastes resources, and you still have to pay.
  • In Java 21, blocking I/O inside synchronized blocks pins the virtual thread to its carrier
  • Find pinning with -Djdk.tracePinnedThreads=full and with JFR events (which fire when pinning takes more than 20ms)
  • toxiproxy – good for chaos engineering. Can add half a second of latency plus some jitter for variance (or anything you want)
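A minimal sketch of the pinning problem on JDK 21 (class and method names are mine, not the talk's): blocking inside a `synchronized` block pins the virtual thread to its carrier, which `-Djdk.tracePinnedThreads=full` will report with a stack trace.

```java
// Thread.sleep while holding a monitor pins the virtual thread (JDK 21).
// Run with -Djdk.tracePinnedThreads=full to see the pinning stack trace.
public class PinningDemo {
    static final Object LOCK = new Object();

    static void pinned() {
        synchronized (LOCK) {          // monitor held...
            try {
                Thread.sleep(100);     // ...while blocking -> carrier is pinned
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Runs pinned() on a virtual thread; returns true once it completed.
    static boolean runOnce() {
        Thread t = Thread.ofVirtual().start(PinningDemo::pinned);
        try {
            t.join();
        } catch (InterruptedException e) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runOnce());
    }
}
```

Replacing the monitor with a `java.util.concurrent.locks.ReentrantLock` is the usual fix, since lock-based blocking unmounts cleanly.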

Logging: double-slit experiment – “how many lights are there?”

  • Observing makes the answer different
  • Changed from 5 seconds of work to five one-second chunks and added logging.
  • Logging changes the behavior because the default handler is synchronous.
  • Fewer threads finished because every log call is an opportunity for other threads to get in
  • It’s like everyone calling an Uber when a concert lets out.
  • Every time a virtual thread does I/O, it blocks and unmounts. Ex: getting out of the taxi
  • Platform threads are different; they don’t yield their spot to other threads just because you do an I/O operation.

Other problems

  • Can’t ignore backpressure control
  • Don’t ignore interrupts
  • Be careful mixing virtual threads with synchronized/native code

Structured Concurrency (preview)

  • better “idioms” for multi-threaded code
  • helps eliminate thread leaks and cancellation delays
  • not replacing interruption with cancellation
  • Showed waiting on two futures; neither knows the other gave up.
  • Implemented manually, each future needs a reference to all the other futures so they know when to give up
  • try (var scope = new StructuredTaskScope.ShutdownOnFailure()) { /* fork two subtasks */ } lets them know about each other
  • scope.throwIfFailed() knows and propagates the failure
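The idiom from the bullets above looks roughly like this (preview API on JDK 21–23, needs --enable-preview, so just a sketch; `fetchUser`, `fetchOrder`, and `handle` are hypothetical helpers, not from the talk):

```java
// If either subtask fails, the scope cancels the other one.
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    var user  = scope.fork(() -> fetchUser());   // subtask 1
    var order = scope.fork(() -> fetchOrder());  // subtask 2
    scope.join()            // wait for both...
         .throwIfFailed();  // ...and rethrow the first failure, if any
    handle(user.get(), order.get());
}
```

The try-with-resources block is what makes the concurrency “structured”: no subtask can outlive the scope, which eliminates the thread leaks mentioned above.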

Scoped Values (preview)

  • One-way, immutable alternative to ThreadLocal.
  • ThreadLocal is like a cache.
  • Bounded lifetime; visible in code
  • Results in simplified reasoning and improved performance
  • ScopedValue.where(x, y).run(() -> doStuff())
  • Can use Runnable or Callable
  • Can nest where() as long as you don’t mutate it
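Putting those bullets together (preview API, needs --enable-preview, so just a sketch; `USER` and the method names are hypothetical, not from the talk):

```java
// A ScopedValue is bound only for the dynamic extent of run()/call().
public class ScopedValueSketch {
    static final ScopedValue<String> USER = ScopedValue.newInstance();

    void handleRequest() {
        ScopedValue.where(USER, "alice")   // bind USER...
                   .run(() -> doStuff());  // ...for this run() only
    }

    void doStuff() {
        String user = USER.get();  // visible here; cannot be reassigned
        System.out.println(user);
    }
}
```

The bounded lifetime is the point: once `run()` returns, the binding is gone, which is what enables the simplified reasoning and performance claims above.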

My take

This was interesting. I didn’t know pinned threads were a thing. Also good humor: the well-known debug pattern of “debug 1” šŸ™‚

[2024 dev2next] 7 tech trends

Speaker: Vanya Seth

For more see the table of contents


ScriptingTheKernel – eBPF

  • Cartoon: it takes a year to add something to the kernel, then you have to wait until a Linux distro ships it. Five years later it’s available and the requirements have changed.
  • eBPF – verifies bytecode and then runs it in the kernel
  • The kernel’s verifier makes sure the program is safe
  • “superpower of linux”
  • Cilium – networking/clusters, service mesh

AI Team Assistants

  • First reaction is usually coding assistants
  • Software development is a team sport
  • Not just about increasing coding throughput
  • Need to boost whole supply chain/delivery cycle and include all roles.
  • Need to think about how AI can help a cross functional team
  • https://www.thoughtworks.com/en-us/what-we-do/ai/ai-enabled-software-engineering/Haiven_team_assistant

Zero Trust Security for CI/CD

  • Zero trust is not a new trend
  • Need to think about CI/CD in same way as customer facing systems
  • Pipelines need access to critical data like code, credentials/secrets
  • Limit runner privileges
  • Short lived tokens

Using Gen AI to understand legacy codebases

  • A lot to understand to convert to a new tech stack – business logic, dependencies, etc
  • Document understanding
  • Ask gen AI to explain code, but it’s not enough
  • Can use RAG on codebase, but still not enough
  • Graphs + RAG is more powerful.

SecretsOps

  • Where do you put your seed secrets? The ones needed to start everything. How do you bootstrap the bootstrapper?
  • Where do you store secrets overall?

On device LLM Inference

  • Needs to be integrated into daily life
  • Wrapped into devices, ex: a fridge [I don’t want my fridge to be smart!]
  • Quantization – compress parameters so a model can run on a phone/Raspberry Pi
  • Small language models – fit-for-purpose models with 1–7 billion params or less. Saves memory.
  • WebLLM – In browser inference engine

My take

Excellent keynote. New things and new ways to think about non-new things.

[2024 dev2next] kafka

Speaker: Daniel Hinojosa (mastodon.social/@dhinojosa)

For more see the table of contents


Related tech

  • Pinot – OLAP
  • Input to Kafka – Kafka Streams, Akka Streams, Flink, Spark Streaming
  • Connectors – ex: to database

Kafka

  • publish/subscribe queue
  • producer can also be a consumer

How it looks inside

  • messages sharded
  • immutable data store
  • message gets an offset number in the partition.
  • data is temporary – specify retention size or time
  • offset numbers are never reused, even after messages are deleted
  • can only read/write to leader, not the replicas/followers
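The offset rules above can be modeled with a toy append-only log (my illustration, not Kafka code): offsets grow monotonically and are never reused, even when retention deletes old messages.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one partition: an append-only log where each message
// gets a monotonically increasing offset that is never reused.
public class PartitionModel {
    private final List<byte[]> log = new ArrayList<>();
    private long baseOffset = 0;  // offset of the oldest retained message

    // Appends a message and returns the offset it was assigned.
    long append(byte[] message) {
        log.add(message);
        return baseOffset + log.size() - 1;
    }

    // Retention deletes the oldest message, but offsets keep growing.
    void truncateOldest() {
        if (!log.isEmpty()) {
            log.remove(0);
            baseOffset++;
        }
    }

    long nextOffset() {
        return baseOffset + log.size();
    }
}
```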

Message

  • Similar to a row/record
  • Just an array of bytes; format doesn’t matter
  • Message key is also an array of bytes. Ordering is only guaranteed within a partition, and the key determines the partition: the partitioner hashes the key and maps it to a partition.

Batch

  • Group of messages
  • Every batch knows where each partition is going
  • Uses murmur2 for hashing
  • Can set batch size
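The key-to-partition mapping can be sketched like this (illustration only: Kafka's default partitioner uses murmur2, here replaced by a stand-in stdlib hash; the class name is mine):

```java
import java.util.Arrays;

// Same key -> same hash -> same partition, which is what gives
// per-key ordering within a partition.
public class KeyPartitioner {
    static int partitionFor(byte[] key, int numPartitions) {
        int hash = Arrays.hashCode(key);            // stand-in for murmur2
        return (hash & 0x7fffffff) % numPartitions; // strip sign bit, then mod
    }
}
```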

Acknowledgements (Ack)

  • 0 – no ack; assume all is well; lowest latency
  • 1 – only the leader must ack
  • all – all in-sync replicas must ack. Highest latency; safest. Ex: bank transactions
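As a sketch, the three ack levels map onto the standard `acks` producer config key (the broker address and serializer classes below are placeholder values, not from the talk):

```java
import java.util.Properties;

// Builds producer settings for a given ack level:
// "0" = fire-and-forget, "1" = leader only, "all" = every in-sync replica.
public class AckConfig {
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed local broker
        props.put("acks", acks);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

These `Properties` would be passed to a `KafkaProducer` constructor in real code (the kafka-clients dependency is not shown here).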

Consumer

  • goal: scale to a large number of different consumers without affecting performance
  • Consumers are not threadsafe
  • Consumer rebalance – mitigate when consumers go down
  • Settings: Isolation level

Producer

  • Settings: idempotent, transactions

Compaction

  • Compaction retains only the latest message for each key.
  • A cleaner thread does the compaction
  • Can treat topics as events or as tables
  • Tables treat Kafka as a key/value database
  • With a table you likely don’t care about the past: you care about the current/end state, not everything that happened along the way.
  • Dirty – extra, not-yet-compacted records
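The “latest message per key wins” rule can be modeled in a few lines (my illustration, not Kafka's implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of log compaction: replaying the log into a map keeps
// only the latest value for each key -- the "table" view of a topic.
public class CompactionModel {
    static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] record : log) {
            // A later value for the same key overwrites the earlier one.
            latest.put(record[0], record[1]);
        }
        return latest;
    }
}
```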

Stream processing

  • Everything is consumer/producer; everything else is just a higher level abstraction
  • Stream groups
  • Java-style stream methods – peek, foreach, groupByKey
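As an analogy only (Kafka Streams has its own KStream API, which isn't shown here), grouping records by key with Java's own streams conveys the `peek`/`groupByKey` idea:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Pure-Java analogy for groupByKey: collect message values per key.
public class GroupByKeyAnalogy {
    record Message(String key, String value) {}

    static Map<String, List<String>> groupByKey(List<Message> messages) {
        return messages.stream()
                .peek(m -> {})  // peek: observe elements without changing the stream
                .collect(Collectors.groupingBy(Message::key,
                        Collectors.mapping(Message::value, Collectors.toList())));
    }
}
```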

My take

I used to know some of this, but had forgotten it, so it was an excellent review. And the new stuff was good too! I wish the screen had used a different resolution rather than relying completely on zooming in. That would have let us see some things while the code was running and changing live, and the web page (which weren’t magnified). The extended demo was great though! The gitpod “sales pitch” was a nice side effect.