[2024 dev2next] Kafka

Speaker: Daniel Hinojosa (mastodon.social/@dhinojosa)

Related tech

  • Pinot – OLAP datastore
  • Processing data in and out of Kafka – Kafka Streams, Akka Streams, Flink, Spark Streaming
  • Connectors – e.g., to a database

Kafka

  • Publish/subscribe queue
  • A producer can also be a consumer

How it looks inside

  • Messages are sharded across partitions
  • Immutable, append-only data store
  • Each message gets an offset number within its partition
  • Data is temporary – configure retention by size or time (retention.bytes / retention.ms)
  • Offset numbers are never reused, even after messages are deleted
  • By default, reads/writes go to the partition leader, not the replicas/followers
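
To make the offset and retention points concrete, here is a minimal Python sketch of a single partition as an append-only list. The class `PartitionLog` and its methods are hypothetical, and retention is simplified to a record count; real Kafka deletes whole log segments based on settings such as `retention.bytes` or `retention.ms`.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one partition: an append-only list of (offset, message)."""
    records: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, message: bytes) -> int:
        # Each new message gets the next offset; offsets only ever increase.
        offset = self.next_offset
        self.records.append((offset, message))
        self.next_offset += 1
        return offset

    def apply_retention(self, max_records: int) -> None:
        # Simplified size-based retention: drop the oldest records.
        # next_offset is untouched, so deleted offsets are never reused.
        if len(self.records) > max_records:
            self.records = self.records[len(self.records) - max_records:]

    def read_from(self, offset: int) -> list:
        # Consumers read sequentially, starting at a given offset.
        return [(o, m) for o, m in self.records if o >= offset]
```

After appending three messages and keeping only the newest one, the next append still gets offset 3, not 1: the numbering continues even though offsets 0 and 1 are gone.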

Message

  • Similar to a row/record
  • Just an array of bytes; the format doesn’t matter to Kafka
  • The message key is also an array of bytes. Ordering is only guaranteed within a partition, so messages with the same key stay in order. The partitioner hashes the key and maps it to a partition.
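
The key-to-partition mapping can be sketched in Python. `murmur2` below is a port of the murmur2 variant used by Kafka's Java client (`Utils.murmur2`), and `partition_for` follows the same idea as the client: hash the serialized key, clear the sign bit, and take the result modulo the partition count. Treat it as a sketch. The helper names are hypothetical, and it should be checked against the real client before relying on identical partition numbers.

```python
MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit murmur2 with Kafka's seed; returns an unsigned 32-bit value."""
    length = len(data)
    m = 0x5BD1E995
    r = 24
    h = (0x9747B28C ^ length) & MASK32  # fixed seed mixed with the length

    # Mix the input four bytes at a time (little-endian).
    tail = length - length % 4
    for i in range(0, tail, 4):
        k = data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24)
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Mix in the remaining 1-3 bytes (mirrors Java's fall-through switch).
    remaining = length % 4
    if remaining == 3:
        h ^= data[tail + 2] << 16
    if remaining >= 2:
        h ^= data[tail + 1] << 8
    if remaining >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for(key: bytes, num_partitions: int) -> int:
    # Clear the sign bit (like Kafka's toPositive), then take the modulo.
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


def assign_partitions(records, num_partitions):
    """Route (key, value) records to partitions, keeping send order in each."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Because the same key always hashes to the same partition, all events for `b"user-1"` end up in one partition in the order they were sent. That guarantee only holds while the partition count stays the same.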

Batch

  • Group of messages
  • Each batch knows which partition it is going to
  • The default partitioner uses murmur2 to hash keys
  • Batch size is configurable (batch.size)
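
A toy model of producer batching, assuming each record already has a partition assigned: there is one open batch per partition, and a batch is closed when adding a message would push it past `batch_size` bytes. `BatchAccumulator` is a hypothetical name. The real producer also sends a batch when `linger.ms` expires, which is not modeled here.

```python
from collections import defaultdict


class BatchAccumulator:
    """One open batch per partition; closed once it would exceed batch_size bytes."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size            # stands in for batch.size (bytes)
        self.open_batches = defaultdict(list)   # partition -> messages
        self.ready = []                         # closed (partition, messages) batches

    def append(self, partition: int, message: bytes) -> None:
        batch = self.open_batches[partition]
        current_size = sum(len(m) for m in batch)
        # Close the current batch if this message would overflow it.
        # An oversized message still gets a batch of its own.
        if batch and current_size + len(message) > self.batch_size:
            self.ready.append((partition, batch))
            batch = []
            self.open_batches[partition] = batch
        batch.append(message)

    def flush(self) -> list:
        # Close every non-empty open batch and hand back everything ready to send.
        for partition, batch in self.open_batches.items():
            if batch:
                self.ready.append((partition, batch))
        self.open_batches.clear()
        sent, self.ready = self.ready, []
        return sent
```

Every batch returned by `flush` is tagged with exactly one partition, which is why a batch "knows" where it is going.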

Acknowledgements (Ack)

  • 0 – no ack; assume all is well; lowest latency
  • 1 – only the leader must acknowledge
  • all – all in-sync replicas must acknowledge. Higher latency; safest. e.g., bank transactions
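
The three `acks` settings can be summarized as a small decision function. This is a simplified sketch, not producer code: `is_acknowledged` is hypothetical, and `followers_wrote` is assumed to hold one flag per in-sync follower replica. In Kafka, `acks=-1` means the same as `acks=all`.

```python
def is_acknowledged(acks: str, leader_wrote: bool, followers_wrote: list[bool]) -> bool:
    """Would the producer treat a send as successful under this acks setting?"""
    if acks == "0":
        # Fire and forget: the producer does not wait for any broker response.
        return True
    if acks == "1":
        # Only the partition leader has to persist the message.
        return leader_wrote
    if acks in ("all", "-1"):
        # The leader and every in-sync follower must have the message.
        return leader_wrote and all(followers_wrote)
    raise ValueError(f"unknown acks setting: {acks!r}")
```

The trade-off falls out directly: `"0"` never waits, `"1"` waits on one broker, and `"all"` waits on the slowest in-sync replica.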

Consumer

  • Goal: scale to a large number of different consumers without affecting performance
  • Consumers are not thread-safe
  • Consumer rebalance – reassigns partitions within the group when consumers go down
  • Settings: isolation level (isolation.level)
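
When a consumer leaves the group, a rebalance redistributes its partitions among the remaining members. A simplified sketch of a range-style assignment for a single topic, loosely modeled on Kafka's `RangeAssignor` (`range_assign` is a hypothetical helper), shows the effect:

```python
def range_assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Split partitions 0..n-1 into contiguous ranges, one per consumer."""
    if not consumers:
        return {}
    members = sorted(consumers)  # deterministic order across the group
    per_member, extra = divmod(partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment
```

With six partitions and consumers `c1`, `c2`, `c3`, each gets two partitions. If `c2` goes down, rerunning the assignment with `c1` and `c3` gives each of them three, so no partition is left unread.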

Producer

  • Settings: idempotence (enable.idempotence), transactions
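
With idempotence enabled, the broker uses a producer id plus a sequence number to discard duplicates caused by retries. The sketch below is a simplification under that assumption: `IdempotentPartition` is hypothetical, the real broker keeps only a small window of recent batch metadata per producer, and transactions are not modeled.

```python
class IdempotentPartition:
    """One partition that drops retried sends it has already written."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> last sequence number appended

    def append(self, producer_id: int, seq: int, message: bytes) -> bool:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # Duplicate from a retry: already in the log, so skip it.
            return False
        if seq != last + 1:
            # A gap means something was lost; refuse rather than reorder.
            raise ValueError("out-of-order sequence number")
        self.log.append(message)
        self.last_seq[producer_id] = seq
        return True
```

If the producer resends sequence 0 because the first acknowledgement was lost, the second copy is ignored and the log still holds the message only once.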

Compaction

  • Only the latest message for each key is retained
  • A cleaner thread does the compaction
  • A topic can be treated as events or as a table
  • Tables treat Kafka as a key/value database
  • With a table, you likely don’t care about the past. You care about the current/end state, not everything that happened along the way.
  • Dirty – the part of the log not yet compacted, which can still contain extra (superseded) records
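
The difference between the event view and the table view can be sketched in a few lines of Python. `compact` and `as_table` are hypothetical helpers; real compaction runs in the background on the dirty part of the log and never touches the active segment, which this sketch ignores.

```python
def compact(log: list[tuple[int, str, str]]) -> list[tuple[int, str, str]]:
    """Keep only the latest record per key, preserving offsets and order."""
    latest_offset = {}
    for offset, key, _value in log:
        latest_offset[key] = offset  # later offsets overwrite earlier ones
    return [(o, k, v) for o, k, v in log if latest_offset[k] == o]


def as_table(log: list[tuple[int, str, str]]) -> dict[str, str]:
    """Replay the log into a key/value table: the last value per key wins."""
    table = {}
    for _offset, key, value in log:
        table[key] = value
    return table
```

Replaying the full log and replaying the compacted log produce the same table. That is why compaction is safe when you only care about the current state.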

Stream processing

  • Everything is built on consumers/producers; everything else is just a higher-level abstraction
  • Stream groups
  • Java Streams-style methods – peek, foreach, groupByKey
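
To show what those operations do, here is a hypothetical, in-memory Python stand-in for a stream of (key, value) records. `ToyStream` is not a Kafka API. It is eager, whereas Kafka Streams processes records as they arrive. Also, in Kafka Streams `groupByKey` returns a grouped stream that you then aggregate (for example with a count), rather than a plain dict.

```python
from collections import defaultdict
from typing import Callable, Iterable


class ToyStream:
    """Eager, in-memory stand-in for a stream of (key, value) records."""

    def __init__(self, records: Iterable[tuple]):
        self.records = list(records)

    def peek(self, action: Callable) -> "ToyStream":
        # Run a side effect per record, then pass the records through unchanged.
        for key, value in self.records:
            action(key, value)
        return self

    def foreach(self, action: Callable) -> None:
        # Terminal operation: run a side effect per record, return nothing.
        for key, value in self.records:
            action(key, value)

    def group_by_key(self) -> dict:
        # Collect values per key, in arrival order.
        grouped = defaultdict(list)
        for key, value in self.records:
            grouped[key].append(value)
        return dict(grouped)
```

The distinction mirrors the Java API: `peek` is for debugging or logging in the middle of a pipeline, while `foreach` ends the pipeline.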

My take

I used to know some of this but had forgotten it, so it was an excellent review. And the new stuff was good too! I wish the screen had used a different resolution rather than relying completely on zooming in. That would have made it possible to see more while things were running, including the live code changes and the web page (which weren’t magnified). The extended demo was great though! The Gitpod “sales pitch” was a nice side effect.