Speaker: Daniel Hinojosa (mastodon.social/@dhinojosa)
For more, see the table of contents
Related tech
- Pinot – OLAP
- Input to Kafka – Kafka Streams, Akka Streams, Flink, Spark Streaming
- Connectors – e.g., to a database
Kafka
- publish/subscribe queue
- producer can also be a consumer
How it looks inside
- messages are sharded across partitions
- immutable data store
- message gets an offset number in the partition.
- data is temporary – specify retention size or time
- offset numbers are never reused, even after a message is deleted
- producers write only to the partition leader, and consumers read from it by default, not from the replicas/followers
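The offset behavior above can be sketched with a toy in-memory log. This is only an illustration under simplifying assumptions: `PartitionLog`, `append`, and `expire_before` are hypothetical names, not Kafka APIs, and retention is reduced to "drop everything below an offset".

```python
class PartitionLog:
    """Toy append-only log for one partition (not Kafka's implementation)."""

    def __init__(self):
        self.next_offset = 0   # next offset to hand out; it only ever grows
        self.records = {}      # offset -> message bytes that are still retained

    def append(self, message: bytes) -> int:
        """Store a message and return the offset it was assigned."""
        offset = self.next_offset
        self.records[offset] = message
        self.next_offset += 1
        return offset

    def expire_before(self, offset: int) -> None:
        """Simulate retention by dropping every record below `offset`."""
        for old in [o for o in self.records if o < offset]:
            del self.records[old]


log = PartitionLog()
first = log.append(b"a")    # offset 0
second = log.append(b"b")   # offset 1
log.expire_before(2)        # both records age out
third = log.append(b"c")    # offset 2: the deleted offsets are not reused
```

Because the counter is separate from the stored records, deleting data never frees an offset for reuse.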
Message
- Similar to a row/record
- Just an array of bytes; format doesn’t matter
- Message key is also an array of bytes. Ordering is guaranteed only within a partition, so the key is what keeps related messages in order. The partitioner hashes the key and maps it to a partition.
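A minimal sketch of key-based partitioning, assuming three partitions. The function `partition_for` is a hypothetical name, and `zlib.crc32` is only a stand-in hash chosen to keep the example in the standard library; it is not the hash Kafka's partitioner uses.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. crc32 is a stand-in hash for illustration."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    (b"alice", "login"),
    (b"bob", "login"),
    (b"alice", "purchase"),
    (b"alice", "logout"),
]

# Each message is appended to the partition chosen by its key.
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All of alice's messages land in one partition, in the order they were sent.
alice_partition = partition_for(b"alice", NUM_PARTITIONS)
alice_events = [v for k, v in partitions[alice_partition] if k == b"alice"]
```

Messages with different keys may end up in different partitions, so there is no ordering guarantee between them.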
Batch
- Group of messages
- Every batch knows which partition it is going to
- Uses murmur2 for hashing
- Can set batch size
Acknowledgements (Ack)
- 0 – no ack; assume all is well; lowest latency
- 1 – only the leader must ack
- all – all in-sync replicas must ack. Higher latency; safest. e.g., bank transactions
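The three ack levels can be summarized as "how many broker acknowledgements the producer waits for". The sketch below is a simplified model, not producer code: `acks_required` is a hypothetical helper, and it assumes the leader counts as one of the in-sync replicas. The `acks` key in the dictionary matches the real producer setting name.

```python
def acks_required(acks: str, in_sync_replicas: int) -> int:
    """Number of acknowledgements awaited for a given `acks` setting (simplified)."""
    if acks == "0":
        return 0                  # fire and forget
    if acks == "1":
        return 1                  # the leader only
    if acks == "all":
        return in_sync_replicas   # the leader plus every in-sync follower
    raise ValueError(f"unknown acks setting: {acks!r}")


# Example producer settings for a case where safety matters more than latency.
producer_config = {"acks": "all"}
waits_for = acks_required(producer_config["acks"], in_sync_replicas=3)
```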
Consumer
- goal: scale to a large number of different consumers without affecting performance
- Consumers are not thread-safe
- Consumer rebalance – reassigns partitions when consumers go down
- Settings: Isolation level
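A rough sketch of what a rebalance achieves, assuming a simple round-robin assignment. The `assign` function and the consumer names are hypothetical, and Kafka's real assignment strategies are more involved; the point is only that the partitions of a lost consumer get picked up by the survivors.

```python
def assign(partitions, consumers):
    """Spread partitions round-robin over the live consumers (toy strategy)."""
    live = sorted(consumers)
    if not live:
        raise ValueError("at least one live consumer is required")
    assignment = {consumer: [] for consumer in live}
    for index, partition in enumerate(partitions):
        assignment[live[index % len(live)]].append(partition)
    return assignment


# Six partitions shared by three consumers.
before = assign(range(6), ["c1", "c2", "c3"])

# c2 goes down; a rebalance hands its partitions to the remaining consumers.
after = assign(range(6), ["c1", "c3"])
```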
Producer
- Settings: idempotent, transactions
Compaction
- For messages with the same key, only the latest message is retained.
- Cleaner thread does compaction
- Can treat as events or tables
- Tables treat Kafka as key/value database
- Likely don’t care about the past with respect to a table. Care about the current/end state, not everything that happened along the way.
- Dirty – the part of the log that still holds extra records that have not been compacted yet
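Compaction and the "table" view can be sketched in a few lines. This is a toy model: `compact` is a hypothetical function operating on a list of `(offset, key, value)` tuples, whereas the real cleaner thread works on log segments in the background.

```python
def compact(records):
    """Keep only the latest record per key; surviving records keep their offsets."""
    latest_offset = {}
    for offset, key, _value in records:
        latest_offset[key] = offset   # later records overwrite earlier ones
    return [rec for rec in records if latest_offset[rec[1]] == rec[0]]


records = [
    (0, "k1", "a"),
    (1, "k2", "b"),
    (2, "k1", "c"),   # supersedes offset 0
    (3, "k3", "d"),
    (4, "k2", "e"),   # supersedes offset 1
]

compacted = compact(records)

# Reading the compacted log as a table gives the current state per key.
table = {key: value for _offset, key, value in compacted}
```

The full `records` list is the event view (everything that happened); `table` is the key/value view (only the end state).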
Stream processing
- Everything is a consumer/producer. Everything else is just a higher-level abstraction
- Stream groups
- Java Stream-like methods – peek, foreach, groupByKey
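To give a feel for these operators, here is a toy Python analogue of peeking at records and then grouping and counting by key. It is not the Kafka Streams API: `peek` and `count_by_key` are hypothetical stand-ins, and the records are made-up sample data.

```python
from collections import defaultdict


def peek(stream, action):
    """Run a side effect on each record and pass the record through unchanged."""
    for key, value in stream:
        action(key, value)
        yield key, value


def count_by_key(stream):
    """Group records by key and count how many were seen for each key."""
    counts = defaultdict(int)
    for key, _value in stream:
        counts[key] += 1
    return dict(counts)


seen = []
source = [("shirt", 2), ("hat", 1), ("shirt", 5)]

# peek records what flows past; the grouping step consumes the same records.
counts = count_by_key(peek(source, lambda k, v: seen.append((k, v))))
```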
My take
I used to know some of this but had forgotten it, so this was an excellent review. And the new stuff was good too! I wish the screen had been set to a different resolution rather than relying completely on zooming in. That would have let us see some of the things that weren’t magnified: output while running, live code changes, and the web page. The extended demo was great though! The Gitpod “sales pitch” was a nice side effect.