[QCon 2019] Navigating Complexity – High-performance discovery teams

Conal Scanlon

For other QCon blog posts, see QCon live blog table of contents

Discovery vs Delivery

  • Delivery: completing lots of tickets, points completed, happy team
  • Discovery – different goals
  • Two types of work; two different ways of thinking
  • Dual track agile – discovery is for learning. Inputs into delivery track.
  • Discovery will happen whether planning to do it or not. ex: bug report
  • Can change % of time on discovery vs delivery depending on the phase of the project

Cynefin

  • Welsh word
  • Complex/Chaotic/Complicated/Obvious
  • Also a box in the middle for when you don’t know which one you’re in.
  • Complicated – can use knowledge to see what to do

Key areas of discovery work

  • Maximize learning – accelerate discovery, MVPs
    • Want to encounter problems as quickly as possible
    • Learning is messy and doesn’t easily fit Scrum process
    • MVP goal – maximize learning while minimizing risk and investment
    • MVPs can be paper prototype or a single use case
  • Better ideas – idea flow, collective intelligence
    • Levels: Psychological safety, dependability, structure & clarity, meaning, impact
    • Validate with people outside team
    • Closer relationships with specific customers so can see reaction as progress
  • Alignment – OKRs, Briefs, Roadmap
    • OKR = objective and key results
    • Think about where want to go and how get there
    • Should understand why vs an aspirational goal
    • Alignment and autonomy are orthogonal
    • Product brief – map from strategic level to feature going to build. It is not a requirements or architectural doc
    • Roadmap – show on larger ranges of time
  • Metrics – 3 levels, correct category (delivery vs discovery)
    • Business, product and engagement metrics.

My impression

I like that he provided an outline with the key points up front. The OKR section was detailed with examples. I like that there were book references/recommendations. And it was certainly interesting. I think I expected it to be about something else, but I’m glad I came. I would have liked more on examples of discovery projects specifically.

[QCon 2019] The Trouble with Learning in Complex Systems

Jason Hand from Microsoft – @jasonhand

For other QCon blog posts, see QCon live blog table of contents

Definitions

  • We use terms where not everyone on same page as to meanings.
  • Ex: what does “complex” mean
  • Types of systems
    • Whether can determine cause and effect
    • Ordered vs unordered
    • Ordered – Obvious (Can take it apart/put it back together. Know how works. ex: bicycle), complicated (ex: motorcycle)
    • Unordered – complex (ex: people on road, human body), chaotic (ex: NYC)
  • Sociotechnical systems – the people part is hard

Complex system

  • Causality can only be examined/understood/determined in hindsight
  • Specialists, but lack broad understanding of system
  • Imperfect information
  • Constantly changing
  • Users good at surprising us with what system can/can’t do

Learning

  • Takes time
  • Takes success and failure. Need both
  • Learning opportunities not evenly distributed
  • Sample learning opportunities – code commits, config changes, feature releases and incident response. Commits occur much more often than incidents
  • However, the cost to recovery is low for the more frequent opportunities
  • High opportunity – low stakes and high frequency. Git push is muscle memory
  • Low opportunity – high stakes and low frequency
  • Frequency is what creates the opportunity

Incident

  • Everyone would agree impacting the customer is an incident
  • If didn’t affect the customer, not always called an incident.
  • If not called an incident, no incident review.
  • Missed learning opportunity
  • We view incidents as bad.
  • Incidents are unplanned work.
  • Near misses save the day, but don’t get recognized or learned from
  • Systems are continuously changing; will never be able to remove all problems from system

Techniques to learn

  • Root cause analysis is insufficient. Like a post mortem, it is just about what went wrong.
  • Needs to be a learning review
  • Discuss language barriers, tools, confidence level, what people tried
  • Discuss what happened by time and the impact
  • ChatOps better than phone bridge because can capture what happened. Nobody is going to transcribe later. Having clean channel for communication helps.
  • However, incidents not linear.
  • Book: Overcomplicated
  • If someone just does one thing, the learning doesn’t transfer. Need operational knowledge and mental models

Learning Reviews

  • Set context – not looking for answers/fixes. Looking for ways to learn even if no action items
  • Set aside time/effort to be curious
  • Asking linear questions (ex: five whys) doesn’t get to the reality of the system
  • Invite people who weren’t part of incident response. They should still learn and can provide info about system
  • Understand and reduce blind spots

My impression

Good talk. It’s definitely thought provoking, and it suggests small things one can do to start making things better.

[QCon 2019] Low Latency in the Cloud, with OSS

Mark Price @epickrram

For other QCon blog posts, see QCon live blog table of contents

Requirements

  • Trading app
  • Need microsecond (not millisecond) response time
  • Need data in memory vs database
  • Lock-free programming (see the sketch after this list)
  • Redundancy
  • High volume
  • Predictable latency
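
A minimal sketch of what lock-free programming looks like (my illustration, not code from the talk): shared state is updated with a compare-and-swap retry loop instead of a lock, so a stalled thread can never block the others.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of lock-free programming (not from the talk): update shared state with a
// compare-and-swap retry loop rather than a lock, so no thread ever blocks another.
public class LockFreeCounter {
    private final AtomicLong value = new AtomicLong();

    public long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;   // success: no lock was ever taken
            }
            // another thread won the race; retry with the fresh value
        }
    }
}
```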

Hydra

  • System built on OSS
  • Opinionated framework to accelerate app dev
  • Clients communicate with stateless, scalable gateways
  • Persistors – manage data in memory.
  • Gateway – converts large text message to something smaller and more efficient
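
A rough sketch of the kind of translation the gateway does. The message layout below is entirely hypothetical (Hydra’s real wire format wasn’t in my notes); it just shows a verbose text order becoming a compact fixed-size binary message.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical gateway step (not Hydra's actual encoding): re-encode a text order
// such as "BUY,ACME,100,25.50" into a small fixed-size binary message.
public class OrderEncoder {
    public static ByteBuffer encode(String textOrder) {
        String[] parts = textOrder.split(",");
        byte side = parts[0].equals("BUY") ? (byte) 0 : (byte) 1;
        int symbolId = parts[1].hashCode();           // stand-in for a real symbol table lookup
        int quantity = Integer.parseInt(parts[2]);
        long priceMicros = Math.round(Double.parseDouble(parts[3]) * 1_000_000L);

        ByteBuffer buffer = ByteBuffer.allocate(17).order(ByteOrder.LITTLE_ENDIAN);
        buffer.put(side);            // 1 byte
        buffer.putInt(symbolId);     // 4 bytes
        buffer.putInt(quantity);     // 4 bytes
        buffer.putLong(priceMicros); // 8 bytes
        buffer.flip();
        return buffer;               // 17 bytes instead of a 20+ character string
    }
}
```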

Design choices

  • Replay logs to reapply changes. Business logic must be fully deterministic, which gives bounded recovery times (see the sketch after this list)
  • Placement group in cloud – machines guaranteed to be near each other. Minimizes latency between nodes
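
A minimal sketch of the replay idea (my illustration, not Hydra’s code): if the business logic is a pure, deterministic function of the ordered event log, recovery is just re-feeding the log into fresh state.

```java
import java.util.List;

// Sketch of deterministic replay (not Hydra's actual code): state is rebuilt by
// re-applying the ordered event log. This only works if apply() is fully
// deterministic -- no wall-clock time, randomness, or external I/O inside it.
public class AccountState {
    private long balance;

    public void apply(long deltaEvent) {
        balance += deltaEvent;       // pure function of previous state + event
    }

    public static AccountState recover(List<Long> eventLog) {
        AccountState state = new AccountState();
        for (long event : eventLog) {
            state.apply(event);      // same log in, same state out -> bounded recovery
        }
        return state;
    }

    public long balance() {
        return balance;
    }
}
```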

Testing latency

  • Do as part of CD pipeline
  • Can’t physically monitor with a fiber tap
  • Capture latencies in a histogram to get a statistical view and calculate statistics (see the sketch after this list)
  • Test under load
  • Fan out where you test from
  • Store percentiles in time series data
  • Can see jitter from garbage collection
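
For the histogram capture, a common choice in this space is Gil Tene’s HdrHistogram library (the talk’s exact tooling isn’t in my notes). A sketch of recording latencies and reading percentiles instead of a misleading average:

```java
import org.HdrHistogram.Histogram;

// Sketch using HdrHistogram (an assumed but common choice for latency capture):
// record each measured latency, then report percentiles rather than an average.
public class LatencyCapture {
    // track values from 1 ns up to 1 second, with 3 significant digits
    private final Histogram histogram = new Histogram(1_000_000_000L, 3);

    public void record(long latencyNanos) {
        histogram.recordValue(latencyNanos);
    }

    public void report() {
        System.out.printf("p50=%dns p99=%dns p99.99=%dns max=%dns%n",
                histogram.getValueAtPercentile(50.0),
                histogram.getValueAtPercentile(99.0),
                histogram.getValueAtPercentile(99.99),
                histogram.getMaxValue());
    }
}
```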

Performance on shared box/cloud

  • Not in control of resources running on
  • Containers share L3 cache so can see higher rates of cache miss
  • CPU throttling effects
  • Hard to measure since you can’t see what your neighbors are doing (see the sketch after this list)
  • One option is to rent the largest box possible and compare against the vendor website’s specs. If you have the max # of cores, you know you have the box to yourself. Expensive – was about five dollars a year. At that price, it might be worth just buying your own machine in a data center
  • Can pack non latency services onto shared machines
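
One simple way to expose the noisy-neighbor/throttling jitter mentioned above (my sketch, not from the talk): time a tight loop and look at the largest gaps between successive timestamps.

```java
// Sketch (not from the talk): surface scheduling/throttling jitter on a shared box
// by timing a tight loop; unusually large gaps between successive timestamps mean
// the thread was descheduled, throttled, or disturbed by a noisy neighbor.
public class JitterProbe {
    public static void main(String[] args) {
        long worstGapNanos = 0;
        long previous = System.nanoTime();
        for (int i = 0; i < 50_000_000; i++) {
            long now = System.nanoTime();
            long gap = now - previous;
            if (gap > worstGapNanos) {
                worstGapNanos = gap;
                System.out.printf("new worst pause: %d us at iteration %d%n", gap / 1_000, i);
            }
            previous = now;
        }
        System.out.printf("worst observed pause: %d us%n", worstGapNanos / 1_000);
    }
}
```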

<missed some at the end. I got an email that distracted me>

My impressions

There was a lot of discussion about the histogram. I would have liked to see some examples rather than just talking about how it is calculated. They didn’t have to be real examples to be useful. There were some interesting facts and it was a good case study, so I’m glad I went. I was glad he addressed that non-cloud is a possible option for this scenario.