For more QCon posts, see my live blog table of contents. Adrian is from Gilt.
History
- No off-the-shelf software exists to run a flash sale business, so Gilt had to build something custom.
- Started with Ruby on Rails in 2007, which didn’t scale well enough
- Moved to Java in 2011
- Moved to microservices in 2015
- In a 30-day period, moved the bulk of Gilt to Amazon
Problems
- Isolation problem – nobody should be able to take down someone else’s work
- A noon outage in 2013 – what happened
- Impedance mismatch problems. “Developers often think of machines as something that’s all theirs, magically provided by the hardware fairy.”
Machines for Gilt Japan
- Run 20-40 containers per machine.
- Load balancer between two racks of three boxes each.
- Separate machines for the database and email.
- From developer’s point of view, a machine is a machine.
What did Gilt Japan learn
- Scalable by time of day
- Solves impedance mismatch – developers see “a machine”
- Limits damage one person can do
- Infra/Devops engineer embedded into engineering team
- Outstanding potential problems
- Static infrastructure
- Resource hogging
Docker topology
- Dark canary – only for internal use
- Canary – First prod install. Let it run for a while (e.g., through a noon cycle for Gilt)
- Release – Once happy with canary, roll it out to other nodes
- Gilt has a lot of read only traffic which limits damage you can do and reduces need for staging environment.
- Gilt has one container per host/EC2 instance
- Want as few moving parts/risk points as possible in the deployment process
- “We could solve this now, or just wait six months and Amazon will provide a solution”
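The dark canary, canary, release progression above can be sketched roughly as follows. This is a minimal illustration, not Gilt's tooling: the `staged_rollout` function, the `deploy` and `healthy` callbacks, and the choice of one node per canary stage are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

# Stage names taken from the notes; everything else here is hypothetical.
STAGES = ("dark canary", "canary", "release")


def staged_rollout(
    version: str,
    nodes: Sequence[str],
    deploy: Callable[[str, str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy stage by stage and stop at the first unhealthy stage.

    Assumption: the first node is the internal-only dark canary, the second
    is the first production canary, and the rest receive the full release.
    Returns the names of the stages that completed successfully.
    """
    if len(nodes) < 3:
        raise ValueError("need at least one node per stage")

    groups = {
        "dark canary": list(nodes[:1]),
        "canary": list(nodes[1:2]),
        "release": list(nodes[2:]),
    }

    completed: List[str] = []
    for stage in STAGES:
        for node in groups[stage]:
            deploy(node, version)
        # In practice the check would run after a soak period,
        # e.g. once the canary has been through a noon cycle.
        if not all(healthy(node) for node in groups[stage]):
            return completed
        completed.append(stage)
    return completed
```

If the canary node reports unhealthy, the function returns before any release node is touched, which is the point of having the earlier stages.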
Projects
- ION Roller
- Immutable deployment – Destroy original cluster when done with this process for Docker upgrades.
- Slow to set up/tear down environments.
- Can be expensive for continuous deployment
- Open source, but developed in house.
- Nova
- Uses yaml to deploy
- No Docker registry. Base images are on Docker. Releases don’t need to be there, so they go straight to Amazon
- Less boilerplate
- Immutable deployment on mutable infrastructure. Docker container is immutable.
- Fighting bit rot, chaos-monkey style
- Don’t want things to run forever in Prod.
- What if there is a security vulnerability?
- Every day, randomly kill one of the oldest AMIs. This forces a move to the latest AMI with fixes and makes failures surface early.
- Doesn’t solve vulnerability in Docker container. Would need new release with new base image for that. Hasn’t happened to Gilt yet.
- Sundial
- For running batch jobs
- Automatically reschedules a job if it fails
- Define a process – group of tasks with dependencies between them
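The Sundial idea of a process as a group of tasks with dependencies, with failed tasks run again, can be sketched as below. This is not Sundial's API: `run_process`, its arguments, and the immediate-retry behaviour (standing in for real rescheduling) are assumptions for illustration. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple


def run_process(
    tasks: Dict[str, Callable[[], bool]],
    dependencies: Dict[str, Set[str]],
    max_attempts: int = 3,
) -> Tuple[List[str], Dict[str, int]]:
    """Run tasks in dependency order, retrying each failed task.

    tasks maps a task name to a callable that returns True on success.
    dependencies maps a task name to the names that must finish first.
    Returns the execution order and the attempts each task needed.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    # static_order() yields prerequisites before the tasks that need them.
    order = list(TopologicalSorter(graph).static_order())

    attempts: Dict[str, int] = {}
    for name in order:
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            if tasks[name]():
                break
        else:
            # Hypothetical policy: give up on the whole process.
            raise RuntimeError(f"task {name!r} failed after {max_attempts} attempts")
    return order, attempts
```

A real scheduler would likely retry with a delay and persist state between runs; the sketch only shows the ordering and retry logic.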
EC2
- Less configuration
- Automatic rollout
- Integrations
- IAM roles are at instance level, not container level
Using Docker as a local build platform
- Different projects use different versions of build tools
- Docker can be used as a versioned build container.
- A year from now, you will still have everything you need to build and run the code
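One way to picture the versioned build container is a small helper that assembles a `docker run` invocation with a pinned image tag. This is a sketch under assumptions: the `build_command` helper, the `/workspace` mount point, and the image name `example/build-tools` are hypothetical. Only the standard `docker run` flags `--rm`, `-v`, and `-w` are relied on.

```python
import os
from typing import List, Sequence


def build_command(
    image: str,
    tag: str,
    project_dir: str,
    build_args: Sequence[str],
) -> List[str]:
    """Return the argv for running a build inside a pinned tool image.

    Refusing an empty or "latest" tag keeps the toolchain version explicit,
    so the same build can be repeated later with the same tools.
    """
    if not tag or tag == "latest":
        raise ValueError("pin an explicit version tag for the build image")

    source = os.path.abspath(project_dir)
    return [
        "docker", "run", "--rm",           # remove the container when the build ends
        "-v", f"{source}:/workspace",      # mount the project into the container
        "-w", "/workspace",                # run the build from the mounted directory
        f"{image}:{tag}",
        *build_args,
    ]


if __name__ == "__main__":
    # Hypothetical image and build command, printed rather than executed.
    print(" ".join(build_command("example/build-tools", "1.4.2", ".", ["make", "test"])))
```

Keeping the tag in version control next to the project means each project can use its own build tool versions without conflicting with the others.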
Lessons
- Containers let you separate what you deploy from how/where you deploy it
- Still the wild west on how containers are deployed
- Seek immutability in the container, not in the stack
- The competitive advantage for Gilt is being able to deploy quickly/frequently/safely to production, and therefore to innovate faster. Gilt lets engineers deploy whenever they want without asking permission.