## Introduction

• By default (without rolling updates), when a scaled resource is updated:

• new pods are created

• old pods are terminated

• ... all at the same time

• if something goes wrong, ¯\_(ツ)_/¯

• With rolling updates, when a resource is updated, it happens progressively

• Two parameters determine the pace of the rollout: maxUnavailable and maxSurge

• They can be specified in absolute number of pods, or percentage of the replicas count

• At any given time ...

• there will always be at least replicas-maxUnavailable pods available

• there will never be more than replicas+maxSurge pods in total

• there will therefore be up to maxUnavailable+maxSurge pods being updated

• We have the possibility to rollback to the previous version
(if the update fails or is unsatisfactory in any way)
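These two parameters live under `spec.strategy` of the resource. A minimal sketch for a deployment, showing the default values (for deployments, both default to 25%):

```yaml
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # can also be an absolute number of pods, e.g. 2
      maxSurge: 25%         # can also be an absolute number of pods, e.g. 3
```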

## Checking current rollout parameters

• Recall how we build custom reports with kubectl and jq:

### Exercise

• Show the rollout plan for our deployments:

  kubectl get deploy -o json |
    jq ".items[] | {name:.metadata.name} + .spec.strategy.rollingUpdate"
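If the strategy hasn't been customized, the report typically looks something like this for each deployment (shown here for the worker deployment):

```
{
  "name": "worker",
  "maxSurge": "25%",
  "maxUnavailable": "25%"
}
```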


• As of Kubernetes 1.8, we can do rolling updates with:

deployments, daemonsets, statefulsets

• Editing one of these resources will automatically result in a rolling update

• Rolling updates can be monitored with the kubectl rollout subcommand

## Building a new version of the worker service

### Exercise

• Go to the stack directory:

cd ~/container.training/stacks

• Edit dockercoins/worker/worker.py; update the first sleep line to sleep 1 second

• Build a new tag and push it to the registry:

#export REGISTRY=localhost:3xxxx
export TAG=v0.2
docker-compose -f dockercoins.yml build
docker-compose -f dockercoins.yml push


## Rolling out the new worker service

### Exercise

• Let's monitor what's going on by opening a few terminals and running:

kubectl get pods -w
kubectl get replicasets -w
kubectl get deployments -w

• Update worker either with kubectl edit, or by running:

kubectl set image deploy worker worker=$REGISTRY/worker:$TAG


That rollout should be pretty quick. What shows up in the web UI?

## Give it some time

• At first, it looks like nothing is happening (the graph remains at the same level)

• According to kubectl get deploy -w, the deployment was updated really quickly

• But kubectl get pods -w tells a different story

• The old pods are still here, and they stay in Terminating state for a while

• Eventually, they are terminated, and then the graph decreases significantly

• This delay is due to the fact that our worker doesn't handle signals

• Kubernetes sends a "polite" shutdown request to the worker, which ignores it

• After a grace period, Kubernetes gets impatient and kills the container

(The grace period is 30 seconds, but can be changed if needed)
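The grace period is configurable per pod with `terminationGracePeriodSeconds`. A minimal sketch of where it goes in a deployment (the value shown is just an example):

```yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 5   # default: 30 seconds
```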

## Rolling out something invalid

• What happens if we make a mistake?

### Exercise

• Update worker by specifying a non-existent image:

export TAG=v0.3
kubectl set image deploy worker worker=$REGISTRY/worker:$TAG

• Check what's going on:

kubectl rollout status deploy worker


Our rollout is stuck. However, the app is not dead.

(After a minute, it will stabilize to be 20-25% slower.)

## What's going on with our rollout?

• Why is our app a bit slower?

• Because MaxUnavailable=25%

... So the rollout terminated 2 replicas out of 10 available

• Okay, but why do we see 5 new replicas being rolled out?

• Because MaxSurge=25%

... So in addition to replacing 2 replicas, the rollout is also starting 3 more

• MaxUnavailable is rounded down (25% of 10 gives 2 pods), while MaxSurge is rounded up (3 pods),
so the total number of pods being rolled out is allowed to reach 25+25=50% (5 pods)

## The nitty-gritty details

• We start with 10 pods running for the worker deployment

• Current settings: MaxUnavailable=25% and MaxSurge=25%

• When we start the rollout:

• two replicas are taken down (as per MaxUnavailable=25%)
• two others are created (with the new version) to replace them
• three others are created (with the new version) as per MaxSurge=25%
• Now we have 8 replicas up and running, and 5 being deployed

• Our rollout is stuck at this point!
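The availability math above can be sketched with simple shell arithmetic (an illustration of the rounding rules, not actual Kubernetes code; the variable names are ours):

```shell
# Values from this rollout: 10 replicas, maxUnavailable=25%, maxSurge=25%
replicas=10
pct=25

# Kubernetes rounds maxUnavailable *down* (conservative on availability)...
max_unavailable=$(( replicas * pct / 100 ))       # 2
# ...and rounds maxSurge *up* (allowing progress):
max_surge=$(( (replicas * pct + 99) / 100 ))      # 3

min_available=$(( replicas - max_unavailable ))   # 8 pods always available
max_total=$(( replicas + max_surge ))             # never more than 13 pods total
max_updating=$(( max_unavailable + max_surge ))   # up to 5 pods being updated

echo "$min_available $max_total $max_updating"    # prints: 8 13 5
```

This matches what we observed: 8 old replicas still running, and 5 new ones being deployed.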

## Checking the dashboard during the bad rollout

If you haven't deployed the Kubernetes dashboard earlier, just skip this slide.

### Exercise

• Check which port the dashboard is on:

kubectl -n kube-system get svc socat


Note the 3xxxx port.

### Exercise

• We have failures in Deployments, Pods, and Replica Sets

## Recovering from a bad rollout

• We could push a valid v0.3 image to the registry

(the pod retry logic will eventually catch it and the rollout will proceed)

• Or we could invoke a manual rollback

### Exercise

• Cancel the deployment and wait for the dust to settle:

kubectl rollout undo deploy worker
kubectl rollout status deploy worker


## Changing rollout parameters

• We want to:

• revert to v0.1
• be conservative on availability (always have desired number of available workers)
• go slow on rollout speed (update only one pod at a time)
• give some time to our workers to "warm up" before starting more

The corresponding changes can be expressed in the following YAML snippet:

spec:
  template:
    spec:
      containers:
      - name: worker
        image: $REGISTRY/worker:v0.1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 10

## Applying changes through a YAML patch

• We could use kubectl edit deployment worker

• But we could also use kubectl patch with the exact YAML shown before

### Exercise

• Apply all our changes and wait for them to take effect:

kubectl patch deployment worker -p "
spec:
  template:
    spec:
      containers:
      - name: worker
        image: $REGISTRY/worker:v0.1
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1