Daemon sets
- Introduction
- Daemon sets in practice
- Creating a daemon set
- Creating the YAML file for our daemon set
- "Casting" a resource to another
- Understanding the problem
- Use the `--force`, Luke
- Checking what we've done
- `deploy/rng` and `ds/rng`
- Too many pods
- Is this working?
- Labels and selectors
- Selector evaluation
- Where do labels come from?
- Updating load balancer configuration
- Selectors for replica sets and daemon sets
- Isolation of replica sets and daemon sets
- Removing a pod from the load balancer
- Complex selectors
- The plan
- Adding labels to pods
- Updating the service selector
- When the YAML parser is being too smart
- Updating the service selector, take 2
- Updating labels
- Removing a pod from the load balancer
- Updating the daemon set
- We've put resources in your resources
- Labels and debugging
- Labels and advanced rollout control
Introduction
- We want to scale `rng` in a way that is different from how we scaled `worker`
- We want one (and exactly one) instance of `rng` per node
- What if we just scale up `deploy/rng` to the number of nodes?
  - nothing guarantees that the `rng` containers will be distributed evenly
  - if we add nodes later, they will not automatically run a copy of `rng`
  - if we remove (or reboot) a node, one `rng` container will restart elsewhere
- Instead of a `deployment`, we will use a `daemonset`
Daemon sets in practice
Daemon sets are great for cluster-wide, per-node processes:

- `kube-proxy`
- `weave` (our overlay network)
- monitoring agents
- hardware management tools (e.g. SCSI/FC HBA agents)
- etc.
They can also be restricted to run only on some nodes
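For instance, a daemon set can be pinned to a subset of nodes with a `nodeSelector` in its pod template. A minimal sketch (the `ssd=true` node label is a hypothetical example):

    spec:
      template:
        spec:
          # only nodes carrying the label ssd=true will run this daemon set's pods
          nodeSelector:
            ssd: "true"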
Creating a daemon set
- Unfortunately, as of Kubernetes 1.14, the CLI cannot create daemon sets
- More precisely: it doesn't have a subcommand to create a daemon set
But any kind of resource can always be created by providing a YAML description:
kubectl apply -f foo.yaml
How do we create the YAML file for our daemon set?
- option 1: read the docs
- option 2: `vi` our way out of it
Creating the YAML file for our daemon set
- Let's start with the YAML file for the current `rng` resource

Exercise

- Dump the `rng` resource in YAML:
  kubectl get deploy/rng -o yaml --export > rng.yml
- Edit `rng.yml`

Note: `--export` will remove "cluster-specific" information, i.e.:
- namespace (so that the resource is not tied to a specific namespace)
- status and creation timestamp (useless when creating a new resource)
- resourceVersion and uid (these would cause... interesting problems)
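Note: `--export` was deprecated in Kubernetes 1.14 and removed in later releases. If it is not available, we can dump the resource without it and delete those cluster-specific fields by hand:

    kubectl get deploy/rng -o yaml > rng.yml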
"Casting" a resource to another
What if we just changed the `kind` field?

(It can't be that easy, right?)
Exercise
- Change `kind: Deployment` to `kind: DaemonSet`
- Save, quit
- Try to create our new resource:
  kubectl apply -f rng.yml
We all knew this couldn't be that easy, right?
Understanding the problem
The core of the error is:
error validating data: [ValidationError(DaemonSet.spec): unknown field "replicas" in io.k8s.api.extensions.v1beta1.DaemonSetSpec, ...
Obviously, it doesn't make sense to specify a number of replicas for a daemon set
Workaround: fix the YAML

- remove the `replicas` field
- remove the `strategy` field (which defines the rollout mechanism for a deployment)
- remove the `progressDeadlineSeconds` field (also used by the rollout mechanism)
- remove the `status: {}` line at the end
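After these edits, the remaining YAML should look roughly like this (a sketch only: the image reference and some fields will differ in your own dump):

    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      labels:
        app: rng
      name: rng
    spec:
      selector:
        matchLabels:
          app: rng
      template:
        metadata:
          labels:
            app: rng
        spec:
          containers:
          - image: dockercoins/rng:v0.1   # your image reference will differ
            name: rng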
Or, we could also ...
Use the `--force`, Luke

- We could also tell Kubernetes to ignore these errors and try anyway
- The `--force` flag's actual name is `--validate=false`
Exercise
- Try to load our YAML file and ignore errors:
  kubectl apply -f rng.yml --validate=false
🎩✨🐇
Wait ... Now, can it be that easy?
Checking what we've done
- Did we transform our `deployment` into a `daemonset`?
Exercise
- Look at the resources that we have now:
  kubectl get all
We have two resources called `rng`:
the deployment that was existing before
the daemon set that we just created
We also have one too many pods.
(The pod corresponding to the deployment still exists.)
`deploy/rng` and `ds/rng`

- You can have different resource types with the same name
  (i.e. a deployment and a daemon set both named `rng`)
- We still have the old `rng` deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/rng 1 1 1 1 18m
- But now we have the new `rng` daemon set as well
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/rng 2 2 2 2 2 <none> 9s
Too many pods
- If we check with `kubectl get pods`, we see:
  - one pod for the deployment (named `rng-xxxxxxxxxx-yyyyy`)
  - one pod per node for the daemon set (named `rng-zzzzz`)
NAME READY STATUS RESTARTS AGE
rng-54f57d4d49-7pt82 1/1 Running 0 11m
rng-b85tm 1/1 Running 0 25s
rng-hfbrr 1/1 Running 0 25s
[...]
The daemon set created one pod per node, except on the master node.
The master node has taints preventing pods from running there.
(To schedule a pod on this node anyway, the pod will require appropriate tolerations.)
(Off by one? We don't run these pods on the node hosting the control plane.)
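For the record, a toleration that would allow these pods on the master looks like this (a sketch: the exact taint key depends on how the cluster was set up; this is the usual kubeadm one). It would go in the pod template of the daemon set:

    spec:
      template:
        spec:
          tolerations:
          # tolerate the taint that kubeadm places on the control plane node
          - key: node-role.kubernetes.io/master
            effect: NoSchedule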
Is this working?
- Look at the web UI
- The graph should now go above 10 hashes per second!
- It looks like the newly created pods are serving traffic correctly
- How and why did this happen?
  (We didn't do anything special to add them to the `rng` service load balancer!)
Labels and selectors
- The `rng` service is load balancing requests to a set of pods
- That set of pods is defined by the selector of the `rng` service

Exercise

- Check the selector in the `rng` service definition:
  kubectl describe service rng

- The selector is `app=rng`
- It means "all the pods having the label `app=rng`"
  (They can have additional labels as well, that's OK!)
Selector evaluation
- We can use selectors with many `kubectl` commands
- For instance, with `kubectl get`, `kubectl logs`, `kubectl delete` ... and more

Exercise

- Get the list of pods matching selector `app=rng`:
  kubectl get pods -l app=rng
  kubectl get pods --selector app=rng

But ... why do these pods (in particular, the new ones) have this `app=rng` label?
Where do labels come from?
- When we create a deployment with `kubectl create deployment rng`,
  this deployment gets the label `app=rng`
- The replica sets created by this deployment also get the label `app=rng`
- The pods created by these replica sets also get the label `app=rng`
- When we created the daemon set from the deployment, we re-used the same spec
- Therefore, the pods created by the daemon set get the same labels

Note: when we use `kubectl run stuff`, the label is `run=stuff` instead.
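We can see this label inheritance for ourselves (an optional check; `--show-labels` adds a LABELS column to the output):

    kubectl get deployments,replicasets,daemonsets,pods -l app=rng --show-labels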
Updating load balancer configuration
- We would like to remove a pod from the load balancer
- What would happen if we removed that pod, with `kubectl delete pod ...`?
  It would be re-created immediately (by the replica set or the daemon set)
- What would happen if we removed the `app=rng` label from that pod?
  It would also be re-created immediately

Why?!?
Selectors for replica sets and daemon sets
The "mission" of a replica set is:
"Make sure that there is the right number of pods matching this spec!"
The "mission" of a daemon set is:
"Make sure that there is a pod matching this spec on each node!"
In fact, replica sets and daemon sets do not check pod specifications
They merely have a selector, and they look for pods matching that selector
Yes, we can fool them by manually creating pods with the "right" labels
- Bottom line: if we remove our `app=rng` label ...
  ... the pod "disappears" as far as its parent is concerned, and the parent re-creates another pod to replace it
Isolation of replica sets and daemon sets
- Since both the `rng` daemon set and the `rng` replica set use `app=rng` ...
  ... why don't they "find" each other's pods?
- Replica sets have a more specific selector, visible with `kubectl describe`
  (It looks like `app=rng,pod-template-hash=abcd1234`)
- Daemon sets also have a more specific selector, but it's invisible
  (It looks like `app=rng,controller-revision-hash=abcd1234`)
- As a result, each controller only "sees" the pods it manages
Removing a pod from the load balancer
- Currently, the `rng` service is defined by the `app=rng` selector
- The only way to remove a pod is to remove or change the `app` label
- ... But that will cause another pod to be created instead!
- What's the solution?
- We need to change the selector of the `rng` service!
- Let's add another label to that selector (e.g. `enabled=yes`)
Complex selectors
If a selector specifies multiple labels, they are understood as a logical AND
(In other words: the pods must match all the labels)
Kubernetes has support for advanced, set-based selectors
(But these cannot be used with services, at least not yet!)
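For reference, here is what a set-based selector looks like, e.g. in a deployment or daemon set spec (a sketch; the label values are hypothetical):

    selector:
      matchExpressions:
      # match pods whose app label is either rng or hasher ...
      - key: app
        operator: In
        values: [rng, hasher]
      # ... and that carry an enabled label (whatever its value)
      - key: enabled
        operator: Exists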
The plan
1. Add the label `enabled=yes` to all our `rng` pods
2. Update the selector for the `rng` service to also include `enabled=yes`
3. Toggle traffic to a pod by manually adding/removing the `enabled` label
4. Profit!
Note: if we swap steps 1 and 2, it will cause a short service disruption, because there will be a period of time during which the service selector won't match any pod. During that time, requests to the service will time out. By doing things in the order above, we guarantee that there won't be any interruption.
Adding labels to pods
- We want to add the label `enabled=yes` to all pods that have `app=rng`
- We could edit each pod one by one with `kubectl edit` ...
- ... Or we could use `kubectl label` to label them all
- `kubectl label` can use selectors itself
Exercise
- Add `enabled=yes` to all pods that have `app=rng`:
  kubectl label pods -l app=rng enabled=yes
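To double-check that every pod picked up the new label (an optional sanity check; the list should match the one from `kubectl get pods -l app=rng`):

    kubectl get pods -l app=rng,enabled=yes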
Updating the service selector
- We need to edit the service specification
- Reminder: in the service definition, we will see `app: rng` in two places:
  - the label of the service itself (we don't need to touch that one)
  - the selector of the service (that's the one we want to change)
Exercise
- Update the service to add `enabled: yes` to its selector:
  kubectl edit service rng
... And then we get the weirdest error ever. Why?
When the YAML parser is being too smart
YAML parsers try to help us:

- `xyz` is the string `"xyz"`
- `42` is the integer `42`
- `yes` is the boolean value `true`

If we want the string `"42"` or the string `"yes"`, we have to quote them

So we have to use `enabled: "yes"`
For a good laugh: if we had used "ja", "oui", "si" ... as the value, it would have worked!
Updating the service selector, take 2
Exercise
- Update the service to add `enabled: "yes"` to its selector:
  kubectl edit service rng
This time it should work!
If we did everything correctly, the web UI shouldn't show any change.
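For reference, the selector part of the service should now look like this (a sketch of the relevant fields only):

    spec:
      selector:
        app: rng
        enabled: "yes"   # quoted, so YAML parses it as a string, not a boolean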
Updating labels
- We want to disable the pod that was created by the deployment
- All we have to do is remove the `enabled` label from that pod
- To identify that pod, we can use its name
- ... Or rely on the fact that it's the only one with a `pod-template-hash` label

Good to know:

- `kubectl label ... foo=` doesn't remove a label (it sets it to an empty string)
- to remove label `foo`, use `kubectl label ... foo-`
- to change an existing label, we would need to add `--overwrite`
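A minimal illustration of these three forms (the pod name is a hypothetical placeholder):

    kubectl label pod rng-xxxxx foo=                  # sets foo to the empty string
    kubectl label pod rng-xxxxx foo-                  # removes the label foo
    kubectl label pod rng-xxxxx foo=bar --overwrite   # changes an existing value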
Removing a pod from the load balancer
Exercise
- In one window, check the logs of that pod:
  POD=$(kubectl get pod -l app=rng,pod-template-hash -o name)
  kubectl logs --tail 1 --follow $POD
(We should see a steady stream of HTTP logs)
In another window, remove the label from the pod:
kubectl label pod -l app=rng,pod-template-hash enabled-
(The stream of HTTP logs should stop immediately)
There might be a slight change in the web UI (since we removed a bit of capacity from the `rng` service). If we remove more pods, the effect should be more visible.
Updating the daemon set
- If we scale up our cluster by adding new nodes, the daemon set will create more pods
- These pods won't have the `enabled=yes` label
- If we want these pods to have that label, we need to edit the daemon set spec
- We can do that with e.g. `kubectl edit daemonset rng`
We've put resources in your resources
Reminder: a daemon set is a resource that creates more resources!
There is a difference between:
- the label(s) of a resource (in the `metadata` block at the beginning)
- the selector of a resource (in the `spec` block)
- the label(s) of the resource(s) created by the first resource (in the `template` block)
We would need to update the selector and the template
(metadata labels are not mandatory)
The template must match the selector
(i.e. the resource will refuse to create resources that it will not select)
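Putting it together, the parts of the daemon set to update would look roughly like this (a sketch; note that the selector and the template labels must agree):

    spec:
      selector:
        matchLabels:
          app: rng
          enabled: "yes"
      template:
        metadata:
          labels:
            app: rng
            enabled: "yes"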
Labels and debugging
When a pod is misbehaving, we can delete it: another one will be recreated
But we can also change its labels
It will be removed from the load balancer (it won't receive traffic anymore)
Another pod will be recreated immediately
But the problematic pod is still here, and we can inspect and debug it
We can even re-add it to the rotation if necessary
(Very useful to troubleshoot intermittent and elusive bugs)
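As an illustration, here is one way to quarantine a pod (using one of the pod names from the earlier listing; yours will differ). Changing the `app` label takes the pod out of the load balancer and makes its parent create a replacement, while the original pod keeps running:

    # the pod leaves the load balancer; its parent creates a replacement
    kubectl label pod rng-b85tm --overwrite app=rng-debug
    # the quarantined pod is still running; inspect it at leisure
    kubectl logs rng-b85tm
    kubectl exec -ti rng-b85tm -- sh
    # if needed, put it back in the rotation
    kubectl label pod rng-b85tm --overwrite app=rng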
Labels and advanced rollout control
Conversely, we can add pods matching a service's selector
These pods will then receive requests and serve traffic
Examples:
one-shot pod with all debug flags enabled, to collect logs
pods created automatically, but added to rotation in a second step
(by setting their label accordingly)
This gives us building blocks for canary and blue/green deployments
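For example, a bare-bones blue/green switch could be built on the `enabled` label we set up earlier (the `version` labels here are hypothetical):

    # add the new ("green") pods to the rotation ...
    kubectl label pods -l app=rng,version=green enabled=yes
    # ... then remove the old ("blue") pods from it
    kubectl label pods -l app=rng,version=blue enabled-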