Troubleshooting overlay networks

- Troubleshooting overlay networks
- Finding the real cause of the bottleneck
- Breaking into an overlay network
- Entering the debug container
- Labels
- Installing our debugging tools
- Investigating the `rng` service
- Investigating the VIP
- What if I don't like VIPs?
- Looking up VIP backends
- Testing and benchmarking our service
- Benchmarking `rng`
- Benchmark results for `rng`
- Benchmarking `hasher`
- Benchmarking `hasher`
- Benchmark results for `hasher`
- Why does everything take (at least) 100ms?
- Why did we sprinkle the code with sleeps?
- Why do `rng` and `hasher` behave differently?
- Global scheduling → global debugging
- More about overlay networks
Troubleshooting overlay networks

- We want to run tools like `ab` or `httping` on the internal network
- Ah, if only we had created our overlay network with the `--attachable` flag ... (see the sketch below)
- Oh well, let's use this as an excuse to introduce New Ways To Do Things
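For the record, here is what the `--attachable` approach would have looked like. This is a sketch, assuming we had created the network ourselves instead of letting the stack create it:

```bash
# Create an attachable overlay network (this is what we *didn't* do):
docker network create --driver overlay --attachable dockercoins_default

# Any standalone container could then join it directly:
docker run -ti --rm --network dockercoins_default alpine sh
```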
Breaking into an overlay network

- We will create a dummy placeholder service on our network
- Then we will use `docker exec` to run more processes in this container
- Start a "do nothing" container using our favorite Swiss-Army distro:

```bash
docker service create --network dockercoins_default --name debug \
       --constraint node.hostname==$HOSTNAME alpine sleep 1000000000
```

The constraint makes sure that the container will be created on the local node.
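To double-check that the constraint worked, we can ask Swarm where the task landed:

```bash
docker service ps debug
```

The NODE column should show the local node.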
Entering the debug container

- Once our container is started (which should be really fast because the alpine image is small), we can enter it (from any node)
- Locate the container:

```bash
docker ps
```

- Enter it:

```bash
docker exec -ti containerID sh
```
Labels

- We can also be fancy and find the ID of the container automatically
- SwarmKit places labels on containers (see below for how to inspect them)
- Get the ID of the container:

```bash
CID=$(docker ps -q --filter label=com.docker.swarm.service.name=debug)
```

- And enter the container:

```bash
docker exec -ti $CID sh
```
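If you are curious about the exact labels, you can dump them from the container metadata. A quick check (the label set may vary across Docker versions):

```bash
docker inspect --format '{{ json .Config.Labels }}' $CID
```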
Installing our debugging tools

- Ideally, you would author your own image, with all your favorite tools, and use it instead of the base `alpine` image
- But we can also dynamically install whatever we need
- Install a few tools:

```bash
apk add --update curl apache2-utils drill
```
Investigating the `rng` service

- First, let's check what `rng` resolves to
- Use drill or nslookup to resolve `rng`:

```bash
drill rng
```

This gives us one IP address. It is not the IP address of a container. It is a virtual IP address (VIP) for the `rng` service.
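We can cross-check this from a node (outside the debug container) by inspecting the service. The VIP we resolved should show up among the service's virtual IPs, given that the stack names the service `dockercoins_rng`:

```bash
docker service inspect --format '{{ json .Endpoint.VirtualIPs }}' dockercoins_rng
```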
Investigating the VIP

- Try to ping the VIP:

```bash
ping -c 3 rng
```

- It should ping. (But this might change in the future.)
- With Engine 1.12: VIPs respond to ping if a backend is available on the same machine.
- With Engine 1.13: VIPs respond to ping if a backend is available anywhere.
- (Again: this might change in the future.)
What if I don't like VIPs?

- Services can be published using two modes: VIP and DNSRR.
- With VIP, you get a virtual IP for the service, and a load balancer based on IPVS.
- (By the way, IPVS is totally awesome and if you want to learn more about it in the context of containers, I highly recommend this talk by @kobolog at DC15EU!)
- With DNSRR, you get the former behavior (from Engine 1.11), where resolving the service yields the IP addresses of all the containers for this service.
- You change this with `docker service create --endpoint-mode [vip|dnsrr]`.
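For example, a hypothetical debug service using DNSRR mode (the service name here is a placeholder) could be created like this:

```bash
docker service create --name debug-dnsrr --endpoint-mode dnsrr \
       --network dockercoins_default alpine sleep 1000000000
```

Resolving `debug-dnsrr` would then directly yield one A record per container, instead of a single VIP.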
Looking up VIP backends

- You can also resolve a special name: `tasks.<name>`
- It will give you the IP addresses of the containers for a given service
- Obtain the IP addresses of the containers for the `rng` service:

```bash
drill tasks.rng
```

This should list 5 IP addresses.
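Since this gives us the individual backends, we can also bypass the VIP and query each container directly. A rough sketch, assuming drill's usual output format (one A record per line in the answer section):

```bash
# Extract the A records from drill's answer section, then hit each backend
for ip in $(drill tasks.rng | awk '!/^;;/ && $4 == "A" { print $5 }'); do
  echo -n "$ip => "
  curl -s http://$ip/8 | wc -c
done
```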
Testing and benchmarking our service

- We will send a test request to the `rng` service, then benchmark it with `ab`
- Make a test request to the service:

```bash
curl rng
```

- Open another window, and stop the workers, to test in isolation:

```bash
docker service update dockercoins_worker --replicas 0
```

Wait until the workers are stopped (check with `docker service ls`) before continuing.
Benchmarking `rng`

We will send 50 requests, but with various levels of concurrency.

- Send 50 requests, with a single sequential client:

```bash
ab -c 1 -n 50 http://rng/10
```

- Send 50 requests, with fifty parallel clients:

```bash
ab -c 50 -n 50 http://rng/10
```
Benchmark results for `rng`

- When serving requests sequentially, they each take 100ms.
- In the parallel scenario, the latency increased dramatically.
- What about `hasher`?
Benchmarking `hasher`

- We will do the same tests for `hasher`.
- The command is slightly more complex, since we need to post random data.
- First, we need to put the POST payload in a temporary file.
- Generate 10 bytes of random data:

```bash
curl http://rng/10 >/tmp/random
```
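We can quickly verify the payload size:

```bash
wc -c /tmp/random
```

It should report 10 (bytes).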
Benchmarking `hasher`

Once again, we will send 50 requests, with different levels of concurrency.

- Send 50 requests with a sequential client:

```bash
ab -c 1 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
```

- Send 50 requests with 50 parallel clients:

```bash
ab -c 50 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
```
Benchmark results for `hasher`

- The sequential benchmark takes ~5 seconds to complete.
- The parallel benchmark takes less than 1 second to complete.
- In both cases, each request takes a bit more than 100ms to complete.
- Requests are a bit slower in the parallel benchmark.
- It looks like `hasher` is better equipped to deal with concurrency than `rng`.
- Why?
Why does everything take (at least) 100ms?

Look at the `rng` and `hasher` code: each request handler deliberately sleeps for 100ms before responding.

But ... WHY?!?
Why did we sprinkle the code with sleeps?

- Deterministic performance (regardless of instance speed, CPUs, I/O...)
- Actual code sleeps all the time anyway
- When your code makes a remote API call:
  - it sends a request;
  - it sleeps until it gets the response;
  - it processes the response.
Why do `rng` and `hasher` behave differently?

(Synchronous vs. asynchronous event processing)
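The impact of synchronous processing on sleep-bound requests is easy to reproduce in a shell. A toy illustration (not the services' actual code):

```bash
# Sequential: 50 sleeps of 100ms add up to ~5 seconds
time sh -c 'for i in $(seq 50); do sleep 0.1; done'

# Concurrent: all 50 sleeps overlap, total time is roughly one sleep
time sh -c 'for i in $(seq 50); do sleep 0.1 & done; wait'
```

This matches what we measured: `rng` processes requests one at a time (sequential), while `hasher` handles them concurrently.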
Global scheduling → global debugging

Traditional approach:
- log into a node
- install our Swiss Army Knife (if necessary)
- troubleshoot things

Proposed alternative:
- put our Swiss Army Knife in a container (e.g. nicolaka/netshoot)
- run tests from multiple locations at the same time

(This becomes very practical with the `docker service logs` command, available since 17.05.)
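For instance, a hypothetical global debug service (the service name is a placeholder) would put nicolaka/netshoot on every node at once:

```bash
docker service create --name netshoot --mode global \
       --network dockercoins_default nicolaka/netshoot sleep 1000000000
```

We can then `docker exec` into the local replica on each node and run tests from all of them at the same time.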
More about overlay networks

- DC17US: Deep Dive in Docker Overlay Networks (video)
- DC17EU: Deeper Dive in Docker Overlay Networks (video)