I’ve finished migrating all of my personal infrastructure to a three-Linode HA Kubernetes cluster running:
- Web Server
- Blog (Hugo)
- Kubeflow (TensorFlow abstraction)
First off, why Kubernetes? Why OpenShift?
I’ve known about Kubernetes for years but held off on deploying it due to its complexity (which I’ll get into below).
Kubernetes is becoming more and more ubiquitous. It’s taking over on every front: web applications, IoT, serverless, and data science.
The biggest reason I eventually decided to deploy it for myself was the machine learning and data science work I wanted to get into. I went straight for Jupyter and found that running those workloads on Kubernetes is well supported, so I wanted to get hands-on with it.
A bonus is that the other services I run can now also become HA, if done correctly.
A couple of big misconceptions that I want to address are:
- You get HA for free. In reality, Kubernetes just provides a framework for achieving HA, and there’s some work to get there.
- Stateful services are a bad idea in Kubernetes. This is also false. It’s like saying that running a stateful service with systemd is a bad idea. One way to look at Kubernetes is as a distributed init system for containers. It provides good abstractions for dealing with persistence, like StatefulSets, so you can always get to your data even after the containers are stopped. It was Docker that made the idea of state in containers look bad, but that issue has mostly been resolved.
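As an illustration of that abstraction, here is a minimal StatefulSet sketch (the name, image, and storage size are placeholders, not my actual config); the volumeClaimTemplates give each replica its own PersistentVolumeClaim, which outlives any individual container:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql                # placeholder name
spec:
  serviceName: mysql
  replicas: 2
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.7   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:      # one PVC per replica; survives pod restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi    # placeholder size
```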
Kubernetes is quite the beast and I wanted it all to be HA. I also didn’t want to pay a lot of money, and $15.00 per month isn’t terrible. HA on Linode is pretty important because from time to time Linodes require reboots. This happens when they increase the amount of available resources in your plan or when they need to install security patches (Meltdown/Spectre). Ideally, if a Linode goes down, Kubernetes has two other nodes to reschedule the work onto, but I have yet to prove that this works in practice.
To reach public HTTP services I used Kubernetes Ingress rules to expose them under clean URLs, and added each Linode’s IP to the same A record (dyll.in). You can see all of these IPs by running:
$ host -t a dyll.in
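As an illustration, an Ingress rule of the kind mentioned above might look something like this sketch (the hostname is mine, but the resource and service names are placeholders, and I’m using the current networking.k8s.io/v1 API shape):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blog                 # placeholder name
spec:
  rules:
    - host: dyll.in
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blog   # placeholder service name
                port:
                  number: 80
```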
Web browsers will generally try each IP address, so if one IP is down the browser moves on to another. Another method would be for the DNS server to shuffle the IPs after each query, achieving a sort of pseudo load balancing. That would only help clients that don’t know how to try each IP themselves, and there’s also the issue of DNS caching, so it might be all for naught.
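That browser-style fallback can be sketched in a few lines of Python; `first_reachable` and the injected `try_connect` are illustrative names for the sake of the example, not anything a real browser exposes:

```python
# Sketch of the fallback described above: try each A record in order until
# one accepts a connection. `try_connect` is injected so the logic can be
# exercised without real network access; the IPs are documentation addresses.
def first_reachable(ips, try_connect):
    """Return the first IP that accepts a connection, or None."""
    for ip in ips:
        if try_connect(ip):
            return ip
    return None

if __name__ == "__main__":
    # Simulate one Linode being down: its address refuses connections.
    down = {"203.0.113.10"}
    ips = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
    print(first_reachable(ips, lambda ip: ip not in down))
```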
For Kubernetes itself, I was able to achieve HA by running a Worker and a Master on the same Linode, separated using unprivileged LXC containers under different Linux users.
On top of that, I also created SELinux contexts for each user, so that if something breaks out of the Worker container there is another layer of security to, hopefully, prevent that user from accessing anything in the Master container (and vice versa).
The only ways to operate the Worker and Master nodes are through the Kubernetes API with kubectl or with a privileged user on the Linode, which right now is just me.
As far as networking goes, this had to be thought out pretty meticulously. Pods need to reach each other, and there are many layers to go through: each Linode has an IP, each LXC container has an IP, and each pod has an IP. This means a lot of gateways and route configuration to get things like MySQL replication to work.
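To make the layering concrete, here is a small Python sketch that generates the kind of routes each host needs so its pods can reach pods behind the other hosts; the function name, addresses, and CIDRs are all made up for illustration:

```python
# Each host serves a pod CIDR behind its LXC containers; for cross-host
# traffic (e.g. MySQL replication) every host needs a route to each remote
# pod CIDR via the host that serves it. Addresses below are examples only.
def pod_routes(local_host, topology):
    """Generate the `ip route` commands one host needs for remote pod CIDRs.

    topology maps host IP -> pod CIDR served behind that host.
    """
    return [
        f"ip route add {cidr} via {host}"
        for host, cidr in topology.items()
        if host != local_host
    ]

if __name__ == "__main__":
    topology = {
        "192.0.2.1": "10.244.1.0/24",
        "192.0.2.2": "10.244.2.0/24",
        "192.0.2.3": "10.244.3.0/24",
    }
    for cmd in pod_routes("192.0.2.1", topology):
        print(cmd)
```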
All of this is fully automated with Ansible and you can find the code here. It does these things:
- Spins up the Linodes
- Configures a base role on each Linode:
  a. k8-worker user
  b. k8-master user
  c. Debugging tools
  d. Installs SELinux and snappy
- Installs LXD using snappy
- Sets up LXC containers under the correct users
- Connects to the containers and installs kubelet/docker
- Sets up routing/networking
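Roughly, the play structure behind those steps could look like this sketch (the role names and host groups here are hypothetical, not taken from the real repository):

```yaml
- hosts: linodes
  roles:
    - base       # k8-worker/k8-master users, debugging tools, SELinux, snappy
    - lxd        # install LXD via snap, create LXC containers per user
    - routing    # inter-container and pod routes
- hosts: k8s_containers
  roles:
    - kubelet    # kubelet + docker inside each LXC container
```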
At this point we have a working Kubernetes cluster and Ansible will configure the cluster and install the given packages into the cluster.
Because it’s all automated and I’m able to replicate data between nodes, I found it unnecessary to set up backups! That just means I get to save a little bit of money. For archiving and management, I have a home server that can reach the cluster through a WireGuard VPN.