Building a Home Lab Kubernetes Cluster with GitOps

This post covers the architecture of my home lab Kubernetes cluster built on Raspberry Pi hardware. The cluster is powered by K3s and fully managed through GitOps using ArgoCD. Every piece of infrastructure is defined in code and reconciled automatically from git.

Hardware Topology

The cluster consists of four Raspberry Pi nodes:

  • Control Planes (server nodes): 10.0.0.31, 10.0.0.32, 10.0.0.33
  • Worker Node: 10.0.0.34
  • Virtual IP (kube-vip): 10.0.0.30

All four nodes run workloads: the control plane nodes are not tainted, so they schedule application pods in addition to running the Kubernetes control plane, which adds useful capacity on a small cluster. kube-vip provides a single virtual IP (10.0.0.30) for the API server in an HA configuration.

Provisioning with K3SUP

The cluster is provisioned using k3sup, a tool that automates K3s installation. On the first control plane, run:

k3sup install \
  --ip "10.0.0.31" \
  --user "rush" \
  --ssh-key "~/path/to/your-ssh-key.pem" \
  --cluster \
  --k3s-version "v1.34.1+k3s1" \
  --tls-san "10.0.0.30" \
  --k3s-extra-args "--disable traefik --disable servicelb --disable local-storage"

Key decisions in the installation:

  1. K3s version: v1.34.1+k3s1 (stable channel). Pinning an explicit release keeps the install reproducible across nodes and rebuilds
  2. TLS SAN: The virtual IP (10.0.0.30) is added as a Subject Alternative Name so the API server certificate is valid when the cluster is reached through the kube-vip VIP
  3. Disabled components: Traefik, ServiceLB (K3s's built-in Klipper load balancer), and local-storage are disabled because we install our own replacements (Traefik via Helm for ingress, MetalLB for load balancing, Longhorn for storage)

After the first control plane is initialized, subsequent control planes and workers join using the k3sup join command. The exact command varies slightly depending on whether you’re joining a control plane or worker node. Here’s an example for joining a control plane:

k3sup join \
  --host "10.0.0.32" \
  --user "rush" \
  --ssh-key "~/path/to/your-ssh-key.pem" \
  --server-host "https://10.0.0.31:6443" \
  --server \
  --k3s-version "v1.34.1+k3s1"

And for worker nodes:

k3sup join \
  --host "10.0.0.34" \
  --user "rush" \
  --ssh-key "~/path/to/your-ssh-key.pem" \
  --server-host "https://10.0.0.31:6443" \
  --k3s-version "v1.34.1+k3s1"

Wait for each node to become ready before proceeding:

kubectl wait node <node-name> --for=condition=Ready --timeout=240s
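
The join steps above can be scripted once the first control plane is up. A minimal dry-run sketch (it prints the commands rather than executing them; pipe each line to sh to actually join nodes):

```shell
# Sketch: build the k3sup join command for one node. The --server flag is
# appended only for control-plane joins; user, key path, and version match
# the examples above.
join_cmd() {
  host="$1"
  role="$2"
  cmd="k3sup join --host $host --user rush --ssh-key ~/path/to/your-ssh-key.pem"
  cmd="$cmd --server-host https://10.0.0.31:6443 --k3s-version v1.34.1+k3s1"
  if [ "$role" = "server" ]; then
    cmd="$cmd --server"
  fi
  echo "$cmd"
}

# Dry run: print the command for each remaining node instead of executing it.
join_cmd 10.0.0.32 server
join_cmd 10.0.0.33 server
join_cmd 10.0.0.34 agent
```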

The GitOps Bootstrap

This is the most critical part of the architecture. The cluster uses ArgoCD to manage itself, but how does ArgoCD get installed in the first place?

Step 1: Manual Helm Install

Once all nodes are joined and ready, add the Argo Helm repository and install ArgoCD:

helm repo add argo https://argoproj.github.io/argo-helm
helm repo update

helm install argocd argo/argo-cd \
  -n argocd \
  --create-namespace

Wait for ArgoCD to be ready:

kubectl wait deployment --all -n argocd --for=condition=Available --timeout=300s

Step 2: Set Admin Password

ArgoCD generates an initial admin password at install time (readable from the argocd-initial-admin-secret Secret). To set your own, patch argocd-secret with a bcrypt hash:

kubectl -n argocd patch secret argocd-secret \
  -p '{"stringData": {
    "admin.password": "$2a$10$YOUR_HASH_HERE",
    "admin.passwordMtime": "'$(date +%FT%T%Z)'"
  }}'

To generate a bcrypt hash in the format ArgoCD expects, you can use:

htpasswd -nbBC 10 "" "your-password" | tr -d ':\n' | sed 's/$2y/$2a/'

(htpasswd emits a $2y$ prefix; the sed rewrites it to the $2a$ prefix ArgoCD expects.)

Step 3: Configure ArgoCD to Watch Your Repo

You need to create a Secret with SSH credentials so ArgoCD can clone your Git repository, then create the “core” Application:

apiVersion: v1
kind: Secret
metadata:
  name: your-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  url: git@github.com:yourusername/your-repo.git
  sshPrivateKey: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    your-private-key
    -----END OPENSSH PRIVATE KEY-----
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: core
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'git@github.com:yourusername/your-repo.git'
    path: path/to/your/app_of_apps
    targetRevision: HEAD
    directory:
      recurse: true
  destination:
    server: 'https://kubernetes.default.svc'
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

This Application tells ArgoCD to watch the app_of_apps/ folder in your repository. Every YAML file in that folder defines an ArgoCD Application (or any Kubernetes resource), and ArgoCD will reconcile them to the cluster.

This creates the GitOps loop: The cluster bootstraps itself, then ArgoCD takes over and ensures the cluster state always matches what’s in Git.

Core Infrastructure Components

Each infrastructure component is defined as an ArgoCD Application with a sync-wave annotation to control deployment order:

annotations:
  argocd.argoproj.io/sync-wave: "-50"

Lower (more negative) numbers deploy first. Here’s the order:

  Wave   Component     Purpose
  -----  ------------  -------------------------------------
  -100   namespaces    Namespaces must exist before the apps
  -60    Longhorn      Distributed storage
  -50    MetalLB       Load balancer
  -40    kube-vip      Virtual IP
  -30    Traefik       Ingress controller
  -20    tinyauth      Auth middleware
  10+    Applications  Non-core apps, like this blog

kube-vip

kube-vip provides a virtual IP for the Kubernetes API server, enabling HA control planes. It’s deployed as a DaemonSet that runs on every node:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-vip
  annotations:
    argocd.argoproj.io/sync-wave: "-39"
spec:
  source:
    chart: kube-vip
    repoURL: https://kube-vip.github.io/helm-charts
    helm:
      valuesObject:
        config:
          address: "10.0.0.30"

The VIP (10.0.0.30) is used as the endpoint for kubectl and all cluster communication.
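
Because the VIP was added as a TLS SAN during provisioning, your local kubeconfig can point at it instead of any single node (the cluster name here is illustrative):

```yaml
# Fragment of ~/.kube/config; point the cluster entry at the VIP so kubectl
# keeps working even if an individual control plane node goes down.
apiVersion: v1
kind: Config
clusters:
  - name: homelab            # name is illustrative
    cluster:
      server: https://10.0.0.30:6443
```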

MetalLB

MetalLB provides load balancing on bare metal (replacing the K3s built-in ServiceLB). It uses L2 mode (ARP) to announce service IPs from its address pool on the local network:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: metallb-pool
  namespace: metallb
spec:
  addresses:
  - 10.0.0.60-10.0.0.62
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: metallb-l2
  namespace: metallb

The address pool (10.0.0.60-10.0.0.62) is used for Kubernetes Services of type LoadBalancer.
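
Any Service of type LoadBalancer now gets an address from that pool automatically. A minimal sketch (the name, namespace, and ports are illustrative):

```yaml
# Hypothetical Service; MetalLB assigns it an external IP from
# 10.0.0.60-10.0.0.62 and announces it via ARP.
apiVersion: v1
kind: Service
metadata:
  name: app
  namespace: app
spec:
  type: LoadBalancer
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 8080
```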

Traefik

Traefik replaces the bundled K3s Traefik that was disabled at provisioning time. It’s deployed via Helm with:

  • Let’s Encrypt certificate resolver for automatic TLS
  • HTTP to HTTPS redirect
  • Custom buffer sizes for streaming workloads

The Traefik dashboard itself is exposed through an IngressRoute, protected by the tinyauth middleware:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: dashboard
  namespace: traefik
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`traefik.example.com`)
      kind: Rule
      middlewares:
        - name: tinyauth
          namespace: tinyauth
      services:
        - name: api@internal
          kind: TraefikService
  tls:
    certResolver: letsencrypt
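
The Helm values behind those bullets look roughly like this; the resolver name matches the IngressRoutes, the email is a placeholder, and the flags are Traefik's static-configuration options:

```yaml
# Sketch of Traefik Helm values (chart: traefik/traefik).
# Note: acme.json needs persistent storage to survive pod restarts.
additionalArguments:
  - "--certificatesresolvers.letsencrypt.acme.email=you@example.com"
  - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
  - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
  - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
  - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
```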

Longhorn

Longhorn provides distributed block storage, essential for persistent workloads on a Pi cluster:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: longhorn
  annotations:
    argocd.argoproj.io/sync-wave: "-59"
spec:
  source:
    chart: longhorn
    repoURL: https://charts.longhorn.io
    helm:
      valuesObject:
        preUpgradeChecker:
          jobEnabled: false

Longhorn requires iSCSI, which must be installed on each node before deploying Longhorn:

# Run on each node
ssh user@node 'sudo apt-get update && sudo apt-get -y install open-iscsi'
ssh user@node 'sudo systemctl enable iscsid open-iscsi'
ssh user@node 'sudo systemctl restart iscsid open-iscsi'
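
Once Longhorn is running, workloads claim storage through its StorageClass. A minimal sketch (the claim name and size are illustrative):

```yaml
# Hypothetical PVC backed by Longhorn's default StorageClass ("longhorn");
# the volume is replicated across nodes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: app
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
```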

tinyauth

tinyauth is a lightweight authentication service wired into Traefik as a forwardAuth middleware. It’s used to protect internal services:

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: tinyauth
  namespace: tinyauth
spec:
  forwardAuth:
    address: https://auth.yourdomain.com/api/auth/traefik

Services that need authentication include the tinyauth middleware in their IngressRoute:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: app-ingress
  namespace: app
  annotations:
    argocd.argoproj.io/sync-wave: "6"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.yourdomain.com`)
      kind: Rule
      services:
        - name: app
          port: 1234
      middlewares:
        - name: tinyauth
          namespace: tinyauth
  tls:
    certResolver: letsencrypt

Reconciliation Flow

The GitOps reconciliation works as follows:

  1. Push to Git: A change is made to any file in your app_of_apps/ folder
  2. ArgoCD detects: ArgoCD watches the repo and sees the change
  3. Sync: ArgoCD applies the change to the cluster
  4. Self-heal: If someone manually changes a resource that ArgoCD manages, ArgoCD will revert it (if selfHeal: true is set)

The app_of_apps/ folder uses a recursive directory structure - each subdirectory contains an app.yaml that defines an ArgoCD Application. This allows adding new infrastructure components simply by creating a new folder with an application manifest.
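
For illustration, the layout might be app_of_apps/longhorn/app.yaml, app_of_apps/metallb/app.yaml, and so on. Each child app.yaml is itself an ordinary ArgoCD Application; a sketch for MetalLB (the chart version pin is an assumption):

```yaml
# Hypothetical app_of_apps/metallb/app.yaml, reconciled by the "core"
# Application defined during bootstrap.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metallb
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "-50"
spec:
  project: default
  source:
    repoURL: https://metallb.github.io/metallb
    chart: metallb
    targetRevision: 0.14.9   # pin a chart version; this one is illustrative
  destination:
    server: https://kubernetes.default.svc
    namespace: metallb
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```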

Destroy and Rebuild

To destroy the cluster, run the following on each node:

# On control plane nodes
ssh user@node '/usr/local/bin/k3s-uninstall.sh'

# On worker nodes
ssh user@node '/usr/local/bin/k3s-agent-uninstall.sh'

This removes K3s from each node while leaving the operating system intact. Delete your local kubeconfig file as well.

To rebuild, run the k3sup commands again, then re-install ArgoCD and apply your core Application. Since ArgoCD is the source of truth, it will restore all infrastructure and applications to their desired state.

Summary

This architecture provides:

  • High availability: 3 control planes with kube-vip
  • GitOps: Every piece of infrastructure is defined in Git and reconciled automatically by ArgoCD
  • Self-healing: Manual changes are reverted, cluster state always matches Git
  • Repeatable: Destroy and rebuild in minutes
  • Bare-metal capable: MetalLB for load balancing, Longhorn for storage

The cluster has been running reliably with this setup for years, and adding new services is as simple as adding a new YAML file to the repository.