I audit Kubernetes so you don't have to guess.

Infrastructure · Kubernetes · Docker

Most of my clients come through referrals. If someone sent you here, they probably gave you the short version: I find what’s broken, risky, or silently costing you money in your container stack — and I tell you exactly what to do about it.

I work on production clusters, not demo environments. If you’ve read any of my posts, you have a reasonable idea of what to expect.

Kubernetes Cluster Audit

What’s actually wrong with your cluster — and what to do about it.

1–2 weeks · Fixed price

A focused diagnostic engagement. I go through your cluster configuration, workload definitions, RBAC, networking, resource requests/limits, HPA setup, and storage. You get a written report with a prioritized list of findings — specific manifests, specific patterns, specific recommendations.

What I usually find: Pods with resources: {} running as BestEffort, the first to be evicted under node memory pressure. Pods with limits set generously “just to be safe” and no requests, so Kubernetes copies the limits into the requests and the scheduler reserves capacity nobody uses. A ClusterRole with verbs: ["*"] bound to the default Service Account because something needed it once and nobody cleaned it up. A Helm release whose last successful deploy was seven months ago, while helm status still reports deployed. A liveness probe that gives up sooner than the application’s cold start takes, with no startup probe in front of it: the kubelet keeps restarting containers before they can come up, and nobody knows why things “fix themselves” every so often.
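To make the QoS point concrete, here is a rough Python sketch of how a Pod's QoS class follows from its requests and limits. The function name and the plain-dict input shape are my own illustration, not a Kubernetes API, and the sketch compares quantities as strings and ignores init containers.

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}} (hypothetical shape)
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Approximate the Pod QoS class from each container's resources block."""
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits") or {}
        # A limit without a matching request is copied into the request.
        requests = {**limits, **(res.get("requests") or {})}
        if limits or requests:
            any_set = True
        for name in ("cpu", "memory"):
            # Guaranteed needs cpu and memory limits, each equal to its request.
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Feeding it the resources blocks from a Deployment manifest is usually enough to see which workloads will go first when a node runs short of memory.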

What I look at:

Node and Pod resource sizing (QoS classes)
RBAC and Service Account permissions
Network policies, ingress, and Gateway API
PodDisruptionBudgets, liveness/readiness/startup probes
etcd backup and DR posture
Helm chart hygiene and upgrade paths
GitOps pipeline health (ArgoCD/Flux)
Observability coverage
Security context, Pod Security Admission (PSA), admission controllers
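For the probe item, this is roughly the arithmetic I do by hand: how long a liveness probe tolerates a container that is not answering yet, compared with the application's cold start. The helper names are hypothetical, the formula is an approximation that ignores timeoutSeconds, and it assumes no startup probe is configured. The fallback values are the Kubernetes defaults for probe fields.

```python
def liveness_budget_seconds(probe: dict) -> int:
    """Approximate seconds before the kubelet restarts a container that never answers."""
    initial = probe.get("initialDelaySeconds", 0)   # default: 0
    period = probe.get("periodSeconds", 10)         # default: 10
    failures = probe.get("failureThreshold", 3)     # default: 3
    return initial + period * failures


def restarts_before_ready(probe: dict, cold_start_seconds: int) -> bool:
    """True if the probe budget is shorter than the measured cold start."""
    return liveness_budget_seconds(probe) < cold_start_seconds
```

With default probe settings the budget is about 30 seconds, so an application that needs 45 seconds to warm up gets restarted before it is ever ready.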

This is for you if: Something in the cluster is off but you can’t pin it down — or you know what it is but need an outside engineer to document it and give you ammunition for the conversation with leadership.

Container & Docker Audit

Your images, registry, compose stacks — under a microscope.

3–5 days · Fixed price

I review your Dockerfiles, base images, build pipelines, and runtime configuration. You get a clear picture of image bloat, supply chain risks, secrets leaking into layers, and CI/CD gaps.

What I usually find: A base image pinned to ubuntu:20.04 and last pulled in 2021, with CVEs nobody acts on because the scanner is configured not to fail the pipeline. COPY . . placed before the dependency install step, so any source change invalidates the layer cache and every build re-downloads all dependencies from scratch. A registry token hardcoded in an environment variable in docker-compose.yml, committed alongside the rest of the code. A production image weighing 2.3 GB because nobody set up a multi-stage build and the full SDK plus dev tooling is still in there.
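The cache-ordering problem is easy to check mechanically. Below is a minimal Python sketch of such a check; the function name and the list of install commands are my own assumptions, and it does not handle line continuations or treat build stages separately.

```python
import re

# Substrings that suggest a dependency-install step (illustrative, not exhaustive).
INSTALL_HINTS = ("npm ci", "npm install", "pip install", "go mod download", "bundle install")


def copy_all_before_install(dockerfile: str) -> bool:
    """True if `COPY . .` appears before the first dependency-install RUN."""
    copy_seen = False
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if re.match(r"COPY\s+\.\s+\.\s*$", line, re.IGNORECASE):
            copy_seen = True
        elif line.upper().startswith("RUN ") and any(h in line for h in INSTALL_HINTS):
            # First install step reached: was the whole source tree already copied?
            return copy_seen
    return False
```

A Dockerfile that copies only the dependency manifests, runs the install, and copies the rest of the source afterwards passes this check and keeps the dependency layer cached between builds.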

What I look at:

Multi-stage builds and layer caching
Base image CVE exposure
Secrets in image layers or env vars
Registry setup and image signing
Compose production readiness
Build pipeline efficiency

This is for you if: You’re running containers in production but your process grew organically and nobody has ever looked at it critically.

CI/CD & GitOps Pipeline Review

From commit to production — every step that can fail, will fail.

1 week · Fixed price

I map your entire delivery pipeline and find the gaps: missing rollback mechanisms, no deploy locking, secrets in CI variables, staging environments that don’t mirror production.

What I usually find: A staging environment nobody has deployed to in three weeks — and nobody knows because there’s no alert for a stale environment. Rollback defined as “push a hotfix and hope”. Production secrets in CI variables accessible to all branches and all project members. A pipeline job that does ssh prod-server "git pull && docker-compose up -d" because it was faster to set up that way at the start.
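The stale-environment alert is the cheapest of these to add. A minimal Python sketch, assuming you can already obtain a last-successful-deploy timestamp per environment from your CI system; the function name, the input mapping, and the seven-day threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_environments(
    last_deploys: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
) -> List[str]:
    """Return the names of environments whose last deploy is older than max_age."""
    return sorted(name for name, ts in last_deploys.items() if now - ts > max_age)
```

Run on a schedule and wired to whatever alerting you already have, this turns "nobody has deployed to staging in three weeks" from a surprise into a notification.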

What I look at:

GitLab CI / GitHub Actions structure
Helm chart values hygiene
Deployment strategies
Secret management (Vault, External Secrets Operator)
Rollback and incident procedures
Environment parity (dev/staging/prod)

This is for you if: Deployments are stressful events. Something breaks every few weeks and nobody is quite sure why.

Stack

Kubernetes Docker Helm ArgoCD Flux GitLab CI Terraform OpenTofu Vault Prometheus Grafana Elasticsearch Traefik cert-manager Harbor EKS

Get in touch

Tell me about your stack and what’s keeping you up at night. I’ll get back to you within a day.