Cluster

Your cluster, on one page.

/cluster shows everything that matters about your k3s in one scroll — health, problems, events, nodes, pods, logs — and lets you click "Investigate →" on anything that's red to spawn a Claude session pre-seeded with the row's context.

Health banner

Four tiles summarising nodes Ready, failing pods, pending pods, and warnings in the last 30 min. Green / amber / red so you spot trouble without reading.

Problems pane

Auto-collapses when the cluster is clean. When it isn't, lists every failing or pending pod with the reason — pulled from pod.reason or the most recent matching Warning event.

Events feed

Collapsible chronological feed with a Warning-count badge. Refreshes on mount so the badge stays accurate even when the panel's collapsed.

Node cards

One card per node. Status dot, age, cordoned badge, custom labels as chips (system labels filtered out). Expand to see OS / kernel / kubelet / runtime plus the full pod table — image, namespace, status, restarts, CPU%, mem.

Pod logs side-panel

Click the hamburger on a pod row, a drawer slides in from the right. Snapshot mode (tail --500) plus optional 2 s polling for follow mode. Case-insensitive filter, copy, download.

Cluster assistant chat dock

Resizable bottom panel. Rolling chat session by default — survives between visits. kubectl_get, kubectl_logs, kubectl_events as read tools; cordon / drain / scale / restart / delete-pod as write tools, each gated by an Allow / Deny approval prompt.

Investigate → in chat

One click. Focused session. Pre-seeded with the problem.

Every problem row and Warning event has an Investigate → button. Clicking it spawns a fresh Claude session with that row's context already loaded into the system prompt — pod name, namespace, reason, the lot. The agent starts already knowing what it's looking at.

ChannelDesk Cluster — Investigate flow with the chat dock open, Claude reading kubectl_logs and kubectl_events for an unhealthy pod

/cluster · investigate worker-02 disk pressure

Pod channeldesk-orchestrator-7f9 is CrashLoopBackOff on worker-02. Why?

Read kubectl_logs + kubectl_events. /var/log on worker-02 is at 91 % — Postgres can't checkpoint, orchestrator can't reach it, kubelet kills the pod. Fix: cordon worker-02, drain to worker-01, vacuum journald. Cordon worker-02 now?

cluster_cordon · awaiting your Allow / Deny

Default rolling session lives in the dock between visits. Investigate spawns a side-by-side focused session marked with a 🔍 icon and the pod / event label, so you can compare what the rolling chat thinks against a fresh take.

Security posture

22 checks. Five categories. One score.

/cluster/posture runs an on-demand security audit against your master and every worker. It renders an overall score, per-category tiles, and the full check list with evidence + a one-line remediation hint for every failure.

ChannelDesk Posture page — overall score banner, five category score tiles, and a check table with severity chips, target host, pass/fail status and remediation hints

Host hardening

SSH config (root login, password auth, pubkey auth), failed-login rate, reboot-required, pending security updates, unexpected listen ports — run against master and every worker.

Kubernetes

Default-deny NetworkPolicies, namespaces without policies, privileged pods, hostPath mounts, missing PodSecurity baseline, image-tag immutability.

Exposure

LoadBalancer / NodePort surface, Ingresses without TLS, Cloudflare-tunnel health, what's actually reachable from the internet.

Certs & secrets

Cert expiry windows, kubeconfig age, ServiceAccount tokens lying around, kubelet authentication mode.

Backup & recovery

etcd snapshot age, Postgres backup freshness, Flux reconciliation failures. The boring stuff that becomes the only stuff at 03:00.

Failure → Investigate

Same Investigate → pattern: every failed check has a button that spawns a Claude session seeded with the finding, the evidence, and the remediation hint.

Auto-remediation, with guards

Fix it. With your approval.

Most posture tools stop at "your sshd config is wrong". ChannelDesk has three armed MCP fixers that will actually edit the file, run the upgrade, or reboot the host — but only after you click Allow in the chat, and only when the per-tool safety guard is satisfied.

● posture_fix_sshd_directive — edit a single sshd directive. Master gets an extra orchestrator-auth-safety guard so we never lock ourselves out.
● posture_apt_upgrade_security — apply security upgrades only, no full dist-upgrade roulette.
● posture_reboot_host — etcd-snapshot freshness preflight + 15-min master delay floor so a multi-node reboot can't take the cluster with it.

Every write tool sets X-Cluster-Source: agent so its audit row is distinguishable from a human action. Every write requires explicit user Allow in the chat — the agent cannot act unilaterally.

ChannelDesk Cluster audit log — chronological list of cluster actions with source (human / agent) and approval state

ChannelDesk PWA on mobile — Recent Events with an Investigate flow opening the Cluster assistant chat dock, where Claude is analysing an unhealthy pod

Mobile-first

Approve from the kitchen.

The cluster page, the chat dock, Investigate, and the Allow / Deny prompt all ship as Progressive Web App. Add it to your iOS or Android home screen and the service worker keeps the shell warm. A pod goes red on Sunday morning and you triage it with one thumb.

Stop reading dashboards. Talk to your cluster.

ChannelDesk's cluster page is one click from a focused Claude session that already knows what's broken — and one Allow away from fixing it.

Deploy in 10 minutes Read the source on GitHub