Cluster Readiness¶
./bin/mcp-runtime setup installs the platform (registry, operator, ingress, sentinel) into an already-running Kubernetes cluster. It does not configure the node's container runtime or host DNS stack. Those are prerequisites that differ per distribution.
If you skip this, you'll typically see:
./bin/mcp-runtime setupfails at "Publish runtime images" withdial tcp: lookup registry.local: no such host.- The operator pod goes to
ImagePullBackOffwith10.43.x.x:5000: connection refusedorno such host. - MCPServer pods get stuck in
ImagePullBackOffpullingregistry.local/<server-name>.
This document lists what each distribution needs before you run setup. If you
are still choosing where to deploy the platform, start with
Deployment Targets, then return here for the exact
registry, DNS, TLS, and node-runtime preparation.
Dev vs production readiness¶
Most examples in this page describe the dev/local path: the bundled Docker
registry exposed by NodePort, HTTP registry access, registry.local, and
node-level insecure-registry or mirror configuration. That path is appropriate
for kind, minikube, Docker Desktop, single-node k3s, CI, and disposable test
clusters where the goal is quick image push/pull validation.
Production clusters use the same readiness model, but should make different platform choices:
| Area | Dev / local default | Production expectation |
|---|---|---|
| Registry | Bundled Docker registry exposed through NodePort and registry.local |
Managed or hardened registry such as ECR, Artifact Registry, ACR, Harbor, or an internal registry behind stable DNS |
| Transport | HTTP or an insecure registry mirror for fast iteration | HTTPS with a trusted certificate; use setup --with-tls when retaining the bundled registry |
| Node trust | /etc/hosts, containerd mirror entries, and Docker Desktop insecure-registry settings |
Platform-managed DNS, trusted CAs, image pull policy, and audited runtime config across every node pool |
| Image credentials | Local helper flows and default service account image pull secrets | Dedicated pull secrets, workload identity, or cloud-native node registry auth |
| Ingress | Bundled Traefik overlay is acceptable | Existing platform ingress or gateway may be preferred; validate ingress class, TLS, DNS, and policy ownership |
| Persistence | Default registry storage is acceptable for throwaway clusters | Storage class, backup/restore, quota, retention, and registry HA need explicit owner decisions |
| Sentinel stack | Bundled stack is useful for development and demos | Size, retention, security, and observability integration should be reviewed before production use |
Do not treat insecure_skip_verify, HTTP registries, wildcard insecure CIDR
ranges, or manual /etc/hosts edits as production guidance. They are local
workarounds for kubelet image pulls. In production, prefer a registry name that
resolves through normal DNS and is trusted by every node without bypassing TLS
verification.
./bin/mcp-runtime cluster doctor is useful in both modes. For the bundled
registry it probes the in-cluster registry/registry Service and selects HTTP
or HTTPS from the installed registry state: if registry/registry-internal-tls
exists, doctor probes https://registry.registry.svc.cluster.local:5000/v2/;
otherwise it probes the plain HTTP service. If you run with a provisioned
external registry, interpret bundled-registry-specific failures against your
registry architecture instead of copying the local workaround literally.
Production readiness checklist:
- Choose the registry architecture: bundled registry with TLS, or a managed / hardened external registry.
- Ensure every node pool that can schedule MCP Runtime workloads can pull the operator, gateway proxy, Sentinel, and MCP server images.
- Configure image credentials with pull secrets, workload identity, or cloud-native node registry auth.
- Confirm the ingress class, public DNS, and TLS issuer are owned by the platform team and match the cluster's ingress controller.
- Confirm a default
StorageClassexists, or set explicit storage choices for the registry and Sentinel data paths. - Decide certificate management up front: Let's Encrypt with public DNS,
--tls-cluster-issuerfor an enterprise CA, or a preinstalled issuer. - Review Sentinel sizing, retention, auth, Kubernetes API access, and observability integration before enabling it on production traffic; see Sentinel Kubernetes awareness and hardening.
- Run
./bin/mcp-runtime setup --with-tls --strict-prodfor production-style validation, or document why a non-strict setup is intentional.
Why these prerequisites exist¶
The registry runs as a Kubernetes Service of type NodePort (default 5000:32000/TCP) with an Ingress at registry.local.
Three different actors fetch images, and they resolve hostnames differently:
| Actor | What it pulls | DNS source |
|---|---|---|
./bin/mcp-runtime registry push (in-cluster mode) |
Pushes from a helper pod after platform credential validation using the registry Service DNS | Cluster CoreDNS (always works) |
kubelet on the node |
Pulls operator / MCPServer images for pod creation | Host DNS (not CoreDNS) + containerd registry mirrors |
Developer docker push / docker pull |
Ad-hoc pushes or pulls from your laptop | Your local /etc/hosts / corporate DNS |
The in-cluster push path is handled by the CLI (PushInCluster rewrites the destination to the service DNS). The developer path is your local concern. The node/kubelet path is what the distribution-specific config below is for.
setup --test-mode still uses this model. It relaxes production guardrails, but
it builds and pushes the operator, gateway proxy, and Sentinel images with
latest tags to the configured or bundled registry. Those pods still pull
through kubelet/containerd, so an HTTP bundled registry requires node trust for
the exact image host and port used in the rendered image references.
When setup uses the bundled registry, platform-owned image refs for the
operator, gateway proxy, and Sentinel services are rendered with the internal
registry endpoint, ClusterIP, or service-DNS host rather than a public registry
ingress hostname derived from MCP_PLATFORM_DOMAIN. The public registry host is
still used for ingress routing and user-facing registry flows; set
MCP_REGISTRY_ENDPOINT or MCP_REGISTRY_HOST only when cluster nodes should
pull platform images through a specific internal registry endpoint.
Registry setup modes¶
setup --registry-mode makes the registry path explicit:
| Mode | What setup does | Node requirement |
|---|---|---|
auto |
Keeps existing behavior: use a provisioned registry config when present, otherwise install the bundled registry. | Depends on the resolved path. |
bundled-http |
Installs the bundled registry and keeps the registry pod on plain HTTP for internal platform pulls. Public ingress can still use TLS with --with-tls. |
Configure every node/containerd runtime to allow the exact image host as an insecure registry. |
bundled-https |
Installs the bundled registry with TLS served by the registry pod itself. Public ingress TLS can still use ACME; the registry pod uses a separate internal certificate from setup-generated mcp-runtime-ca, or from a non-ACME --tls-cluster-issuer when you provide one. |
Configure every node to resolve or mirror the rendered registry host and trust the internal issuing CA. |
external |
Skips the bundled registry and pushes setup images to a provisioned registry. | Registry DNS, TLS, auth, pull secrets, and node trust are owned by the external registry platform. |
For bundled HTTPS without an explicit MCP_REGISTRY_ENDPOINT, setup renders
platform image refs with registry.registry.svc.cluster.local:5000 so the
internal registry certificate can include a stable service DNS SAN. Kubernetes
nodes do not automatically use cluster DNS or trust the MCP Runtime CA for image
pulls; configure containerd/k3s/your node runtime to resolve or mirror that host
and trust the CA before relying on this mode in lab clusters.
Public TLS with MCP_PLATFORM_DOMAIN (production k3s shape): when the
registry is exposed at registry.<domain> with Let's Encrypt or an enterprise
issuer, set MCP_REGISTRY_ENDPOINT=registry.<domain> before setup so kubelet
pulls match the certificate SANs. Do not use the registry Service ClusterIP
(for example 10.43.x.x:5000) on this path — pulls fail with
x509: cannot validate certificate ... doesn't contain any IP SANs. See
k3s Deployment Runbook - Environment variable reference
and Deployment Targets - bundled HTTPS.
When using the built-in issuer for internal registry pod TLS, setup creates
cert-manager/mcp-runtime-ca if it is missing; export its tls.crt and add that
certificate to each node runtime trust store. After setup, cluster doctor uses
the registry-internal-tls Secret as the signal to probe the registry Service
over HTTPS; a successful doctor registry probe confirms in-cluster push-helper
reachability, while kubelet image pulls still depend on node/containerd trust
for the exact rendered image host.
Setup image platform¶
setup builds the operator, gateway proxy, and Sentinel images as Linux
container images. By default it inspects kubernetes.io/arch on cluster nodes
and builds a single-platform image for a homogeneous cluster, for example
linux/amd64 on standard VPS nodes or linux/arm64 on ARM nodes. Mixed-arch
clusters are rejected until multi-arch manifest publishing is supported, because
the platform deployments do not pin pods to architecture-specific node pools.
Set MCP_IMAGE_PLATFORM=linux/amd64 or MCP_IMAGE_PLATFORM=linux/arm64 when
you need to override detection, such as building from an Apple Silicon laptop
for an amd64 k3s VPS cluster.
External registry path¶
Use an external registry when your platform already provides ECR, Artifact
Registry, ACR, Harbor, or another hardened registry with DNS, TLS, retention,
and auth controls. In that mode, setup skips installing the bundled registry
and configures the operator to use the provisioned registry.
Configure it either with the CLI:
./bin/mcp-runtime registry provision --url registry.example.com
or directly on setup:
./bin/mcp-runtime setup --registry-mode external --with-tls --strict-prod \
--external-registry-url registry.example.com \
--external-registry-username <user> \
--external-registry-password <password>
or with environment variables before setup:
export PROVISIONED_REGISTRY_URL=registry.example.com
export PROVISIONED_REGISTRY_USERNAME=<user> # optional
export PROVISIONED_REGISTRY_PASSWORD=<password> # optional
When external registry credentials are configured, setup creates pull secrets
for Sentinel workloads and a separate mcp-runtime/mcp-runtime-registry-pull-creds
Secret for the operator Deployment. If you intentionally roll platform
workloads from an already-authenticated public registry, set
MCP_PLATFORM_IMAGE_PULL_SECRET=<secret-name> before setup; the Secret must
exist in mcp-runtime for the operator and in mcp-sentinel for Sentinel
workloads.
If you configured the registry with registry provision or environment
variables first, run setup with production validation:
./bin/mcp-runtime setup --registry-mode external --with-tls --strict-prod
Before rendering MCP server manifests for review or GitOps, set the image host that cluster nodes should pull from:
export MCP_REGISTRY_INGRESS_HOST=registry.example.com
./bin/mcp-runtime server generate --metadata-dir .mcp --output manifests
The platform API follows the same split for short image deploys: it expands
short image names with PLATFORM_REGISTRY_URL, MCP_REGISTRY_INGRESS_HOST, or
MCP_PLATFORM_DOMAIN before falling back to MCP_REGISTRY_ENDPOINT. This keeps
tenant MCPServer.spec.image values on the pullable public registry host when
an internal endpoint also exists for push helpers or registry mirrors.
Use MCP_REGISTRY_ENDPOINT only when the operator needs a different internal
push/pull endpoint than the public image host. For example, a private registry
may expose registry.example.com to developers but an internal host:port to
cluster nodes. Keep image references, pull secrets, and node trust aligned with
the endpoint that kubelet actually uses.
External registry readiness checks:
docker login registry.example.comsucceeds from the build machine, if username/password auth is required.kubectl get secret -A | grep -i pullshows the expected image pull secret, or workload identity / node registry auth is configured by the platform.- A test pod using an image from the registry reaches
Runningon every node pool that may run MCP Runtime workloads. ./bin/mcp-runtime statusreports the provisioned registry URL after setup.
Strict production setup¶
./bin/mcp-runtime setup --strict-prod adds guardrails for production-style
installs. It is ignored in --test-mode; otherwise it requires:
--with-tls.--registry-mode externalwith an external registry URL that is not dev-only, or--registry-mode bundled-httpsfor a bundled registry served over HTTPS.- No implicit bundled HTTP registry path.
Normal setup still allows local HTTP and internal registry flows so kind, k3s,
Docker Desktop, and CI remain easy to use. Use --strict-prod when the cluster
is intended to represent production or a production-like staging environment.
DNS and TLS readiness¶
When MCP_PLATFORM_DOMAIN=example.com is set, setup derives these public names:
registry.example.comfor registry ingress.mcp.example.comfor MCP server traffic.platform.example.comfor the dashboard, API, and Grafana paths. Prometheus remains an internal metrics backend.
All configured public names must resolve to the cluster ingress address before
certificate issuance. For Let's Encrypt HTTP-01, port 80 must reach the ingress
controller from the public internet. For enterprise PKI, install the
ClusterIssuer first and pass it with:
./bin/mcp-runtime setup --with-tls --tls-cluster-issuer <issuer-name>
If you retain the bundled registry with TLS ingress only, make sure the registry
certificate covers the registry and MCP hostnames nodes, build machines, and
MCP clients use. The registry/registry-cert Certificate writes the
registry/registry-tls Secret and covers registry.<domain> plus
mcp.<domain> when those names are derived from MCP_PLATFORM_DOMAIN or
explicit ingress host environment
variables.
The registry Ingress intentionally does not carry a
cert-manager.io/cluster-issuer annotation. Registry TLS uses the explicit
registry-cert owner only; this avoids cert-manager ingress-shim creating a
second registry-tls Certificate for the same Secret. If setup reports that
another Certificate already references registry-tls, remove the stale owner
before applying TLS again.
The platform UI hostname is separate. platform.<domain> is owned by the
mcp-sentinel-platform-ui Ingress in the mcp-sentinel namespace, and
cert-manager writes that certificate into the mcp-sentinel-platform-tls
Secret in the same namespace. Do not expect registry-cert to contain
platform.<domain>.
Inspect both TLS paths when debugging:
kubectl get certificate registry-cert -n registry -o yaml
kubectl get secret registry-tls -n registry \
-o jsonpath='{.data.tls\.crt}' | base64 -d | \
openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
kubectl get ingress mcp-sentinel-platform-ui -n mcp-sentinel -o yaml
kubectl get secret mcp-sentinel-platform-tls -n mcp-sentinel \
-o jsonpath='{.data.tls\.crt}' | base64 -d | \
openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
If you use an external registry, the registry's own TLS and auth configuration are outside the bundled cert-manager flow.
The bundled registry ingress expects the repo-managed Traefik dynamic
middleware registry-admin-auth@file. If you bring your own ingress controller
or reuse an external Traefik install, configure an equivalent forward-auth guard
to /api/registry/authz before exposing registry.<domain> publicly.
If the cluster already has a live external Traefik install, setup reuses it
and refuses to install a second repo-managed Traefik stack. In that shape, use:
./bin/mcp-runtime setup --ingress none ...
Do that only when the external ingress controller already owns the ingress class, public entrypoints, and any required registry auth middleware. Running two Traefik installations against the same ingress surface leads to ambiguous ownership and hard-to-debug routing failures.
Quick public endpoint checks after DNS and TLS are live:
curl -k -I -H "x-api-key: $ADMIN_API_KEY" https://registry.<domain>/v2/should return200anddocker-distribution-api-version: registry/2.0. Without admin credentials, the public registry ingress should return401or403.curl -k -I https://platform.<domain>/should return200.curl -k -i https://platform.<domain>/api/healthshould normally return401without platform credentials; that still proves the platform host is routing API traffic correctly.curl -k -i -H "x-api-key: $ADMIN_API_KEY" https://platform.<domain>/grafana/api/healthshould reach the admin-gated observability route. Without admin credentials, thesentinel-admin-auth@fileguard should return401. Prometheus is not exposed directly on the platform host; validate it through Grafana's datasource or a temporarykubectl port-forward.curl -k -i https://mcp.<domain>/<server-name>/mcpmay return an application-level400or401when called without the expected MCP protocol headers or session context. That is often enough to confirm the public route is live before you move on to MCP client debugging.
Public-mode admin bootstrap¶
Fresh public/TLS installs must set an OIDC admin allowlist before setup:
MCP_PLATFORM_ADMIN_EMAIL, for the first platform admin email- or
ADMIN_USERS, for a comma-separated list of admin emails or OIDC subjects
Setup writes those values into the ADMIN_USERS secret key. The API checks that
allowlist on Google/OIDC login and promotes matching users to platform admin.
--acme-email is only the certificate contact email and does not grant
platform admin.
Password bootstrap is separate. Installs that expect a password-created admin must keep both of these secret-backed values aligned:
PLATFORM_ADMIN_EMAILPLATFORM_ADMIN_PASSWORD
If only one of the two is present in the API deployment environment, the API can crash-loop on a clean database with an error similar to:
failed to seed platform admin: PLATFORM_ADMIN_EMAIL and PLATFORM_ADMIN_PASSWORD must both be set
Before assuming a database or auth bug, inspect the deployment and the managed secret:
kubectl get deploy mcp-sentinel-api -n mcp-sentinel -o yaml
kubectl get secret mcp-sentinel-secrets -n mcp-sentinel -o yaml
Bootstrap-only credentials should still be removed or rotated after the first successful platform bring-up, but both values must be present for the initial seed path to succeed.
Clean platform reset¶
Deleting pods is not a full platform reset. Sentinel state primarily lives in PVC-backed StatefulSets. If you want a truly fresh platform state, scale the StatefulSets down and delete the PVCs you intend to wipe.
Typical destructive reset flow:
kubectl scale statefulset clickhouse kafka loki tempo mcp-sentinel-postgres \
-n mcp-sentinel --replicas=0
kubectl delete pvc \
data-clickhouse-0 \
kafka-data-kafka-0 \
data-loki-0 \
data-tempo-0 \
data-mcp-sentinel-postgres-0 \
-n mcp-sentinel
Important distinctions:
- Preserve
mcp-sentinel-secretsunless you intentionally want new API keys, JWT secrets, and platform bootstrap values. - Preserve
mcp-sentinel-platform-tlsunless you want to reissue the platform certificate. - Preserve
registry/registry-storageunless you intentionally want to wipe the image registry too. - A PVC stuck in
Terminatingis often still referenced by stale pods from an old ReplicaSet. Remove the stale pods first, then recreate the workload.
This reset is destructive. Treat it as an operator workflow, not a normal upgrade path.
Node pool consistency¶
Registry fixes must apply to every node pool that can schedule MCP Runtime pods. A setup can look healthy on one node and fail later when a workload lands on a different pool with missing DNS, missing CA trust, missing registry mirror config, or missing cloud registry permissions.
For production, either standardize registry trust and auth across all eligible node pools, or use taints, tolerations, labels, and node selectors so MCP Runtime workloads only land on pools that have been prepared and audited.
k3s¶
k3s uses embedded containerd. The steps below cover lab HTTP registry
mirrors (registry.local, NodePort). For public TLS + bundled HTTPS
(registry.<domain> with Let's Encrypt), skip the insecure mirror path and
follow Deployment Targets - k3s production
and k3s Deployment Runbook instead — set
MCP_REGISTRY_ENDPOINT=registry.<domain> and use --ingress none when k3s
Traefik already runs in kube-system.
For setup --test-mode with the bundled plain HTTP registry, the same
containerd mirror requirement applies because setup still builds and pushes
operator, gateway proxy, and Sentinel images, then deploys pods that pull those
images. On k3s hosts where ~/.kube/config is empty or minimal, pass
--kubeconfig /etc/rancher/k3s/k3s.yaml to setup.
The fastest path is to preconfigure the steps below before running setup so
the host k3s pulls from is already trusted on first attempt. If you skip that
preflight, expect setup to fail once: it prints the registry Internal URL
after the Service exists; copy that host:port into registries.yaml, restart
k3s, then rerun setup. To avoid the rerun on dynamic ClusterIP environments,
configure a stable external registry or set MCP_REGISTRY_ENDPOINT to a
host:port every node already trusts.
- Registry mirror. Create
/etc/rancher/k3s/registries.yaml:
mirrors:
registry.local:
endpoint:
- "http://127.0.0.1:32000"
configs:
"127.0.0.1:32000":
tls:
insecure_skip_verify: true
If the registry's ClusterIP (e.g. 10.43.39.164:5000) or service DNS
(registry.registry.svc.cluster.local:5000) ever appears as an image ref,
add a mirror entry for it too — containerd does exact-host matching.
- Host DNS. Add to
/etc/hosts:
127.0.0.1 registry.local
- Reload.
systemctl restart k3s. k3s regenerates containerd's config fromregistries.yamlat startup.
Multi-node k3s: apply the same /etc/rancher/k3s/registries.yaml and /etc/hosts to every node — 127.0.0.1:32000 reaches the local kube-proxy which forwards to the registry pod regardless of where the pod is scheduled.
Node disk-pressure recovery¶
Image-heavy setup runs on small clusters can trip kubelet DiskPressure during
the operator/Sentinel build-and-push phase. Typical symptoms:
- helper pods such as
registry-pusher-*stayPending - new Sentinel pods fail scheduling with
untolerated taint(s) kubectl describe nodeshowsnode.kubernetes.io/disk-pressure:NoSchedule
Check the node first:
kubectl describe node <node-name>
df -h /
Then clear disposable build/cache data before rerunning setup:
- stale Go build/module caches under
/tmpand/root/.cache/go-build - unused Docker images from local build stages
If free space returns but the node still reports DiskPressure, restart the
node's Kubernetes runtime service so kubelet/containerd accounting catches up.
The exact service name depends on the distribution, for example:
systemctl restart k3s
After the restart, verify DiskPressure=False before continuing the rollout.
If the registry pod was unavailable during the pressure window, expect a brief
wave of ErrImagePull or ImagePullBackOff until the registry becomes ready
again.
kind¶
kind's nodes are containers, so the registry NodePort needs an extraPortMappings entry to be reachable, and containerd inside the node container needs the same mirror.
For setup --test-mode, MCP Runtime emits image refs such as
registry.registry.svc.cluster.local:5000/mcp-sentinel-api:latest so Kind
nodes use one stable service-DNS host instead of a mutable registry
ClusterIP:port.
- Cluster config. Pass this to
kind create cluster --config:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.registry.svc.cluster.local:5000"]
endpoint = ["http://127.0.0.1:32000"]
[plugins."io.containerd.grpc.v1.cri".registry.configs."127.0.0.1:32000".tls]
insecure_skip_verify = true
nodes:
- role: control-plane
extraPortMappings:
- containerPort: 32000
hostPort: 32000
protocol: TCP
- Exact host matching. If pod events show
http: server gave HTTP response to HTTPS client, compare the pod image host with the mirror key. Containerd only applies the HTTP mirror when the strings match exactly.
Alternative: kind load docker-image <image> sideloads without a registry at all — useful for throwaway tests, but bypasses the registry-push flow the CLI is built around.
minikube¶
Two options.
Option A — insecure registry flag at start time:
minikube start --insecure-registry=registry.local --insecure-registry=10.43.39.164:5000
minikube addons enable ingress
echo "$(minikube ip) registry.local" | sudo tee -a /etc/hosts
The --insecure-registry flag is read-only on initial start. Re-creating the VM is required to change it.
Option B — minikube image load:
Skip the registry entirely and push images directly into the node's image store:
docker build -t registry.local/my-server:latest .
minikube image load registry.local/my-server:latest
Fine for quick iteration, but ./bin/mcp-runtime registry push won't help — images bypass the registry.
Docker Desktop (Kubernetes)¶
- Docker Desktop → Settings → Docker Engine → add:
{
"insecure-registries": ["registry.local", "10.96.0.0/12"]
}
Apply & Restart. Docker Desktop's embedded k8s shares the Docker daemon's registry config.
/etc/hosts:
127.0.0.1 registry.local
Reachability from the k8s nodes (which are VMs managed by Docker Desktop) is automatic because they share the host loopback for the NodePort via 127.0.0.1.
kubeadm / vanilla Kubernetes¶
For each node running kubelet:
- Edit
/etc/containerd/config.toml:
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.local"]
endpoint = ["http://<registry-reachable-ip>:32000"]
[plugins."io.containerd.grpc.v1.cri".registry.configs."<registry-reachable-ip>:32000".tls]
insecure_skip_verify = true
Pick <registry-reachable-ip> as whichever IP the node can reach the registry's NodePort on — typically the node's own IP or a load-balancer VIP.
-
/etc/hostson each node: mapregistry.localto the same IP. -
systemctl restart containerdon each node.
For production, use a stable registry endpoint with HTTPS and trusted node configuration as described in Dev vs production readiness.
Generic checks you can run¶
# Is the registry Service up?
kubectl get svc -n registry registry
# NodePort?
kubectl get svc -n registry registry -o jsonpath='{.spec.ports[0].nodePort}'
# From inside the cluster — should return a JSON repository list:
kubectl run -n registry --rm -it registry-check --restart=Never \
--image=curlimages/curl --command -- \
curl -s http://registry.registry.svc.cluster.local:5000/v2/_catalog
# From the node (SSH to a node first; this bypasses public ingress auth):
curl -s http://127.0.0.1:32000/v2/_catalog
getent hosts registry.local
# Post-install diagnostics via the CLI (see below)
./bin/mcp-runtime cluster doctor
Failure-to-cause map¶
| Symptom | Likely cause | What to check |
|---|---|---|
lookup registry.local: no such host during publish or image pull |
Host or node DNS does not know registry.local |
/etc/hosts, corporate DNS, MCP_REGISTRY_INGRESS_HOST, and the image host emitted by generated manifests |
http: server gave HTTP response to HTTPS client |
The registry is HTTP, but Docker/containerd is trying HTTPS | Insecure registry / mirror settings for the exact image host in dev, or switch to HTTPS with trusted certs for production |
ImagePullBackOff with 401 or 403 |
Registry auth is missing or invalid | Image pull secrets, service account references, workload identity, or cloud node registry permissions |
ImagePullBackOff only on some nodes |
Node pool config drift | Registry mirror config, CA trust, DNS, and registry permissions on every eligible node pool |
curl https://registry.<domain>/v2/ returns 401 or 403 |
Public registry ingress is routed but no admin credential was accepted | Retry with an admin x-api-key, admin Bearer token, or admin-owned registry Basic credential |
curl https://registry.<domain>/v2/ returns Traefik 404 page not found |
Ingress/router is not routing to the registry service | Registry Ingress, ingress class, Traefik logs, host rules, TLS secret names, and the registry-admin-auth@file middleware |
cert-manager reports NXDOMAIN |
Public DNS is missing or misspelled | getent hosts registry.<domain>, mcp.<domain>, platform.<domain>, and DNS records from outside the cluster |
cluster doctor reports missing bundled registry while using an external registry |
Doctor is checking the local bundled-registry path | Validate your external registry path manually and treat that specific check as advisory |
bootstrap¶
./bin/mcp-runtime bootstrap validates a smaller, mostly orthogonal set of prerequisites before setup:
- kubectl connectivity to the cluster.
- A CoreDNS deployment in
kube-system. - A default
StorageClass(one annotatedstorageclass.kubernetes.io/is-default-class=true). - The
traefikIngressClass. - The
metallb-systemnamespace, if you plan to use MetalLB forLoadBalancerservices.
Missing pieces are warnings, not errors — the command surfaces them so you can decide what to install with your standard platform tooling.
bootstrap --apply --provider k3s is the only automated apply path today: run it on the k3s server node and it applies the bundled CoreDNS and local-path manifests under /var/lib/rancher/k3s/server/manifests, then waits for both rollouts. Other providers (rke2, kubeadm, generic) print guidance instead.
cluster doctor¶
./bin/mcp-runtime cluster doctor runs post-install diagnostics by default:
- Detects your distribution (k3s / kind / minikube / docker-desktop / generic).
- Checks the installed MCP Runtime namespaces, CRDs, operator, Traefik ingress, registry, Sentinel, and MCPServer reconciliation path. The MCPServer smoke uses an existing ready app image when available; otherwise it falls back to
registry.k8s.io/pause:3.9and validates deployment/service/ingress reconciliation plus pod scheduling without a TCP readiness wait. - Prefers k3s' bundled Traefik in
kube-system/traefikwhen the active cluster is k3s, then falls back to the repo-managedtraefik/traefikinstall. setupfollows the same ownership model: it reuses active external Traefik and refuses to force-install the repo-managed Traefik when that would create a second active stack.- Verifies registry reachability, registry image-pull smoke behavior, and common pod image-pull failures. The bundled registry reachability probe uses HTTPS when
registry/registry-internal-tlsis installed, and HTTP otherwise. - Reports
http: server gave HTTP response to HTTPS clientwhen kubelet/containerd tried HTTPS against the HTTP dev registry, including the affected pod and image where possible. - Streams the current check before running it, including helper pod probes and waits, so a slow run shows what it is doing.
- Prints the distribution-specific registry remediation hint only when registry or image-pull checks fail; Traefik and Sentinel failures use their own check-specific remedies.
For setup preflight, run ./bin/mcp-runtime cluster doctor --for-setup. That mode focuses on:
- Traefik ingress readiness and exposure.
- Public host resolution from
MCP_PLATFORM_DOMAINor the explicitMCP_PLATFORM_INGRESS_HOST,MCP_REGISTRY_INGRESS_HOST, andMCP_MCP_INGRESS_HOSTenv vars. - Local DNS resolution for those configured public hosts.
- cert-manager deployment readiness when TLS preflight is requested.
MCP_TLS_CLUSTER_ISSUERexistence when configured.MCP_ACME_EMAILHTTP-01 readiness, including whether the active Traefik web entrypoint is on service port80.
Run bootstrap before setup on a fresh cluster. Run cluster doctor after
setup, or use cluster doctor --for-setup before a host-based TLS install.