Skip to content

Troubleshooting

Common errors and how to fix them — covering deployment, gateway policy, registry, analytics, and connectivity issues.


Server deployment

tool_side_effect_unknown

The gateway returned this error on a tool call.

Cause: The tool name in the grant or the session does not match a tool listed in the server's .mcp/servers.yaml metadata. The gateway cannot look up the tool's side-effect class, so it denies the call.

Fix:

# 1. Check what tools are in the metadata
cat .mcp/servers.yaml

# 2. Check what tools the running server actually exposes
mcp-runtime server init --from-server http://localhost:8088 --force

# 3. Validate the grant against the metadata
mcp-runtime server validate --metadata-dir .mcp --grant-file grant.yaml

# 4. Re-apply the corrected grant
mcp-runtime access grant apply --file grant.yaml

tool_not_granted

The agent tried to call a tool that is not in any allow rule in the active grant.

Fix: Add the tool to the grant with --tool <name> and re-apply.


Server stuck in Pending or NotReady

mcp-runtime server status --namespace mcp-team-<slug>
kubectl describe pod -n mcp-team-<slug> -l app=<server-name>
kubectl get events -n mcp-team-<slug> --sort-by='.lastTimestamp'

Common causes:

Symptom Cause Fix
ImagePullBackOff Registry credentials stale Re-push image; check pull secret
Pending (no node) Cluster resource exhaustion Scale nodes or reduce replicas
CrashLoopBackOff Server crashes on start mcp-runtime server logs <name> --use-kube

registry push returns 401

# Check the registry pull secret is valid
kubectl get secret mcp-runtime-registry-pull -n mcp-team-<slug>

# Re-login and retry
mcp-runtime auth login --api-url https://platform.example.com
mcp-runtime registry push --image ...

Access control

Grant applied but calls still denied

  1. Confirm the grant exists:

    mcp-runtime access grant list --namespace mcp-team-<slug>
    

  2. Confirm the session exists and is not expired or revoked:

    mcp-runtime access session list --namespace mcp-team-<slug>
    

  3. Check the gateway logs for the denial reason:

    mcp-runtime server logs <server-name> --namespace mcp-team-<slug> \
      --use-kube 2>&1 | grep -E "deny|allow|session|grant"
    

  4. Inspect the effective policy:

    mcp-runtime server policy inspect <server-name> \
      --namespace mcp-team-<slug>
    


Session expired or session_not_found

The adapter auto-refreshes sessions when started with --auto-refresh. If you are using manual sessions:

# Create a new session
mcp-runtime access session init new-session \
  --server <name> --namespace mcp-team-<slug> \
  --agent-id cursor --trust low --expires-in 4h \
  --output session.yaml

MCP_PLATFORM_API_PROFILE=admin \
  mcp-runtime access session apply --file session.yaml

Registry and images

x509: certificate signed by unknown authority

The cluster node does not trust the registry's TLS certificate.

For bundled-https mode, the registry uses the internal mcp-runtime-ca. Nodes must trust this CA. See Cluster Readiness for distribution- specific node trust configuration.


no basic auth credentials on image pull

The mcp-runtime-registry-pull pull secret in the server's namespace has stale credentials. This happens after a mcp-runtime setup rerun that rotates API keys.

# Check secret exists
kubectl get secret mcp-runtime-registry-pull -n mcp-team-<slug>

# Re-run setup or re-create the secret manually:
ADMIN_KEY=$(kubectl get secret mcp-sentinel-secrets -n mcp-sentinel \
  -o jsonpath='{.data.UI_API_KEY}' | base64 -d)

kubectl create secret docker-registry mcp-runtime-registry-pull \
  -n mcp-team-<slug> \
  --docker-server=registry.example.com \
  --docker-username=platform-service \
  --docker-password="$ADMIN_KEY" \
  --dry-run=client -o yaml | kubectl apply -f -

Analytics and observability

Tool calls not showing in the Analytics dashboard

  1. Check the ingest service is receiving events:

    KUBECONFIG=~/.kube/config mcp-runtime sentinel logs ingest --since 5m
    
    If you see 401 errors, the analytics API key in the gateway sidecar is stale — re-run setup or restart the analytics deployments.

  2. Check the processor is consuming from Kafka:

    KUBECONFIG=~/.kube/config mcp-runtime sentinel logs processor --since 5m
    

  3. Check ClickHouse has the mcp.events topic:

    kubectl exec -n mcp-sentinel kafka-0 -- \
      kafka-topics --list --bootstrap-server localhost:9092
    
    If mcp.events is missing, re-run setup.


Sentinel API returns 401

The mcp-sentinel-api pods may have started with stale API keys from a previous setup run.

# Restart to pick up current keys
kubectl rollout restart deployment/mcp-sentinel-api -n mcp-sentinel
kubectl rollout status deployment/mcp-sentinel-api -n mcp-sentinel --timeout=120s

Platform and cluster health

cluster doctor reports failures

KUBECONFIG=~/.kube/config mcp-runtime cluster doctor

The doctor runs 37 checks and prints a remedy for each failure. Follow the printed instructions — most failures point to missing ingress, stale certificates, or image pull errors with specific kubectl commands to fix them.


Setup pre-flight check blocked by stale Certificate

ERROR  Stale Certificate "registry-cert" has DNS names [registry.local]
       but the expected registry host is "registry.example.com"
kubectl delete certificate -n registry registry-cert
kubectl delete certificaterequest -n registry --all
# Re-run setup

Namespace stuck in Terminating

kubectl patch ns <namespace> \
  -p '{"metadata":{"finalizers":null}}' \
  --type=merge

Getting more help

  • Run mcp-runtime <command> --help for flag reference
  • Run mcp-runtime cluster doctor for a full 37-point cluster diagnostic
  • Check GitHub Issues
  • Contributor Troubleshooting for development-environment specific issues