Gateway runbook

Use this page for day-1 startup and day-2 operations of the Gateway service.

Deep troubleshooting

Symptom-first diagnostics with exact command ladders and log signatures.

Configuration

Task-oriented setup guide and full configuration reference.

Secrets management

SecretRef contract, runtime snapshot behavior, and migrate/reload operations.

Secrets plan contract

Exact secrets apply target/path rules and ref-only auth-profile behavior.

5-minute local startup

1. Start the Gateway

pllan gateway --port 18789
# debug/trace mirrored to stdio
pllan gateway --port 18789 --verbose
# force-kill listener on selected port, then start
pllan gateway --force
2. Verify service health

pllan gateway status
pllan status
pllan logs --follow
Healthy baseline: the output reports "Runtime: running" and "RPC probe: ok".
3. Validate channel readiness

pllan channels status --probe
Gateway config reload watches the active config file path (resolved from profile/state defaults, or PLLAN_CONFIG_PATH when set). Default mode is gateway.reload.mode="hybrid".
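
The healthy baseline from step 2 can be checked mechanically. The following is a minimal sketch, assuming the status text contains the two markers verbatim; the function name `is_healthy` and the sample output layout are hypothetical and not part of the CLI.

```python
def is_healthy(status_output: str) -> bool:
    """Return True when both baseline markers appear in the status text.

    Assumption: the markers appear verbatim somewhere in the captured
    output of `pllan gateway status`; the exact layout is not specified.
    """
    required = ("Runtime: running", "RPC probe: ok")
    return all(marker in status_output for marker in required)


if __name__ == "__main__":
    # Hypothetical output layout, used only to exercise the check.
    sample = "Runtime: running\nRPC probe: ok\n"
    print(is_healthy(sample))
```

A script could capture the command output with `subprocess.run(..., capture_output=True, text=True)` and pass `stdout` to this function.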

Runtime model

  • One always-on process for routing, control plane, and channel connections.
  • Single multiplexed port for:
    • WebSocket control/RPC
    • HTTP APIs (OpenAI-compatible, Responses, tools invoke)
    • Control UI and hooks
  • Default bind mode: loopback.
  • Auth is required by default (gateway.auth.token / gateway.auth.password, or PLLAN_GATEWAY_TOKEN / PLLAN_GATEWAY_PASSWORD).

Port and bind precedence

Setting | Resolution order
Gateway port | --port → PLLAN_GATEWAY_PORT → gateway.port → 18789
Bind mode | CLI/override → gateway.bind → loopback
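
The port resolution order above can be modeled as a simple fallback chain. This is an illustrative sketch, not the Gateway's actual code: `resolve_port` and its parameters are hypothetical, and the config is assumed to be a nested mapping with a `gateway.port` key.

```python
from typing import Mapping, Optional

DEFAULT_PORT = 18789  # final fallback from the precedence table


def resolve_port(
    cli_port: Optional[int],
    env: Mapping[str, str],
    config: Mapping[str, object],
) -> int:
    """Resolve the port: --port, then PLLAN_GATEWAY_PORT, then gateway.port, then default."""
    # 1. An explicit --port value wins.
    if cli_port is not None:
        return cli_port
    # 2. Environment variable, when set and non-empty.
    env_value = env.get("PLLAN_GATEWAY_PORT")
    if env_value:
        return int(env_value)
    # 3. Config file value (assumed shape: {"gateway": {"port": ...}}).
    gateway = config.get("gateway")
    if isinstance(gateway, Mapping) and "port" in gateway:
        return int(gateway["port"])
    # 4. Built-in default.
    return DEFAULT_PORT
```

For example, `resolve_port(None, {}, {"gateway": {"port": 19003}})` falls through the first two sources and returns the config value.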

Hot reload modes

gateway.reload.mode | Behavior
off | No config reload
hot | Apply only hot-safe changes
restart | Restart on reload-required changes
hybrid (default) | Hot-apply when safe, restart when required
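
The reload modes can be read as a small decision table. The sketch below encodes only what the table states; the function `reload_action` and the action names are hypothetical, and the one case the table does not cover (a hot-safe change in restart mode) is returned as "unspecified" rather than guessed.

```python
def reload_action(mode: str, hot_safe: bool) -> str:
    """Map (reload mode, change kind) to an action, following the table above.

    hot_safe=True  -> the change can be applied without a restart
    hot_safe=False -> the change requires a restart to take effect
    """
    if mode == "off":
        return "none"  # no config reload at all
    if mode == "hot":
        # Only hot-safe changes are applied; others are left unapplied.
        return "hot-apply" if hot_safe else "skip"
    if mode == "restart":
        # The table only describes reload-required changes for this mode.
        return "unspecified" if hot_safe else "restart"
    if mode == "hybrid":
        return "hot-apply" if hot_safe else "restart"
    raise ValueError(f"unknown gateway.reload.mode: {mode!r}")
```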

Operator command set

pllan gateway status
pllan gateway status --deep
pllan gateway status --json
pllan gateway install
pllan gateway restart
pllan gateway stop
pllan secrets reload
pllan logs --follow
pllan doctor

Remote access

Preferred: Tailscale/VPN. Fallback: SSH tunnel.
ssh -N -L 18789:127.0.0.1:18789 user@host
Then connect clients to ws://127.0.0.1:18789 locally.
If gateway auth is configured, clients still must send auth (token/password) even over SSH tunnels.
See: Remote Gateway, Authentication, Tailscale.
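
Before pointing clients at the tunnel, it can help to confirm that the forwarded local port accepts TCP connections. This is a minimal sketch using only the Python standard library; `port_open` is a hypothetical helper, and a successful TCP connect says nothing about Gateway auth or protocol health.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 18789 is the local end of the SSH tunnel from the example above.
    print(port_open("127.0.0.1", 18789))
```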

Supervision and service lifecycle

Use supervised runs for production-like reliability.
pllan gateway install
pllan gateway status
pllan gateway restart
pllan gateway stop
LaunchAgent labels are ai.pllan.gateway (default) or ai.pllan.<profile> (named profile). pllan doctor audits and repairs service config drift.
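
The label naming rule can be expressed as a one-line helper. This is a sketch assuming any non-empty profile name counts as a named profile; `launch_agent_label` is a hypothetical function, not part of the CLI.

```python
from typing import Optional


def launch_agent_label(profile: Optional[str] = None) -> str:
    """Return ai.pllan.gateway for the default profile, ai.pllan.<profile> otherwise."""
    # Assumption: None or an empty string means "no named profile".
    return f"ai.pllan.{profile}" if profile else "ai.pllan.gateway"
```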

Multiple gateways on one host

Most setups should run one Gateway. Use multiple only for strict isolation or redundancy (for example, a rescue profile).
Checklist per instance:
  • Unique gateway.port
  • Unique PLLAN_CONFIG_PATH
  • Unique PLLAN_STATE_DIR
  • Unique agents.defaults.workspace
Example:
PLLAN_CONFIG_PATH=~/.pllan/a.json PLLAN_STATE_DIR=~/.pllan-a pllan gateway --port 19001
PLLAN_CONFIG_PATH=~/.pllan/b.json PLLAN_STATE_DIR=~/.pllan-b pllan gateway --port 19002
See: Multiple gateways.
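
The per-instance checklist amounts to a uniqueness check across four values. The sketch below is illustrative only: `find_conflicts` and the dictionary keys are hypothetical names, and each instance is assumed to be described by a plain mapping.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical keys for: gateway.port, PLLAN_CONFIG_PATH,
# PLLAN_STATE_DIR, agents.defaults.workspace
CHECKLIST_KEYS = ("port", "config_path", "state_dir", "workspace")


def find_conflicts(instances: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Return, per checklist key, the values shared by more than one instance."""
    conflicts: Dict[str, List[object]] = {}
    for key in CHECKLIST_KEYS:
        counts: Dict[object, int] = defaultdict(int)
        for instance in instances:
            counts[instance[key]] += 1
        duplicates = [value for value, n in counts.items() if n > 1]
        if duplicates:
            conflicts[key] = duplicates
    return conflicts
```

An empty result means every instance has its own port, config path, state directory, and workspace.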

Dev profile quick path

pllan --dev setup
pllan --dev gateway --allow-unconfigured
pllan --dev status
Defaults include isolated state/config and base gateway port 19001.

Protocol quick reference (operator view)

  • First client frame must be connect.
  • Gateway returns hello-ok snapshot (presence, health, stateVersion, uptimeMs, limits/policy).
  • Requests: req(method, params) → res(ok/payload|error).
  • Common events: connect.challenge, agent, chat, presence, tick, health, heartbeat, shutdown.
Agent runs are two-stage:
  1. Immediate accepted ack (status:"accepted")
  2. Final completion response (status:"ok"|"error"), with streamed agent events in between.
See full protocol docs: Gateway Protocol.
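
The two-stage agent run can be tracked on the client side with a small state machine. This is a sketch under stated assumptions: the class `AgentRun` and its strict ordering check are illustrative, and only the status values ("accepted", "ok", "error") come from the protocol summary above.

```python
class AgentRun:
    """Track one agent run: accepted ack first, then a final ok/error."""

    FINAL = ("ok", "error")

    def __init__(self) -> None:
        self.stage = "pending"  # hypothetical initial state name

    def on_status(self, status: str) -> str:
        """Advance the run with a response status and return the new stage."""
        if self.stage == "pending" and status == "accepted":
            self.stage = "accepted"
        elif self.stage == "accepted" and status in self.FINAL:
            self.stage = status
        else:
            # Illustrative strictness: reject out-of-order or repeated statuses.
            raise ValueError(f"unexpected status {status!r} in stage {self.stage!r}")
        return self.stage

    @property
    def done(self) -> bool:
        return self.stage in self.FINAL
```

Streamed agent events that arrive between the ack and the final response would be handled separately and do not change the stage in this sketch.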

Operational checks

Liveness

  • Open WS and send connect.
  • Expect hello-ok response with snapshot.

Readiness

pllan gateway status
pllan channels status --probe
pllan health

Gap recovery

Events are not replayed. On sequence gaps, refresh state (health, system-presence) before continuing.
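
Gap detection on the client can be sketched as follows. The assumption here, not confirmed by this page, is that events carry a consecutive integer sequence number; `GapTracker` and the `seq` argument are hypothetical names.

```python
from typing import Optional


class GapTracker:
    """Detect missing events from a monotonically increasing sequence number."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None

    def observe(self, seq: int) -> bool:
        """Record seq and return True when one or more events were skipped."""
        gap = self.last_seq is not None and seq > self.last_seq + 1
        self.last_seq = seq
        return gap
```

When `observe` returns True, the client would refresh state (health, system-presence) before processing further events, since missed events are not replayed.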

Common failure signatures

Signature | Likely issue
refusing to bind gateway ... without auth | Non-loopback bind without token/password
another gateway instance is already listening / EADDRINUSE | Port conflict
Gateway start blocked: set gateway.mode=local | Config set to remote mode
unauthorized during connect | Auth mismatch between client and gateway
For full diagnosis ladders, use Gateway Troubleshooting.
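
For quick log triage, the signature table can be turned into a lookup. This is a minimal sketch: `diagnose` is a hypothetical helper, the patterns are loose substring matches built from the table, and the elided middle of the first signature is matched with a wildcard rather than filled in.

```python
import re
from typing import Optional

# (pattern, likely issue) pairs taken from the failure signature table.
SIGNATURES = [
    (re.compile(r"refusing to bind gateway .* without auth"),
     "Non-loopback bind without token/password"),
    (re.compile(r"another gateway instance is already listening|EADDRINUSE"),
     "Port conflict"),
    (re.compile(r"Gateway start blocked: set gateway\.mode=local"),
     "Config set to remote mode"),
    (re.compile(r"unauthorized"),
     "Auth mismatch between client and gateway"),
]


def diagnose(log_line: str) -> Optional[str]:
    """Return the likely issue for a log line, or None when nothing matches."""
    for pattern, issue in SIGNATURES:
        if pattern.search(log_line):
            return issue
    return None
```

Lines can be fed in from `pllan logs --follow`; anything that returns None still needs the full troubleshooting ladder.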

Safety guarantees

  • Gateway protocol clients fail fast when Gateway is unavailable (no implicit direct-channel fallback).
  • Invalid/non-connect first frames are rejected and closed.
  • Graceful shutdown emits shutdown event before socket close.
