Observability
Truss follows one principle here: instrument, don’t impose. It emits signals in standard formats that any monitoring stack can ingest, and it never forces a heavy stack on you. You can wire it into whatever you already run, or spin up the bundled Grafana stack.
What the API always exposes
- Metrics: a Prometheus endpoint at `/metrics`. It is unauthenticated, so expose it only to your internal network. It carries the RED signals (request Rate, Errors, Duration) as one histogram labeled by `method`/`route`/`status_code`, plus a Postgres pool gauge and Node.js process metrics (CPU, memory, event-loop lag, GC).
- Logs: structured JSON to stdout (pino), with secrets redacted. Any collector that reads container stdout (Promtail, Alloy, Fluent Bit, a cloud agent) can ship them.
- Traces: opt-in. Set `OTEL_EXPORTER_OTLP_ENDPOINT` and the API exports OpenTelemetry traces (auto-instrumented HTTP → Express route → Postgres / Redis queries). When the variable is unset, tracing is fully dormant and costs nothing. When tracing is on, every log line is stamped with the active `trace_id`, so you can pivot from metric → trace → logs.
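The histogram's metric name isn't listed above. To see which names your build actually exposes, or to follow a `trace_id` into the logs, two small shell helpers are enough. This is a sketch: the function names are hypothetical. It assumes `curl`, `awk` and `grep` are available, that port 8787 is reachable from where you run it, and that pino writes its default compact JSON.

```shell
#!/bin/sh
# Helpers for inspecting the signals described above (hypothetical names).

# metric_families: read Prometheus text exposition on stdin and print one
# "<name> <type>" line per metric family, taken from its "# TYPE" comment.
metric_families() {
  awk '$1 == "#" && $2 == "TYPE" { print $3, $4 }'
}

# logs_for_trace: read pino JSON log lines on stdin and keep only those
# stamped with the given trace_id. Assumes pino's default compact JSON,
# i.e. no whitespace around the colon: "trace_id":"<id>".
logs_for_trace() {
  grep -F "\"trace_id\":\"$1\""
}

# Usage (host, port and container name are assumptions):
#   curl -fsS http://localhost:8787/metrics | metric_families
#   docker logs <api-container> | logs_for_trace <trace-id>
```

The `grep -F` match is deliberately simple. If your logs are reformatted (for example, pretty-printed), switch to a JSON-aware filter.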
| Variable | Purpose | Default |
|---|---|---|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP/HTTP endpoint for trace export (e.g. `http://collector:4318`) | (unset → tracing off) |
| `OTEL_SERVICE_NAME` | Service name on spans | `truss-api` |
| `LOG_LEVEL` | pino log level | `info` |
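For a local run outside Compose, you might set these variables in the shell that starts the API. This is a sketch: `http://collector:4318` is the table's example host, so substitute your own collector's OTLP/HTTP address. `debug` is just one pino level you might pick while troubleshooting.

```shell
# Turn tracing on by pointing export at an OTLP/HTTP collector (example host from the table).
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318
# Service name stamped on spans (same as the default).
export OTEL_SERVICE_NAME=truss-api
# pino levels include fatal, error, warn, info, debug and trace.
export LOG_LEVEL=debug
```

Running `unset OTEL_EXPORTER_OTLP_ENDPOINT` returns tracing to its dormant default.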
Plug into your existing stack
- Prometheus: scrape `truss-api:8787/metrics`.
- Logs: point your collector at the API container’s stdout.
- Traces: set `OTEL_EXPORTER_OTLP_ENDPOINT` to your collector's, Tempo's, or vendor's OTLP/HTTP URL.
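If you configure Prometheus by hand, a single static scrape job covers the API. This is a sketch, assuming Prometheus can resolve the host `truss-api` (for example, because both run on the same Docker network). The job name and 15 s interval are illustrative choices, not project defaults.

```shell
# Write a minimal scrape job; merge its entry into the scrape_configs list of your prometheus.yml.
cat > truss-scrape.yml <<'EOF'
scrape_configs:
  - job_name: truss-api          # hypothetical job name; pick your own
    metrics_path: /metrics       # the endpoint the API always exposes
    scrape_interval: 15s         # example interval
    static_configs:
      - targets: ['truss-api:8787']
EOF
```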
Kubernetes (Prometheus operator)
If you run kube-prometheus-stack, turn on the chart’s opt-in observability artifacts (all off by default):

    helm upgrade truss ./charts/truss \
      --set observability.serviceMonitor.enabled=true \
      --set observability.prometheusRule.enabled=true \
      --set observability.grafanaDashboard.enabled=true \
      --set observability.otlpEndpoint=http://otel-collector.monitoring:4318

That creates three objects:
- a ServiceMonitor, so the operator scrapes `/metrics` automatically;
- a PrometheusRule with three SLO alerts (error rate > 1%, p95 latency > 500 ms, DB-pool saturation);
- a Grafana dashboard ConfigMap that the Grafana sidecar loads automatically.
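You may prefer to keep these settings in version control rather than on the command line. The `--set` flags above map one-to-one onto a Helm values file. This is a sketch; the file name is arbitrary.

```shell
# Same settings as the --set flags above, as a values file you can commit.
cat > truss-observability.values.yaml <<'EOF'
observability:
  serviceMonitor:
    enabled: true
  prometheusRule:
    enabled: true
  grafanaDashboard:
    enabled: true
  otlpEndpoint: http://otel-collector.monitoring:4318
EOF

# Apply it with the same release name and chart path:
#   helm upgrade truss ./charts/truss -f truss-observability.values.yaml
```

Prometheus may not pick up the ServiceMonitor. If so, check the Prometheus resource's `serviceMonitorSelector`: by default, kube-prometheus-stack only selects ServiceMonitors labeled with its own Helm release.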
Bundled stack (batteries-included)
If you don’t already run monitoring, layer the bundled LGTM stack onto Docker Compose:

    docker compose -f docker-compose.selfhosted.yml -f docker-compose.observability.yml \
      --env-file .env.selfhosted up -d

That adds Prometheus, Loki + Promtail, Tempo, an OTel Collector, and Grafana, all pre-wired:
- Prometheus scrapes `/metrics`.
- The API exports traces to the collector, which forwards them to Tempo.
- Promtail ships container logs to Loki.

Open Grafana at http://localhost:3001 (anonymous admin access). The Truss API dashboard and all three datasources are already provisioned.
SLOs worth alerting on
Start with three SLOs, and alert on burn rate rather than on every blip:
- Availability: the 5xx ratio, `rate(...status_code=~"5..")` / total, stays under 1%
- Latency: `histogram_quantile(0.95, ...)` stays under your target (e.g. 500 ms)
- Saturation: `truss_db_pool_connections{state="waiting"}` stays at 0
The bundled PrometheusRule ships these as a starting point.
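"Alert on burn rate" can be made concrete with the multiwindow pattern from Google's SRE Workbook. The rule pages when the 5xx ratio burns a 99% availability budget 14.4× too fast over both a 1 h and a 5 m window. This is a sketch only. The series name `http_request_duration_seconds_count` is a hypothetical stand-in for the API's RED histogram, whose real name isn't given here, so check `/metrics`. The group and alert names are also hypothetical.

```shell
# Write a Prometheus rule file with a fast-burn alert for a 99% availability SLO.
# ASSUMPTION: http_request_duration_seconds_count is a placeholder series name.
cat > truss-burn-rate.rules.yml <<'EOF'
groups:
  - name: truss-slo-burn                  # hypothetical group name
    rules:
      - alert: TrussErrorBudgetFastBurn   # hypothetical alert name
        # Error ratio above 14.4x the 1% budget on BOTH the 1h and 5m windows.
        expr: |
          (
            sum(rate(http_request_duration_seconds_count{status_code=~"5.."}[1h]))
              / sum(rate(http_request_duration_seconds_count[1h]))
          ) > (14.4 * 0.01)
          and
          (
            sum(rate(http_request_duration_seconds_count{status_code=~"5.."}[5m]))
              / sum(rate(http_request_duration_seconds_count[5m]))
          ) > (14.4 * 0.01)
        labels:
          severity: page
        annotations:
          summary: Truss API is burning its 1% error budget about 14x too fast
EOF
```

You can lint the file with `promtool check rules truss-burn-rate.rules.yml`. On Kubernetes, the same `groups:` list goes under `spec.groups` of a PrometheusRule. The long window keeps one brief spike from paging, and the short window lets the alert clear soon after the errors stop.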