Table of contents

  1. Overview
  2. Docker image
  3. Kubernetes Deployment
    1. Basic Deployment
    2. Secrets
    3. Service
  4. Health probes
  5. Graceful shutdown tuning
  6. Prometheus monitoring
    1. Prometheus ServiceMonitor
    2. Key metrics to alert on
    3. Custom histogram buckets
  7. Distributed tracing
    1. OTLP backend (Jaeger, Tempo, Honeycomb, etc.)
    2. New Relic
    3. OTEL metrics (alongside Prometheus)
    4. What gets traced
    5. Sampling in production
  8. gRPC load balancing
    1. Option 1: Headless Service + client-side balancing
    2. Option 2: Service mesh / L7 proxy
  9. TLS
  10. Resource tuning
    1. GOMAXPROCS
    2. Connection keepalive
  11. Security hardening
    1. Dedicated admin port (recommended)
    2. Public-facing services
    3. Internal services
    4. Built-in protections
    5. Data sent to third-party services
    6. Not built into ColdBrew
  12. Production checklist
    1. All services
    2. Public-facing services (additional)

Overview

This guide covers deploying ColdBrew services to production on Kubernetes. ColdBrew is designed for containerized environments — health checks, metrics, and graceful shutdown work out of the box.

Docker image

The ColdBrew cookiecutter generates a multi-stage Dockerfile:

# Build stage
FROM golang:1.25 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /service .

# Runtime stage
FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /service /service
EXPOSE 9090 9091
ENTRYPOINT ["/service"]

Key points:

  • CGO_ENABLED=0 produces a static binary — no libc dependency
  • ca-certificates is needed for TLS connections to external services (New Relic, Sentry, OTLP endpoints)
  • Ports 9090 (gRPC) and 9091 (HTTP) are the defaults

Build and push:

docker build -t your-registry/myservice:v1.0.0 .
docker push your-registry/myservice:v1.0.0

Kubernetes Deployment

Basic Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myservice
  labels:
    app: myservice
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myservice
  template:
    metadata:
      labels:
        app: myservice
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9091"
        prometheus.io/path: "/metrics"
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: myservice
          image: your-registry/myservice:v1.0.0
          ports:
            - name: grpc
              containerPort: 9090
              protocol: TCP
            - name: http
              containerPort: 9091
              protocol: TCP
          env:
            - name: APP_NAME
              value: myservice
            - name: ENVIRONMENT
              value: production
            - name: LOG_LEVEL
              value: info
          envFrom:
            - secretRef:
                name: myservice-secrets
          livenessProbe:
            httpGet:
              path: /healthcheck
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3
          readinessProbe:
            httpGet:
              path: /readycheck
              port: http
            initialDelaySeconds: 3
            periodSeconds: 5
            timeoutSeconds: 3
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: "1"
              memory: 512Mi

Secrets

Store sensitive values like API keys in a Kubernetes Secret:

apiVersion: v1
kind: Secret
metadata:
  name: myservice-secrets
type: Opaque
stringData:
  NEW_RELIC_LICENSE_KEY: "your-license-key"
  SENTRY_DSN: "https://your-dsn@sentry.io/123"
  OTLP_HEADERS: "x-honeycomb-team=your-api-key"  # if your OTLP backend needs auth

Service

Expose both gRPC and HTTP ports:

apiVersion: v1
kind: Service
metadata:
  name: myservice
  labels:
    app: myservice
spec:
  selector:
    app: myservice
  ports:
    - name: grpc
      port: 9090
      targetPort: grpc
      protocol: TCP
    - name: http
      port: 9091
      targetPort: http
      protocol: TCP

Health probes

ColdBrew provides two health endpoints:

Endpoint     | Purpose                           | Kubernetes probe
/healthcheck | Liveness: is the process alive?   | livenessProbe
/readycheck  | Readiness: can it accept traffic? | readinessProbe

Both return JSON with build/version info on success. During graceful shutdown, /readycheck fails first, which causes Kubernetes to stop routing traffic before the process exits.

Set terminationGracePeriodSeconds to at least SHUTDOWN_DURATION_IN_SECONDS so the kubelet does not send SIGKILL mid-shutdown. The drain wait (GRPC_GRACEFUL_DURATION_IN_SECONDS) is included within the shutdown timeout, not additional to it. With the default shutdown duration of 15s, a terminationGracePeriodSeconds of 20 provides a safe buffer.
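The probe split above can be sketched with the standard library alone. This is not ColdBrew’s implementation: the health type, its failCheck method and the handler bodies are hypothetical, standing in for what FailCheck(true) does when shutdown begins. Liveness keeps returning 200 so Kubernetes does not restart a draining pod, while readiness switches to 503.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health is a hypothetical holder for the readiness state.
type health struct {
	failing atomic.Bool // set to true once shutdown begins
}

// failCheck mirrors the idea behind FailCheck(true): readiness starts failing.
func (h *health) failCheck(fail bool) { h.failing.Store(fail) }

// liveness answers "is the process alive?" and keeps passing during shutdown.
func (h *health) liveness(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, `{"status":"ok"}`)
}

// readiness answers "can it accept traffic?" and returns 503 once shutdown
// begins, so the pod is removed from the Service endpoints.
func (h *health) readiness(w http.ResponseWriter, r *http.Request) {
	if h.failing.Load() {
		http.Error(w, `{"status":"shutting down"}`, http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, `{"status":"ready"}`)
}

func newMux(h *health) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthcheck", h.liveness)
	mux.HandleFunc("/readycheck", h.readiness)
	return mux
}

// probe returns the status code a kubelet would see for path.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := &health{}
	mux := newMux(h)
	fmt.Println(probe(mux, "/healthcheck"), probe(mux, "/readycheck"))
	h.failCheck(true) // SIGTERM received
	fmt.Println(probe(mux, "/healthcheck"), probe(mux, "/readycheck"))
}
```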

Graceful shutdown tuning

ColdBrew’s shutdown sequence (bounded by SHUTDOWN_DURATION_IN_SECONDS, default 15s):

  1. Receive SIGTERM from Kubernetes
  2. FailCheck(true) on CBGracefulStopper services — /readycheck starts failing
  3. Wait GRPC_GRACEFUL_DURATION_IN_SECONDS (default: 7s, included in shutdown timeout) for the load balancer to drain
  4. Shutdown admin server if configured (ADMIN_PORT)
  5. Shutdown HTTP server (stop accepting new requests)
  6. GracefulStop() gRPC server (finish in-flight RPCs, reject new ones)
  7. Force-stop gRPC server if graceful shutdown didn’t complete in time
  8. Call Stop() on CBStopper services — close database pools, flush metrics, drain message producers
  9. Exit

Tune these values based on your service:

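env:
  # The drain wait is part of the shutdown duration, so budget for both:
  # 10s drain + 30s longest request + 5s buffer = 45s
  - name: SHUTDOWN_DURATION_IN_SECONDS
    value: "45"
  # Match your load balancer's health check interval + propagation time
  - name: GRPC_GRACEFUL_DURATION_IN_SECONDS
    value: "10"
  # Raise terminationGracePeriodSeconds in the pod spec to match (e.g. 50)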

For more details, see Signal Handling and Graceful Shutdown.
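The budgeting rule can be written out as arithmetic. shutdownBudget is a hypothetical helper. It assumes a request accepted just before the drain ends may still run for its full duration, and it reuses the 5-second buffer implied by the default (15s shutdown, 20s grace period). For a 10-second drain, a 30-second longest request and a 5-second buffer, it yields a shutdown duration of 45 and a grace period of 50.

```go
package main

import "fmt"

// shutdownBudget is a hypothetical helper. The drain wait is part of
// SHUTDOWN_DURATION_IN_SECONDS, and a request accepted at the end of the drain
// may still need longestRequest seconds, so both count toward the shutdown
// duration. terminationGracePeriodSeconds must then cover the shutdown duration.
func shutdownBudget(drain, longestRequest, buffer int) (shutdownDuration, gracePeriod int) {
	shutdownDuration = drain + longestRequest + buffer
	gracePeriod = shutdownDuration + buffer
	return shutdownDuration, gracePeriod
}

func main() {
	s, g := shutdownBudget(10, 30, 5)
	fmt.Printf("SHUTDOWN_DURATION_IN_SECONDS=%d terminationGracePeriodSeconds=%d\n", s, g)
}
```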

Prometheus monitoring

Prometheus ServiceMonitor

If you’re using the Prometheus Operator:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myservice
  labels:
    app: myservice
spec:
  selector:
    matchLabels:
      app: myservice
  endpoints:
    - port: http
      path: /metrics
      interval: 15s

Key metrics to alert on

ColdBrew exposes these metrics out of the box via gRPC interceptors:

Metric                       | Type      | Description
grpc_server_handled_total    | Counter   | Total RPCs completed, by method and status code
grpc_server_handling_seconds | Histogram | RPC latency distribution. Only available when ENABLE_PROMETHEUS_GRPC_HISTOGRAM=true (the default); disabling it removes all latency percentile data from Prometheus
grpc_server_started_total    | Counter   | Total RPCs started

Recommended alerts:

# High error rate
- alert: HighGRPCErrorRate
  expr: |
    sum(rate(grpc_server_handled_total{grpc_code!="OK"}[5m])) by (grpc_service)
    /
    sum(rate(grpc_server_handled_total[5m])) by (grpc_service)
    > 0.05
  for: 5m

# High latency (p99 > 1s)
- alert: HighGRPCLatency
  expr: |
    histogram_quantile(0.99,
      sum(rate(grpc_server_handling_seconds_bucket[5m])) by (le, grpc_service)
    ) > 1
  for: 5m

The latency alert above requires ENABLE_PROMETHEUS_GRPC_HISTOGRAM=true (the default). If you set it to false for throughput tuning, the grpc_server_handling_seconds metric disappears and this alert will silently stop firing. Ensure you have an alternative latency signal (distributed tracing, load balancer metrics) before disabling histograms.

Custom histogram buckets

If the default latency buckets don’t match your SLOs, customize them:

env:
  - name: PROMETHEUS_GRPC_HISTOGRAM_BUCKETS
    value: "0.005,0.01,0.025,0.05,0.1,0.25,0.5,1,2.5,5,10"
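Because quantiles are interpolated within buckets, it helps to place a boundary at each SLO threshold (for example 0.3 if your target is 300ms). Below is a hypothetical pre-deploy check for the bucket string. It assumes a plain comma-separated list of seconds; ColdBrew’s own parser may accept or reject values differently. The strictly increasing rule matches what the Prometheus Go client requires for histogram buckets.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBuckets validates a PROMETHEUS_GRPC_HISTOGRAM_BUCKETS-style value:
// every entry must be a number and the upper bounds must strictly increase.
func parseBuckets(s string) ([]float64, error) {
	parts := strings.Split(s, ",")
	out := make([]float64, 0, len(parts))
	for i, p := range parts {
		v, err := strconv.ParseFloat(strings.TrimSpace(p), 64)
		if err != nil {
			return nil, fmt.Errorf("bucket %d (%q): %w", i, p, err)
		}
		if i > 0 && v <= out[i-1] {
			return nil, fmt.Errorf("bucket %d (%v) is not greater than %v", i, v, out[i-1])
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	b, err := parseBuckets("0.005,0.01,0.025,0.05,0.1,0.25,0.5,1,2.5,5,10")
	fmt.Println(len(b), err)
	_, err = parseBuckets("0.1,0.05")
	fmt.Println(err)
}
```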

Distributed tracing

ColdBrew sends traces via OpenTelemetry to any OTLP-compatible backend (Jaeger, Grafana Tempo, Honeycomb, Datadog, etc.) or New Relic.

OTLP backend (Jaeger, Tempo, Honeycomb, etc.)

env:
  - name: OTLP_ENDPOINT
    value: "otel-collector.monitoring:4317"  # your OTLP collector
  - name: OTLP_SAMPLING_RATIO
    value: "0.1"  # sample 10% of traces in production

For backends that require authentication headers:

env:
  - name: OTLP_ENDPOINT
    value: "api.honeycomb.io:443"
  - name: OTLP_HEADERS
    value: "x-honeycomb-team=your-api-key"
  - name: OTLP_SAMPLING_RATIO
    value: "0.1"

For local development, set OTLP_INSECURE=true and point to a local Jaeger instance (localhost:4317). See the config reference for a full example.

New Relic

New Relic tracing is configured separately and can run alongside OTLP:

env:
  - name: NEW_RELIC_LICENSE_KEY
    valueFrom:
      secretKeyRef:
        name: myservice-secrets
        key: NEW_RELIC_LICENSE_KEY
  - name: NEW_RELIC_OPENTELEMETRY
    value: "true"
  - name: NEW_RELIC_OPENTELEMETRY_SAMPLE
    value: "0.2"

OTEL metrics (alongside Prometheus)

To export gRPC metrics via OTLP alongside Prometheus scraping, enable OTEL metrics on the same endpoint used for tracing:

env:
  - name: ENABLE_OTEL_METRICS
    value: "true"
  - name: OTEL_METRICS_INTERVAL
    value: "60"  # seconds between OTLP metric exports
  # OTLP_ENDPOINT is already set for tracing above

This does not replace Prometheus — both /metrics scraping and OTLP push run in parallel. See the Metrics How-To for details on exported metric names.

What gets traced

ColdBrew automatically creates spans for:

Source                        | Span kind | Example
Incoming gRPC RPCs            | Server    | /pkg.Service/Method
Incoming HTTP requests        | Server    | ServeHTTP
Outbound gRPC calls (gateway) | Client    | /pkg.Service/Method
tracing.NewInternalSpan()     | Internal  | Custom business logic spans
tracing.NewDatastoreSpan()    | Client    | Database/Redis operations
tracing.NewExternalSpan()     | Client    | External HTTP/API calls

Sampling in production

Set OTLP_SAMPLING_RATIO based on your traffic volume:

QPS     | Recommended ratio | Traces/sec
100     | 1.0               | 100
1,000   | 0.1               | 100
10,000  | 0.01              | 100
70,000+ | 0.001–0.01        | 70–700

Sampling is parent-based — if an incoming request already has a sampled trace context, ColdBrew respects that decision regardless of the local ratio.
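The sampling rules above can be sketched in a few lines. This stdlib-only model is an illustration, not ColdBrew’s code. The ratio check follows the approach of OpenTelemetry Go’s TraceIDRatioBased sampler (compare the low 63 bits of the trace ID’s last 8 bytes against ratio × 2^63), and tracesPerSecond is simply the QPS × ratio arithmetic behind the table.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sampled models parent-based ratio sampling: an upstream decision always
// wins; otherwise the trace ID is compared deterministically against the ratio,
// so every service sampling at the same ratio agrees on the same traces.
func sampled(traceID [16]byte, hasParent, parentSampled bool, ratio float64) bool {
	if hasParent {
		return parentSampled // respect the incoming trace context
	}
	if ratio >= 1 {
		return true
	}
	bound := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < bound
}

// tracesPerSecond is the arithmetic behind the sampling table.
func tracesPerSecond(qps, ratio float64) float64 { return qps * ratio }

func main() {
	id := [16]byte{15: 0x01} // example trace ID; real IDs are random
	fmt.Println("root span at 10%:", sampled(id, false, false, 0.1))
	fmt.Println("sampled parent at 0.1%:", sampled(id, true, true, 0.001))
	fmt.Println("traces/sec at 70,000 QPS and 0.01:", tracesPerSecond(70000, 0.01))
}
```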

gRPC load balancing

gRPC uses HTTP/2 with long-lived connections. A standard ClusterIP Service balances per connection (kube-proxy picks a pod when the connection is opened), not per request, so every request on a client’s connection goes to the same pod and load does not spread across pods.

Solutions:

Option 1: Headless Service + client-side balancing

apiVersion: v1
kind: Service
metadata:
  name: myservice-headless
spec:
  clusterIP: None  # headless
  selector:
    app: myservice
  ports:
    - name: grpc
      port: 9090

Use with ColdBrew’s grpcpool for client-side round-robin:

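conn, err := grpcpool.DialContext(ctx, "dns:///myservice-headless:9090",
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    // Plaintext in-cluster (google.golang.org/grpc/credentials/insecure); use TLS credentials if the server enables TLS
    grpc.WithTransportCredentials(insecure.NewCredentials()),
)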

Option 2: Service mesh / L7 proxy

Use a gRPC-aware proxy (Istio, Linkerd, Envoy) that understands HTTP/2 multiplexing and balances per-request rather than per-connection.

TLS

Enable TLS on the gRPC server (env and volumeMounts belong in the container spec, volumes in the pod spec):

env:
  - name: GRPC_TLS_CERT_FILE
    value: /certs/tls.crt
  - name: GRPC_TLS_KEY_FILE
    value: /certs/tls.key
volumeMounts:
  - name: tls-certs
    mountPath: /certs
    readOnly: true
volumes:
  - name: tls-certs
    secret:
      secretName: myservice-tls

If you’re using a service mesh that handles mTLS (Istio, Linkerd), you typically don’t need ColdBrew’s built-in TLS — the mesh sidecar terminates TLS at the pod level.
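The two file settings above ultimately feed a TLS server configuration. serverTLSConfig is a hypothetical helper showing the standard-library shape of that step; the TLS 1.2 minimum is an assumption, not a documented ColdBrew default.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// serverTLSConfig loads the mounted key pair into a tls.Config.
// Hypothetical helper; MinVersion TLS 1.2 is an assumed policy choice.
func serverTLSConfig(certFile, keyFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("load TLS key pair: %w", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	cfg, err := serverTLSConfig("/certs/tls.crt", "/certs/tls.key")
	if err != nil {
		fmt.Println("TLS not configured:", err) // e.g. the Secret is not mounted
		return
	}
	fmt.Println("loaded certificates:", len(cfg.Certificates))
}
```

This sketch reads the key pair once, so a rotated Secret (for example, one renewed by cert-manager) would not be picked up until restart; the GetCertificate callback on tls.Config is the standard hook for reloading.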

Resource tuning

GOMAXPROCS

ColdBrew automatically sets GOMAXPROCS to match the container’s CPU limit using automaxprocs. This keeps the Go scheduler from running more goroutines in parallel than the container’s CPU quota allows, which would otherwise lead to CFS throttling and latency spikes.

If something else already sets GOMAXPROCS to match the CPU limit (for example, Go 1.25 and later, whose runtime is cgroup-aware on Linux), disable it:

env:
  - name: DISABLE_AUTO_MAX_PROCS
    value: "true"
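For the Deployment above (limits.cpu: "1"), the kubelet configures a CFS quota of 100000µs per 100000µs period, which cgroup v2 exposes in cpu.max. procsFromCPUMax is a simplified, hypothetical re-creation of the calculation automaxprocs-style tools perform: quota divided by period, rounded down, never below 1. It is not automaxprocs’s code, and it ignores cgroup v1.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// procsFromCPUMax derives a GOMAXPROCS value from the contents of
// /sys/fs/cgroup/cpu.max ("<quota> <period>", or "max <period>" when
// unlimited). It returns ok=false when no CPU limit is set.
func procsFromCPUMax(cpuMax string) (procs int, ok bool) {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || period <= 0 {
		return 0, false
	}
	procs = int(math.Floor(quota / period)) // fractional CPUs round down
	if procs < 1 {
		procs = 1
	}
	return procs, true
}

func main() {
	// On Linux with cgroup v2, read the real value via os.ReadFile("/sys/fs/cgroup/cpu.max").
	for _, v := range []string{"100000 100000", "250000 100000", "50000 100000", "max 100000"} {
		n, ok := procsFromCPUMax(v)
		fmt.Printf("%-15q -> GOMAXPROCS=%d (limit set: %v)\n", v, n, ok)
	}
}
```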

Connection keepalive

ColdBrew ships sane defaults for connection keepalive (idle: 300s, age: 1800s, grace: 30s). These ensure connections rotate for balanced load distribution and timely DNS updates. Override only if your service has specific requirements:

env:
  # Override: close idle connections after 10 minutes instead of 5
  - name: GRPC_SERVER_MAX_CONNECTION_IDLE_IN_SECONDS
    value: "600"
  # Override: force connection refresh every hour instead of 30 minutes
  # Change to "-1" to disable the connection age limit entirely (not recommended)
  - name: GRPC_SERVER_MAX_CONNECTION_AGE_IN_SECONDS
    value: "3600"

Security hardening

This section provides general security guidance for ColdBrew configuration. Always follow your organization’s security policies and compliance requirements. ColdBrew is a framework — securing your deployment is your responsibility.

ColdBrew’s defaults are tuned for internal services — debug endpoints, API docs, and gRPC reflection are enabled by default. Public-facing services need different settings.

Dedicated admin port (recommended)

The preferred approach is to serve admin endpoints (pprof, metrics, swagger) on a separate port using ADMIN_PORT. This keeps profiling and metrics available for operations while isolating them from external traffic via Kubernetes NetworkPolicy:

env:
  # Serve admin endpoints on a dedicated internal port
  - name: ADMIN_PORT
    value: "9092"

When ADMIN_PORT is set:

  • Port 9090 (gRPC): gRPC server — expose as needed
  • Port 9091 (HTTP): gRPC-gateway + health/readiness probes — expose with path allowlisting
  • Admin port (e.g., 9092): pprof, metrics, swagger — restrict via NetworkPolicy

# Kubernetes NetworkPolicy: restricts admin port (9092) to the monitoring namespace
# while leaving app ports (9090/9091) open. Add further restrictions to
# 9090/9091 if you need to limit app traffic sources too.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-admin-port
spec:
  podSelector:
    matchLabels:
      app: myservice
  policyTypes:
    - Ingress
  ingress:
    # Allow app traffic (gRPC + HTTP gateway) from anywhere
    - ports:
        - port: 9090
        - port: 9091
    # Restrict admin port to monitoring namespace only
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 9092

This approach is better than disabling endpoints entirely because:

  • Prometheus can still scrape /metrics on the admin port
  • Operations can still access pprof for production debugging
  • No application-level auth needed — network isolation handles it

Public-facing services

The most effective security measure is to use ADMIN_PORT to separate admin endpoints, or to allowlist public API paths at your load balancer and block everything else. ColdBrew serves the HTTP gateway on the HTTP port (default 9091) and gRPC on a separate port (default 9090). When ADMIN_PORT is not set, admin endpoints (debug, metrics, swagger) share the HTTP port. Only your application’s API routes (e.g., /api/v1/*) should be exposed externally.

For services exposed to external traffic where a separate admin port is not sufficient, also disable discovery and debug features entirely:

env:
  # Option 1 (preferred): Separate admin port
  - name: ADMIN_PORT
    value: "9092"
  # Option 2: Disable admin endpoints entirely
  # Disable pprof — exposes CPU/memory profiling data
  - name: DISABLE_DEBUG
    value: "true"
  # Disable Swagger UI — exposes API schema and endpoint discovery
  - name: DISABLE_SWAGGER
    value: "true"
  # Disable gRPC reflection — prevents service discovery via grpcurl
  - name: DISABLE_GRPC_REFLECTION
    value: "true"
  # Disable debug log interceptor — prevents external clients from
  # triggering debug logging via x-debug-log-level header
  - name: DISABLE_DEBUG_LOG_INTERCEPTOR
    value: "true"
  # Never use debug level on public services — may log request payloads
  - name: LOG_LEVEL
    value: "info"
  # Rate limit incoming requests (per-pod). Adjust to your service's capacity.
  - name: RATE_LIMIT_PER_SECOND
    value: "1000"
  - name: RATE_LIMIT_BURST
    value: "50"
  # GRPC_MAX_SEND_MSG_SIZE limits response size FROM your service (default ~2GB).
  # GRPC_MAX_RECV_MSG_SIZE limits request size TO your service (default 4MB).
  # Consider reducing send size for public APIs; use streaming for large payloads.
  # - name: GRPC_MAX_SEND_MSG_SIZE
  #   value: "16777216"  # 16MB

The /metrics endpoint exposes request counts, latency distributions, and Go runtime stats. When using ADMIN_PORT, metrics are automatically served on the admin port only, so point the prometheus.io/port annotation or ServiceMonitor endpoint at the admin port. Without ADMIN_PORT, restrict access to /metrics at the load balancer level (IP allowlist or path-based routing) rather than disabling Prometheus entirely.

Internal services

Services reachable only from inside the cluster (behind an internal load balancer or service mesh) can keep the defaults:

  • Debug endpoints (/debug/pprof/): useful for profiling production issues
  • Swagger UI (/swagger/): API documentation for developers
  • gRPC reflection: enables grpcurl and grpcui for ad-hoc testing
  • Debug log interceptor: OverrideLogLevel + trace ID for targeted production debugging (see Log How-To)
  • Default message sizes: the ~2GB send (response) / 4MB recv (request) defaults are fine behind a load balancer

Internal services should still follow the production checklist below for observability, health probes, and graceful shutdown.

Built-in protections

ColdBrew includes several security features that are on by default. Don’t disable them unless you have a specific reason:

Protection          | What it does | Config to disable (not recommended)
Trace ID validation | Sanitizes client-supplied trace IDs (max 128 chars, printable ASCII only); prevents log injection attacks | SetTraceIDValidator(nil) in code
Protovalidate       | Validates incoming messages against proto annotation rules; returns InvalidArgument on failure | DISABLE_PROTO_VALIDATE=true
Default timeout     | Applies a 60s deadline to unary RPCs that arrive without one; limits resource exhaustion from slow or hung requests | GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS=0
Panic recovery      | Catches handler panics and returns a generic error to the client; stack traces go to logs and error trackers only, never into gRPC responses | Cannot be disabled

Data sent to third-party services

When error tracking (Sentry, Rollbar) or distributed tracing (New Relic, OTLP) is configured, ColdBrew sends data to external services. Review what your service logs before enabling these on public-facing services.

What gets sent to error trackers (Sentry, Rollbar, Airbrake):

  • Stack traces with internal file paths and function names
  • Server hostname and git commit hash
  • Log context fields — any data added via log.AddToContext() or log.AddAttrsToContext() is included
  • Trace IDs and OTEL span context

What gets sent to tracing backends (New Relic, OTLP):

  • Service name, version, environment
  • Go runtime version and VCS metadata
  • Span attributes including coldbrew.trace_id

Avoid adding secrets (passwords, tokens) or PII (user data) to log context or error notification tags.

Not built into ColdBrew

These are your responsibility to handle at the infrastructure level:

  • CORS: ColdBrew does not handle CORS headers. Use a reverse proxy (Nginx, Envoy, Istio) or add CORS middleware to the HTTP gateway.
  • Authentication/authorization: Admin endpoints (/debug/pprof, /metrics, /swagger) have no built-in auth. Disable them for public services or restrict access at the load balancer. For application-level auth (JWT, API keys), the cookiecutter template includes ready-to-use examples; see Authentication How-To.
  • Cluster-wide rate limiting: Built-in rate limiting (RATE_LIMIT_PER_SECOND) is per-pod only. For cluster-wide or per-tenant rate limiting, use interceptors.SetRateLimiter() with a custom implementation or your load balancer. See Interceptors How-To.
  • HTTP header forwarding: HTTP_HEADER_PREFIXES forwards matching HTTP headers to gRPC metadata. Never add authorization, cookie, or x-api-key prefixes unless you are intentionally doing header-based gRPC auth.

Production checklist

All services

  • Set APP_NAME and ENVIRONMENT for log/metric identification
  • Configure livenessProbe on /healthcheck and readinessProbe on /readycheck
  • Set terminationGracePeriodSeconds ≥ SHUTDOWN_DURATION_IN_SECONDS plus a buffer (the drain wait is already included in the shutdown duration)
  • Enable Prometheus scraping (annotation or ServiceMonitor)
  • Set up error tracking (SENTRY_DSN or equivalent)
  • Configure tracing (OTLP_ENDPOINT or NEW_RELIC_LICENSE_KEY)
  • Use headless Service or L7 proxy for gRPC load balancing
  • Set resource requests and limits
  • Store secrets in Kubernetes Secrets, not environment variable literals
  • Run make lint (includes govulncheck) before deploying
  • For high-QPS services: set RESPONSE_TIME_LOG_ERROR_ONLY=true to skip per-request logging on successful RPCs (see tuning impact)

Public-facing services (additional)

  • Allowlist public API paths at the load balancer and block /debug/*, /metrics, /swagger/*
  • DISABLE_DEBUG=true — disable pprof endpoints
  • DISABLE_SWAGGER=true — disable API documentation
  • DISABLE_GRPC_REFLECTION=true — disable service discovery
  • DISABLE_DEBUG_LOG_INTERCEPTOR=true — disable header-based debug logging
  • Enable rate limiting — RATE_LIMIT_PER_SECOND + RATE_LIMIT_BURST (per-pod, adjust to capacity). See interceptors howto
  • Consider reducing GRPC_MAX_SEND_MSG_SIZE from its ~2GB default if responses are small
  • Restrict /metrics access at the load balancer
  • LOG_LEVEL=info or higher (never debug)