Monitoring Integration

Standard Prometheus metrics. Ready for Grafana, Datadog, and anything that speaks the format.

Prometheus /metrics endpoint

Gold Lapel exposes a standard Prometheus text format endpoint at /metrics on the dashboard port (default http://127.0.0.1:7933/metrics). If Prometheus can reach the endpoint, it can scrape Gold Lapel. The same applies to Datadog Agent, Grafana Alloy, Victoria Metrics, or any tool that speaks Prometheus text format.

GET /metrics (excerpt)
# HELP goldlapel_time_saved_seconds_total Cumulative time saved by optimizations
# TYPE goldlapel_time_saved_seconds_total counter
goldlapel_time_saved_seconds_total 345892.471

# HELP goldlapel_queries_observed_total Total queries observed
# TYPE goldlapel_queries_observed_total counter
goldlapel_queries_observed_total 2847193

# HELP goldlapel_queries_rewritten_total Total queries rewritten
# TYPE goldlapel_queries_rewritten_total counter
goldlapel_queries_rewritten_total 1923847

# HELP goldlapel_cache_hits_total Total result cache hits
# TYPE goldlapel_cache_hits_total counter
goldlapel_cache_hits_total 4129381

# HELP goldlapel_active_connections Active client connections
# TYPE goldlapel_active_connections gauge
goldlapel_active_connections 42

# HELP goldlapel_info Gold Lapel version info
# TYPE goldlapel_info gauge
goldlapel_info{version="0.14.0"} 1

The endpoint returns text/plain; version=0.0.4; charset=utf-8 — the standard Prometheus exposition format. Every metric includes # HELP and # TYPE annotations, so your monitoring tool can auto-discover descriptions and types without additional configuration.
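The format is also easy to consume ad hoc, without a client library. As an illustration (plain Python, not something Gold Lapel ships), a minimal parser that turns a payload like the excerpt above into name/value pairs:

```python
import re

def parse_exposition(text):
    """Parse Prometheus text exposition format into {series: float}.

    Handles bare samples and samples with labels; # HELP and # TYPE
    comment lines carry metadata, not samples, so they are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # metric name, optional {label="value"} block, then the sample value
        m = re.match(r'^([a-zA-Z_:][a-zA-Z0-9_:]*(?:\{[^}]*\})?)\s+(\S+)', line)
        if m:
            samples[m.group(1)] = float(m.group(2))
    return samples

payload = """\
# HELP goldlapel_queries_observed_total Total queries observed
# TYPE goldlapel_queries_observed_total counter
goldlapel_queries_observed_total 2847193
goldlapel_info{version="0.14.0"} 1
"""
print(parse_exposition(payload))
```

This is a quick-inspection sketch only; real scrapers (Prometheus, vmagent, Telegraf) implement the full specification, including escaping and timestamps.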

The metrics endpoint is part of the dashboard server and shares its port. If the dashboard is running — and it is by default — you already have the metrics endpoint.

Scrape configuration

Add Gold Lapel to your Prometheus scrape configuration:

# prometheus.yml
scrape_configs:
  - job_name: goldlapel
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:7933']
    metrics_path: /metrics

A 15-second scrape interval works well. Gold Lapel's metrics endpoint is lightweight — it reads from in-memory counters and returns in under a millisecond. You will not notice the overhead.

For multiple Gold Lapel instances, add each target to the static_configs list, or use service discovery if your infrastructure supports it. All instances expose identical metric names, so Prometheus can aggregate them naturally with label selectors.
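For instance, fleet-wide ratios sum the per-instance counters before dividing, and a by (instance) clause recovers the per-instance view:

```promql
# Fleet-wide rewrite ratio across all scraped instances
sum(rate(goldlapel_queries_rewritten_total[5m]))
  / sum(rate(goldlapel_queries_observed_total[5m]))

# Per-instance query throughput
sum by (instance) (rate(goldlapel_queries_observed_total[5m]))
```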

Metrics reference

Gold Lapel exports approximately 35 metrics across five categories. All counters are monotonically increasing totals. Gauges reflect the current value at scrape time.

Headline

Metric                              Type     Description
goldlapel_time_saved_seconds_total  counter  Cumulative wall-clock time saved by all optimizations

This is the metric that answers "is Gold Lapel helping?" For the live view and per-pattern breakdown, see Time saved on the dashboard page.
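Like any Prometheus counter, it is best queried with rate() or increase() rather than graphed as a raw cumulative value. For example:

```promql
# Hours of query time saved per day
increase(goldlapel_time_saved_seconds_total[1d]) / 3600

# Instantaneous savings rate: seconds of query time saved per second of wall clock
rate(goldlapel_time_saved_seconds_total[5m])
```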

Core query metrics

Metric                             Type     Description
goldlapel_queries_observed_total   counter  Total queries observed through the proxy
goldlapel_queries_rewritten_total  counter  Total queries rewritten to use materialized views
goldlapel_active_connections       gauge    Current active client connections
goldlapel_uptime_seconds           gauge    Proxy uptime in seconds
goldlapel_info                     gauge    Version info label (goldlapel_info{version="0.14.0"} 1)

Strategy counters

Metric                                      Type     Description
goldlapel_matviews_created_total            counter  Materialized views created
goldlapel_immvs_created_total               counter  Incremental materialized views created (pg_ivm)
goldlapel_matviews_routed_total             counter  Queries routed to materialized views
goldlapel_matviews_expanded_total           counter  Matview expansions (columns added to existing views)
goldlapel_rewrites_total                    counter  Total query rewrites
goldlapel_btree_indexes_created_total       counter  B-tree indexes created
goldlapel_trigram_indexes_created_total     counter  Trigram GIN indexes created
goldlapel_expression_indexes_created_total  counter  Expression indexes created
goldlapel_partial_indexes_created_total     counter  Partial indexes created
goldlapel_matview_indexes_created_total     counter  Indexes on materialized views created
goldlapel_prepared_hits_total               counter  Prepared statement cache hits
goldlapel_prepared_misses_total             counter  Prepared statement cache misses
goldlapel_cache_hits_total                  counter  Result cache hits
goldlapel_cache_misses_total                counter  Result cache misses
goldlapel_coalesced_total                   counter  Queries coalesced (deduplicated in-flight requests)
goldlapel_shadow_passes_total               counter  Shadow verification passes
goldlapel_shadow_failures_total             counter  Shadow verification failures
goldlapel_deep_pagination_warnings_total    counter  Deep pagination warnings issued

Connection pool

Pool metrics are present when connection pooling is active.

Metric                         Type     Description
goldlapel_pool_hits_total      counter  Connections served from idle pool
goldlapel_pool_misses_total    counter  Connections that required a new upstream connection
goldlapel_pool_timeouts_total  counter  Connection acquire timeouts
goldlapel_pool_pins_total      counter  Connections pinned (session state detected)
goldlapel_pool_active          gauge    Currently active pool connections
goldlapel_pool_idle            gauge    Currently idle pool connections
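Two derived expressions worth graphing from these series:

```promql
# Pool saturation: fraction of pooled upstream connections currently in use
goldlapel_pool_active / (goldlapel_pool_active + goldlapel_pool_idle)

# Pool miss ratio: share of acquisitions that needed a new upstream connection
rate(goldlapel_pool_misses_total[5m])
  / (rate(goldlapel_pool_hits_total[5m]) + rate(goldlapel_pool_misses_total[5m]))
```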

Replica routing

Replica metrics are present when read replicas are configured.

Metric                         Type     Description
goldlapel_replica_reads_total  counter  Reads routed to a replica
goldlapel_primary_reads_total  counter  Reads routed to the primary (read-after-write safety)
goldlapel_writes_total         counter  Writes routed to the primary
goldlapel_dirty_reads_total    counter  Dirty reads sent to primary due to replication lag
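A single ratio summarizes how well read offloading is working:

```promql
# Share of reads served by replicas; the remainder went to the primary
# for read-after-write safety
rate(goldlapel_replica_reads_total[5m])
  / (rate(goldlapel_replica_reads_total[5m]) + rate(goldlapel_primary_reads_total[5m]))
```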

Grafana setup

With Prometheus scraping Gold Lapel, adding a Grafana dashboard is straightforward. The following JSON provides a starting point with the five panels that matter most: time saved, rewrite rate, cache hit rate, active connections, and pool utilization.

Grafana dashboard JSON
{
  "dashboard": {
    "title": "Gold Lapel",
    "panels": [
      {
        "title": "Time Saved",
        "type": "stat",
        "targets": [{
          "expr": "goldlapel_time_saved_seconds_total",
          "legendFormat": "Total seconds saved"
        }],
        "fieldConfig": {
          "defaults": { "unit": "s" }
        }
      },
      {
        "title": "Query Rewrite Rate",
        "type": "gauge",
        "targets": [{
          "expr": "rate(goldlapel_queries_rewritten_total[5m]) / rate(goldlapel_queries_observed_total[5m]) * 100"
        }],
        "fieldConfig": {
          "defaults": { "unit": "percent", "max": 100 }
        }
      },
      {
        "title": "Cache Hit Rate",
        "type": "timeseries",
        "targets": [{
          "expr": "rate(goldlapel_cache_hits_total[5m]) / (rate(goldlapel_cache_hits_total[5m]) + rate(goldlapel_cache_misses_total[5m])) * 100",
          "legendFormat": "Cache hit %"
        }]
      },
      {
        "title": "Active Connections",
        "type": "timeseries",
        "targets": [{
          "expr": "goldlapel_active_connections",
          "legendFormat": "Active"
        }]
      },
      {
        "title": "Pool Utilization",
        "type": "timeseries",
        "targets": [
          { "expr": "goldlapel_pool_active", "legendFormat": "Active" },
          { "expr": "goldlapel_pool_idle", "legendFormat": "Idle" }
        ]
      }
    ]
  }
}

Import this via Grafana's dashboard import (Dashboards > Import > paste JSON). Adjust the Prometheus data source name if yours differs from the default.

Panels worth adding

The starter dashboard covers the essentials. As you get familiar with the metrics, consider adding:

  • Matview creation rate — rate(goldlapel_matviews_created_total[1h]) shows how actively Gold Lapel is finding new optimization opportunities. A burst early on that tapers off is normal — it means the major patterns have been covered.
  • Shadow verification — goldlapel_shadow_passes_total vs goldlapel_shadow_failures_total. Failures are not cause for alarm — they mean Gold Lapel caught a matview whose results did not match the original query and correctly declined to route traffic to it.
  • Prepared statement hit rate — rate(goldlapel_prepared_hits_total[5m]) / (rate(goldlapel_prepared_hits_total[5m]) + rate(goldlapel_prepared_misses_total[5m])). High hit rates indicate the proxy's prepared statement cache is working well for your query patterns.
  • Replica read distribution — if you use read replicas, goldlapel_replica_reads_total vs goldlapel_primary_reads_total shows how effectively reads are being distributed.
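If you reuse these ratio expressions across panels and alerts, Prometheus recording rules avoid repeating them. A sketch (the rule names below are suggestions, not something Gold Lapel ships):

```yaml
# recording-rules.yml, loaded via rule_files in prometheus.yml
groups:
  - name: goldlapel_ratios
    rules:
      - record: goldlapel:cache_hit_ratio:rate5m
        expr: >
          rate(goldlapel_cache_hits_total[5m])
          / (rate(goldlapel_cache_hits_total[5m]) + rate(goldlapel_cache_misses_total[5m]))
      - record: goldlapel:prepared_hit_ratio:rate5m
        expr: >
          rate(goldlapel_prepared_hits_total[5m])
          / (rate(goldlapel_prepared_hits_total[5m]) + rate(goldlapel_prepared_misses_total[5m]))
```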

Alerting

Gold Lapel is designed to fail safely — if something goes wrong, it falls back to proxying queries directly to PostgreSQL. That said, you will want to know if it goes down or if something unusual is happening. Here are alert rules for the conditions worth watching:

Prometheus alerting rules
# Prometheus alert rules for Gold Lapel
groups:
  - name: goldlapel
    rules:
      - alert: GoldLapelDown
        expr: up{job="goldlapel"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Gold Lapel instance is down"

      - alert: CacheHitRateLow
        expr: >
          rate(goldlapel_cache_hits_total[10m])
          / (rate(goldlapel_cache_hits_total[10m]) + rate(goldlapel_cache_misses_total[10m]))
          < 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 50% for 5 minutes"

      - alert: PoolTimeouts
        expr: rate(goldlapel_pool_timeouts_total[5m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Connection pool timeouts detected"

      - alert: ShadowVerificationFailures
        expr: increase(goldlapel_shadow_failures_total[1h]) > 0
        labels:
          severity: info
        annotations:
          summary: "Shadow verification failure in the last hour"

A note on the cache hit rate alert: a sustained rate below 50% usually means the workload is write-heavy or the query patterns are highly varied. Neither is a problem — it simply means caching is less effective for this particular workload. Investigate before acting.

The PoolTimeouts alert deserves attention when it fires. Pool timeouts mean clients are waiting for a database connection and giving up. The fix is usually increasing pool_size or investigating whether long-running transactions are holding connections open.

Datadog & other monitoring tools

Any monitoring tool that can scrape a Prometheus endpoint will work with Gold Lapel without modification. The /metrics endpoint speaks the standard exposition format — no proprietary protocol, no agent required beyond what your monitoring stack already provides.

Datadog

Datadog's Agent includes an OpenMetrics check that scrapes Prometheus endpoints natively. Add a configuration file for the check:

conf.d/openmetrics.d/conf.yaml
# conf.d/openmetrics.d/conf.yaml — Datadog Agent OpenMetrics check
instances:
  - openmetrics_endpoint: http://localhost:7933/metrics
    namespace: goldlapel
    metrics:
      - goldlapel_time_saved_seconds_total
      - goldlapel_queries_observed_total
      - goldlapel_queries_rewritten_total
      - goldlapel_cache_hits_total
      - goldlapel_cache_misses_total
      - goldlapel_active_connections
      - goldlapel_pool_active
      - goldlapel_pool_idle
      - goldlapel_pool_timeouts_total
      - goldlapel_shadow_passes_total
      - goldlapel_shadow_failures_total

The Agent will collect all listed metrics and report them to Datadog with the goldlapel. namespace prefix. From there, build dashboards and monitors using Datadog's standard tools.

Other tools

The following tools have been confirmed to work with Gold Lapel's /metrics endpoint out of the box:

  • Grafana Alloy / Grafana Agent — configure a prometheus.scrape component pointed at the metrics endpoint.
  • Victoria Metrics — use vmagent with a standard Prometheus scrape config. Victoria Metrics is a drop-in replacement for Prometheus storage and works identically.
  • New Relic — the Prometheus OpenMetrics integration or the Infrastructure agent's Flex integration can scrape the endpoint. Metrics appear in New Relic One with the goldlapel_ prefix.
  • Elastic / Metricbeat — the prometheus module in Metricbeat scrapes the endpoint and ships metrics to Elasticsearch.
  • InfluxDB / Telegraf — Telegraf's inputs.prometheus plugin scrapes the endpoint and writes to InfluxDB.

If your monitoring tool speaks Prometheus text format — and at this point, nearly all of them do — it will work. No adapter required.