317 Connect Orchestration Engine and Policy Decision Point to Observability Stack #380
Closed
Created new branch #408.
Review #371 first and merge it to main (the base branch here would then be updated), or, ideally, review and merge this PR into #371 first.
Summary
The Orchestration Engine (OE) and Policy Decision Point (PDP) are now instrumented and connected to the observability stack from #371.
| Service | Endpoint | Metrics | Instrumentation |
|---|---|---|---|
| OE (`orchestration-engine:4000`) | `/metrics` (Prometheus format) | `http_requests_total`, `http_request_duration_seconds` | `exchange/shared/monitoring` package |
| PDP (`policy-decision-point:8082`) | `/metrics` (Prometheus format) | `http_requests_total`, `http_request_duration_seconds` | `exchange/shared/monitoring` package |

Both services now expose the following Prometheus metrics:
- `http_requests_total{http_method, http_route, http_status_code}` - Total HTTP request count by method, route, and status code
- `http_request_duration_seconds{http_method, http_route}` - HTTP request latency histogram by method and route
- `external_calls_total{opendif.external.target, opendif.external.operation}` - External service call metrics (when used)
- `business_events_total{opendif.business.action, opendif.business.outcome}` - Business event metrics (when used)

Note: Custom attributes use the `opendif.` namespace prefix to distinguish them from standard OpenTelemetry semantic conventions.

Why these changes are needed:
Type of Change
Changes Made
New Files Created
- `observability/generate_sample_traffic.sh` (156 lines)
  - Configurable via `ORCHESTRATION_ENGINE_URL`, `POLICY_DECISION_POINT_URL`, `REQUEST_INTERVAL`, `REQUEST_COUNT`

Modified Files
Service Integration
- `exchange/orchestration-engine/server/server.go`
  - Imports `github.com/gov-dx-sandbox/exchange/shared/monitoring`
  - `/metrics` endpoint: `mux.Handle("/metrics", monitoring.Handler())`
  - `monitoring.HTTPMetricsMiddleware()` for automatic instrumentation
- `exchange/orchestration-engine/go.mod` & `go.sum`
  - Dependency on the `exchange/shared/monitoring` package
  - Updated via `go mod tidy`
- `exchange/policy-decision-point/main.go`
  - Imports `github.com/gov-dx-sandbox/exchange/shared/monitoring`
  - `/metrics` endpoint: `mux.Handle("/metrics", monitoring.Handler())`
  - `monitoring.HTTPMetricsMiddleware()` for automatic instrumentation
- `exchange/policy-decision-point/go.mod` & `go.sum`
  - Dependency on the `exchange/shared/monitoring` package
  - Updated via `go mod tidy`
- `exchange/docker-compose.yml`
  - Added `SERVICE_NAME` and `OTEL_METRICS_EXPORTER` environment variables to the orchestration-engine and policy-decision-point services

Observability Stack Configuration
- `observability/prometheus/prometheus.yml`
  - `/metrics` path

Testing
Test Results
Unit Tests (`exchange/shared/monitoring`):
Service Integration:
- `/metrics` endpoints

Runtime Testing
To verify the observability stack is working:
1. Start the observability stack:
2. Start the Go services (ensure they're on `opendif-network`):
   ```bash
   cd exchange
   docker compose up -d orchestration-engine policy-decision-point
   ```
3. Verify the metrics endpoints:
4. Check Prometheus targets:
5. Generate sample traffic:
6. View metrics in Grafana:
Checklist
- `opendif-network`

Related Issues
Related to observability stack setup. Enables metrics collection from Go services for monitoring and debugging.
Deployment Notes
Pre-Deployment Checklist
- Service Restart Required: Services must be restarted to load the new monitoring code.
- Network Setup: Ensure `opendif-network` exists before starting the services.
- No Configuration Changes Required: Services use the Prometheus exporter by default (no environment variables are needed for local development).
- Prometheus Already Configured: Prometheus is already configured to scrape these services (see `observability/prometheus/prometheus.yml`).
- Grafana Dashboard Ready: The Grafana dashboard is already configured to display these metrics.
Environment Variables (Optional)
For local development, no environment variables are needed (Prometheus is the default).
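The default behavior can be pictured as a small config loader. This is a hypothetical sketch, not code from the PR: it reads `SERVICE_NAME` and `OTEL_METRICS_EXPORTER` (the two variables added to `exchange/docker-compose.yml`) and falls back to `prometheus` when the exporter is unset. The accepted values come from the OpenTelemetry environment-variable specification; whether the shared package accepts all of them is an assumption, and `orchestration-engine` as the fallback service name is illustrative only.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// exporterConfig is a hypothetical view of the two settings the services read.
type exporterConfig struct {
	ServiceName string
	Exporter    string
}

// loadExporterConfig reads SERVICE_NAME and OTEL_METRICS_EXPORTER through
// getenv (os.Getenv in production, a stub in tests). An unset exporter falls
// back to "prometheus", the local-dev default this PR describes; note that the
// OpenTelemetry specification's own default for this variable is "otlp".
func loadExporterConfig(getenv func(string) string, defaultService string) (exporterConfig, error) {
	cfg := exporterConfig{
		ServiceName: getenv("SERVICE_NAME"),
		// Normalize case and whitespace before matching.
		Exporter: strings.ToLower(strings.TrimSpace(getenv("OTEL_METRICS_EXPORTER"))),
	}
	if cfg.ServiceName == "" {
		cfg.ServiceName = defaultService
	}
	if cfg.Exporter == "" {
		cfg.Exporter = "prometheus"
	}
	// Values taken from the OpenTelemetry environment-variable spec; which of
	// them the shared monitoring package actually supports is an assumption.
	switch cfg.Exporter {
	case "prometheus", "otlp", "console", "none":
		return cfg, nil
	default:
		return exporterConfig{}, fmt.Errorf("unsupported OTEL_METRICS_EXPORTER value %q", cfg.Exporter)
	}
}

func main() {
	cfg, err := loadExporterConfig(os.Getenv, "orchestration-engine")
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("service=%s exporter=%s\n", cfg.ServiceName, cfg.Exporter)
}
```

Injecting `getenv` rather than calling `os.Getenv` directly keeps the loader testable without mutating the process environment.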
To switch to other backends (Datadog, New Relic, etc.), set:
Post-Deployment Verification
1. Check Metrics Endpoints:
2. Verify Prometheus Scraping:
3. View in Grafana:
4. Generate Sample Traffic:
   ```bash
   cd observability
   ./generate_sample_traffic.sh
   ```

Related PRs
- `exchange/shared/monitoring/`
- This PR (`317-part3-oe-pdp`) integrates that infra into OE and PDP