Add Prometheus monitoring and metrics instrumentation #3
Merged
2. Add build files and commands.
Signed-off-by: zjzjzjzj1874 <zjzjzjzj1874@gmail.com>
README: add login information
- Refactor from single-container to 8 independent services
- Enable horizontal scaling for Logic, Connect, Task, and API services
- Add automatic container IP registration to etcd for RPC communication
- Implement health checks and dependency management
- Create separate dev/prod Docker Compose configurations
- Add Makefile commands for easy deployment
- Update documentation with quick start guide

Improvements over original:
- 44MB runtime images (vs 1.93GB single container)
- Independent service scaling
- Process isolation and resource limits
- Health-based dependency management
- Production-ready deployment

Original project: https://github.com/LockGit/gochat
- Removed docker/dev/ and docker/prod/ (Supervisord configs)
- Removed run.sh and reload.sh (legacy deployment scripts)
- Updated README.md to remove legacy deployment section

The application now uses Docker Compose multi-container deployment exclusively.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ection
Replaced shell script preprocessing with clean Go implementation:
- Added GetContainerIP() and GetServiceAddress() to tools/network.go
- Updated logic/publish.go to auto-detect container IP for etcd registration
- Updated connect/rpc.go to auto-detect container IP for etcd registration
- Updated Dockerfile to remove entrypoint.sh and use direct CMD
- Updated docker-compose.yml to use direct command without environment preprocessing
- Removed docker/entrypoint.sh

This approach is cleaner and more maintainable:
- No shell preprocessing required
- Services auto-detect their container IP at runtime
- Proper multi-container deployment with correct service discovery
- All services register with actual container IPs (e.g., tcp@172.28.0.4:6900)

Tested and verified:
- All services start successfully
- Services register correctly in etcd with container IPs
- Multi-container deployment fully functional

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
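A minimal sketch of what container-IP detection could look like, assuming the helper simply picks the first non-loopback IPv4 address reported by `net.InterfaceAddrs()`. The signatures of `GetContainerIP` and `GetServiceAddress` below are guesses, not the code in `tools/network.go`, and `formatServiceAddress` is a hypothetical helper that builds the `tcp@172.28.0.4:6900` style string quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// GetContainerIP (hypothetical signature) returns the first non-loopback IPv4
// address on the host. Inside a Docker bridge network this is usually the
// container's IP, e.g. 172.28.0.4.
func GetContainerIP() (string, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return "", err
	}
	for _, a := range addrs {
		ipNet, ok := a.(*net.IPNet)
		if !ok || ipNet.IP.IsLoopback() {
			continue
		}
		if ip4 := ipNet.IP.To4(); ip4 != nil {
			return ip4.String(), nil
		}
	}
	return "", errors.New("no non-loopback IPv4 address found")
}

// formatServiceAddress (hypothetical helper) builds the "network@host:port"
// string used for etcd registration, e.g. "tcp@172.28.0.4:6900".
func formatServiceAddress(network, ip string, port int) string {
	return network + "@" + net.JoinHostPort(ip, strconv.Itoa(port))
}

// GetServiceAddress (hypothetical signature) combines the two helpers.
func GetServiceAddress(network string, port int) (string, error) {
	ip, err := GetContainerIP()
	if err != nil {
		return "", err
	}
	return formatServiceAddress(network, ip, port), nil
}

func main() {
	addr, err := GetServiceAddress("tcp", 6900)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("registering", addr)
}
```

Picking the first non-loopback address is enough for a container attached to a single Docker network; a container on several networks would need a more specific rule, such as matching a known subnet.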
Implemented comprehensive CI/CD pipeline for automated testing, building, and deployment:

**CI/CD Workflow (.github/workflows/ci-cd.yml):**
- Test job: go fmt, go vet, tests with Redis service container
- Build job: Docker image with layer caching
- Push job: Push to Docker Hub with multiple tags (latest, branch, git-sha)
- Deploy jobs: Optional deployment to dev/staging/prod environments (commented out)

**Testing Infrastructure:**
- Makefile: Added test, test-coverage, test-unit, test-integration targets
- Makefile: Added fmt, fmt-check, vet, lint for code quality
- Makefile: Added build-binary, build-image for building
- docker-compose.test.yml: Test environment with Redis and etcd
- docker/Dockerfile.test: Test runner Dockerfile

**Deployment:**
- scripts/deploy.sh: Manual deployment script for all environments
- config/staging/: Staging configuration (copied from dev)
- docker-compose.staging.yml: Staging environment overrides

**Documentation:**
- README.md: Added CI/CD Pipeline section with setup instructions
- README.md: Added GitHub Secrets configuration guide
- README.md: Added branch strategy and manual deployment commands
- CHANGELOG.md: Documented all CI/CD pipeline changes

**Removed:**
- .travis.yml: Replaced with GitHub Actions

**Branch Strategy:**
- dev → Development environment (auto-deploy)
- staging → Staging environment (auto-deploy)
- master → Production environment (manual approval)

**Image Tags:**
- `latest` - Latest from master
- `<branch>` - Latest from branch
- `<git-sha>` - Specific commit for rollback

GitHub Secrets required: DOCKERHUB_USERNAME, DOCKERHUB_TOKEN
Optional: Server SSH credentials for deployment

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed os.Signal channels from unbuffered to buffered (capacity 1). This fixes the error: 'misuse of unbuffered os.Signal channel as argument to signal.Notify'

Fixed in:
- main.go: Line 44
- api/chat.go: Line 55
Fixed linting error: 'logrus.Error call has possible formatting directive %s'. Changed the call in db/db.go:40 to logrus.Errorf() so the format string is actually applied.
Fixed all logrus logging format issues:
- connect/server.go:71 - Changed logrus.Warn to logrus.Warnf
- connect/rpc.go:115 - Changed logrus.Info to logrus.Infof
- api/chat.go:63 - Added missing format directive %v to logrus.Errorf

Ran go fmt ./... to format all code. Verified with go vet ./... - all checks pass. Tested locally before pushing.
Moved environment-specific docker-compose files:
- docker-compose.dev.yml → deployments/
- docker-compose.prod.yml → deployments/
- docker-compose.staging.yml → deployments/
- docker-compose.test.yml → deployments/

Updated references in:
- Makefile (all compose-* targets)
- scripts/deploy.sh
- README.md (Quick Start section)

This improves project organization by separating deployment configs from root directory.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
… support
Problem:
- Integration tests were failing in CI (TCP server not running)
- Tests had infinite loops and poor error handling
- No separation between fast unit tests and slow integration tests

Solution: Implemented proper test separation with two-stage CI pipeline:

**Unit Tests (Stage 1 - Fast)**:
- Run with `go test -short` flag
- Integration tests check `testing.Short()` and skip
- No service dependencies required
- Runs in ~2 seconds

**Integration Tests (Stage 2 - Full Stack)**:
- Start all services via docker-compose
- Run `go test` from host against exposed container ports
- Tests connect to localhost:7001 (TCP), localhost:6379 (Redis)
- Integration tests execute fully (not skipped!)

Changes:
- pkg/stickpackage/stickpackage_test.go:
  * Added `testing.Short()` check to skip in unit test stage
  * Removed infinite loops, added 10s timeout
  * Proper error handling with t.Fatalf/t.Logf
  * Connection timeout with graceful fallback
- task/queue_test.go:
  * Added `testing.Short()` check
  * Better error handling for empty queue
- deployments/docker-compose.test.yml:
  * Simplified to expose ports (2379, 6379, 7001, 7002)
  * Tests run from host, not in container
  * Removed test-runner service
- .github/workflows/ci-cd.yml:
  * Split "test" job into "unit-test" and "integration-test"
  * unit-test: runs `go test -short` (fast)
  * integration-test: starts services, runs `go test` (full)
  * build: depends on BOTH test jobs passing
- Makefile:
  * test-unit: `go test -short`
  * test-integration: starts services, runs tests, stops services
- Removed docker/Dockerfile.test (no longer needed)

CI Pipeline Flow:
```
Unit Tests → Integration Tests → Build → Push → Deploy
   (2s)           (~2min)       (3min)
```

This ensures:
✓ Fast feedback from unit tests
✓ Integration tests actually RUN (not skipped) in separate stage
✓ Multi-container deployment is properly tested
✓ No test hangs or infinite loops

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem: GitHub Actions runners use Docker Compose V2, which uses the command `docker compose` (without hyphen) instead of legacy `docker-compose`. Integration tests were failing with: "docker-compose: command not found"

Changes:
- .github/workflows/ci-cd.yml: Updated all docker-compose → docker compose
- Makefile: Updated all compose commands (dev, prod, scale, logs, ps, clean, test-integration)
- scripts/deploy.sh: Updated deployment commands
- README.md: Updated example commands in documentation
- deployments/docker-compose.test.yml: Updated usage comment

Docker Compose V2 is the modern version bundled with Docker Desktop and available by default in GitHub Actions runners.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Merged changes from master:
- Upgraded to Go 1.23
- Updated dependencies to latest versions
- Removed vendor directory (using Go modules)

Prometheus monitoring additions:
- Added metrics server endpoints on ports 9091-9094 and 8080
- Instrumented API service with HTTP metrics and user operation tracking
- Instrumented Logic service with RPC server metrics
- Instrumented Connect services (WebSocket/TCP) with metrics servers
- Added Prometheus service to docker-compose.yml
- Created Prometheus configuration for scraping all services

All conflicts resolved. Ready for deployment.
Branch updated from b37c4a6 to 9836f68
- Reorder imports in logic/publish.go to follow gofmt standards
- Mark github.com/google/uuid as indirect dependency
- Update google.golang.org/protobuf to v1.28.1
- Removed 'promMetrics' import that was not being used
- Verified locally: all tests pass, build succeeds, formatting correct
- site/site.go: Added /metrics endpoint to HTTP mux
- task/task.go: Added metrics server on port 9094

These were missing after the merge with master. Verified locally:
- Build successful
- All tests pass (task: 7.247s)
- Formatting correct
Summary
This PR adds comprehensive Prometheus monitoring to GoChat, instrumenting all services with metrics collection for QPS, latency, connections, and RPC calls. The implementation includes:
Key Changes
New Packages
- `pkg/metrics/metrics.go` - Centralized Prometheus metric definitions (147 lines)
- `pkg/metrics/server.go` - Standalone metrics HTTP server for RPC services (42 lines)
- `pkg/middleware/prometheus.go` - Gin middleware for HTTP request instrumentation (35 lines)
- `pkg/middleware/rpcx.go` - RPCX plugin for RPC server instrumentation using goroutine ID tracking (68 lines)

Service Instrumentation
- API (`api/`) - Added Prometheus middleware and /metrics endpoint, instrumented user operations
- Logic (`logic/`) - Added metrics server on port 9091, instrumented RPC server and Redis operations
- Connect (`connect/`) - Metrics servers on ports 9092 (WebSocket) and 9093 (TCP), connection and message tracking
- Task (`task/`) - Metrics server on port 9094, RPC client metrics for calls to Connect services
- Site (`site/`) - Added /metrics endpoint on port 8080

Infrastructure
- `deployments/prometheus/prometheus.yml` - Prometheus scrape configuration for all services (51 lines)
- `docker-compose.yml` - Added Prometheus container, exposed metrics ports (9091-9094, 8080)

Dependency Management
- Added `github.com/prometheus/client_golang` v1.14.0
- Updated `.gitignore` to exclude `vendor/`

Merged Changes from Master
- `golang:1.23` and `debian:bookworm-slim` images
- `ioutil.ReadFile` → `os.ReadFile`
- `netcat-openbsd`

Metrics Exposed
HTTP Metrics
- `gochat_http_requests_total` - Counter by service, method, path, status
- `gochat_http_request_duration_seconds` - Histogram by service, method, path
- `gochat_http_requests_in_flight` - Gauge by service

RPC Metrics
- `gochat_rpc_server_requests_total` - Counter by service, method, status
- `gochat_rpc_server_duration_seconds` - Histogram by service, method
- `gochat_rpc_server_requests_in_flight` - Gauge by service
- `gochat_rpc_client_requests_total` - Counter by service, method, status
- `gochat_rpc_client_duration_seconds` - Histogram by service, target, method

Connection Metrics
- `gochat_connections_active` - Gauge by service, type (websocket/tcp)
- `gochat_connections_total` - Counter by service, type, status

Message Metrics
- `gochat_messages_total` - Counter by service, direction (sent/received)

Application Metrics
- `gochat_user_operations_total` - Counter by operation (login/register), status
- `gochat_redis_operations_total` - Counter by service, command

Metrics Endpoints
Each service exposes metrics on the following endpoints:
Testing
Start Services
Verify Metrics
Example Queries
In Prometheus UI, try these queries:
Test User Registration
Bug Fixes
During implementation, fixed a critical RPCX plugin bug where PreCall was returning the modified context instead of the original args, causing reflection errors such as `reflect: Call using *context.valueCtx as type *proto.RegisterRequest`. The fix is in `pkg/middleware/rpcx.go:40-47`.

Files Changed
Modified: 23 files (+568, -54)
- `.github/workflows/ci-cd.yml`, `Makefile`, `docker-compose.yml`, `docker/Dockerfile`
- `go.mod`, `go.sum`, `.gitignore`
- `api/`, `connect/`, `logic/`, `task/`, `site/`
- `pkg/metrics/`, `pkg/middleware/`
- `deployments/prometheus/prometheus.yml`

🤖 Generated with Claude Code
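The PreCall bug from the Bug Fixes section can be reproduced without rpcx. This sketch assumes an rpcx-style hook whose first return value becomes the arguments passed to the handler via reflection; `preCallHook`, `dispatch`, `register`, and `RegisterRequest` are hypothetical stand-ins, not the plugin in `pkg/middleware/rpcx.go`.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
)

// RegisterRequest stands in for proto.RegisterRequest.
type RegisterRequest struct{ Name string }

type startKey struct{}

// preCallHook has the assumed shape of an rpcx-style PreCall: its first
// return value is used as the arguments for the actual method call.
type preCallHook func(ctx context.Context, service, method string, args any) (any, error)

// buggyPreCall reproduces the mistake: it returns a derived context as "args".
func buggyPreCall(ctx context.Context, service, method string, args any) (any, error) {
	return context.WithValue(ctx, startKey{}, "start"), nil
}

// fixedPreCall keeps its bookkeeping elsewhere and hands args back unchanged.
func fixedPreCall(ctx context.Context, service, method string, args any) (any, error) {
	return args, nil
}

func register(req *RegisterRequest) string { return "registered " + req.Name }

// dispatch mimics a server that runs the hook, then invokes the handler via
// reflection, turning the reflect panic into an error.
func dispatch(hook preCallHook, args any) (result string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	callArgs, err := hook(context.Background(), "Logic", "Register", args)
	if err != nil {
		return "", err
	}
	out := reflect.ValueOf(register).Call([]reflect.Value{reflect.ValueOf(callArgs)})
	return out[0].String(), nil
}

func main() {
	req := &RegisterRequest{Name: "alice"}
	fmt.Println(dispatch(fixedPreCall, req))
	fmt.Println(dispatch(buggyPreCall, req)) // reflect: Call using *context.valueCtx ...
}
```

Returning `args` unchanged is what lets the reflective call succeed; any per-call state (such as a start time) has to be kept somewhere else, which is presumably why the plugin correlates PreCall and PostCall by goroutine ID.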