@miladhzzzz miladhzzzz commented Jun 7, 2025

Pull Request: Major Infrastructure and Services Restructuring

Description

This pull request implements a comprehensive restructuring of the Persys Cloud platform, focusing on service architecture, infrastructure organization, and security enhancements. The changes include implementing mTLS, reorganizing service directories, updating configurations, and improving the overall infrastructure setup.

Major Changes

Infrastructure Reorganization

  1. Moved infrastructure files to new structure:

    • Relocated manifests from manifests/ to infra/manifests/
    • Updated infrastructure paths from IaC/** to infra/**
    • Reorganized Docker configurations and service definitions
  2. Added new infrastructure components:

    • Added CFSSL service for certificate management
    • Implemented CoreDNS configuration for service discovery
    • Added Prometheus and Grafana for monitoring
    • Added Jaeger for distributed tracing

Service Architecture Updates

  1. API Gateway:

    • Implemented mTLS support
    • Added OpenTelemetry tracing
    • Updated MongoDB integration
    • Removed event processing and gRPC components
    • Enhanced authentication and authorization
  2. Prow Scheduler:

    • Added mTLS support
    • Implemented CoreDNS registration
    • Separated mTLS and non-mTLS endpoints
    • Enhanced node management and workload scheduling
  3. Cloud Management:

    • Updated to Go 1.24
    • Removed Azure provider
    • Enhanced cluster management
    • Improved service discovery

Security Enhancements

  1. Certificate Management:

    • Added CFSSL service for certificate generation
    • Implemented mTLS for service-to-service communication
    • Added certificate rotation and validation
  2. Authentication:

    • Enhanced JWT token handling
    • Improved user authentication flow
    • Added token validation and refresh mechanisms

Monitoring and Observability

  1. Added Prometheus configuration:

    • Configured scraping for API gateway
    • Added Prow scheduler metrics
    • Set up node exporter monitoring
  2. Added Grafana dashboards and data sources

  3. Implemented distributed tracing with Jaeger

Removed Components

  • Removed event processing system
  • Removed gRPC event manager
  • Removed Azure cloud provider
  • Removed watermill messaging
  • Removed websocket implementation
  • Removed auto-assign workflow

Configuration Updates

  1. Updated Docker Compose:

    • Added new services
    • Configured networking
    • Set up volumes and dependencies
  2. Updated CoreDNS:

    • Added Kubernetes integration
    • Configured etcd backend
    • Set up caching and forwarding
  3. Updated service configurations:

    • Added TLS configurations
    • Updated database connections
    • Enhanced logging configuration

Testing

  • Verify mTLS communication between services
  • Test certificate generation and rotation
  • Validate service discovery through CoreDNS
  • Verify monitoring and tracing setup
  • Test node registration and workload scheduling
  • Validate authentication and authorization flows

Dependencies

  • Updated Go version to 1.24
  • Added new dependencies for mTLS and tracing
  • Removed unused dependencies
  • Updated existing dependencies to latest versions

Related Issues

  • [Link to related issues]

Checklist

  • All services build successfully
  • Infrastructure changes are properly documented
  • Security configurations are properly tested
  • Monitoring and tracing are working as expected
  • Service discovery is functioning correctly
  • No breaking changes to existing functionality

Additional Notes

This restructuring significantly improves the platform's security, observability, and maintainability. The changes align with modern cloud-native practices and provide a more robust foundation for future development.

@miladhzzzz miladhzzzz self-assigned this Jun 7, 2025

gitguardian bot commented Jun 7, 2025

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend that you revoke them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.



… Redis and MongoDB initialization in main.go. Introduce provider management structure for cloud services.
…ures, capabilities, and services, and improve local development instructions for Persys Cloud.
…h-flow.drawio, including adjustments to element positions and styles for improved clarity.

@miladhzzzz miladhzzzz left a comment


It seems fine...

miladhzzzz added 12 commits July 9, 2025 20:43
…t runner, and comprehensive test scenarios for Persys Cloud components, including service health checks, workload management, and monitoring.
…d testing strategy and system flow visualization, detailing testing objectives, scenarios, architecture, and data flow for improved clarity and guidance.
…yml to improve service definitions, add new Grafana dashboards for API Gateway, Prow Scheduler, and CoreDNS, and remove outdated dashboard configurations. Introduce Prometheus as a data source for monitoring and adjust scrape configurations for better service visibility.
…eus metrics support, and streamline etcd settings for improved performance and monitoring.
…ult signing settings for enhanced certificate flexibility.
… for MongoDB storage, add TLS settings, and update HTTP addresses. Upgrade Go version to 1.23.0 and update various dependencies in go.mod. Introduce Makefile for build automation and add sample environment variables for configuration.
…ructure with a more modular approach, introducing separate types for application, database, TLS, and other settings. Update config loading method to utilize viper's unmarshal functionality for improved configuration management. Adjust GitHub controller to utilize new configuration structure for webhook settings and authentication credentials.
…ler for handling requests to the prow-scheduler, including authentication and proxying logic. Implement ProwService for managing scheduler discovery and request proxying. Add routes for workload and node management, along with health check and metrics endpoints. Include middleware for service identity headers and ensure certificate management for secure communication.
…, ensuring alignment with the current project state.
….go to support mutual TLS for secure communication, configure separate routers for mTLS and non-mTLS traffic, and integrate Prometheus for monitoring. Introduce graceful shutdown handling and MongoDB connection setup, ensuring robust application initialization and error management.
…thod to AuthService interface and its implementation in AuthServiceImpl for validating user authentication via OAuth2 tokens. Refactor logging utility in audit.go to enhance logging capabilities with context support and optional Loki integration.
miladhzzzz added 27 commits July 9, 2025 22:39
… observability and instrumentation improvements, including OpenTelemetry tracing, Prometheus metrics, and bugfixes for reconciliation and monitoring.
…agent API calls and workload scheduling, refactor key management to improve key loading and generation, and implement asynchronous command sending to nodes. Add monitoring and reconciliation capabilities with new Monitor and Reconciler components, and update workload handling to include logging and status updates.
…Prometheus metrics support, configure mTLS and non-mTLS servers, and improve certificate management. Introduce graceful shutdown for both server types and ensure CoreDNS registration with dynamic IP resolution.
…cluding gin, uuid, and OpenTelemetry components. Add new indirect dependencies for enhanced functionality and performance. Introduce sample environment variables for Prow scheduler configuration.
…ate mTLS and non-mTLS route handlers, implement workload reconciliation logic, and enhance node and workload management endpoints. Add support for logging and metrics in reconciliation processes, and improve error handling across API interactions.
…-end tests using Docker. The workflow includes steps for checking out the code, setting up Docker, running tests, and uploading test results.
… in e2e-docker.yml and refactor build command in go.yml to target the scheduler directory. Comment out Cloud Mgmt build step for clarity.
…, streamline help output, and enhance cleanup process. This simplifies the test environment management and focuses on essential testing commands.
…e Makefile commands to use 'docker compose' syntax. This enhances the testing environment setup and ensures compatibility with the latest Docker CLI standards.
…le support and refine trigger conditions for main branch and pull requests. This improves repository handling and ensures compatibility with submodules.
…file and simplify go.mod by removing unnecessary require block. This enhances compatibility and streamlines dependency management.
…s by adding MongoDB and CFSSL components, update environment variables for API Gateway and Prow Scheduler, and streamline volume management. This improves the overall architecture and facilitates better service interactions.
… CoreDNS and adjust health check endpoints for API Gateway, Prow Scheduler, and Persys Agent. This enhances service connectivity and ensures accurate health monitoring.
…ice to Persys Prow, add environment variable for CENTRAL_URL, and adjust service dependencies. This improves clarity and enhances service communication within the testing environment.
…pping and add a new mapping for port 8084. This adjustment improves service configuration clarity and ensures proper port allocation for testing.
…oud-test' network for all services to enhance connectivity and ensure proper service communication during testing.
… point to 'persys-prow' service, ensuring correct service communication in the testing environment. Update submodule reference for 'persys-agent'.
…RL environment variable for consistency and clarity in service definitions.
…ring the latest changes are included in the project.
…rporating the latest changes into the project.
…rporating the latest changes into the project.
…e container name from Docker Compose for MongoDB service to streamline service definitions.
…er URLs for correct service communication, and change health check endpoints to metrics for improved monitoring. Comment out unused workload tests to streamline the test suite.
…Scheduler URLs from HTTPS to HTTP for consistency in service communication.
@miladhzzzz miladhzzzz merged commit 9dae07e into main Jul 9, 2025
3 checks passed