Skip to content
This repository was archived by the owner on Sep 24, 2025. It is now read-only.

Conversation

@tty47
Copy link
Collaborator

@tty47 tty47 commented Apr 10, 2025

Generate Ansible inventory from database

Overview

This PR modifies the Ansible inventory generation to fetch instance data from the database instead of relying on directly passed instances. This change centralizes the inventory management and ensures consistency with the database state.

Key Changes

  • Added InstanceRepository integration in AnsibleConfigurator
  • Updated CreateInventory method to fetch instances from database while maintaining backward compatibility
  • Implemented filtering to exclude deleted instances
  • Added comprehensive test coverage with mock repository

Implementation Details

  • Added InstanceRepository as a dependency in AnsibleConfigurator struct
  • Updated NewAnsibleConfigurator constructor to accept repository
  • Implemented database query using GetByJobID to fetch non-deleted instances
  • Maintained existing inventory file format and SSH configuration
  • Added error handling for database operations with fallback to provided instances
  • Created test suite covering database integration, inventory generation, and file structure

Testing

All tests pass successfully, including:

  • Database query tests with mocked repository
  • Inventory generation from both database and direct input
  • Error handling scenarios
  • File structure and permissions validation
  • Backward compatibility tests

Documentation

  • Updated function documentation with new parameters
  • Added examples in comments
  • Updated existing documentation to reflect the changes

Migration/Breaking Changes

None. The implementation maintains backward compatibility while adding database support.

closes: #156

Summary by CodeRabbit

  • New Features

    • Introduced an event-driven system for instance creation and deletion notifications.
    • Implemented a refined provisioning service with improved job ID validation and configuration handling.
  • Documentation

    • Overhauled project documentation with a new Architecture section, clearer usage guidelines, and updated setup instructions.
  • Refactor

    • Streamlined instance management by filtering out terminated instances and removing redundant provisioning logic.
  • Tests

    • Expanded automated test coverage for event handling and provisioning operations.

@tty47 tty47 requested review from a team, MSevey, aWN4Y25pa2EK and smuu and removed request for a team April 10, 2025 11:17
@tty47 tty47 self-assigned this Apr 10, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Apr 10, 2025

Walkthrough

This pull request removes Pulumi-related configurations and code, switching the focus to Ansible. It deletes the AnsibleConfigurator implementation and associated methods while introducing new job ID validation and Ansible provisioner functions. The documentation has been restructured with added architecture and usage details, and the event-driven infrastructure now publishes events for instance creation and deletion. Additionally, database queries are updated to filter terminated instances, and new tests, mocks, and SSH checker implementations are added to support these changes.

Changes

Files Group Change Summary
.gitignore Removed Pulumi entries and added an entry to ignore ansible/inventory/.
internal/compute/ansible.go Removed the entire file implementing AnsibleConfigurator and its associated methods.
internal/compute/provider.go, provider_test.go Updated NewProvisioner to return an Ansible provisioner with job ID validation; added new functions validateJobID and NewAnsibleProvisioner; included tests validating job ID formats and error handling.
README.md Restructured documentation with new sections on Architecture, Event-Driven Flow, Key Features, Creating Instances, Monitoring Status, Prerequisites, Setup, and Code Quality.
cmd/main.go Added context management, event system initialization (events.Start(ctx)), and instantiation of a new provisioning service.
internal/db/repos/instance.go Updated query methods (GetByJobID, GetByJobIDOrdered) to filter out terminated instances by adding an extra Where clause.
internal/events/* Introduced a new events package including events.go, README.md, and corresponding tests to support event publishing, subscription, and asynchronous processing.
internal/provisioner/* Added new files (ansible.go, factory.go, interface.go, ssh_checker.go, mock_ssh_checker.go) defining the Ansible provisioner, related interfaces, a provisioner factory, SSH connectivity checkers, and test mocks.
internal/services/* Modified infrastructure management by removing embedded provisioning logic; added jobName to Infrastructure; updated instance service comments; removed direct Ansible provisioning calls; added job ID retrieval (GetJobIDByName) and a new provisioning service managing event handlers and database interaction.

Sequence Diagram(s)

sequenceDiagram
    participant M as Main
    participant I as Infrastructure
    participant E as Event System
    participant P as Provisioning Service
    participant DB as Database
    participant A as Ansible Provisioner

    M->>I: Request instance creation
    I->>E: Publish EventInstancesCreated (job details, instances)
    E->>P: Deliver event to subscribed handler
    P->>DB: Fetch non-terminated instances using jobID
    DB-->>P: Return instance list
    P->>A: Configure provisioner & trigger playbook
    A-->>P: Return provisioning result
Loading

Assessment against linked issues

Objective Addressed Explanation
Generate the Ansible inventory from the database (#156)

Suggested labels

infrastructure, refactor

Suggested reviewers

  • smuu
  • sysrex

Poem

I'm a rabbit hopping through the code,
Leaving Pulumi behind on the road.
Ansible steps in with a playful cheer,
Events and tests now dance so near.
With whiskers twitching, I celebrate this mode!
🐰🌟

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (1.64.8)

Error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2
Failed executing command with error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@celestia-bot celestia-bot requested a review from a team April 10, 2025 11:17
@tty47 tty47 added enhancement New feature or request provisioning Provisioning api API labels Apr 10, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
internal/compute/provider.go (2)

59-60: Clarify the rationale for passing a nil repository.

Currently, NewProvisioner always passes nil to NewAnsibleConfigurator. If there are plans to integrate a real InstanceRepository later, consider adding a comment explaining the temporary use of nil. Otherwise, if no repository will ever be used in this path, removing the parameter could simplify interface usage.


62-64: Reevaluate the return of an unused error.

Since NewAnsibleProvisioner never returns anything other than nil for the error, returning (Provisioner, error) might be unnecessary. Consider removing the error return unless you have future plans to handle errors here.

internal/compute/ansible_test.go (4)

23-28: Parameter jobID is unused in the mock repository.

Inside GetByJobID, the jobID parameter is never used to filter instances; the function unconditionally returns m.instances. Consider removing or using jobID for filtering, or rename it to _ to avoid confusion.

🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 23-23:
unused-parameter: parameter 'jobID' seems to be unused, consider removing or renaming it as _ (revive)


31-35: Check the error from os.RemoveAll.

os.RemoveAll(tmpDir) is deferred without checking its return value. This can mask potential failures in cleaning up the temporary directory.

 defer func() {
-	os.RemoveAll(tmpDir)
+	if rmErr := os.RemoveAll(tmpDir); rmErr != nil {
+		t.Logf("Failed to remove temp directory: %v", rmErr)
+	}
 }()
🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 34-34:
Error return value of os.RemoveAll is not checked (errcheck)

🪛 GitHub Actions: CI

[error] 34-34: Error return value of os.RemoveAll is not checked (errcheck)


37-41: Check the error when changing directories in the defer.

defer os.Chdir(originalWd) doesn’t handle any potential error. Although this is test code, consider checking or logging the error to ensure reliable teardown.

 defer func() {
-	os.Chdir(originalWd)
+	if chdirErr := os.Chdir(originalWd); chdirErr != nil {
+		t.Logf("Failed to restore original directory: %v", chdirErr)
+	}
 }()
🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 41-41:
Error return value of os.Chdir is not checked (errcheck)


44-44: Use more restrictive permissions for a secure directory.

Line 44 creates the directory with 0755, but gosec recommends 0750 or less.

-err = os.MkdirAll(filepath.Join("ansible", "inventory"), 0755)
+err = os.MkdirAll(filepath.Join("ansible", "inventory"), 0750)
🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 44-44:
G301: Expect directory permissions to be 0750 or less (gosec)

internal/compute/ansible.go (1)

105-109: Avoid overshadowing the function-level error.

Within the defer closure, assigning to err can overwrite a previously set error, potentially losing context about what went wrong. Consider storing the close error in a separate variable or appending it.

 defer func() {
 	if closeErr := f.Close(); closeErr != nil {
-		err = fmt.Errorf("failed to close inventory file: %w", closeErr)
+		tempErr := fmt.Errorf("failed to close inventory file: %w", closeErr)
+		if err == nil {
+			err = tempErr
+		} else {
+			err = fmt.Errorf("%v; also: %w", err, tempErr)
+		}
 	}
 }()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc6af8 and 6433e76.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • internal/compute/ansible.go (6 hunks)
  • internal/compute/ansible_test.go (1 hunks)
  • internal/compute/provider.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/compute/provider.go (1)
internal/compute/ansible.go (1)
  • NewAnsibleConfigurator (59-65)
🪛 GitHub Check: golangci-lint
internal/compute/ansible_test.go

[failure] 23-23:
unused-parameter: parameter 'jobID' seems to be unused, consider removing or renaming it as _ (revive)


[failure] 182-182:
G304: Potential file inclusion via variable (gosec)


[failure] 44-44:
G301: Expect directory permissions to be 0750 or less (gosec)


[failure] 41-41:
Error return value of os.Chdir is not checked (errcheck)


[failure] 34-34:
Error return value of os.RemoveAll is not checked (errcheck)

🪛 GitHub Actions: CI
internal/compute/ansible_test.go

[error] 34-34: Error return value of os.RemoveAll is not checked (errcheck)

🔇 Additional comments (1)
.gitignore (1)

1-2: No issues identified.

Ignoring the ansible/inventory/ directory seems appropriate given the new Ansible inventory workflow.

@tty47
Copy link
Collaborator Author

tty47 commented Apr 10, 2025

@coderabbitai full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Apr 10, 2025

✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
internal/compute/provider.go (2)

59-60: Consider passing a non-nil repository if database integration becomes standard

NewProvisioner currently passes nil for the InstanceRepository, effectively disabling DB usage. If your roadmap includes fetching instances from the database by default, consider injecting an actual repository here.


62-64: Validate the necessity of an error return

NewAnsibleProvisioner always returns nil for the error yet declares (Provisioner, error). If you anticipate future error cases, consider implementing them; otherwise, simplify by returning a single return value.

internal/compute/ansible_test.go (3)

34-34: Check the returned error from os.RemoveAll

The static analysis indicates this line’s return value is ignored. Consider capturing/logging the error or explicitly ignoring it:

- defer os.RemoveAll(tmpDir)
+ defer func() {
+   if err := os.RemoveAll(tmpDir); err != nil {
+     t.Logf("warning: failed to remove temp dir: %v", err)
+   }
+}()
🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 34-34:
Error return value of os.RemoveAll is not checked (errcheck)

🪛 GitHub Actions: CI

[error] 34-34: Error return value of os.RemoveAll is not checked (errcheck)


41-41: Check the deferred os.Chdir error

Similarly, check the return error or handle it in cases where failure to return to the original directory causes unforeseen issues.

🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 41-41:
Error return value of os.Chdir is not checked (errcheck)


44-44: Use more restrictive permissions and verify file path usage

• Line 44: The static analysis recommends restricting directory permissions (e.g., 0750).
• Line 182: Reading the inventory file is potentially flagged by “G304” as a file inclusion via variable. In this test context, it’s likely safe, but ensure no user-supplied path is used.

Also applies to: 182-182

🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 44-44:
G301: Expect directory permissions to be 0750 or less (gosec)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc6af8 and 6433e76.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • internal/compute/ansible.go (6 hunks)
  • internal/compute/ansible_test.go (1 hunks)
  • internal/compute/provider.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
internal/compute/provider.go (1)
internal/compute/ansible.go (1)
  • NewAnsibleConfigurator (59-65)
internal/compute/ansible.go (2)
internal/db/models/instance.go (1)
  • Instance (40-57)
internal/db/repos/instance.go (1)
  • InstanceRepository (14-16)
🪛 GitHub Check: golangci-lint
internal/compute/ansible_test.go

[failure] 23-23:
unused-parameter: parameter 'jobID' seems to be unused, consider removing or renaming it as _ (revive)


[failure] 182-182:
G304: Potential file inclusion via variable (gosec)


[failure] 44-44:
G301: Expect directory permissions to be 0750 or less (gosec)


[failure] 41-41:
Error return value of os.Chdir is not checked (errcheck)


[failure] 34-34:
Error return value of os.RemoveAll is not checked (errcheck)

🪛 GitHub Actions: CI
internal/compute/ansible_test.go

[error] 34-34: Error return value of os.RemoveAll is not checked (errcheck)

🔇 Additional comments (5)
.gitignore (1)

1-2: Good practice to ignore generated Ansible inventory

Ignoring ansible/inventory/ is sensible for preventing accidental commits of generated files.

internal/compute/ansible.go (4)

37-40: Interface definition looks good

Defining InstanceRepository here is clear and straightforward. No issues spotted.


74-80: Request validation of job ID format assumption

fmt.Sscanf(a.jobID, "job-%d", &jobIDUint) expects “job-”. If a different format is used downstream, this could fail. Consider adding a fallback or a more robust parser.


95-95: Directory permissions use recommended secure flags

Using 0750 aligns well with security best practices. This addresses the typical gosec warnings about overly permissive directories.


327-328: Confirm fallback for inventory creation logic

Calling CreateInventory(nil, "") in ConfigureHosts triggers database lookup. Ensure all calls to ConfigureHosts have valid ownerID set if DB usage is intended, or handle the fallback scenario gracefully.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/compute/ansible_test.go (2)

172-178: Consider adding error message verification

While you're correctly checking for error presence with assert.Error, consider also verifying the specific error message content to ensure the right error is being returned. This adds an extra layer of validation to your tests.

 if tt.wantErr {
     assert.Error(t, err)
+    // Optionally verify error message content
+    // assert.Contains(t, err.Error(), "expected error substring")
     return
 }

183-189: File path and permission checks

The file path is constructed correctly and permissions are verified. However, consider making the expected file permission (0600) a configurable part of the test case if this might change in future requirements.

 // In the test struct
 type struct {
     // existing fields
+    expectedPermission os.FileMode
 }

 // In the test case definitions
 {
     // existing fields
+    expectedPermission: 0600,
 }

 // In the assertion
-assert.Equal(t, os.FileMode(0600), info.Mode().Perm())
+assert.Equal(t, tt.expectedPermission, info.Mode().Perm())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6433e76 and e381da4.

📒 Files selected for processing (1)
  • internal/compute/ansible_test.go (1 hunks)
🔇 Additional comments (7)
internal/compute/ansible_test.go (7)

18-28: Mock implementation looks good

The mock implementation of the InstanceRepository interface is clean and focused. It provides exactly what's needed for the test cases by returning either configured instances or an error.


30-56: Good test setup with proper cleanup

Well-structured test setup with proper resource cleanup through deferred functions. The code correctly handles potential errors during cleanup operations by logging them instead of failing silently. Good use of require for assertions that should terminate the test immediately if they fail.


58-75: Test data setup is appropriate

The test instances are properly defined with all necessary fields for the test cases.


77-154: Comprehensive test cases cover important scenarios

The test cases are well-structured and cover a good range of scenarios:

  • Creating inventory from both a map and database
  • Handling database errors with fallback mechanism
  • Empty inventory scenarios
  • Filtering out deleted instances

This demonstrates thorough test coverage of the functionality.


196-198: Verify all required Ansible inventory sections

The test correctly checks for essential inventory sections. Consider adding verification for any other required sections or configuration parameters if they exist in your inventory template.


203-206: Good validation for skipping deleted instances

The test properly verifies that deleted instances (with valid DeletedAt) are not included in the generated inventory file. This is an important edge case that's correctly handled.


30-216: Overall test structure and coverage is excellent

The test is well-structured, using the table-driven approach effectively. It covers multiple scenarios including happy paths and error cases. Good job on creating comprehensive test coverage for the inventory creation functionality, especially around the new database integration features.

Copy link
Contributor

@mojtaba-esk mojtaba-esk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM only a small thing which is that instanceRepo thing.

Copy link
Contributor

@MSevey MSevey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small updates

@tty47
Copy link
Collaborator Author

tty47 commented Apr 11, 2025

@coderabbitai full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Apr 11, 2025

✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
internal/provisioner/ansible.go (3)

111-112: Consider consolidating environment path expansion.

os.ExpandEnv is called in both the constructor and here. Consolidating this logic (e.g., at initialization) could ensure consistent key path usage throughout the provisioner.


133-184: Optionally allow parallel playbook execution.

Running the playbook with --forks=1 ensures sequential execution, which helps consistency but can slow down provisioning for multiple instances. Consider making it configurable if performance becomes a bottleneck.


186-232: Enable ordering or grouping of hosts in the inventory.

Appending each instance directly is fine for smaller environments, but for large clusters, grouping or stable ordering within the file can improve manageability. Consider grouping instances under labeled sections (e.g., [web], [db]) or sorting them alphabetically.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a8fc5c and 4447594.

📒 Files selected for processing (1)
  • internal/provisioner/ansible.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/provisioner/ansible.go (5)
internal/provisioner/interface.go (1)
  • Provisioner (11-20)
internal/provisioner/config/ansible.go (1)
  • AnsibleConfig (11-26)
internal/provisioner/ssh_checker.go (2)
  • SSHChecker (12-14)
  • NewDefaultSSHChecker (25-32)
internal/types/instance.go (1)
  • InstanceInfo (106-116)
internal/events/events.go (3)
  • Publish (57-60)
  • Event (27-34)
  • EventInventoryRequested (21-21)
🔇 Additional comments (3)
internal/provisioner/ansible.go (3)

49-52: Great initialization and validation approach.

Ensuring the configuration is validated upfront helps catch misconfigurations early. This straightforward pattern sets a solid foundation for error handling in the rest of the provisioner logic.


70-80: Confirm intended behavior when no instances are provided.

Returning early after publishing the event means no provisioning steps occur locally. Verify that external components properly handle the inventory request event to fetch instances, so the flow doesn’t stall.


82-96: Efficient concurrency for SSH checks.

Your usage of goroutines and a wait group for parallel SSH availability checks optimizes setup time. The error channel approach is clean and prevents partial successes from continuing unnoticed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
internal/services/provisioning.go (1)

15-21: Potential concurrency issue when sharing a single p.provisioner instance.

Because the Provisioning struct holds a single provisioner.Provisioner, if multiple events arrive concurrently and manipulate this same provisioner reference, data races or inconsistent state could occur. If concurrency is expected, consider instantiating a separate provisioner or adding synchronization.

internal/compute/provider.go (1)

47-66: Avoid using panic in a constructor-like function.

When provisioner.NewAnsibleProvisioner fails, crashing the entire application with panic can be risky. As previously noted, consider returning an error instead so the caller can decide how to handle that failure scenario.

🧹 Nitpick comments (2)
internal/services/provisioning.go (1)

38-84: Consider returning an empty slice instead of an error when no instances are found.

Right now, encountering zero instances triggers an error. If your workflow can proceed with an empty set of instances, returning an empty slice may provide better flexibility. If the absence of instances is truly exceptional, then this approach is fine.

internal/compute/provider.go (1)

68-84: Validation is good; optionally enforce stricter date/time checks.

validateJobID correctly checks for an expected prefix and splits on “-”. If you need to strictly validate YYYYMMDD-HHMMSS, consider adding a regular expression or date/time parse check.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4447594 and 25fd3e7.

📒 Files selected for processing (2)
  • internal/compute/provider.go (2 hunks)
  • internal/services/provisioning.go (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
internal/services/provisioning.go (7)
internal/provisioner/interface.go (1)
  • Provisioner (11-20)
internal/services/instance.go (1)
  • Instance (19-22)
internal/services/job.go (1)
  • Job (15-18)
internal/events/events.go (3)
  • EventInstancesCreated (17-17)
  • EventInventoryRequested (21-21)
  • Event (27-34)
internal/types/instance.go (1)
  • InstanceInfo (106-116)
internal/logger/logger.go (2)
  • Info (92-94)
  • InfoWithFields (157-159)
internal/compute/provider.go (2)
  • Provider (17-32)
  • NewProvisioner (47-66)
internal/compute/provider.go (4)
internal/provisioner/factory.go (1)
  • NewProvisioner (19-30)
internal/provisioner/interface.go (1)
  • Provisioner (11-20)
internal/provisioner/config/ansible.go (1)
  • AnsibleConfig (11-26)
internal/provisioner/ansible.go (5)
  • DefaultSSHUser (28-28)
  • DefaultSSHKeyPath (31-31)
  • PathToPlaybook (25-25)
  • InventoryBasePath (37-37)
  • NewAnsibleProvisioner (49-60)
🔇 Additional comments (4)
internal/services/provisioning.go (2)

23-36: Looks good – event subscription is straightforward.

Subscribing to the relevant events here is logical and keeps the code organized. No major issues spotted.


86-124: Verify the logic for overwriting p.provisioner.

handleInstancesCreated always reassigns p.provisioner with a fresh instance from compute.NewProvisioner. If a previous event configured the provisioner differently, that configuration is lost. Ensure this reinitialization is correct for your use case.

internal/compute/provider.go (2)

6-11: Import additions are appropriate.

Bringing in "os", "strings", and the new provisioner/config references is consistent with the updated code requirements for Ansible-based provisioning.


86-104: Wrapping function for Ansible provisioner looks good.

NewAnsibleProvisioner is a straightforward wrapper that delegates to provisioner.NewAnsibleProvisioner after validating the job ID. This cleanly separates validation logic from the Ansible creation step.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/services/provisioning.go (1)

17-21: Potential concurrency race condition with shared provisioner.

As noted in the past, if multiple events arrive concurrently and call methods that overwrite p.provisioner, there’s a possibility of data races or inconsistent state. If concurrency is expected, consider either synchronizing access to p.provisioner or creating a new local provisioner per event to eliminate races.

🧹 Nitpick comments (1)
internal/services/provisioning.go (1)

86-124: Provisioning logic is clear but consider extended error handling or logs.

While the provisioning flow is straightforward (checking the Provision flag, retrieving fresh instances, initializing the provisioner, and calling Configure), you might want to log or track the number of instances being provisioned for easier troubleshooting.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25fd3e7 and dfebf44.

📒 Files selected for processing (1)
  • internal/services/provisioning.go (1 hunks)
🔇 Additional comments (2)
internal/services/provisioning.go (2)

38-84: Logic for retrieving and converting database instances looks correct.

The step-by-step process of:

  1. Retrieving the job ID from the job name,
  2. Fetching instances using that ID,
  3. Converting them to []types.InstanceInfo,
    appears well-structured and handles errors appropriately. Nice job.

140-143: Good practice: Guarding p.provisioner against nil.

This implementation addresses a previously flagged concern. Initializing a new provisioner if p.provisioner is nil avoids potential nil-pointer panics during configuration.

@tty47 tty47 enabled auto-merge April 14, 2025 10:28
@tty47 tty47 requested review from MSevey, mojtaba-esk and sysrex April 14, 2025 10:28
}

// validateJobID validates the format of a job ID (should be "job-YYYYMMDD-HHMMSS")
func validateJobID(jobID string) error {
Copy link
Contributor

@MSevey MSevey Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, this shouldn't be in the compute package. Any JobID validation should be done with job model functions to avoid duplication

}

// NewAnsibleProvisioner creates a new Ansible provisioner
func NewAnsibleProvisioner(jobID string) (provisioner.Provisioner, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is going on here? why is there a compute NewAnsibleProvisioner and a provisioner.NewAnsibleProvisioner?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
internal/compute/provider.go (2)

48-49: Move string manipulation to a Job model method

String manipulation in higher-level packages increases the risk of bugs, especially if this needs to be done in multiple places.

As mentioned in previous comments, this should be a Name() method on the Job model or a JobID type with a Name() method that handles the string manipulation.


47-66: ⚠️ Potential issue

Replace panic with proper error handling in constructor

The function currently panics if there's an error creating the provisioner. This approach can bring down the entire application for configuration errors, which is generally unsafe for production code.

Consider changing the function signature to return an error and propagate it to the caller instead:

-func NewProvisioner(jobID string) provisioner.Provisioner {
+func NewProvisioner(jobID string) (provisioner.Provisioner, error) {
    // Extract job name from job ID (format: "job-20250411-134559")
    jobName := strings.TrimPrefix(jobID, "job-")

    cfg := &config.AnsibleConfig{
        JobID:             jobID,
        JobName:           jobName,
        SSHUser:           provisioner.DefaultSSHUser,
        SSHKeyPath:        os.ExpandEnv(provisioner.DefaultSSHKeyPath),
        PlaybookPath:      provisioner.PathToPlaybook,
        InventoryBasePath: provisioner.InventoryBasePath,
    }

    p, err := provisioner.NewAnsibleProvisioner(cfg)
    if err != nil {
-       // This should never happen with our default config
-       panic(fmt.Sprintf("failed to create provisioner: %v", err))
+       return nil, fmt.Errorf("failed to create provisioner: %w", err)
    }
-   return p
+   return p, nil
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa63148 and 816758b.

📒 Files selected for processing (1)
  • internal/compute/provider.go (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/compute/provider.go (4)
internal/provisioner/factory.go (1)
  • NewProvisioner (19-30)
internal/provisioner/interface.go (1)
  • Provisioner (11-20)
internal/provisioner/config/ansible.go (1)
  • AnsibleConfig (12-27)
internal/provisioner/ansible.go (5)
  • DefaultSSHUser (28-28)
  • DefaultSSHKeyPath (31-31)
  • PathToPlaybook (25-25)
  • InventoryBasePath (37-37)
  • NewAnsibleProvisioner (49-60)
🔇 Additional comments (1)
internal/compute/provider.go (1)

82-82:

Details

❓ Verification inconclusive

Clarify purpose of duplicate provisioner creation functions

It's unclear why there are two similar functions (NewProvisioner and NewAnsibleProvisioner) that essentially do the same thing but with different error handling. Could you explain the intended difference between these functions and where each is meant to be used?


🏁 Script executed:

#!/bin/bash
# Find where these functions are called in the codebase
echo "Searching for references to NewProvisioner:"
rg "NewProvisioner\(" --type go | grep -v "internal/compute/provider.go"

echo -e "\nSearching for references to NewAnsibleProvisioner from compute package:"
rg "compute\.NewAnsibleProvisioner\(" --type go | grep -v "internal/compute/provider.go"

echo -e "\nSearching for references to NewAnsibleProvisioner from provisioner package:"
rg "provisioner\.NewAnsibleProvisioner\(" --type go | grep -v "internal/provisioner"

Length of output: 1096


Clarify the Distinctions in Provisioner Creation Functions

It appears that there are two routines for creating provisioners with overlapping purposes:

  • NewProvisioner (defined in internal/provisioner/factory.go and used in modules like internal/services/provisioning.go) is the general-purpose function for setting up a provisioner.
  • NewAnsibleProvisioner (invoked only from internal/compute/provider.go) is tailored for Ansible-based provisioning and incorporates a different error handling strategy.

Could you update the documentation or add inline comments to clearly explain:

  • The rationale for having these two separate functions,
  • The intended usage context for each, and
  • The specific differences in error handling between them?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
internal/compute/provider.go (3)

45-65: ⚠️ Potential issue

Replace panic with proper error handling.

Using panic in a constructor function is not recommended as it can bring down the entire application for configuration errors. Consider changing the function signature to return an error similar to NewAnsibleProvisioner, which would allow proper error handling by callers.

-func NewProvisioner(jobID string) provisioner.Provisioner {
+func NewProvisioner(jobID string) (provisioner.Provisioner, error) {
    // Extract job name from job ID (format: "job-20250411-134559")
    jobName := strings.TrimPrefix(jobID, "job-")

    cfg := &provisioner.AnsibleConfig{
        JobID:             jobID,
        JobName:           jobName,
        SSHUser:           provisioner.DefaultSSHUser,
        SSHKeyPath:        os.ExpandEnv(provisioner.DefaultSSHKeyPath),
        PlaybookPath:      provisioner.PathToPlaybook,
        InventoryBasePath: provisioner.InventoryBasePath,
    }

    p, err := provisioner.NewAnsibleProvisioner(cfg)
    if err != nil {
-       // This should never happen with our default config
-       panic(fmt.Sprintf("failed to create provisioner: %v", err))
+       return nil, fmt.Errorf("failed to create provisioner: %w", err)
    }
-   return p
+   return p, nil
}

45-65: 🛠️ Refactor suggestion

Refactor to eliminate code duplication.

There's significant code duplication between NewProvisioner and NewAnsibleProvisioner. Both functions create the same configuration using identical logic.

Create a helper function to centralize configuration creation:

+// createAnsibleConfig creates a common config for Ansible provisioners
+func createAnsibleConfig(jobID string) *provisioner.AnsibleConfig {
+   jobName := strings.TrimPrefix(jobID, "job-")
+   return &provisioner.AnsibleConfig{
+       JobID:             jobID,
+       JobName:           jobName,
+       SSHUser:           provisioner.DefaultSSHUser,
+       SSHKeyPath:        os.ExpandEnv(provisioner.DefaultSSHKeyPath),
+       PlaybookPath:      provisioner.PathToPlaybook,
+       InventoryBasePath: provisioner.InventoryBasePath,
+   }
+}

func NewProvisioner(jobID string) (provisioner.Provisioner, error) {
+   if err := validateJobID(jobID); err != nil {
+       return nil, err
+   }
-   // Extract job name from job ID...
-   jobName := strings.TrimPrefix(jobID, "job-")
-   cfg := &provisioner.AnsibleConfig{...}
+   cfg := createAnsibleConfig(jobID)
    
    p, err := provisioner.NewAnsibleProvisioner(cfg)
    if err != nil {
        return nil, fmt.Errorf("failed to create provisioner: %w", err)
    }
    return p, nil
}

func NewAnsibleProvisioner(jobID string) (provisioner.Provisioner, error) {
    if err := validateJobID(jobID); err != nil {
        return nil, err
    }
-   // Extract job name from job ID...
-   jobName := strings.TrimPrefix(jobID, "job-")
-   cfg := &provisioner.AnsibleConfig{...}
+   cfg := createAnsibleConfig(jobID)
    
    return provisioner.NewAnsibleProvisioner(cfg)
}

Also applies to: 67-78, 80-99


67-78: 🛠️ Refactor suggestion

Enhance job ID validation to match expected format.

The validation function only checks for the "job-" prefix, but the comment indicates the format should be "job-YYYYMMDD-HHMMSS".

func validateJobID(jobID string) error {
    if jobID == "" {
        return fmt.Errorf("job ID is required")
    }

    if !strings.HasPrefix(jobID, "job-") {
        return fmt.Errorf("invalid job ID format: must start with 'job-'")
    }

+   // Check if the rest follows the expected date format
+   parts := strings.Split(strings.TrimPrefix(jobID, "job-"), "-")
+   if len(parts) != 2 {
+       return fmt.Errorf("invalid job ID format: expected 'job-YYYYMMDD-HHMMSS'")
+   }
+   
+   datepart, timepart := parts[0], parts[1]
+   if len(datepart) != 8 || len(timepart) != 6 {
+       return fmt.Errorf("invalid job ID format: date should be YYYYMMDD and time should be HHMMSS")
+   }

    return nil
}
🧹 Nitpick comments (5)
internal/compute/provider.go (1)

81-99: Document the relationship between NewAnsibleProvisioner and provisioner.NewAnsibleProvisioner.

This function appears to serve as an adapter between the compute package and the provisioner package. Adding a comment explaining this relationship would help clarify why this function exists alongside the one in the provisioner package.

// NewAnsibleProvisioner creates a new Ansible provisioner
+// This function serves as an adapter between the compute package and the provisioner package,
+// validating the job ID and setting up the default configuration before delegating to 
+// provisioner.NewAnsibleProvisioner.
func NewAnsibleProvisioner(jobID string) (provisioner.Provisioner, error) {
internal/provisioner/ansible.go (4)

59-76: Address the TODO comment about OwnerID validation.

There's a TODO comment about validating OwnerID that should be implemented or tracked as a separate issue.

If the OwnerID validation is important, I can help implement it. Otherwise, consider creating a GitHub issue to track this enhancement and reference it in the comment.

// Validate validates the Ansible configuration
func (c *AnsibleConfig) Validate() error {
    // Validate required fields
    if c.JobID == "" {
        return fmt.Errorf("JobID is required")
    }
-   // TODO: Add validation for OwnerID
-   // if c.OwnerID == 0 {
-   // 	return fmt.Errorf("OwnerID is required")
-   // }
+   // TODO(GH-XXX): Add validation for OwnerID once we determine if it's required

106-116: Clarify behavior when instances are requested from database.

If no instances are provided and an event is published but nothing handles it, the method silently returns nil. This behavior should be documented or improved with logging.

    // If no instances provided, try to get them from DB
    if len(instances) == 0 {
        logger.Info("No instances provided, requesting inventory from database...")
        events.Publish(events.Event{
            Type:    events.EventInventoryRequested,
            JobID:   a.config.JobID,
            JobName: a.config.JobName,
            OwnerID: a.config.OwnerID,
        })
+       // Note: This returns without error as the request is handled asynchronously.
+       // The event handler will process the inventory request separately.
+       logger.InfoWithFields("Inventory request published, provisioning will continue asynchronously", 
+           map[string]interface{}{"job_id": a.config.JobID})
        return nil
    }

119-145: Improve error collection from goroutines.

The current error handling pattern works, but using a more structured approach would make the code more maintainable and easier to understand.

Consider using an error group (from golang.org/x/sync/errgroup) to simplify concurrency and error handling:

+import (
+   "golang.org/x/sync/errgroup"
+)

// Configure hosts in parallel
-var wg sync.WaitGroup
-errChan := make(chan error, len(instances))
+g, gCtx := errgroup.WithContext(ctx)

for _, instance := range instances {
-   wg.Add(1)
-   go func(inst types.InstanceInfo) {
-       defer wg.Done()
+   inst := instance // Create a copy of the instance for the goroutine
+   g.Go(func() error {
+       // Check if context is canceled
+       select {
+       case <-gCtx.Done():
+           return fmt.Errorf("operation canceled for instance %s: %w", inst.Name, ctx.Err())
+       default:
+       }
        if err := a.sshChecker.WaitForSSH(inst.PublicIP); err != nil {
-           errChan <- fmt.Errorf("failed waiting for SSH on instance %s: %w", inst.Name, err)
-           return
+           return fmt.Errorf("failed waiting for SSH on instance %s: %w", inst.Name, err)
        }
+       return nil
-   }(instance)
+   })
}

// Wait for all goroutines to finish
-wg.Wait()
-close(errChan)
-
-// Check for any errors
-var errors []error
-for err := range errChan {
-   errors = append(errors, err)
-}
-
-if len(errors) > 0 {
-   return fmt.Errorf("failed to configure some hosts: %v", errors)
+if err := g.Wait(); err != nil {
+   return fmt.Errorf("failed to configure hosts: %w", err)
}

222-268: Add file validation before writing inventory.

Before creating the inventory file, it would be beneficial to validate that the path doesn't contain directory traversal or other path manipulation attempts.

func (a *AnsibleProvisioner) CreateInventory(instances map[string]string, keyPath string) error {
    logger.Info("🎭 Creating inventory for job ", a.config.JobID, "...")

    // If no instances provided, return error
    if len(instances) == 0 {
        return fmt.Errorf("no instances provided")
    }

    inventoryPath := filepath.Join(a.config.InventoryBasePath, fmt.Sprintf("inventory_%s_ansible.ini", a.config.JobID))

+   // Validate inventory path doesn't contain directory traversal
+   if !strings.HasPrefix(filepath.Clean(inventoryPath), a.config.InventoryBasePath) {
+       return fmt.Errorf("invalid inventory path: directory traversal detected")
+   }

    // Ensure the inventory directory exists with secure permissions
    if err := os.MkdirAll(filepath.Dir(inventoryPath), 0750); err != nil {
        return fmt.Errorf("failed to create inventory directory: %w", err)
    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 816758b and 5e8d714.

📒 Files selected for processing (4)
  • internal/compute/provider.go (2 hunks)
  • internal/provisioner/ansible.go (1 hunks)
  • internal/provisioner/ansible_test.go (1 hunks)
  • internal/provisioner/factory.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/provisioner/factory.go
  • internal/provisioner/ansible_test.go
🧰 Additional context used
🧬 Code Graph Analysis (2)
internal/compute/provider.go (2)
internal/provisioner/interface.go (1)
  • Provisioner (11-20)
internal/provisioner/ansible.go (6)
  • AnsibleConfig (42-57)
  • DefaultSSHUser (27-27)
  • DefaultSSHKeyPath (30-30)
  • PathToPlaybook (24-24)
  • InventoryBasePath (36-36)
  • NewAnsibleProvisioner (85-96)
internal/provisioner/ansible.go (5)
internal/provisioner/interface.go (1)
  • Provisioner (11-20)
internal/provisioner/ssh_checker.go (1)
  • SSHChecker (12-14)
internal/compute/provider.go (1)
  • NewAnsibleProvisioner (81-99)
internal/types/instance.go (1)
  • InstanceInfo (106-116)
internal/events/events.go (1)
  • Event (27-34)
🔇 Additional comments (1)
internal/provisioner/ansible.go (1)

1-269: Excellent implementation with comprehensive error handling!

Overall, this is a well-structured implementation of an Ansible provisioner. It has:

  • Clear separation of concerns
  • Strong error handling throughout
  • Secure file operations with proper permissions
  • Good logging of operations
  • Concurrent SSH checking for performance

The modularity and attention to detail in this implementation make it both robust and maintainable.

Comment on lines +98 to +99
// Configure implements Provisioner.Configure
func (a *AnsibleProvisioner) Configure(_ context.Context, instances []types.InstanceInfo) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Utilize the provided context parameter.

The method signature accepts a context parameter but doesn't use it. Context should be used for timeouts, cancellation, and passing request-scoped values.

-func (a *AnsibleProvisioner) Configure(_ context.Context, instances []types.InstanceInfo) error {
+func (a *AnsibleProvisioner) Configure(ctx context.Context, instances []types.InstanceInfo) error {

Then use the context throughout the method for cancellation checks and passing to other functions:

    // Configure hosts in parallel
    var wg sync.WaitGroup
    errChan := make(chan error, len(instances))

+   // Create a child context that can be canceled
+   childCtx, cancel := context.WithCancel(ctx)
+   defer cancel()

    for _, instance := range instances {
        wg.Add(1)
-       go func(inst types.InstanceInfo) {
+       go func(inst types.InstanceInfo) {
            defer wg.Done()
+           // Check if context is canceled before starting work
+           select {
+           case <-childCtx.Done():
+               errChan <- fmt.Errorf("operation canceled for instance %s: %w", inst.Name, ctx.Err())
+               return
+           default:
+           }
            if err := a.sshChecker.WaitForSSH(inst.PublicIP); err != nil {
                errChan <- fmt.Errorf("failed waiting for SSH on instance %s: %w", inst.Name, err)
+               // Cancel context to stop other goroutines if needed
+               cancel()
                return
            }
        }(instance)
    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Configure implements Provisioner.Configure
func (a *AnsibleProvisioner) Configure(_ context.Context, instances []types.InstanceInfo) error {
// Configure implements Provisioner.Configure
func (a *AnsibleProvisioner) Configure(ctx context.Context, instances []types.InstanceInfo) error {
// Create a child context that can be canceled
childCtx, cancel := context.WithCancel(ctx)
defer cancel()
// Configure hosts in parallel
var wg sync.WaitGroup
errChan := make(chan error, len(instances))
for _, instance := range instances {
wg.Add(1)
go func(inst types.InstanceInfo) {
defer wg.Done()
// Check if context is canceled before starting work
select {
case <-childCtx.Done():
errChan <- fmt.Errorf("operation canceled for instance %s: %w", inst.Name, ctx.Err())
return
default:
}
if err := a.sshChecker.WaitForSSH(inst.PublicIP); err != nil {
errChan <- fmt.Errorf("failed waiting for SSH on instance %s: %w", inst.Name, err)
// Cancel context to stop other goroutines if needed
cancel()
return
}
}(instance)
}
wg.Wait()
close(errChan)
// Return the first error encountered, if any
for err := range errChan {
if err != nil {
return err
}
}
return nil
}

Comment on lines +169 to +220
// RunPlaybook runs the Ansible playbook with the given inventory
func (a *AnsibleProvisioner) RunPlaybook(inventoryPath string) error {
logger.InfoWithFields("Running Ansible playbook", map[string]interface{}{
"job_id": a.config.JobID,
"inventory_path": inventoryPath,
"playbook_path": a.config.PlaybookPath,
})

// Validate paths exist
if _, err := os.Stat(inventoryPath); err != nil {
return fmt.Errorf("inventory file not found: %w", err)
}

if _, err := os.Stat(a.config.PlaybookPath); err != nil {
return fmt.Errorf("playbook file not found: %w", err)
}

// Prepare command arguments
args := []string{
"-i", inventoryPath,
// Run serially for consistency
"--forks", "1",
}

// Add verbosity if debug mode is enabled
if AnsibleDebug {
args = append(args, "-vvv")
}

// Add playbook path
args = append(args, a.config.PlaybookPath)

// Run ansible-playbook command
// #nosec G204 -- command arguments are constructed from validated inputs
cmd := exec.Command("ansible-playbook", args...)

// Set environment variables
env := os.Environ()
env = append(env, "ANSIBLE_HOST_KEY_CHECKING=false")
env = append(env, "ANSIBLE_RETRY_FILES_ENABLED=false")
cmd.Env = env

// Redirect output
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr

if err := cmd.Run(); err != nil {
return fmt.Errorf("failed to run ansible playbook (check output above for details): %w", err)
}

return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Implement timeout and cancellation for playbook execution.

The RunPlaybook method should respect context cancellation and timeouts when executing the Ansible command.

Update the method signature and add context support:

-func (a *AnsibleProvisioner) RunPlaybook(inventoryPath string) error {
+func (a *AnsibleProvisioner) RunPlaybook(ctx context.Context, inventoryPath string) error {
    logger.InfoWithFields("Running Ansible playbook", map[string]interface{}{
        "job_id":         a.config.JobID,
        "inventory_path": inventoryPath,
        "playbook_path":  a.config.PlaybookPath,
    })

    // Validate paths exist
    if _, err := os.Stat(inventoryPath); err != nil {
        return fmt.Errorf("inventory file not found: %w", err)
    }

    if _, err := os.Stat(a.config.PlaybookPath); err != nil {
        return fmt.Errorf("playbook file not found: %w", err)
    }

    // Prepare command arguments
    args := []string{
        "-i", inventoryPath,
        // Run serially for consistency
        "--forks", "1",
    }

    // Add verbosity if debug mode is enabled
    if AnsibleDebug {
        args = append(args, "-vvv")
    }

    // Add playbook path
    args = append(args, a.config.PlaybookPath)

    // Run ansible-playbook command
    // #nosec G204 -- command arguments are constructed from validated inputs
    cmd := exec.Command("ansible-playbook", args...)

+   // Set up command with context for cancellation
+   cmd = exec.CommandContext(ctx, "ansible-playbook", args...)

    // Set environment variables
    env := os.Environ()
    env = append(env, "ANSIBLE_HOST_KEY_CHECKING=false")
    env = append(env, "ANSIBLE_RETRY_FILES_ENABLED=false")
    cmd.Env = env

    // Redirect output
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
+       if ctx.Err() != nil {
+           return fmt.Errorf("ansible-playbook execution was canceled: %w", ctx.Err())
+       }
        return fmt.Errorf("failed to run ansible playbook (check output above for details): %w", err)
    }

    return nil
}

This also requires updating the Configure method to pass the context to RunPlaybook:

    // Run Ansible playbook
    logger.Info("📝 Running provisioning playbook...")
    inventoryPath := filepath.Join(a.config.InventoryBasePath, fmt.Sprintf("inventory_%s_ansible.ini", a.config.JobID))
-   if err := a.RunPlaybook(inventoryPath); err != nil {
+   if err := a.RunPlaybook(ctx, inventoryPath); err != nil {
        return fmt.Errorf("failed to run playbook: %w", err)
    }

And update the Provisioner interface definition to include context in the RunPlaybook method.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// RunPlaybook runs the Ansible playbook with the given inventory
func (a *AnsibleProvisioner) RunPlaybook(inventoryPath string) error {
logger.InfoWithFields("Running Ansible playbook", map[string]interface{}{
"job_id": a.config.JobID,
"inventory_path": inventoryPath,
"playbook_path": a.config.PlaybookPath,
})
// Validate paths exist
if _, err := os.Stat(inventoryPath); err != nil {
return fmt.Errorf("inventory file not found: %w", err)
}
if _, err := os.Stat(a.config.PlaybookPath); err != nil {
return fmt.Errorf("playbook file not found: %w", err)
}
// Prepare command arguments
args := []string{
"-i", inventoryPath,
// Run serially for consistency
"--forks", "1",
}
// Add verbosity if debug mode is enabled
if AnsibleDebug {
args = append(args, "-vvv")
}
// Add playbook path
args = append(args, a.config.PlaybookPath)
// Run ansible-playbook command
// #nosec G204 -- command arguments are constructed from validated inputs
cmd := exec.Command("ansible-playbook", args...)
// Set environment variables
env := os.Environ()
env = append(env, "ANSIBLE_HOST_KEY_CHECKING=false")
env = append(env, "ANSIBLE_RETRY_FILES_ENABLED=false")
cmd.Env = env
// Redirect output
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
return fmt.Errorf("failed to run ansible playbook (check output above for details): %w", err)
}
return nil
}
// RunPlaybook runs the Ansible playbook with the given inventory and respects context cancellation.
func (a *AnsibleProvisioner) RunPlaybook(ctx context.Context, inventoryPath string) error {
logger.InfoWithFields("Running Ansible playbook", map[string]interface{}{
"job_id": a.config.JobID,
"inventory_path": inventoryPath,
"playbook_path": a.config.PlaybookPath,
})
// Validate paths exist
if _, err := os.Stat(inventoryPath); err != nil {
return fmt.Errorf("inventory file not found: %w", err)
}
if _, err := os.Stat(a.config.PlaybookPath); err != nil {
return fmt.Errorf("playbook file not found: %w", err)
}
// Prepare command arguments
args := []string{
"-i", inventoryPath,
// Run serially for consistency
"--forks", "1",
}
// Add verbosity if debug mode is enabled
if AnsibleDebug {
args = append(args, "-vvv")
}
// Add playbook path
args = append(args, a.config.PlaybookPath)
// Build command for ansible-playbook execution
cmd := exec.Command("ansible-playbook", args...)
// Set up command with context for cancellation
cmd = exec.CommandContext(ctx, "ansible-playbook", args...)
// Set environment variables
env := os.Environ()
env = append(env, "ANSIBLE_HOST_KEY_CHECKING=false")
env = append(env, "ANSIBLE_RETRY_FILES_ENABLED=false")
cmd.Env = env
// Redirect output
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Run(); err != nil {
if ctx.Err() != nil {
return fmt.Errorf("ansible-playbook execution was canceled: %w", ctx.Err())
}
return fmt.Errorf("failed to run ansible playbook (check output above for details): %w", err)
}
return nil
}

@tty47
Copy link
Collaborator Author

tty47 commented Apr 14, 2025

paused for now:
cc: @MSevey @mojtaba-esk

@MSevey MSevey marked this pull request as draft April 16, 2025 12:53
auto-merge was automatically disabled April 16, 2025 12:53

Pull request was converted to draft

@sysrex sysrex removed their request for review May 29, 2025 13:04
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale label Sep 21, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

api API enhancement New feature or request provisioning Provisioning Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🧾 Generate the Ansible inventory from the database

5 participants