Skip to content

Conversation

@nh13
Copy link

@nh13 nh13 commented Jan 31, 2026

Summary

  • Use self.job.threads for vCPU allocation instead of the undocumented _cores resource (falls back to _cores for backward compatibility)
  • Auto-detect compute platform (EC2 vs Fargate) from the job queue
  • Apply strict memory/vCPU validation only for Fargate; use relaxed validation for EC2

Details

This PR addresses two issues:

  1. threads directive ignored: The plugin was ignoring self.job.threads and instead looking for an undocumented _cores resource (defaulting to 1 vCPU). This is now fixed to use the standard snakemake threads directive.

  2. Overly restrictive resource validation: The VALID_RESOURCES_MAPPING validation is only applicable to Fargate (which has strict memory/vCPU ratios), but was being applied to all compute environments including EC2. This PR incorporates the fix from fix: issue with resource validation assuming Fargate (#34) #37 which auto-detects the platform and applies appropriate validation.

Test plan

  • Create a snakemake workflow with threads: 4 and verify the AWS Batch job requests 4 vCPU
  • Test with EC2 compute environment using various mem_mb values that aren't in the old VALID_RESOURCES_MAPPING
  • Test with Fargate compute environment to verify strict validation still applies

Fixes #34
Incorporates #37

Summary by CodeRabbit

  • New Features

    • Added AWS Batch metadata access and automatic platform detection (EC2 vs FARGATE).
    • Platform-aware resource validation and improved defaults with intelligent fallbacks.
  • Bug Fixes

    • Graceful fallback and clearer messages when platform detection or resource validation encounters issues.

✏️ Tip: You can customize this high-level summary in your review settings.

radusuciu and others added 2 commits January 31, 2026 12:08
As described in issue snakemake#34, validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue, and only run the Fargate specific validation when necessary.
Use self.job.threads for vCPU instead of the undocumented _cores resource.
Falls back to _cores for backward compatibility if threads is not set.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

Walkthrough

Adds two boto3-delegating methods to BatchClient and implements platform-aware platform detection and resource validation in BatchJobBuilder, distinguishing EC2 vs FARGATE resource rules and using the detected platform for job definition registration.

Changes

Cohort / File(s) Summary
BatchClient API delegation
snakemake_executor_plugin_aws_batch/batch_client.py
Added describe_job_queues(**kwargs) and describe_compute_environments(**kwargs) that directly delegate to the boto3 Batch client.
Platform-aware resource validation
snakemake_executor_plugin_aws_batch/batch_job_builder.py
Added _get_platform_from_queue() to detect EC2 vs FARGATE; introduced _validate_fargate_resources() and _validate_ec2_resources() and _validate_resources() dispatcher; changed vCPU/memory defaulting and applied detected platform to platformCapabilities when registering job definitions.

Sequence Diagram(s)

sequenceDiagram
    participant BJB as BatchJobBuilder
    participant BC as BatchClient
    participant API as AWS Batch API
    participant QC as Job Queue / Compute Environments

    rect rgba(100,150,200,0.5)
    Note over BJB,QC: Platform Detection
    BJB->>BC: describe_job_queues()
    BC->>API: describe_job_queues()
    API-->>BC: queue details
    BC-->>BJB: queue details

    BJB->>BC: describe_compute_environments()
    BC->>API: describe_compute_environments()
    API-->>BC: compute environment details
    BC-->>BJB: compute environment details
    BJB->>BJB: _get_platform_from_queue() (detect EC2 or FARGATE)
    end

    rect rgba(150,100,200,0.5)
    Note over BJB: Resource Validation
    alt Platform == FARGATE
        BJB->>BJB: _validate_fargate_resources() (strict vCPU/memory mapping)
    else Platform == EC2
        BJB->>BJB: _validate_ec2_resources() (basic sanity checks)
    end
    end

    rect rgba(200,150,100,0.5)
    Note over BJB,API: Job Definition Registration
    BJB->>BC: register_job_definition() (with detected platform)
    BC->>API: register_job_definition()
    API-->>BC: registration result
    BC-->>BJB: success/failure
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main changes: using threads directive for vCPU and implementing platform-aware resource validation.
Linked Issues check ✅ Passed The PR implementation addresses all key objectives from issue #34: platform detection via job queue, platform-specific validation for Fargate vs EC2, and use of threads directive for vCPU.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the stated objectives: two new delegation methods in BatchClient and platform-aware resource validation in BatchJobBuilder.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py`:
- Around line 119-125: The code assumes there is at least one matching entry in
VALID_RESOURCES_MAPPING for the given vcpu which can cause min([]) to raise
ValueError; change the branch that computes min_mem to first build the list
matches = [m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v] and if
matches is empty raise a WorkflowError (include the invalid vcpu value in the
message) instead of calling min(), otherwise compute min_mem = min(matches), log
the warning via self.logger.warning and return the str(vcpu), str(min_mem) as
before; update any callers/tests expecting a WorkflowError when an invalid vCPU
is supplied.
🧹 Nitpick comments (2)
snakemake_executor_plugin_aws_batch/batch_job_builder.py (2)

70-71: Consider handling mixed-platform compute environments.

The current implementation only inspects the first compute environment in the queue. If a job queue has multiple compute environments with mixed platforms (e.g., EC2 and FARGATE), this could lead to incorrect platform detection depending on priority order.

This may be acceptable if mixed-platform queues are rare in practice, but consider adding a comment explaining this design decision or logging when multiple compute environments are present.


139-150: Consider accepting integer parameters directly.

The method signature takes strings but immediately converts to integers. The caller at line 163 converts ints to strings via str(vcpu), str(mem), then this method converts back. Accepting integers directly would be cleaner.

♻️ Proposed simplification
-    def _validate_resources(self, vcpu: str, mem: str) -> tuple[str, str]:
+    def _validate_resources(self, vcpu: int, mem: int) -> tuple[str, str]:
         """Validates vcpu and memory based on platform requirements.

         https://docs.aws.amazon.com/batch/latest/APIReference/API_ResourceRequirement.html
         """
-        vcpu_int = int(vcpu)
-        mem_int = int(mem)
-
         if self.platform == BATCH_JOB_PLATFORM_CAPABILITIES.FARGATE.value:
-            return self._validate_fargate_resources(vcpu_int, mem_int)
+            return self._validate_fargate_resources(vcpu, mem)
         else:
-            return self._validate_ec2_resources(vcpu_int, mem_int)
+            return self._validate_ec2_resources(vcpu, mem)

Then update the caller at line 163:

-        vcpu_str, mem_str = self._validate_resources(str(vcpu), str(mem))
+        vcpu_str, mem_str = self._validate_resources(vcpu, mem)

Avoid ValueError from min() on empty list when vCPU is not valid
for any Fargate memory configuration.
@nh13
Copy link
Author

nh13 commented Jan 31, 2026

@jakevc I would appreciate a review (CC: @johanneskoester)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VALID_RESOURCES_MAPPING is restrictive, and only applicable to Fargate compute environments.

2 participants