feat: add worker log retrieval to MCP server for job troubleshooting #999

Open
rickrams wants to merge 2 commits into aws-deadline:mainline from rickrams:feat/mcp-worker-logs
@rickrams
Contributor

Add ability to retrieve worker logs via the MCP server, enabling a complete job troubleshooting workflow from job -> session -> worker logs.

Changes:

API layer (deadline.client.api):

  • Add get_worker_logs() function to retrieve CloudWatch logs for a worker
  • Add WorkerLogResult dataclass
  • Use AssumeFleetRoleForRead for users with Deadline Cloud Monitor credentials, mirroring how get_session_logs uses AssumeQueueRoleForUser
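
For readers who have not opened the diff, here is a rough sketch of what a worker log fetch and its result type could look like. It is not the code in this PR: the name `get_worker_logs_sketch`, the field names on `WorkerLogResult`, and the choice to page backwards from the newest events are all assumptions made for illustration. The only real API it relies on is the CloudWatch Logs `get_log_events` call, and the client is passed in so that role-scoped credentials can be supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class WorkerLogResult:
    """Hypothetical result shape; the fields in the actual PR may differ."""

    log_group_name: str
    log_stream_name: str
    events: List[Dict[str, Any]] = field(default_factory=list)
    next_token: Optional[str] = None  # token for fetching older events


def get_worker_logs_sketch(
    logs_client: Any,
    log_group_name: str,
    log_stream_name: str,
    limit: int = 100,
    next_token: Optional[str] = None,
) -> WorkerLogResult:
    """Fetch the most recent events from one CloudWatch log stream.

    `logs_client` is any object exposing `get_log_events` with the
    CloudWatch Logs keyword arguments, for example a boto3 "logs" client
    built from credentials scoped to the fleet.
    """
    kwargs: Dict[str, Any] = {
        "logGroupName": log_group_name,
        "logStreamName": log_stream_name,
        "limit": limit,
        "startFromHead": False,  # newest events first page
    }
    if next_token:
        kwargs["nextToken"] = next_token

    response = logs_client.get_log_events(**kwargs)

    # Keep only the fields a troubleshooting caller is likely to need.
    events = [
        {"timestamp": e["timestamp"], "message": e["message"]}
        for e in response.get("events", [])
    ]
    return WorkerLogResult(
        log_group_name=log_group_name,
        log_stream_name=log_stream_name,
        events=events,
        next_token=response.get("nextBackwardToken"),
    )
```

Injecting the client keeps the credential decision (fleet role versus default credentials) outside the fetch function, which also makes the function easy to unit test with a stub.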

MCP tool layer (deadline._mcp):

  • Add get_session_and_worker_logs MCP tool that takes a session_id and automatically fetches session details, session logs, AND the correct worker logs in one call. This prevents worker/session ID mismatches when an AI agent is troubleshooting jobs with multiple sessions.
  • get_worker_logs is NOT registered as a standalone MCP tool — it is a public Python API that the combined tool calls internally.
  • get_session_logs remains available as a standalone MCP tool for users who have queue permissions but not fleet permissions.
  • Update MCP server instructions to guide AI agents to check worker logs for infrastructure issues (spot interruptions, OOM, agent crashes).
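
The ID-mismatch point is easier to see in code. The sketch below is an illustration only, not the tool added in this PR: the function name, the injected callables, and the `fleetId` / `workerId` keys on the session record are assumptions. The idea it shows is that the caller supplies only a session ID, and the worker is always derived from that session's own record.

```python
from typing import Any, Callable, Dict, List


def get_session_and_worker_logs_sketch(
    session_id: str,
    get_session: Callable[[str], Dict[str, Any]],
    get_session_logs: Callable[[str], List[str]],
    get_worker_logs: Callable[[str, str], List[str]],
) -> Dict[str, Any]:
    """Return session details, session logs, and the matching worker's logs.

    All three callables are hypothetical stand-ins for the real lookups.
    """
    session = get_session(session_id)

    # The fleet and worker IDs come from the session record, never from
    # the caller, so logs from a different session's worker cannot be
    # paired with this session by accident.
    fleet_id = session["fleetId"]
    worker_id = session["workerId"]

    return {
        "session": session,
        "worker_id": worker_id,
        "session_logs": get_session_logs(session_id),
        "worker_logs": get_worker_logs(fleet_id, worker_id),
    }
```

With a job that ran two sessions on two workers, asking for the second session can only ever return the second worker's logs, because the agent never passes a worker ID itself.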

Worker logs are stored at /aws/deadline/{farm_id}/{fleet_id}/{worker_id} and contain worker agent operations, environment setup, and system events not visible in session logs.
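
As a small illustration of that path, the helper below builds the location from the three IDs. It assumes that the final path segment (the worker ID) is the log stream name inside a `/aws/deadline/{farm_id}/{fleet_id}` log group. The description above gives only the full path, so the split, the function name, and the validation rule are assumptions.

```python
from typing import Tuple


def worker_log_location(farm_id: str, fleet_id: str, worker_id: str) -> Tuple[str, str]:
    """Return (log_group_name, log_stream_name) for a worker.

    Assumes the worker ID is the stream name within the fleet's log group.
    """
    for name, value in (("farm_id", farm_id), ("fleet_id", fleet_id), ("worker_id", worker_id)):
        # Reject empty IDs and IDs containing "/", which would change
        # the shape of the path.
        if not value or "/" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return f"/aws/deadline/{farm_id}/{fleet_id}", worker_id
```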

How was this change tested?

See DEVELOPMENT.md for information on running tests.

  • Have you run the unit tests? Yes
  • Have you run the integration tests? No
  • Have you made changes to the download or asset_sync modules? N/A

Was this change documented?

  • Are relevant docstrings in the code base updated? yes
  • Has the README.md been updated? N/A

Does this PR introduce new dependencies?

This library is designed to be integrated into third-party applications that have bespoke and customized deployment environments. Adding dependencies will increase the chance of library version conflicts and incompatibilities. Please evaluate the addition of new dependencies. See the Dependencies section of DEVELOPMENT.md for more details.

  • This PR adds one or more new dependency Python packages. I acknowledge I have reviewed the considerations for adding dependencies in DEVELOPMENT.md.
  • This PR does not add any new dependencies.

Is this a breaking change? No

Does this change impact security? No


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions github-actions bot added the waiting-on-maintainers Waiting on the maintainers to review. label Feb 12, 2026

```
src/deadline/mcp/
src/deadline/_mcp/
```

Thanks for updating this!

Signed-off-by: rickrams <rickrams@users.noreply.github.com>
@rickrams rickrams force-pushed the feat/mcp-worker-logs branch from f496dd8 to 1b73b99 Compare February 14, 2026 22:08
@leongdl
Contributor

leongdl commented Feb 18, 2026

By the way, this change is still in draft, so no one is reviewing it.

@rickrams rickrams marked this pull request as ready for review February 18, 2026 20:05
@rickrams rickrams requested a review from a team as a code owner February 18, 2026 20:05