[Fusion] Troubleshooting error codes #1007
christopher-hakkaart
commented
Jan 12, 2026
- Update headers and sections
- Make into multiple pages
- Start migration error messages
✅ Deploy Preview for seqera-docs ready!
Updated last updated date and removed architectural limitations section. Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
| - Storage backends return normalized cloud errors (with provider-agnostic categories) or internal errors (`ErrNotFound`, `ErrReadOnly`, etc.) | ||
| - The FUSE layer maps both cloud errors and internal errors to FUSE status codes (`ENOENT`, `EACCES`, `EREMOTEIO`, `EIO`) | ||
| - The kernel translates FUSE status to errno values for the application | ||
| - Fusion logs cloud errors with structured details (provider, error code, HTTP status, request ID) |
| - Fusion logs cloud errors with structured details (provider, error code, HTTP status, request ID) | |
| - Fusion logs cloud errors with structured details (i.e., provider, error code, HTTP status, request ID) |
|
|
||
| ## FUSE status codes | ||
|
|
||
| Fusion maps internal errors to standard FUSE status codes returned to the operating system. These are the [errno](https://man7.org/linux/man-pages/man3/errno.3.html) values applications receive when filesystem operations fail. |
| Fusion maps internal errors to standard FUSE status codes returned to the operating system. These are the [errno](https://man7.org/linux/man-pages/man3/errno.3.html) values applications receive when filesystem operations fail. | |
| Fusion maps internal errors to standard FUSE status codes returned to the operating system. These are the [errno](https://man7.org/linux/man-pages/man3/errno.3.html) values that applications receive when filesystem operations fail. |
| 1. **Cloud > Storage Backend > FUSE Layer > Kernel > Application** | ||
|
|
||
| - Storage backends catch and normalize cloud errors (network timeouts, auth failures, rate limits) using the `clouderr` package | ||
| - Storage backends return normalized cloud errors (with provider-agnostic categories) or internal errors (`ErrNotFound`, `ErrReadOnly`, etc.) |
| - Storage backends return normalized cloud errors (with provider-agnostic categories) or internal errors (`ErrNotFound`, `ErrReadOnly`, etc.) | |
| - The Storage backend may also return internal errors when other logic errors are encountered. These errors will always begin with an `Err` prefix (e.g. `ErrNotFound`, `ErrReadOnly`, `ErrListTruncated`, etc.) |
I think this suggestion truncates the normalised errors. I'd say something like: 'The storage backend returns normalized cloud errors (with provider-agnostic categories). It may also return other internal errors when other logic errors are encountered.'
|
|
||
| - Storage backends catch and normalize cloud errors (network timeouts, auth failures, rate limits) using the `clouderr` package | ||
| - Storage backends return normalized cloud errors (with provider-agnostic categories) or internal errors (`ErrNotFound`, `ErrReadOnly`, etc.) | ||
| - The FUSE layer maps both cloud errors and internal errors to FUSE status codes (`ENOENT`, `EACCES`, `EREMOTEIO`, `EIO`) |
The FUSE layer maps both cloud errors and internal errors to errno status codes (ENOENT, EACCES, EREMOTEIO, EIO, etc.)
| - Storage backends catch and normalize cloud errors (network timeouts, auth failures, rate limits) using the `clouderr` package | ||
| - Storage backends return normalized cloud errors (with provider-agnostic categories) or internal errors (`ErrNotFound`, `ErrReadOnly`, etc.) | ||
| - The FUSE layer maps both cloud errors and internal errors to FUSE status codes (`ENOENT`, `EACCES`, `EREMOTEIO`, `EIO`) | ||
| - The kernel translates FUSE status to errno values for the application |
The kernel communicates the errno values to the application
| | Cloud error category | FUSE status | Examples | | ||
| |---------------------|-------------|----------| | ||
| | `Unauthenticated` | `fuse.EACCES` | No credentials provided | | ||
| | `InvalidCredentials` | `fuse.EACCES` | Wrong, malformed, or expired credentials | | ||
| | `Forbidden` | `fuse.EACCES` | Valid credentials, insufficient permissions | | ||
| | `AccountError` | `fuse.EACCES` | Account disabled, suspended, or has billing issues | | ||
| | `ResourceNotFound` | `fuse.ENOENT` | S3 "NoSuchKey", Azure "BlobNotFound", GCS 404 | | ||
| | `ContainerNotFound` | `fuse.ENOENT` | S3 "NoSuchBucket", Azure "ContainerNotFound", GCS 404 with bucket error | | ||
| | `RateLimited` | `fuse.EBUSY` | Request rate limits exceeded | | ||
| | `Busy` | `fuse.EBUSY` | Service temporarily unavailable or overloaded | | ||
| | `ResourceArchived` | `fuse.EBUSY` | Resource in archived/transitional state (for example, Glacier) | | ||
| | `Conflict` | `fuse.EEXIST` | Resource already exists or precondition failed | | ||
| | `InvalidArgument` | `fuse.EINVAL` | Malformed request or invalid parameters | | ||
| | `QuotaExceeded` | `fuse.EREMOTEIO` | Storage quota or capacity limit reached | | ||
| | `Unknown` (cloud errors) | `fuse.EREMOTEIO` | Unclassified cloud provider errors | |
Same changes as I mentioned above:
- FUSE status -> errno code
- Remove the `fuse.` prefix.
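For reference, the mapping in the table above, with the `fuse.` prefix dropped as suggested, can be sketched as a small shell lookup. This is only an illustration of the table, not Fusion's implementation: `category_to_errno` is a hypothetical helper name, and rejecting categories that are not in the table is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Sketch only: category_to_errno is a hypothetical helper, not part of Fusion.
# It mirrors the cloud error category table, using plain errno names.
category_to_errno() {
  case "$1" in
    Unauthenticated|InvalidCredentials|Forbidden|AccountError)
      echo "EACCES" ;;
    ResourceNotFound|ContainerNotFound)
      echo "ENOENT" ;;
    RateLimited|Busy|ResourceArchived)
      echo "EBUSY" ;;
    Conflict)
      echo "EEXIST" ;;
    InvalidArgument)
      echo "EINVAL" ;;
    QuotaExceeded|Unknown)
      echo "EREMOTEIO" ;;
    *)
      # Assumption: anything not listed in the table is rejected here.
      echo "unknown category: $1" >&2
      return 1 ;;
  esac
}
```

For example, `category_to_errno Forbidden` prints `EACCES`, and `category_to_errno RateLimited` prints `EBUSY`.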
| | Internal error | FUSE status | | ||
| |---------------|-------------| | ||
| | Not found | `fuse.ENOENT` | | ||
| | Read-only | `fuse.EROFS` | | ||
| | Unsupported | `fuse.ENOSYS` | | ||
| | Context cancelled | `fuse.EINTR` | | ||
| | Context deadline exceeded | `fuse.ETIMEDOUT` | | ||
| | Other errors | `fuse.EIO` | |
Ditto.
| ## Cloud provider error categories | ||
|
|
||
| Fusion normalizes errors from different cloud storage providers (S3, Azure Blob Storage, Google Cloud Storage) into consistent categories. When you see an `error_code` field in Fusion logs, it represents one of these categories: | ||
|
|
||
| | Category | Description | Common provider codes | | ||
| |----------|-------------|----------------------| | ||
| | `ResourceNotFound` | Requested resource (object/file) does not exist | S3: "NoSuchKey", Azure: "BlobNotFound", GCS: HTTP 404 | | ||
| | `ContainerNotFound` | Storage container (bucket) does not exist | S3: "NoSuchBucket", Azure: "ContainerNotFound", GCS: HTTP 404 with bucket error | | ||
| | `Unauthenticated` | No credentials provided | S3: "MissingSecurityHeader", GCS: HTTP 401 with no credentials | | ||
| | `InvalidCredentials` | Credentials provided but wrong, malformed, or expired | S3: "InvalidAccessKeyId", "ExpiredToken", Azure: "InvalidAuthenticationInfo", GCS: HTTP 401 with invalid credentials | | ||
| | `Forbidden` | Valid credentials but insufficient permissions | S3: "AccessDenied", Azure: "AuthorizationPermissionMismatch", GCS: HTTP 403 | | ||
| | `AccountError` | Account-level problems (disabled, suspended, billing issues) | S3: "AccountProblem", Azure: "AccountIsDisabled", GCS: HTTP 403 with specific messages | | ||
| | `ResourceArchived` | Resource exists but is in archived/transitional state | S3: "InvalidObjectState" (Glacier), Azure: "BlobArchived" | | ||
| | `RateLimited` | Request rate limits exceeded | S3: "SlowDown", Azure: "TooManyRequests", GCS: HTTP 429 | | ||
| | `Busy` | Service temporarily unavailable or overloaded | S3: "ServiceUnavailable", "InternalError", Azure: "ServerBusy", GCS: HTTP 503 | | ||
| | `Conflict` | Resource state conflict or precondition failure | S3: "BucketAlreadyExists", Azure: "BlobAlreadyExists", "ConditionNotMet", GCS: HTTP 409/412 | | ||
| | `InvalidArgument` | Malformed request or invalid parameters | S3: "InvalidArgument", "InvalidRange", Azure: "InvalidQueryParameterValue", GCS: HTTP 400 | | ||
| | `QuotaExceeded` | Storage quota or capacity limit reached | S3: "TooManyBuckets", Azure: "AccountLimitExceeded", GCS: HTTP 429 with quota message | | ||
| | `Unknown` | Unclassified or unexpected error | Various | |
I know we wrote most of this, but it seems to provide similar information to the table above: maybe a unified table would work better?
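To make the unified-table idea concrete, here is a rough sketch that generates one Markdown table from a single list of rows, joining the errno codes from the mapping table with the descriptions from the category table. The function names `unified_rows` and `print_unified_table` and the column layout are assumptions for illustration; the provider-code column is left out to keep the sketch short.

```bash
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical.
# Each row is category|errno|description, taken from the two tables in this PR.
unified_rows() {
  cat <<'EOF'
ResourceNotFound|ENOENT|Requested resource (object/file) does not exist
ContainerNotFound|ENOENT|Storage container (bucket) does not exist
Unauthenticated|EACCES|No credentials provided
InvalidCredentials|EACCES|Credentials provided but wrong, malformed, or expired
Forbidden|EACCES|Valid credentials but insufficient permissions
AccountError|EACCES|Account-level problems (disabled, suspended, billing issues)
ResourceArchived|EBUSY|Resource exists but is in archived/transitional state
RateLimited|EBUSY|Request rate limits exceeded
Busy|EBUSY|Service temporarily unavailable or overloaded
Conflict|EEXIST|Resource state conflict or precondition failure
InvalidArgument|EINVAL|Malformed request or invalid parameters
QuotaExceeded|EREMOTEIO|Storage quota or capacity limit reached
Unknown|EREMOTEIO|Unclassified or unexpected error
EOF
}

# Render the rows as a single Markdown table.
print_unified_table() {
  echo "| Category | errno code | Description |"
  echo "|----------|------------|-------------|"
  unified_rows | while IFS='|' read -r category code description; do
    echo "| \`$category\` | \`$code\` | $description |"
  done
}
```

Running `print_unified_table` emits the header plus one row per category, so the two existing tables could be maintained from one source.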
| ## Nextflow integration | ||
|
|
||
| When running Nextflow with Fusion: | ||
|
|
||
| - Exit code `0`: Task completed successfully | ||
| - Exit code `127`: Retry logic activates (`.command.sh` not found) | ||
| - Exit code `174`: Fusion I/O error—check logs for details | ||
|
|
||
| ### Check exit codes | ||
|
|
||
| ```bash | ||
| fusion --foreground | ||
| exit_code=$? | ||
|
|
||
| case $exit_code in | ||
| 0) | ||
| echo "Success" | ||
| ;; | ||
| 127) | ||
| echo "Command not found - may retry" | ||
| ;; | ||
| 174) | ||
| echo "Fusion I/O error - check Fusion logs" | ||
| ;; | ||
| *) | ||
| echo "Unknown exit code: $exit_code" | ||
| ;; | ||
| esac | ||
| ``` |
I don't think this section is needed.
| | `mounting filesystem` | Failed to mount FUSE filesystem | | ||
| | `could not get current job attempt` | Failed to get job attempt from compute environment | | ||
|
|
||
| ## Understanding Fusion logs |
This section is rather important both to support and users, thus I would put it after the "Triaging errors" section. As a matter of fact, most of the sections in the document are for reference purposes, and since we have a specific Reference document, it might be interesting to move them there and cross-reference them here as needed.
Agreed. I would actually have this as the first section (though after triaging also works).
Co-authored-by: Alberto Miranda <alberto.miranda@protonmail.com> Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
Co-authored-by: Alberto Miranda <alberto.miranda@protonmail.com> Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
Co-authored-by: Cristian Ramon-Cortes <cristianrcv@users.noreply.github.com> Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
|
|
||
| 1. **Failures during startup/shutdown → Exit Code** | ||
|
|
||
| - Startup: Configuration errors, missing credentials, or mount failures terminate Fusion immediately |
Either add a comma after "failures", or can we put "will terminate Fusion immediately" for clarity?