diff --git a/platform-enterprise_versioned_docs/version-23.4/getting-started/quickstart-demo/launch-pipelines.mdx b/platform-enterprise_versioned_docs/version-23.4/getting-started/quickstart-demo/launch-pipelines.mdx
index 4ffc3cdae..e97fc8702 100644
--- a/platform-enterprise_versioned_docs/version-23.4/getting-started/quickstart-demo/launch-pipelines.mdx
+++ b/platform-enterprise_versioned_docs/version-23.4/getting-started/quickstart-demo/launch-pipelines.mdx
@@ -63,7 +63,7 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays raw parameter values that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
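For example, an uploaded params file might look like the following minimal YAML sketch. The exact parameter names depend on the pipeline, and the paths shown here are placeholders:

```yaml
# Placeholder values for illustration only
input: s3://your-bucket/samplesheets/samplesheet.csv   # pipeline input samplesheet
outdir: s3://your-bucket/results/run-1                 # pipeline output directory
```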
Specify your pipeline input and output and modify other pipeline parameters as needed:
diff --git a/platform-enterprise_versioned_docs/version-23.4/getting-started/rnaseq.mdx b/platform-enterprise_versioned_docs/version-23.4/getting-started/rnaseq.mdx
index 0fa590391..9a16223c0 100644
--- a/platform-enterprise_versioned_docs/version-23.4/getting-started/rnaseq.mdx
+++ b/platform-enterprise_versioned_docs/version-23.4/getting-started/rnaseq.mdx
@@ -215,7 +215,7 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays raw parameter values that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
diff --git a/platform-enterprise_versioned_docs/version-24.1/getting-started/proteinfold.mdx b/platform-enterprise_versioned_docs/version-24.1/getting-started/proteinfold.mdx
index f81708645..cff163053 100644
--- a/platform-enterprise_versioned_docs/version-24.1/getting-started/proteinfold.mdx
+++ b/platform-enterprise_versioned_docs/version-24.1/getting-started/proteinfold.mdx
@@ -9,10 +9,10 @@ toc_max_heading_level: 2
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-This guide details how to perform best-practice analysis for protein 3D structure prediction on an AWS Batch compute environment in Platform. It includes:
+This guide details how to perform best-practice analysis for protein 3D structure prediction on an AWS Batch compute environment in Platform. It includes:
- Creating AWS Batch compute environments to run your pipeline and downstream analysis
-- Adding the *nf-core/proteinfold* pipeline to your workspace
+- Adding the *nf-core/proteinfold* pipeline to your workspace
- Importing your pipeline input data
- Launching the pipeline and monitoring execution from your workspace
- Setting up a custom analysis environment with Data Studios
@@ -22,7 +22,7 @@ You will need the following to get started:
- [Admin](../orgs-and-teams/roles) permissions in an existing organization workspace. See [Set up your workspace](./workspace-setup) to create an organization and workspace from scratch.
- An existing AWS cloud account with access to the AWS Batch service.
-- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
+- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
:::
## Compute environment
@@ -35,24 +35,24 @@ Given the data sizes and computational intensity, production pipelines perform b
The *nf-core/proteinfold* pipeline performs protein folding prediction using one of three deep learning models: AlphaFold2, ColabFold, or ESMFold. The computationally intensive tasks for protein structure prediction perform better on GPUs due to their ability to handle large matrix operations efficiently and perform parallel computations. GPUs can dramatically reduce the time required for protein structure predictions, making it feasible to analyze larger datasets or perform more complex simulations.
-Platform supports the allocation of both CPUs and GPUs in the same compute environment. For example, specify `m6id`, `c6id`, `r6id`, `g5`, `p3` instance families in the **Instance types** field when creating your AWS Batch compute environment. See [Create compute environment](#create-compute-environment) below.
+Platform supports the allocation of both CPUs and GPUs in the same compute environment. For example, specify `m6id`, `c6id`, `r6id`, `g5`, `p3` instance families in the **Instance types** field when creating your AWS Batch compute environment. See [Create compute environment](#create-compute-environment) below.
-When you launch *nf-core/proteinfold* in Platform, enable **use_gpu** to instruct Nextflow to run GPU-compatible pipeline processes on GPU instances. See [Launch pipeline](#launch-pipeline) below.
+When you launch *nf-core/proteinfold* in Platform, enable **use_gpu** to instruct Nextflow to run GPU-compatible pipeline processes on GPU instances. See [Launch pipeline](#launch-pipeline) below.
### Fusion file system
-The [Fusion](../supported_software/fusion/overview) file system enables seamless read and write operations to cloud object stores, leading to simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run *nf-core/proteinfold*, it significantly enhances I/O-intensive tasks and eliminates the need for intermediate data copies, which is particularly beneficial when working with the large databases used by deep learning models for prediction.
+The [Fusion](../supported_software/fusion/overview) file system enables seamless read and write operations to cloud object stores, leading to simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run *nf-core/proteinfold*, it significantly enhances I/O-intensive tasks and eliminates the need for intermediate data copies, which is particularly beneficial when working with the large databases used by deep learning models for prediction.
-Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage, you must select EC2 instances with NVMe SSD storage (`g4dn`, `g5`, or `P3` families or greater).
+Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage, you must select EC2 instances with NVMe SSD storage (`g4dn`, `g5`, or `p3` families or greater).
-:::note
+:::note
Fusion requires a license for use in Seqera Platform compute environments or directly in Nextflow. Fusion can be trialed at no cost. [Contact Seqera](https://seqera.io/contact-us/) for more details.
:::
### Create compute environment
:::info
-The same compute environment can be used for pipeline execution and running your Data Studios notebook environment, but Data Studios does not support AWS Fargate. To use this compute environment for both *nf-core/proteinfold* execution and your data studio, leave **Enable Fargate for head job** disabled and include a CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
+The same compute environment can be used for pipeline execution and running your Data Studios notebook environment, but Data Studios does not support AWS Fargate. To use this compute environment for both *nf-core/proteinfold* execution and your data studio, leave **Enable Fargate for head job** disabled and include a CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
Alternatively, create a second basic AWS Batch compute environment and a data studio with at least 2 CPUs and 8192 MB of RAM.
:::
@@ -77,7 +77,7 @@ From the **Compute Environments** tab in your organization workspace, select **A
| **Enable Fargate for head job** | Run the Nextflow head job using the Fargate container service to speed up pipeline launch. Requires Fusion v2. Do not enable for Data Studios compute environments. |
| **Use Amazon-recommended GPU-optimized ECS AMI** | When enabled, Batch Forge specifies the most current AWS-recommended GPU-optimized ECS AMI as the EC2 fleet AMI when creating the compute environment. |
| **Allowed S3 buckets** | Additional S3 buckets or paths to be granted read-write permission for this compute environment. For the purposes of this guide, add `s3://proteinfold-dataset` to grant compute environment access to the DB and params used for prediction by AlphaFold2 and ColabFold. |
-| **Instance types** | Specify the instance types to be used for computation. You must include GPU-enabled instance types (`g4dn`, `g5`) when the Amazon-recommended GPU-optimized ECS AMI is in use. Include CPU-based instance families for Data Studios compute environments. |
+| **Instance types** | Specify the instance types to be used for computation. You must include GPU-enabled instance types (`g4dn`, `g5`) when the Amazon-recommended GPU-optimized ECS AMI is in use. Include CPU-based instance families for Data Studios compute environments. |
| **Resource labels** | `name=value` pairs to tag the AWS resources created by this compute environment.|

@@ -97,17 +97,17 @@ To use Seqera Pipelines to import the *nf-core/proteinfold* pipeline to your wor

1. Search for *nf-core/proteinfold* and select **Launch** next to the pipeline name in the list. In the **Add pipeline** tab, select **Cloud** or **Enterprise** depending on your Platform account type, then provide the information needed for Seqera Pipelines to access your Platform instance:
- - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
+ - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
- **Seqera Enterprise**: Specify the **Seqera Platform URL** (hostname) and **Base API URL** for your Enterprise instance, then paste your Platform **Access token** and select **Next**.
:::tip
If you do not have a Platform access token, select **Get your access token from Seqera Platform** to open the Access tokens page in a new browser tab.
:::
-1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
+1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
1. (Optional) Customize the **Pipeline Name** and **Pipeline Description**.
-1. Select **Add Pipeline**.
+1. Select **Add Pipeline**.
:::info
-To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines#) for manual Launchpad instructions.
+To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines) for manual Launchpad instructions.
:::
## Pipeline input data
@@ -116,7 +116,7 @@ The [*nf-core/proteinfold*](https://github.com/nf-core/proteinfold) pipeline wor
**nf-core/proteinfold example samplesheet**
-
+
| sequence | fasta |
| -------- | ----- |
| T1024 | https://raw.githubusercontent.com/nf-core/test-datasets/proteinfold/testdata/sequences/T1024.fasta |
@@ -124,12 +124,12 @@ The [*nf-core/proteinfold*](https://github.com/nf-core/proteinfold) pipeline wor
-In Platform, samplesheets and other data can be made easily accessible in one of two ways:
+In Platform, samplesheets and other data can be made easily accessible in one of two ways:
- Use **Data Explorer** to browse and interact with remote data from AWS S3, Azure Blob Storage, and Google Cloud Storage repositories, directly in your organization workspace.
- Use **Datasets** to upload structured data to your workspace in CSV (Comma-Separated Values) or TSV (Tab-Separated Values) format.
- **Add a cloud bucket via Data Explorer**
+ **Add a cloud bucket via Data Explorer**
Private cloud storage buckets accessible with the credentials in your workspace are added to Data Explorer automatically by default. However, you can also add custom directory paths within buckets to your workspace to simplify direct access.
@@ -137,7 +137,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of

- 1. From the **Data Explorer** tab, select **Add cloud bucket**.
+ 1. From the **Data Explorer** tab, select **Add cloud bucket**.
1. Specify the bucket details:
- The cloud **Provider**: AWS
- An existing cloud **Bucket path**: `s3://proteinfold-dataset`
@@ -146,7 +146,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- An optional bucket **Description**.
1. Select **Add**.
- You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
+ You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
@@ -164,7 +164,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- Select the **First row as header** option to prevent Platform from parsing the header row of the samplesheet as sample data.
- Select **Upload file** and browse to your CSV or TSV samplesheet file in local storage, or simply drag and drop it into the box.
- The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
+ The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
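For reference, the example samplesheet shown earlier corresponds to a CSV file with a header row and one row per sequence:

```csv
sequence,fasta
T1024,https://raw.githubusercontent.com/nf-core/test-datasets/proteinfold/testdata/sequences/T1024.fasta
```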
:::info
Platform does not store the data used for analysis in pipelines. The dataset must specify the locations of data stored on your own infrastructure.
@@ -175,21 +175,21 @@ In Platform, samplesheets and other data can be made easily accessible in one of
## Launch pipeline
:::note
-This guide is based on [version 1.1.1](https://nf-co.re/proteinfold/1.1.1) of the nf-core/proteinfold pipeline. Launch form parameters and tools may differ in other versions.
+This guide is based on [version 1.1.1](https://nf-co.re/proteinfold/1.1.1) of the nf-core/proteinfold pipeline. Launch form parameters and tools may differ in other versions.
:::
With your compute environment created, *nf-core/proteinfold* added to your workspace Launchpad, and your samplesheet accessible in Platform, you are ready to launch your pipeline. Navigate to the Launchpad and select **Launch** next to `nf-core-proteinfold` to open the launch form.
-The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
+The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
-### General config
+### General config
- **Pipeline to launch**: The pipeline Git repository name or URL: `https://github.com/nf-core/proteinfold`. For saved pipelines, this is prefilled and cannot be edited.
- **Revision number**: A valid repository commit ID, tag, or branch name: `1.1.1`. For saved pipelines, this is prefilled and cannot be edited.
-- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository. Benchmarking runs for this guide used nf-core profiles with included test datasets — `test_full_alphafold2_multimer` for Alphafold2 and `test_full_alphafold2_multimer` for Colabfold.
+- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository. Benchmarking runs for this guide used nf-core profiles with included test datasets: `test_full_alphafold2_multimer` for AlphaFold2 and `test_full_colabfold_multimer` for ColabFold.
- **Workflow run name**: An identifier for the run, pre-filled with a random name. This can be customized.
- **Labels**: Assign new or existing [labels](../labels/overview) to the run.
-- **Compute environment**: Your AWS Batch compute environment.
+- **Compute environment**: Your AWS Batch compute environment.
- **Work directory**: The cloud storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified.
:::note
The credentials associated with the compute environment must have access to the work directory.
@@ -197,24 +197,24 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc

-### Run parameters
+### Run parameters
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays raw parameter values that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
-Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
+Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.

-Specify your pipeline input and output and modify other pipeline parameters as needed.
+Specify your pipeline input and output and modify other pipeline parameters as needed.
**input**
- Use **Browse** to select your pipeline input data:
+ Use **Browse** to select your pipeline input data:
- In the **Data Explorer** tab, select the existing cloud bucket that contains your samplesheet, browse or search for the samplesheet file, and select the chain icon to copy the file path. Close the data selection window and paste the path in the input field.
- In the **Datasets** tab, search for and select your existing dataset.
@@ -223,24 +223,24 @@ Specify your pipeline input and output and modify other pipeline parameters as n
**outdir**
- Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
+ Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
**Browse** and copy cloud storage directory paths using Data Explorer, or enter a path manually.
-- The **mode** menu allows you to select the deep learning model used for structure prediction (`alphafold2`, `colabfold`, or `esmfold`).
-- Enable **use_gpu** to run GPU-compatible tasks on GPUs. This requires **Use Amazon-recommended GPU-optimized ECS AMI** to be enabled and GPU-enabled instances to be specified under **Instance types** in your compute environment.
+- The **mode** menu allows you to select the deep learning model used for structure prediction (`alphafold2`, `colabfold`, or `esmfold`).
+- Enable **use_gpu** to run GPU-compatible tasks on GPUs. This requires **Use Amazon-recommended GPU-optimized ECS AMI** to be enabled and GPU-enabled instances to be specified under **Instance types** in your compute environment.

:::info
For the purposes of this guide, run the pipeline in both `alphafold2` and `colabfold` modes. Specify unique directory paths for the `outdir` parameter (such as "Alphafold2" and "ColabFold") to ensure output data is kept separate and not overwritten. Predicted protein structures for each model will be visualized side-by-side in the [Interactive analysis](#interactive-analysis-with-data-studios) section.
-:::
+:::
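For reference, the params file for one of these runs might look like the following YAML sketch. The samplesheet and output paths are placeholders; `mode` and `use_gpu` are the parameters described above:

```yaml
# Placeholder paths for illustration only
input: s3://your-bucket/samplesheets/proteinfold_samplesheet.csv
outdir: s3://your-bucket/results/alphafold2   # use a unique outdir per run
mode: alphafold2                              # alphafold2, colabfold, or esmfold
use_gpu: true                                 # requires GPU instances in the compute environment
```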
-### Advanced settings
+### Advanced settings
-- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
+- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
- [Pipeline secrets](../secrets/overview) store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution.
- See [Advanced options](../launch/advanced) for more details.
@@ -271,7 +271,7 @@ After you have filled the necessary launch details, select **Launch**. The **Run
The paths to report files point to a location in cloud storage (in the `outdir` directory specified during launch), but you can view the contents directly and download each file without navigating to the cloud or a remote filesystem.
:::info
- See [Reports](../reports/overview) for more information.
+ See [Reports](../reports/overview) for more information.
:::
#### View general information
@@ -295,9 +295,9 @@ After you have filled the necessary launch details, select **Launch**. The **Run
Select a task in the task table to open the **Task details** dialog. The dialog has three tabs:
- - The **About** tab contains extensive task execution details.
+ - The **About** tab contains extensive task execution details.
- The **Execution log** tab provides a real-time log of the selected task's execution. Task execution and other logs (such as stdout and stderr) can be downloaded from here, if they are still available in your compute environment.
- - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
+ - The **Data Explorer** tab allows you to view the task working directory directly in Platform.

@@ -309,29 +309,29 @@ After you have filled the necessary launch details, select **Launch**. The **Run
[Data Studios](../data_studios/overview) streamlines the process of creating interactive analysis environments for Platform users. With built-in templates for platforms like Jupyter Notebook, RStudio, and VSCode, creating a data studio is as simple as adding and sharing pipelines or datasets. The data studio URL can also be shared with any user with the [Connect role](../orgs-and-teams/roles) for real-time access and collaboration.
-For the purposes of this guide, a Jupyter notebook environment will be used for interactive visualization of the predicted protein structures, optionally comparing AlphaFold2 and Colabfold structures for the same sequence data.
+For the purposes of this guide, a Jupyter notebook environment will be used for interactive visualization of the predicted protein structures, optionally comparing AlphaFold2 and ColabFold structures for the same sequence data.
### Create a Jupyter notebook data studio
From the **Data Studios** tab, select **Add a data studio** and complete the following:
- In the **Compute & Data** tab:
- - Select your AWS Batch compute environment.
+ - Select your AWS Batch compute environment.
:::info
- The same compute environment can be used for pipeline execution and running your Data Studios notebook environment, but Data Studios does not support AWS Fargate and data studio sessions must run on CPUs. To use one compute environment for both nf-core/proteinfold execution and your data studio, leave **Enable Fargate for head job** disabled and include at least one CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
+ The same compute environment can be used for pipeline execution and running your Data Studios notebook environment, but Data Studios does not support AWS Fargate and data studio sessions must run on CPUs. To use one compute environment for both nf-core/proteinfold execution and your data studio, leave **Enable Fargate for head job** disabled and include at least one CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
Alternatively, create a second basic AWS Batch compute environment with at least 2 CPUs and 8192 MB of RAM for your data studio.
:::
- Optional: Enter CPU and memory allocations. The default values are 2 CPUs and 8192 MB memory (RAM).
:::note
- Data studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions.
+ Data studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions.
:::
- - Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your Proteinfold run.
+ - Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your Proteinfold run.
- In the **General config** tab:
- Select the latest **Jupyter** container image template from the list.
- - Optional: Enter a unique name and description for the data studio.
+ - Optional: Enter a unique name and description for the data studio.
- Check **Install Conda packages** and paste the following Conda environment YAML snippet:
- ```yaml
+ ```yaml
channels:
- bioconda
- conda-forge
@@ -344,7 +344,7 @@ From the **Data Studios** tab, select **Add a data studio** and complete the fol
- Confirm the data studio details in the **Summary** tab.
- Select **Add** and choose whether to add and start the studio immediately.
-- When the data studio is created and in a running state, **Connect** to it.
+- When the data studio is created and in a running state, **Connect** to it.

@@ -354,7 +354,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Import libraries and check versions:
- ```python
+ ```python
import sys
import jupyter_core
import nglview
@@ -372,7 +372,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Define visualization functions:
- ```python
+ ```python
import os
import ipywidgets as widgets
from IPython.display import display, HTML
@@ -382,14 +382,14 @@ The Jupyter environment can be configured with the packages and scripts you need
view.add_representation('cartoon', selection='protein', color='residueindex')
view.add_representation('ball+stick', selection='hetero')
view._remote_call('setSize', target='Widget', args=[width, height])
-
+
# Set initial view
view._remote_call('autoView')
view._remote_call('centerView')
-
+
# Adjust zoom level (you may need to adjust this value)
view._remote_call('zoom', target='stage', args=[0.8])
-
+
return view
def compare_proteins(pdb_files):
@@ -406,7 +406,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Set up file paths and create file dictionary:
- ```python
+ ```python
# Replace with the actual paths to your AlphaFold2 and ColabFold PDB files
alphafold_pdb = "data/path/to/your/alphafold/output.pdb"
colabfold_pdb = "data/path/to/your/colabfold/output.pdb"
@@ -453,9 +453,9 @@ The Jupyter environment can be configured with the packages and scripts you need
description='Select method:',
disabled=False,
)
-
+
info_output = widgets.Output()
-
+
def on_change(change):
with info_output:
info_output.clear_output()
@@ -464,9 +464,9 @@ The Jupyter environment can be configured with the packages and scripts you need
print(f"Selected method: {selected_method}")
print(f"File path: {selected_file}")
print(f"File size: {os.path.getsize(selected_file) / 1024:.2f} KB")
-
+
method_dropdown.observe(on_change, names='value')
-
+
display(HTML("Structure Information:
"))
display(widgets.VBox([method_dropdown, info_output]))
```
@@ -490,4 +490,3 @@ The Jupyter environment can be configured with the packages and scripts you need
```

-
diff --git a/platform-enterprise_versioned_docs/version-24.1/getting-started/quickstart-demo/launch-pipelines.md b/platform-enterprise_versioned_docs/version-24.1/getting-started/quickstart-demo/launch-pipelines.md
index e33272298..5fd38fb37 100644
--- a/platform-enterprise_versioned_docs/version-24.1/getting-started/quickstart-demo/launch-pipelines.md
+++ b/platform-enterprise_versioned_docs/version-24.1/getting-started/quickstart-demo/launch-pipelines.md
@@ -20,12 +20,12 @@ The Launchpad in every Platform workspace allows users to easily create and shar
## Launch a pipeline
:::note
-This guide is based on version 3.15.1 of the [nf-core/rnaseq pipeline](https://github.com/nf-core/rnaseq). Launch form parameters and tools will differ for other pipelines.
+This guide is based on version 3.15.1 of the [nf-core/rnaseq pipeline](https://github.com/nf-core/rnaseq). Launch form parameters and tools will differ for other pipelines.
:::
Navigate to the Launchpad and select **Launch** next to your pipeline to open the launch form.
-The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
+The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
Nextflow parameter schema
@@ -33,48 +33,48 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc
The launch form lets you configure the pipeline execution. The pipeline parameters in this form are rendered from a [pipeline schema](../../pipeline-schema/overview) file in the root of the pipeline Git repository. `nextflow_schema.json` is a simple JSON-based schema that describes pipeline parameters, making it easy for pipeline developers to adapt their in-house Nextflow pipelines for execution in Platform.
:::tip
- See [Best Practices for Deploying Pipelines with the Seqera Platform](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform/) to learn how to build the parameter schema for any Nextflow pipeline automatically with tooling maintained by the nf-core community.
+ See [Best Practices for Deploying Pipelines with the Seqera Platform](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform/) to learn how to build the parameter schema for any Nextflow pipeline automatically with tooling maintained by the nf-core community.
:::
-### General config
+### General config

- **Pipeline to launch**: The pipeline Git repository name or URL. For saved pipelines, this is prefilled and cannot be edited.
- **Revision number**: A valid repository commit ID, tag, or branch name. For saved pipelines, this is prefilled and cannot be edited.
-- (*Optional*) **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution.
+- (*Optional*) **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution.
- **Workflow run name**: An identifier for the run, pre-filled with a random name. This can be customized.
- (*Optional*) **Labels**: Assign new or existing [labels](../../labels/overview) to the run.
-- **Compute environment**: Select an existing workspace [compute environment](../../compute-envs/overview).
+- **Compute environment**: Select an existing workspace [compute environment](../../compute-envs/overview).
- **Work directory**: The (cloud or local) file storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified.
:::note
The credentials associated with the compute environment must have access to the work directory.
:::
-### Run parameters
+### Run parameters

There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays raw parameter values that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
Specify your pipeline input and output and modify other pipeline parameters as needed:
#### input
-Use **Browse** to select your pipeline input data:
+Use **Browse** to select your pipeline input data:
- In the **Data Explorer** tab, select the existing cloud bucket that contains your samplesheet, browse or search for the samplesheet file, and select the chain icon to copy the file path. Close the data selection window and paste the path in the input field.
- In the **Datasets** tab, search for and select your existing dataset.
#### outdir
-Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
+Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
**Browse** and copy cloud storage directory paths using Data Explorer, or enter a path manually.
@@ -84,10 +84,10 @@ Modify other parameters to customize the pipeline execution through the paramete

-### Advanced settings
+### Advanced settings
-- Use [resource labels](../../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
+- Use [resource labels](../../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
- [Pipeline secrets](../../secrets/overview) store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution.
- See [Advanced options](../../launch/advanced) for more details.
-After you have filled the necessary launch details, select **Launch**. The **Runs** tab shows your new run in a **submitted** status at the top of the list. Select the run name to navigate to the [**View Workflow Run**](../../monitoring/overview) page and view the configuration, parameters, status of individual tasks, and run report.
\ No newline at end of file
+After you have filled the necessary launch details, select **Launch**. The **Runs** tab shows your new run in a **submitted** status at the top of the list. Select the run name to navigate to the [**View Workflow Run**](../../monitoring/overview) page and view the configuration, parameters, status of individual tasks, and run report.
diff --git a/platform-enterprise_versioned_docs/version-24.1/getting-started/rnaseq.mdx b/platform-enterprise_versioned_docs/version-24.1/getting-started/rnaseq.mdx
index 32a11022d..7ad7677fc 100644
--- a/platform-enterprise_versioned_docs/version-24.1/getting-started/rnaseq.mdx
+++ b/platform-enterprise_versioned_docs/version-24.1/getting-started/rnaseq.mdx
@@ -9,10 +9,10 @@ toc_max_heading_level: 2
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-This guide details how to run bulk RNA sequencing (RNA-Seq) data analysis, from quality control to differential expression analysis, on an AWS Batch compute environment in Platform. It includes:
+This guide details how to run bulk RNA sequencing (RNA-Seq) data analysis, from quality control to differential expression analysis, on an AWS Batch compute environment in Platform. It includes:
- Creating an AWS Batch compute environment to run your pipeline and analysis environment
-- Adding pipelines to your workspace
+- Adding pipelines to your workspace
- Importing your pipeline input data
- Launching the pipeline and monitoring execution from your workspace
- Setting up a custom analysis environment with Data Studios
@@ -23,17 +23,17 @@ You will need the following to get started:
- [Admin](../orgs-and-teams/roles) permissions in an existing organization workspace. See [Set up your workspace](./workspace-setup) to create an organization and workspace from scratch.
- An existing AWS cloud account with access to the AWS Batch service.
-- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
+- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
:::
## Compute environment
-Compute and storage requirements for RNA-Seq analysis are dependent on the number of samples and the sequencing depth of your input data. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for details on RNA-Seq datasets and the CPU and memory requirements for important steps of RNA-Seq pipelines.
+Compute and storage requirements for RNA-Seq analysis are dependent on the number of samples and the sequencing depth of your input data. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for details on RNA-Seq datasets and the CPU and memory requirements for important steps of RNA-Seq pipelines.
-In this guide, you will create an AWS Batch compute environment with sufficient resources allocated to run the [*nf-core/rnaseq*](https://github.com/nf-core/rnaseq) pipeline with a large dataset. This compute environment will also be used to run a Data Studios R-IDE for interactive analysis of the resulting pipeline data.
+In this guide, you will create an AWS Batch compute environment with sufficient resources allocated to run the [*nf-core/rnaseq*](https://github.com/nf-core/rnaseq) pipeline with a large dataset. This compute environment will also be used to run a Data Studios R-IDE for interactive analysis of the resulting pipeline data.
:::note
-The compute recommendations below are based on internal benchmarking performed by Seqera. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for more information.
+The compute recommendations below are based on internal benchmarking performed by Seqera. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for more information.
:::
### Recommended compute environment resources
@@ -48,13 +48,13 @@ The following compute resources are recommended for production RNA-Seq pipelines
| **Max CPUs** | >500 |
| **Min CPUs** | 0 |
-#### Fusion file system
+#### Fusion file system
The [Fusion](../supported_software/fusion/overview) file system enables seamless read and write operations to cloud object stores, leading to simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run *nf-core/rnaseq*, it is recommended for optimal performance. See [*nf-core/rnaseq* performance in Platform](#nf-corernaseq-performance-in-platform) at the end of this guide.
-Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage (see Create compute environment below), you must select EC2 instances with NVMe SSD storage (`m5d` or `r5d` families).
+Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage (see Create compute environment below), you must select EC2 instances with NVMe SSD storage (`m5d` or `r5d` families).
-:::note
+:::note
Fusion requires a license for use in Seqera Platform compute environments or directly in Nextflow. See [Fusion licensing](https://docs.seqera.io/fusion/licensing) for more information.
:::
@@ -84,7 +84,7 @@ From the **Compute Environments** tab in your organization workspace, select **A
| **Resource labels** | `name=value` pairs to tag the AWS resources created by this compute environment.|
-## Add pipeline to Platform
+## Add pipeline to Platform
:::info
The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline is a highly configurable and robust workflow designed to analyze RNA-Seq data. It performs quality control, alignment, and quantification.
@@ -99,28 +99,28 @@ To use Seqera Pipelines to import the *nf-core/rnaseq* pipeline to your workspac

1. Search for `nf-core/rnaseq` and select **Launch** next to the pipeline name in the list. In the **Add pipeline** tab, select **Cloud** or **Enterprise** depending on your Platform account type, then provide the information needed for Seqera Pipelines to access your Platform instance:
- - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
+ - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
- **Seqera Enterprise**: Specify the **Seqera Platform URL** (hostname) and **Base API URL** for your Enterprise instance, then paste your Platform **Access token** and select **Next**.
:::tip
If you do not have a Platform access token, select **Get your access token from Seqera Platform** to open the Access tokens page in a new browser tab.
:::
-1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
+1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
1. (Optional) Customize the **Pipeline Name** and **Pipeline Description**.
-1. Select **Add Pipeline**.
+1. Select **Add Pipeline**.
:::info
-To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines#) for manual Launchpad instructions.
+To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines) for manual Launchpad instructions.
:::
## Pipeline input data
-The [*nf-core/rnaseq*](https://github.com/nf-core/rnaseq) pipeline works with input datasets (samplesheets) containing sample names, FASTQ file locations (paths to FASTQ files in cloud or local storage), and strandedness. For example, the dataset used in the `test_full` profile is derived from the publicly available iGenomes collection of datasets, commonly used in bioinformatics analyses.
+The [*nf-core/rnaseq*](https://github.com/nf-core/rnaseq) pipeline works with input datasets (samplesheets) containing sample names, FASTQ file locations (paths to FASTQ files in cloud or local storage), and strandedness. For example, the dataset used in the `test_full` profile is derived from the publicly available iGenomes collection of datasets, commonly used in bioinformatics analyses.
This dataset represents RNA-Seq samples from various human cell lines (GM12878, K562, MCF7, and H1) with biological replicates, stored in an AWS S3 bucket (`s3://ngi-igenomes`) as part of the iGenomes resource. These RNA-Seq datasets consist of paired-end sequencing reads, which can be used to study gene expression patterns in different cell types.
**nf-core/rnaseq test_full profile dataset**
-
+
| sample | fastq_1 | fastq_2 | strandedness |
|--------|---------|---------|--------------|
| GM12878_REP1 | s3://ngi-igenomes/test-data/rnaseq/SRX1603629_T1_1.fastq.gz | s3://ngi-igenomes/test-data/rnaseq/SRX1603629_T1_2.fastq.gz | reverse |
@@ -134,12 +134,12 @@ This dataset represents RNA-Seq samples from various human cell lines (GM12878,
-In Platform, samplesheets and other data can be made easily accessible in one of two ways:
+In Platform, samplesheets and other data can be made easily accessible in one of two ways:
- Use **Data Explorer** to browse and interact with remote data from AWS S3, Azure Blob Storage, and Google Cloud Storage repositories, directly in your organization workspace.
- Use **Datasets** to upload structured data to your workspace in CSV (Comma-Separated Values) or TSV (Tab-Separated Values) format.
- **Add a cloud bucket via Data Explorer**
+ **Add a cloud bucket via Data Explorer**
Private cloud storage buckets accessible with the credentials in your workspace are added to Data Explorer automatically by default. However, you can also add custom directory paths within buckets to your workspace to simplify direct access.
@@ -147,7 +147,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of

- 1. From the **Data Explorer** tab, select **Add cloud bucket**.
+ 1. From the **Data Explorer** tab, select **Add cloud bucket**.
1. Specify the bucket details:
- The cloud **Provider**.
- An existing cloud **Bucket path**.
@@ -156,7 +156,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- An optional bucket **Description**.
1. Select **Add**.
- You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
+ You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
@@ -174,7 +174,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- Select the **First row as header** option to prevent Platform from parsing the header row of the samplesheet as sample data.
- Select **Upload file** and browse to your CSV or TSV samplesheet file in local storage, or simply drag and drop it into the box.
- The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
+ The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
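For reference, the `test_full` samplesheet shown earlier corresponds to a CSV file with a header row and one row per sample (first row shown):

```csv
sample,fastq_1,fastq_2,strandedness
GM12878_REP1,s3://ngi-igenomes/test-data/rnaseq/SRX1603629_T1_1.fastq.gz,s3://ngi-igenomes/test-data/rnaseq/SRX1603629_T1_2.fastq.gz,reverse
```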
:::info
Platform does not store the data used for analysis in pipelines. The dataset must specify the locations of data stored on your own infrastructure.
@@ -185,14 +185,14 @@ In Platform, samplesheets and other data can be made easily accessible in one of
## Launch pipeline
:::note
-This guide is based on version 3.15.1 of the *nf-core/rnaseq* pipeline. Launch form parameters and tools may differ in other versions.
+This guide is based on version 3.15.1 of the *nf-core/rnaseq* pipeline. Launch form parameters and tools may differ in other versions.
:::
With your compute environment created, *nf-core/rnaseq* added to your workspace Launchpad, and your samplesheet accessible in Platform, you are ready to launch your pipeline. Navigate to the Launchpad and select **Launch** next to `nf-core-rnaseq` to open the launch form.
-The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
+The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
-### General config
+### General config

@@ -201,30 +201,30 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc
- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository.
- **Workflow run name**: An identifier for the run, pre-filled with a random name. This can be customized.
- **Labels**: Assign new or existing [labels](../labels/overview) to the run.
-- **Compute environment**: Your AWS Batch compute environment.
+- **Compute environment**: Your AWS Batch compute environment.
- **Work directory**: The cloud storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified.
:::note
The credentials associated with the compute environment must have access to the work directory.
:::
-### Run parameters
+### Run parameters

There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays raw parameter values that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
-Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
+Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
-Specify your pipeline input and output and modify other pipeline parameters as needed.
+Specify your pipeline input and output and modify other pipeline parameters as needed.
**input**
- Use **Browse** to select your pipeline input data:
+ Use **Browse** to select your pipeline input data:
- In the **Data Explorer** tab, select the existing cloud bucket that contains your samplesheet, browse or search for the samplesheet file, and select the chain icon to copy the file path. Close the data selection window and paste the path in the input field.
- In the **Datasets** tab, search for and select your existing dataset.
@@ -233,7 +233,7 @@ Specify your pipeline input and output and modify other pipeline parameters as n
**outdir**
- Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
+ Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
**Browse** and copy cloud storage directory paths using Data Explorer, or enter a path manually.
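Taken together, a minimal params file for this run might look like the following YAML sketch; both paths are placeholders:

```yaml
# Placeholder paths for illustration only
input: s3://your-bucket/samplesheets/rnaseq_samplesheet.csv
outdir: s3://your-bucket/results/rnaseq-run-1   # must be unique for each run
```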
@@ -243,9 +243,9 @@ Modify other parameters to customize the pipeline execution through the paramete

-### Advanced settings
+### Advanced settings
-- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
+- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
- [Pipeline secrets](../secrets/overview) store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution.
- See [Advanced options](../launch/advanced) for more details.
@@ -282,7 +282,7 @@ After you have filled the necessary launch details, select **Launch**. The **Run
The paths to report files point to a location in cloud storage (in the `outdir` directory specified during launch), but you can view the contents directly and download each file without navigating to the cloud or a remote filesystem.
:::info
- See [Reports](../reports/overview) for more information.
+ See [Reports](../reports/overview) for more information.
:::
#### View general information
@@ -308,9 +308,9 @@ After you have filled the necessary launch details, select **Launch**. The **Run

- - The **About** tab contains extensive task execution details.
+ - The **About** tab contains extensive task execution details.
- The **Execution log** tab provides a real-time log of the selected task's execution. Task execution and other logs (such as stdout and stderr) can be downloaded from here, if they are still available in your compute environment.
- - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
+ - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
Nextflow hash-addresses each task of the pipeline and creates unique directories based on these hashes. Data Explorer allows you to view the log files and output files generated for each task in its working directory, directly within Platform. You can view, download, and retrieve the link for these intermediate files in cloud storage from the **Data Explorer** tab to simplify troubleshooting.
@@ -326,9 +326,9 @@ For the purposes of this guide, an R-IDE will be used to normalize the pipeline
### Prepare your data
-#### Gene counts
+#### Gene counts
-Salmon is the default tool used during the `pseudo-aligner` step of the nf-core/rnaseq pipeline. In the pipeline output data, the `/salmon` directory contains the tool's output, including a `salmon.merged.gene_counts_length_scaled.tsv` file.
+Salmon is the default tool used during the `pseudo-aligner` step of the nf-core/rnaseq pipeline. In the pipeline output data, the `/salmon` directory contains the tool's output, including a `salmon.merged.gene_counts_length_scaled.tsv` file.
#### Sample info
@@ -370,15 +370,15 @@ The analysis script provided in this section requires a sample information file
From the **Data Studios** tab, select **Add a data studio** and complete the following:
- Select the latest **R-IDE** container image template from the list.
-- Select your AWS Batch compute environment.
+- Select your AWS Batch compute environment.
:::note
-Data studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions. The default CPU and memory allocation for a data studio is 2 CPUs and 8192 MB RAM.
+Data studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions. The default CPU and memory allocation for a data studio is 2 CPUs and 8192 MB RAM.
:::
-- Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your RNA-Seq run.
+- Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your RNA-Seq run.
- Optional: Enter CPU and memory allocations. The default values are 2 CPUs and 8192 MB memory (RAM).
- Select **Add**.
- Once the data studio has been created, select the options menu next to it and select **Start**.
-- When the data studio is in a running state, **Connect** to it.
+- When the data studio is in a running state, **Connect** to it.
### Perform the analysis and explore results
@@ -466,7 +466,7 @@ The R-IDE can be configured with the packages you wish to install and the R scri
:::info
MDS plots are used to visualize the overall similarity between RNA-Seq samples based on their gene expression profiles, helping to identify sample clusters and potential batch effects.
:::
-
+
```r
# Create MDS plot
# a. Display in RStudio
@@ -522,7 +522,7 @@ The R-IDE can be configured with the packages you wish to install and the R scri
names(results) <- colnames(my.contrasts)
```
- :::info
+ :::info
This script is written for the analysis of human data, based on *nf-core/rnaseq*'s `test_full` dataset. To adapt the script for your data, modify the contrasts based on the comparisons you want to make between your sample groups:
```r
@@ -535,7 +535,7 @@ The R-IDE can be configured with the packages you wish to install and the R scri
```
:::
-1. Print the number of differentially expressed genes for each comparison and save the results to CSV files:
+1. Print the number of differentially expressed genes for each comparison and save the results to CSV files:
```r
# Print the number of differentially expressed genes for each comparison
@@ -670,20 +670,20 @@ The *nf-core/rnaseq* pipeline involves several key steps, each with distinct com
#### Overall run metrics
-**Total pipeline run cost (USD)**:
+**Total pipeline run cost (USD)**:
- Fusion file system with fast instance storage: $34.90
- Plain S3 storage without Fusion: $58.40
**Pipeline runtime**:
-The Fusion file system used with NVMe instance storage contributed to a 34% improvement in total pipeline runtime and a 49% reduction in CPU hours.
+The Fusion file system used with NVMe instance storage contributed to a 34% improvement in total pipeline runtime and a 49% reduction in CPU hours.

#### Process run time
-The Fusion file system demonstrates significant performance improvements for most processes in the *nf-core/rnaseq* pipeline, particularly for I/O-intensive tasks:
+The Fusion file system demonstrates significant performance improvements for most processes in the *nf-core/rnaseq* pipeline, particularly for I/O-intensive tasks:
- The most time-consuming processes see improvements of 36.07% to 70.15%, saving hours of runtime in a full pipeline execution.
- Most processes show significant performance improvements with Fusion, with time savings ranging from 35.57% to 99.14%.
@@ -691,7 +691,7 @@ The Fusion file system demonstrates significant performance improvements for mos
- `SALMON_INDEX` shows a notable 70.15% improvement, reducing runtime from 102.18 minutes to 30.50 minutes.
- `STAR_ALIGN_IGENOMES`, one of the most time-consuming processes, is 53.82% faster with Fusion, saving nearly an hour of runtime.
-
+
| Process | S3 Runtime (min) | Fusion Runtime (min) | Time Saved (min) | Improvement (%) |
|---------|------------------|----------------------|------------------|-----------------|
@@ -730,9 +730,9 @@ The Fusion file system demonstrates significant performance improvements for mos
This profile consists of Nextflow configuration settings for each process and each resource directive (where applicable): **cpus**, **memory**, and **time**. The optimized setting for a given process and resource directive is based on the maximum use of that resource across all tasks in that process.
- Once optimization is selected, subsequent runs of that pipeline will inherit the optimized configuration profile, indicated by the black lightbulb icon with a checkmark.
+ Once optimization is selected, subsequent runs of that pipeline will inherit the optimized configuration profile, indicated by the black lightbulb icon with a checkmark.
- :::info
+ :::info
Optimization profiles are generated from one run at a time, defaulting to the most recent run, and _not_ from an aggregation of previous runs.
:::
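As a concrete illustration of the profile described above, the following is a minimal, hypothetical sketch of optimized process settings. The process names are taken from this guide, but the values are placeholders; in practice, Platform derives them from the resource usage observed in the run that was optimized.

```groovy
// Hypothetical optimized profile: the values below are placeholders, not benchmarks.
// Platform derives the actual settings from the peak resource usage of each process.
process {
    withName: 'STAR_ALIGN_IGENOMES' {
        cpus   = 12
        memory = 36.GB
        time   = 8.h
    }
    withName: 'SALMON_INDEX' {
        cpus   = 8
        memory = 24.GB
        time   = 2.h
    }
}
```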
diff --git a/platform-enterprise_versioned_docs/version-24.1/launch/launchpad.md b/platform-enterprise_versioned_docs/version-24.1/launch/launchpad.md
index 2907f341a..d6c89326a 100644
--- a/platform-enterprise_versioned_docs/version-24.1/launch/launchpad.md
+++ b/platform-enterprise_versioned_docs/version-24.1/launch/launchpad.md
@@ -52,7 +52,7 @@ For saved pipelines, **General config** and **Run parameters** fields are prefil
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text, select attributes from dropdowns, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** dropdown.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** dropdown.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
Seqera uses a `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters. Most pipelines contain at least input and output parameters:
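As a minimal, hypothetical illustration, a params file for such a pipeline can be as small as the following. The bucket paths are placeholders, and the exact parameter names are defined by the pipeline's `nextflow_schema.json`.

```yaml
# Minimal illustrative params file; replace the placeholder paths with your own.
input: "s3://your-bucket/samplesheet.csv"
outdir: "s3://your-bucket/results/run-1"
```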
diff --git a/platform-enterprise_versioned_docs/version-24.2/getting-started/proteinfold.mdx b/platform-enterprise_versioned_docs/version-24.2/getting-started/proteinfold.mdx
index f2c805084..c60acc853 100644
--- a/platform-enterprise_versioned_docs/version-24.2/getting-started/proteinfold.mdx
+++ b/platform-enterprise_versioned_docs/version-24.2/getting-started/proteinfold.mdx
@@ -9,10 +9,10 @@ toc_max_heading_level: 2
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-This guide details how to perform best-practice analysis for protein 3D structure prediction on an AWS Batch compute environment in Platform. It includes:
+This guide details how to perform best-practice analysis for protein 3D structure prediction on an AWS Batch compute environment in Platform. It includes:
- Creating AWS Batch compute environments to run your pipeline and downstream analysis
-- Adding the nf-core/proteinfold pipeline to your workspace
+- Adding the nf-core/proteinfold pipeline to your workspace
- Importing your pipeline input data
- Launching the pipeline and monitoring execution from your workspace
- Setting up a custom analysis environment with Data Studios
@@ -22,7 +22,7 @@ You will need the following to get started:
- [Admin](../orgs-and-teams/roles) permissions in an existing organization workspace. See [Set up your workspace](./workspace-setup) to create an organization and workspace from scratch.
- An existing AWS cloud account with access to the AWS Batch service.
-- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
+- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
:::
## Compute environment
@@ -35,25 +35,25 @@ Given the data sizes and computational intensity, production pipelines perform b
The nf-core/proteinfold pipeline performs protein folding prediction using one of three deep learning models: AlphaFold2, ColabFold, or ESMFold. The computationally intensive tasks for protein structure prediction perform better on GPUs due to their ability to handle large matrix operations efficiently and perform parallel computations. GPUs can dramatically reduce the time required for protein structure predictions, making it feasible to analyze larger datasets or perform more complex simulations.
-Platform supports the allocation of both CPUs and GPUs in the same compute environment. For example, specify `m6id`, `c6id`, `r6id`, `g5`, `p3` instance families in the **Instance types** field when creating your AWS Batch compute environment. See [Create compute environment](#create-compute-environment) below.
+Platform supports the allocation of both CPUs and GPUs in the same compute environment. For example, specify `m6id`, `c6id`, `r6id`, `g5`, `p3` instance families in the **Instance types** field when creating your AWS Batch compute environment. See [Create compute environment](#create-compute-environment) below.
-When you launch nf-core/proteinfold in Platform, enable **use_gpu** to instruct Nextflow to run GPU-compatible pipeline processes on GPU instances. See [Launch pipeline](#launch-pipeline) below.
+When you launch nf-core/proteinfold in Platform, enable **use_gpu** to instruct Nextflow to run GPU-compatible pipeline processes on GPU instances. See [Launch pipeline](#launch-pipeline) below.
### Fusion file system
The [Fusion](../supported_software/fusion/overview) file system enables seamless read and write operations to cloud object stores, leading to
-simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run nf-core/proteinfold, it significantly enhances I/O-intensive tasks and eliminates the need for intermediate data copies, which is particularly beneficial when working with the large databases used by deep learning models for prediction.
+simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run nf-core/proteinfold, it significantly enhances I/O-intensive tasks and eliminates the need for intermediate data copies, which is particularly beneficial when working with the large databases used by deep learning models for prediction.
-Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage, you must select EC2 instances with NVMe SSD storage (`g4dn`, `g5`, or `P3` families or greater).
+Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage, you must select EC2 instances with NVMe SSD storage (`g4dn`, `g5`, or `P3` families or greater).
-:::note
+:::note
Fusion requires a license for use in Seqera Platform compute environments or directly in Nextflow. Fusion can be trialed at no cost. [Contact Seqera](https://seqera.io/contact-us/) for more details.
:::
### Create compute environment
:::info
-The same compute environment can be used for pipeline execution and running your Data Studios notebook environment, but Data Studios does not support AWS Fargate. To use this compute environment for both nf-core/proteinfold execution and your data studio, leave **Enable Fargate for head job** disabled and include a CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
+The same compute environment can be used for pipeline execution and running your Data Studios notebook environment, but Data Studios does not support AWS Fargate. To use this compute environment for both nf-core/proteinfold execution and your data studio, leave **Enable Fargate for head job** disabled and include a CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
Alternatively, create a second basic AWS Batch compute environment and a data studio with at least 2 CPUs and 8192 MB of RAM.
:::
@@ -78,7 +78,7 @@ From the **Compute Environments** tab in your organization workspace, select **A
| **Enable Fargate for head job** | Run the Nextflow head job using the Fargate container service to speed up pipeline launch. Requires Fusion v2. Do not enable for Data Studios compute environments. |
| **Use Amazon-recommended GPU-optimized ECS AMI** | When enabled, Batch Forge specifies the most current AWS-recommended GPU-optimized ECS AMI as the EC2 fleet AMI when creating the compute environment. |
| **Allowed S3 buckets** | Additional S3 buckets or paths to be granted read-write permission for this compute environment. For the purposes of this guide, add `s3://proteinfold-dataset` to grant compute environment access to the DB and params used for prediction by AlphaFold2 and ColabFold. |
-| **Instance types** | Specify the instance types to be used for computation. You must include GPU-enabled instance types (`g4dn`, `g5`) when the Amazon-recommended GPU-optimized ECS AMI is in use. Include CPU-based instance families for Data Studios compute environments. |
+| **Instance types** | Specify the instance types to be used for computation. You must include GPU-enabled instance types (`g4dn`, `g5`) when the Amazon-recommended GPU-optimized ECS AMI is in use. Include CPU-based instance families for Data Studios compute environments. |
| **Resource labels** | `name=value` pairs to tag the AWS resources created by this compute environment.|

@@ -98,17 +98,17 @@ To use Seqera Pipelines to import the `nf-core/proteinfold` pipeline to your wor

1. Search for `nf-core/proteinfold` and select **Launch** next to the pipeline name in the list. In the **Add pipeline** tab, select **Cloud** or **Enterprise** depending on your Platform account type, then provide the information needed for Seqera Pipelines to access your Platform instance:
- - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
+ - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
- **Seqera Enterprise**: Specify the **Seqera Platform URL** (hostname) and **Base API URL** for your Enterprise instance, then paste your Platform **Access token** and select **Next**.
:::tip
If you do not have a Platform access token, select **Get your access token from Seqera Platform** to open the Access tokens page in a new browser tab.
:::
-1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
+1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
1. (Optional) Customize the **Pipeline Name** and **Pipeline Description**.
-1. Select **Add Pipeline**.
+1. Select **Add Pipeline**.
:::info
-To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines#) for manual Launchpad instructions.
+To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines#) for manual Launchpad instructions.
:::
## Pipeline input data
@@ -117,7 +117,7 @@ The [nf-core/proteinfold](https://github.com/nf-core/proteinfold) pipeline works
**nf-core/proteinfold example samplesheet**
-
+
| sequence | fasta |
| -------- | ----- |
| T1024 | https://raw.githubusercontent.com/nf-core/test-datasets/proteinfold/testdata/sequences/T1024.fasta |
@@ -125,12 +125,12 @@ The [nf-core/proteinfold](https://github.com/nf-core/proteinfold) pipeline works
-In Platform, samplesheets and other data can be made easily accessible in one of two ways:
+In Platform, samplesheets and other data can be made easily accessible in one of two ways:
- Use **Data Explorer** to browse and interact with remote data from AWS S3, Azure Blob Storage, and Google Cloud Storage repositories, directly in your organization workspace.
- Use **Datasets** to upload structured data to your workspace in CSV (Comma-Separated Values) or TSV (Tab-Separated Values) format.
- **Add a cloud bucket via Data Explorer**
+ **Add a cloud bucket via Data Explorer**
Private cloud storage buckets accessible with the credentials in your workspace are added to Data Explorer automatically by default. However, you can also add custom directory paths within buckets to your workspace to simplify direct access.
@@ -138,7 +138,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of

- 1. From the **Data Explorer** tab, select **Add cloud bucket**.
+ 1. From the **Data Explorer** tab, select **Add cloud bucket**.
1. Specify the bucket details:
- The cloud **Provider**: AWS
- An existing cloud **Bucket path**: `s3://proteinfold-dataset`
@@ -147,7 +147,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- An optional bucket **Description**.
1. Select **Add**.
- You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
+ You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
@@ -165,7 +165,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- Select the **First row as header** option to prevent Platform from parsing the header row of the samplesheet as sample data.
- Select **Upload file** and browse to your CSV or TSV samplesheet file in local storage, or simply drag and drop it into the box.
- The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
+ The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
:::info
Platform does not store the data used for analysis in pipelines. The dataset must specify the locations of data stored on your own infrastructure.
@@ -176,21 +176,21 @@ In Platform, samplesheets and other data can be made easily accessible in one of
## Launch pipeline
:::note
-This guide is based on [version 1.1.1](https://nf-co.re/proteinfold/1.1.1) of the nf-core/proteinfold pipeline. Launch form parameters and tools may differ in other versions.
+This guide is based on [version 1.1.1](https://nf-co.re/proteinfold/1.1.1) of the nf-core/proteinfold pipeline. Launch form parameters and tools may differ in other versions.
:::
With your compute environment created, nf-core/proteinfold added to your workspace Launchpad, and your samplesheet accessible in Platform, you are ready to launch your pipeline. Navigate to the Launchpad and select **Launch** next to `nf-core-proteinfold` to open the launch form.
-The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
+The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
-### General config
+### General config
- **Pipeline to launch**: The pipeline Git repository name or URL: `https://github.com/nf-core/proteinfold`. For saved pipelines, this is prefilled and cannot be edited.
- **Revision number**: A valid repository commit ID, tag, or branch name: `1.1.1`. For saved pipelines, this is prefilled and cannot be edited.
-- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository. Benchmarking runs for this guide used nf-core profiles with included test datasets — `test_full_alphafold2_multimer` for Alphafold2 and `test_full_alphafold2_multimer` for Colabfold.
+- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository. Benchmarking runs for this guide used nf-core profiles with included test datasets: `test_full_alphafold2_multimer` for AlphaFold2 and `test_full_colabfold_multimer` for ColabFold.
- **Workflow run name**: An identifier for the run, pre-filled with a random name. This can be customized.
- **Labels**: Assign new or existing [labels](../labels/overview) to the run.
-- **Compute environment**: Your AWS Batch compute environment.
+- **Compute environment**: Your AWS Batch compute environment.
- **Work directory**: The cloud storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified.
:::note
The credentials associated with the compute environment must have access to the work directory.
@@ -198,24 +198,24 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc

-### Run parameters
+### Run parameters
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
-Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
+Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.

-Specify your pipeline input and output and modify other pipeline parameters as needed.
+Specify your pipeline input and output and modify other pipeline parameters as needed.
**input**
- Use **Browse** to select your pipeline input data:
+ Use **Browse** to select your pipeline input data:
 - In the **Data Explorer** tab, select the existing cloud bucket that contains your samplesheet, browse or search for the samplesheet file, and select the chain icon to copy the file path, then close the data selection window and paste the path into the input field.
- In the **Datasets** tab, search for and select your existing dataset.
@@ -224,24 +224,24 @@ Specify your pipeline input and output and modify other pipeline parameters as n
**outdir**
- Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
+ Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
**Browse** and copy cloud storage directory paths using Data Explorer, or enter a path manually.
-- The **mode** menu allows you to select the deep learning model used for structure prediction (`alphafold2`, `colabfold`, or `esmfold`).
-- Enable **use_gpu** to run GPU-compatible tasks on GPUs. This requires **Use Amazon-recommended GPU-optimized ECS AMI** to be enabled and GPU-enabled instances to be specified under **Instance types** in your compute environment.
+- The **mode** menu allows you to select the deep learning model used for structure prediction (`alphafold2`, `colabfold`, or `esmfold`).
+- Enable **use_gpu** to run GPU-compatible tasks on GPUs. This requires **Use Amazon-recommended GPU-optimized ECS AMI** to be enabled and GPU-enabled instances to be specified under **Instance types** in your compute environment. A params file sketch of these settings is included after the note below.

:::info
For the purposes of this guide, run the pipeline in both `alphafold2` and `colabfold` modes. Specify unique directory paths for the `outdir` parameter (such as "Alphafold2" and "ColabFold") to ensure output data is kept separate and not overwritten. Predicted protein structures for each model will be visualized side-by-side in the [Interactive analysis](#interactive-analysis-with-studios) section.
-:::
+:::
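Expressed as a params file (either edited in the **Params file view** or uploaded as YAML), the settings above might look like the following hypothetical sketch for the `alphafold2` run. The paths are placeholders; for the `colabfold` run, change `mode` and point `outdir` to a separate directory.

```yaml
# Hypothetical params for the alphafold2 run; paths are placeholders.
input: "s3://your-bucket/proteinfold/samplesheet.csv"
outdir: "s3://your-bucket/proteinfold/results/alphafold2"
mode: "alphafold2"
use_gpu: true
```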
-### Advanced settings
+### Advanced settings
-- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
+- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
- [Pipeline secrets](../secrets/overview) store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution. See the sketch after this list for how a process can consume a named secret.
- See [Advanced options](../launch/advanced) for more details.
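To illustrate how a stored secret is consumed at runtime, the following is a hypothetical sketch of a Nextflow process that declares a secret by name and reads it as an environment variable inside the task script. The secret name, process name, and URL are placeholders, not part of the pipeline used in this guide.

```groovy
// Hypothetical example: assumes a secret named MY_API_TOKEN has been stored
// as a user or workspace secret in Platform.
process QUERY_EXTERNAL_SERVICE {
    secret 'MY_API_TOKEN'

    script:
    """
    curl -H "Authorization: Bearer \$MY_API_TOKEN" https://api.example.com/data
    """
}
```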
@@ -272,7 +272,7 @@ After you have filled the necessary launch details, select **Launch**. The **Run
The paths to report files point to a location in cloud storage (in the `outdir` directory specified during launch), but you can view the contents directly and download each file without navigating to the cloud or a remote filesystem.
:::info
- See [Reports](../reports/overview) for more information.
+ See [Reports](../reports/overview) for more information.
:::
#### View general information
@@ -296,9 +296,9 @@ After you have filled the necessary launch details, select **Launch**. The **Run
Select a task in the task table to open the **Task details** dialog. The dialog has three tabs:
- - The **About** tab contains extensive task execution details.
+ - The **About** tab contains extensive task execution details.
- The **Execution log** tab provides a real-time log of the selected task's execution. Task execution and other logs (such as stdout and stderr) are available for download from here, if still available in your compute environment.
- - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
+ - The **Data Explorer** tab allows you to view the task working directory directly in Platform.

@@ -310,29 +310,29 @@ After you have filled the necessary launch details, select **Launch**. The **Run
[Studios](../data_studios/overview.md) streamlines the process of creating interactive analysis environments for Platform users. With built-in templates for platforms like Jupyter Notebook, RStudio, and VS Code, creating a data studio is as simple as adding and sharing pipelines or datasets. The data studio URL can also be shared with any user with the [Connect role](../orgs-and-teams/roles.md) for real-time access and collaboration.
-For the purposes of this guide, a Jupyter Notebook environment will be used for interactive visualization of the predicted protein structures, optionally comparing AlphaFold2 and Colabfold structures for the same sequence data.
+For the purposes of this guide, a Jupyter Notebook environment will be used for interactive visualization of the predicted protein structures, optionally comparing AlphaFold2 and ColabFold structures for the same sequence data.
### Create a Jupyter notebook data studio
From the **Data Studios** tab, select **Add a data studio** and complete the following:
- In the **Compute & Data** tab:
- - Select your AWS Batch compute environment.
+ - Select your AWS Batch compute environment.
:::info
- The same compute environment can be used for pipeline execution and running your Data Studios notebook environment, but Data Studios does not support AWS Fargate and data studio sessions must run on CPUs. To use one compute environment for both nf-core/proteinfold execution and your data studio, leave **Enable Fargate for head job** disabled and include at least one CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
+ The same compute environment can be used for pipeline execution and running your Data Studios notebook environment, but Data Studios does not support AWS Fargate and data studio sessions must run on CPUs. To use one compute environment for both nf-core/proteinfold execution and your data studio, leave **Enable Fargate for head job** disabled and include at least one CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
Alternatively, create a second basic AWS Batch compute environment with at least 2 CPUs and 8192 MB of RAM for your data studio.
:::
- Optional: Enter CPU and memory allocations. The default values are 2 CPUs and 8192 MB memory (RAM).
:::note
- Data studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions.
+ Data studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions.
:::
- - Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your Proteinfold run.
+ - Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your Proteinfold run.
- In the **General config** tab:
- Select the latest **Jupyter** container image template from the list.
- - Optional: Enter a unique name and description for the data studio.
+ - Optional: Enter a unique name and description for the data studio.
- Check **Install Conda packages** and paste the following Conda environment YAML snippet:
- ```yaml
+ ```yaml
channels:
- bioconda
- conda-forge
@@ -345,7 +345,7 @@ From the **Data Studios** tab, select **Add a data studio** and complete the fol
- Confirm the data studio details in the **Summary** tab
- Select **Add** and choose whether to add and start the studio immediately.
-- When the data studio is created and in a running state, **Connect** to it.
+- When the data studio is created and in a running state, **Connect** to it.

@@ -355,7 +355,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Import libraries and check versions:
- ```python
+ ```python
import sys
import jupyter_core
import nglview
@@ -373,7 +373,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Define visualization functions:
- ```python
+ ```python
import os
import ipywidgets as widgets
from IPython.display import display, HTML
@@ -383,14 +383,14 @@ The Jupyter environment can be configured with the packages and scripts you need
view.add_representation('cartoon', selection='protein', color='residueindex')
view.add_representation('ball+stick', selection='hetero')
view._remote_call('setSize', target='Widget', args=[width, height])
-
+
# Set initial view
view._remote_call('autoView')
view._remote_call('centerView')
-
+
# Adjust zoom level (you may need to adjust this value)
view._remote_call('zoom', target='stage', args=[0.8])
-
+
return view
def compare_proteins(pdb_files):
@@ -407,7 +407,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Set up file paths and create file dictionary:
- ```python
+ ```python
# Replace with the actual paths to your AlphaFold2 and ColabFold PDB files
alphafold_pdb = "data/path/to/your/alphafold/output.pdb"
colabfold_pdb = "data/path/to/your/colabfold/output.pdb"
@@ -454,9 +454,9 @@ The Jupyter environment can be configured with the packages and scripts you need
description='Select method:',
disabled=False,
)
-
+
info_output = widgets.Output()
-
+
def on_change(change):
with info_output:
info_output.clear_output()
@@ -465,9 +465,9 @@ The Jupyter environment can be configured with the packages and scripts you need
print(f"Selected method: {selected_method}")
print(f"File path: {selected_file}")
print(f"File size: {os.path.getsize(selected_file) / 1024:.2f} KB")
-
+
method_dropdown.observe(on_change, names='value')
-
+
display(HTML("Structure Information:
"))
display(widgets.VBox([method_dropdown, info_output]))
```
@@ -491,4 +491,3 @@ The Jupyter environment can be configured with the packages and scripts you need
```

-
diff --git a/platform-enterprise_versioned_docs/version-24.2/getting-started/rnaseq.mdx b/platform-enterprise_versioned_docs/version-24.2/getting-started/rnaseq.mdx
index 0ac3d9841..e6ba8371b 100644
--- a/platform-enterprise_versioned_docs/version-24.2/getting-started/rnaseq.mdx
+++ b/platform-enterprise_versioned_docs/version-24.2/getting-started/rnaseq.mdx
@@ -9,10 +9,10 @@ toc_max_heading_level: 2
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-This guide details how to run bulk RNA sequencing (RNA-Seq) data analysis, from quality control to differential expression analysis, on an AWS Batch compute environment in Platform. It includes:
+This guide details how to run bulk RNA sequencing (RNA-Seq) data analysis, from quality control to differential expression analysis, on an AWS Batch compute environment in Platform. It includes:
- Creating an AWS Batch compute environment to run your pipeline and analysis environment
-- Adding pipelines to your workspace
+- Adding pipelines to your workspace
- Importing your pipeline input data
- Launching the pipeline and monitoring execution from your workspace
- Setting up a custom analysis environment with Data Studios
@@ -23,17 +23,17 @@ You will need the following to get started:
- [Admin](../orgs-and-teams/roles) permissions in an existing organization workspace. See [Set up your workspace](./workspace-setup) to create an organization and workspace from scratch.
- An existing AWS cloud account with access to the AWS Batch service.
-- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
+- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
:::
## Compute environment
-Compute and storage requirements for RNA-Seq analysis are dependent on the number of samples and the sequencing depth of your input data. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for details on RNA-Seq datasets and the CPU and memory requirements for important steps of RNA-Seq pipelines.
+Compute and storage requirements for RNA-Seq analysis are dependent on the number of samples and the sequencing depth of your input data. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for details on RNA-Seq datasets and the CPU and memory requirements for important steps of RNA-Seq pipelines.
-In this guide, you will create an AWS Batch compute environment with sufficient resources allocated to run the [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline with a large dataset. This compute environment will also be used to run a Data Studios RStudio environment for interactive analysis of the resulting pipeline data.
+In this guide, you will create an AWS Batch compute environment with sufficient resources allocated to run the [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline with a large dataset. This compute environment will also be used to run a Data Studios RStudio environment for interactive analysis of the resulting pipeline data.
:::note
-The compute recommendations below are based on internal benchmarking performed by Seqera. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for more information.
+The compute recommendations below are based on internal benchmarking performed by Seqera. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for more information.
:::
### Recommended compute environment resources
@@ -48,14 +48,14 @@ The following compute resources are recommended for production RNA-Seq pipelines
| **Max CPUs** | >500 |
| **Min CPUs** | 0 |
-#### Fusion file system
+#### Fusion file system
The [Fusion](../supported_software/fusion/overview) file system enables seamless read and write operations to cloud object stores, leading to
simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run nf-core/rnaseq, it is recommended for optimal performance. See [nf-core/rnaseq performance in Platform](#nf-corernaseq-performance-in-platform) at the end of this guide.
-Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage (see Create compute environment below), you must select EC2 instances with NVMe SSD storage (`m5d` or `r5d` families).
+Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage (see Create compute environment below), you must select EC2 instances with NVMe SSD storage (`m5d` or `r5d` families).
-:::note
+:::note
Fusion requires a license for use in Seqera Platform compute environments or directly in Nextflow. Fusion can be trialed at no cost. [Contact Seqera](https://seqera.io/contact-us/) for more details.
:::
@@ -85,7 +85,7 @@ From the **Compute Environments** tab in your organization workspace, select **A
| **Resource labels** | `name=value` pairs to tag the AWS resources created by this compute environment.|
-## Add pipeline to Platform
+## Add pipeline to Platform
:::info
The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline is a highly configurable and robust workflow designed to analyze RNA-Seq data. It performs quality control, alignment and quantification.
@@ -100,28 +100,28 @@ To use Seqera Pipelines to import the `nf-core/rnaseq` pipeline to your workspac

1. Search for `nf-core/rnaseq` and select **Launch** next to the pipeline name in the list. In the **Add pipeline** tab, select **Cloud** or **Enterprise** depending on your Platform account type, then provide the information needed for Seqera Pipelines to access your Platform instance:
- - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
+ - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
- **Seqera Enterprise**: Specify the **Seqera Platform URL** (hostname) and **Base API URL** for your Enterprise instance, then paste your Platform **Access token** and select **Next**.
:::tip
If you do not have a Platform access token, select **Get your access token from Seqera Platform** to open the Access tokens page in a new browser tab.
:::
-1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
+1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
1. (Optional) Customize the **Pipeline Name** and **Pipeline Description**.
-1. Select **Add Pipeline**.
+1. Select **Add Pipeline**.
:::info
-To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines#) for manual Launchpad instructions.
+To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines#) for manual Launchpad instructions.
:::
## Pipeline input data
-The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline works with input datasets (samplesheets) containing sample names, FASTQ file locations (paths to FASTQ files in cloud or local storage), and strandedness. For example, the dataset used in the `test_full` profile is derived from the publicly available iGenomes collection of datasets, commonly used in bioinformatics analyses.
+The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline works with input datasets (samplesheets) containing sample names, FASTQ file locations (paths to FASTQ files in cloud or local storage), and strandedness. For example, the dataset used in the `test_full` profile is derived from the publicly available iGenomes collection of datasets, commonly used in bioinformatics analyses.
This dataset represents RNA-Seq samples from various human cell lines (GM12878, K562, MCF7, and H1) with biological replicates, stored in an AWS S3 bucket (`s3://ngi-igenomes`) as part of the iGenomes resource. These RNA-Seq datasets consist of paired-end sequencing reads, which can be used to study gene expression patterns in different cell types.
**nf-core/rnaseq test_full profile dataset**
-
+
| sample | fastq_1 | fastq_2 | strandedness |
|--------|---------|---------|--------------|
| GM12878_REP1 | s3://ngi-igenomes/test-data/rnaseq/SRX1603629_T1_1.fastq.gz | s3://ngi-igenomes/test-data/rnaseq/SRX1603629_T1_2.fastq.gz | reverse |
@@ -135,12 +135,12 @@ This dataset represents RNA-Seq samples from various human cell lines (GM12878,
-In Platform, samplesheets and other data can be made easily accessible in one of two ways:
+In Platform, samplesheets and other data can be made easily accessible in one of two ways:
- Use **Data Explorer** to browse and interact with remote data from AWS S3, Azure Blob Storage, and Google Cloud Storage repositories, directly in your organization workspace.
- Use **Datasets** to upload structured data to your workspace in CSV (Comma-Separated Values) or TSV (Tab-Separated Values) format.
- **Add a cloud bucket via Data Explorer**
+ **Add a cloud bucket via Data Explorer**
Private cloud storage buckets accessible with the credentials in your workspace are added to Data Explorer automatically by default. However, you can also add custom directory paths within buckets to your workspace to simplify direct access.
@@ -148,7 +148,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of

- 1. From the **Data Explorer** tab, select **Add cloud bucket**.
+ 1. From the **Data Explorer** tab, select **Add cloud bucket**.
1. Specify the bucket details:
- The cloud **Provider**.
- An existing cloud **Bucket path**.
@@ -157,7 +157,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- An optional bucket **Description**.
1. Select **Add**.
- You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
+ You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
@@ -175,7 +175,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- Select the **First row as header** option to prevent Platform from parsing the header row of the samplesheet as sample data.
- Select **Upload file** and browse to your CSV or TSV samplesheet file in local storage, or simply drag and drop it into the box.
- The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
+ The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
:::info
Platform does not store the data used for analysis in pipelines. The dataset must specify the locations of data stored on your own infrastructure.
@@ -186,14 +186,14 @@ In Platform, samplesheets and other data can be made easily accessible in one of
## Launch pipeline
:::note
-This guide is based on version 3.15.1 of the nf-core/rnaseq pipeline. Launch form parameters and tools may differ in other versions.
+This guide is based on version 3.15.1 of the nf-core/rnaseq pipeline. Launch form parameters and tools may differ in other versions.
:::
With your compute environment created, nf-core/rnaseq added to your workspace Launchpad, and your samplesheet accessible in Platform, you are ready to launch your pipeline. Navigate to the Launchpad and select **Launch** next to `nf-core-rnaseq` to open the launch form.
-The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
+The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
-### General config
+### General config

@@ -202,30 +202,30 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc
- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository.
- **Workflow run name**: An identifier for the run, pre-filled with a random name. This can be customized.
- **Labels**: Assign new or existing [labels](../labels/overview) to the run.
-- **Compute environment**: Your AWS Batch compute environment.
+- **Compute environment**: Your AWS Batch compute environment.
- **Work directory**: The cloud storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified.
:::note
The credentials associated with the compute environment must have access to the work directory.
:::
-### Run parameters
+### Run parameters

There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
-Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
+Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
-Specify your pipeline input and output and modify other pipeline parameters as needed.
+Specify your pipeline input and output and modify other pipeline parameters as needed.
**input**
- Use **Browse** to select your pipeline input data:
+ Use **Browse** to select your pipeline input data:
 - In the **Data Explorer** tab, select the existing cloud bucket that contains your samplesheet, browse or search for the samplesheet file, and select the chain icon to copy the file path, then close the data selection window and paste the path into the input field.
- In the **Datasets** tab, search for and select your existing dataset.
@@ -234,7 +234,7 @@ Specify your pipeline input and output and modify other pipeline parameters as n
**outdir**
- Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
+ Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
**Browse** and copy cloud storage directory paths using Data Explorer, or enter a path manually.
@@ -244,9 +244,9 @@ Modify other parameters to customize the pipeline execution through the paramete

-### Advanced settings
+### Advanced settings
-- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
+- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
- [Pipeline secrets](../secrets/overview) store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution.
- See [Advanced options](../launch/advanced) for more details.
@@ -283,7 +283,7 @@ After you have filled the necessary launch details, select **Launch**. The **Run
The paths to report files point to a location in cloud storage (in the `outdir` directory specified during launch), but you can view the contents directly and download each file without navigating to the cloud or a remote filesystem.
:::info
- See [Reports](../reports/overview) for more information.
+ See [Reports](../reports/overview) for more information.
:::
#### View general information
@@ -309,9 +309,9 @@ After you have filled the necessary launch details, select **Launch**. The **Run

- - The **About** tab contains extensive task execution details.
+ - The **About** tab contains extensive task execution details.
- The **Execution log** tab provides a real-time log of the selected task's execution. Task execution and other logs (such as stdout and stderr) are available for download from here, if still available in your compute environment.
- - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
+ - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
Nextflow hash-addresses each task of the pipeline and creates unique directories based on these hashes. Data Explorer allows you to view the log files and output files generated for each task in its working directory, directly within Platform. You can view, download, and retrieve the link for these intermediate files in cloud storage from the **Data Explorer** tab to simplify troubleshooting.
@@ -327,9 +327,9 @@ For the purposes of this guide, an RStudio environment will be used to normalize
### Prepare your data
-#### Gene counts
+#### Gene counts
-Salmon is the default tool used during the `pseudo-aligner` step of the nf-core/rnaseq pipeline. In the pipeline output data, the `/salmon` directory contains the tool's output, including a `salmon.merged.gene_counts_length_scaled.tsv` file.
+Salmon is the default tool used during the `pseudo-aligner` step of the nf-core/rnaseq pipeline. In the pipeline output data, the `/salmon` directory contains the tool's output, including a `salmon.merged.gene_counts_length_scaled.tsv` file.
#### Sample info
@@ -371,15 +371,15 @@ The analysis script provided in this section requires a sample information file
From the **Data Studios** tab, select **Add a data studio** and complete the following:
- Select the latest **RStudio** container image template from the list.
-- Select your AWS Batch compute environment.
+- Select your AWS Batch compute environment.
:::note
-Data studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions. The default CPU and memory allocation for a data studio is 2 CPUs and 8192 MB RAM.
+Data studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions. The default CPU and memory allocation for a data studio is 2 CPUs and 8192 MB RAM.
:::
-- Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your RNA-Seq run.
+- Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your RNA-Seq run.
- Optional: Enter CPU and memory allocations. The default values are 2 CPUs and 8192 MB memory (RAM).
- Select **Add**.
- Once the data studio has been created, select the options menu next to it and select **Start**.
-- When the data studio is in a running state, **Connect** to it.
+- When the data studio is in a running state, **Connect** to it.
### Perform the analysis and explore results
@@ -467,7 +467,7 @@ The RStudio environment can be configured with the packages you wish to install
:::info
MDS plots are used to visualize the overall similarity between RNA-Seq samples based on their gene expression profiles, helping to identify sample clusters and potential batch effects.
:::
-
+
```r
# Create MDS plot
# a. Display in RStudio
@@ -523,7 +523,7 @@ The RStudio environment can be configured with the packages you wish to install
names(results) <- colnames(my.contrasts)
```
- :::info
+ :::info
This script is written for the analysis of human data, based on nf-core/rnaseq's `test_full` dataset. To adapt the script for your data, modify the contrasts based on the comparisons you want to make between your sample groups:
```r
@@ -536,7 +536,7 @@ The RStudio environment can be configured with the packages you wish to install
```
:::
-1. Print the number of differentially expressed genes for each comparison and save the results to CSV files:
+1. Print the number of differentially expressed genes for each comparison and save the results to CSV files:
```r
# Print the number of differentially expressed genes for each comparison
@@ -673,20 +673,20 @@ The nf-core/rnaseq pipeline involves several key steps, each with distinct compu
#### Overall run metrics
-**Total pipeline run cost (USD)**:
+**Total pipeline run cost (USD)**:
- Fusion file system with fast instance storage: $34.90
- Plain S3 storage without Fusion: $58.40
**Pipeline runtime**:
-The Fusion file system used with NVMe instance storage contributed to a 34% improvement in total pipeline runtime and a 49% reduction in CPU hours.
+The Fusion file system used with NVMe instance storage contributed to a 34% improvement in total pipeline runtime and a 49% reduction in CPU hours.

#### Process run time
-The Fusion file system demonstrates significant performance improvements for most processes in the nf-core/rnaseq pipeline, particularly for I/O-intensive tasks:
+The Fusion file system demonstrates significant performance improvements for most processes in the nf-core/rnaseq pipeline, particularly for I/O-intensive tasks:
- The most time-consuming processes see improvements of 36.07% to 70.15%, saving hours of runtime in a full pipeline execution.
- Most processes show significant performance improvements with Fusion, with time savings ranging from 35.57% to 99.14%.
@@ -694,7 +694,7 @@ The Fusion file system demonstrates significant performance improvements for mos
- SALMON_INDEX shows a notable 70.15% improvement, reducing runtime from 102.18 minutes to 30.50 minutes.
- STAR_ALIGN_IGENOMES, one of the most time-consuming processes, is 53.82% faster with Fusion, saving nearly an hour of runtime.
-
+
| Process | S3 Runtime (min) | Fusion Runtime (min) | Time Saved (min) | Improvement (%) |
|---------|------------------|----------------------|------------------|-----------------|
@@ -733,9 +733,9 @@ The Fusion file system demonstrates significant performance improvements for mos
This profile consists of Nextflow configuration settings for each process and each resource directive (where applicable): **cpus**, **memory**, and **time**. The optimized setting for a given process and resource directive is based on the maximum use of that resource across all tasks in that process.
- Once optimization is selected, subsequent runs of that pipeline will inherit the optimized configuration profile, indicated by the black lightbulb icon with a checkmark.
+ Once optimization is selected, subsequent runs of that pipeline will inherit the optimized configuration profile, indicated by the black lightbulb icon with a checkmark.
- :::info
+ :::info
Optimization profiles are generated from one run at a time, defaulting to the most recent run, and _not_ an aggregation of previous runs.
:::
@@ -749,4 +749,4 @@ The Fusion file system demonstrates significant performance improvements for mos
| Memory usage | `peakRss` |
| Runtime | `start` and `complete` |
-
\ No newline at end of file
+
diff --git a/platform-enterprise_versioned_docs/version-24.2/launch/launchpad.md b/platform-enterprise_versioned_docs/version-24.2/launch/launchpad.md
index 9ade4dea2..22530a830 100644
--- a/platform-enterprise_versioned_docs/version-24.2/launch/launchpad.md
+++ b/platform-enterprise_versioned_docs/version-24.2/launch/launchpad.md
@@ -86,7 +86,7 @@ The dropdown of available config profiles is populated by inspecting the Nextflo
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text, select attributes from dropdowns, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** dropdown.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** dropdown.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
Seqera uses a `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters. Most pipelines contain at least input and output parameters:
diff --git a/platform-enterprise_versioned_docs/version-25.1/getting-started/proteinfold.mdx b/platform-enterprise_versioned_docs/version-25.1/getting-started/proteinfold.mdx
index e52a3a059..ffca552af 100644
--- a/platform-enterprise_versioned_docs/version-25.1/getting-started/proteinfold.mdx
+++ b/platform-enterprise_versioned_docs/version-25.1/getting-started/proteinfold.mdx
@@ -9,10 +9,10 @@ toc_max_heading_level: 2
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-This guide details how to perform best-practice analysis for protein 3D structure prediction on an AWS Batch compute environment in Platform. It includes:
+This guide details how to perform best-practice analysis for protein 3D structure prediction on an AWS Batch compute environment in Platform. It includes:
- Creating AWS Batch compute environments to run your pipeline and downstream analysis
-- Adding the *nf-core/proteinfold* pipeline to your workspace
+- Adding the *nf-core/proteinfold* pipeline to your workspace
- Importing your pipeline input data
- Launching the pipeline and monitoring execution from your workspace
- Setting up a custom analysis environment with Studios
@@ -22,7 +22,7 @@ You will need the following to get started:
- [Admin](../orgs-and-teams/roles) permissions in an existing organization workspace. See [Set up your workspace](./workspace-setup) to create an organization and workspace from scratch.
- An existing AWS cloud account with access to the AWS Batch service.
-- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
+- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
:::
## Compute environment
@@ -35,24 +35,24 @@ Given the data sizes and computational intensity, production pipelines perform b
The *nf-core/proteinfold* pipeline performs protein folding prediction using one of three deep learning models: AlphaFold2, ColabFold, or ESMFold. The computationally intensive tasks for protein structure prediction perform better on GPUs due to their ability to handle large matrix operations efficiently and perform parallel computations. GPUs can dramatically reduce the time required for protein structure predictions, making it feasible to analyze larger datasets or perform more complex simulations.
-Platform supports the allocation of both CPUs and GPUs in the same compute environment. For example, specify `m6id`, `c6id`, `r6id`, `g5`, `p3` instance families in the **Instance types** field when creating your AWS Batch compute environment. See [Create compute environment](#create-compute-environment) below.
+Platform supports the allocation of both CPUs and GPUs in the same compute environment. For example, specify `m6id`, `c6id`, `r6id`, `g5`, `p3` instance families in the **Instance types** field when creating your AWS Batch compute environment. See [Create compute environment](#create-compute-environment) below.
-When you launch *nf-core/proteinfold* in Platform, enable **use_gpu** to instruct Nextflow to run GPU-compatible pipeline processes on GPU instances. See [Launch pipeline](#launch-pipeline) below.
+When you launch *nf-core/proteinfold* in Platform, enable **use_gpu** to instruct Nextflow to run GPU-compatible pipeline processes on GPU instances. See [Launch pipeline](#launch-pipeline) below.
### Fusion file system
-The [Fusion](../supported_software/fusion/overview) file system enables seamless read and write operations to cloud object stores, leading to simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run nf-core/proteinfold, it significantly enhances I/O-intensive tasks and eliminates the need for intermediate data copies, which is particularly beneficial when working with the large databases used by deep learning models for prediction.
+The [Fusion](../supported_software/fusion/overview) file system enables seamless read and write operations to cloud object stores, leading to simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run nf-core/proteinfold, it significantly enhances I/O-intensive tasks and eliminates the need for intermediate data copies, which is particularly beneficial when working with the large databases used by deep learning models for prediction.
-Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage, you must select EC2 instances with NVMe SSD storage (`g4dn`, `g5`, or `P3` families or greater).
+Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage, you must select EC2 instances with NVMe SSD storage (`g4dn`, `g5`, or `p3` families or newer).
-:::note
+:::note
Fusion requires a license for use in Seqera Platform compute environments or directly in Nextflow. See [Fusion licensing](https://docs.seqera.io/fusion/licensing) for more information.
:::
### Create compute environment
:::info
-The same compute environment can be used for pipeline execution and running your Studios notebook environment, but Studios does not support AWS Fargate. To use this compute environment for both *nf-core/proteinfold* execution and your Studio, leave **Enable Fargate for head job** disabled and include a CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
+The same compute environment can be used for pipeline execution and running your Studios notebook environment, but Studios does not support AWS Fargate. To use this compute environment for both *nf-core/proteinfold* execution and your Studio, leave **Enable Fargate for head job** disabled and include a CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
Alternatively, create a second basic AWS Batch compute environment and a Studio with at least 2 CPUs and 8192 MB of RAM.
:::
@@ -77,7 +77,7 @@ From the **Compute Environments** tab in your organization workspace, select **A
| **Enable Fargate for head job** | Run the Nextflow head job using the Fargate container service to speed up pipeline launch. Requires Fusion v2. Do not enable for Studios compute environments. |
| **Use Amazon-recommended GPU-optimized ECS AMI** | When enabled, Batch Forge specifies the most current AWS-recommended GPU-optimized ECS AMI as the EC2 fleet AMI when creating the compute environment. |
| **Allowed S3 buckets** | Additional S3 buckets or paths to be granted read-write permission for this compute environment. For the purposes of this guide, add `s3://proteinfold-dataset` to grant compute environment access to the DB and params used for prediction by AlphaFold2 and ColabFold. |
-| **Instance types** | Specify the instance types to be used for computation. You must include GPU-enabled instance types (`g4dn`, `g5`) when the Amazon-recommended GPU-optimized ECS AMI is in use. Include CPU-based instance families for Studios compute environments. |
+| **Instance types** | Specify the instance types to be used for computation. You must include GPU-enabled instance types (`g4dn`, `g5`) when the Amazon-recommended GPU-optimized ECS AMI is in use. Include CPU-based instance families for Studios compute environments. |
| **Resource labels** | `name=value` pairs to tag the AWS resources created by this compute environment.|

@@ -97,17 +97,17 @@ To use Seqera Pipelines to import the *nf-core/proteinfold* pipeline to your wor

1. Search for *nf-core/proteinfold* and select **Launch** next to the pipeline name in the list. In the **Add pipeline** tab, select **Cloud** or **Enterprise** depending on your Platform account type, then provide the information needed for Seqera Pipelines to access your Platform instance:
- - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
+ - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
- **Seqera Enterprise**: Specify the **Seqera Platform URL** (hostname) and **Base API URL** for your Enterprise instance, then paste your Platform **Access token** and select **Next**.
:::tip
If you do not have a Platform access token, select **Get your access token from Seqera Platform** to open the Access tokens page in a new browser tab.
:::
-1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
+1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
1. (Optional) Customize the **Pipeline Name** and **Pipeline Description**.
-1. Select **Add Pipeline**.
+1. Select **Add Pipeline**.
:::info
-To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines#) for manual Launchpad instructions.
+To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines) for manual Launchpad instructions.
:::
## Pipeline input data
@@ -116,7 +116,7 @@ The [*nf-core/proteinfold*](https://github.com/nf-core/proteinfold) pipeline wor
**nf-core/proteinfold example samplesheet**
-
+
| sequence | fasta |
| -------- | ----- |
| T1024 | https://raw.githubusercontent.com/nf-core/test-datasets/proteinfold/testdata/sequences/T1024.fasta |
@@ -124,12 +124,12 @@ The [*nf-core/proteinfold*](https://github.com/nf-core/proteinfold) pipeline wor
-In Platform, samplesheets and other data can be made easily accessible in one of two ways:
+In Platform, samplesheets and other data can be made easily accessible in one of two ways:
- Use **Data Explorer** to browse and interact with remote data from AWS S3, Azure Blob Storage, and Google Cloud Storage repositories, directly in your organization workspace.
- Use **Datasets** to upload structured data to your workspace in CSV (Comma-Separated Values) or TSV (Tab-Separated Values) format.
- **Add a cloud bucket via Data Explorer**
+ **Add a cloud bucket via Data Explorer**
Private cloud storage buckets accessible with the credentials in your workspace are added to Data Explorer automatically by default. However, you can also add custom directory paths within buckets to your workspace to simplify direct access.
@@ -137,7 +137,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of

- 1. From the **Data Explorer** tab, select **Add cloud bucket**.
+ 1. From the **Data Explorer** tab, select **Add cloud bucket**.
1. Specify the bucket details:
- The cloud **Provider**: AWS
- An existing cloud **Bucket path**: `s3://proteinfold-dataset`
@@ -146,7 +146,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- An optional bucket **Description**.
1. Select **Add**.
- You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
+ You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
@@ -164,7 +164,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- Select the **First row as header** option to prevent Platform from parsing the header row of the samplesheet as sample data.
- Select **Upload file** and browse to your CSV or TSV samplesheet file in local storage, or simply drag and drop it into the box.
- The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
+ The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
:::info
Platform does not store the data used for analysis in pipelines. The dataset must specify the locations of data stored on your own infrastructure.
@@ -175,21 +175,21 @@ In Platform, samplesheets and other data can be made easily accessible in one of
## Launch pipeline
:::note
-This guide is based on [version 1.1.1](https://nf-co.re/proteinfold/1.1.1) of the *nf-core/proteinfold* pipeline. Launch form parameters and tools may differ in other versions.
+This guide is based on [version 1.1.1](https://nf-co.re/proteinfold/1.1.1) of the *nf-core/proteinfold* pipeline. Launch form parameters and tools may differ in other versions.
:::
With your compute environment created, *nf-core/proteinfold* added to your workspace Launchpad, and your samplesheet accessible in Platform, you are ready to launch your pipeline. Navigate to the Launchpad and select **Launch** next to *nf-core-proteinfold* to open the launch form.
-The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
+The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
-### General config
+### General config
- **Pipeline to launch**: The pipeline Git repository name or URL: `https://github.com/nf-core/proteinfold`. For saved pipelines, this is prefilled and cannot be edited.
- **Revision number**: A valid repository commit ID, tag, or branch name: `1.1.1`. For saved pipelines, this is prefilled and cannot be edited.
-- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository. Benchmarking runs for this guide used nf-core profiles with included test datasets — `test_full_alphafold2_multimer` for Alphafold2 and `test_full_alphafold2_multimer` for Colabfold.
+- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository. Benchmarking runs for this guide used nf-core profiles with included test datasets — `test_full_alphafold2_multimer` for AlphaFold2 and `test_full_alphafold2_multimer` for ColabFold.
- **Workflow run name**: An identifier for the run, pre-filled with a random name. This can be customized.
- **Labels**: Assign new or existing [labels](../labels/overview) to the run.
-- **Compute environment**: Your AWS Batch compute environment.
+- **Compute environment**: Your AWS Batch compute environment.
- **Work directory**: The cloud storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified.
:::note
The credentials associated with the compute environment must have access to the work directory.
@@ -197,24 +197,24 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc

-### Run parameters
+### Run parameters
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
-Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
+Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.

-Specify your pipeline input and output and modify other pipeline parameters as needed.
+Specify your pipeline input and output and modify other pipeline parameters as needed.
**input**
- Use **Browse** to select your pipeline input data:
+ Use **Browse** to select your pipeline input data:
- In the **Data Explorer** tab, select the existing cloud bucket that contains your samplesheet, browse or search for the samplesheet file, and select the chain icon to copy the file path before closing the data selection window and pasting the file path in the input field.
- In the **Datasets** tab, search for and select your existing dataset.
@@ -223,24 +223,24 @@ Specify your pipeline input and output and modify other pipeline parameters as n
**outdir**
- Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
+ Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
**Browse** and copy cloud storage directory paths using Data Explorer, or enter a path manually.
-- The **mode** menu allows you to select the deep learning model used for structure prediction (`alphafold2`, `colabfold`, or `esmfold`).
-- Enable **use_gpu** to run GPU-compatible tasks on GPUs. This requires **Use Amazon-recommended GPU-optimized ECS AMI** to be enabled and GPU-enabled instances to be specified under **Instance types** in your compute environment.
+- The **mode** menu allows you to select the deep learning model used for structure prediction (`alphafold2`, `colabfold`, or `esmfold`).
+- Enable **use_gpu** to run GPU-compatible tasks on GPUs. This requires **Use Amazon-recommended GPU-optimized ECS AMI** to be enabled and GPU-enabled instances to be specified under **Instance types** in your compute environment.

:::info
For the purposes of this guide, run the pipeline in both `alphafold2` and `colabfold` modes. Specify unique directory paths for the `outdir` parameter (such as "Alphafold2" and "ColabFold") to ensure output data is kept separate and not overwritten. Predicted protein structures for each model will be visualized side-by-side in the [Interactive analysis](#interactive-analysis-with-studios) section.
-:::
+:::
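For reference, the parameters above entered through the **Params file view** (or supplied via **Upload params file**) might look like the following YAML sketch. The bucket paths are placeholders for your own samplesheet and output locations, and only the parameters discussed in this guide are shown.

```yaml
# Illustrative nf-core/proteinfold params file (placeholder paths)
input: "s3://your-bucket/proteinfold/samplesheet.csv"     # samplesheet with sequence and fasta columns
outdir: "s3://your-bucket/proteinfold/results/alphafold2" # unique output directory per run
mode: "alphafold2"                                        # or "colabfold" / "esmfold"
use_gpu: true                                             # requires GPU-enabled instance types in your compute environment
```

Select JSON from the **View as** list to edit the same parameters in JSON format.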
-### Advanced settings
+### Advanced settings
-- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
+- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
- [Pipeline secrets](../secrets/overview) store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution.
- See [Advanced options](../launch/advanced) for more details.
@@ -271,7 +271,7 @@ After you have filled the necessary launch details, select **Launch**. The **Run
The paths to report files point to a location in cloud storage (in the `outdir` directory specified during launch), but you can view the contents directly and download each file without navigating to the cloud or a remote filesystem.
:::info
- See [Reports](../reports/overview) for more information.
+ See [Reports](../reports/overview) for more information.
:::
#### View general information
@@ -295,9 +295,9 @@ After you have filled the necessary launch details, select **Launch**. The **Run
Select a task in the task table to open the **Task details** dialog. The dialog has three tabs:
- - The **About** tab contains extensive task execution details.
+ - The **About** tab contains extensive task execution details.
- The **Execution log** tab provides a real-time log of the selected task's execution. Task execution and other logs (such as stdout and stderr) are available for download from here, if still available in your compute environment.
- - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
+ - The **Data Explorer** tab allows you to view the task working directory directly in Platform.

@@ -309,29 +309,29 @@ After you have filled the necessary launch details, select **Launch**. The **Run
[Studios](../studios/overview) streamlines the process of creating interactive analysis environments for Platform users. With built-in templates for platforms like Jupyter Notebook, RStudio, and VS Code, creating a Studio is as simple as adding and sharing pipelines or datasets. The Studio URL can also be shared with any user with the [Connect role](../orgs-and-teams/roles) for real-time access and collaboration.
-For the purposes of this guide, a Jupyter Notebook environment will be used for interactive visualization of the predicted protein structures, optionally comparing AlphaFold2 and Colabfold structures for the same sequence data.
+For the purposes of this guide, a Jupyter Notebook environment will be used for interactive visualization of the predicted protein structures, optionally comparing AlphaFold2 and ColabFold structures for the same sequence data.
### Create a Jupyter Notebook Studio
From the **Studios** tab, select **Add a Studio** and complete the following:
- In the **Compute & Data** tab:
- - Select your AWS Batch compute environment.
+ - Select your AWS Batch compute environment.
:::info
- The same compute environment can be used for pipeline execution and running your Studios notebook environment, but Studios does not support AWS Fargate and sessions must run on CPUs. To use one compute environment for both *nf-core/proteinfold* execution and your Studio, leave **Enable Fargate for head job** disabled and include at least one CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
+ The same compute environment can be used for pipeline execution and running your Studios notebook environment, but Studios does not support AWS Fargate and sessions must run on CPUs. To use one compute environment for both *nf-core/proteinfold* execution and your Studio, leave **Enable Fargate for head job** disabled and include at least one CPU-based EC2 instance family (`c6id`, `r6id`, etc.) in your **Instance types**.
Alternatively, create a second basic AWS Batch compute environment with at least 2 CPUs and 8192 MB of RAM for your data studio.
:::
- Optional: Enter CPU and memory allocations. The default values are 2 CPUs and 8192 MB memory (RAM).
:::note
- Studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and data studio sessions.
+ Studios compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and Studio sessions.
:::
- - Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your Proteinfold run.
+ - Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your Proteinfold run.
- In the **General config** tab:
- Select the latest **Jupyter** container image template from the list.
- - Optional: Enter a unique name and description for the data studio.
+ - Optional: Enter a unique name and description for the Studio.
- Check **Install Conda packages** and paste the following Conda environment YAML snippet:
- ```yaml
+ ```yaml
channels:
- bioconda
- conda-forge
@@ -344,7 +344,7 @@ From the **Studios** tab, select **Add a Studio** and complete the following:
- Confirm the Studio details in the **Summary** tab
- Select **Add** and choose whether to add and start the Studio immediately.
-- When the Studio is created and in a running state, **Connect** to it.
+- When the Studio is created and in a running state, **Connect** to it.

@@ -354,7 +354,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Import libraries and check versions:
- ```python
+ ```python
import sys
import jupyter_core
import nglview
@@ -372,7 +372,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Define visualization functions:
- ```python
+ ```python
import os
import ipywidgets as widgets
from IPython.display import display, HTML
@@ -382,14 +382,14 @@ The Jupyter environment can be configured with the packages and scripts you need
view.add_representation('cartoon', selection='protein', color='residueindex')
view.add_representation('ball+stick', selection='hetero')
view._remote_call('setSize', target='Widget', args=[width, height])
-
+
# Set initial view
view._remote_call('autoView')
view._remote_call('centerView')
-
+
# Adjust zoom level (you may need to adjust this value)
view._remote_call('zoom', target='stage', args=[0.8])
-
+
return view
def compare_proteins(pdb_files):
@@ -406,7 +406,7 @@ The Jupyter environment can be configured with the packages and scripts you need
1. Set up file paths and create file dictionary:
- ```python
+ ```python
# Replace with the actual paths to your AlphaFold2 and ColabFold PDB files
alphafold_pdb = "data/path/to/your/alphafold/output.pdb"
colabfold_pdb = "data/path/to/your/colabfold/output.pdb"
@@ -453,9 +453,9 @@ The Jupyter environment can be configured with the packages and scripts you need
description='Select method:',
disabled=False,
)
-
+
info_output = widgets.Output()
-
+
def on_change(change):
with info_output:
info_output.clear_output()
@@ -464,9 +464,9 @@ The Jupyter environment can be configured with the packages and scripts you need
print(f"Selected method: {selected_method}")
print(f"File path: {selected_file}")
print(f"File size: {os.path.getsize(selected_file) / 1024:.2f} KB")
-
+
method_dropdown.observe(on_change, names='value')
-
+
display(HTML("Structure Information:
"))
display(widgets.VBox([method_dropdown, info_output]))
```
@@ -490,4 +490,3 @@ The Jupyter environment can be configured with the packages and scripts you need
```

-
diff --git a/platform-enterprise_versioned_docs/version-25.1/getting-started/quickstart-demo/launch-pipelines.md b/platform-enterprise_versioned_docs/version-25.1/getting-started/quickstart-demo/launch-pipelines.md
index 4ffc3cdae..e97fc8702 100644
--- a/platform-enterprise_versioned_docs/version-25.1/getting-started/quickstart-demo/launch-pipelines.md
+++ b/platform-enterprise_versioned_docs/version-25.1/getting-started/quickstart-demo/launch-pipelines.md
@@ -63,7 +63,7 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
Specify your pipeline input and output and modify other pipeline parameters as needed:
diff --git a/platform-enterprise_versioned_docs/version-25.1/getting-started/rnaseq.mdx b/platform-enterprise_versioned_docs/version-25.1/getting-started/rnaseq.mdx
index f5feab59b..778c47670 100644
--- a/platform-enterprise_versioned_docs/version-25.1/getting-started/rnaseq.mdx
+++ b/platform-enterprise_versioned_docs/version-25.1/getting-started/rnaseq.mdx
@@ -9,10 +9,10 @@ toc_max_heading_level: 2
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
-This guide details how to run bulk RNA sequencing (RNA-Seq) data analysis, from quality control to differential expression analysis, on an AWS Batch compute environment in Platform. It includes:
+This guide details how to run bulk RNA sequencing (RNA-Seq) data analysis, from quality control to differential expression analysis, on an AWS Batch compute environment in Platform. It includes:
- Creating an AWS Batch compute environment to run your pipeline and analysis environment
-- Adding pipelines to your workspace
+- Adding pipelines to your workspace
- Importing your pipeline input data
- Launching the pipeline and monitoring execution from your workspace
- Setting up a custom analysis environment with Studios
@@ -23,17 +23,17 @@ You will need the following to get started:
- [Admin](../orgs-and-teams/roles) permissions in an existing organization workspace. See [Set up your workspace](./workspace-setup) to create an organization and workspace from scratch.
- An existing AWS cloud account with access to the AWS Batch service.
-- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
+- Existing access credentials with permissions to create and manage resources in your AWS account. See [IAM](../compute-envs/aws-batch#iam-user-creation) for guidance to set up IAM permissions for Platform.
:::
## Compute environment
-Compute and storage requirements for RNA-Seq analysis are dependent on the number of samples and the sequencing depth of your input data. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for details on RNA-Seq datasets and the CPU and memory requirements for important steps of RNA-Seq pipelines.
+Compute and storage requirements for RNA-Seq analysis are dependent on the number of samples and the sequencing depth of your input data. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for details on RNA-Seq datasets and the CPU and memory requirements for important steps of RNA-Seq pipelines.
-In this guide, you will create an AWS Batch compute environment with sufficient resources allocated to run the [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline with a large dataset. This compute environment will also be used to run a Studios R-IDE session for interactive analysis of the resulting pipeline data.
+In this guide, you will create an AWS Batch compute environment with sufficient resources allocated to run the [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline with a large dataset. This compute environment will also be used to run a Studios R-IDE session for interactive analysis of the resulting pipeline data.
:::note
-The compute recommendations below are based on internal benchmarking performed by Seqera. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for more information.
+The compute recommendations below are based on internal benchmarking performed by Seqera. See [RNA-Seq data and requirements](#rna-seq-data-and-requirements) for more information.
:::
### Recommended compute environment resources
@@ -48,12 +48,12 @@ The following compute resources are recommended for production RNA-Seq pipelines
| **Max CPUs** | >500 |
| **Min CPUs** | 0 |
-#### Fusion file system
+#### Fusion file system
The [Fusion](../supported_software/fusion/overview) file system enables seamless read and write operations to cloud object stores, leading to
simpler pipeline logic and faster, more efficient execution. While Fusion is not required to run *nf-core/rnaseq*, it is recommended for optimal performance. See [nf-core/rnaseq performance in Platform](#nf-corernaseq-performance-in-platform) at the end of this guide.
-Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage (see Create compute environment below), you must select EC2 instances with NVMe SSD storage (`m5d` or `r5d` families).
+Fusion works best with AWS NVMe instances (fast instance storage) as this delivers the fastest performance when compared to environments using only AWS EBS (Elastic Block Store). Batch Forge selects instances automatically based on your compute environment configuration, but you can optionally specify instance types. To enable fast instance storage (see Create compute environment below), you must select EC2 instances with NVMe SSD storage (`m5d` or `r5d` families).
:::note
Fusion requires a license for use in Seqera Platform compute environments or directly in Nextflow. See [Fusion licensing](https://docs.seqera.io/fusion/licensing) for more information.
@@ -85,7 +85,7 @@ From the **Compute Environments** tab in your organization workspace, select **A
| **Resource labels** | `name=value` pairs to tag the AWS resources created by this compute environment.|
-## Add pipeline to Platform
+## Add pipeline to Platform
:::info
The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline is a highly configurable and robust workflow designed to analyze RNA-Seq data. It performs quality control, alignment and quantification.
@@ -100,28 +100,28 @@ To use Seqera Pipelines to import the *nf-core/rnaseq* pipeline to your workspac

1. Search for *nf-core/rnaseq* and select **Launch** next to the pipeline name in the list. In the **Add pipeline** tab, select **Cloud** or **Enterprise** depending on your Platform account type, then provide the information needed for Seqera Pipelines to access your Platform instance:
- - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
+ - **Seqera Cloud**: Paste your Platform **Access token** and select **Next**.
- **Seqera Enterprise**: Specify the **Seqera Platform URL** (hostname) and **Base API URL** for your Enterprise instance, then paste your Platform **Access token** and select **Next**.
:::tip
If you do not have a Platform access token, select **Get your access token from Seqera Platform** to open the Access tokens page in a new browser tab.
:::
-1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
+1. Select your Platform **Organization**, **Workspace**, and **Compute environment** for the imported pipeline.
1. (Optional) Customize the **Pipeline Name** and **Pipeline Description**.
-1. Select **Add Pipeline**.
+1. Select **Add Pipeline**.
:::info
-To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines#) for manual Launchpad instructions.
+To add a custom pipeline not listed in Seqera Pipelines to your Platform workspace, see [Add pipelines](./quickstart-demo/add-pipelines) for manual Launchpad instructions.
:::
## Pipeline input data
-The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline works with input datasets (samplesheets) containing sample names, FASTQ file locations (paths to FASTQ files in cloud or local storage), and strandedness. For example, the dataset used in the `test_full` profile is derived from the publicly available iGenomes collection of datasets, commonly used in bioinformatics analyses.
+The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline works with input datasets (samplesheets) containing sample names, FASTQ file locations (paths to FASTQ files in cloud or local storage), and strandedness. For example, the dataset used in the `test_full` profile is derived from the publicly available iGenomes collection of datasets, commonly used in bioinformatics analyses.
This dataset represents RNA-Seq samples from various human cell lines (GM12878, K562, MCF7, and H1) with biological replicates, stored in an AWS S3 bucket (`s3://ngi-igenomes`) as part of the iGenomes resource. These RNA-Seq datasets consist of paired-end sequencing reads, which can be used to study gene expression patterns in different cell types.
**nf-core/rnaseq test_full profile dataset**
-
+
| sample | fastq_1 | fastq_2 | strandedness |
|--------|---------|---------|--------------|
| GM12878_REP1 | s3://ngi-igenomes/test-data/rnaseq/SRX1603629_T1_1.fastq.gz | s3://ngi-igenomes/test-data/rnaseq/SRX1603629_T1_2.fastq.gz | reverse |
@@ -135,12 +135,12 @@ This dataset represents RNA-Seq samples from various human cell lines (GM12878,
-In Platform, samplesheets and other data can be made easily accessible in one of two ways:
+In Platform, samplesheets and other data can be made easily accessible in one of two ways:
- Use **Data Explorer** to browse and interact with remote data from AWS S3, Azure Blob Storage, and Google Cloud Storage repositories, directly in your organization workspace.
- Use **Datasets** to upload structured data to your workspace in CSV (Comma-Separated Values) or TSV (Tab-Separated Values) format.
- **Add a cloud bucket via Data Explorer**
+ **Add a cloud bucket via Data Explorer**
Private cloud storage buckets accessible with the credentials in your workspace are added to Data Explorer automatically by default. However, you can also add custom directory paths within buckets to your workspace to simplify direct access.
@@ -148,7 +148,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of

- 1. From the **Data Explorer** tab, select **Add cloud bucket**.
+ 1. From the **Data Explorer** tab, select **Add cloud bucket**.
1. Specify the bucket details:
- The cloud **Provider**.
- An existing cloud **Bucket path**.
@@ -157,7 +157,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- An optional bucket **Description**.
1. Select **Add**.
- You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
+ You can now select data directly from this bucket as input when launching your pipeline, without the need to interact with cloud consoles or CLI tools.
@@ -175,7 +175,7 @@ In Platform, samplesheets and other data can be made easily accessible in one of
- Select the **First row as header** option to prevent Platform from parsing the header row of the samplesheet as sample data.
- Select **Upload file** and browse to your CSV or TSV samplesheet file in local storage, or simply drag and drop it into the box.
- The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
+ The dataset is now listed in your organization workspace datasets and can be selected as input when launching your pipeline.
:::info
Platform does not store the data used for analysis in pipelines. The dataset must specify the locations of data stored on your own infrastructure.
@@ -186,14 +186,14 @@ In Platform, samplesheets and other data can be made easily accessible in one of
## Launch pipeline
:::note
-This guide is based on version 3.15.1 of the *nf-core/rnaseq* pipeline. Launch form parameters and tools may differ in other versions.
+This guide is based on version 3.15.1 of the *nf-core/rnaseq* pipeline. Launch form parameters and tools may differ in other versions.
:::
With your compute environment created, *nf-core/rnaseq* added to your workspace Launchpad, and your samplesheet accessible in Platform, you are ready to launch your pipeline. Navigate to the Launchpad and select **Launch** next to **nf-core-rnaseq** to open the launch form.
-The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
+The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
-### General config
+### General config

@@ -202,30 +202,30 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc
- **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution. Config profiles must be defined in the `nextflow.config` file in the pipeline repository.
- **Workflow run name**: An identifier for the run, pre-filled with a random name. This can be customized.
- **Labels**: Assign new or existing [labels](../labels/overview) to the run.
-- **Compute environment**: Your AWS Batch compute environment.
+- **Compute environment**: Your AWS Batch compute environment.
- **Work directory**: The cloud storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified.
:::note
The credentials associated with the compute environment must have access to the work directory.
:::
-### Run parameters
+### Run parameters

There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
-Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
+Platform uses the `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters.
-Specify your pipeline input and output and modify other pipeline parameters as needed.
+Specify your pipeline input and output and modify other pipeline parameters as needed.
**input**
- Use **Browse** to select your pipeline input data:
+ Use **Browse** to select your pipeline input data:
- In the **Data Explorer** tab, select the existing cloud bucket that contains your samplesheet, browse or search for the samplesheet file, and select the chain icon to copy the file path before closing the data selection window and pasting the file path in the input field.
- In the **Datasets** tab, search for and select your existing dataset.
@@ -234,7 +234,7 @@ Specify your pipeline input and output and modify other pipeline parameters as n
**outdir**
- Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
+ Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
**Browse** and copy cloud storage directory paths using Data Explorer, or enter a path manually.
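As a point of reference, the `input` and `outdir` parameters entered through the **Params file view** (or uploaded as a params file) might look like the following YAML sketch. The paths are placeholders for your own samplesheet and output locations, and the `genome` entry is included only as an optional example of selecting an iGenomes reference.

```yaml
# Illustrative nf-core/rnaseq params file (placeholder paths)
input: "s3://your-bucket/rnaseq/samplesheet.csv"   # samplesheet with sample, fastq_1, fastq_2, and strandedness columns
outdir: "s3://your-bucket/rnaseq/results-run1"     # unique output directory per run
genome: "GRCh38"                                   # optional: iGenomes reference key
```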
@@ -244,9 +244,9 @@ Modify other parameters to customize the pipeline execution through the paramete

-### Advanced settings
+### Advanced settings
-- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
+- Use [resource labels](../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
- [Pipeline secrets](../secrets/overview) store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution.
- See [Advanced options](../launch/advanced) for more details.
@@ -283,7 +283,7 @@ After you have filled the necessary launch details, select **Launch**. The **Run
The paths to report files point to a location in cloud storage (in the `outdir` directory specified during launch), but you can view the contents directly and download each file without navigating to the cloud or a remote filesystem.
:::info
- See [Reports](../reports/overview) for more information.
+ See [Reports](../reports/overview) for more information.
:::
#### View general information
@@ -309,9 +309,9 @@ After you have filled the necessary launch details, select **Launch**. The **Run

- - The **About** tab contains extensive task execution details.
+ - The **About** tab contains extensive task execution details.
- The **Execution log** tab provides a real-time log of the selected task's execution. Task execution and other logs (such as stdout and stderr) are available for download from here, if still available in your compute environment.
- - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
+ - The **Data Explorer** tab allows you to view the task working directory directly in Platform.
Nextflow hash-addresses each task of the pipeline and creates unique directories based on these hashes. Data Explorer allows you to view the log files and output files generated for each task in its working directory, directly within Platform. You can view, download, and retrieve the link for these intermediate files in cloud storage from the **Data Explorer** tab to simplify troubleshooting.
@@ -327,9 +327,9 @@ For the purposes of this guide, an R-IDE will be used to normalize the pipeline
### Prepare your data
-#### Gene counts
+#### Gene counts
-Salmon is the default tool used during the `pseudo-aligner` step of the *nf-core/rnaseq* pipeline. In the pipeline output data, the `/salmon` directory contains the tool's output, including a `salmon.merged.gene_counts_length_scaled.tsv` file.
+Salmon is the default tool used during the `pseudo-aligner` step of the *nf-core/rnaseq* pipeline. In the pipeline output data, the `/salmon` directory contains the tool's output, including a `salmon.merged.gene_counts_length_scaled.tsv` file.
#### Sample info
@@ -371,15 +371,15 @@ The analysis script provided in this section requires a sample information file
From the **Studios** tab, select **Add a studio** and complete the following:
- Select the latest **R-IDE** container image template from the list.
-- Select your AWS Batch compute environment.
+- Select your AWS Batch compute environment.
:::note
-Studio sessions compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and sessions. The default CPU and memory allocation for a Studio is 2 CPUs and 8192 MB RAM.
+Studio sessions compete for computing resources when sharing compute environments. Ensure your compute environment has sufficient resources to run both your pipelines and sessions. The default CPU and memory allocation for a Studio is 2 CPUs and 8192 MB RAM.
:::
-- Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your RNA-Seq run.
+- Mount data using Data Explorer: Mount the S3 bucket or directory path that contains the pipeline work directory of your RNA-Seq run.
- Optional: Enter CPU and memory allocations. The default values are 2 CPUs and 8192 MB memory (RAM).
- Select **Add**.
- Once the Studio has been created, select the options menu next to it and select **Start**.
-- When the Studio is in a running state, **Connect** to it.
+- When the Studio is in a running state, **Connect** to it.
### Perform the analysis and explore results
@@ -467,7 +467,7 @@ The R-IDE can be configured with the packages you wish to install and the R scri
:::info
MDS plots are used to visualize the overall similarity between RNA-Seq samples based on their gene expression profiles, helping to identify sample clusters and potential batch effects.
:::
-
+
```r
# Create MDS plot
# a. Display in RStudio
@@ -523,7 +523,7 @@ The R-IDE can be configured with the packages you wish to install and the R scri
names(results) <- colnames(my.contrasts)
```
- :::info
+ :::info
This script is written for the analysis of human data, based on *nf-core/rnaseq*'s `test_full` dataset. To adapt the script for your data, modify the contrasts based on the comparisons you want to make between your sample groups:
```r
@@ -536,7 +536,7 @@ The R-IDE can be configured with the packages you wish to install and the R scri
```
:::
-1. Print the number of differentially expressed genes for each comparison and save the results to CSV files:
+1. Print the number of differentially expressed genes for each comparison and save the results to CSV files:
```r
# Print the number of differentially expressed genes for each comparison
@@ -673,20 +673,20 @@ The *nf-core/rnaseq* pipeline involves several key steps, each with distinct com
#### Overall run metrics
-**Total pipeline run cost (USD)**:
+**Total pipeline run cost (USD)**:
- Fusion file system with fast instance storage: $34.90
- Plain S3 storage without Fusion: $58.40
**Pipeline runtime**:
-The Fusion file system used with NVMe instance storage contributed to a 34% improvement in total pipeline runtime and a 49% reduction in CPU hours.
+The Fusion file system used with NVMe instance storage contributed to a 34% improvement in total pipeline runtime and a 49% reduction in CPU hours.

#### Process run time
-The Fusion file system demonstrates significant performance improvements for most processes in the *nf-core/rnaseq* pipeline, particularly for I/O-intensive tasks:
+The Fusion file system demonstrates significant performance improvements for most processes in the *nf-core/rnaseq* pipeline, particularly for I/O-intensive tasks:
- The most time-consuming processes see improvements of 36.07% to 70.15%, saving hours of runtime in a full pipeline execution.
- Most processes show significant performance improvements with Fusion, with time savings ranging from 35.57% to 99.14%.
@@ -694,7 +694,7 @@ The Fusion file system demonstrates significant performance improvements for mos
- `SALMON_INDEX` shows a notable 70.15% improvement, reducing runtime from 102.18 minutes to 30.50 minutes.
- `STAR_ALIGN_IGENOMES`, one of the most time-consuming processes, is 53.82% faster with Fusion, saving nearly an hour of runtime.
-
+
| Process | S3 Runtime (min) | Fusion Runtime (min) | Time Saved (min) | Improvement (%) |
|---------|------------------|----------------------|------------------|-----------------|
@@ -733,9 +733,9 @@ The Fusion file system demonstrates significant performance improvements for mos
This profile consists of Nextflow configuration settings for each process and each resource directive (where applicable): **cpus**, **memory**, and **time**. The optimized setting for a given process and resource directive is based on the maximum use of that resource across all tasks in that process.
- Once optimization is selected, subsequent runs of that pipeline will inherit the optimized configuration profile, indicated by the black lightbulb icon with a checkmark.
+ Once optimization is selected, subsequent runs of that pipeline will inherit the optimized configuration profile, indicated by the black lightbulb icon with a checkmark.
- :::info
+ :::info
Optimization profiles are generated from one run at a time, defaulting to the most recent run, and _not_ an aggregation of previous runs.
:::
diff --git a/platform-enterprise_versioned_docs/version-25.1/launch/launchpad.md b/platform-enterprise_versioned_docs/version-25.1/launch/launchpad.md
index 4c0d6a34d..e6636f7ef 100644
--- a/platform-enterprise_versioned_docs/version-25.1/launch/launchpad.md
+++ b/platform-enterprise_versioned_docs/version-25.1/launch/launchpad.md
@@ -86,7 +86,7 @@ The dropdown of available config profiles is populated by inspecting the Nextflo
There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text, select attributes from dropdowns, and browse input and output locations with [Data Explorer](../data/data-explorer).
-- The **Config view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** dropdown.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** dropdown.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
Seqera uses a `nextflow_schema.json` file in the root of the pipeline repository to dynamically create a form with the necessary pipeline parameters. Most pipelines contain at least input and output parameters:
diff --git a/platform-enterprise_versioned_docs/version-25.2/getting-started/quickstart-demo/launch-pipelines.md b/platform-enterprise_versioned_docs/version-25.2/getting-started/quickstart-demo/launch-pipelines.md
index 4ffc3cdae..6c9d6309b 100644
--- a/platform-enterprise_versioned_docs/version-25.2/getting-started/quickstart-demo/launch-pipelines.md
+++ b/platform-enterprise_versioned_docs/version-25.2/getting-started/quickstart-demo/launch-pipelines.md
@@ -23,12 +23,12 @@ The Launchpad in every Platform workspace allows users to easily create and shar
## Launch a pipeline
:::note
-This guide is based on version 3.15.1 of the [nf-core/rnaseq pipeline](https://github.com/nf-core/rnaseq). Launch form parameters and tools will differ for other pipelines.
+This guide is based on version 3.15.1 of the [nf-core/rnaseq pipeline](https://github.com/nf-core/rnaseq). Launch form parameters and tools will differ for other pipelines.
:::
Navigate to the Launchpad and select **Launch** next to your pipeline to open the launch form.
-The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
+The launch form consists of **General config**, **Run parameters**, and **Advanced options** sections to specify your run parameters before execution, and an execution summary. Use section headings or select the **Previous** and **Next** buttons at the bottom of the page to navigate between sections.
Nextflow parameter schema
@@ -36,48 +36,48 @@ The launch form consists of **General config**, **Run parameters**, and **Advanc
The launch form lets you configure the pipeline execution. The pipeline parameters in this form are rendered from a [pipeline schema](../../pipeline-schema/overview) file in the root of the pipeline Git repository. `nextflow_schema.json` is a simple JSON-based schema that describes the pipeline parameters, making it easy for pipeline developers to adapt their in-house Nextflow pipelines for execution in Platform.
:::tip
- See [Best Practices for Deploying Pipelines with the Seqera Platform](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform/) to learn how to build the parameter schema for any Nextflow pipeline automatically with tooling maintained by the nf-core community.
+ See [Best Practices for Deploying Pipelines with the Seqera Platform](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform/) to learn how to build the parameter schema for any Nextflow pipeline automatically with tooling maintained by the nf-core community.
:::
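For orientation, a heavily simplified `nextflow_schema.json` excerpt is sketched below. Real nf-core schemas group parameters into definitions and carry many more attributes; the two properties shown here are placeholders that illustrate how launch form fields map to schema entries.

```json
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "title": "Example pipeline parameters",
    "type": "object",
    "properties": {
        "input": {
            "type": "string",
            "format": "file-path",
            "description": "Path to the samplesheet describing the input samples."
        },
        "outdir": {
            "type": "string",
            "format": "directory-path",
            "description": "Directory where pipeline results are published."
        }
    }
}
```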
-### General config
+### General config

- **Pipeline to launch**: The pipeline Git repository name or URL. For saved pipelines, this is prefilled and cannot be edited.
- **Revision number**: A valid repository commit ID, tag, or branch name. For saved pipelines, this is prefilled and cannot be edited.
-- (*Optional*) **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution.
+- (*Optional*) **Config profiles**: One or more [configuration profile](https://www.nextflow.io/docs/latest/config.html#config-profiles) names to use for the execution.
- **Workflow run name**: An identifier for the run, pre-filled with a random name. This can be customized.
- (*Optional*) **Labels**: Assign new or existing [labels](../../labels/overview) to the run.
-- **Compute environment**: Select an existing workspace [compute environment](../../compute-envs/overview).
+- **Compute environment**: Select an existing workspace [compute environment](../../compute-envs/overview).
- **Work directory**: The (cloud or local) file storage path where pipeline scratch data is stored. Platform will create a scratch sub-folder if only a cloud bucket location is specified.
:::note
The credentials associated with the compute environment must have access to the work directory.
:::
-### Run parameters
+### Run parameters

There are three ways to enter **Run parameters** prior to launch:
- The **Input form view** displays form fields to enter text or select attributes from lists, and browse input and output locations with [Data Explorer](../../data/data-explorer).
-- The **Config view** displays raw configuration text that you can edit directly. Select JSON or YAML format from the **View as** list.
+- The **Params file view** displays a raw schema that you can edit directly. Select JSON or YAML format from the **View as** list.
- **Upload params file** allows you to upload a JSON or YAML file with run parameters.
Specify your pipeline input and output and modify other pipeline parameters as needed:
#### input
-Use **Browse** to select your pipeline input data:
+Use **Browse** to select your pipeline input data:
- In the **Data Explorer** tab, select the cloud bucket that contains your samplesheet, then browse or search for the samplesheet file. Select the chain icon to copy the file path, close the data selection window, and paste the file path into the input field.
- In the **Datasets** tab, search for and select your existing dataset.
#### outdir
-Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
+Use the `outdir` parameter to specify where the pipeline outputs are published. `outdir` must be unique for each pipeline run. Otherwise, your results will be overwritten.
**Browse** and copy cloud storage directory paths using Data Explorer, or enter a path manually.
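As a worked example of these two parameters, the YAML below uses placeholder bucket paths and gives `outdir` a run-specific suffix so that results from different runs are not overwritten. Adjust the paths to locations your compute environment credentials can access.

```yaml
# Placeholder paths: replace with locations in your own cloud storage.
input: s3://example-bucket/samplesheets/samplesheet.csv
# A run-specific subfolder keeps outdir unique per pipeline run.
outdir: s3://example-bucket/results/rnaseq-run-001
```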
@@ -87,10 +87,10 @@ Modify other parameters to customize the pipeline execution through the paramete

-### Advanced settings
+### Advanced settings
-- Use [resource labels](../../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
+- Use [resource labels](../../resource-labels/overview) to tag the computing resources created during the workflow execution. While resource labels for the run are inherited from the compute environment and pipeline, workspace admins can override them from the launch form. Applied resource label names must be unique.
- [Pipeline secrets](../../secrets/overview) store keys and tokens used by workflow tasks to interact with external systems. Enter the names of any stored user or workspace secrets required for the workflow execution.
- See [Advanced options](../../launch/advanced) for more details.
-After you have filled the necessary launch details, select **Launch**. The **Runs** tab shows your new run in a **submitted** status at the top of the list. Select the run name to navigate to the [**View Workflow Run**](../../monitoring/overview) page and view the configuration, parameters, status of individual tasks, and run report.
\ No newline at end of file
+After you have filled in the necessary launch details, select **Launch**. The **Runs** tab shows your new run in a **submitted** status at the top of the list. Select the run name to navigate to the [**View Workflow Run**](../../monitoring/overview) page and view the configuration, parameters, status of individual tasks, and run report.