Implement persistent shells for improved performance #960
Draft
GlassOfWhiskey wants to merge 1 commit into master
❌ 2 tests failed.
This commit introduces a persistent shell architecture that
significantly speeds up remote command execution in StreamFlow
by reusing shell sessions instead of creating new processes for
each command. Key features are:
- Core Shell Architecture:
  - Add `Shell` abstract base class in `core/deployment.py`
    defining the interface for persistent shell sessions
  - Implement `BaseShell` in `deployment/shell.py` with command
    execution, output capture, and lifecycle management
  - Create specialized shell implementations for different
    connector types (Base, SSH, Docker, Kubernetes)
- Connector Integration:
  - Add the `get_shell()` method to the `Connector` interface for
    obtaining persistent shell instances
  - Implement shell caching and reuse in `BaseConnector` with
    thread-safe access via locks
  - Update `run()` methods across connectors to automatically
    use persistent shells when possible
  - Add graceful fallback to direct execution if shell
    operations fail
- Connector-Specific Implementations:
  - `SubprocessShell`: Local shell execution with asyncio
    subprocess pipes
  - `SSHShell`: Remote shell over SSH connections using
    `asyncssh`
  - `KubernetesShell`: Shell execution in pods via WebSocket
    streams
  - Docker/Singularity: Shell execution in containers via
    `docker exec` and `singularity exec` wrappers
- Performance Optimizations:
  - Reuse shell sessions across multiple commands to reduce
    connection overhead
  - Implement buffered I/O with configurable buffer sizes
  - Use unique end markers to reliably detect command
    completion
- Remote Path Improvements:
  - Optimize `glob()` implementation in `RemoteStreamFlowPath`
    to reduce command overhead
  - Fix shell quoting issues in `rmtree()` command
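
To illustrate the session reuse and end-marker ideas listed above, here is a minimal sketch of a local persistent shell built on asyncio subprocess pipes. It is not the code in this PR: `PersistentShellSketch`, its method names, and the marker format are hypothetical, and the sketch assumes a POSIX `sh` on the `PATH` and commands that do not read from the shell's stdin.

```python
from __future__ import annotations

import asyncio
import uuid


class PersistentShellSketch:
    """Hypothetical illustration only; not StreamFlow's `BaseShell`."""

    def __init__(self) -> None:
        self._proc: asyncio.subprocess.Process | None = None

    async def start(self) -> None:
        # Spawn one long-lived `sh` process that is reused for every command.
        self._proc = await asyncio.create_subprocess_exec(
            "sh",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )

    async def run(self, command: str) -> tuple[str, int]:
        if self._proc is None or self._proc.stdin is None or self._proc.stdout is None:
            raise RuntimeError("shell not started")
        # A fresh random marker per command, so command output is very
        # unlikely to be mistaken for the end of the command.
        marker = f"__END_{uuid.uuid4().hex}__"
        # After the command, print the marker on its own line together with
        # the command's exit status ($?).
        script = f"{command}\nprintf '\\n%s %s\\n' {marker} $?\n"
        self._proc.stdin.write(script.encode())
        await self._proc.stdin.drain()
        lines: list[str] = []
        while True:
            raw = await self._proc.stdout.readline()
            if not raw:
                raise RuntimeError("shell terminated unexpectedly")
            line = raw.decode().rstrip("\n")
            if line.startswith(marker):
                exit_code = int(line.split()[1])
                # Trailing newlines are dropped in this sketch.
                return "\n".join(lines).rstrip("\n"), exit_code
            lines.append(line)

    async def close(self) -> None:
        if self._proc is not None and self._proc.stdin is not None:
            self._proc.stdin.write(b"exit\n")
            await self._proc.stdin.drain()
            await self._proc.wait()
            self._proc = None


async def main() -> None:
    shell = PersistentShellSketch()
    await shell.start()
    try:
        await shell.run("GREETING=hello")         # state persists in the session
        print(await shell.run('echo "$GREETING"'))
    finally:
        await shell.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Because both commands run in the same `sh` process, the variable set by the first is still visible to the second, which is the property that lets a persistent shell avoid paying process or connection start-up cost per command.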
This commit also resolves a race condition caused by concurrent
updates of file tokens in the `CWLTokenProcessor` class, and
fixes race conditions with stream `close` logic when transferring
files through `tar` streams.
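
The shell caching and graceful fallback described under Connector Integration can be sketched as follows. This is an assumption-laden illustration, not the `BaseConnector` code: `ShellCacheSketch`, `run_with_fallback`, the `factory` and `direct_run` callables, and the choice of `asyncio.Lock` are all hypothetical, since the description only says that access is guarded by locks.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class ShellCacheSketch:
    """Hypothetical per-location shell cache guarded by a lock."""

    def __init__(self, factory: Callable[[str], Awaitable[Any]]) -> None:
        self._factory = factory          # coroutine that opens a new shell
        self._shells: dict[str, Any] = {}
        self._lock = asyncio.Lock()

    async def get_shell(self, location: str) -> Any:
        # The lock ensures concurrent callers for the same location
        # share one shell instead of each opening their own.
        async with self._lock:
            if location not in self._shells:
                self._shells[location] = await self._factory(location)
            return self._shells[location]

    async def discard(self, location: str) -> None:
        # Forget a shell that misbehaved so the next call creates a new one.
        async with self._lock:
            self._shells.pop(location, None)


async def run_with_fallback(
    cache: ShellCacheSketch,
    location: str,
    command: str,
    direct_run: Callable[[str, str], Awaitable[Any]],
) -> Any:
    """Try the cached shell first; fall back to direct execution on failure."""
    try:
        shell = await cache.get_shell(location)
        return await shell.run(command)
    except Exception:
        await cache.discard(location)
        return await direct_run(location, command)
```

A single lock for the whole cache keeps the sketch short; a per-location lock would let shells for different locations be opened concurrently, at the cost of more bookkeeping.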