[query][api] add querying to backend via SDK by asaiacai · Pull Request #52 · Trainy-ai/pluto

asaiacai · 2026-02-18T19:18:20Z

Tested (run the relevant ones):

Code formatting: bash format.sh
Any manual or new tests for this PR (please specify below)

Proposes a pluto.query module wrapping existing server REST endpoints to support programmatic run reads, metric fetching, file downloads, and run resumption for the Neptune-to-Pluto migration use case. https://claude.ai/code/session_01VUDsz1aDmSAeuThR5yJraP

Implements a thin Python client wrapping the existing Pluto server REST API (/api/runs/*) with API-key authentication. This enables the Neptune-to-Pluto migration use case where users need to programmatically read run data (configs, metrics, files) for async evaluation pipelines. API surface: - Client class with explicit auth/host config - list_projects, list_runs, get_run (by numeric or display ID) - get_metrics (returns pandas DataFrame), get_metric_names - get_files, download_file - get_logs, get_statistics, compare_runs, leaderboard - Module-level convenience functions (use PLUTO_API_TOKEN env var) Includes 44 unit tests with mocked HTTP responses. https://claude.ai/code/session_01VUDsz1aDmSAeuThR5yJraP

gemini-code-assist · 2026-02-18T19:18:38Z

Summary of Changes

Hello @asaiacai, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Pluto Python SDK by introducing a dedicated pluto.query module. This new capability provides users with programmatic access to read and query various aspects of Pluto runs, such as metrics, files, and logs, directly from the SDK. The primary motivation is to facilitate customer migrations from Neptune by offering a robust way to fetch existing run data and subsequently append new evaluation results. The implementation leverages existing Pluto server REST API endpoints, minimizing the need for new backend development, and includes a forward-looking proposal to streamline run resumption for both read and write operations.

Highlights

New pluto.query Module: Introduced a new pluto.query module to the Pluto Python SDK, providing programmatic read and query access to runs, metrics, files, and logs. This module wraps existing Pluto server REST API endpoints.
Comprehensive Read Operations: The pluto.query.Client class and module-level functions enable listing projects and runs, retrieving detailed run information, fetching time-series metrics, metric names, file metadata with download URLs, console logs, and computing metric statistics.
Run Resumption for Writing: Proposed an enhancement to pluto.init() with a with_id parameter, allowing users to resume an existing run by its server-assigned ID for appending data. This includes a ReadOnlyRun mode for lightweight read-only access and requires a minor backend change.
Neptune Migration Support: The new API aims to provide read/query parity with Neptune for customer migrations, supporting a workflow where users read run details and then append evaluation results.
Robust HTTP Client: The Client class includes built-in authentication via API tokens, URL resolution logic, and retry mechanisms for transient network or server errors.

Changelog

design/read-query-api.md
- Added a design document outlining the proposed read/query API for Pluto, including context, goals, existing endpoints, proposed API structure, Neptune migration considerations, implementation plan, alternatives, and open questions.
pluto/init.py
- Updated the package's __init__.py to import and expose the new query submodule.
pluto/query.py
- Added a new Python module that implements the Pluto query API client, including the Client class for HTTP requests, methods for various read operations (list projects, list runs, get run, get metrics, get files, get logs, etc.), and helper functions for API token and URL resolution.
tests/test_query.py
- Added new unit tests to validate the functionality of the pluto.query module, covering token and URL resolution, client initialization, all implemented API methods, and error handling with retries.

Activity

No specific activity has been recorded for this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a much-needed read/query API to the Pluto SDK, which is a great step towards feature parity with Neptune and enabling customer migration workflows. However, a high-severity path traversal vulnerability was identified in the download_file method. This vulnerability arises from using untrusted filenames from the server to construct local file paths without sanitization, and it must be addressed by ensuring only the base filename is used for destination paths. While the implementation is well-structured with clear separation between the Client class and convenience functions, and includes a comprehensive test suite and design document, my review also provides suggestions to improve design consistency, code clarity, and resilience.

gemini-code-assist · 2026-02-18T19:25:50Z

pluto/query.py

+
+        dest = Path(destination)
+        if dest.is_dir():
+            dest = dest / file_info.get('fileName', file_name)


The download_file method is vulnerable to path traversal. It uses the fileName returned by the Pluto server (or the user-supplied file_name as a fallback) to construct the destination path without any sanitization. If a malicious or compromised server returns a fileName containing path traversal sequences (e.g., ../../../../etc/passwd), the client will write the downloaded content to an arbitrary location on the user's file system, potentially overwriting sensitive files. To remediate this, sanitize the filename using os.path.basename() to extract only the filename part.

Suggested change

dest = dest / file_info.get('fileName', file_name)

if dest.is_dir():

dest = dest / os.path.basename(file_info.get('fileName', file_name))

gemini-code-assist · 2026-02-18T19:25:50Z

design/read-query-api.md

+
+### Module: `pluto.query`
+
+New module at `pluto/query.py`. All functions are standalone (no global state). Users pass project name and authenticate via the same `PLUTO_API_TOKEN` used for logging.


The statement "All functions are standalone (no global state)" is not entirely accurate. The implementation in pluto/query.py uses a shared, module-level _default_client instance for the convenience functions (e.g., pq.list_runs). This is a form of global state. While this is a common and often acceptable pattern for client libraries, the design document should reflect this to avoid confusion. Consider rephrasing to mention that the module-level functions use a shared client instance for convenience.

gemini-code-assist · 2026-02-18T19:25:51Z

design/read-query-api.md

+client.get_metric_names(
+    project: str,
+    run_id: int | None = None,   # None = all metrics in project
+    search: str | None = None,   # Filter by name substring
+) -> list[str]


The parameter run_id: int | None = None seems inconsistent with the implementation and the corresponding API endpoint. The implementation uses run_ids: Optional[List[int]] = None, which aligns with the GET /api/runs/metric-names endpoint that accepts a runIds query parameter (a comma-separated list). To maintain consistency between the design and implementation, consider updating the signature here to run_ids: list[int] | None = None.

gemini-code-assist · 2026-02-18T19:25:51Z

pluto/query.py

+        """Fetch time-series metric data for a run.
+
+        The server returns up to ``limit`` data points per metric, sampled
+        via reservoir sampling when the full series exceeds the limit.
+
+        When *pandas* is installed the return value is a
+        :class:`~pandas.DataFrame` with columns ``metric``, ``step``,
+        ``value``, ``time``. Otherwise a list of dicts is returned.
+
+        Args:
+            project: Project name.
+            run_id: Numeric server ID.
+            metric_names: Metric names to fetch. ``None`` fetches all.
+            limit: Max data points per metric (max 10 000).
+
+        Returns:
+            ``pandas.DataFrame`` or ``list[dict]``.
+        """


The implementation for fetching multiple metrics makes a separate API call for each metric name. This is an important performance consideration that users should be aware of. Please add a note to the docstring to clarify this behavior.

"""Fetch time-series metric data for a run. The server returns up to ``limit`` data points per metric, sampled via reservoir sampling when the full series exceeds the limit. When *pandas* is installed the return value is a :class:`~pandas.DataFrame` with columns ``metric``, ``step``, ``value``, ``time``. Otherwise a list of dicts is returned. .. note:: When multiple metric names are provided, a separate API request is made for each metric. This may have performance implications if a large number of metrics are requested. Args: project: Project name. run_id: Numeric server ID. metric_names: Metric names to fetch. ``None`` fetches all. limit: Max data points per metric (max 10 000). Returns: ``pandas.DataFrame`` or ``list[dict]``. """

gemini-code-assist · 2026-02-18T19:25:51Z

pluto/query.py

+    ) -> Any:
+        if retry >= _RETRY_MAX:
+            raise QueryError(f'Failed after {retry} retries: {error_info}')
+        wait = min(_RETRY_WAIT_MIN * (2 ** (retry + 1)), _RETRY_WAIT_MAX)


The current retry mechanism uses exponential backoff, which is great. To further improve resilience, especially in scenarios with multiple clients retrying simultaneously, consider adding jitter to the backoff delay. This helps to spread out retry attempts and avoid thundering herd effects. A common approach is to add a small, random amount of time to the delay, for example: import random; ...; wait_with_jitter = wait + random.uniform(0, wait * 0.5).

gemini-code-assist · 2026-02-18T19:25:51Z

pluto/query.py

+            assert __import__('sys').platform == 'darwin'
+            token = keyring.get_password('pluto', 'pluto')


Using assert for control flow is unconventional and can be confusing. It's better to use a standard if statement to check the platform and explicitly raise an exception to trigger the fallback logic. This improves readability.

Suggested change

assert __import__('sys').platform == 'darwin'

token = keyring.get_password('pluto', 'pluto')

if __import__('sys').platform != 'darwin':

raise AssertionError

token = keyring.get_password('pluto', 'pluto')

claude added 2 commits February 17, 2026 23:24

gemini-code-assist bot reviewed Feb 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[query][api] add querying to backend via SDK#52

[query][api] add querying to backend via SDK#52
asaiacai wants to merge 2 commits intomainfrom
claude/neptune-pluto-migration-design-n4bSd

asaiacai commented Feb 18, 2026

Uh oh!

gemini-code-assist bot commented Feb 18, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 18, 2026

Uh oh!

gemini-code-assist bot Feb 18, 2026

Uh oh!

gemini-code-assist bot Feb 18, 2026

Uh oh!

gemini-code-assist bot Feb 18, 2026

Uh oh!

gemini-code-assist bot Feb 18, 2026

Uh oh!

gemini-code-assist bot Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	dest = dest / file_info.get('fileName', file_name)
	if dest.is_dir():
	dest = dest / os.path.basename(file_info.get('fileName', file_name))


		### Module: `pluto.query`

		New module at `pluto/query.py`. All functions are standalone (no global state). Users pass project name and authenticate via the same `PLUTO_API_TOKEN` used for logging.

		assert __import__('sys').platform == 'darwin'
		token = keyring.get_password('pluto', 'pluto')

Conversation

asaiacai commented Feb 18, 2026

Uh oh!

gemini-code-assist bot commented Feb 18, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants