feat: support biglake tables in pandas_gbq.sample #1014

tswast · 2026-01-21T18:34:21Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

gemini-code-assist · 2026-01-21T18:34:40Z

Summary of Changes

Hello @tswast, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the pandas_gbq.sample functionality by integrating support for BigLake tables. It introduces a dedicated module for BigLake-specific operations, refactors the existing sampling mechanisms to accommodate different table types, and updates dependencies to support the new BigLake interactions. The overall aim is to broaden the utility of the sampling function to a wider range of BigQuery-related data sources.

Highlights

BigLake Table Support: Introduced initial support for BigLake tables within the pandas_gbq.sample function, allowing it to differentiate and handle BigLake table identifiers.
New BigLake Utility Module: Added a new module pandas_gbq/core/biglake.py which includes utilities for interacting with BigLake tables, such as a BigLakeTableId dataclass and a get_table_metadata function to fetch table metadata using the Iceberg REST Catalog API.
Refactored Sampling Logic: The core sampling logic in pandas_gbq/core/sample.py has been refactored into two new helper functions: _sample_bq_table for standard BigQuery tables and _sample_biglake_table for BigLake tables. The main sample function now dispatches to these helpers based on the table_id format.
Dependency Update: Added requests as a new dependency to setup.py and pinned its version in testing/constraints-3.9.txt, which is required for making HTTP requests to the BigLake Iceberg REST Catalog.
Internal API Change: The _sample_with_tablesample and _sample_with_limit internal functions now accept a table_id string directly instead of a google.cloud.bigquery.Table object, making them more flexible.
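To illustrate why accepting a table_id string is more flexible, here is a minimal sketch of building a TABLESAMPLE query from a plain dotted ID. The helper name build_tablesample_query and its validation rules are assumptions for illustration, not the PR's actual _sample_with_tablesample implementation.

```python
def build_tablesample_query(table_id: str, percent: float) -> str:
    """Build a BigQuery TABLESAMPLE query from a dotted table ID string.

    Hypothetical helper: the name and checks are illustrative only.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    if "`" in table_id:
        # The ID is wrapped in backticks below, so reject embedded backticks.
        raise ValueError("table_id must not contain backticks")
    # Quote the whole path so project IDs containing '-' are accepted.
    return f"SELECT * FROM `{table_id}` TABLESAMPLE SYSTEM ({percent:g} PERCENT)"
```

Because the function only needs a string, the caller does not have to fetch a google.cloud.bigquery.Table object first.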

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request adds the initial structure to support sampling from BigLake tables in pandas_gbq.sample. It refactors the existing sampling logic into separate functions for BigQuery and BigLake tables and introduces a new biglake.py module to interact with the BigLake REST API.

My review has identified a few critical issues. The core logic for sampling BigLake tables is not yet implemented. The new biglake.py module contains a critical bug in URL construction and lacks proper error handling for API requests. I've also noted some areas for improvement regarding the robustness of table ID parsing and documentation. Please address these points to complete the feature.

gemini-code-assist · 2026-01-21T18:35:46Z

pandas_gbq/core/biglake.py

+    return session.get(
+        f"{_ICEBERG_REST_CATALOG_URI}.{path}",
+        headers={
+            "x-goog-user-project": billing_project_id,
+            "Content-Type": "application/json; charset=utf-8",
+            # TODO(tswast): parameter for this option (or get from catalog metadata?)
+            # /iceberg/{$api_version}/restcatalog/extensions/{name=projects/*/catalogs/*}
+            "X-Iceberg-Access-Delegation": "vended-credentials",
+        },
+    ).json()


The URL for the BigLake REST API is constructed incorrectly. There's an extra . between the base URI and the path; because the path already starts with a /, the resulting URL is malformed and the request will most likely fail with a 404 Not Found error. Additionally, the response from session.get is not checked for errors before it is parsed as JSON, so an HTTP failure can surface as an unhelpful JSONDecodeError. Call response.raise_for_status() first so that 4xx and 5xx responses raise requests.HTTPError instead.

    response = session.get(
        f"{_ICEBERG_REST_CATALOG_URI}{path}",
        headers={
            "x-goog-user-project": billing_project_id,
            "Content-Type": "application/json; charset=utf-8",
            # TODO(tswast): parameter for this option (or get from catalog metadata?)
            # /iceberg/{$api_version}/restcatalog/extensions/{name=projects/*/catalogs/*}
            "X-Iceberg-Access-Delegation": "vended-credentials",
        },
    )
    response.raise_for_status()
    return response.json()
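
As a self-contained illustration of the same pattern, the sketch below joins the base URI and path without a stray separator and checks the status before decoding. The function name get_catalog_json, the base_uri parameter, and the leading-slash check are assumptions for illustration; session is any object with a requests-style get method.

```python
from typing import Any


def get_catalog_json(
    session: Any, base_uri: str, path: str, billing_project_id: str
) -> Any:
    """GET a catalog path and return the decoded JSON body.

    Hypothetical helper: `session` is assumed to behave like requests.Session.
    """
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    # Join without adding any separator; the path supplies the leading slash.
    url = f"{base_uri.rstrip('/')}{path}"
    response = session.get(
        url,
        headers={
            "x-goog-user-project": billing_project_id,
            "Content-Type": "application/json; charset=utf-8",
            "X-Iceberg-Access-Delegation": "vended-credentials",
        },
    )
    # Fail on HTTP errors before trying to decode the body as JSON.
    response.raise_for_status()
    return response.json()
```

Taking the session as a parameter also makes the helper easy to exercise in unit tests with a fake session, without network access.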

gemini-code-assist · 2026-01-21T18:35:46Z

pandas_gbq/core/sample.py

+def _sample_biglake_table(
+    *,
+    table_id: str,
+    credentials: google.oauth2.credentials.Credentials,
+    bqclient: google.cloud.bigquery.Client,
+    target_bytes: int,
+    progress_bar_type: str | None,
+    use_bqstorage_api: bool,
+) -> Optional[pandas.DataFrame]:
+    pass


The function _sample_biglake_table is a stub with a pass statement. The PR title suggests support for BigLake tables is being added, but the core implementation is missing. This function needs to be implemented to fulfill the feature's goal. Additionally, corresponding unit tests should be added to verify its functionality.

gemini-code-assist · 2026-01-21T18:35:46Z

pandas_gbq/core/biglake.py

+    """
+    # https://iceberg.apache.org/spec/#metrics
+    # total-files-size
+    project, catalog, namespace, table = table_id.split(".")


Unpacking table_id.split('.') directly into four variables is unsafe. If table_id doesn't contain exactly three dots, this will raise a ValueError. It's better to validate the number of parts before unpacking to provide a more informative error message and prevent unexpected crashes.

    parts = table_id.split(".")
    if len(parts) != 4:
        raise ValueError(
            "Invalid BigLake table ID. Expected format: "
            "project.catalog.namespace.table"
        )
    project, catalog, namespace, table = parts
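
One way to keep this validation in a single place is a parsing classmethod on the ID type. The sketch below is an assumption about how that could look: the PR does define a BigLakeTableId dataclass, but the fields, the from_string name, and the empty-part check shown here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BigLakeTableId:
    """Four-part BigLake table identifier (field names are assumed)."""

    project: str
    catalog: str
    namespace: str
    table: str

    @classmethod
    def from_string(cls, table_id: str) -> "BigLakeTableId":
        parts = table_id.split(".")
        # Require exactly four non-empty parts before unpacking.
        if len(parts) != 4 or not all(parts):
            raise ValueError(
                f"Invalid BigLake table ID {table_id!r}. "
                "Expected format: project.catalog.namespace.table"
            )
        return cls(*parts)
```

Callers then get the same descriptive ValueError wherever an ID is parsed, instead of a bare unpacking error.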

gemini-code-assist · 2026-01-21T18:35:46Z

pandas_gbq/core/biglake.py

+    """
+    Docstring for get_table_metadata


The docstring for get_table_metadata is a placeholder. Please add a proper docstring explaining the function's purpose, its parameters (table_id, credentials, billing_project_id), and what it returns. This is important for maintainability and usability.

gemini-code-assist · 2026-01-21T18:35:46Z

pandas_gbq/core/sample.py

+    # BigLake tables can't be read directly by the BQ Storage Read API, so make
+    # sure we run a query first.
+    parts = table_id.split(".")
+    if len(parts) == 4:


Relying on len(table_id.split('.')) == 4 to identify a BigLake table is fragile, because the number of dots in a table ID does not uniquely determine its type. For example, a domain-scoped BigQuery project ID such as example.com:my-project contains a dot, so example.com:my-project.dataset.table also splits into four parts and would be misidentified as a BigLake table. Consider a more robust detection mechanism: for example, attempt to parse the ID as a BigLake ID and handle failure, or introduce an explicit parameter to specify the table type.
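
A minimal sketch of the explicit-parameter option follows. The TableType enum and resolve_table_type function are hypothetical names, not part of pandas_gbq; the dot-count heuristic is kept only as a fallback when the caller does not state the type.

```python
import enum
from typing import Optional


class TableType(enum.Enum):
    """Hypothetical enum naming the two table kinds discussed here."""

    BIGQUERY = "bigquery"
    BIGLAKE = "biglake"


def resolve_table_type(
    table_id: str, table_type: Optional[TableType] = None
) -> TableType:
    """Return the table type, preferring the caller's explicit choice."""
    if table_type is not None:
        return table_type
    # Fallback heuristic: treat four dot-separated parts as BigLake.
    if len(table_id.split(".")) == 4:
        return TableType.BIGLAKE
    return TableType.BIGQUERY
```

With this shape, a caller whose BigQuery ID happens to split into four parts can pass table_type=TableType.BIGQUERY to override the heuristic.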

feat: support biglake tables in pandas_gbq.sample

8784664

product-auto-label bot added the labels size: m, api: bigquery, and samples on Jan 21, 2026

gemini-code-assist bot reviewed Jan 21, 2026
