feat: adds support for image builder api v3#504
Conversation
cd6b0db to
5aa6cb6
Compare
81b41a2 to
112e296
Compare
b9f4ecb to
6896c00
Compare
|
|
||
| def __str__(self) -> str: | ||
| # Shows a simple human readable representation of the Function. Used in error messages. | ||
| function_name: str = ( |
There was a problem hiding this comment.
Don't change this please. This is handling cases of inconsistent misconfigured by user function objects.
When app is deployed it's fully validated and doesn't have any inconsistencies.
| return self._awaitables_factory | ||
|
|
||
| @property | ||
| def function_name(self) -> str: |
There was a problem hiding this comment.
This new getter is most probably not needed because we're already using this information provided by SDK.
I'll figure out how you use this and write an approach that aligns with current architecture.
There was a problem hiding this comment.
It was getting it via the private path
| ) | ||
| from tensorlake.applications.image_builder.client_v3 import ( | ||
| ImageBuilderClientV3, | ||
| ImageBuilderClientV3Options, |
There was a problem hiding this comment.
Why ImageBuilderClient and ImageBuilder are in separate packages? Aren't all of this is an ImageBuilderClient stuff?
| f"⚙️ Preparing deployment for application(s) from {application_file_path}" | ||
| ) | ||
|
|
||
| opts = ImageBuilderClientV3Options.from_env() |
There was a problem hiding this comment.
Please add a type hint to opts.
|
|
||
| if not build_req.images: | ||
| click.secho( | ||
| f"❌ No images found for application '{fn_config.function_name}'. " |
There was a problem hiding this comment.
This is already validated during application validation. No need to do this check again.
| raise click.Abort | ||
|
|
||
| await builder.build(build_req) | ||
| except ( |
There was a problem hiding this comment.
If we're rewriting ImageBuilder then it should provide a clear contract on what exception types it rases in which cases. We also shouldn't use click inside SDK where the new ImageBuilder was moved now as SDK doesn't depend on Click, only CLI.
| except Exception: | ||
| # Error message and summary are already printed by builder.build() | ||
| # Print final error message | ||
| click.secho("\n❌ Image build(s) failed", err=True, fg="red") |
There was a problem hiding this comment.
We swallow the Exception here (not printing it) how we and user are going to debug such errors?
| from typing import AsyncGenerator | ||
| from uuid import uuid4 as uuid | ||
|
|
||
| import click |
There was a problem hiding this comment.
Click is a CLI only dependency, please don't add it into SDK.
| import tempfile | ||
| from datetime import datetime | ||
| from typing import AsyncGenerator | ||
| from uuid import uuid4 as uuid |
There was a problem hiding this comment.
We typically use nanoid (vendored module) but if you want we can use something else.
|
|
||
|
|
||
| class BuildRequest: | ||
| """Represents a request to build an application with multiple container images. |
There was a problem hiding this comment.
Do we need such verbose docstrings? Asking because these are internal non-public components.
There was a problem hiding this comment.
yea i thought these were public
| @@ -0,0 +1,1122 @@ | |||
| """ | |||
There was a problem hiding this comment.
Please reduce verbosity of docstrings and make them concise and clear.
| """ | ||
|
|
||
| try: | ||
| v3_req = ( |
There was a problem hiding this comment.
Please add type hints to all local variables to keep the code base easy to read and enforce discipline and alignment with existing coding practices in the repo.
There was a problem hiding this comment.
anyway to force that with a tool?
| } | ||
| except ImageBuilderClientV3Error as e: | ||
| click.secho(str(e), err=True, fg="red") | ||
| raise |
There was a problem hiding this comment.
All these exceptions need to bubble to CLI and then logged their during click.echo.
ImageBuilder needs to document all exceptions it can raise with enough details to make exception handling by CLI meaningful. If CLI is not going to do different things depending on the exception type then just document "raises Exception on error".
There was a problem hiding this comment.
yea. im thinking we need some kind of verbose mode, otherwise showing full stack traces all the time is annoying
| ] | ||
| _ = await asyncio.gather(*process_log_events_tasks, return_exceptions=True) | ||
|
|
||
| except (asyncio.CancelledError, KeyboardInterrupt, click.Abort): |
There was a problem hiding this comment.
Should we remove except for these exceptions from cli code?
| reporters.values(), log_streams, strict=True | ||
| ) | ||
| ] | ||
| _ = await asyncio.gather(*process_log_events_tasks, return_exceptions=True) |
There was a problem hiding this comment.
return_exceptions=True means that all exceptions raised are put into the returned list which is ignored here.
Is this correct?
| self._handle_build_info_error( | ||
| reporter, | ||
| summary, | ||
| f"Error getting final build info for {reporter.image_build_id}: {e}", |
There was a problem hiding this comment.
Do we need to write all these separate except blocks to add these "Error getting final build info" lines while a full backtrace and exception is available to us?
There was a problem hiding this comment.
just different error messages. happy to learn
| Args: | ||
| summary: Dictionary with counts for total, succeeded, failed, canceled, and unknown. | ||
| """ | ||
| total = summary["total"] |
There was a problem hiding this comment.
Please use a @DataClass or anything similar instead of untyped dicts.
Also please don't print anything except one success message for all image builds if everything succeeded.
We need to reduce cognitive load on users, not increase it by printing more build stats than they need to see.
I think all these print methods can be removed unless you have a clear reason why to keep them.
Also we need to output a single final success/failure message for all applications. Thus this printing needs to be done in CLI.
| """ | ||
| if not image_info.functions: | ||
| raise ValueError("image_info.functions cannot be empty") | ||
| self.image_info = image_info |
There was a problem hiding this comment.
If the whole purpose of this object is to wrap ImageInformation and provide a method that transforms it, why don't just replace it with private function def _v3_request(image_info) ? This would remove > 100 lines of docstrings and not useful code. Also ImageBuildRequest and BuildRequest are hard to understand entities. It sounds like the same things. If we keep ImageBuildRequest class then it needs to be clear what it is just from its name.
| may consist of multiple container images. Each image can contain different | ||
| sets of functions and their dependencies. | ||
|
|
||
| Tensorlake applications can use multiple images to separate functions with |
There was a problem hiding this comment.
We don't need to explain Tensorlake Applications UX here as this class if for internal use and we have good public docs that are responsible for explaining all of this.
| The v3 image build request. | ||
| """ | ||
| image = self.image_info.image | ||
| function_names = [func.function_name for func in self.image_info.functions] |
There was a problem hiding this comment.
Please use func._function_config.function_name. It's okay because this code is part of SDK.
This is consistent with how function name is obtained currently all across SDK for properly initialized function objects. Also .function_name public property shouldn't be added to Function object because it makes it usable by users because they work with Function objects directly. We minimize API surface exposed to users to minimal to ease maintenance.
There was a problem hiding this comment.
felt kind of yucky reaching into its internals...
| print_validation_messages, | ||
| validate_loaded_applications, | ||
| ) | ||
| from tensorlake.builder.client_v2 import BuildContext, ImageBuilderV2Client |
There was a problem hiding this comment.
Please delete all clients not used anymore in this PR.
| self._print_build_summary(summary) | ||
| # Use os._exit() to bypass asyncio cleanup and avoid "unhandled exception" errors | ||
| # This exits immediately without triggering asyncio.run() cleanup issues | ||
| os._exit(0) |
There was a problem hiding this comment.
This looks like a hack, please fix this properly or explain why it's not possible at the moment.
| f"Build service error getting final build info for {reporter.image_build_id}: {e}", | ||
| ) | ||
| except Exception as e: # pylint: disable=broad-except | ||
| self._handle_build_info_error( |
There was a problem hiding this comment.
Why do we have many except blocks that do almost the same?
Also why do we call _handle_build_info_error instead of bubbling the exception up to the caller?
|
|
||
| from tensorlake.cli._common import ASYNC_HTTP_EVENT_HOOKS | ||
|
|
||
| # Enable httpx debug logging if requested via environment variable |
There was a problem hiding this comment.
httpx is used across SDK and CLI.
What side effects does this have on them?
Is it right to put this code to client_v3.py?
Is this approach documented in httpx docs?
| return self._project_id | ||
|
|
||
| @classmethod | ||
| def from_env(cls) -> "ImageBuilderClientV3Options": |
There was a problem hiding this comment.
This looks to me like we're copy-pasting CLI code here.
Why do we do that? Who's going to use from_env() ?
| Validation is not performed in __init__. Call validate() to validate the configuration. | ||
| """ | ||
|
|
||
| _base_url: str = field(init=False) |
There was a problem hiding this comment.
If this is a simple @DataClass then why do we make these fields private and add lots of getter properties for it? Can we remove the getter properties?
| _api_key: str | None = field(default=None, init=False) | ||
| _pat: str | None = field(default=None, init=False) | ||
| _organization_id: str | None = field(default=None, init=False) | ||
| _project_id: str | None = field(default=None, init=False) |
There was a problem hiding this comment.
What's the use for field here?
|
|
||
|
|
||
| @dataclass | ||
| class ImageBuilderClientV3Options: |
There was a problem hiding this comment.
Can we remove this and just create ImageBuilderClientV3() directly in CLI?
We're essentially creating an abstraction here over concepts that ImageBuilderClient shouldn't know about like pat,api_key,organization_id, project_id. This is all managed by CLI. Long story short there's nothing ImageBuilder specific in ImageBuilderClientV3Options.
Also please extract common logic with APIClient, see https://github.com/tensorlakeai/tensorlake/blob/main/src/tensorlake/applications/remote/api_client.py#L157C9-L157C16 and https://github.com/tensorlakeai/tensorlake/blob/main/src/tensorlake/applications/remote/api_client.py#L490 and reuse it in both this client and APIClient. This will result in a more complete refactoring that you started already.
There was a problem hiding this comment.
If you really want to keep ImageBuilderClientV3Options then it needs to be a common non-image builder specific options object accepted by both image builder and existing API clients. The options objects needs to be defined in a common package and filled by CLI which handles configuration of all these things.
Also if you use it then existing CLI code needs to use it too where applicable.
| # ============================================================================ | ||
|
|
||
| # Type aliases for clarity (these are just str at runtime) | ||
| ImageBuildId = str |
There was a problem hiding this comment.
Why do we add these type aliases instead of using str directly?
eabatalov
left a comment
There was a problem hiding this comment.
Overall feedback: please remove unnecessary docstrings and abstractions + address the comments I left so far.
|
Another thing. Please move all image builder related stuff inside SDK under I didn't review the following but please ensure that it's correct:
|
Uh oh!
There was an error while loading. Please reload this page.