-
Notifications
You must be signed in to change notification settings - Fork 876
feat: add support for preserving and labeling intermediate stage images #6556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add support for preserving and labeling intermediate stage images #6556
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ezopezo The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Ephemeral COPR build failed. @containers/packit-build please check. |
59cd9ae to
6bd3187
Compare
b7e81df to
3314051
Compare
|
/retest |
|
@ezopezo: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@nalind can you please take a look and put ok-to-test label? It seems to me that tests are failing most likely with some timeouts and thus I would like to try to re-run them (or please tell me what I just broke :) ). |
|
/ok-to-test |
|
/test |
|
@ezopezo: No presubmit jobs available for containers/buildah@main DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
3314051 to
3955b20
Compare
|
@nalind @mtrmac @TomSweeneyRedHat can you please take a look on this? (or pick up some appropriate reviewers?) Thanks in advance! |
|
@flouthoc if you have time |
|
|
||
| buildah build --layers -t imageName . | ||
|
|
||
| buildah build --cache-stages --stage-labels -t imageName . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to show the bool values in play, perhaps
| buildah build --cache-stages --stage-labels -t imageName . | |
| buildah build --cache-stages false --stage-labels false -t imageName . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skipped that since this is a relatively uncommon use case for boolean flags, because the false behavior is implicit whenever the flag is absent. Other flags such as --no-cache or --layers follow the same pattern (present when true, absent when not, without presence of the --layers false) so I wanted to stay consistent with your docs. If you feel strongly about it, I certainly can add the examples :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a long-standing bit of confusion in the man pages where, even though the boolean argument values are optional, we don't suggest that they always have to be supplied in the --flag=value form, with an equal sign, to prevent them from being treated as unrelated arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalind Does it make a sense to you to add args with equal sign and boolean or leave it as it is? What do you think?
| Find an intermediate image for a specific stage name: | ||
|
|
||
| buildah images --filter "label=io.buildah.stage.name=builder" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! TY!
pkg/cli/build.go
Outdated
| LayerLabels: iopts.LayerLabel, | ||
| Layers: layers, | ||
| CacheStages: iopts.CacheStages, | ||
| StageLabels: iopts.StageLabels, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add these three options in alphabetically. I.e. CacheStages at line 396.5 and so on.
pkg/cli/common.go
Outdated
| Format string | ||
| From string | ||
| Iidfile string | ||
| BuildIDFile string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep making my OCD happy, please move this to line 63.5
pkg/cli/common.go
Outdated
| Layers bool | ||
| ForceRm bool | ||
| Layers bool | ||
| CacheStages bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More OCD alpha changes, please put this at line 28.5
|
|
||
| run_buildah rmi --all | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update and/or add tests to use "true" and "false" for the boolean flags? Leave at least one using the default true for each too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For false I added two new tests for --cache-stages and --stage-labels and for true I updated one existing test 👍
TomSweeneyRedHat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6b2baab to
3272226
Compare
nalind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think I understand the goals here. The new flags interact with --layers in confusing ways, though - the build ID recorded for --build-id-file won't match the image whose ID we return if the build is a cache hit, but we may quietly commit new versions of the final product of intermediate stages. I'm not really sure what the intended result is there.
define/build.go
Outdated
| CacheStages bool | ||
| // StageLabels tells the builder to add metadata labels to intermediate stage images for easier recognition. | ||
| // These labels include stage name, base image, build ID, and parent stage name (when a stage uses another | ||
| // intermediate stage as its base, i.e., transitive aliases). This option requires CacheStages to be enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even as it's being defined, "transitive aliases" doesn't make any sense to me as jargon. Does the term come from somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point - you're right that “transitive alias” isn't established jargon. I started using it internally as shorthand for multi-stage builds where one named stage is used as the base for another, and it unfortunately leaked here. I think I picked it up from an article or docs at some point, but I can't find a reference now - which indeed makes it sound like it was revealed to me in a dream 🙂
I'll happily rephrase all occurrences using some more standard terminology!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done :)
|
|
||
| buildah build --layers -t imageName . | ||
|
|
||
| buildah build --cache-stages --stage-labels -t imageName . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a long-standing bit of confusion in the man pages where, even though the boolean argument values are optional, we don't suggest that they always have to be supplied in the --flag=value form, with an equal sign, to prevent them from being treated as unrelated arguments.
imagebuildah/stage_executor.go
Outdated
| logrus.Debugf("Committing intermediate stage %s (index %d) for --cache-stages", s.name, s.index) | ||
| createdBy := fmt.Sprintf("/bin/sh -c #(nop) STAGE %s", s.name) | ||
| // Commit the stage without squashing, using empty output name (intermediate image) | ||
| imgID, commitResults, err = s.commit(ctx, createdBy, false, "", false, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why set emptyLayer as false here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You re right - it should not be false. I updated it such way, if stage was already committed, we only add metadata labels without creating a duplicate empty layer. Let me know, if it is okay!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm thinking that I was confused here - there are cases where we don't commit after the final instruction in a stage, since we're not going to base later stages off of it, and the working container is sufficient for COPY --from references to it later (for example, when imageIsUsedLater is false in the single-layer build case). I think we're okay with false for multi-layer builds, but should probably give it a once-over again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My aim with emptyLayer := imgID != "" is to not duplicate an empty layer created by the previous commit that already occurred in the execute function.
When imgID != "" (this means to me that stage was already committed):
- Regular builds (without –layers) commit the stage in execute function if
imageIsUsedLater=true - Also now with
--layers, every instruction is commited, so imgID is set
I think, in these cases, we only need to add labels by commit andemptyLayer=trueshould prevent creating a duplicate layer in the intermediate image (right?).
When imgID == "" (I grasp this as stage was not committed and is left as working container) regular builds with imageIsUsedLater=false don't commit the stage, but we need to capture filesystem changes of the working container- here is emptyLayer=false okay because ensures we create a proper layer with the stage's content..
I think this prevents duplicate layers while correctly capturing filesystem changes when the stage hasn't been committed yet (and we need them).
Let me know if there's an edge case I'm missing or I got tangled into this too much and I am wrong 😀
Does it make a sense to add an mutual exclusion condition that --layers and --cache-stages cannot be used together (as serve fundamentally different purposes)? I want to keep this functionality as isolated as possible from others to avoid unnecessary complexities like this. |
|
@nalind Would you be able to take an another look please? Thanks in advance! |
|
|
Yes, it works well |
|
@nalind Also I almost forgot to ask - how I can effectively test if current changes are compatible with podman?
|
Since we're on |
0fee359 to
a54b1ee
Compare
a54b1ee to
1bfc86d
Compare
This adds support for preserving and labeling intermediate stage images in multi-stage builds. In contrast to the --layers flag, --cache-stages preserves only the final image from each named stage (FROM ... AS name), not every instruction layer. This also keeps the final image's layer count unchanged compared to a regular build. New flags: - --cache-stages: preserve intermediate stage images instead of removing them - --stage-labels: add metadata labels to intermediate stage images (stage name, base image, build ID, parent stage name). Requires --cache-stages. - --build-id-file: write unique build ID (UUID) to file for easier identification and grouping of intermediate images from a single build. Requires --stage-labels. The implementation also includes: - Detection of transitive alias patterns (stage using another intermediate stage as base) - Validation that --stage-labels requires --cache-stages - Validation that --build-id-file requires --stage-labels - Test coverage (15 tests) and documentation updates This functionality is useful for debugging, exploring, and reusing intermediate stage images in multi-stage builds. Signed-off-by: Erik Mravec <emravec@redhat.com>
- removed unused buildIDFile - removed "transitive alias" wording - update logic of the emptyLayer arg in commit - --layers and --cache-stages are mutually exclusive This commit will be suqashed serves only for review purposes. Signed-off-by: Erik Mravec <emravec@redhat.com>
--layers and --cache-stages can be used together. When --cache-stages is enabled (with or without --layers) checkForLayers is set to false, blocking cache lookup (fresh build every time with --cache-stages). When --cache-stages is enabled with --layers together, all other subsequent builds without --cache-stages can use cached layers. This commit will be squashed serves only for review purposes. Signed-off-by: Erik Mravec <emravec@redhat.com>
1bfc86d to
89f7b79
Compare
@nalind Thank you I did it according your guide and created this testing PR. I think it is failing because of the docs mismatch and exceeding some limit of the binary size (which could be overridden by label I think) and also I think I need from maintainers approval for running all of the jobs, could you help me with this please? |
This adds support for preserving and labeling intermediate stage images in multi-stage builds. In contrast to the
--layersflag,--cache-stagespreserves only the final image from each named stage (FROM ... AS name), not every instruction layer. This also keeps the final image's layer count unchanged compared to a regular build.New flags:
--cache-stages: preserve intermediate stage images instead of removing them--stage-labels: add metadata labels to intermediate stage images (stage name, base image, build ID, parent stage name). Requires--cache-stages.--build-id-file: write unique build ID (UUID) to file for easier identification and grouping of intermediate images from a single build. Requires--stage-labels.The implementation also includes:
--stage-labelsrequires--cache-stages--build-id-filerequires--stage-labelsWhat type of PR is this?
/kind feature
What this PR does / why we need it:
General use: This functionality is useful for identification, debugging, and reusing intermediate stage images in multi-stage builds.
Specific need: Identifying the content copied from intermediate stages in multi-stage builds into the final image is a hard requirement for supporting Contextual SBOM - an SBOM that understands the origin of each component.
While intermediate images can be extracted using the
--layersoption, this approach has several issues for our use case:buildah images --all), which introduces unnecessary noise for our purposes.--layers), meaning:Related repositories:
konflux (uses mobster for SBOM generation),
mobster (implements contextual SBOM functionality requiring this change),
capo (wraps builder content identification functionality for mobster),
Contact person: emravec (RedHat) / @ezopezo (Github)
How to verify it
Run any multistage build with intermediate stage specified with implemented arguments. Resulting intermediate images should be correctly labeled. Example:
buildah build --cache-stages --stage-labels --build-id-file ./file.txt -t test:0.1 .Which issue(s) this PR fixes:
Fixes: #6257
Internal Jira: https://issues.redhat.com/browse/ISV-6122
Does this PR introduce a user-facing change?