[release-0.0.99.5] Prepare stable branch (part 4) #1742
Fedora 38 reached End of Life on 21st May 2024: https://docs.fedoraproject.org/en-US/releases/eol/ containers#1527 containers#1741 (cherry picked from commit b684b19)
It's far more consistent and understandable if all tests start with a clean state without any containers or images present. Otherwise, the subtle side-effects of having some image left behind from a previous test can lead to surprises, and there's no need to spend time wondering whether some tests should only clean up the containers or both containers and images. This additional work of cleaning up the images for all tests makes it necessary to increase the timeout for all Fedora nodes to prevent the CI from timing out. containers#1526 containers#1741 (backported from commits 67d4002 and 55c0e63)
The CI has frequently been timing out on stable Fedora nodes, so increase the timeout from 1 hour 50 minutes to 2 hours to avoid that. For what it's worth, the timeout for Fedora Rawhide nodes is 2 hours 10 minutes, which seems to be enough. containers#1546 containers#1741 (backported from commit f2dc3b8)
The system tests download several images when setting up the test suite,
and cache them for later use by the tests [1]. This saves time and
avoids hitting rate limits imposed by OCI registries by not downloading
the same images repeatedly for several tests, but at the cost of
increased use of storage space to cache the images.
The images are cached under BATS_TMPDIR. It defaults to the TMPDIR
environment variable, and if that's not set then to /tmp [2]. Normally,
TMPDIR isn't set, and the images end up getting cached under /tmp. Now,
/tmp is typically on tmpfs backed by RAM or swap, which means that it
should be used for smaller size-bounded files only, and /var/tmp should
be used for everything else [3].
The images are big enough that a collection of them can't be described
as smaller and size-bounded, and it led to:
1..306
# test suite: Set up
# test suite: Tear down
not ok 1 setup_suite
# (from function `setup_suite' in test file ./setup_suite.bash, line
55)
# `_pull_and_cache_distro_image fedora "$((system_version-1))" ||
false' failed
# Failed to cache image registry.fedoraproject.org/fedora-toolbox:40
to /tmp/bats-run-IPz4Cn/image-cache/fedora-toolbox-40
# time="2024-02-19T11:41:43Z" level=fatal msg="copying system image
from manifest list: writing blob: write
/tmp/bats-run-IPz4Cn/image-cache/fedora-toolbox-40/dir-put-blob607392514:
no space left on device"
# bats warning: Executed 1 instead of expected 306 tests
So, change the default location of the BATS_TMPDIR environment variable
to /var/tmp by setting TMPDIR.
[1] Commit 50683c9
containers@50683c9d9a78adc9
containers#375
[2] https://bats-core.readthedocs.io/en/stable/writing-tests.html
[3] https://systemd.io/TEMPORARY_DIRECTORIES/
containers#1462
containers#1742
(backported from commit 571dc97)
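The fallback described above can be sketched in plain shell. This is only an illustration of the documented behaviour (BATS_TMPDIR follows TMPDIR, and falls back to /tmp when TMPDIR is unset); `resolve_tmpdir` is a hypothetical helper, not something from the test suite or from Bats itself.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the documented fallback chain,
# BATS_TMPDIR <- TMPDIR <- /tmp. It is not part of Bats or the test suite.
resolve_tmpdir() {
    printf '%s\n' "${TMPDIR:-/tmp}"
}

# With TMPDIR unset, the image cache would land under /tmp,
# which is typically a RAM- or swap-backed tmpfs.
unset TMPDIR
before="$(resolve_tmpdir)"

# Exporting TMPDIR before the tests run moves the cache to /var/tmp.
export TMPDIR=/var/tmp
after="$(resolve_tmpdir)"

echo "before: $before"
echo "after: $after"
```

When running the tests by hand, the same effect should be achievable by exporting TMPDIR=/var/tmp in the environment before invoking bats(1).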
81a6175 to 5655647
Code Review
This pull request prepares for a new release by updating CI configurations for the release-0.0.99.5 branch, refreshing the Fedora versions used in system tests, and improving test cleanup logic. The changes to use /var/tmp for temporary test files and documenting it are good improvements.
However, I've identified a significant performance issue with the new test cleanup strategy. The change from cleanup_containers to cleanup_all in the setup and teardown of each test will cause container images to be removed and reloaded for every single test case. This will make the test suite much slower, and likely explains the substantially increased timeouts in the Zuul configuration.
My review includes suggestions to revert this change to improve test performance while still maintaining a good level of test isolation. I recommend cleaning containers after each test, and performing a full image cleanup less frequently.
test/system/101-create.bats (24)
Using cleanup_all in setup (and teardown) causes all container images to be removed and reloaded for every single test case. This introduces a significant performance bottleneck in the test suite, and is likely the reason for the large increase in CI job timeouts. For better performance, it's recommended to only clean up containers between tests. This change and the corresponding one in teardown should be reverted in all modified .bats files.
cleanup_containers
test/system/101-create.bats (28)
As mentioned for the setup function, using cleanup_all here is inefficient. It's better to use cleanup_containers to only remove containers after each test.
cleanup_containers
test/system/libs/helpers.bash (40-42)
This function should be kept to allow for cleaning up only containers between tests, which is much faster than cleaning up images as well. Using cleanup_all for every test case introduces a significant performance overhead. Please see my comment on test/system/101-create.bats.
This should have been part of commit 5655647. containers#1464 containers#1742 (backported from commit 818a22b)
containers#1550 containers#1742 (cherry picked from commit 679bf87)
Fedora 39 reached End of Life on 26th November 2024: https://docs.fedoraproject.org/en-US/releases/eol/ containers#1602 containers#1742 (backported from commit 0bb4ff8)
containers#1619 containers#1742 (backported from commit c2520f2)
Fedora 40 reached End of Life on 13th May 2025: https://docs.fedoraproject.org/en-US/releases/eol/ containers#1650 containers#1742 (backported from commit 7cfe9a7)
containers#1730 containers#1742 (backported from commit 431f7f0)
Fedora 41 reached End of Life on 15th December 2025: https://docs.fedoraproject.org/en-US/releases/eol/ containers#1733 containers#1742 (backported from commit 36605d8)
The GitHub Actions workflows for building and publishing the images were removed because the image definitions were removed from this branch [1]. [1] Commit f2b2a18 containers@f2b2a18ddef288a3 containers#1739 containers#1742
This keeps the timeout for the Fedora nodes synchronized with the main branch. containers#1548 containers#1741 (backported from commit 83f28c5)
5655647 to dc8a35d
Build succeeded. ✔️ unit-test SUCCESS in 1m 47s
The system tests can be very I/O intensive, because many of them copy OCI images from the test suite's image cache directory to its local container/storage store, create containers, and then delete everything to run the next test with a clean slate. This makes them slow. The runtime environment tests, which include the resource limit tests, are particularly slow because they don't skip the I/O even when testing error handling. This makes them a good target for optimizations. The resource limit tests query the values for different resources from the same default container without changing its state. Therefore, a lot of disk I/O can be avoided by creating the default container only once for all the tests. This can save as much as 30 minutes. containers#1552 containers#1742 (backported from commit fb9e2e7)
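The saving comes from amortizing one expensive fixture over many read-only tests. Below is a toy model of that idea in plain shell, needing neither bats(1) nor podman(1). Every name in it (`create_default_container`, `check_limit`, the resource names) is hypothetical and stands in for the real, I/O-heavy steps; it is a sketch of the principle, not the code from the commit.

```shell
#!/bin/sh
# Toy model: count how often an expensive fixture is built under two strategies.
# create_default_container is a hypothetical stand-in for the real
# "copy image + create container" step; here it only bumps a counter.
creations=0
create_default_container() {
    creations=$((creations + 1))
}

# Read-only check: it queries the container but never changes its state.
check_limit() {
    : "querying limit for resource $1"
}

# Hypothetical resource names, for illustration only.
resources="nofile nproc stack"

# Strategy A: rebuild the fixture before every test.
creations=0
for r in $resources; do
    create_default_container
    check_limit "$r"
done
per_test=$creations

# Strategy B: build the fixture once, then run every read-only test against it.
creations=0
create_default_container
for r in $resources; do
    check_limit "$r"
done
once=$creations

echo "per-test setup: $per_test creations"
echo "one-time setup: $once creation"
```

In Bats terms, strategy A corresponds to doing the work in the per-test `setup` hook and strategy B to doing it in the per-file `setup_file` hook. Whether the commit uses exactly those hooks is not stated here, so treat that mapping as an assumption.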
Build succeeded. ✔️ unit-test SUCCESS in 1m 44s
Build succeeded. ✔️ unit-test SUCCESS in 1m 41s
Build succeeded. ✔️ unit-test SUCCESS in 1m 50s
The working directory from which bats(1) is invoked might not be part of
the Toolbx container. For example, the downstream Fedora CI invokes the tests
as:
$ cd /path/to/toolbox/test/system
$ bats .
... and it led to:
not ok 8 help: Try unknown command (forwarded to host)
# tags: commands-options
# (from function `assert_line' in file
./libs/bats-assert/src/assert.bash, line 488,
# in test file ./002-help.bats, line 135)
# `assert_line --index 0
"Error: unknown command \"foo\" for \"toolbox\""' failed
#
# -- line differs --
# index : 0
# expected : Error: unknown command "foo" for "toolbox"
# actual : Error: crun: chdir to `/usr/share/toolbox/test/system`:
No such file or directory: OCI runtime attempted to invoke a
command that was not found
# --
#
containers#1560
containers#1742
(backported from commit 1e90c72)
The system tests can be very I/O intensive, because many of them copy OCI images from the test suite's image cache directory to its local container/storage store, create containers, and then delete everything to run the next test with a clean slate. This makes them slow. In the case of these two particular tests, toolbox(1) is supposed to validate the command line options before trying to find the image. So, there's no need to copy the image from the test suite's image cache directory to its local container/storage store. Fallout from 32b147b containers#1595 containers#1742 (backported from commit adc8650)
The system tests can be very I/O intensive, because many of them copy OCI images from the test suite's image cache directory to its local container/storage store, create containers, and then delete everything to run the next test with a clean slate. This makes them slow. The runtime environment tests, which include the group and user tests, are particularly slow because they don't skip the I/O even when testing error handling. This makes them a good target for optimizations. The group and user tests check the group and user configuration in different containers without changing their state. Therefore, a lot of disk I/O can be avoided by creating these containers only once for all the tests. This can reduce the time needed to run the group and user tests from almost 22 minutes to almost 5 minutes. containers#1635 containers#1742 (backported from commit 3017a46)
... for CVE-2025-65637 or GHSA-4f99-4q7p-p3gh.
https://github.com/containers/toolbox/security/dependabot/26