Skip to content

Conversation

@winterhazel
Copy link
Member

@winterhazel winterhazel commented Jan 26, 2026

Description

This change allows the batch removal of usage records in order to prevent query timeouts when there is a huge amount of records to delete. Also, an useless ORDER BY that would result in higher response times was removed from the batch expunge query.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Screenshots (if appropriate):

How Has This Been Tested?

In a local environment, I enabled batch deletion by setting a value greater than 0 to delete.query.batch.size, called removeRawUsageRecords, and debugged to code to validate that the deletion was being correctly performed in batches.

@codecov
Copy link

codecov bot commented Jan 26, 2026

Codecov Report

❌ Patch coverage is 10.52632% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.25%. Comparing base (7536516) to head (02d8f18).
⚠️ Report is 11 commits behind head on 4.20.

Files with missing lines Patch % Lines
...rc/main/java/com/cloud/usage/dao/UsageDaoImpl.java 0.00% 13 Missing ⚠️
...c/main/java/com/cloud/utils/db/GenericDaoBase.java 0.00% 3 Missing ⚠️
...rc/main/java/com/cloud/usage/UsageServiceImpl.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##              4.20   #12522       +/-   ##
============================================
+ Coverage     4.03%   16.25%   +12.21%     
- Complexity       0    13400    +13400     
============================================
  Files          402     5658     +5256     
  Lines        32715   499276   +466561     
  Branches      5831    60601    +54770     
============================================
+ Hits          1319    81134    +79815     
- Misses       31241   409095   +377854     
- Partials       155     9047     +8892     
Flag Coverage Δ
uitests 4.03% <ø> (-0.01%) ⬇️
unittests 17.10% <10.52%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines 527 to +530
public static final ConfigKey<Long> DELETE_QUERY_BATCH_SIZE = new ConfigKey<>("Advanced", Long.class, "delete.query.batch.size", "0",
"Indicates the limit applied while deleting entries in bulk. With this, the delete query will apply the limit as many times as necessary," +
" to delete all the entries. This is advised when retaining several days of records, which can lead to slowness. <= 0 means that no limit will " +
"be applied. Default value is 0. For now, this is used for deletion of vm & volume stats only.", true);
"be applied. Default value is 0. For now, this is used for deletion of VM stats, volume stats, and usage records.", true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make the default some biug value to protect the operator from trying to delete 500000+ records? (i am thinking about 10000)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it would be good to enable batch deletion by default. 10,000 is too low and might end up slowing down deletion too much though. I've seen vm_stats with over 100 million entries for deletion, which would take too long with batches of 10,000. I will check how some different values behave later.

By the way, I'm putting this in draft because I noticed an issue in the code.

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm otherwise btw

@Pearl1594 Pearl1594 added this to the 4.20.3 milestone Jan 27, 2026
@winterhazel winterhazel marked this pull request as draft January 27, 2026 18:17
@winterhazel winterhazel marked this pull request as ready for review January 27, 2026 20:48
@winterhazel
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16569

@borisstoyanov
Copy link
Contributor

@blueorangutan test

@borisstoyanov borisstoyanov self-assigned this Jan 28, 2026
@blueorangutan
Copy link

@borisstoyanov a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants