Skip to content

Conversation

@winterhazel
Copy link
Member

Description

removeRawUsageRecords calculates the next moment in which Usage will execute in order to prevent the removal of usage records in a window of 15 minutes before and after the Usage job executes. However, this calculation is performed incorrectly because the date is not rolled to the next day in case the initial calculated date has already passed. Due to this, if the Usage job is scheduled to execute at 00:00 and the current time is 23:59 (the job will execute in a minute), for instance, this check fails and the usage records are removed.

This PR fixes the calculation.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

The tests were performed in a VM with its date set to 2026-01-26 GMT-3.

Before the changes

  • I scheduled Usage to execute at 00:00 and set the Management Server's date to 23:50. I executed removeRawUsageRecords. I verified that the calculated execution date was 2026-01-26 00:00 (incorrect value), and that the usage records were removed.

    2026-01-27 02:50:37,340 DEBUG [c.c.u.UsageServiceImpl] (qtp1438988851-19:[ctx-406f7af2, ctx-443960b1]) (logid:3f310c43) Trying to remove old raw cloud_usage records older than 30 day(s), current time=1769482237337 next job execution time=1769472000000
    

With the patch

Regarding removeRawUsageRecords:

  • I scheduled Usage to execute at 00:00 and set the Management Server's date to 23:50. I executed removeRawUsageRecords. I verified that the calculated execution date was correct, and that the usage records were not removed.

    2026-01-27 02:51:48,514 DEBUG [c.c.u.UsageServiceImpl] (qtp1404565079-24:[ctx-41b413d8, ctx-9398b56c]) (logid:e7cc40c5) Next Usage job is scheduled to execute at [2026-01-27T00:00:00-0300]; previous execution was at [2026-01-26T00:00:00-0300].
    2026-01-27 02:51:48,514 INFO  [c.c.u.UsageServiceImpl] (qtp1404565079-24:[ctx-41b413d8, ctx-9398b56c]) (logid:e7cc40c5) Not removing any cloud_usage records because the next Usage job is scheduled to execute in less than 15 minute(s).
    
  • I scheduled Usage to execute at 00:00 and set the Management Server's date to 00:05. I executed removeRawUsageRecords. I verified that the calculated execution date was correct, and that the usage records were not removed.

    2026-01-26 03:07:20,495 DEBUG [c.c.u.UsageServiceImpl] (qtp1404565079-23:[ctx-bec9b411, ctx-07e46009]) (logid:c37addd5) Next Usage job is scheduled to execute at [2026-01-27T00:00:00-0300]; previous execution was at [2026-01-26T00:00:00-0300].
    2026-01-26 03:07:20,496 INFO  [c.c.u.UsageServiceImpl] (qtp1404565079-23:[ctx-bec9b411, ctx-07e46009]) (logid:c37addd5) Not removing any cloud_usage records because the last Usage job executed in less than 15 minute(s) ago.
    
  • I scheduled Usage to execute at 00:00 and set the Management Server's date to 23:40. I executed removeRawUsageRecords. I verified that the calculated execution date was correct, and that the usage records were removed.

    2026-01-27 02:40:34,826 DEBUG [c.c.u.UsageServiceImpl] (qtp1404565079-23:[ctx-0b325ec7, ctx-c3715fae]) (logid:cf95c7ea) Next Usage job is scheduled to execute at [2026-01-27T00:00:00-0300]; previous execution was at [2026-01-26T00:00:00-0300].
    2026-01-27 02:40:34,826 INFO  [c.c.u.UsageServiceImpl] (qtp1404565079-23:[ctx-0b325ec7, ctx-c3715fae]) (logid:cf95c7ea) Removing cloud_usage records older than 30 day(s).
    
  • I scheduled Usage to execute at 00:00 and set the Management Server's date to 00:20. I executed removeRawUsageRecords. I verified that the calculated execution date was correct, and that the usage records were removed.

    2026-01-26 03:20:47,151 DEBUG [c.c.u.UsageServiceImpl] (qtp1404565079-23:[ctx-2dc17d46, ctx-8c10ccad]) (logid:7f4dbf36) Next Usage job is scheduled to execute at [2026-01-27T00:00:00-0300]; previous execution was at [2026-01-26T00:00:00-0300].
    2026-01-26 03:20:47,151 INFO  [c.c.u.UsageServiceImpl] (qtp1404565079-23:[ctx-2dc17d46, ctx-8c10ccad]) (logid:7f4dbf36) Removing cloud_usage records older than 30 day(s).
    

Regarding the job execution time in the Usage Server itself:

  • I scheduled Usage to execute at 00:00 and set the Management Server's date to 23:50. I restarted the Usage Server. I verified that the calculated execution date was correct.

    2026-01-27 02:50:08,724 INFO  [cloud.usage.UsageManagerImpl_EnhancerByCloudStack_68b35dd4] (main:[]) (logid:) Usage is configured to execute in time zone [GMT-03:00], at [00:00], each [1440] minutes; the current time in that timezone is [2026-01-26T23:50:08-0300] and the next job is scheduled to execute at [2026-01-27T00:00:00-0300]. During its execution, Usage will aggregate stats according to the time zone [GMT-03:00] defined in global setting [usage.aggregation.timezone].
    
  • I scheduled Usage to execute at 00:00 and set the Management Server's date to 00:20. I restarted the Usage Server. I verified that the calculated execution date was correct.

    2026-01-26 03:20:07,917 INFO  [cloud.usage.UsageManagerImpl_EnhancerByCloudStack_68b35dd4] (main:[]) (logid:) Usage is configured to execute in time zone [GMT-03:00], at [00:00], each [1440] minutes; the current time in that timezone is [2026-01-26T00:20:07-0300] and the next job is scheduled to execute at [2026-01-27T00:00:00-0300]. During its execution, Usage will aggregate stats according to the time zone [GMT-03:00] defined in global setting [usage.aggregation.timezone].
    
  • I scheduled Usage to execute at 00:10 and set the Management Server's date to 00:05. I restarted the Usage Server. I verified that the calculated execution date was correct.

    2026-01-26 03:05:08,964 INFO  [cloud.usage.UsageManagerImpl_EnhancerByCloudStack_68b35dd4] (main:[]) (logid:) Usage is configured to execute in time zone [GMT-03:00], at [00:10], each [1440] minutes; the current time in that timezone is [2026-01-26T00:05:08-0300] and the next job is scheduled to execute at [2026-01-26T00:10:00-0300]. During its execution, Usage will aggregate stats according to the time zone [GMT-03:00] defined in global setting [usage.aggregation.timezone].
    
  • I scheduled Usage to execute at 00:10 and set the Management Server's date to 00:12. I restarted the Usage Server. I verified that the calculated execution date was correct.

    2026-01-26 03:12:09,134 INFO  [cloud.usage.UsageManagerImpl_EnhancerByCloudStack_68b35dd4] (main:[]) (logid:) Usage is configured to execute in time zone [GMT-03:00], at [00:10], each [1440] minutes; the current time in that timezone is [2026-01-26T00:12:09-0300] and the next job is scheduled to execute at [2026-01-27T00:10:00-0300]. During its execution, Usage will aggregate stats according to the time zone [GMT-03:00] defined in global setting [usage.aggregation.timezone].
    

@winterhazel
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@codecov
Copy link

codecov bot commented Jan 26, 2026

Codecov Report

❌ Patch coverage is 44.61538% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.25%. Comparing base (8db065a) to head (e717ae4).
⚠️ Report is 29 commits behind head on 4.20.

Files with missing lines Patch % Lines
...rc/main/java/com/cloud/usage/UsageServiceImpl.java 0.00% 21 Missing ⚠️
...rc/main/java/com/cloud/usage/UsageManagerImpl.java 10.00% 9 Missing ⚠️
.../org/apache/cloudstack/utils/usage/UsageUtils.java 82.35% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##               4.20   #12518    +/-   ##
==========================================
  Coverage     16.24%   16.25%            
- Complexity    13396    13407    +11     
==========================================
  Files          5658     5658            
  Lines        499169   499297   +128     
  Branches      60579    60602    +23     
==========================================
+ Hits          81087    81142    +55     
- Misses       409041   409108    +67     
- Partials       9041     9047     +6     
Flag Coverage Δ
uitests 4.03% <ø> (-0.01%) ⬇️
unittests 17.10% <44.61%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blueorangutan
Copy link

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 16534

@winterhazel winterhazel changed the base branch from main to 4.20 January 26, 2026 13:25
@winterhazel winterhazel reopened this Jan 26, 2026
@winterhazel
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16535

@DaanHoogland
Copy link
Contributor

@blueorangutan test keepEnv

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to fix a bug in removeRawUsageRecords where the calculation of the next usage job execution time was incorrect when the scheduled time had already passed during the same day. The fix extracts the date calculation logic into a new utility class UsageUtils and uses it consistently across UsageManagerImpl and UsageServiceImpl.

Changes:

  • Introduced new utility methods getNextJobExecutionTime and getPreviousJobExecutionTime in UsageUtils class to centralize job execution time calculations
  • Refactored UsageManagerImpl.configure() to use the new utility methods instead of inline date calculation
  • Refactored UsageServiceImpl.removeRawUsageRecords() to use the new utility methods and improved the logic to check both next and previous job execution times
  • Added comprehensive unit tests for the new utility methods

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
utils/src/main/java/org/apache/cloudstack/utils/usage/UsageUtils.java New utility class providing centralized methods for calculating next and previous job execution times
utils/src/test/java/org/apache/cloudstack/utils/usage/UsageUtilsTest.java New test file with unit tests for the utility methods
usage/src/main/java/com/cloud/usage/UsageManagerImpl.java Refactored to use new utility methods, removing duplicate date calculation logic
server/src/main/java/com/cloud/usage/UsageServiceImpl.java Refactored removeRawUsageRecords to use new utility methods with improved 15-minute window checks

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@winterhazel
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16565

@blueorangutan
Copy link

[SF] Trillian test result (tid-15294)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 49884 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12518-t15294-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

Copy link
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, based on code review and Marvin tests. For the actual bug described, your tests are more than sufficient @winterhazel, good work.

@RosiKyu RosiKyu self-assigned this Jan 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants