
Conversation

@sgossage (Contributor) commented Dec 16, 2025

Testing a potential solution to reduce some of the RAM used during population synthesis.

Right now we load the entire data arrays for the initial and final values into RAM, allocating about 5 GB of memory in total. If we instead use lazy access (keeping a reference to the on-disk dataset rather than reading the whole array into memory), we can save some RAM: with lazy access we allocate only ~1.5 GB in total. I'm looking into further optimizations and still testing this one.

Also looking into the RAM usage of binary evolution, although @maxbriel has looked into this quite a bit already (see Issue #224).

@sgossage (Contributor Author) commented Dec 16, 2025

This is an example of the memory usage over time of a binary population run (BinaryPopulation.evolve() with 10 binaries) with the current main release (v2.2.2).

[figure: example_binpop_evolve10_nofix]

The initial spike is when we load our PSyGrids, and the gradual accumulation after that comes from binary evolution. The peak data consumption is about 6 GB.
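
For anyone wanting to reproduce a trace like this, one option is to sample the process memory while the population evolves. A minimal sketch, assuming psutil (the BinaryPopulation setup is omitted, and this is not necessarily how the plots here were made):

    import threading
    import time

    import psutil

    def sample_rss(samples, stop, interval=0.5):
        # Append the current process RSS (in GB) every `interval` seconds.
        proc = psutil.Process()
        while not stop.is_set():
            samples.append(proc.memory_info().rss / 1e9)
            time.sleep(interval)

    samples, stop = [], threading.Event()
    sampler = threading.Thread(target=sample_rss, args=(samples, stop))
    sampler.start()

    # pop = BinaryPopulation(...)   # placeholder: actual population setup not shown
    # pop.evolve()

    stop.set()
    sampler.join()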

@sgossage (Contributor Author) commented Dec 16, 2025

This is with lazy loading of the PSyGrids:

[figure: example_binpop_evolve10_fix]

The peak data consumption is about 3 GB, reducing total RAM usage by 50%; the RAM usage due to grid/step loading is reduced by ~70%. This does not seem to affect runtime, but I'm still checking. This brings RAM usage back down to roughly what was shown in, e.g., Issue #224.

The change made here is in PSyGrid, mainly rewriting:

        self.initial_values = hdf5['/grid/initial_values'][()]
        self.final_values = hdf5['/grid/final_values'][()]

as

        self.initial_values = hdf5['/grid/initial_values']
        self.final_values = hdf5['/grid/final_values']
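
For context, a minimal standalone sketch of the difference, assuming h5py (the file name is a placeholder). The key caveat of the lazy form is that the HDF5 file handle has to stay open for as long as the dataset reference is used, and data are only read from disk when the dataset is sliced:

    import h5py

    with h5py.File('psygrid.h5', 'r') as hdf5:   # placeholder file name
        # Eager: copies the full dataset into a NumPy array held in RAM.
        eager = hdf5['/grid/initial_values'][()]

        # Lazy: keeps a handle to the on-disk dataset; nothing is read yet.
        lazy = hdf5['/grid/initial_values']

        # Data are only pulled from disk when the dataset is sliced,
        # e.g. one row (one track's initial values) at a time.
        first_row = lazy[0]

    # Once the file is closed, `lazy` can no longer be read, so the grid
    # object has to keep its HDF5 file open for as long as it is used.

The trade-off is that RAM is only paid for the rows actually accessed, at the cost of extra disk reads during evolution.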

@sgossage sgossage added the enhancement New feature or request label Dec 16, 2025
@sgossage sgossage self-assigned this Dec 16, 2025
…hens run time due to I/O overhead, so not worth it yet
@sgossage sgossage force-pushed the sg_fix_grid_data_ram_usage branch from dc52455 to 53520fc Compare December 16, 2025 21:44
pre-commit-ci bot and others added 3 commits December 16, 2025 21:44
…nt lengthens run time due to I/O overhead, so not worth it yet"

This reverts commit 53520fc.
@sgossage (Contributor Author) commented Dec 17, 2025

Every time step_detached is entered, the root matrices are recalculated and any scalers that don't exist yet are trained. We could precalculate these objects so that this doesn't happen during binary evolution. This is not a huge issue, but something to consider.

Also, we create a TrackMatcher object for every instance of step_detached (we have 5: step_detached itself, step_dco, step_merged, etc.), plus another for step_CE. Instead, we could create a single TrackMatcher object and reuse it, as sketched below.
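
A rough sketch of the reuse pattern (class internals and constructor arguments here are placeholders, not the actual POSYDON API):

    # Hypothetical sketch of the "build once, share" pattern.
    class TrackMatcher:
        def __init__(self, matching_list):
            self.matching_list = matching_list   # expensive setup done once

    class DetachedStep:
        def __init__(self, matcher):
            self.matcher = matcher               # reuse, don't reconstruct

    shared_matcher = TrackMatcher(matching_list=['default'])
    steps = [DetachedStep(shared_matcher) for _ in range(5)]
    assert all(step.matcher is shared_matcher for step in steps)
    # step_CE uses a different matching list, so it may still need its own
    # matcher (or a matcher parameterized per step).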

I'm testing making TrackMatcher a simulation property, the idea being that it could be reused globally by the steps that need it. There's an issue with, e.g., step_CE using a different matching list than the others, which still needs to be sorted out. Here is a quick look at performance with the above implemented:

[figure: example_with_preloaded_training_and_root0calc]

We now get the large increase in RAM at the start, when we create the various objects the steps need. Then, during evolution, we get a gradual rise mostly from the binaries being added to RAM. We save a bit on runtime (mostly on the step-loading part now) by preloading and reusing these objects -- e.g., creating just one TrackMatcher and its associated GRIDInterpolator objects instead of doing that for every instance of step_detached.

For 1000 binaries:

[figure: example_with_preloaded_training_and_root0calc_1000bin]

and this is the default (current main, v2.2.2) behavior with 1000 binaries for comparison:

[figure: example_default_nofix_1000bin]

sgossage and others added 2 commits December 16, 2025 23:59
…kMatcher. Also trying a TrackMatcher that is accessible by all evolution steps. Not done correctly yet, just for testing.
@maxbriel (Collaborator) commented Dec 18, 2025

Great work here! Identifying the culprits is not easy.

I really like the TrackMatcher being limited to 1. This seems to me to be a major culprit for the higher memory usage, since every single-star model, after it's read in once, is stored in memory. This would increase the load significantly with one TrackMatcher.

For the initial and final values, I am a bit worried about the I/O component. For detached evolution, reading in a single star model from disk is the slowest part on the HPC facility. I am worried that switching to the memory reference will make this a larger issue.

I would pose more general questions:

  1. Do we care about speed or about memory usage? If we read in everything from disk, that will be our bottleneck; if we keep things in memory, then that will be.
  2. We should figure out whether we want this gradual read-in of the single-star models. While it speeds up processing over time, constant I/O is slower than one big I/O read. And it feels dangerous not knowing exactly how much RAM your population run will use.

@sgossage (Contributor Author) commented:

Thanks @maxbriel, I'll be doing some testing on HPC. At today's dev meeting we also discussed preloading the single-star grid tracks that are used during detached evolution before binary evolution starts, which I will also look into. This may help save on I/O overhead during the detached step.
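
A rough sketch of what preloading could look like, assuming h5py and a hypothetical layout where each single-star track is a dataset under one group (paths and names are placeholders):

    import h5py

    def preload_tracks(path, group='/grid'):
        # Read every dataset under `group` into memory up front so that the
        # detached step never touches the HDF5 file during binary evolution.
        tracks = {}
        with h5py.File(path, 'r') as f:
            for name, obj in f[group].items():
                if isinstance(obj, h5py.Dataset):
                    tracks[name] = obj[()]   # one big sequential read per dataset
        return tracks

    # cache = preload_tracks('single_star_grid.h5')   # placeholder file name

This trades an up-front RAM cost for fewer, larger reads, which is the trade-off raised above.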
