Description
Hello,
I have encountered a RAM overflow problem when using something like:

```python
import micromagneticdata as md

system = func(parameters)              # initialise the system and run a TimeDrive
drive = md.Data(name=system.name)[-1]  # last drive of the system
data_array = drive.to_xarray()         # RAM overflows here for large systems
```

where func() initialises the system and runs a TimeDrive with particular parameters.
For a small system there is no problem converting the drive data to an xarray.DataArray for further analysis or export to other formats. But when the system is large, RAM overflows and conversion to an xarray.DataArray object is impossible, even with chunking and Dask.
For example, I have 64 GB of RAM and my system is 30x30 um laterally (XY) with 20 nm thickness (Z), discretised into 25x25x20 nm cells (XYZ), i.e. 1200x1200x1 cells. The TimeDrive is 5 ns long with a 2 ps time step (2500 files).
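For context, a quick back-of-the-envelope estimate (a sketch; it assumes float64 values and three vector components, which is what the saved magnetisation fields typically use) shows why the full array cannot fit in 64 GB:

```python
nx, ny, nz = 1200, 1200, 1   # 30 um / 25 nm, 30 um / 25 nm, 20 nm / 20 nm
nt = 2500                    # 5 ns / 2 ps
vdims = 3                    # mx, my, mz
bytes_per_value = 8          # float64 (assumption)

total_gb = nt * nx * ny * nz * vdims * bytes_per_value / 1e9
print(f"{total_gb:.1f} GB")  # ~86.4 GB, larger than 64 GB of RAM
```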
The main reason I use xarray for this analysis is to implement a spatially weighted mean of the magnetisation components over the lateral dimensions, instead of the spatially uniform mean provided by drive.table.data[].values. In particular, I tried xarray chunking, with a Dask virtual cluster client initialised, to obtain a spatially weighted mean with a 2D Gaussian function:
```python
import numpy as np
import micromagneticdata as md

CHUNK_SIZES = {'t': 20, 'x': 750, 'y': 750, 'z': 1, 'vdims': 3}

system = func(parameters)
ds = md.Data(name=system.name)[-1].to_xarray().chunk(CHUNK_SIZES)

# x0, y0, sigmax, sigmay: centre and widths of the 2D Gaussian (defined elsewhere)
weights_2d = np.exp(
    -0.5 * (((ds.x - x0) / sigmax) ** 2 + ((ds.y - y0) / sigmay) ** 2)
).chunk({'x': 750, 'y': 750})

result = ds.weighted(weights_2d).mean(dim=['x', 'y'])
```
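(If I understand correctly, xarray's weighted(...).mean(...) normalises by the sum of the weights over the reduced dimensions, so the Gaussian does not need to be pre-normalised.)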
I checked that my code works when the cell size is increased to 250x250x20 nm. But with 25x25x20 nm cells it overflows, most likely at the step ds = md.Data(name=system.name)[-1].to_xarray().chunk(CHUNK_SIZES), i.e. before any chunked computation starts. So I think chunking alone does not solve this problem.
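As a possible workaround, here is a minimal sketch (not a tested solution) of accumulating the weighted mean one snapshot at a time, instead of materialising the whole DataArray. It assumes the Drive object is iterable over per-step discretisedfield.Field objects and that Field.to_xarray() works per snapshot, so only one snapshot is in memory at a time; x0, y0, sigmax, sigmay are the same Gaussian parameters as above:

```python
import numpy as np
import xarray as xr
import micromagneticdata as md

drive = md.Data(name=system.name)[-1]

# Build the 2D Gaussian weights once, from the coordinates of the first snapshot.
first = drive[0].to_xarray()
weights_2d = np.exp(
    -0.5 * (((first.x - x0) / sigmax) ** 2 + ((first.y - y0) / sigmay) ** 2)
)

# Reduce each snapshot to its weighted lateral mean before loading the next one,
# so memory holds one ~35 MB snapshot at a time rather than the full ~86 GB array.
means = [m.to_xarray().weighted(weights_2d).mean(dim=['x', 'y']) for m in drive]
result = xr.concat(means, dim='t')  # time coordinate could be taken from drive.table
```

This trades the convenience of a single chunked array for a simple loop, but the peak memory footprint then scales with one time step rather than with the whole drive.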