-
Notifications
You must be signed in to change notification settings - Fork 36
Description
Hi @andrewkern. It occurred to me that we ought to think about memory alignment for the SIMD stuff. Buffers for SIMD commonly need to be (or at least ought to be) aligned to a memory boundary that is more stringent than what is provided by malloc() – maybe every 16, 32, or 64 bytes. Right now the buffers that we're throwing at your SIMD code are not explicitly aligned in that manner. (Depending on the platform, they might happen to be, but we're not guaranteeing it.) Unaligned buffers could cause an actual crash (but maybe this is only on older hardware platforms than we care about?); if that is a possibility for any platform that we care about, then we need to check the alignment and drop through to the non-SIMD loop if the pointers aren't adequately aligned. Furthermore, even if a crash doesn't result the performance of the SIMD loop could be significantly lower with unaligned pointers, so we really want to guarantee alignment whenever we can. Doing this might significantly improve your benchmarks, on platforms where the memory alignment isn't already guaranteed by the OS/toolchain.
I will add usage of aligned_alloc() (https://en.cppreference.com/w/c/memory/aligned_alloc) in places where it seems likely to be helpful, such as the allocation of the internal value buffers in EidosValue. What would be great to get from you would be (a) a #define or a function in eidos_simd.h that provides the optimal memory alignment for the particular SIMD present on the platform, which I can pass to aligned_alloc(); (b) the addition of alignment checks in all your SIMD functions that produce a log on misalignment (in #if DEBUG only) so we can see whether alignment is working for various tests; and (c) when an alignment check fails, drop through to the non-SIMD loop, but only on platforms where misalignment would result in a crash – if that is even an issue for any of our target platforms.
Given the potential for crashes, I'm marking this priority for now; if we can determine that crashes are not a risk on any supported platform, then we can downgrade this, since then it is purely a performance optimization.
Thoughts?