From 3ff66492e50039b30777b6f608399d58d608a5f0 Mon Sep 17 00:00:00 2001 From: Sherry Yuan Date: Tue, 15 Feb 2022 14:29:33 -0800 Subject: [PATCH] Add documentation for usm pointers -------------------------------------------- It specifically contains information about: 1. how memories are allocated 2. how is it similar to buffers 3. how is board spec related to it --- docs/usm.md | 168 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 168 insertions(+) create mode 100644 docs/usm.md diff --git a/docs/usm.md b/docs/usm.md new file mode 100644 index 00000000..83242976 --- /dev/null +++ b/docs/usm.md @@ -0,0 +1,168 @@ +# USM Pointers + +Unlike buffers, user has to take care of memory migration, transfer manually. + +## Summary + +1. The memory allocation of usm device pointer are borrowed from buffers. +2. The underlying functionality of usm host and shared allocation are the same (except that they may be in different memory address range). + +## Memory accessibility summary + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameInitial LocationAccessible ByMigratable To
HostHostHostYesHostN/A
Any DeviceYes (perhaps over a bus, such as PCIe)DeviceNo
DeviceSpecific DeviceHostNoHostNo
Specific DeviceYesDeviceN/A
Another DeviceOptionalAnother DeviceNo
SharedHost, or Specific Device, Or UnspecifiedHostYesHostYes
Specific DeviceYesDeviceYes
Another DeviceOptionalAnother DeviceOptional
Shared SystemHostHostYesHostYes
DeviceYesDeviceYes
+ +## Memory allocation implementations +### Device USM Allocation + +User allocate device memories through calling [clDeviceMemAllocINTEL](https://github.com/intel/fpga-runtime-for-opencl/blob/f79a9c250979cb7818ecb95b5e5778ab4263e3c2/src/acl_usm.cpp#L190-L300). This function borrows majority of the functionality of buffers for reserving and allocating memories. Specifically, it calls `clCreateBufferWithPropertiesINTEL` for creating buffer object, and calls `acl_bind_buffer_to_device` to reserve memory for the buffer (The procedure for `acl_bind_buffer_to_device` is documented in [buffer.md](https://github.com/intel/fpga-runtime-for-opencl/blob/main/docs/buffer.md)). + +The function then gives back a wrapper ([acl_usm_allocation_t](https://github.com/intel/fpga-runtime-for-opencl/blob/1264543c0361530f5883e35dc0c9d48ac0fd3653/include/acl.h#L442-L449)) that contains the following information; +1. The buffer object that was used +2. The device this memory is allocated on +3. The address range of this allocation, with same [encoding](https://github.com/intel/fpga-runtime-for-opencl/blob/1264543c0361530f5883e35dc0c9d48ac0fd3653/include/acl.h#L264-L274) as buffer memory's device pointer, queried using [l_get_address_of_writable_copy](https://github.com/intel/fpga-runtime-for-opencl/blob/b08e0af97351718ce0368a9ee507242b35f4929e/src/acl_mem.cpp#L4666-L4722). +4. The alignment, which can only be powers of 2 (In bytes) and no larger than the largest device supported data type `cl_long16`, although MMD's hard limit is 2M (and this was because the max page size is 2M during mmap). + +The last step of creating device usm allocation is to let context [track](https://github.com/intel/fpga-runtime-for-opencl/blob/3f7a228133f92c63be5b04e222f3fc8ff72310e6/include/acl_types.h#L1187) this memory in a list. + +Note: A good analogy for usm device pointer is malloc on device. + +### Device Host and Shared Allocation + +[`board_spec.xml`](https://github.com/intel/fpga-runtime-for-opencl/blob/main/test/board/a10_ref/hardware/a10_ref_small/board_spec.xml) may have the following memory section specified, which contains information about the range of memory available for each `allocation_type`: `host`, `shared`, `device` + +``` + + + + + + + + + + + + +``` + +Both [clSharedMemAllocINTEL](https://github.com/intel/fpga-runtime-for-opencl/blob/f79a9c250979cb7818ecb95b5e5778ab4263e3c2/src/acl_usm.cpp#L302-L457) and [clHostMemAllocINTEL](https://github.com/intel/fpga-runtime-for-opencl/blob/f79a9c250979cb7818ecb95b5e5778ab4263e3c2/src/acl_usm.cpp#L45-L188) lead to the same MMD calls ([aocl_mmd_host_alloc](https://gitlab.devtools.intel.com/OPAE/opencl-bsp/-/blob/master/agilex_f_dk/source/host/ccip_mmd.cpp#L930-1049)) under the hood. +1. If the allocation size is greater than 4K, then MMD uses huge page (`MAP_HUGETLB `) when mapping virtual memory through `mmap`. +2. The mapped memory is a private block of hardware memory not accessible by othe rprocesses. (mmap flags: `MAP_ANONYMOUS`, `MAP_PRIVATE`, `MAP_LOCKED`) +3. The allocated memory may be composed of multiple physical pages that is not physcally contiguous. For more details, read OPAE [shim_vtp.h](https://github.com/OPAE/intel-fpga-bbb/blob/b4093957cd1ebb0359f2f67dd541aa5356d43709/BBB_cci_mpf/sw/include/opae/mpf/shim_vtp.h#L84-L116) +4. The difference between host and shared allocation is host allocation may target all device in current context, where as the shared allocation will always target a single device. + +## Launching kernel with usm pointer arguments + +To bind a specific usm pointer to kernel arguments, user calls [`clSetKernelArgMemPointerINTEL`](https://github.com/intel/fpga-runtime-for-opencl/blob/fc99b92704a466f7dc4d84bd45d465d64d03dbb0/src/acl_kernel.cpp#L842-L1076). This requires the given pointer to satisfy the following requirements: +1. The pointer is [tracked](https://github.com/intel/fpga-runtime-for-opencl/blob/3f7a228133f92c63be5b04e222f3fc8ff72310e6/include/acl_types.h#L1187) in the context in a list. +2. The usm pointer alignment match the expected alignment of the kernel arg +3. The global memory that kernel expect has the same type as the given usm pointer. (eg. whether they are both has `shared` allocation type) +4. Set the kernel argment value as the given pointer, and keep track of the pointer in kernel's `ptr_arg_vector` + +Note: OpenCL compiles assumes unlabeled pointers are device global mem (but this is not true for sycl compilers). + +Unlike buffer where there were additional migration operation inserted right before kernel launch, there is no more additional processing done on the usm pointers after it is set as kernel arg. + + +For more information, see [USM spec](https://github.com/KhronosGroup/OpenCL-Docs/blob/master/extensions/cl_intel_unified_shared_memory.asciidoc) \ No newline at end of file