@lohika-denis-kotov

Description

This PR adds uni_vgatherdps and uni_vscatterdps to simplify writing kernels.
Consider the following usage of these operations:

```cpp
uni_vgatherdps(vmm_load_value0, reg_load_value0_ptr, vmm_delta_idx, sizeof(float), 0, vmm_load_value0_mask);
uni_vgatherdps(vmm_load_value1, reg_load_value1_ptr, vmm_delta_idx, sizeof(float), 0, vmm_load_value1_mask);
uni_vgatherdps(vmm_load_value2, reg_load_value2_ptr, vmm_delta_idx, sizeof(float), 0, vmm_load_value2_mask);
uni_vgatherdps(vmm_load_value3, reg_load_value3_ptr, vmm_delta_idx, sizeof(float), 0, vmm_load_value3_mask);

...

uni_vscatterdps(vmm_store_value0, reg_store_value0_ptr, sizeof(float), 0, vmm_value0, vmm_store_value0_mask);
uni_vscatterdps(vmm_store_value1, reg_store_value1_ptr, sizeof(float), 0, vmm_value1, vmm_store_value1_mask);
uni_vscatterdps(vmm_store_value2, reg_store_value2_ptr, sizeof(float), 0, vmm_value2, vmm_store_value2_mask);
uni_vscatterdps(vmm_store_value3, reg_store_value3_ptr, sizeof(float), 0, vmm_value3, vmm_store_value3_mask);
```
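As a rough illustration of what these calls compute, here is a scalar reference model, not the PR's JIT implementation. It assumes an 8-lane (AVX2-width) vector, an AVX2-style mask in which a lane is active when the sign bit of its mask element is set, and an effective address of `base + index * scale + disp` per lane. The names `ref_gatherdps`, `ref_scatterdps`, `VecF`, `VecI`, and `lane_offset` are hypothetical, and the scatter parameter order is an assumption because the example calls above do not make it unambiguous.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar reference model of the gather/scatter semantics.
// It is NOT the PR's JIT implementation; names and parameter order are assumptions.
constexpr int kLanes = 8;                  // AVX2 ymm width for 32-bit elements
using VecF = std::array<float, kLanes>;
using VecI = std::array<int32_t, kLanes>;

// Byte offset of lane i from the base pointer: index[i] * scale + disp.
inline std::ptrdiff_t lane_offset(const VecI& index, int i, int scale, int disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);  // valid x86 SIB scales
    return static_cast<std::ptrdiff_t>(index[i]) * scale + disp;
}

// Gather: active lanes (sign bit of the mask element set) load a float;
// inactive lanes keep their old value. Each mask element is cleared once loaded.
inline void ref_gatherdps(VecF& dst, const uint8_t* base, const VecI& index,
                          int scale, int disp, VecI& mask) {
    for (int i = 0; i < kLanes; ++i) {
        if (mask[i] < 0) {
            std::memcpy(&dst[i], base + lane_offset(index, i, scale, disp), sizeof(float));
            mask[i] = 0;
        }
    }
}

// Scatter: active lanes store src[i]. Lanes are processed from lowest to highest,
// so when two active lanes hit the same address the higher lane wins.
inline void ref_scatterdps(uint8_t* base, const VecI& index, int scale, int disp,
                           const VecF& src, VecI& mask) {
    for (int i = 0; i < kLanes; ++i) {
        if (mask[i] < 0) {
            std::memcpy(base + lane_offset(index, i, scale, disp), &src[i], sizeof(float));
            mask[i] = 0;
        }
    }
}
```

Under these assumptions, `uni_vgatherdps(vmm_load_value0, reg_load_value0_ptr, vmm_delta_idx, sizeof(float), 0, vmm_load_value0_mask)` corresponds to `ref_gatherdps(dst, base, idx, 4, 0, mask)`. A code path for ISAs without a hardware gather would have to unroll essentially this per-lane loop, which is why such a helper needs temporary registers.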

@lohika-denis-kotov
Author

@dmitry-gorokhov Please take a look at this PR and assign reviewers.

```cpp
const size_t kDataTypeSize = sizeof(float);
if (is_valid_isa(cpu_isa_t::avx512_core)) {
    assert(reg_mask.isOPMASK());
    vgatherdps(xmm_val, ptr[reg_addr + xmm_index * scale + disp]);
```

@avoskoboinyk-lohika Sep 8, 2022

Shouldn't we apply reg_mask here, in addition to xmm_val in vgatherdps()?

Author

No, according to the documentation, reg_mask is applied implicitly; see this section:
https://www.felixcloutier.com/x86/vgatherdps:vgatherdpd#instruction-operand-encoding
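To make the opmask behaviour concrete: in the EVEX form listed in the linked reference, `VGATHERDPS zmm1 {k1}, vm32z`, the opmask is attached to the destination operand rather than passed as a separate source (in Xbyak this is normally written with `|`, e.g. `zmm_val | k1`), and the instruction also writes it back, clearing each bit once the corresponding element is loaded. The scalar model below is a sketch under simplifying assumptions (16 lanes, scale equal to `sizeof(float)`, zero displacement); `model_evex_gatherdps`, `Zmm`, and `ZmmIdx` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of EVEX "VGATHERDPS zmm1 {k1}, vm32z" with 16 lanes.
// Simplifications: scale == sizeof(float) and disp == 0, so lane i reads base[index[i]].
constexpr int kZmmLanes = 16;
using Zmm = std::array<float, kZmmLanes>;
using ZmmIdx = std::array<int32_t, kZmmLanes>;

// k is both an input (which lanes to load) and an output: the bit of every
// loaded lane is cleared, so a completed gather leaves k == 0. Lanes whose bit
// was 0 keep their previous dst value (merge-masking).
inline void model_evex_gatherdps(Zmm& dst, uint16_t& k, const float* base, const ZmmIdx& index) {
    for (int i = 0; i < kZmmLanes; ++i) {
        const uint16_t bit = static_cast<uint16_t>(1u << i);
        if (k & bit) {
            dst[i] = base[index[i]];
            k = static_cast<uint16_t>(k & ~bit);
        }
    }
}
```

Because the instruction consumes the mask, a kernel that reuses one mask for several gathers has to copy it back into the opmask register before each gather.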

@ceciliapeng2011
Collaborator

If I understand correctly, we are going to do a oneDNN reduction, i.e. move functions from oneDNN to OpenVINO. @dmitry-gorokhov, could you please give some guidelines on these custom instructions?

@dmitry-gorokhov

dmitry-gorokhov commented Sep 12, 2022

I am OK with merging simple uni_* mnemonics here, since they serve as pure wrappers that map a register type to the correct instruction. Once a base generator class for all CPU plugin JIT kernels is introduced, we will be able to move all these custom mnemonics into the CPU plugin codebase.
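The "pure wrapper" idea can be illustrated with a toy generator that records mnemonics as strings instead of emitting machine code. `MockGenerator`, `Isa`, and this `uni_vaddps` are hypothetical stand-ins, not the oneDNN or OpenVINO API; the point of the sketch is that such a mnemonic only selects an encoding for the target ISA and needs neither an emulation loop nor scratch registers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mock, not the oneDNN/OpenVINO API: instead of emitting machine
// code, the generator records the chosen mnemonic as text.
enum class Isa { sse41, avx2, avx512_core };

struct MockGenerator {
    Isa isa;
    std::vector<std::string> emitted;

    // A "pure wrapper" uni_* mnemonic: it only maps the target ISA/register type
    // to one encoding. No emulation loop and no scratch registers are involved.
    void uni_vaddps(const std::string& dst, const std::string& a, const std::string& b) {
        if (isa == Isa::sse41) {
            // Legacy SSE encoding is destructive (dst += src), so this sketch
            // requires dst and the first source to be the same register.
            assert(dst == a);
            emitted.push_back("addps " + dst + ", " + b);
        } else {
            // VEX (AVX2) and EVEX (AVX-512) have a non-destructive 3-operand form.
            emitted.push_back("vaddps " + dst + ", " + a + ", " + b);
        }
    }
};
```

By contrast, a gather on an ISA without a gather instruction expands into a per-lane sequence that needs temporary registers, which does not fit this one-instruction mapping.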

This case is different. The gather/scatter mnemonics are complex and contain complicated logic for emulation on legacy ISAs and for register allocation. My recommendation for such cases is to implement the logic in the jit_emitter hierarchy and put it into the CPU plugin codebase. You can use the Load Emitter as an example, since it is widely used in the plugin.
An additional advantage is that the emitter solves the task of auxiliary register allocation: all you need to do is specify the required number of vector or general-purpose registers and then use them in the implementation. The preamble/postamble logic guarantees that the state of the allocated registers is preserved, so an emitter can be safely called inside a JIT kernel.
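The register-allocation contract described here can be sketched as a small simulation. This is a hypothetical model, not OpenVINO's actual jit_emitter interface: registers are simulated as an array of ints, the emitter reports how many auxiliary vector registers it needs through a hypothetical `num_aux_vecs()`, and a hypothetical `emit_code` harness picks registers that are not inputs or outputs, saves them (preamble), runs the body, and restores them (postamble).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model, NOT OpenVINO's jit_emitter API: 16 simulated vector registers.
using RegFile = std::array<int, 16>;

struct AddWithScratchEmitter {
    // The emitter only declares how many scratch vector registers it needs.
    std::size_t num_aux_vecs() const { return 2; }

    // Body: out = in0 + in1, computed through two scratch registers it clobbers.
    void emit_impl(RegFile& regs, std::size_t in0, std::size_t in1, std::size_t out,
                   const std::vector<std::size_t>& aux) const {
        regs[aux[0]] = regs[in0];
        regs[aux[1]] = regs[in1];
        regs[out] = regs[aux[0]] + regs[aux[1]];
    }
};

// Harness: allocate aux registers outside the inputs/outputs, then
// preamble (save), body, postamble (restore) so the caller's state survives.
template <typename Emitter>
void emit_code(const Emitter& e, RegFile& regs, std::size_t in0, std::size_t in1, std::size_t out) {
    const std::set<std::size_t> busy{in0, in1, out};
    std::vector<std::size_t> aux;
    for (std::size_t r = 0; r < regs.size() && aux.size() < e.num_aux_vecs(); ++r)
        if (!busy.count(r)) aux.push_back(r);
    assert(aux.size() == e.num_aux_vecs());

    std::vector<int> saved;  // preamble: save the aux registers
    for (std::size_t r : aux) saved.push_back(regs[r]);
    e.emit_impl(regs, in0, in1, out, aux);
    for (std::size_t i = 0; i < aux.size(); ++i) regs[aux[i]] = saved[i];  // postamble
}
```

In a real JIT kernel the save and restore would be emitted as instructions (and could be skipped for registers known to be free); the model only shows why a caller can treat the emitter as having no side effects beyond its output register.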

@ceciliapeng2011
Collaborator

@avoskoboinyk-lohika @lohika-denis-kotov Do you have more questions about this recommendation?

@lohika-denis-kotov
Author

I am closing this PR.
I moved all parts of the implementation into 2 separate PRs:
openvinotoolkit/openvino#12991
#148

5 participants