
Conversation

@yuanhu2435

TensorpoolMKLAllocator combines tensorpool_allocator and mkl_allocator to improve allocator performance for both small-size and large-size memory allocations.
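The dispatch code itself is not shown in this comment, so the following is a minimal sketch of the idea as described: a wrapper that routes each request to one of the two underlying allocators based on a size threshold (the TENSORPOOL_MKL_LARGE_SIZE threshold described further down). All class and method names here are hypothetical, and the direction of the split (small requests to the pool, large requests to MKL) is inferred from the variable name rather than confirmed by the PR.

```c++
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for the two allocators being combined; in the
// repo these correspond to tensorpool_allocator and mkl_allocator.
struct TensorPoolBackend {
  void* Allocate(std::size_t n) { return std::malloc(n); }  // pooled in reality
  void Deallocate(void* p) { std::free(p); }
};
struct MklBackend {
  void* Allocate(std::size_t n) { return std::malloc(n); }  // MKL-backed in reality
  void Deallocate(void* p) { std::free(p); }
};

// Sketch of the combined allocator: requests at or above large_size_ take
// the MKL path, everything smaller stays in the tensor pool.
class CombinedAllocatorSketch {
 public:
  explicit CombinedAllocatorSketch(std::size_t large_size)
      : large_size_(large_size) {}

  void* Allocate(std::size_t num_bytes) {
    return num_bytes >= large_size_ ? mkl_.Allocate(num_bytes)
                                    : pool_.Allocate(num_bytes);
  }

  // The size is passed in again so the sketch can route the free to the
  // owning backend; a real implementation would track ownership instead.
  void Deallocate(void* ptr, std::size_t num_bytes) {
    if (num_bytes >= large_size_) {
      mkl_.Deallocate(ptr);
    } else {
      pool_.Deallocate(ptr);
    }
  }

 private:
  std::size_t large_size_;  // e.g. resolved from TENSORPOOL_MKL_LARGE_SIZE
  TensorPoolBackend pool_;
  MklBackend mkl_;
};
```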

To confirm that our allocator is effective, we tested it on the "shoucai" model.

Test env: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz, 1 socket, 32 cores, frequency locked to 2.6GHz
Command line:

```
$ python3 graph_runner.py --input-graph=sub_graph_external.pbtxt --input-data=placeholder_dump.json
```

Result (shoucai model)

| Allocator | loop | process_num | latency(1) | latency(2) | latency(3) | latency(4) | latency(5) | avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TensorPoolAllocator | 1000 | 1 | 94.84 | 86.32 | 90.45 | 86.47 | 91.89 | 90.00 |
| TensorPoolAllocator | 1000 | 5 | 342.78 | 324.96 | 344.92 | 337.50 | 339.84 | 338.00 |
| TensorPoolAllocator | 1000 | 10 | 716.42 | 722.26 | 716.44 | 709.10 | 720.14 | 716.87 |
| MKLAllocator | 1000 | 1 | 59.56 | 58.15 | 58.80 | 59.82 | 60.03 | 59.27 |
| MKLAllocator | 1000 | 5 | 324.29 | 322.76 | 321.69 | 326.25 | 324.64 | 323.93 |
| MKLAllocator | 1000 | 10 | 701.34 | 700.69 | 701.10 | 701.55 | 701.09 | 701.15 |
| TensorpoolMKLAllocator | 1000 | 1 | 61.27 | 58.49 | 57.48 | 58.51 | 61.36 | 59.42 |
| TensorpoolMKLAllocator | 1000 | 5 | 327.43 | 327.40 | 327.32 | 326.21 | 326.87 | 327.05 |
| TensorpoolMKLAllocator | 1000 | 10 | 707.21 | 707.69 | 706.83 | 708.57 | 708.25 | 707.71 |


Add an environment variable "TENSORPOOL_MKL_LARGE_SIZE" to set the large-size threshold. It defaults to 512K.
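As a rough illustration of how that threshold could be resolved, here is a minimal sketch of reading the variable with the 512K default. Treating the value as a plain byte count is an assumption, since the PR does not state the accepted format, and `LargeSizeThreshold` is a hypothetical helper name.

```c++
#include <cstddef>
#include <cstdlib>

// Sketch only: resolve the large-size threshold from the
// TENSORPOOL_MKL_LARGE_SIZE environment variable, falling back to the
// 512K default described in the PR. Treating the value as a plain byte
// count is an assumption.
std::size_t LargeSizeThreshold() {
  constexpr std::size_t kDefault = 512 * 1024;  // 512K
  const char* env = std::getenv("TENSORPOOL_MKL_LARGE_SIZE");
  if (env == nullptr || *env == '\0') return kDefault;
  char* end = nullptr;
  unsigned long long value = std::strtoull(env, &end, 10);
  // Fall back to the default on empty, non-numeric, or zero values.
  if (end == env || *end != '\0' || value == 0) return kDefault;
  return static_cast<std::size_t>(value);
}
```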

Signed-off-by: Lin Xie <lin.xie@intel.com>
Signed-off-by: Yuan Hu <yuan1.hu@intel.com>
@changqi1 (Owner) commented Apr 20, 2022

@yuanhu2435 Thanks. I got your perf data from the shoucai model, but I can't see the perf differences between small-size and large-size allocations. Did you test shoucai model perf with both small and large batch sizes?

From the table, MKLAllocator's latency is the lowest, not TensorpoolMKLAllocator's. So would you please show us both situations: one where TensorpoolMKLAllocator latency == MKLAllocator latency < TensorPoolAllocator latency, and one where TensorpoolMKLAllocator latency == TensorPoolAllocator latency < MKLAllocator latency?

@pujiang2018

I think we need to collect more perf data from more models, since this is a fundamental change.
