Internal Implementation
Cumo started as a fork of Numo, so this design document largely describes how Numo is implemented. Reading the Numo Wiki is also recommended.
Numo uses ERB to templatize its C code. In each template, such as ext/cumo/narray/gen/tmpl/binary.c, the following ERB variables are available:
- dtype
- c_func
- c_iter
- type_name
- and others
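Numo performs this per-dtype expansion with ERB, not the C preprocessor, but the idea can be sketched in plain C: one "template" is stamped out once per dtype. In the sketch below, the macro `DEFINE_ADD_LOOP` and the generated functions `add_dfloat`, `add_sfloat` and `add_int32` are hypothetical. The macro parameters are only loosely modeled on the ERB variables of the same names (we assume `type_name` is the Numo type name and `dtype` the C element type); the real templates and their function signatures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ERB template. Each expansion defines a
 * 1-D strided element-wise add for one dtype (strides are in bytes). */
#define DEFINE_ADD_LOOP(type_name, dtype)                                 \
    static void add_##type_name(size_t n,                                 \
                                const char *p1, ptrdiff_t s1,             \
                                const char *p2, ptrdiff_t s2,             \
                                char *p3, ptrdiff_t s3)                   \
    {                                                                     \
        assert(n == 0 || (p1 != NULL && p2 != NULL && p3 != NULL));       \
        for (size_t i = 0; i < n; i++) {                                  \
            *(dtype *)p3 = *(const dtype *)p1 + *(const dtype *)p2;       \
            p1 += s1;                                                     \
            p2 += s2;                                                     \
            p3 += s3;                                                     \
        }                                                                 \
    }

/* "Code generation": one expansion per dtype, analogous to cogen.rb
 * rendering a template once for each def/{dtype}.rb. */
DEFINE_ADD_LOOP(dfloat, double)
DEFINE_ADD_LOOP(sfloat, float)
DEFINE_ADD_LOOP(int32, int32_t)
```

Keeping the element type out of the loop body is what lets a single template serve every dtype; only the cast targets change between expansions.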
cogen.rb is a Ruby script that generates C source code for the various dtypes from the ERB templates. erbpp2.rb is a library it uses.
gen/spec.rb is the configuration file for cogen.rb. It specifies which template files under the tmpl directory are read to generate C source code, so you need to modify gen/spec.rb when you add a new method.
def/{dtype}.rb, such as def/dfloat.rb, is also a configuration file; it defines the parameters for each dtype.
ndloop is probably the most important concept in Numo.
Consider element-wise addition of two N-dimensional arrays.
Typically, each implementation, such as ext/cumo/narray/gen/tmpl/binary.c, supports only one-dimensional computation.
To apply this one-dimensional computation to N-dimensional arrays, ndloop:
- Extracts the memory for the innermost dimension.
- Applies the implementation (called the user loop) to that innermost dimension.
- Repeats the above for every index of the outer N-1 dimensions.
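A heavily simplified sketch of the ndloop idea, assuming plain host memory, byte strides, and a binary operation on doubles. `nd_loop_binary`, `add_double_1d`, `user_loop_fn` and `MAX_NDIM` are hypothetical names, not Numo's or Cumo's API; the real ndloop also deals with broadcasting, buffering and more. An odometer over the outer N-1 dimensions computes the start of each innermost row and passes it to the 1-D user loop.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NDIM 8

/* Signature of a 1-D user loop: n elements, one byte stride per operand. */
typedef void (*user_loop_fn)(size_t n,
                             const char *p1, ptrdiff_t s1,
                             const char *p2, ptrdiff_t s2,
                             char *p3, ptrdiff_t s3);

/* Example user loop: element-wise double addition along one dimension. */
static void add_double_1d(size_t n, const char *p1, ptrdiff_t s1,
                          const char *p2, ptrdiff_t s2,
                          char *p3, ptrdiff_t s3)
{
    for (size_t i = 0; i < n; i++) {
        *(double *)p3 = *(const double *)p1 + *(const double *)p2;
        p1 += s1; p2 += s2; p3 += s3;
    }
}

/* Simplified ndloop: call the user loop once per innermost row. */
static void nd_loop_binary(int ndim, const size_t *shape,
                           const char *a, const ptrdiff_t *sa,
                           const char *b, const ptrdiff_t *sb,
                           char *c, const ptrdiff_t *sc,
                           user_loop_fn loop)
{
    assert(ndim >= 1 && ndim <= MAX_NDIM);
    size_t idx[MAX_NDIM] = {0};  /* odometer over the outer dimensions */
    int inner = ndim - 1;

    for (int d = 0; d < inner; d++)
        if (shape[d] == 0)
            return;              /* empty array: nothing to do */

    for (;;) {
        /* Start address of the current innermost row for each operand. */
        const char *pa = a, *pb = b;
        char *pc = c;
        for (int d = 0; d < inner; d++) {
            pa += (ptrdiff_t)idx[d] * sa[d];
            pb += (ptrdiff_t)idx[d] * sb[d];
            pc += (ptrdiff_t)idx[d] * sc[d];
        }
        loop(shape[inner], pa, sa[inner], pb, sb[inner], pc, sc[inner]);

        /* Advance the odometer; the last outer dimension changes fastest. */
        int d = inner - 1;
        while (d >= 0 && ++idx[d] == shape[d]) {
            idx[d] = 0;
            d--;
        }
        if (d < 0)
            break;
    }
}
```

Because strides are arbitrary, the same driver handles views: a contiguous 2x3 double array read with shape {3, 2} and strides {8, 24} is walked as its transpose. The cost is one user-loop call, plus the address arithmetic, per innermost row.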
TODO: ndloop is a performance bottleneck, especially in Cumo. We are planning to stop using ndloop, but this is a large piece of work and is not yet done.
Some routines support idx (advanced indexing), but others do not.
Buffering is a mechanism for calling routines that do not support advanced indexing:
- Copy the non-contiguous (advanced-indexed) memory into contiguous memory.
- Call the routine.
- Copy the results back.
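The three buffering steps can be sketched as follows, assuming a one-dimensional index array and a routine that accepts only contiguous doubles. `apply_with_buffer`, `contig_fn` and `scale_by_two` are hypothetical names chosen for illustration, not part of Numo or Cumo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A routine that only understands contiguous memory (hypothetical). */
typedef void (*contig_fn)(size_t n, double *data);

static void scale_by_two(size_t n, double *data)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= 2.0;
}

/* Apply fn to base[idx[0]], ..., base[idx[n-1]] via a temporary buffer.
 * Returns 0 on success and -1 if the buffer cannot be allocated. */
static int apply_with_buffer(double *base, const size_t *idx, size_t n,
                             contig_fn fn)
{
    if (n == 0)
        return 0;
    assert(base != NULL && idx != NULL && fn != NULL);

    double *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;

    /* 1. Copy the non-contiguous (indexed) elements into contiguous memory. */
    for (size_t i = 0; i < n; i++)
        buf[i] = base[idx[i]];

    /* 2. Call the routine on the contiguous buffer. */
    fn(n, buf);

    /* 3. Copy the results back to their original locations. */
    for (size_t i = 0; i < n; i++)
        base[idx[i]] = buf[i];

    free(buf);
    return 0;
}
```

The extra copies in and out are the price of reusing a contiguous-only routine; elements not named in the index array are left untouched.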
TODO: We are planning to drop idx (advanced indexing) because it makes the implementation complex.
contract_loops compacts dimensions: adjacent dimensions that can be traversed as one are merged into a single dimension.
Example 1:
Given Shape{2, 3}, all contiguous strides => Shape{6} and Axes{1}.
Example 2:
Given Shape{3, 2, 1, 2} with strides in which the first dimension is padded => Shape{3, 4} and Axes{0, 3}.
This reduces the number of loop levels and speeds up computation, mainly because the per-dimension index and pointer increments (i++, p++) of the merged dimensions are eliminated.
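A minimal sketch of dimension compaction for a single strided array. Adjacent dimensions d and d+1 can be merged when stride[d] == shape[d+1] * stride[d+1], because one step along d then lands exactly where shape[d+1] steps along d+1 would. The function name mirrors contract_loops, but this signature is hypothetical, and we assume Axes records the last original axis of each merged group, which matches both examples. With several operands, the merge condition would have to hold for every one of them.

```c
#include <assert.h>
#include <stddef.h>

/* Merge adjacent dimensions of one strided array (byte strides).
 * out_shape, out_stride and out_axes must have room for ndim entries.
 * Returns the number of dimensions after compaction. */
static int contract_loops(int ndim, const size_t *shape, const ptrdiff_t *stride,
                          size_t *out_shape, ptrdiff_t *out_stride, int *out_axes)
{
    assert(ndim >= 1);
    int k = 0;                       /* index of the current merged group */
    out_shape[0] = shape[0];
    out_stride[0] = stride[0];
    out_axes[0] = 0;

    for (int d = 1; d < ndim; d++) {
        if (out_stride[k] == (ptrdiff_t)shape[d] * stride[d]) {
            /* Dimension d continues the current group: fold it in. */
            out_shape[k] *= shape[d];
            out_stride[k] = stride[d];
        } else {
            /* Layout breaks here (e.g. padding): start a new group. */
            k++;
            out_shape[k] = shape[d];
            out_stride[k] = stride[d];
        }
        out_axes[k] = d;             /* last original axis in group k */
    }
    return k + 1;
}
```

For Example 2 with doubles, strides {64, 16, 16, 8} (the first dimension padded from 32 to 64 bytes) compact to Shape{3, 4}, Axes{0, 3} and strides {64, 8}; for Example 1, strides {24, 8} compact to Shape{6} with stride 8.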