
Internal Implementation

Naotoshi Seo edited this page Mar 3, 2019 · 5 revisions

Cumo started as a fork of Numo, so this design document essentially describes how Numo is implemented. Reading the Numo wiki as well is recommended.

How Numo supports multiple dtypes

Numo uses ERB to templatize its C code. In each template, such as ext/cumo/narray/gen/tmpl/binary.c, you can use

  • dtype
  • c_func
  • c_iter
  • type_name
  • and others

as ERB variables.
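As a rough illustration of how such variables are substituted, the following Ruby sketch renders a tiny ERB template. The template text, the helper name render_add, and the values bound to dtype, c_func, c_iter, and type_name are all made up for this example; they are not copied from binary.c or from Numo's generator.

```ruby
require "erb"

# Hypothetical, heavily simplified template; the real templates live under gen/tmpl/.
TEMPLATE = <<~'TMPL'
  static void
  <%= c_iter %>(<%= dtype %> *a, <%= dtype %> *b, <%= dtype %> *out, size_t n)
  {
      for (size_t i = 0; i < n; i++) {
          out[i] = a[i] + b[i];
      }
  }
  /* <%= c_func %> would wrap <%= c_iter %> for <%= type_name %> */
TMPL

# Renders the template with the given variables (illustrative values only).
def render_add(vars)
  ERB.new(TEMPLATE).result_with_hash(vars)
end

puts render_add(dtype: "double", c_func: "dfloat_add",
                c_iter: "iter_dfloat_add", type_name: "dfloat")
```

The same template text yields a different C function for each set of variables, which is the point of templatizing the per-dtype code.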

cogen.rb is a Ruby script that generates the C source code for the various dtypes from the ERB templates. erbpp2.rb is its supporting library.

gen/spec.rb is a configuration file for cogen.rb. It describes which template files under the tmpl directory are read to generate the C source code. You need to modify gen/spec.rb when you add a new routine method.

def/{dtype}.rb, such as def/dfloat.rb, is also a configuration file; it defines the parameters for each dtype.
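The division of labor between these files can be pictured with the miniature generator below. It is only a sketch under stated assumptions: DTYPE_DEFS, SPEC, TEMPLATES, and generate_source are hypothetical stand-ins for def/{dtype}.rb, gen/spec.rb, the tmpl directory, and cogen.rb respectively, and the real files are considerably richer.

```ruby
require "erb"

# Hypothetical stand-ins for def/{dtype}.rb: one parameter set per dtype.
DTYPE_DEFS = {
  "dfloat" => { type_name: "dfloat", dtype: "double" },
  "int32"  => { type_name: "int32",  dtype: "int32_t" },
}.freeze

# Hypothetical stand-in for gen/spec.rb: which templates to expand.
SPEC = %w[add sub].freeze

# Hypothetical stand-ins for the files under tmpl/.
TEMPLATES = {
  "add" => "<%= dtype %> <%= type_name %>_add(<%= dtype %> x, <%= dtype %> y) { return x + y; }",
  "sub" => "<%= dtype %> <%= type_name %>_sub(<%= dtype %> x, <%= dtype %> y) { return x - y; }",
}.freeze

# Expands every template listed in SPEC for one dtype and joins the results.
def generate_source(type_name)
  params = DTYPE_DEFS.fetch(type_name)
  SPEC.map { |name| ERB.new(TEMPLATES.fetch(name)).result_with_hash(params) }.join("\n")
end

puts generate_source("dfloat")
```

In this picture, adding a routine means adding a template and listing it in the spec, while adding a dtype means adding one more parameter set.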

NDLOOP

ndloop is probably the most important concept in Numo.

Consider element-wise addition of two N-dimensional arrays.

Typically, each implementation like ext/cumo/narray/gen/tmpl/binary.c supports only one-dimensional computation.

To apply this one-dimensional computation to N-dimensional arrays, ndloop does the following:

  1. Extracts the memory for the innermost dimension.
  2. Applies the implementation to that innermost dimension (this is called the user loop).
  3. Repeats the above over the remaining N-1 dimensions.
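The three steps above can be sketched in Ruby as follows. This is not the code in ndloop.c, which is written in C and handles many more cases; the names ndloop_sketch, contiguous_strides, and add_nd are hypothetical, arrays are modeled as flat Ruby arrays, strides are counted in elements rather than bytes, and every dimension is assumed to have at least one element.

```ruby
# Row-major strides (in elements) for a given shape.
def contiguous_strides(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) { |d| strides[d] = strides[d + 1] * shape[d + 1] }
  strides
end

# Calls the block (the "user loop") once per innermost row.
# strides_list holds one stride array per operand.
def ndloop_sketch(shape, strides_list)
  outer_shape = shape[0...-1]
  inner_n = shape[-1]
  inner_steps = strides_list.map(&:last)
  index = Array.new(outer_shape.size, 0)
  loop do
    # 1. Locate the start of the innermost dimension for each operand.
    offsets = strides_list.map do |strides|
      index.each_with_index.sum { |idx, d| idx * strides[d] }
    end
    # 2. Run the one-dimensional user loop on that row.
    yield offsets, inner_steps, inner_n
    # 3. Advance the outer index like an odometer; stop after the last row.
    dim = outer_shape.size - 1
    while dim >= 0
      index[dim] += 1
      break if index[dim] < outer_shape[dim]
      index[dim] = 0
      dim -= 1
    end
    break if dim < 0
  end
end

# Element-wise addition built on top of the sketch.
def add_nd(a, b, shape, a_strides, b_strides)
  out = Array.new(shape.inject(1, :*), 0)
  operands = [a_strides, b_strides, contiguous_strides(shape)]
  ndloop_sketch(shape, operands) do |(a_pos, b_pos, o_pos), (a_step, b_step, o_step), n|
    # The user loop itself only knows about one dimension.
    n.times { |i| out[o_pos + i * o_step] = a[a_pos + i * a_step] + b[b_pos + i * b_step] }
  end
  out
end
```

The user loop never sees the outer dimensions: it receives only a start offset, a step, and a length for each operand, which is why a single one-dimensional implementation can serve arrays of any rank.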

TODO: ndloop is a performance bottleneck, especially in Cumo. We plan to stop using ndloop, but that is a lot of work and is not yet done.

buffering

https://github.com/ruby-numo/numo-narray/blob/6f9248d10f1dedcfbe54b5659887ca11c14dd5fd/ext/numo/narray/ndloop.c#L1283

Some routines support idx (advanced indexing), but others do not. Buffering is a mechanism for calling routines that do not support advanced indexing. It works as follows:

  1. Copies the non-contiguous (advanced-indexed) memory into contiguous memory.
  2. Calls the routine.
  3. Copies the result back.
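A minimal Ruby sketch of this gather, call, scatter pattern is shown below. It only illustrates the idea and does not mirror the C code linked above; call_with_buffer and contiguous_double! are hypothetical names, and idx is modeled as a plain array of element positions.

```ruby
# A routine that only understands contiguous input: doubles every element in place.
def contiguous_double!(buf)
  buf.each_index { |i| buf[i] *= 2 }
end

# Runs a contiguous-only routine (the block) on the elements selected by idx.
def call_with_buffer(data, idx)
  buffer = idx.map { |i| data[i] }                    # 1. copy into contiguous memory
  yield buffer                                        # 2. call the routine
  idx.each_with_index { |i, k| data[i] = buffer[k] }  # 3. copy the result back
  data
end

data = [1, 2, 3, 4, 5]
call_with_buffer(data, [4, 0, 2]) { |buf| contiguous_double!(buf) }
p data
```

The routine stays simple at the cost of two extra copies per call.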

TODO: We plan to drop idx (advanced indexing) because it makes the implementation complex.

contract_loop

https://github.com/ruby-numo/numo-narray/blob/7bba089c3114ff08d110e47ba8bc448aa775f0d4/ext/numo/narray/ndloop.c#L1274

contract_loop compacts dimensions: adjacent dimensions that can be traversed as a single one are merged.

Example 1:

Given Shape{2, 3}, all contiguous strides => Shape{6} and Axes{1}.

Example 2:

Given Shape{3, 2, 1, 2}, strides with padded first dimension => Shape{3, 4} and Axes{0, 3}.
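Both examples can be reproduced with the small Ruby sketch below. It is an illustration only, not the logic of the C function linked above: contract_dims is a hypothetical name, strides are counted in elements, size-1 dimensions are simply dropped, and the padded first-dimension stride of 8 in the second call is an assumed value (any stride other than the contiguous 4 gives the same result).

```ruby
# Merges adjacent dimensions that can be walked as one.
# Returns [new_shape, axes], where axes lists the original axis that
# ends each merged group.
def contract_dims(shape, strides)
  new_shape = []
  new_strides = []
  axes = []
  shape.each_index do |axis|
    next if shape[axis] == 1 # a size-1 dimension needs no loop of its own
    if !axes.empty? && new_strides.last == strides[axis] * shape[axis]
      # The previous dimension steps exactly over this one: merge them.
      new_shape[-1] *= shape[axis]
      new_strides[-1] = strides[axis]
      axes[-1] = axis
    else
      new_shape << shape[axis]
      new_strides << strides[axis]
      axes << axis
    end
  end
  [new_shape, axes]
end

p contract_dims([2, 3], [3, 1])             # Example 1
p contract_dims([3, 2, 1, 2], [8, 2, 2, 1]) # Example 2, first dimension padded
```

Two dimensions merge only when the outer stride equals the inner stride times the inner length, which is exactly the condition for one flat loop to visit the same elements in the same order.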

This reduces the number of loops and makes computation faster, in particular because the i++ and p++ increments of the merged loops are eliminated.
