Description
This task is to implement seq2seq in pure Ruby, similar to Fairseq.
Seq2seq is a family of machine learning approaches for natural language processing, originally developed by Google. Fairseq, Facebook's implementation, is written in Python.
Because the Interscript framework is pure Ruby, it can be compiled into JavaScript via Opal.
We wish to extend Interscript to utilize a Ruby version of seq2seq in order to implement the Khmer transliteration system as described in these links:
- https://viblo.asia/p/nlp-khmer-word-segmentation-YWOZrgNNlQ0
- https://viblo.asia/p/nlp-khmer-romanization-using-seq2seq-m68Z07OQKkG
Ruby seq2seq requirements:
- The developer can train models and provide the trained models to users. To train models, low-level compute APIs and their bindings (e.g. OpenCL) can be used.
- Users of the library in Ruby who only want to "use" the trained models should not require special bindings to run them.
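To illustrate the second requirement, here is a minimal sketch of inference that needs only the Ruby standard library. It assumes the training side exports weights as JSON. The `TinyModel` class, the `MODEL_JSON` layout, and the single dense layer are all hypothetical and stand in for a real seq2seq decoder; none of this is Fairseq's or Interscript's actual format.

```ruby
require "json"

# Hypothetical export format: one dense layer serialized as JSON by the
# training side. A real seq2seq model would carry many more tensors.
MODEL_JSON = <<~JSON
  {
    "vocab": ["a", "b", "c"],
    "weights": [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
    "bias": [0.0, 0.0, 0.0]
  }
JSON

# Inference-only model built from plain Arrays and Floats, so it needs no
# native extensions or GPU bindings.
class TinyModel
  attr_reader :vocab

  def initialize(json)
    data = JSON.parse(json)
    @vocab = data.fetch("vocab")
    @weights = data.fetch("weights")
    @bias = data.fetch("bias")
  end

  # Dense layer followed by softmax; returns one probability per vocab entry.
  def predict(input)
    logits = @weights.each_with_index.map do |row, i|
      row.zip(input).sum { |w, x| w * x } + @bias[i]
    end
    softmax(logits)
  end

  # Greedy choice: the vocab entry with the highest probability.
  def best_token(input)
    probs = predict(input)
    index = probs.each_with_index.max_by { |prob, _| prob }.last
    @vocab[index]
  end

  private

  # Numerically stable softmax (subtracts the max logit before exponentiating).
  def softmax(logits)
    max = logits.max
    exps = logits.map { |v| Math.exp(v - max) }
    total = exps.sum
    exps.map { |e| e / total }
  end
end

model = TinyModel.new(MODEL_JSON)
puts model.best_token([0.0, 1.0, 0.0]) # prints "b"
```

Because the sketch uses only core classes and `json`, the same code path is a plausible candidate for compilation via Opal, though that has not been verified here.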
Future work (new issues):
- Integrate seq2seq into Interscript Ruby.
- Implement the Khmer transliteration system as described above.
Reference work and potential dependencies
- https://en.wikipedia.org/wiki/Seq2seq
- https://blog.chezo.uno/ruby-for-data-science-and-machine-learning-9f03e99125e0
- https://red-data-tools.github.io
- https://github.com/red-data-tools/red-chainer
- https://github.com/yoshoku/rumale
- https://dev.to/kojix2/easy-machine-learning-with-ruby-using-svmkit-4n86
- https://rubykaigi.org/2018/presentations/mrkn.html
- https://github.com/jedld/tensor_stream
- https://github.com/ankane/torch.rb
- https://github.com/mrkn/mxnet.rb