Skip to content

rmraya/HybridTM

Repository files navigation

HybridTM

HybridTM is a TypeScript translation memory engine that stores bilingual content in LanceDB and scores matches by combining semantic embeddings (Xenova/Transformers.js) with the built-in MatchQuality fuzzy metric.

Highlights

  • Imports XLIFF 2.x, TMX 1.4b, and SDLTM files, preserving metadata, notes, and custom properties
  • Generates semantic vectors with any Xenova-compatible text model (default: HybridTM.QUALITY_MODEL, LaBSE)
  • Provides semanticTranslationSearch, semanticSearch, and concordanceSearch APIs with metadata-aware filtering
  • Streams data into LanceDB through a JSONL-based batch importer to keep memory usage predictable
  • Prevents duplicate segments by rewriting entries with deterministic IDs (fileId:unitId:segmentIndex:lang)

Models download automatically the first time you initialize an instance and are cached in the standard Hugging Face directory.

Requirements

  • Node.js 22 LTS or later
  • npm 11+
  • Disk space for both the LanceDB directory you choose and the embedding model cache

Installation

npm install hybridtm

Quick start

import path from 'node:path';
import { HybridTM, HybridTMFactory, Utils } from 'hybridtm';

const INSTANCE_NAME = 'docs-basic';
const DB_PATH = path.resolve('.hybridtm', INSTANCE_NAME + '.lancedb');

function getOrCreateTM(): HybridTM {
  return HybridTMFactory.getInstance(INSTANCE_NAME)
    ?? HybridTMFactory.createInstance(INSTANCE_NAME, DB_PATH, HybridTM.QUALITY_MODEL);
}

async function main(): Promise<void> {
  const tm = getOrCreateTM();
  const source = Utils.buildXMLElement('<source>Hello world</source>');
  const target = Utils.buildXMLElement('<target>Hola mundo</target>');

  await tm.storeLangEntry('demo', 'demo.xlf', 'unit1', 'en', 'Hello world', source, undefined, 1, 1, { state: 'final' });
  await tm.storeLangEntry('demo', 'demo.xlf', 'unit1', 'es', 'Hola mundo', target, undefined, 1, 1, { state: 'final' });

  const matches = await tm.semanticTranslationSearch('Hi world', 'en', 'es', 50, 5);
  matches.forEach((match) => {
    console.log('Hybrid', match.hybridScore(), 'Semantic', match.semantic, 'Fuzzy', match.fuzzy);
    console.log('Source:', match.source.toString());
    console.log('Target:', match.target.toString());
  });

  await tm.close();
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

Import XLIFF/TMX/SDLTM content at any time:

await tm.importXLIFF('./translations/project.xlf', { minState: 'reviewed' });
await tm.importTMX('./translations/legacy.tmx');
await tm.importSDLTM('./translations/legacy.sdltm');

semanticTranslationSearch automatically pairs every source hit with its matching target segment (same fileId, unitId, and segmentIndex), making the output ready for CAT integrations.

Documentation

Each guide is short and task-oriented, so you can jump directly to the workflow you need.

Runnable samples

The samples project contains three scripts (dev:basic, dev:import, dev:filters) plus miniature XLIFF/TMX fixtures.

When working on the repository:

npm install
npm run build
cd samples
npm install
npm run dev:basic

If you copy samples/ elsewhere, update samples/package.json so the hybridtm dependency points to the published version you intend to test, then run npm install.

Project layout

  • ts/ – source files for the library
  • dist/ – compiled JavaScript and declarations (npm run build)
  • docs/ – task-focused tutorials referenced above
  • samples/ – standalone TypeScript project with runnable workflows
  • models/ – local cache for pre-downloaded Xenova models (optional)

Development

  • npm run build – compile TypeScript to dist/
  • node dist/tmxtest.js and node dist/xlifftest.js – regression checks for the TMX and XLIFF importers (run after building)

Contributions should include unit or integration coverage when you touch importer or search logic. Use HybridTMFactory.removeInstance(name) to clean up any throwaway databases you create during manual tests.

License

Eclipse Public License 1.0 — see LICENSE for details.

About

Hybrid Translation Memory engine

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published