HybridTM is a TypeScript translation memory engine that stores bilingual content in LanceDB and scores matches by combining semantic embeddings (Xenova/Transformers.js) with the built-in MatchQuality fuzzy metric.
- Imports XLIFF 2.x, TMX 1.4b, and SDLTM files, preserving metadata, notes, and custom properties
- Generates semantic vectors with any Xenova-compatible text model (default: HybridTM.QUALITY_MODEL, LaBSE)
- Provides semanticTranslationSearch, semanticSearch, and concordanceSearch APIs with metadata-aware filtering
- Streams data into LanceDB through a JSONL-based batch importer to keep memory usage predictable
- Prevents duplicate segments by rewriting entries with deterministic IDs (fileId:unitId:segmentIndex:lang)
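To make the deterministic ID idea concrete, here is a minimal sketch of how a key in the fileId:unitId:segmentIndex:lang layout prevents duplicates. The `SegmentKey` type and the `buildEntryId` and `upsert` helpers are hypothetical names for illustration, not part of the HybridTM API, and a plain `Map` stands in for the LanceDB table.

```typescript
// Hypothetical shape: the README only states the ID layout, not how it is built.
interface SegmentKey {
  fileId: string;
  unitId: string;
  segmentIndex: number;
  lang: string;
}

// Joins the four parts in the documented order: fileId:unitId:segmentIndex:lang.
function buildEntryId(key: SegmentKey): string {
  return [key.fileId, key.unitId, String(key.segmentIndex), key.lang].join(':');
}

// Stand-in for the database table: writing an existing ID overwrites the old row.
const table = new Map<string, string>();

function upsert(key: SegmentKey, text: string): void {
  table.set(buildEntryId(key), text);
}

upsert({ fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'en' }, 'Hello world');
// Same file, unit, segment, and language: replaces the first entry instead of duplicating it.
upsert({ fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'en' }, 'Hello, world');
// Different language: stored as a separate entry.
upsert({ fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'es' }, 'Hola mundo');

console.log(table.size); // 2
console.log(table.get('demo.xlf:unit1:1:en')); // "Hello, world"
```

Because the ID depends only on where a segment lives and which language it is in, re-importing the same file would update existing rows rather than add new ones.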
Models download automatically the first time you initialize an instance and are cached in the standard Hugging Face directory.
- Node.js 22 LTS or later
- npm 11+
- Disk space for both the LanceDB directory you choose and the embedding model cache
npm install hybridtm

import path from 'node:path';
import { HybridTM, HybridTMFactory, Utils } from 'hybridtm';
const INSTANCE_NAME = 'docs-basic';
const DB_PATH = path.resolve('.hybridtm', INSTANCE_NAME + '.lancedb');

function getOrCreateTM(): HybridTM {
  return HybridTMFactory.getInstance(INSTANCE_NAME)
    ?? HybridTMFactory.createInstance(INSTANCE_NAME, DB_PATH, HybridTM.QUALITY_MODEL);
}

async function main(): Promise<void> {
  const tm = getOrCreateTM();
  const source = Utils.buildXMLElement('<source>Hello world</source>');
  const target = Utils.buildXMLElement('<target>Hola mundo</target>');
  await tm.storeLangEntry('demo', 'demo.xlf', 'unit1', 'en', 'Hello world', source, undefined, 1, 1, { state: 'final' });
  await tm.storeLangEntry('demo', 'demo.xlf', 'unit1', 'es', 'Hola mundo', target, undefined, 1, 1, { state: 'final' });
  const matches = await tm.semanticTranslationSearch('Hi world', 'en', 'es', 50, 5);
  matches.forEach((match) => {
    console.log('Hybrid', match.hybridScore(), 'Semantic', match.semantic, 'Fuzzy', match.fuzzy);
    console.log('Source:', match.source.toString());
    console.log('Target:', match.target.toString());
  });
  await tm.close();
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

Import XLIFF/TMX/SDLTM content at any time:
await tm.importXLIFF('./translations/project.xlf', { minState: 'reviewed' });
await tm.importTMX('./translations/legacy.tmx');
await tm.importSDLTM('./translations/legacy.sdltm');

semanticTranslationSearch automatically pairs every source hit with its matching target segment (same fileId, unitId, and segmentIndex), making the output ready for CAT integrations.
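The pairing rule described above can be illustrated with a small standalone sketch. It is not the library's implementation: the `Entry` shape and the `segmentKey` and `pairWithTargets` helpers are assumptions for illustration, and an in-memory array stands in for the database.

```typescript
// Hypothetical record shape; the real HybridTM types may differ.
interface Entry {
  fileId: string;
  unitId: string;
  segmentIndex: number;
  lang: string;
  text: string;
}

// Key shared by every language variant of the same segment.
function segmentKey(e: Entry): string {
  return `${e.fileId}:${e.unitId}:${e.segmentIndex}`;
}

// For each source hit, look up the target-language entry with the same key.
// Hits without a matching target are skipped.
function pairWithTargets(
  hits: Entry[],
  all: Entry[],
  tgtLang: string
): Array<{ source: Entry; target: Entry }> {
  const targets = new Map<string, Entry>();
  for (const e of all) {
    if (e.lang === tgtLang) targets.set(segmentKey(e), e);
  }
  const pairs: Array<{ source: Entry; target: Entry }> = [];
  for (const hit of hits) {
    const target = targets.get(segmentKey(hit));
    if (target) pairs.push({ source: hit, target });
  }
  return pairs;
}

const entries: Entry[] = [
  { fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'en', text: 'Hello world' },
  { fileId: 'demo.xlf', unitId: 'unit1', segmentIndex: 1, lang: 'es', text: 'Hola mundo' },
  { fileId: 'demo.xlf', unitId: 'unit2', segmentIndex: 1, lang: 'en', text: 'Goodbye' },
];

// Pretend both English entries were returned as source hits.
const sourceHits = entries.filter((e) => e.lang === 'en');
const pairs = pairWithTargets(sourceHits, entries, 'es');
console.log(pairs.length); // 1
console.log(pairs[0].target.text); // "Hola mundo"
```

In this sketch, unit2 has no Spanish counterpart, so it is dropped from the result. Whether the library drops or otherwise reports unpaired hits is not stated in this document.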
Each guide is short and task-oriented, so you can jump directly to the workflow you need.
The samples project contains three scripts (dev:basic, dev:import, dev:filters) plus miniature XLIFF/TMX fixtures.
When working on the repository:
npm install
npm run build
cd samples
npm install
npm run dev:basic

If you copy samples/ elsewhere, update samples/package.json so the hybridtm dependency points to the published version you intend to test, then run npm install.
- ts/ – source files for the library
- dist/ – compiled JavaScript and declarations (npm run build)
- docs/ – task-focused tutorials referenced above
- samples/ – standalone TypeScript project with runnable workflows
- models/ – local cache for pre-downloaded Xenova models (optional)
- npm run build – compile TypeScript to dist/
- node dist/tmxtest.js and node dist/xlifftest.js – regression checks for the TMX and XLIFF importers (run after building)
Contributions should include unit or integration coverage when you touch importer or search logic. Use HybridTMFactory.removeInstance(name) to clean up any throwaway databases you create during manual tests.
Eclipse Public License 1.0 — see LICENSE for details.