Description
Hi everyone,
First of all, thank you for taking the time to look into this. I really appreciate your help! I’m fairly new to programming, and I’ve set myself the challenge of translating very large documents (~250 MB) without losing any of their hyperlinks or internal references. It’s probably a bit ambitious for my current skill level, but I’m eager to learn and would love any guidance you can offer.
--
What I’m Trying to Do
I need to translate documents larger than 40 MB and keep all hyperlinks and internal cross-references intact.
What I’ve Tried So Far
- DocumentTranslator-Legacy
  - ✅ Successfully translates files up to 250 MB
  - ❌ Unfortunately, all hyperlinks and cross-references are stripped out in the output
- DocumentTranslation (new)
  - ❌ Fails immediately on any file over 40 MB (as documented)
  - ✅ Works perfectly on files under 40 MB and preserves every link and reference
Current Testing Status
I have built a small personal web project that replicates the capabilities of the new DocumentTranslation service (glossaries, custom translation, and more), and it works well for my purposes. I’m very satisfied with the results, but the 40 MB limit still prevents me from translating some of my larger documents. I’m now exploring whether there’s a way to achieve the same or similar results with larger files.
What I’d Like to Happen
Translate documents above 40 MB without breaking any hyperlinks or internal references.
Background / Approach Comparison
If I understand correctly, the new DocumentTranslation service sends the entire file to Azure’s document translation API, which according to the documentation is capped at 40 MB. The legacy translator, on the other hand, uses Open XML to extract and chunk the text, translates it as plain text, then reinserts it. That preserves some formatting but loses links and references.
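To make the legacy-style idea concrete, here is a minimal Python sketch of what I mean by "extract, chunk, translate, reinsert". It is not the code of either tool, just my understanding of the technique. It replaces only the text inside the `w:t` elements of a WordprocessingML fragment and leaves everything else (including `w:hyperlink` elements and their `r:id` / `w:anchor` attributes) untouched. The names `batches`, `translate_in_place`, and `fake_translate`, the sample XML, and the `max_chars` value are all made up for illustration; `fake_translate` stands in for a real call to a text translation API.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Keep the familiar w: and r: prefixes when the tree is serialized again.
ET.register_namespace("w", W_NS)
ET.register_namespace("r", R_NS)

# Hypothetical stand-in for word/document.xml: one external link, one cross-reference.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">See </w:t></w:r>'
    '<w:hyperlink r:id="rId5"><w:r><w:t>the docs</w:t></w:r></w:hyperlink>'
    '<w:hyperlink w:anchor="_Ref123"><w:r><w:t>Section 2</w:t></w:r></w:hyperlink>'
    '</w:p></w:body></w:document>'
)


def batches(items, max_chars):
    """Group strings into lists whose combined length stays within max_chars.

    A single string longer than max_chars still gets a batch of its own.
    """
    batch, size = [], 0
    for item in items:
        if batch and size + len(item) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(item)
        size += len(item)
    if batch:
        yield batch


def translate_in_place(xml_text, translate_batch, max_chars=20):
    """Translate the text of every w:t element without touching the structure."""
    root = ET.fromstring(xml_text)
    nodes = [t for t in root.iter(f"{{{W_NS}}}t") if t.text]
    translated = []
    for batch in batches([t.text for t in nodes], max_chars):
        translated.extend(translate_batch(batch))
    # Write each translation back into the node it came from.
    for node, new_text in zip(nodes, translated):
        node.text = new_text
    return ET.tostring(root, encoding="unicode")


def fake_translate(batch):
    """Placeholder translator (assumption): upper-cases instead of translating."""
    return [s.upper() for s in batch]


result = translate_in_place(SAMPLE, fake_translate)
print(result)
```

In a real .docx, `word/document.xml` would first have to be read from (and written back to) the ZIP package, for example with Python's `zipfile` module, and external link targets live in `word/_rels/document.xml.rels`, which this approach never modifies. One weakness I can already see: a sentence split across several runs gets translated piece by piece, so translation quality may suffer unless the runs are grouped per paragraph first.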
My Question
Based on these two approaches, I see two possible paths forward:
- New service approach: Is there any way to raise or work around Azure’s 40 MB document limit?
- Legacy approach: Is there a better method to handle extraction/reinsertion so that all hyperlinks and internal references survive the translation?