Hi there,
I'm not entirely certain of where to post this but given that I picked up this repo from the Chromium source I'm gonna guess this is the best place to land it.
The translation API is already an amazing thing to have available in the browser, and the current implementation brings me joy and wonder. However, I see improvements that I would really like to have available.
Current issues
For example, if like me you are developing an extension which allows for translating the user's messages, especially in a WhatsApp/IM context, you might find that some notions are hard to take into account:
The speaker's gender
For example, in Spanish, "I am tall" would translate to "Soy alto" or "Soy alta" depending on whether you're a man or a woman.
This is obviously not the only gender-related issue, but it's a big one if you are working on IM (thus mostly first-person sentences).
Relative references
Particularly in the context of a conversation, you might be referencing the previous message, which you are not passing to the translation API. That missing message can determine the agreement of adjectives and pronouns, disambiguate meanings, etc.
For example in French you could have:
- "Qui s'occupe de la baguette ?" ("who takes care of the baguette?")
- "Je la prends" ("I'll take it")
Again, a gender-related issue. The sentence could be entirely different if it were not for a baguette. For example "pain" is masculine so you would say "Je le prends".
In this case if you are only translating the second sentence, and you cannot contextualize the translation with the first one, then you don't know which gender to use in there.
Regionalisms
A language differs from region to region. Obviously Brits will get upset whenever they see "color" instead of "colour", but it can get a lot more dramatic with other languages. For example, French from France is entirely different from Canadian French on any word that appeared after the 17th century. Even regions within France will not agree on some words (have a look at the "chocolatine" debate). Examples are countless.
My point is that different regions and different people are going to pick different words or idioms and this is absolutely not bound by country or language.
Source language
The Translator API requires a sourceLanguage, which in my opinion does not bring any value.
For example, in my current company, our standard meeting will happen in two if not three languages. Take a transcript of that, you will have French, Spanish and English in there. What is the source language of that?
On the other hand, when you're going to be translating, you will probably want the smoothest user experience and thus not ask for the source language. Most of the time you are probably going to use the LanguageDetector.
And finally, the goal of translation is not to gatekeep incorrect spelling or grammar. We want to understand the source material, whatever its language, and generate an output in the target language.
So essentially today you need to start two APIs (language detection and translation), download the associated models, all that to determine a single source language while the source material is very probably mixed.
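To make the two-step flow concrete, here is a minimal sketch of it. The `TranslationBackend` interface and the `detectThenTranslate` helper are hypothetical names introduced only so the example can run outside a browser. In a browser that ships these APIs, the backend would presumably wrap `LanguageDetector.create()` and `Translator.create(...)`.

```typescript
// Minimal structural types covering only the parts of the two APIs used here.
interface DetectionResult {
  detectedLanguage: string;
  confidence: number;
}

interface DetectorLike {
  detect(text: string): Promise<DetectionResult[]>;
}

interface TranslatorLike {
  translate(text: string): Promise<string>;
}

// Hypothetical seam (not part of any browser API) so the flow can be stubbed.
interface TranslationBackend {
  createDetector(): Promise<DetectorLike>;
  createTranslator(options: {
    sourceLanguage: string;
    targetLanguage: string;
  }): Promise<TranslatorLike>;
}

async function detectThenTranslate(
  backend: TranslationBackend,
  text: string,
  targetLanguage: string
): Promise<string> {
  // Step 1: start the detector and ask which languages it sees.
  const detector = await backend.createDetector();
  const results = await detector.detect(text);
  if (results.length === 0) {
    throw new Error("No language detected");
  }

  // Keep only the most confident candidate. For mixed-language input,
  // every other language in the text is discarded at this point.
  const best = results.reduce((a, b) => (b.confidence > a.confidence ? b : a));

  // Step 2: start a translator bound to that single source language.
  const translator = await backend.createTranslator({
    sourceLanguage: best.detectedLanguage,
    targetLanguage,
  });
  return translator.translate(text);
}
```

Note how the detection step collapses a possibly trilingual transcript into one `sourceLanguage` before translation even starts.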
Proposed improvements
As an avid LLM user, I have been using LLMs to do translation for quite a while now, and my workflow essentially is:
- Give a bit of context (previous message, lexical field, etc)
- A bit about myself
- Target language
- What I want to translate
For example:
I'm a man, please translate this message for my doctor in Spanish from Spain: Could you renew my receta? Thanks!
This workflow works well for me, because:
- It does not ask me for the source language
- I can add whatever context is relevant to the conversation, using my judgement
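The workflow above could be captured in a small prompt builder. This is only a sketch: `TranslationRequest`, `buildTranslationPrompt`, and the exact prompt wording are assumptions made for illustration, not something any existing API defines.

```typescript
// Hypothetical shape for the four ingredients of the workflow.
interface TranslationRequest {
  text: string; // what I want to translate
  targetLanguage: string; // e.g. "Spanish"
  targetDialect?: string; // e.g. "Spain"
  aboutSpeaker?: string; // e.g. "I'm a man"
  context?: string; // e.g. "a message for my doctor"
}

function buildTranslationPrompt(req: TranslationRequest): string {
  // Mention the dialect only when the caller provided one.
  const target = req.targetDialect
    ? `${req.targetLanguage} (${req.targetDialect})`
    : req.targetLanguage;

  const lines: string[] = [];
  if (req.aboutSpeaker) lines.push(`About the speaker: ${req.aboutSpeaker}`);
  if (req.context) lines.push(`Context: ${req.context}`);
  lines.push(`Translate the following message into ${target}.`);
  // No source language is requested; mixed-language input is expected.
  lines.push("The source may mix several languages; do not assume a single one.");
  lines.push(`Message: ${req.text}`);
  return lines.join("\n");
}

// Usage with the doctor example from above.
const prompt = buildTranslationPrompt({
  text: "Could you renew my receta? Thanks!",
  targetLanguage: "Spanish",
  targetDialect: "Spain",
  aboutSpeaker: "I'm a man",
  context: "a message for my doctor",
});
console.log(prompt);
```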
At the end of the day this suggests that a more powerful API could be possible. For example modify the create options:
/**
 * Core options required for creating a translator
 */
export interface TranslatorCreateCoreOptions {
  /** Optional source language hint in BCP-47 format (e.g., "en", "es", "zh") */
  sourceLanguage?: string;
  /** Target language in BCP-47 format (e.g., "en", "es", "zh") */
  targetLanguage: string;
  /** Semi-formal information about the dialect/region to translate for */
  targetDialect: string; // "es", "south-west of France", etc
}
As you can see:
- sourceLanguage becomes optional, and it's a hint, not a hard requirement (helps with disambiguation, but that's it)
- targetDialect gives more information about which variants of the language the developer wants. Semi-formal, because you might want to instruct regionalisms, tone, use of specific slang, etc
And add to the translate options:
/**
 * Options for translation operations
 */
export interface TranslatorTranslateOptions {
  /** AbortSignal to cancel the translation */
  signal?: AbortSignal;
  /** Plain-text general context of the conversation (speaker's gender, lexical field, etc) */
  context?: string;
  /** Previous parts of the text/conversation being translated */
  previousText?: string;
}
Essentially, we're adding two things in there:
- context is essentially an add-on to the system prompt, giving any info the developer deems valuable. Given all the possible nuances, it seems futile to me to model it any further; we can probably let the developer give plain-text details of what they know about the expected translation
- previousText would be the previous parts of the conversation; essentially, we'd ask the LLM to complete this text by translating the input text
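As an illustration of how a caller might fill in previousText, here is a hedged sketch that keeps only the most recent messages within a character budget. The `buildPreviousText` helper, the `ProposedTranslateOptions` type, and the idea of a budget are all assumptions for the example. The proposal itself does not specify any size limit.

```typescript
// Subset of the proposed translate options, redeclared locally (hypothetical).
interface ProposedTranslateOptions {
  context?: string;
  previousText?: string;
}

// Joins the most recent messages with newlines, newest last, without
// exceeding maxChars. Older messages are dropped first.
function buildPreviousText(history: readonly string[], maxChars: number): string {
  const kept: string[] = [];
  let used = 0;
  // Walk from the newest message back to the oldest.
  for (const message of [...history].reverse()) {
    // One extra character for the newline separator once something is kept.
    const cost = message.length + (kept.length > 0 ? 1 : 0);
    if (used + cost > maxChars) break;
    kept.unshift(message);
    used += cost;
  }
  return kept.join("\n");
}

// Usage with the baguette example: when translating "Je la prends",
// the earlier question is what disambiguates the pronoun.
const options: ProposedTranslateOptions = {
  context: "Casual chat between flatmates",
  previousText: buildPreviousText(["Qui s'occupe de la baguette ?"], 500),
};
console.log(options.previousText);
```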
Conclusion
After using these APIs for my app, those are the things that I feel are needed for my use-case. I'm sure that I'm all kinds of wrong between not posting in the right form, place or with the right constraints in mind. But let me know if this can go any further, and if so I will happily contribute where required.