Description
I’m interested in extending the LIMIT benchmark to explore how multilingual and local-language encoding affects the observed theoretical limitations of single-vector embedding-based retrieval. While LIMIT convincingly demonstrates that these limitations are language-agnostic and dimension-dependent, it would be valuable to empirically analyze whether encoding queries and documents in their native or local languages changes the failure structure, retrieval dynamics, or error distribution across languages.
Specifically, I’d like to investigate:
(1) multilingual variants of the LIMIT dataset,
(2) language-conditioned or language-aware embedding pipelines,
(3) cross-lingual versus monolingual retrieval settings, while keeping the theoretical guarantees intact.
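To make the cross-lingual vs. monolingual comparison in (3) concrete, a minimal evaluation harness could rank documents by cosine similarity and report recall@k per language setting. This is only a sketch: the embedding vectors below are hand-written placeholders standing in for the output of a multilingual encoder (which encoder, and at what dimension, are open choices, not part of LIMIT itself).

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def recall_at_k(query_vec, doc_vecs, relevant_ids, k=2):
    """Rank documents by similarity to the query; return recall@k."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    hits = sum(1 for i in ranked[:k] if i in relevant_ids)
    return hits / len(relevant_ids)

# Toy example: one query, three documents; docs 0 and 2 are relevant.
# In a real run, doc 2 might be the same content in another language,
# letting one compare monolingual vs. cross-lingual recall curves.
query = [1.0, 0.0, 1.0]
docs = [[1.0, 0.1, 0.9],   # relevant, same language as the query
        [0.0, 1.0, 0.0],   # irrelevant
        [0.9, 0.2, 1.0]]   # relevant, e.g. a translated document
print(recall_at_k(query, docs, relevant_ids={0, 2}, k=2))  # → 1.0
```

Running the same harness once with monolingual document embeddings and once with translated (or natively local-language) ones would surface whether the failure structure shifts, without altering the underlying theoretical setup.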
My goal is not to circumvent the LIMIT result, but to better understand how these theoretical constraints manifest in multilingual conversational and retrieval systems. I’d appreciate feedback on whether such an extension aligns with the project’s direction, and any guidance on best practices for contributing these experiments.