Exploring Multilingual and Local-Language Encoding Extensions to LIMIT #12

@Arsh123344423

I’m interested in extending the LIMIT benchmark to explore how multilingual and local-language encoding affects the observed theoretical limitations of single-vector embedding-based retrieval. While LIMIT convincingly demonstrates that these limitations are language-agnostic and dimension-dependent, it would be valuable to empirically analyze whether encoding queries and documents in their native or local languages changes the failure structure, retrieval dynamics, or error distribution across languages.
Specifically, I’d like to investigate the following, while keeping the theoretical guarantees intact:
(1) multilingual variants of the LIMIT dataset,
(2) language-conditioned or language-aware embedding pipelines, and
(3) cross-lingual versus mono-lingual retrieval settings.
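To make the mono-lingual versus cross-lingual comparison concrete, here is a minimal sketch of how the settings could be scored side by side. It assumes single-vector embeddings have already been computed per language by some encoder, that document indices are aligned across languages (document `i` is the same item in every language), and that scoring is plain dot product with recall@k as the metric. The function names `recall_at_k` and `compare_settings`, the input layout, and the default `k=2` are hypothetical choices for illustration, not part of the LIMIT codebase.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def recall_at_k(
    query_vecs: Sequence[Vector],
    doc_vecs: Sequence[Vector],
    relevant: Sequence[Sequence[int]],
    k: int = 2,
) -> float:
    """Mean recall@k over queries, scoring each document by dot product."""
    recalls: List[float] = []
    for q_vec, rel in zip(query_vecs, relevant):
        # Single-vector retrieval: one dot-product score per document.
        scores = [sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_vecs]
        # Rank document indices from highest to lowest score.
        ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
        top_k = set(ranked[:k])
        recalls.append(len(top_k & set(rel)) / len(rel))
    return sum(recalls) / len(recalls)


def compare_settings(
    query_embs: Dict[str, Sequence[Vector]],
    doc_embs: Dict[str, Sequence[Vector]],
    relevant: Sequence[Sequence[int]],
    k: int = 2,
) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Score every (query language, document language) pair.

    Assumes document indices are aligned across languages, so the same
    relevance judgments apply to every pair.
    """
    results: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for q_lang, q_vecs in query_embs.items():
        for d_lang, d_vecs in doc_embs.items():
            setting = "mono" if q_lang == d_lang else "cross"
            results[(q_lang, d_lang)] = (setting, recall_at_k(q_vecs, d_vecs, relevant, k))
    return results
```

Keeping the relevance judgments fixed and varying only the language of the encoded text means any change in recall between pairs can be attributed to the encoding, not to a different query-document relevance structure.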
My goal is not to circumvent the LIMIT result, but to better understand how these theoretical constraints manifest in multilingual conversational and retrieval systems. I’d appreciate feedback on whether such an extension aligns with the project’s direction, and any guidance on how best to contribute it as an experimental extension.
