lang not correctly set for most subreddits #268

@tolik518

Description

The Reddit API doesn't always tell us the correct language, as I figured out in #267. The language feature was partially removed (it can no longer be set in the UI, only through the API).
While the legacy language setting was removed from the UI, another language setting was added, which isn't exposed through the API.
Apart from that, the language identifiers are not ISO 639-1 as expected. They are pretty much a mess. Mexican Spanish is "es-mx", for example (which isn't a huge issue).
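As a rough illustration of why identifiers like "es-mx" aren't a huge issue, here is a minimal sketch that reduces such a tag to its two-letter primary part. The function name `primary_language` is hypothetical, and the check only validates the shape of the code, not whether it is a registered ISO 639-1 language.

```python
from typing import Optional


def primary_language(tag: str) -> Optional[str]:
    """Reduce a tag such as "es-mx" to its primary part ("es").

    Hypothetical helper: it only checks that the result looks like a
    two-letter code, not that it is a real ISO 639-1 entry.
    """
    cleaned = tag.strip().replace("_", "-").lower()
    primary = cleaned.split("-", 1)[0]
    if len(primary) == 2 and primary.isascii() and primary.isalpha():
        return primary
    # Empty or oddly shaped identifiers are reported as unknown.
    return None


print(primary_language("es-mx"))
```

Returning `None` for unusable values leaves the decision about a fallback language to the caller.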

As a result, we can't identify the correct language of most non-English subreddits.

The current config has some architectural limitations that make it impossible to add default subreddit languages without those subreddits also being parsed (and listing too many subreddits here would eventually make the Reddit API URL too long).

SUBREDDITS=mathmemes:en+mathmemescirclejerk:en+unexpectedfactorial:en+factorialchain:en+doublefactorialchain:en+theydidthemath:en:shorten+theydidthemonstermath:en+uselessfactorial:en+redundantfactorial:en+anticipatedfactorial:en+expectedfactorial:en+unexpectedTermial:en:termial+Notorite:tr
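To make the current format concrete, here is a minimal sketch of how a value like the one above could be split into per-subreddit entries. It assumes entries are separated by "+" and fields by ":", with the first field being the subreddit name, the second the language, and anything after that treated as extra options (such as `shorten` or `termial`). The names `SubredditEntry` and `parse_subreddits` are hypothetical and not taken from the bot's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubredditEntry:
    name: str
    lang: str
    # Everything after the language field, e.g. ["shorten"] (assumed meaning).
    options: List[str] = field(default_factory=list)


def parse_subreddits(value: str) -> List[SubredditEntry]:
    """Split "name:lang[:option...]+name:lang..." into entries."""
    entries = []
    for chunk in value.split("+"):
        if not chunk:
            continue  # tolerate a stray leading or trailing "+"
        name, *rest = chunk.split(":")
        lang = rest[0] if rest else ""
        entries.append(SubredditEntry(name, lang, rest[1:]))
    return entries


entries = parse_subreddits("theydidthemath:en:shorten+Notorite:tr")
for entry in entries:
    print(entry)
```

The sketch shows the coupling described above: every subreddit that carries a language in this variable is also part of the list that gets fetched.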

I see only one viable option here right now:

  • a dedicated subreddit_config similar to the channel_config for discord
  • SUBREDDITS in the .env should be added to the api url and parsed
  • subreddits in subreddit_config only store configurations
    • this would make it possible to configure default languages for subreddits by hand
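The split proposed in the list above could look roughly like the following sketch. Everything here is an assumption for illustration: the `SUBREDDIT_CONFIG` mapping, its `lang` key, both function names, and the rule that a hand-configured language takes precedence over the one reported by the API. The only outside fact relied on is that Reddit accepts several subreddit names joined with "+" in a `/r/...` path.

```python
from typing import Dict, List, Optional

# Hypothetical hand-maintained config: it stores settings only and does not
# decide which subreddits are fetched.
SUBREDDIT_CONFIG: Dict[str, Dict[str, str]] = {
    "notorite": {"lang": "tr"},
}


def build_listing_path(subreddits: List[str]) -> str:
    """Join the names from SUBREDDITS into one multi-subreddit path."""
    return "/r/" + "+".join(subreddits)


def resolve_language(subreddit: str, api_lang: Optional[str]) -> Optional[str]:
    """Prefer the hand-configured language, else whatever the API reported."""
    configured = SUBREDDIT_CONFIG.get(subreddit.lower(), {}).get("lang")
    return configured or api_lang


print(build_listing_path(["mathmemes", "Notorite"]))
print(resolve_language("Notorite", "en"))
```

With this shape, adding a default language for a subreddit would not change the request URL, since only the names in `SUBREDDITS` end up in the path.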

Another option would be to use an LLM to figure out the language of a subreddit and store it as the default language. This is quite overkill, though (albeit it might be a fun idea for a dedicated API).
