Separate search from LLM #212

@ppinchuk

Description

It might be worth adding an optional "pre-processing" step that performs all of the web scraping without LLMs. The result would contain links from search engine queries, website crawls, and potentially ordinance databases like amlegal. These links would be stored in an output file (JSON?) that the user could provide as an optional input when running COMPASS, which would let the main execution bypass the heavy, long-running search steps.
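As a starting point for discussion, the link output could be sketched roughly as below. This is only an illustration of one possible JSON layout, not existing COMPASS code: the `FoundLink` class, the `save_links` and `load_links` helpers, the field names, and the grouping by jurisdiction are all assumptions.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FoundLink:
    """One candidate link discovered without any LLM calls (hypothetical)."""

    jurisdiction: str  # assumed grouping key, e.g. a county name
    url: str
    source: str  # e.g. "search_engine", "website_crawl", "ordinance_database"


def save_links(links, path):
    """Write links to a JSON file, grouped by jurisdiction."""
    grouped = {}
    for link in links:
        grouped.setdefault(link.jurisdiction, []).append(
            {"url": link.url, "source": link.source}
        )
    Path(path).write_text(json.dumps(grouped, indent=2), encoding="utf-8")


def load_links(path):
    """Read the JSON file back into a flat list of FoundLink objects."""
    grouped = json.loads(Path(path).read_text(encoding="utf-8"))
    return [
        FoundLink(jurisdiction=name, url=entry["url"], source=entry["source"])
        for name, entries in grouped.items()
        for entry in entries
    ]
```

Recording the `source` of each link would make it possible to tell search engine hits apart from crawl or ordinance database hits later on, though whether that is needed is an open question.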

Separating out the search step so that it does not use LLMs could also be beneficial, since we could then run it locally on HPC. Alternatively, if we keep LLMs in the search process, we could scale the search portion using Kubernetes.

Since this would be a new "step" in running COMPASS, setting up a workflow management system (WMS) like Airflow would be strongly encouraged.
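To make the dependency between the two steps concrete, here is a minimal sketch in plain Python, independent of any particular WMS. The function names `search_step` and `main_step`, the `find_links` callable, and the JSON layout (jurisdiction name mapped to a list of URLs) are all hypothetical and do not reflect the current COMPASS API. In a WMS like Airflow, each function would presumably become its own task.

```python
import json
from pathlib import Path


def search_step(jurisdictions, out_path, find_links):
    """Step 1 (no LLMs): collect links and persist them as JSON."""
    results = {name: find_links(name) for name in jurisdictions}
    Path(out_path).write_text(json.dumps(results, indent=2), encoding="utf-8")
    return out_path


def main_step(jurisdictions, find_links, links_path=None):
    """Step 2: reuse saved links when provided, otherwise search inline."""
    saved = {}
    if links_path is not None:
        saved = json.loads(Path(links_path).read_text(encoding="utf-8"))
    links = {}
    for name in jurisdictions:
        # Only fall back to a live search for jurisdictions missing
        # from the pre-computed file.
        links[name] = saved[name] if name in saved else find_links(name)
    return links  # the LLM-based processing would consume these links
```

Keeping `links_path` optional means the main run would behave as it does today when no pre-computed file is supplied.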

Metadata

Labels

new computation: Update that adds a new computation method
p-medium: Priority: medium
refactor: Code improvements that do not change functionality
topic-python-cli: Issues/pull requests related to running the python processing
