Separate search from LLM

It might be worth adding an optional "pre-processing" step that does all the web scraping, but _without_ LLMs. The result would contain links from search engine queries, website crawls, and potentially even ordinance databases like amlegal. These links would be stored in some sort of output file (JSON?) that the user could provide as an optional input when running COMPASS, which would let the main execution bypass the heavy, long-running search steps.
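As a rough sketch of what that output could look like, here is one possible JSON round trip in plain Python. Everything in it is an assumption for discussion, not existing COMPASS code: the `FoundLink` record, its field names, and the `save_links` / `load_links` helpers are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class FoundLink:
    """One link found by the LLM-free search step (hypothetical schema)."""

    jurisdiction: str
    url: str
    source: str  # e.g. "search_engine", "website_crawl", "ordinance_database"


def save_links(links, path):
    """Write the pre-processing results to a JSON file."""
    records = [asdict(link) for link in links]
    Path(path).write_text(json.dumps(records, indent=2))


def load_links(path=None):
    """Return saved links grouped by jurisdiction.

    Returns None when no file is given, which would signal the main run
    to fall back to doing its own search.
    """
    if path is None:
        return None
    grouped = {}
    for record in json.loads(Path(path).read_text()):
        link = FoundLink(**record)
        grouped.setdefault(link.jurisdiction, []).append(link)
    return grouped
```

The main run could call something like `load_links(user_supplied_path)` and skip the search for any jurisdiction that already has links. Grouping by jurisdiction is only one option; the file could just as well be keyed by jurisdiction from the start.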

Keeping LLMs out of the search step can also be beneficial, since we could then run it locally on HPC. Alternatively, if we keep LLMs in the search process, we can scale the search portion using Kubernetes.

Since this would be a new "step" in running COMPASS, it would be strongly encouraged to set up a workflow management system (WMS) like Airflow.

Separate search from LLM #212