Description
It might be worth adding an optional "pre-processing" step that does all of the web scraping up front, without LLMs. The result would contain links from search engine queries, website crawls, and potentially even ordinance databases like amlegal. These links would be stored in some sort of output file (JSON?) that the user could provide as an optional input when running COMPASS, allowing the main execution to bypass the heavy, long-running search steps.
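To make the idea concrete, here is a rough sketch of what the link file and its optional use could look like. Everything in it is an assumption for discussion purposes: the `JurisdictionLinks` schema, the field names, and the `save_links` / `load_links` / `links_for` helpers are hypothetical and not part of the current COMPASS API.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class JurisdictionLinks:
    """Links collected for one jurisdiction (hypothetical schema)."""

    jurisdiction: str
    search_engine: List[str] = field(default_factory=list)
    website_crawl: List[str] = field(default_factory=list)
    ordinance_database: List[str] = field(default_factory=list)


def save_links(records: List[JurisdictionLinks], path: str) -> None:
    """Write the pre-processing output to a JSON file."""
    payload = {"version": 1, "jurisdictions": [asdict(r) for r in records]}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_links(path: str) -> Dict[str, JurisdictionLinks]:
    """Read the JSON file back, keyed by jurisdiction name."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return {
        item["jurisdiction"]: JurisdictionLinks(**item)
        for item in payload["jurisdictions"]
    }


def links_for(
    jurisdiction: str,
    precomputed: Optional[Dict[str, JurisdictionLinks]] = None,
) -> Optional[List[str]]:
    """Return pre-collected links, or None if the search must still run."""
    if precomputed and jurisdiction in precomputed:
        rec = precomputed[jurisdiction]
        return rec.search_engine + rec.website_crawl + rec.ordinance_database
    return None
```

With something like this, the main run would call `links_for(...)` for each jurisdiction and only fall back to the existing search steps when it returns `None`, so the input file stays fully optional. The `version` key is included only so the format could change later without breaking older files.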
Separating out the search step so that it does not include LLMs would also be beneficial, since we could then run it locally on HPC. Alternatively, if we keep LLMs in the search process, we can scale the search portion using Kubernetes.
Since this would be a new "step" in running COMPASS, I would strongly encourage setting up a workflow management system (WMS) like Airflow.