Description
This is fine for now, I think :). In my opinion, the better place would be here [1]. The reasons for this are:
- We can calculate how to chunk the results from the vector DB so that they fit the context of the re-rank model. If the chunks would be too small, we could fall back to chunking the history.
- We can send a bigger chunk to the embedding model. With this change, we are truncating the input for the embedding model as well. This might not always be necessary: the rerank model has to accommodate both the vector database results and the history, whereas the embedding model has to accommodate only the history. But this one is up for discussion, I guess :). I see some downsides with this as well.
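A minimal sketch of the idea in the two points above, not the project's actual code. It assumes sizes are measured in a single unit (characters or tokens), and the names `split_rerank_budget`, `embedding_budget`, and `min_result_chunk` are hypothetical; only `reranking_model_max_context` comes from the referenced line.

```python
def split_rerank_budget(
    history_len: int,
    rerank_context: int,
    min_result_chunk: int = 64,  # hypothetical threshold for "too small"
) -> tuple[int, int]:
    """Return (history_budget, result_chunk_size) for the re-rank model.

    The re-rank model has to fit the history plus one vector DB result.
    """
    # Give the vector DB result whatever the full history leaves free.
    result_budget = rerank_context - history_len
    if result_budget >= min_result_chunk:
        return history_len, result_budget

    # The result chunk would be too small, so chunk the history as well.
    # An even split mirrors the existing `// 2` behaviour.
    half = rerank_context // 2
    return half, rerank_context - half


def embedding_budget(history_len: int, embedding_context: int) -> int:
    """The embedding model only has to fit the history."""
    return min(history_len, embedding_context)


if __name__ == "__main__":
    # Short history: it stays whole and the results get the rest.
    print(split_rerank_budget(history_len=100, rerank_context=512))
    # Long history: both sides are capped at half of the context.
    print(split_rerank_budget(history_len=500, rerank_context=512))
    # The embedding input is not truncated to the re-rank limit.
    print(embedding_budget(history_len=500, embedding_context=8192))
```

With this split, the history sent to the embedding model is only truncated when it exceeds the embedding model's own context, independent of the re-rank limit.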
Anyway, this is just me thinking aloud here. We can improve later. This fixes the issue for now.
[1] Line 134 in 9bc0e74:
    max_chunk_size = config.reranking_model_max_context // 2
Originally posted by @lpiwowar in #143 (comment)