@carlos-verdes
Contributor

@carlos-verdes carlos-verdes commented Jan 22, 2025

Resolves #5 based on PR #157

@milancermak mentioned not having time to finish his PR so I created this one based on his implementation.

The PR currently contains only the library code, plus an integration test modeled on the rig-qdrant module. The test starts a Docker container with Postgres + PgVector and simulates calls to OpenAI using a mocked API.

Differences from the previous PR:

  • removed the static lifetime
  • documents are stored as JSON in the database, so you can use any type that implements Serialize and Deserialize
  • support for more than one embedding per document (rig-qdrant does not support this and only takes the first vector); if more than one result hits the same document, only the nearest one is returned
  • uses sqlx instead of tokio-postgres
  • database creation happens outside the code; the integration test provides an example setup (using sqlx migrations)
  • you can use any distance function supported by PgVector (it is not hardcoded to cosine)
  • added an example that can be launched with make run (from the rig-postgres folder); it loads environment variables from a .env file and handles documents with more than one embedding. It also runs migrations automatically so the Postgres tables are ready for the test.
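
To illustrate two of the points above (a configurable distance function, and keeping only the nearest hit when several embeddings belong to the same document), here is a minimal std-only sketch. It is not the code from this PR: the `Distance` enum, the `Hit` struct, the `order_by` helper, and the `embedding` column name used with it are hypothetical names chosen for illustration. The SQL operators themselves are the ones PgVector documents for L2, inner product, cosine, and L1 distance.

```rust
use std::collections::HashMap;

/// Hypothetical enum (not the rig-postgres API): distance functions mapped
/// to the PgVector SQL operators that compute them.
#[derive(Clone, Copy, Debug)]
enum Distance {
    L2,
    InnerProduct,
    Cosine,
    L1,
}

impl Distance {
    /// SQL operator PgVector uses for this distance.
    fn operator(self) -> &'static str {
        match self {
            Distance::L2 => "<->",
            Distance::InnerProduct => "<#>", // PgVector returns the negative inner product
            Distance::Cosine => "<=>",
            Distance::L1 => "<+>",
        }
    }

    /// Build an ORDER BY expression such as `embedding <=> $1`.
    /// The column name is supplied by the caller (assumed, not from the PR).
    fn order_by(self, column: &str) -> String {
        format!("{} {} $1", column, self.operator())
    }
}

/// One search hit: an embedding row pointing back at its parent document.
#[derive(Clone, Debug, PartialEq)]
struct Hit {
    document_id: String,
    distance: f64,
}

/// Keep only the nearest hit per document, sorted by ascending distance.
fn nearest_per_document(hits: Vec<Hit>) -> Vec<Hit> {
    let mut best: HashMap<String, Hit> = HashMap::new();
    for hit in hits {
        // Replace the stored hit only if this one is strictly closer.
        let is_closer = best
            .get(&hit.document_id)
            .map_or(true, |current| hit.distance < current.distance);
        if is_closer {
            best.insert(hit.document_id.clone(), hit);
        }
    }
    let mut out: Vec<Hit> = best.into_values().collect();
    out.sort_by(|a, b| a.distance.total_cmp(&b.distance));
    out
}
```

For example, passing three hits where two share the same `document_id` yields two results, with the closer of the duplicated pair retained. In a real store the deduplication could equally be done in SQL; doing it in Rust here just keeps the sketch self-contained.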

Pending:

  • add documentation

@carlos-verdes carlos-verdes marked this pull request as draft January 22, 2025 17:56
@carlos-verdes
Contributor Author

@0xMochan @cvauclair do you mind taking a look at this?

@cvauclair
Contributor

Hey @carlos-verdes, just saw your ping. I wasn't sure if the PR was ready for review since it's still a draft, but I'll take a look later today!

Thanks a lot for the PR, let's get this postgres integration done 🦾

@cvauclair cvauclair self-requested a review January 23, 2025 18:30
@carlos-verdes
Contributor Author

It's in draft because the documentation is not finished, but the code, example, and integration test are ready.

I can add an example using Streams later if that would be useful!

Contributor

@cvauclair cvauclair left a comment

Looks solid! I added a couple of suggestions/comments, please take a look when you have a moment.

Cheers!

@carlos-verdes
Contributor Author

@cvauclair thanks for the review. I updated the code based on your comments; take a look and let me know if you want me to squash the commits.

@carlos-verdes carlos-verdes marked this pull request as ready for review January 24, 2025 18:27
Contributor

@cvauclair cvauclair left a comment

Awesome work, thanks for the contribution @carlos-verdes !

@cvauclair cvauclair merged commit 9a12942 into 0xPlaygrounds:main Jan 27, 2025
4 checks passed
@github-actions github-actions bot mentioned this pull request Jan 27, 2025
@carlos-verdes
Contributor Author

Thanks for the kind words. I just realized that the main README is not updated :)

Development

Successfully merging this pull request may close these issues.

feat: Add support for PostgreSQL vector store

2 participants