feat: Event-Driven Document/Video Ingestion Pipeline #351

Merged: shubhadeepd merged 19 commits into release-v2.5.0 from minh/continuous-ingestion-notebook on Feb 27, 2026

minhngu-glitch (Collaborator) commented Feb 12, 2026

Description

Add rag_event_ingest.ipynb notebook and supporting changes for an end-to-end continuous ingestion pipeline that monitors object storage (MinIO) for new document and video uploads, automatically routes them to the appropriate AI services, and makes all content searchable via RAG.

Notebook walkthrough:

  • Deploy NVIDIA RAG stack (NIMs, Milvus, Ingestor, RAG Server)
  • Deploy NVIDIA VSS stack (VLM, LLM, Embedding, Reranker NIMs)
  • Deploy continuous ingestion pipeline (Kafka, MinIO, Kafka Consumer)
  • Configure video analysis prompts for the Kafka consumer
  • Upload documents and videos to MinIO with storage verification
  • Verify ingestion status via consumer logs
  • Query ingested content via RAG API or Frontend UI

Checklist

  • I am familiar with the Contributing Guidelines.
  • All commits are signed-off (git commit -s) and GPG signed (git commit -S).
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

minhngu-glitch changed the base branch from develop to release-v2.5.0 on February 26, 2026 09:46
minhngu-glitch force-pushed the minh/continuous-ingestion-notebook branch from 116e7eb to 0e0d7bb on February 27, 2026 05:26
…stion pipeline

- Kafka consumer that monitors MinIO object storage for new uploads
- Routes documents to RAG Ingestor, videos to VSS for analysis
- Docker Compose deployment for Kafka, MinIO, and consumer
- Jupyter notebook for end-to-end deployment and testing
- Sample test data (PDF document, MP4 video) tracked via Git LFS

Signed-off-by: Minh Nguyen <minhngu@nvidia.com>
Made-with: Cursor
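
The routing step described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the consumer shipped in this PR: it assumes the Kafka messages carry S3-style bucket notifications (a `Records` list with `s3.bucket.name` and a URL-encoded `s3.object.key`), and the extension sets, the route labels, and the function names `route_for_key` and `routes_from_event` are hypothetical.

```python
import json
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Assumed extension sets; the PR's consumer may recognise a different list.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov"}


def route_for_key(object_key: str) -> str:
    """Return 'rag_ingestor', 'vss', or 'skip' for an object key."""
    suffix = PurePosixPath(object_key).suffix.lower()
    if suffix in DOCUMENT_EXTENSIONS:
        return "rag_ingestor"
    if suffix in VIDEO_EXTENSIONS:
        return "vss"
    return "skip"


def routes_from_event(raw_message: bytes) -> list[tuple[str, str, str]]:
    """Parse one S3-style notification into (bucket, key, route) tuples."""
    event = json.loads(raw_message)
    routes = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3-style notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        routes.append((bucket, key, route_for_key(key)))
    return routes


# Hypothetical message body standing in for what the consumer reads from Kafka.
sample = json.dumps({
    "EventName": "s3:ObjectCreated:Put",
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "reports/q1+summary.pdf"}}},
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "clips/demo.MP4"}}},
    ],
}).encode()

for bucket, key, route in routes_from_event(sample):
    print(f"{bucket}/{key} -> {route}")
```

Keeping the routing decision in a pure function makes it easy to unit-test without a running Kafka broker or MinIO instance.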
… logs

Signed-off-by: Minh Nguyen <minhngu@nvidia.com>
Made-with: Cursor
…consumer prompts

- Add verify_file_in_storage() helper to confirm files landed in MinIO
- Merge storage verification into document/video ingestion checks
- Add RAG Frontend UI link (port 8090) to query sections
- Make Kafka consumer VSS prompts configurable via env vars in docker-compose
- Install git/git-lfs in notebook setup cell
- Index cells in Deploy Continuous Ingestion section

Signed-off-by: Minh Nguyen <minhngu@nvidia.com>
Made-with: Cursor
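
For readers without the notebook open, a storage check along the lines of `verify_file_in_storage()` might look like the sketch below. The helper's real signature is not shown in this conversation, so the parameters, the retry policy, and the `FakeStore` stand-in are all assumptions. The only external behaviour relied on is that the client exposes `stat_object(bucket, object_name)` and raises when the object is absent, which is how the MinIO Python client behaves.

```python
import time
from typing import Any, Callable


def verify_file_in_storage(
    client: Any,
    bucket: str,
    object_name: str,
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the object store until the object is visible or attempts run out.

    `client` is any object exposing stat_object(bucket, object_name) that
    raises when the object is missing (hypothetical interface for this sketch).
    """
    for attempt in range(1, attempts + 1):
        try:
            client.stat_object(bucket, object_name)
            return True
        except Exception:  # deliberately broad: this is a sketch only
            if attempt < attempts:
                sleep(delay_seconds)
    return False


class FakeStore:
    """In-memory stand-in so the sketch runs without a MinIO server."""

    def __init__(self, objects):
        self.objects = set(objects)

    def stat_object(self, bucket, object_name):
        if (bucket, object_name) not in self.objects:
            raise FileNotFoundError(f"{bucket}/{object_name}")
        return {"bucket": bucket, "name": object_name}


# Bucket and file names here are illustrative, not taken from the PR.
store = FakeStore({("documents", "sample.pdf")})
no_wait = lambda seconds: None  # skip real sleeping in the demo
print(verify_file_in_storage(store, "documents", "sample.pdf", sleep=no_wait))
print(verify_file_in_storage(store, "documents", "missing.pdf",
                             attempts=2, sleep=no_wait))
```

Passing the client in, rather than constructing it inside the helper, lets the same check run against a real MinIO client in the notebook and against a fake in tests.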
minhngu-glitch and others added 6 commits February 27, 2026 05:35
Add rag_event_ingest.ipynb notebook that provides an end-to-end walkthrough for:
- Deploying NVIDIA RAG stack (NIMs, Milvus, Ingestor, RAG Server)
- Deploying NVIDIA VSS stack (VLM, LLM, Embedding, Reranker NIMs)
- Deploying continuous ingestion pipeline (Kafka, MinIO, Kafka Consumer)
- Configurable video analysis prompts for the Kafka consumer
- Uploading documents and videos to MinIO with storage verification
- Verifying ingestion via consumer logs
- Querying ingested content via RAG API or Frontend UI

Signed-off-by: Minh Nguyen <minhngu@nvidia.com>
Made-with: Cursor
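
As a rough illustration of configurable prompts, the consumer could read them from environment variables and fall back to defaults. The variable names `VIDEO_PROMPT` and `VIDEO_SUMMARY_PROMPT`, the default texts, and `load_video_prompts` are hypothetical; the names actually used in this PR's docker-compose file are not shown here.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names and default prompt texts.
PROMPT_DEFAULTS = {
    "VIDEO_PROMPT": "Describe the key events in this video.",
    "VIDEO_SUMMARY_PROMPT": "Summarize the captions into a short report.",
}


def load_video_prompts(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return prompt settings, preferring values from the environment."""
    source = os.environ if env is None else env
    prompts = {}
    for name, default in PROMPT_DEFAULTS.items():
        value = source.get(name, "").strip()
        # Fall back to the default when the variable is unset or blank.
        prompts[name] = value or default
    return prompts


# Pass an explicit mapping to simulate what docker-compose would inject.
print(load_video_prompts({"VIDEO_PROMPT": "List every vehicle that appears."}))
```

Treating a blank value the same as an unset one avoids sending an empty prompt when a compose file declares the variable without a value.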
…te hw req to 4 GPUs

Signed-off-by: Minh Nguyen <minhngu@nvidia.com>
Made-with: Cursor
…ng/reranker

The via-server runs on the local_deployment_single_gpu_default network,
not nvidia-rag, so it cannot resolve nemoretriever-embedding-ms or
nemoretriever-ranking-ms. Route through host.docker.internal with the
correct host-mapped ports instead (9080 for embedding, 1976 for reranker).

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Minh Nguyen <minhngu@nvidia.com>
Made-with: Cursor
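
The workaround in this commit message amounts to addressing the two services through the Docker host instead of by container name. A small sketch of that idea follows. The ports 9080 and 1976 come from the commit message; the `http` scheme, the lookup table, and the helpers `host_routed_url` and `can_connect` are assumptions made for illustration.

```python
import socket

# Host-mapped ports are taken from the commit message above; the URL
# scheme and this lookup table are assumptions of the sketch.
HOST_MAPPED_PORTS = {"embedding": 9080, "reranker": 1976}
DOCKER_HOST_ALIAS = "host.docker.internal"


def host_routed_url(service: str) -> str:
    """Build a base URL that reaches a service via the Docker host."""
    return f"http://{DOCKER_HOST_ALIAS}:{HOST_MAPPED_PORTS[service]}"


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


for name in HOST_MAPPED_PORTS:
    print(name, host_routed_url(name))
```

Running `can_connect(DOCKER_HOST_ALIAS, 9080)` from inside the via-server container is one quick way to confirm the route before pointing the service at it. On Linux hosts, `host.docker.internal` typically resolves only if the container maps it to `host-gateway` (for example through an `extra_hosts` entry), so a `False` result may indicate a missing mapping rather than a stopped service.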
minhngu-glitch force-pushed the minh/continuous-ingestion-notebook branch from 0e0d7bb to 6d9f001 on February 27, 2026 05:38
shubhadeepd merged commit 1944a7e into release-v2.5.0 on Feb 27, 2026
5 of 6 checks passed
shubhadeepd deleted the minh/continuous-ingestion-notebook branch on February 27, 2026 06:17