Ensemble

A universal content collector with conversational search in your personal digital space

⸻

Introduction

In an age of accelerating digital distractions, MineCollect provides a unified foundation to capture, structure, and surface your personal information footprint. More than a mere data aggregator, it transforms scattered highlights, articles, chats, and media into a coherent, searchable “second brain,” enabling deeper knowledge work and on-demand insight via conversational AI.

⸻

Key Features

• Unified Interface: One place for every piece of collected content (web clippings, notes, transcripts, bookmarks).
• Hierarchical Organization: Automatic tagging and tree-structured paths for intuitive browsing (e.g., Readings.Articles.SenecaOnVirtue).
• Hybrid Search: Seamless blend of keyword (BM25) and semantic similarity engines for both precision and recall.
• AI-Powered Q&A: RAG pipelines with extractive fallback ensure accurate, citation-rich answers.
• Cross-Device Capture & Review: Browser extensions, mobile share targets, folder watchers, and CLI make collection frictionless.

⸻

Architectural Overview

┌───────────────┐    ┌──────────────┐    ┌────────────────┐    ┌───────────────┐
│  Connectors   │──▶ │  Normalizer  │──▶ │   Data Store   │──▶ │   Retriever   │
└──────┬────────┘    └──────┬───────┘    └──────┬─────────┘    └──────┬────────┘
       │                    │                   │                     │
       ▼                    ▼                   ▼                     ▼
   Scheduler          Universal JSON     SQL & Vector DBs        QA & Search
   & Watchers            Schema       (Postgres + pgvector)   (Haystack, Meili)
                                                                      │
                                                                      ▼
                                                                   CLI/API
                                                                      │
                                                                      ▼
                                                                 Frontend/UI

⸻

Core Components

4.1 Connectors

Each source (RSS, YouTube, Kindle, WeChat, Zotero, local folders) is handled by an isolated, stateless worker. Authentication (OAuth or cookies) is managed securely, with API-first fallbacks to scraping when needed.
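The "isolated, stateless worker" idea can be illustrated with a small sketch. Everything named here (`RawItem`, `Connector`, `StaticFeedConnector`, `run_connector`) is hypothetical and not taken from the project; the toy connector reads from an in-memory list rather than a real feed, and authentication is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class RawItem:
    """One unprocessed record as fetched from a source (hypothetical shape)."""
    source: str
    url: str
    payload: str


class Connector(Protocol):
    """Hypothetical interface that each source worker would satisfy."""
    name: str

    def fetch(self, since: Optional[str]) -> Iterator[RawItem]:
        ...


class StaticFeedConnector:
    """Toy connector: yields items from an in-memory list, not a network call."""
    name = "rss"

    def __init__(self, entries: List[Tuple[str, str, str]]) -> None:
        # Each entry is (published ISO date, url, body).
        self._entries = entries

    def fetch(self, since: Optional[str]) -> Iterator[RawItem]:
        for published, url, body in self._entries:
            # Stateless: the caller supplies the cursor; nothing is remembered here.
            if since is None or published > since:
                yield RawItem(source=self.name, url=url, payload=body)


def run_connector(connector: Connector, since: Optional[str] = None) -> List[RawItem]:
    """Collect everything the connector reports as newer than `since`."""
    return list(connector.fetch(since))
```

Because the cursor (`since`) is passed in by the caller, the same worker can be rerun or replaced without carrying state between runs.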

4.2 Ingestion & Normalizer

Raw inputs are mapped into a universal JSON schema—with fields such as source, url, title, tags, content_chunks, and metadata. This uniformity underpins consistent downstream processing.
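As a rough illustration of the schema, the sketch below models the listed fields as a dataclass and serializes it to JSON. Only the field names come from the text above; the field types, the `Document` class name, and the `chunk_text` and `normalize` helpers are assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Document:
    """Universal record; field names follow the README, types are assumed."""
    source: str
    url: str
    title: str
    tags: List[str] = field(default_factory=list)
    content_chunks: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Pack blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole, not split.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def normalize(source: str, url: str, title: str, body: str,
              tags: Iterable[str] = ()) -> Document:
    """Map one raw input into the universal record (hypothetical helper)."""
    return Document(
        source=source,
        url=url,
        title=title.strip(),
        tags=sorted(set(tags)),  # de-duplicate and order tags deterministically
        content_chunks=chunk_text(body),
    )


if __name__ == "__main__":
    doc = normalize("rss", "https://example.com/a", "Seneca on Virtue",
                    "First paragraph.\n\nSecond paragraph.", tags=["stoicism"])
    print(json.dumps(asdict(doc), indent=2))
```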

4.3 Data Store & Indexing

• Primary Storage: PostgreSQL with pgvector for embeddings and ltree for hierarchy (with adjacency-list fallback for extremely deep trees).
• Full-Text Index: Meilisearch (or Typesense) for BM25-powered queries, typo tolerance, and hierarchical filters.
• Vector Store: Chroma or Qdrant as scale-ready options if embedded vectors outgrow Meili.
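To make the hierarchy concrete, here is a minimal sketch of how a dotted path such as Readings.Articles.SenecaOnVirtue might be built before it is stored in an ltree column. The helper names (`to_label`, `to_path`, `is_descendant`) are hypothetical, and the sketch conservatively keeps only ASCII letters and digits in each label. `is_descendant` mimics in plain Python what ltree's `<@` operator checks inside PostgreSQL.

```python
import re
from typing import Iterable


def to_label(segment: str) -> str:
    """Turn free text into one path label, e.g. 'Seneca on Virtue' -> 'SenecaOnVirtue'."""
    words = re.findall(r"[A-Za-z0-9]+", segment)
    label = "".join(w[:1].upper() + w[1:] for w in words)
    if not label:
        raise ValueError(f"segment has no usable characters: {segment!r}")
    return label


def to_path(segments: Iterable[str]) -> str:
    """Join labels with dots, the separator ltree uses between levels."""
    return ".".join(to_label(s) for s in segments)


def is_descendant(path: str, ancestor: str) -> bool:
    """True if `path` equals `ancestor` or lies beneath it in the tree."""
    return path == ancestor or path.startswith(ancestor + ".")
```

Comparing on `ancestor + "."` rather than the bare prefix avoids treating a sibling such as `Readings.ArticlesOld` as a child of `Readings.Articles`.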

4.4 Scheduler & Folder Watchers

A lightweight orchestration layer (cron or Airflow/Dagster) triggers connector runs, normalizes data, and writes to the database. Local folders (screenshots, Zotero libraries) are observed via file watchers to ingest new files in real time.
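The folder-watching step can be approximated with a polling loop built only on the standard library, as sketched below. This is an assumption-laden simplification: the function names are hypothetical, and polling only detects changes once per interval, so a real-time setup would more likely rely on OS file notifications (for example through a library such as watchdog).

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional


def snapshot(folder: Path) -> Dict[Path, float]:
    """Record the modification time of every file under `folder`."""
    return {p: p.stat().st_mtime for p in folder.rglob("*") if p.is_file()}


def new_or_changed(before: Dict[Path, float], after: Dict[Path, float]) -> List[Path]:
    """Files that are absent from `before` or whose mtime differs."""
    return sorted(p for p, mtime in after.items() if before.get(p) != mtime)


def watch(folder: Path, on_file: Callable[[Path], None],
          interval: float = 1.0, max_polls: Optional[int] = None) -> None:
    """Poll `folder` and call `on_file` for each new or modified file.

    `max_polls=None` loops forever; a finite value is handy for tests.
    """
    seen = snapshot(folder)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = snapshot(folder)
        for path in new_or_changed(seen, current):
            on_file(path)  # e.g. hand the file to the normalizer
        seen = current
        polls += 1
```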

4.5 Retriever & QA

• Hybrid Retrieval: Merges keyword and semantic scores, then reranks top results.
• Conversational Q&A: Haystack pipelines inject parent-node summaries into LLM prompts; extractive QA serves as a low-latency, citation-accurate fallback.
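One common way to merge keyword and semantic scores is to min-max normalize each score set and take a weighted sum. The sketch below shows that approach; it is not necessarily how the project does it, and the function names, the `alpha` weight, and the sample scores are all assumptions. The reranking step is not implemented here and would operate on the shortlist this function returns.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_merge(keyword: Dict[str, float], semantic: Dict[str, float],
                 alpha: float = 0.5, top_k: int = 5) -> List[Tuple[str, float]]:
    """Weighted fusion: alpha weights keyword scores, (1 - alpha) semantic ones.

    A document missing from one result set contributes 0 for that side.
    """
    kw, sem = min_max(keyword), min_max(semantic)
    fused = {
        doc: alpha * kw.get(doc, 0.0) + (1 - alpha) * sem.get(doc, 0.0)
        for doc in kw.keys() | sem.keys()
    }
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    bm25 = {"a": 10.0, "b": 5.0, "c": 0.0}   # made-up keyword scores
    dense = {"b": 0.9, "c": 0.8, "d": 0.1}   # made-up similarity scores
    for doc_id, score in hybrid_merge(bm25, dense):
        print(doc_id, round(score, 4))
```

Raising `alpha` favors exact keyword matches, while lowering it favors semantic similarity.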

4.6 Interface Layer

• CLI (Typer/Textual) for power users.
• Web/Mobile (Tauri + React; React Native) for broad accessibility.
• Monitoring & Alerts: Sentry for errors; Prometheus + Grafana for health and performance metrics.

⸻

Development Strategy

• Modularity: Independent evolution of connectors, pipelines, and UIs.
• Extensibility: Plug-and-play architecture welcomes new data sources without rework.
• Schema Coherence: Uniform data model simplifies processing and search.
• Skeptical Engineering: Start narrow, prove value, then generalize, avoiding over-engineering up front.

⸻

Conclusion

MineCollect is not just an aggregator—it is the foundational layer in a personal metacognitive stack. By persistently organizing every digital trace into a structured archive, it empowers AI-driven search, synthesis, and decision-making. In the coming era of personalized intelligence, such an architecture will be indispensable—transforming digital entropy into actionable knowledge.

⸻

License

This project is licensed under the Apache License 2.0.
