31 changes: 31 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,31 @@
# converSQL Copilot Guide

## System map
- `app.py` is the Streamlit shell: it hydrates cached data via `initialize_app_data()`, gates the UI through `simple_auth_wrapper`, and delegates all heavy work to `src/core.py` and `src/ai_service.py`.
- `src/core.py` owns DuckDB execution and orchestrates data prep. `scan_parquet_files()` will run `scripts/sync_data.py` if `data/processed/*.parquet` are missing, so keep a local Parquet copy handy during tests to avoid network pulls.
- `src/ai_service.py` routes natural-language prompts into adapter implementations in `src/ai_engines/`. The prompt embeds the mortgage risk heuristics baked into `src/data_dictionary.py`; reuse `AIService._build_sql_prompt()` instead of crafting ad-hoc prompts.
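
A hedged usage sketch of that reuse (the import path and parameter names below are assumptions; check `src/ai_service.py` for the real signature):

```python
# Sketch only: reuse the canonical prompt builder instead of ad-hoc strings.
# Parameter names and positions here are assumptions, not the confirmed API.
from src.ai_service import AIService

service = AIService()
schema_context = "..."  # normally produced by generate_enhanced_schema_context()
prompt = service._build_sql_prompt("Top 10 states by delinquency rate", schema_context)
```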

## Data + ontology expectations
- Loan metadata lives in `data/processed/data.parquet`; schema text comes from `generate_enhanced_schema_context()` which stitches DuckDB types with ontology metadata from `src/data_dictionary.py` and `docs/DATA_DICTIONARY.md`.
- When adding derived features, update both the Parquet schema and the ontology entry so AI output and the Ontology Explorer tab stay in sync.
- The Streamlit Ontology tab imports `LOAN_ONTOLOGY` and `PORTFOLIO_CONTEXT`; breaking their shape (dict → FieldMetadata) will crash the UI.
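
For illustration, a new ontology entry might look like this (a sketch only; `FieldMetadata`'s real constructor lives in `src/data_dictionary.py`, and the attribute names below are assumptions):

```python
# Sketch: LOAN_ONTOLOGY maps column names to FieldMetadata objects.
# The column key and the FieldMetadata attributes are hypothetical --
# mirror the real dataclass in src/data_dictionary.py.
from src.data_dictionary import LOAN_ONTOLOGY, FieldMetadata

LOAN_ONTOLOGY["current_upb"] = FieldMetadata(  # hypothetical column
    name="current_upb",
    description="Current unpaid principal balance",
    category="Loan Performance",
)
```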

## AI engine adapters
- Adapters must subclass `AIEngineAdapter` in `src/ai_engines/base.py`, expose `provider_id`, `name`, `is_available()`, and `generate_sql()`, then be exported via `src/ai_engines/__init__.py` and registered inside `AIService.adapters` (a skeleton follows this list).
- Use `clean_sql_response()` to strip markdown fences, and return `(sql, "")` on success; downstream callers treat any non-empty error string as failure.
- Keep `AI_PROVIDER` fallbacks working—tests rely on `AIService` surviving with zero credentials, so default to "unavailable" rather than raising.
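
A minimal adapter skeleton under those constraints (method signatures and the `clean_sql_response` import path are assumptions; verify against `src/ai_engines/base.py`):

```python
import os

from src.ai_engines.base import AIEngineAdapter, clean_sql_response  # path assumed


class ExampleAdapter(AIEngineAdapter):
    provider_id = "example"
    name = "Example Provider"

    def is_available(self) -> bool:
        # Degrade to "unavailable" instead of raising when creds are missing.
        return bool(os.getenv("EXAMPLE_API_KEY"))  # hypothetical credential

    def generate_sql(self, prompt: str) -> tuple[str, str]:
        if not self.is_available():
            return "", "Example provider is not configured"
        raw = self._call_model(prompt)
        # Strip markdown fences; an empty error string signals success downstream.
        return clean_sql_response(raw), ""

    def _call_model(self, prompt: str) -> str:
        # Placeholder for the real provider SDK call.
        return "SELECT 1;"
```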

## Developer workflows
- Install deps with `pip install -r requirements.txt`; prefer `make setup` for a clean environment (installs + cleanup).
- Fast test cycle: `make test-unit` skips integration markers; `make test` mirrors CI (pytest + coverage). Integration adapters are ignored by default via `pytest.ini`; remove the `--ignore` flags there if you really need live API coverage.
- Lint/format stack is Black 120 cols + isort + flake8 + mypy. `make ci` runs the whole suite and matches the GitHub Actions workflow.

## Environment & secrets
- Copy `.env.example` to `.env`, then set one provider block (`CLAUDE_API_KEY`, `AWS_*`, or `GEMINI_API_KEY`). Without credentials the UI drops to “AI unavailable” but manual SQL still works.
- Data sync needs Cloudflare R2 keys (`R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_ENDPOINT_URL`). In offline dev, set `FORCE_DATA_REFRESH=false` and place Parquet files under `data/processed/`.
- Authentication defaults to Google OAuth (`ENABLE_AUTH=true`); set it to `false` for local hacking or provide `GOOGLE_CLIENT_ID/SECRET` plus HTTPS when deploying.
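
A quick sanity check for those variables (a sketch only; the exact keys each adapter reads live in `src/ai_engines/`):

```python
# Sketch: confirm one AI provider block and the R2 sync keys are visible.
import os

providers = {
    "claude": bool(os.getenv("CLAUDE_API_KEY")),
    "gemini": bool(os.getenv("GEMINI_API_KEY")),
    "bedrock": bool(os.getenv("AWS_ACCESS_KEY_ID")),  # one of the AWS_* keys
}
r2_keys = ("R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_ENDPOINT_URL")
print(providers, all(os.getenv(k) for k in r2_keys))
```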

## Practical tips
- Clear Streamlit caches with `streamlit cache clear` if schema or ontology changes; otherwise stale `@st.cache_data` results linger.
- When writing new ingest code, mirror the type-casting helpers in `notebooks/pipeline_csv_to_parquet*.ipynb` so DuckDB types stay compatible.
- Logging to Cloudflare D1 is optional—`src/d1_logger.py` silently no-ops without `CLOUDFLARE_*` secrets, so you can call it safely even in tests.
61 changes: 0 additions & 61 deletions .github/workflows/format-code.yml

This file was deleted.

8 changes: 4 additions & 4 deletions .streamlit/config.toml
@@ -4,7 +4,7 @@ enableCORS = true
port = 5000

[theme]
-primaryColor = "#1f77b4"
-backgroundColor = "#ffffff"
-secondaryBackgroundColor = "#f0f2f6"
-textColor = "#262730"
+primaryColor = "#B45F4D"
+backgroundColor = "#FAF6F0"
+secondaryBackgroundColor = "#FDFDFD"
+textColor = "#3A3A3A"
14 changes: 11 additions & 3 deletions CONTRIBUTING.md
@@ -22,11 +22,11 @@ There are many ways to contribute to converSQL:

```bash
# Fork the repository on GitHub, then clone your fork
-git clone https://github.com/YOUR_USERNAME/conversql.git
-cd conversql
+git clone https://github.com/YOUR_USERNAME/converSQL.git
+cd converSQL

# Add upstream remote
-git remote add upstream https://github.com/ravishan16/conversql.git
+git remote add upstream https://github.com/ravishan16/converSQL.git
```

### 2. Set Up Development Environment
@@ -83,6 +83,14 @@ black src/ app.py
flake8 src/ app.py --max-line-length=100
```

### Front-end Styling

When updating Streamlit UI components:

- Re-use the CSS custom properties defined in `app.py` (`--color-background`, `--color-accent-primary`, etc.) instead of hard-coded hex values (see the sketch after this list).
- Mirror changes in `.streamlit/config.toml` when altering primary/secondary colors so the Streamlit theme and custom CSS stay aligned.
- Include before/after screenshots in your pull request whenever you adjust layout, typography, or palette usage.
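
For example, a custom-styled block can reference the variables instead of hex codes (a sketch, assuming the properties listed above are defined in `app.py`):

```python
import streamlit as st

# Reuse the CSS custom properties from app.py rather than hard-coded hex.
st.markdown(
    """
    <style>
    .metric-card {
        background: var(--color-background);
        border-left: 4px solid var(--color-accent-primary);
    }
    </style>
    """,
    unsafe_allow_html=True,
)
```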

**Example:**
```python
def execute_sql_query(sql_query: str, parquet_files: List[str]) -> pd.DataFrame:
    ...
```