6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
@@ -64,6 +64,12 @@ jobs:
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Run smoke tests
        run: |
          pytest -k smoke -v
        env:
          PYTHONPATH: ${{ github.workspace }}

      - name: Run unit tests with coverage
        run: |
          pytest tests/unit/ -v --cov=src --cov-report=xml --cov-report=term-missing
152 changes: 152 additions & 0 deletions REPO_INVENTORY.md
@@ -0,0 +1,152 @@
# Repository Inventory

**Generated:** 2024-10-04
**Purpose:** Document potentially unused files, large notebooks, and items for cleanup consideration

---

## Summary

This inventory identifies files that may be candidates for removal, archiving, or relocation based on static analysis and repository structure review.

### Statistics
- Total Python files: 44
- Total test files: ~20
- Documentation files: 12
- Notebooks: 2

---

## Actions Taken

### ✅ Moved to Archive
- `notebooks/pipeline_csv_to_parquet multifile.ipynb` → `notebooks/experiments/`
- **Reason:** Large experimental notebook (36KB) not referenced in main pipeline
- **Status:** Preserved in experiments directory for historical reference
- **Action:** MOVED (not deleted)
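
For reference, the move can be reproduced (or verified against git history) with a standard `git mv`; the paths are the ones listed above:

```bash
# Relocate the experimental notebook into the experiments directory (already done in this PR)
git mv "notebooks/pipeline_csv_to_parquet multifile.ipynb" notebooks/experiments/
```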

---

## Files Under Review

### Configuration Files - Potential Consolidation

#### `.replit`
- **Size:** 714 bytes
- **Purpose:** Replit IDE configuration
- **Recommendation:** Keep if using Replit; consider adding to `.gitignore` if not needed in version control
- **Risk:** Low - IDE-specific configuration
- **Action:** REVIEW

#### `pyproject.toml` + `setup.cfg`
- **Purpose:** Both contain tool configuration (Black, isort, mypy, flake8)
- **Observation:** Configuration is split between two files
- **Recommendation:** Consider consolidating all tool config into `pyproject.toml` (modern standard)
- **Action:** CONSOLIDATE (optional optimization)
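
A quick, non-destructive way to see how the tool configuration is currently split (a sketch; run from the repository root):

```bash
# List the config section headers in both files to see which tools live where
grep -n "^\[" pyproject.toml setup.cfg
```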

---

## Notebooks

### Active Notebooks
- `notebooks/pipeline_csv_to_parquet.ipynb` (24KB)
- **Status:** Active, referenced in documentation
- **Action:** KEEP

### Archived Notebooks
- `notebooks/experiments/pipeline_csv_to_parquet multifile.ipynb` (36KB)
- **Status:** Experimental, preserved for reference
- **Action:** ARCHIVED

---

## Documentation Assessment

All documentation files appear active and well-maintained:
- ✅ `docs/ARCHITECTURE.md` (20KB) - Core architecture documentation
- ✅ `docs/AI_ENGINES.md` (24KB) - AI adapter documentation
- ✅ `docs/DATA_PIPELINE.md` (19KB) - Data pipeline documentation
- ✅ `docs/VISUALIZATION.md` (17KB) - Visualization documentation
- ✅ Other docs: All appear referenced and active

**Observation:** `docs/ARCHITECTURE_V2.md` (1.7KB) is notably smaller than `ARCHITECTURE.md` (20KB)
- **Recommendation:** Verify if V2 is intended to replace or supplement ARCHITECTURE.md
- **Action:** REVIEW DOCUMENTATION STRATEGY

---

## Code Quality Notes

### Ignored Files in Coverage/Linting
The following file is explicitly excluded from coverage and linting:
- `src/ai_service_old.py`
- **Status:** Listed in `.gitignore`, `pyproject.toml`, and `setup.cfg`
- **Issue:** The file is referenced in these configs but does not exist in the repository
- **Recommendation:** Remove the references from the config files, since the file is gone
- **Action:** CLEANUP CONFIG
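
A minimal sketch for locating the stale references before removing them (assumes the three files listed above):

```bash
# Show every remaining mention of the deleted module in the config files
grep -n "ai_service_old" .gitignore pyproject.toml setup.cfg
```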

---

## Testing Infrastructure

### Test Structure
- Unit tests: `tests/unit/` - Well organized with 49 passing tests
- Integration tests: `tests/integration/` - Properly separated
- New smoke test: `tests/test_smoke_app.py` - Added for basic import validation
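
The smoke tests can be run locally with the same invocation the CI step uses (a sketch; `PYTHONPATH` is set to the repository root, mirroring `${{ github.workspace }}` in CI):

```bash
# Run only the smoke tests, with the project root on the import path
PYTHONPATH="$(pwd)" pytest -k smoke -v
```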

### Coverage
Current coverage is 75% for the base adapter, with lower coverage for the AI adapters (expected for integration code).
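
To reproduce the coverage numbers locally, the unit-test invocation from CI can be used as-is (a sketch, assuming it is run from the repository root):

```bash
# Per-module coverage report, matching the CI configuration
pytest tests/unit/ -v --cov=src --cov-report=xml --cov-report=term-missing
```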

---

## Recommendations

### High Priority
1. ✅ **COMPLETED:** Move `pipeline_csv_to_parquet multifile.ipynb` to experiments
2. ✅ **COMPLETED:** Add smoke test for app initialization
3. **TODO:** Remove references to `src/ai_service_old.py` from config files

### Medium Priority
1. **Consider:** Consolidate tool configuration into `pyproject.toml` only
2. **Review:** Clarify relationship between `ARCHITECTURE.md` and `ARCHITECTURE_V2.md`
3. **Consider:** Add `.replit` to `.gitignore` if not needed in version control

### Low Priority
1. **Monitor:** Track unused code with regular `vulture` runs (see the new `scripts/report_unused.sh`)
2. **Consider:** Create `notebooks/README.md` to document purpose of each notebook

---

## Future Cleanup Strategy

The `scripts/report_unused.sh` script has been added to help identify:
- Unused Python functions and classes (via vulture)
- Code with zero coverage (via pytest-cov)
- Import patterns that may indicate dead code

Run periodically with:
```bash
bash scripts/report_unused.sh
```

---

## Risk Assessment

**Overall Risk:** LOW
- All changes are non-destructive
- Experimental notebook preserved (moved, not deleted)
- No production code removed
- Smoke test added for safety

**Rollback:** Straightforward: revert the branch; all moved files can be restored from git history.
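
If the archived notebook ever needs to be restored to its original location, it can be checked out from the pre-move commit (a sketch; `<pre-move-sha>` is a placeholder for the commit hash before the move):

```bash
# Restore the notebook at its original path from an earlier commit
git checkout <pre-move-sha> -- "notebooks/pipeline_csv_to_parquet multifile.ipynb"
```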

---

## Notes

This inventory is a living document. As the codebase evolves:
1. Run `scripts/report_unused.sh` periodically
2. Update this inventory when making structural changes
3. Review archived notebooks annually for potential deletion
4. Keep test coverage high to identify truly unused code
118 changes: 118 additions & 0 deletions scripts/report_unused.sh
@@ -0,0 +1,118 @@
#!/bin/bash
#
# Report unused code using vulture and coverage analysis
# This script helps identify potential dead code for cleanup
#
# Usage:
# bash scripts/report_unused.sh [--install]
#
# Options:
# --install Install vulture if not present

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

echo "🔍 Unused Code Report for converSQL"
echo "===================================="
echo ""

# Function to check if a command exists
command_exists() {
    command -v "$1" >/dev/null 2>&1
}

# Check for --install flag
if [[ "$1" == "--install" ]]; then
echo "📦 Installing vulture..."
pip install vulture
echo "✅ vulture installed"
echo ""
fi

# Check if vulture is available
if ! command_exists vulture; then
    echo "⚠️ vulture is not installed"
    echo "Install with: pip install vulture"
    echo "Or run: bash scripts/report_unused.sh --install"
    echo ""
    VULTURE_AVAILABLE=false
else
    VULTURE_AVAILABLE=true
fi

# Change to project root
cd "${PROJECT_ROOT}"

# Run vulture if available
if [ "$VULTURE_AVAILABLE" = true ]; then
echo "📊 Running vulture static analysis..."
echo "------------------------------------"

# Run vulture with confidence threshold
# Higher confidence = more likely to be actually unused
vulture src/ --min-confidence 80 --sort-by-size 2>/dev/null || {
echo "✅ No high-confidence unused code found (confidence >= 80%)"
}
echo ""

echo "📊 Vulture summary (all confidence levels)..."
echo "------------------------------------"
vulture src/ --min-confidence 60 2>&1 | head -20 || {
echo "✅ No unused code found (confidence >= 60%)"
}
echo ""
fi

# Run coverage analysis if pytest is available
if command_exists pytest; then
    echo "📊 Coverage analysis for untested code..."
    echo "----------------------------------------"

    # Run tests with coverage, focusing on source code.
    # Match " 0%" with leading whitespace so 10%, 100%, etc. are not flagged.
    pytest tests/unit/ --cov=src --cov-report=term-missing --cov-report=html -q 2>&1 | \
        grep -E "^src/" | \
        grep -E "[[:space:]]0%" || echo "✅ All source files have some test coverage"

    echo ""
    echo "📁 Detailed coverage report available at: htmlcov/index.html"
    echo ""
else
    echo "⚠️ pytest not available for coverage analysis"
    echo ""
fi

# Check for common patterns that might indicate unused code
echo "🔍 Checking for potential cleanup patterns..."
echo "-------------------------------------------"

# Look for files with "_old" suffix
OLD_FILES=$(find src/ -type f -name "*_old.py" 2>/dev/null || true)
if [ -n "$OLD_FILES" ]; then
echo "⚠️ Found files with '_old' suffix:"
echo "$OLD_FILES"
else
echo "✅ No '_old' backup files found"
fi

# Look for TODO/FIXME/DEPRECATED comments
TODO_COUNT=$(grep -r "TODO\|FIXME\|DEPRECATED" src/ --include="*.py" 2>/dev/null | wc -l || echo "0")
if [ "$TODO_COUNT" -gt 0 ]; then
echo "📝 Found $TODO_COUNT TODO/FIXME/DEPRECATED comments in source code"
echo " Run: grep -rn 'TODO\|FIXME\|DEPRECATED' src/ --include='*.py'"
else
echo "✅ No TODO/FIXME/DEPRECATED markers found"
fi

echo ""
echo "===================================="
echo "✅ Unused code report complete!"
echo ""
echo "💡 Tips:"
echo " - Review vulture output carefully (false positives are common)"
echo " - Focus on code with 0% coverage first"
echo " - Check _old backup files for safe removal"
echo " - Address TODO/FIXME comments before cleanup"
echo ""
echo "📖 See REPO_INVENTORY.md for cleanup recommendations"
111 changes: 111 additions & 0 deletions tests/test_smoke_app.py
@@ -0,0 +1,111 @@
"""
Smoke test for app initialization.

These tests ensure that the application can be imported and that core initialization
paths work without network calls or full UI rendering.
"""

import sys
from unittest.mock import MagicMock, patch


def test_app_imports_successfully():
    """Test that the app.py module can be imported without errors."""
    # Mock streamlit to prevent UI operations
    sys.modules["streamlit"] = MagicMock()

    try:
        import app  # noqa: F401

        assert True, "app.py imported successfully"
    except ImportError as e:
        assert False, f"Failed to import app.py: {e}"


def test_app_logic_imports_successfully():
    """Test that src.app_logic can be imported."""
    try:
        from src import app_logic  # noqa: F401

        assert True, "src.app_logic imported successfully"
    except ImportError as e:
        assert False, f"Failed to import src.app_logic: {e}"


def test_app_logic_initialize_without_streamlit():
    """Test that initialize_app_data can be called with mocked streamlit."""

    # Create a session state mock that supports both dict and attribute access
    class SessionStateMock:
        def __init__(self):
            self._data = {}

        def __contains__(self, key):
            return key in self._data

        def __setitem__(self, key, value):
            self._data[key] = value

        def __getitem__(self, key):
            return self._data[key]

        def __setattr__(self, key, value):
            if key == "_data":
                super().__setattr__(key, value)
            else:
                self._data[key] = value

        def __getattr__(self, key):
            if key == "_data":
                return super().__getattribute__(key)
            return self._data.get(key)

        def setdefault(self, key, value):
            return self._data.setdefault(key, value)

    # Mock streamlit with custom session state
    mock_session_state = SessionStateMock()
    mock_st = MagicMock()
    mock_st.session_state = mock_session_state

    # Mock spinner context manager
    mock_spinner = MagicMock()
    mock_spinner.__enter__ = MagicMock(return_value=None)
    mock_spinner.__exit__ = MagicMock(return_value=None)
    mock_st.spinner.return_value = mock_spinner

    # Mock the data services to avoid file I/O
    with patch("src.app_logic.st", mock_st):
        with patch("src.app_logic.load_parquet_files", return_value=[]):
            with patch("src.app_logic.load_schema_context", return_value=""):
                with patch("src.app_logic.load_ai_service", return_value=MagicMock(is_available=lambda: False)):
                    from src.app_logic import initialize_app_data

                    # Should not raise any exceptions
                    initialize_app_data()

                    # Verify session state was initialized
                    assert "generated_sql" in mock_session_state
                    assert "ai_error" in mock_session_state
                    assert "show_edit_sql" in mock_session_state
                    assert "_rendered_this_run" in mock_session_state


def test_core_imports_successfully():
    """Test that src.core module can be imported."""
    try:
        from src import core  # noqa: F401

        assert True, "src.core imported successfully"
    except ImportError as e:
        assert False, f"Failed to import src.core: {e}"


def test_ai_service_imports_successfully():
    """Test that src.ai_service module can be imported."""
    try:
        from src import ai_service  # noqa: F401

        assert True, "src.ai_service imported successfully"
    except ImportError as e:
        assert False, f"Failed to import src.ai_service: {e}"