@leonidb
Collaborator

@leonidb leonidb commented Dec 30, 2025

What does this PR do?

Sets up the SemanticInsightsGenerator module structure with validation framework integration. LLM generation logic is stubbed out and ready for implementation.

Changes

  • Created SemanticInsightsGenerator module (agentune/analyze/feature/gen/semantic_insights_generator/)
    • generator.py - Main orchestrator implementing FeatureGenerator interface
    • basic_generator.py - Core generation with validate_and_retry() loop (LLM calls stubbed)
    • corrector.py - LLM-based SQL repair stub (returns None for now)
    • llm/schema.py - GeneratedFeatureSpec dataclass (a minimal starting point; it may need more fields)
  • Test reorganization
    • Generator tests moved to nested folders (insightful_text_generator/, semantic_insights_generator/)
    • Shared fixtures in gen/conftest.py. For now they reuse the same conversational data. We'll need to find a small test dataset, probably a sample from one of the generated problems
    • Stub test expects empty results until LLM generation is implemented
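
The validate-and-retry loop with a stubbed corrector, as described in the list above, could look roughly like the sketch below. It is an illustration only, not the code in this PR: `FeatureSpec`, `validate_and_retry_sketch`, `require_select`, and `stub_repair` are hypothetical names, and the real `validate_and_retry()` and `GeneratedFeatureSpec` may have different signatures and fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical stand-in for GeneratedFeatureSpec; the real fields may differ."""
    name: str
    sql: str


def validate_and_retry_sketch(
    spec: FeatureSpec,
    validate: Callable[[FeatureSpec], Optional[str]],
    repair: Callable[[FeatureSpec, str], Optional[FeatureSpec]],
    max_retries: int = 2,
) -> Optional[FeatureSpec]:
    """Return a spec that passes validation, or None if it cannot be repaired."""
    current = spec
    for attempt in range(max_retries + 1):
        error = validate(current)  # None means the spec is valid
        if error is None:
            return current
        if attempt == max_retries:
            break  # retry budget exhausted
        repaired = repair(current, error)
        if repaired is None:
            return None  # the corrector gave up (what a stub always does)
        current = repaired
    return None


def require_select(spec: FeatureSpec) -> Optional[str]:
    """Toy validator (assumption): accept only SELECT statements."""
    if spec.sql.lstrip().lower().startswith("select"):
        return None
    return "not a SELECT statement"


def stub_repair(spec: FeatureSpec, error: str) -> Optional[FeatureSpec]:
    """Mirrors a corrector stub that returns None for now."""
    return None
```

With `stub_repair`, any spec that fails validation is dropped, which is consistent with a stub test that expects empty results until generation and repair are implemented.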

Related Issues

Closes SparkBeyond/ao-core#113

@leonidb leonidb changed the title from "Feat: semantic insights generator" to "Feat: semantic insights generator structure" Dec 30, 2025
Base automatically changed from feat/376-feature-validations to main December 31, 2025 13:43
@leonidb leonidb force-pushed the feat/409-semantic-insights-generator branch from 5424e66 to c6c7663 December 31, 2025 14:23
test_dataset_with_strategy,
)

__all__ = ['problem', 'real_llm_with_spec', 'test_dataset_with_strategy']
Contributor

Are we sure we want to share ['problem', 'real_llm_with_spec', 'test_dataset_with_strategy'] across all generators?

Collaborator Author

No, we'll definitely want separate, non-conversational tabular data. I just didn't want to decide on it in this PR, so this is a placeholder for now.

Contributor

Then define placeholder ['problem', 'real_llm_with_spec', 'test_dataset_with_strategy'] fixtures in 'semantic_insights_generator/test_e2e.py' and remove this conftest.py.

generation_model: LLMWithSpec
repair_model: LLMWithSpec
seed: int | None = None
validator: FeatureValidator = field(factory=LawAndOrderValidator)
Contributor

should be a list
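
One way to read this comment is that the generator should accept several validators instead of a single one. A minimal sketch under that assumption follows. It uses the standard-library `dataclasses` module (`default_factory`) in place of the attrs-style `field(factory=...)` that the snippet under review appears to use, and `Validator`, `NonEmptySqlValidator`, `SelectOnlyValidator`, `GeneratorConfig`, and `first_failure` are hypothetical names, not agentune APIs.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


class Validator(Protocol):
    """Hypothetical validator interface: return an error message, or None if valid."""

    def validate(self, sql: str) -> Optional[str]:
        ...


class NonEmptySqlValidator:
    def validate(self, sql: str) -> Optional[str]:
        return None if sql.strip() else "empty SQL"


class SelectOnlyValidator:
    def validate(self, sql: str) -> Optional[str]:
        if sql.lstrip().lower().startswith("select"):
            return None
        return "only SELECT statements are allowed"


def _default_validators() -> list[Validator]:
    # Built per instance so configs never share one mutable list.
    return [NonEmptySqlValidator()]


@dataclass
class GeneratorConfig:
    seed: Optional[int] = None
    validators: list[Validator] = field(default_factory=_default_validators)

    def first_failure(self, sql: str) -> Optional[str]:
        """Run validators in order and return the first error, or None if all pass."""
        for validator in self.validators:
            error = validator.validate(sql)
            if error is not None:
                return error
        return None
```

A list keeps the default behaviour (one validator) while letting callers append stricter checks without subclassing.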

Create minimal SemanticInsightsGenerator module with validation integration:
- SemanticInsightsGenerator: Main orchestrator implementing FeatureGenerator
- BasicFeatureGenerator: Core generation logic with validate_and_retry loop
- LLMSqlFeatureCorrector: LLM-based SQL repair stub (TODO implementation)
- GeneratedFeatureSpec: Minimal schema for LLM-generated features
- Configurable validator and retry budgets with module defaults
- Uses official validation infrastructure (LawAndOrderValidator, validate_and_retry)
- Streaming feature generation (yields immediately after validation)
- Seed parameter for reproducible generation
- Created nested test folder: tests/agentune/analyze/feature/gen/semantic_insights_generator/
- test_e2e.py: Validates generator instantiation and API contract
- Expects empty results until LLM generation is implemented
- refactor: Move InsightfulTextGenerator tests into a nested folder structure, matching the SemanticInsightsGenerator test structure
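
The streaming and seed items in the commit message above can be illustrated with a plain Python generator. This is a sketch under stated assumptions, not the module's implementation: `stream_valid_features` is a hypothetical name, candidates are plain strings, and the seed only drives a shuffle as a stand-in for whatever randomness real generation involves.

```python
import random
from typing import Callable, Iterable, Iterator, Optional


def stream_valid_features(
    candidates: Iterable[str],
    is_valid: Callable[[str], bool],
    seed: Optional[int] = None,
) -> Iterator[str]:
    """Yield each candidate as soon as it passes validation."""
    rng = random.Random(seed)  # local RNG: the same seed gives the same order
    pool = list(candidates)
    rng.shuffle(pool)
    for candidate in pool:
        # Validation happens lazily, one candidate at a time, so a consumer
        # can start using the first valid feature before the rest are checked.
        if is_valid(candidate):
            yield candidate
```

Because the function is a generator, nothing is validated until the caller asks for the next item, and an empty candidate list simply produces an empty stream.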
@leonidb leonidb force-pushed the feat/409-semantic-insights-generator branch from bfdc6ce to 98f91f7 January 1, 2026 14:51
3 participants