Feat: semantic insights generator structure #138

leonidb · 2025-12-30T08:10:59Z

What does this PR do?

Sets up the SemanticInsightsGenerator module structure with validation framework integration. LLM generation logic is stubbed out and ready for implementation.

Changes

Created SemanticInsightsGenerator module (agentune/analyze/feature/gen/semantic_insights_generator/)
- generator.py - Main orchestrator implementing FeatureGenerator interface
- basic_generator.py - Core generation with validate_and_retry() loop (LLM calls stubbed)
- corrector.py - LLM-based SQL repair stub (returns None for now)
- llm/schema.py - GeneratedFeatureSpec dataclass (a minimal starter; it may need more fields)
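The generate, validate and repair flow in basic_generator.py and corrector.py can be pictured with a minimal sketch. Everything here is hypothetical: `FeatureSpec`, `retry_until_valid`, `stub_repair`, `non_empty_sql` and `max_retries` are illustrative stand-ins, not the real agentune `validate_and_retry` signature or validator classes. The repair step mirrors the current corrector stub by returning `None`, which ends the loop early.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FeatureSpec:
    # Hypothetical stand-in for GeneratedFeatureSpec: a name plus the SQL to run.
    name: str
    sql: str


# A validator returns error messages; an empty list means the spec is valid.
Validator = Callable[[FeatureSpec], list]
# A repairer gets the spec plus its errors and returns a fixed spec, or None to give up.
Repairer = Callable[[FeatureSpec, list], Optional[FeatureSpec]]


def retry_until_valid(
    spec: FeatureSpec,
    validate: Validator,
    repair: Repairer,
    max_retries: int = 2,
) -> Optional[FeatureSpec]:
    """Return a valid spec, or None if repair declines or the retry budget runs out."""
    for attempt in range(max_retries + 1):
        errors = validate(spec)
        if not errors:
            return spec
        if attempt == max_retries:
            break  # retry budget exhausted, no point repairing again
        repaired = repair(spec, errors)
        if repaired is None:
            return None  # mirrors the corrector stub: no repair possible yet
        spec = repaired
    return None


def stub_repair(spec: FeatureSpec, errors: list) -> Optional[FeatureSpec]:
    # Placeholder for the LLM-based SQL repair; always declines for now.
    return None


def non_empty_sql(spec: FeatureSpec) -> list:
    # Toy validator: the only rule is that the SQL must not be blank.
    return [] if spec.sql.strip() else ["empty SQL"]
```

With `stub_repair`, a valid spec passes straight through and an invalid one is dropped; once a real corrector returns fixed specs, the same loop re-validates each repair until it passes or the budget is spent.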
Test reorganization
- Generator tests moved to nested folders (insightful_text_generator/, semantic_insights_generator/)
- Shared fixtures in gen/conftest.py. For now they use the same (conversational) data; we'll need to find small test data, probably a sample from one of the generated problems
- Stub test expects empty results until LLM generation is implemented
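The stub test's contract (the generator can be instantiated and called, but yields nothing yet) can be sketched as follows. `StubSemanticInsightsGenerator` and `test_stub_generator_returns_no_features` are hypothetical names for illustration, not the actual test_e2e.py contents.

```python
from __future__ import annotations

from collections.abc import Iterator


class StubSemanticInsightsGenerator:
    """Hypothetical stub: the LLM call is not implemented, so nothing is generated."""

    def __init__(self, seed: int | None = None) -> None:
        self.seed = seed

    def generate(self) -> Iterator[str]:
        # The stubbed LLM call produces no candidates, so this yields nothing.
        yield from ()


def test_stub_generator_returns_no_features() -> None:
    generator = StubSemanticInsightsGenerator(seed=42)
    # Instantiation and the call work; results stay empty until the LLM is wired in.
    assert list(generator.generate()) == []
```

Once generation is implemented, the assertion flips from "empty" to checks on the returned features, while the instantiation part of the test stays the same.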

Related Issues

Closes SparkBeyond/ao-core#113

yotam319-sparkbeyond · 2026-01-01T07:56:58Z

tests/agentune/analyze/feature/gen/conftest.py

+    test_dataset_with_strategy,
+)
+
+__all__ = ['problem', 'real_llm_with_spec', 'test_dataset_with_strategy']


are we sure we want to share ['problem', 'real_llm_with_spec', 'test_dataset_with_strategy'] between all generators?

No, we'll definitely want separate, non-conversational tabular data. I just didn't want to decide on it in this PR, so this is a placeholder for now.

then define a placeholder ['problem', 'real_llm_with_spec', 'test_dataset_with_strategy'] in 'semantic_insights_generator/test_e2e.py' and remove this conftest.py.

yotam319-sparkbeyond · 2026-01-01T07:59:06Z

agentune/analyze/feature/gen/semantic_insights_generator/basic_generator.py

+    generation_model: LLMWithSpec
+    repair_model: LLMWithSpec
+    seed: int | None = None
+    validator: FeatureValidator = field(factory=LawAndOrderValidator)


should be a list
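One possible reading of "should be a list" is that the generator should accept several validators and run all of them. The sketch below assumes that reading. It uses stdlib `dataclasses` (`default_factory`) rather than the attrs-style `field(factory=...)` in the diff, and `Validator`, `NotEmptyValidator`, `SelectOnlyValidator` and `GeneratorConfig` are hypothetical names, not agentune classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class Validator(Protocol):
    def errors(self, sql: str) -> list[str]: ...


class NotEmptyValidator:
    def errors(self, sql: str) -> list[str]:
        return [] if sql.strip() else ["SQL is empty"]


class SelectOnlyValidator:
    def errors(self, sql: str) -> list[str]:
        return [] if sql.lstrip().upper().startswith("SELECT") else ["SQL must be a SELECT"]


@dataclass
class GeneratorConfig:
    seed: int | None = None
    # A list instead of a single validator; the factory builds a fresh list per instance.
    validators: list[Validator] = field(
        default_factory=lambda: [NotEmptyValidator(), SelectOnlyValidator()]
    )

    def validate(self, sql: str) -> list[str]:
        # Run every validator and collect all errors, so a repair step sees the full picture.
        return [msg for v in self.validators for msg in v.errors(sql)]
```

Using a factory rather than a literal default keeps instances from sharing one mutable list, and an empty list gives callers an explicit way to disable validation.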

Create minimal SemanticInsightsGenerator module with validation integration:
- SemanticInsightsGenerator: Main orchestrator implementing FeatureGenerator
- BasicFeatureGenerator: Core generation logic with validate_and_retry loop
- LLMSqlFeatureCorrector: LLM-based SQL repair stub (TODO implementation)
- GeneratedFeatureSpec: Minimal schema for LLM-generated features
- Configurable validator and retry budgets with module defaults
- Uses official validation infrastructure (LawAndOrderValidator, validate_and_retry)
- Streaming feature generation (yields immediately after validation)
- Seed parameter for reproducible generation
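The streaming and seeding bullets can be illustrated with a small sketch. Shuffling a fixed candidate list with a seeded `random.Random` is a hypothetical stand-in for seeded LLM sampling, and `stream_valid_features` is not an agentune API. The point is only that each feature is yielded as soon as it validates, and that the same seed reproduces the same order.

```python
from __future__ import annotations

import random
from collections.abc import Callable, Iterator


def stream_valid_features(
    candidates: list[str],
    is_valid: Callable[[str], bool],
    seed: int | None = None,
) -> Iterator[str]:
    """Hypothetical streaming generator: a seeded shuffle stands in for LLM sampling."""
    rng = random.Random(seed)  # dedicated RNG, so the global random state is untouched
    order = list(candidates)
    rng.shuffle(order)
    for candidate in order:
        if is_valid(candidate):
            # The caller receives this feature before the next candidate is examined.
            yield candidate
```

Because this is a generator, nothing runs until the caller asks for the first item, and a consumer can stop early without paying for validation of the remaining candidates.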

- Created nested test folder: tests/agentune/analyze/feature/gen/semantic_insights_generator/
- test_e2e.py: Validates generator instantiation and API contract
- Expects empty results until LLM generation is implemented
- refactor: Move InsightfulTextGenerator tests into a nested folder structure, matching the SemanticInsightsGenerator test structure

leonidb requested a review from yotam319-sparkbeyond December 30, 2025 08:11

leonidb changed the title ~~Feat: semantic insights generator~~ Feat: semantic insights generator structure Dec 30, 2025

Base automatically changed from feat/376-feature-validations to main December 31, 2025 13:43

leonidb force-pushed the feat/409-semantic-insights-generator branch from 5424e66 to c6c7663 December 31, 2025 14:23

yotam319-sparkbeyond approved these changes Jan 1, 2026


leonidb added 2 commits January 1, 2026 16:39

leonidb force-pushed the feat/409-semantic-insights-generator branch from bfdc6ce to 98f91f7 January 1, 2026 14:51

3 participants
