What are we trying to solve?
- Generate realistic synthetic data to address the absence of suitable datasets, or quality issues with existing ones.
- Enable intelligent data generation based on metadata or prompts using LLM capabilities.
How we plan to tackle this:
- Build an LLM-powered agent that intelligently generates synthetic data.
- Leverage LLMs to understand context and metadata for informed data creation.
- Create a system capable of producing realistic, meaningful data based on prompts or schema information.
- Explore the potential benefits of utilizing multiple LLMs.
What we built:
While we developed more features than listed here, these points directly relate to our core goal.
- Key Features/Components:
- AI-powered data generation for small datasets (5-10 rows).
- Intelligent filename generation and CSV saving.
- Architecture Decisions:
- Followed the "build from scratch" philosophy, avoiding external frameworks.
- Utilized hand-fed JSON table schemas as input (an illustrative example follows this list).
- Code Structure Overview:
- The overall structure mirrors Fireball's approach, with tools added to a tools class that allows the AI agent to make decisions and select tools autonomously (a minimal sketch of this pattern also follows below).
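The hand-fed schemas were plain JSON table descriptions. As an illustration only (these field names are hypothetical, not necessarily our exact format):

```json
{
  "table": "customers",
  "columns": [
    {"name": "customer_id", "type": "integer", "description": "unique identifier"},
    {"name": "full_name", "type": "string", "description": "realistic person name"},
    {"name": "signup_date", "type": "date", "description": "date within the last two years"}
  ]
}
```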
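The tools-class pattern can be sketched roughly as below. The `Tools` class, `generate_rows`, `save_csv`, and `dispatch` names are illustrative, not our actual identifiers; the idea is that the LLM picks a tool by name and the agent dispatches the call:

```python
import csv

class Tools:
    """Registry of callable tools. The agent advertises the tool names
    and docstrings to the LLM, which decides which one to invoke."""

    def generate_rows(self, schema: dict, n: int) -> list[dict]:
        """Ask the LLM for n rows matching the schema (stubbed here)."""
        return [
            {col["name"]: f"<{col['type']}>" for col in schema["columns"]}
            for _ in range(n)
        ]

    def save_csv(self, rows: list[dict], filename: str) -> str:
        """Write generated rows to a CSV file and return the filename."""
        with open(filename, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        return filename

def dispatch(tools: Tools, action: dict):
    """Run the tool the LLM selected, e.g. an action parsed from its
    response such as {"tool": "save_csv", "args": {...}}."""
    return getattr(tools, action["tool"])(**action["args"])
```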
What went well:
- Rapid Onboarding & Code Portability: Marco was able to contribute within 60 minutes of starting, building his own tool and integrating it into the agent workflow. This demonstrates excellent code design and ease of use.
- Successful Proof-of-Concept (PoC): We achieved our primary goal before lunchtime – generating realistic data for any given schema. This allowed us to focus on exploration and learning in the afternoon.
- Agentic Workflow Understanding: Our clear understanding of agentic workflows, combined with our self-built code base, enabled rapid deployment.
What didn't go well:
- Inconsistencies at Scale: Generating large amounts of data solely through LLM calls resulted in noticeable inconsistencies.
- Limited Scalability: Our custom build approach hinders scalability.
- Debugging Challenges: Pasting JSON into the terminal triggered an LLM call for each pasted line, causing unexpected delays and consuming debugging time (a workaround sketch follows below).
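One way to avoid the per-line trigger is to buffer stdin until a complete JSON document parses, and only then invoke the agent once. A minimal sketch, assuming a single hypothetical `agent.run(schema)` entry point:

```python
import json
import sys

def read_json_from_stdin() -> dict:
    """Accumulate pasted lines until they parse as one complete JSON
    document, so the agent runs once per schema rather than per line."""
    buffer = ""
    for line in sys.stdin:
        buffer += line
        try:
            return json.loads(buffer)  # succeeds only once the paste is complete
        except json.JSONDecodeError:
            continue  # still mid-paste; keep reading
    raise ValueError("stdin closed before a complete JSON document arrived")

schema = read_json_from_stdin()
# agent.run(schema)  # hypothetical: one LLM invocation for the whole schema
```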
What we'd do differently:
- Prioritize Framework Adoption: Begin with a framework designed for scale from the outset.
- Refactor Data Generation Tool: Develop a more Pythonic data generation tool powered by LLMs.
- Implement Orchestration: Increase LLM orchestration to enable calls to specialized agents for improved scalability and customization (a rough sketch follows below).
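To make the orchestration idea concrete, here is a rough sketch under stated assumptions: the specialized agents and the keyword-based routing rule are stand-ins, and in a real system the routing decision would itself come from an LLM call:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical specialized agents; each would wrap its own prompts and model.
def schema_agent(task: str) -> str:
    return f"[schema agent] interpreted: {task}"

def generation_agent(task: str) -> str:
    return f"[generation agent] generated rows for: {task}"

@dataclass
class Orchestrator:
    """Routes each task to a specialized agent. A simple keyword rule
    stands in for the LLM routing call here."""
    agents: dict[str, Callable[[str], str]]

    def route(self, task: str) -> str:
        name = "schema" if "schema" in task.lower() else "generation"
        return self.agents[name](task)

orchestrator = Orchestrator(
    agents={"schema": schema_agent, "generation": generation_agent}
)
print(orchestrator.route("Parse this table schema"))
print(orchestrator.route("Generate 100 rows of customer data"))
```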
Key takeaways from this project:
- While our self-build approach was valuable for knowledge acquisition, we should transition to a framework to accelerate development and improve scalability.
- Data generation is manageable for LLMs on a small scale with schema input; however, scaling to larger datasets (1,000 to 10,000 rows) requires further investigation (one candidate approach is sketched after these takeaways).
- Orchestration and specialized agents are crucial for achieving greater scale and bespoke code/prompt handling.
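One candidate approach to probe that scale gap is batched generation: request rows in the small chunk sizes the PoC handled reliably (5-10 per call) and concatenate the results. A hypothetical sketch; `llm_generate_rows` is a stand-in for the actual LLM call:

```python
import csv

def llm_generate_rows(schema: dict, n: int) -> list[dict]:
    """Placeholder for an LLM call that returns n rows matching the
    schema; a real implementation would prompt the model with the
    schema and parse its JSON response."""
    return [
        {col["name"]: f"{col['name']}_{i}" for col in schema["columns"]}
        for i in range(n)
    ]

def generate_dataset(schema: dict, total: int, batch: int = 10) -> list[dict]:
    """Build a large dataset from many small requests, since small
    (5-10 row) calls proved reliable in the PoC."""
    rows: list[dict] = []
    while len(rows) < total:
        rows.extend(llm_generate_rows(schema, min(batch, total - len(rows))))
    return rows

schema = {
    "table": "customers",
    "columns": [{"name": "full_name"}, {"name": "email"}, {"name": "signup_date"}],
}
dataset = generate_dataset(schema, total=1000)
with open("customers.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[c["name"] for c in schema["columns"]])
    writer.writeheader()
    writer.writerows(dataset)
```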
Next steps:
- Framework Evaluation: Research the pros and cons of the PydanticAI and DSPy frameworks.
- Scalable Solution Planning: Design a more scalable AI Agent solution, considering both code complexity and task complexity. This should incorporate our existing features.
- Future Hack Day: Schedule another hack day once planning is complete.
Made with 💻 and lots of ☕ Let's keep throwing spaghetti! 🍝