Create synthetic datasets from scratch using AI-powered generation. Define topics, customize prompts, and generate high-quality reasoning traces in the SYNTH format.
Core idea: SYNTH ("The New Data Frontier" by PleIAs).
Transform existing datasets into reasoning-enhanced formats. Full HuggingFace integration lets you search, preview, and convert public datasets with automatic reasoning trace generation.
Multiple AI agents working together in sophisticated pipelines:
- Meta Agent: Analyzes and plans approach
- Retrieval Agent: Gathers relevant information
- Derivation Agent: Builds logical chains
- Writer Agent: Composes the response
- Rewriter Agent: Polishes and refines
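The pipeline above can be sketched as a chain of stages that each enrich a shared context object before handing off to the next agent. The agent names mirror the list, but the code is an illustrative sketch, not the app's actual implementation:

```typescript
// Hypothetical sketch of the multi-agent pipeline. Each agent reads the
// shared context and returns an enriched copy; names are illustrative.
type Context = { query: string; plan?: string; facts?: string[]; chain?: string; draft?: string; final?: string };
type Agent = (ctx: Context) => Context;

const metaAgent: Agent = (ctx) => ({ ...ctx, plan: `decompose: ${ctx.query}` });       // analyze and plan
const retrievalAgent: Agent = (ctx) => ({ ...ctx, facts: [`fact about ${ctx.query}`] }); // gather information
const derivationAgent: Agent = (ctx) => ({ ...ctx, chain: (ctx.facts ?? []).join(" -> ") }); // logical chain
const writerAgent: Agent = (ctx) => ({ ...ctx, draft: `Answer based on ${ctx.chain}` }); // compose
const rewriterAgent: Agent = (ctx) => ({ ...ctx, final: ctx.draft?.trim() });            // polish

// Thread the context through the agents in order.
function runPipeline(query: string, agents: Agent[]): Context {
  return agents.reduce((ctx, agent) => agent(ctx), { query } as Context);
}

const result = runPipeline("capital of France",
  [metaAgent, retrievalAgent, derivationAgent, writerAgent, rewriterAgent]);
```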
Go beyond single Q&A pairs:
- Generate multi-turn conversations
- Let the model ask follow-up questions
- Choose responders using SYNTH-style thinking
- Perfect for dialogue and instruction-following datasets
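A multi-turn entry is essentially a `messages` array in which assistant turns carry a reasoning trace in `<think>` tags before the visible answer. A minimal sketch of that shape (the `assistantTurn` helper is hypothetical; the tag convention matches the SYNTH output format):

```typescript
// Illustrative multi-turn conversation in SYNTH style: each assistant turn
// embeds its reasoning in <think> tags ahead of the visible reply.
type Message = { role: "user" | "assistant"; content: string };

function assistantTurn(thinking: string, answer: string): Message {
  return { role: "assistant", content: `<think>${thinking}</think>${answer}` };
}

const conversation: Message[] = [
  { role: "user", content: "What is 2 + 2?" },
  assistantTurn("Simple arithmetic.", "4"),
  { role: "user", content: "And doubled?" },   // model-generated follow-up question
  assistantTurn("Double 4 is 8.", "8"),
];
```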
Have data but unsure what's inside? Explore it directly with our HuggingFace-style table viewer:
- Column type detection (string, number, array, object)
- Search and filter capabilities
- Fullscreen expansion with pagination
- Click any row to see full details
Quality control your generated data:
- Review and evaluate entries
- Remove duplicates automatically
- Assign ratings (1-5 stars)
- Export only verified, high-quality data
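The curation steps above boil down to two filters: exact-duplicate removal keyed on content, and a rating threshold for export. A sketch of that logic (assumed behavior, not the app's code):

```typescript
// Assumed curation logic: dedupe by query+answer, then keep highly rated entries.
type Entry = { query: string; answer: string; rating?: number };

// Drop exact duplicates, keeping the first occurrence of each query+answer pair.
function dedupe(entries: Entry[]): Entry[] {
  const seen = new Set<string>();
  return entries.filter((e) => {
    const key = `${e.query}\u0000${e.answer}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Export filter: only entries rated at or above the threshold (default 4 stars).
function verified(entries: Entry[], minRating = 4): Entry[] {
  return entries.filter((e) => (e.rating ?? 0) >= minRating);
}
```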
Seamless Firebase/Firestore support:
- Development Mode: Download data directly as JSONL files
- Production Mode: Upload to your Firestore database with one click
- Session management and persistence
- Real-time sync across devices
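The development-mode download format, JSONL, is simply one JSON object per line, so serialization is a map-and-join. A minimal sketch (hypothetical helper, not the app's exporter):

```typescript
// JSONL export sketch: serialize each record on its own line.
function toJsonl(records: object[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}

const jsonl = toJsonl([{ query: "q1", answer: "a1" }, { query: "q2", answer: "a2" }]);
```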
| Feature | Description |
|---|---|
| Multiple Providers | Support for Gemini, OpenAI, Anthropic, and custom endpoints |
| Concurrent Workers | Parallel processing for faster generation |
| Smart Retry | Automatic retry with exponential backoff |
| Session Management | Save, load, and manage multiple generation sessions |
| Export Formats | JSONL, JSON, and Parquet support |
| HuggingFace Upload | Push directly to HuggingFace Hub |
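The Smart Retry row describes a standard exponential-backoff pattern: each failed attempt waits twice as long as the last before retrying. A sketch under assumed parameters (the app's actual delays and attempt count may differ):

```typescript
// Delay before retry attempt n (0-based): 500ms, 1s, 2s, 4s, ... (assumed base).
function backoffMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Retry a failing async call with exponential backoff between attempts.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 4): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffMs(attempt)));
      }
    }
  }
  throw lastErr; // all attempts exhausted
}
```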
- Node.js 18+ OR Bun 1.0+
- API keys for your preferred provider(s)
- Clone and install dependencies:

  ```bash
  git clone <repository-url>
  cd synthlabs-reasoning-generator

  # Using npm
  npm install

  # OR using Bun (faster)
  bun install
  ```

- Configure API keys:

  Copy `.env.example` to `.env.local` and add your keys:

  ```bash
  cp .env.example .env.local
  ```

  Edit `.env.local` with your API keys:

  ```bash
  VITE_GEMINI_API_KEY=your-gemini-key
  VITE_OPENAI_API_KEY=your-openai-key
  VITE_ANTHROPIC_API_KEY=your-anthropic-key
  # Add other provider keys as needed
  ```

- Run the app:

  ```bash
  # Using npm
  npm run dev

  # Frontend only (custom port)
  npm run dev:client -- --port 3000

  # OR using Bun (standalone)
  bun run bun:dev
  ```

- Open in browser: Navigate to `http://localhost:3000`
Follow this guide to get started with SynthLabs Reasoning Generator.
The main dashboard gives you quick access to all your generation sessions and configuration settings.
Navigate to Settings > API Keys to configure your AI providers. For local inference or custom endpoints (e.g., vLLM, Aphrodite):

- Select Other (Custom).
- Set your Base URL (e.g., `http://localhost:8001/v1`).
- Enter your Model ID (e.g., `Qwen/Qwen3-14B`).
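A custom endpoint like this speaks the OpenAI chat completions API. For orientation, here is the shape of a request such a client would send; the Base URL and Model ID are the examples above, and `buildChatRequest` is a hypothetical helper, not part of the app:

```typescript
// Build an OpenAI-compatible chat completions request body (illustrative).
function buildChatRequest(model: string, prompt: string, apiKey = "sk-local") {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}

// Usage sketch (assumes a server listening on the configured Base URL):
// const res = await fetch("http://localhost:8001/v1/chat/completions",
//   buildChatRequest("Qwen/Qwen3-14B", "Hello"));
```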
In the Generator (or Engine) tab, you can define your dataset parameters, customize system prompts, and configure output fields.
Navigate to Settings > DB Provider to configure your database providers. Switch to Prod (Cloud) mode to access and manage your cloud-persisted sessions. This allows you to collaborate and sync data across devices.
Use the Verifier (or Review) interface to inspect generated samples, check reasoning traces within `<think>` tags, and validate data quality.
Interact with your data using the integrated AI assistant to analyze patterns, summarize findings, or ask questions about your dataset.
This repo includes a minimal Node backend to handle Firebase Admin operations.
- Set backend env vars (example):

  ```bash
  VITE_BACKEND_URL=http://localhost:8787
  VITE_SESSION_LIST_PAGE_SIZE=50
  VITE_SESSION_LIST_TTL_MS=60000
  VITE_SESSION_MAX_TEXT_LEN=10000
  SESSION_LIST_TTL_MS=60000
  BACKEND_JSON_LIMIT_MB=10
  FIREBASE_PROJECT_ID=your-project-id
  FIREBASE_CLIENT_EMAIL=your-service-account-email
  FIREBASE_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
  ```

- Run (Vite + backend):

  ```bash
  npm run dev
  ```

The frontend will use the backend when `VITE_BACKEND_URL` is set.
If 8787 is busy (e.g., multiple desktop windows), the backend will auto-increment to the next available port (default range: 8787-8797).
The frontend will probe /health and attach to the first healthy backend in that range.
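The discovery behavior described above can be sketched as follows: probe `/health` on each candidate port in the range and attach to the first backend that answers. This is an assumed implementation, not the app's actual code:

```typescript
// Candidate ports for backend discovery: start, start+1, ..., start+range-1.
function candidatePorts(start = 8787, range = 10): number[] {
  return Array.from({ length: range }, (_, i) => start + i);
}

// Probe /health on each port; return the first healthy backend URL, or null.
async function discoverBackend(start = 8787, range = 10): Promise<string | null> {
  for (const port of candidatePorts(start, range)) {
    const url = `http://localhost:${port}`;
    try {
      const res = await fetch(`${url}/health`, { signal: AbortSignal.timeout(500) });
      if (res.ok) return url; // first healthy backend wins
    } catch {
      // nothing listening (or unhealthy) on this port; try the next
    }
  }
  return null;
}
```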
Optional envs:

```bash
# Backend port behavior
PORT=8787
PORT_RANGE=10

# Frontend discovery (falls back to VITE_BACKEND_URL if healthy)
VITE_BACKEND_PORT_START=8787
VITE_BACKEND_PORT_RANGE=10
```

You can also set these in `.env.example` and copy them to `.env.local`.
| Command | Description |
|---|---|
| `bun install` | Install dependencies |
| `bun run bun:dev` | Start dev server with Bun runtime |
| `bun run bun:build` | Build for production |
| `bun run bun:preview` | Preview production build |
Build standalone desktop applications for Windows, macOS, and Linux using Electron.
| Command | Description |
|---|---|
| `npm run electron:dev` | Run in development mode (with hot reload) |
| `npm run electron:build` | Build for all platforms |
| `npm run electron:build:win` | Build Windows installer (NSIS + portable) |
| `npm run electron:build:mac` | Build macOS app (DMG + ZIP) |
On Windows or cross-platform:
```bash
npm run electron:build:win
```

Output files will be in the `release/` directory:

- `SynthLabs Reasoning Generator Setup X.X.X.exe` - NSIS installer
- `SynthLabs Reasoning Generator X.X.X.exe` - Portable executable
On macOS:
```bash
npm run electron:build:mac
```

Output files will be in the `release/` directory:

- `SynthLabs Reasoning Generator-X.X.X.dmg` - Disk image
- `SynthLabs Reasoning Generator-X.X.X-mac.zip` - ZIP archive
Both builds support:
- x64 (Intel) architecture
- arm64 (Apple Silicon) architecture
- Code signing with hardened runtime
- Network permissions for API calls
```bash
npm run electron:build
```

Output files:

- `SynthLabs Reasoning Generator-X.X.X.AppImage` - Universal Linux app
- `synthlabs-reasoning-generator_X.X.X_amd64.deb` - Debian/Ubuntu package
Windows:
- Windows 10 or later
- Node.js 18+
- No additional dependencies required
macOS:
- macOS 10.15 (Catalina) or later
- Xcode Command Line Tools: `xcode-select --install`
- Node.js 18+
- For code signing: Apple Developer account (optional, for distribution)
Linux:
- Any modern Linux distribution
- Node.js 18+
- Build tools: `sudo apt-get install build-essential` (Debian/Ubuntu)
- Start development server:

  ```bash
  npm run electron:dev
  ```

  This runs the Vite dev server and Electron concurrently with hot reload.

- Build for production:

  ```bash
  npm run electron:build
  ```

- Test the built app:

  - Run the installer/exe/dmg from the `release/` directory
  - All features work the same as in the web version
Electron settings are in `electron/main.js`:
- Window size, icon, and appearance
- Menu configuration
- Security settings (context isolation enabled)
- Platform-specific behavior
electron-builder configuration is in `package.json` under the `build` section:
- Output directories
- Platform-specific targets
- Code signing and entitlements
- Installer options
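For orientation, a minimal `build` section consistent with the targets listed in this README might look like the following. These are illustrative values, not the repo's actual configuration; consult `package.json` for the real settings:

```json
{
  "build": {
    "appId": "com.example.synthlabs",
    "directories": { "output": "release" },
    "win": { "target": ["nsis", "portable"] },
    "mac": { "target": ["dmg", "zip"], "hardenedRuntime": true },
    "linux": { "target": ["AppImage", "deb"] }
  }
}
```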
The generator supports dynamic prompt sets. You can create your own "persona" or logical framework by adding files to the prompts/ directory.
- Create a new folder in `prompts/` (e.g., `prompts/my-set/`).
- Inside your set folder, create subdirectories for each category: `generator/`, `converter/`, `verifier/`.
- Add `.txt` files for each role. The app will automatically discover your set and show it in the Settings > Prompts tab.
```
prompts/
└── <set_name>/
    ├── generator/
    │   ├── system.txt        (Main generator persona)
    │   ├── meta.txt          (Task decomposition)
    │   ├── retrieval.txt     (Constraint identification)
    │   ├── derivation.txt    (Logical reasoning chains)
    │   ├── responder.txt     (Final answer formulation)
    │   └── user_agent.txt    (Multi-turn interaction agent)
    ├── converter/
    │   ├── system.txt        (Main converter persona)
    │   ├── writer.txt        (Writing the final reasoning trace)
    │   └── rewriter.txt      (Polishing converted output)
    └── verifier/
        ├── query_rewrite.txt
        ├── reasoning_rewrite.txt
        ├── answer_rewrite.txt
        └── message_rewrite.txt
```
> [!TIP]
> If a specific role file is missing in your custom set, the system will automatically fall back to the version in the default set.
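That fallback rule can be sketched as a simple path resolution: prefer the role file in the custom set, otherwise use the default set's copy. The helper and the `default` set name are assumptions for illustration:

```typescript
// Resolve a prompt file: custom set first, then fall back to the default set.
// `files` stands in for the set of discovered prompt file paths.
function resolvePrompt(files: Set<string>, set: string, category: string, role: string): string {
  const custom = `prompts/${set}/${category}/${role}.txt`;
  return files.has(custom) ? custom : `prompts/default/${category}/${role}.txt`;
}
```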
For cloud persistence and production mode, set up Firestore:

- Create a Firebase project at console.firebase.google.com
- Enable Firestore Database
- Add these Security Rules (Firestore Database → Rules):

  ```
  rules_version = '2';
  service cloud.firestore {
    match /databases/{database}/documents {
      match /synth_logs/{document=**} {
        allow read, write: if true; // change if needed (too open for production)
      }
      match /synth_sessions/{document=**} {
        allow read, write: if true; // change if needed (too open for production)
      }
    }
  }
  ```

- Configure your Firebase credentials in the app's settings panel
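The `if true` rules above let anyone read and write your collections. One possible tightening, assuming your clients sign in via Firebase Auth (an assumption; this README does not prescribe an auth setup), is to require an authenticated user:

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /synth_logs/{document=**} {
      allow read, write: if request.auth != null;
    }
    match /synth_sessions/{document=**} {
      allow read, write: if request.auth != null;
    }
  }
}
```

Adapt the condition to your own auth and ownership model before going to production.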
Generated data follows the SYNTH format:
```json
{
  "query": "What is the capital of France?",
  "reasoning": "<think>The user is asking about geography...</think>",
  "answer": "The capital of France is Paris.",
  "messages": [...],
  "isMultiTurn": false,
  "metadata": {
    "provider": "gemini",
    "model": "gemini-2.0-flash",
    "timestamp": 1704067200000
  }
}
```

Contributions are welcome! Please feel free to submit issues and pull requests.
This project is licensed under the Apache 2.0 License.
If you find this tool useful, please cite it as:
```bibtex
@misc{synthlabs,
  author = {Kurman, Mariusz},
  title = {SYNTHLabs Reasoning Generator},
  howpublished = {\url{https://github.com/mkurman/synthlabs}},
  year = {2026}
}
```
Thank you!
Built with β€οΈ for the AI research community