An AI-powered data engineering assistant with an interactive chat interface. This Tower app provides expert guidance on data pipelines, ETL/ELT workflows, database design, query optimization, and more.
- Interactive Chat Interface: Modern, responsive web UI with real-time streaming responses
- Claude AI Integration: Powered by Claude Sonnet 4.5 for expert data engineering assistance
- Code Execution: Write and execute Python code directly in the chat
- Tower Integration: Deploy and orchestrate applications using Tower
- Workspace Management: Persistent file system for building and testing code
- Package Installation: Install Python packages on-demand
- Comprehensive Expertise: Covers data warehousing, lakehouse architecture, orchestration, streaming, and more
- Production-Ready Advice: Get practical, actionable recommendations with working code examples
The agent can help with:
- Data Pipelines: ETL/ELT design, orchestration (Airflow, Dagster, Prefect, dbt)
- Database Design: Schema design, optimization, normalization, dimensional modeling
- Query Optimization: SQL performance tuning, indexing strategies
- Data Warehouses & Lakehouses: Snowflake, BigQuery, Databricks, Apache Iceberg
- Stream Processing: Kafka, Flink, Spark Streaming
- Data Quality: Testing strategies, validation, monitoring
- Python Libraries: pandas, polars, PyArrow, DuckDB, SQLAlchemy
- Cloud Platforms: AWS, GCP, Azure data services
- Data Modeling: Kimball, Data Vault, normalized schemas
The agent has access to powerful tools that allow it to actually build and test solutions:
- write_python_file: Create Python files in a workspace
- execute_python: Run Python code or scripts with timeout protection
- read_file: Read file contents from the workspace
- list_files: Browse workspace directory structure
- install_package: Install Python packages via pip
- tower_deploy: Deploy Tower applications from workspace using Tower Python API (creates TAR package and uploads)
- tower_run: Execute Tower applications on the platform using Tower Python API
- tower_list_apps: View all deployed Tower apps using Tower Python API
The agent uses the Tower Python API bindings for all Tower operations, giving it direct programmatic access without shelling out to the CLI. Deployment creates TAR packages in-memory and uploads them via the API.
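The timeout protection mentioned for execute_python can be sketched with the standard library alone. This is an illustrative sketch, not the app's actual tools.py implementation; the function name and return shape are assumptions:

```python
import subprocess
import sys

def execute_python(code: str, timeout: int = 30) -> dict:
    """Run a Python snippet in a subprocess, killing it after `timeout` seconds."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {"stdout": result.stdout, "stderr": result.stderr,
                "returncode": result.returncode}
    except subprocess.TimeoutExpired:
        # The child process is killed; report the timeout instead of hanging.
        return {"stdout": "", "stderr": f"Execution timed out after {timeout}s",
                "returncode": -1}
```

Running untrusted or experimental code in a separate process keeps the agent's own server responsive even when a snippet loops forever.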
Build and Test Code:
User: "Create a script that extracts data from a CSV and loads it into a database"
Agent: [Writes Python file, executes it, shows output]
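A minimal version of the kind of script the agent might produce for this request, using only the standard library; the file, database, and table names are placeholders:

```python
import csv
import sqlite3

def load_csv_to_db(csv_path: str, db_path: str, table: str = "records") -> int:
    """Extract rows from a CSV file and load them into a SQLite table."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = reader.fieldnames or []

    conn = sqlite3.connect(db_path)
    try:
        # Create the target table with one TEXT column per CSV header.
        col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(r[c] for c in columns) for r in rows],
        )
        conn.commit()
    finally:
        conn.close()
    return len(rows)
```

In practice the agent would write a file like this with write_python_file, run it with execute_python, and show the output.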
Deploy to Tower:
User: "Create a Tower app that runs daily to sync data"
Agent: [Creates Towerfile, task.py, requirements.txt, deploys to Tower]
Install Dependencies:
User: "Use pandas to analyze this data"
Agent: [Installs pandas if needed, writes analysis code, runs it]
The agent uses Tower only for deploying and orchestrating applications. For quick tests and prototyping, it uses the execute_python tool.
- Python 3.11 or higher
- Anthropic API key
- Tower account with API access (the Tower Python package is included in dependencies)
- Navigate to the app directory:

  ```bash
  cd data-engineering-agent
  ```

- Install dependencies:

  ```bash
  uv pip install -e .
  ```

Anthropic API Key: Set as a Tower secret for the agent to function:

```bash
# Add the secret to Tower
tower secrets create ANTHROPIC_API_KEY "your-api-key-here"
```

Tower API Authentication: The agent's tools use the Tower Python API, which authenticates via environment variables:

- `TOWER_API_KEY`: Your Tower API key for deploying and running apps
- `TOWER_URL`: Tower API URL (defaults to https://api.tower.dev)
- `TOWER_ENVIRONMENT`: Environment to use (defaults to "default")
When running locally, set these in your environment:

```bash
export TOWER_API_KEY="your-tower-api-key"
```

When deployed on Tower, the platform automatically provides authentication context.

Run locally with Tower:

```bash
tower run --local
```

The app will start on http://localhost:50051.
- Deploy the app:

  ```bash
  tower deploy
  ```

- Enable external accessibility:
  - Go to the Tower UI
  - Navigate to the app settings
  - Toggle "External Accessibility" to ON
- Access your app:
  - A unique URL will be generated for your app
  - Visit the URL to start chatting with the agent
  - The app will automatically start a run when you visit
The app uses the following environment variables:

- `ANTHROPIC_API_KEY` (required): Your Anthropic API key for Claude access
- `TOWER_API_KEY` (optional): Tower API key for deploying/running apps via agent tools
- `TOWER_URL` (optional): Tower API URL (defaults to https://api.tower.dev)
- `TOWER_ENVIRONMENT` (optional): Tower environment (defaults to "default")
- `PORT` (optional): Port to run the server on (defaults to 50051)
- `TOWER__HOSTNAME` (auto-set): The hostname assigned by Tower
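The defaults above can be centralized in a small settings helper. A sketch under the assumption that configuration is read once at startup; the helper itself is illustrative, not the app's actual code:

```python
import os

def load_settings() -> dict:
    """Read the app's configuration from environment variables,
    applying the documented defaults where a variable is unset."""
    return {
        "anthropic_api_key": os.environ.get("ANTHROPIC_API_KEY"),   # required
        "tower_api_key": os.environ.get("TOWER_API_KEY"),           # optional
        "tower_url": os.environ.get("TOWER_URL", "https://api.tower.dev"),
        "tower_environment": os.environ.get("TOWER_ENVIRONMENT", "default"),
        "port": int(os.environ.get("PORT", "50051")),
    }
```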
The agent itself can deploy Tower apps programmatically! When you ask it to create a Tower app:
- Creates files: Writes Towerfile, task.py, requirements.txt, etc.
- Packages: Creates a TAR.GZ archive with MANIFEST
- Deploys: Uploads via Tower Python API using TOWER_API_KEY
- Returns status: Provides app name and version number
This allows the agent to build and deploy production apps entirely through the chat interface.
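The in-memory packaging step can be sketched with the standard library. The upload itself goes through the Tower Python API and is omitted here, and the MANIFEST contents shown are an assumption, not Tower's actual manifest format:

```python
import io
import json
import tarfile
from pathlib import Path

def package_app(workspace: str) -> bytes:
    """Bundle a workspace directory into an in-memory .tar.gz,
    adding a MANIFEST entry that lists the packaged files."""
    buf = io.BytesIO()
    files = [p for p in Path(workspace).rglob("*") if p.is_file()]
    manifest = json.dumps(
        {"files": [str(p.relative_to(workspace)) for p in files]}
    ).encode()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in files:
            tar.add(p, arcname=str(p.relative_to(workspace)))
        # Append the manifest as a synthetic tar member, no temp file needed.
        info = tarfile.TarInfo("MANIFEST")
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))
    return buf.getvalue()
```

Building the archive in a BytesIO buffer avoids writing temporary files, which matters when the agent's workspace is the only writable location.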
Once the app is running:
- Open the app URL in your browser
- You'll see a welcome screen with suggested topics
- Click a suggestion or type your own question
- The agent will respond with detailed, practical advice
Pipeline Design:
- "How do I design a scalable ETL pipeline for processing millions of records daily?"
- "What's the best way to orchestrate dependencies between data tasks?"
Query Optimization:
- "My PostgreSQL query is slow. How can I optimize it?"
- "What indexes should I create for a large fact table?"
Data Modeling:
- "How should I model customer data using dimensional modeling?"
- "What's the difference between star schema and snowflake schema?"
Data Quality:
- "What are best practices for data quality testing in a data warehouse?"
- "How do I implement data validation in my pipeline?"
- Be specific about your requirements and constraints
- Mention your tech stack (databases, tools, cloud providers)
- Ask follow-up questions to dive deeper
- Request code examples when helpful
- FastAPI: Web framework handling HTTP requests and serving static files
- Anthropic SDK: Integration with Claude AI for intelligent responses
- Tower Python API: Direct programmatic access to Tower platform for running and managing apps
- Streaming API: Real-time response streaming for better UX
- uvicorn: ASGI server for production deployment
- Vanilla JavaScript: No framework dependencies, lightweight and fast
- Server-Sent Events (SSE): Streaming responses from the API
- Responsive Design: Works on desktop and mobile devices
- Dark Theme: Easy on the eyes for extended usage
- `GET /`: Serves the chat interface
- `GET /health`: Health check endpoint
- `POST /api/chat`: Chat endpoint with streaming support
```
data-engineering-agent/
├── Towerfile           # Tower app configuration
├── pyproject.toml      # Python dependencies
├── main.py             # FastAPI application
├── tools.py            # Tool definitions and executor
├── system_prompt.md    # Agent system prompt (easily editable)
├── README.md           # This file
└── static/
    ├── index.html      # Chat UI
    └── app.js          # Frontend logic
```
- Edit the files locally
- Test with `tower run --local`
- Deploy with `tower deploy`
- Changes will be live after redeployment
Modify the System Prompt: Edit `system_prompt.md` to customize the agent's behavior, expertise areas, and instructions. This file is loaded at startup, so changes require redeployment.
Update the UI: Modify `static/index.html` and `static/app.js` to change the appearance or add features.
Change the Model: Update the `model` parameter in `main.py` to use a different Claude model (e.g., Claude Opus for more complex reasoning).
Add or Modify Tools: Edit `tools.py` to add new capabilities or modify existing tool behavior.
If you see "API key not configured":
- Ensure `ANTHROPIC_API_KEY` is set as a Tower secret
- Redeploy the app after adding the secret
If the external URL doesn't work:
- Check that "External Accessibility" is enabled in app settings
- Wait a few moments for the run to start
- Check the Tower logs for any startup errors
- The first response may take a few seconds as the model initializes
- Subsequent responses should be faster with streaming
- Consider using a different model if speed is critical
This is a Tower Apps repository. To contribute:
- Make your changes in a new branch
- Test thoroughly with `tower run --local`
- Submit a pull request with a clear description
This app is part of the Tower Apps collection and follows the repository's license.
For issues or questions:
- Check the Tower documentation: https://docs.tower.dev
- Review the Tower examples in the parent repository
- Contact the Tower team for platform-specific issues