An AI-powered data engineering assistant with an interactive chat interface. This Tower app provides expert guidance on data pipelines, ETL/ELT workflows, database design, query optimization, and more.
- Interactive Chat Interface: Modern, responsive web UI with real-time streaming responses
- Claude AI Integration: Powered by Claude Sonnet 4.5 for expert data engineering assistance
- Code Execution: Write and execute Python code directly in the chat
- Tower Integration: Deploy and orchestrate applications using Tower
- Workspace Management: Persistent file system for building and testing code
- Package Installation: Install Python packages on-demand
- Comprehensive Expertise: Covers data warehousing, lakehouse architecture, orchestration, streaming, and more
- Production-Ready Advice: Get practical, actionable recommendations with working code examples
The agent can help with:
- Data Pipelines: ETL/ELT design, orchestration (Airflow, Dagster, Prefect, dbt)
- Database Design: Schema design, optimization, normalization, dimensional modeling
- Query Optimization: SQL performance tuning, indexing strategies
- Data Warehouses & Lakehouses: Snowflake, BigQuery, Databricks, Apache Iceberg
- Stream Processing: Kafka, Flink, Spark Streaming
- Data Quality: Testing strategies, validation, monitoring
- Python Libraries: pandas, polars, PyArrow, DuckDB, SQLAlchemy
- Cloud Platforms: AWS, GCP, Azure data services
- Data Modeling: Kimball, Data Vault, normalized schemas
The agent has access to powerful tools that allow it to actually build and test solutions:
- write_python_file: Create Python files in a workspace
- execute_python: Run Python code or scripts with timeout protection
- read_file: Read file contents from the workspace
- list_files: Browse workspace directory structure
- install_package: Install Python packages via pip
- tower_deploy: Deploy Tower applications from workspace using Tower Python API (creates TAR package and uploads)
- tower_run: Execute Tower applications on the platform using Tower Python API
- tower_list_apps: View all deployed Tower apps using Tower Python API
The agent uses the Tower Python API bindings for all Tower operations, giving it direct programmatic access without shelling out to the CLI. Deployment creates TAR packages in-memory and uploads them via the API.
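The timeout protection mentioned for execute_python can be sketched with the standard library alone. This is an illustrative sketch, not the app's actual tools.py implementation; the function name and return shape are assumptions:

```python
import subprocess
import sys

def execute_python(code: str, timeout: int = 30) -> dict:
    """Run a Python snippet in a subprocess, killing it after `timeout` seconds."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {"stdout": result.stdout, "stderr": result.stderr,
                "returncode": result.returncode}
    except subprocess.TimeoutExpired:
        # The child process is killed; report the timeout instead of hanging.
        return {"stdout": "", "stderr": f"Execution timed out after {timeout}s",
                "returncode": -1}
```

Running untrusted or experimental code in a separate process keeps the agent's own server responsive even when a snippet loops forever.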
Build and Test Code:
User: "Create a script that extracts data from a CSV and loads it into a database"
Agent: [Writes Python file, executes it, shows output]
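A minimal version of the kind of script the agent might produce for this request, using only the standard library; the file, database, and table names are placeholders:

```python
import csv
import sqlite3

def load_csv_to_db(csv_path: str, db_path: str, table: str = "records") -> int:
    """Extract rows from a CSV file and load them into a SQLite table."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = reader.fieldnames or []

    conn = sqlite3.connect(db_path)
    try:
        # Create the target table with one TEXT column per CSV header.
        col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(r[c] for c in columns) for r in rows],
        )
        conn.commit()
    finally:
        conn.close()
    return len(rows)
```

In practice the agent would write a file like this with write_python_file, run it with execute_python, and show the output.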
Deploy to Tower:
User: "Create a Tower app that runs daily to sync data"
Agent: [Creates Towerfile, task.py, requirements.txt, deploys to Tower]
Install Dependencies:
User: "Use pandas to analyze this data"
Agent: [Installs pandas if needed, writes analysis code, runs it]
The agent uses Tower only for deploying and orchestrating applications. For quick tests and prototyping, it uses the execute_python tool.
- Python 3.11 or higher
- Anthropic API key
- Tower account with API access (the Tower Python package is included in dependencies)
- Navigate to the app directory:

  ```bash
  cd data-engineering-agent
  ```

- Install dependencies:

  ```bash
  uv pip install -e .
  ```

Anthropic API Key: Set as a Tower secret for the agent to function:

```bash
# Add the secret to Tower
tower secrets create ANTHROPIC_API_KEY "your-api-key-here"
```

Tower API Authentication: The agent's tools use the Tower Python API, which authenticates via environment variables:

- `TOWER_API_KEY`: Your Tower API key for deploying and running apps
- `TOWER_URL`: Tower API URL (defaults to https://api.tower.dev)
- `TOWER_ENVIRONMENT`: Environment to use (defaults to "default")
When running locally, set these in your environment:

```bash
export TOWER_API_KEY="your-tower-api-key"
```

When deployed on Tower, the platform automatically provides authentication context.

Run locally with Tower:

```bash
tower run --local
```

The app will start on http://localhost:50051.
- Deploy the app:

  ```bash
  tower deploy
  ```

- Enable external accessibility:
  - Go to the Tower UI
  - Navigate to the app settings
  - Toggle "External Accessibility" to ON
- Access your app:
  - A unique URL will be generated for your app
  - Visit the URL to start chatting with the agent
  - The app will automatically start a run when you visit
The app uses the following environment variables:

- `ANTHROPIC_API_KEY` (required): Your Anthropic API key for Claude access
- `TOWER_API_KEY` (optional): Tower API key for deploying/running apps via agent tools
- `TOWER_URL` (optional): Tower API URL (defaults to https://api.tower.dev)
- `TOWER_ENVIRONMENT` (optional): Tower environment (defaults to "default")
- `PORT` (optional): Port to run the server on (defaults to 50051)
- `TOWER__HOSTNAME` (auto-set): The hostname assigned by Tower
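The defaults above can be centralized in a small settings helper. A sketch under the assumption that configuration is read once at startup; the helper itself is illustrative, not the app's actual code:

```python
import os

def load_settings() -> dict:
    """Read the app's configuration from environment variables,
    applying the documented defaults where a variable is unset."""
    return {
        "anthropic_api_key": os.environ.get("ANTHROPIC_API_KEY"),   # required
        "tower_api_key": os.environ.get("TOWER_API_KEY"),           # optional
        "tower_url": os.environ.get("TOWER_URL", "https://api.tower.dev"),
        "tower_environment": os.environ.get("TOWER_ENVIRONMENT", "default"),
        "port": int(os.environ.get("PORT", "50051")),
    }
```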
The agent itself can deploy Tower apps programmatically! When you ask it to create a Tower app:
- Creates files: Writes Towerfile, task.py, requirements.txt, etc.
- Packages: Creates a TAR.GZ archive with MANIFEST
- Deploys: Uploads via Tower Python API using TOWER_API_KEY
- Returns status: Provides app name and version number
This allows the agent to build and deploy production apps entirely through the chat interface.
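The in-memory packaging step can be sketched with the standard library. The upload itself goes through the Tower Python API and is omitted here, and the MANIFEST contents shown are an assumption, not Tower's actual manifest format:

```python
import io
import json
import tarfile
from pathlib import Path

def package_app(workspace: str) -> bytes:
    """Bundle a workspace directory into an in-memory .tar.gz,
    adding a MANIFEST entry that lists the packaged files."""
    buf = io.BytesIO()
    files = [p for p in Path(workspace).rglob("*") if p.is_file()]
    manifest = json.dumps(
        {"files": [str(p.relative_to(workspace)) for p in files]}
    ).encode()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in files:
            tar.add(p, arcname=str(p.relative_to(workspace)))
        # Append the manifest as a synthetic tar member, no temp file needed.
        info = tarfile.TarInfo("MANIFEST")
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))
    return buf.getvalue()
```

Building the archive in a BytesIO buffer avoids writing temporary files, which matters when the agent's workspace is the only writable location.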
Once the app is running:
- Open the app URL in your browser
- You'll see a welcome screen with suggested topics
- Click a suggestion or type your own question
- The agent will respond with detailed, practical advice
Pipeline Design:
- "How do I design a scalable ETL pipeline for processing millions of records daily?"
- "What's the best way to orchestrate dependencies between data tasks?"
Query Optimization:
- "My PostgreSQL query is slow. How can I optimize it?"
- "What indexes should I create for a large fact table?"
Data Modeling:
- "How should I model customer data using dimensional modeling?"
- "What's the difference between star schema and snowflake schema?"
Data Quality:
- "What are best practices for data quality testing in a data warehouse?"
- "How do I implement data validation in my pipeline?"
- Be specific about your requirements and constraints
- Mention your tech stack (databases, tools, cloud providers)
- Ask follow-up questions to dive deeper
- Request code examples when helpful
- FastAPI: Web framework handling HTTP requests and serving static files
- Anthropic SDK: Integration with Claude AI for intelligent responses
- Tower Python API: Direct programmatic access to Tower platform for running and managing apps
- Streaming API: Real-time response streaming for better UX
- uvicorn: ASGI server for production deployment
- Vanilla JavaScript: No framework dependencies, lightweight and fast
- Server-Sent Events (SSE): Streaming responses from the API
- Responsive Design: Works on desktop and mobile devices
- Dark Theme: Easy on the eyes for extended usage
- `GET /`: Serves the chat interface
- `GET /health`: Health check endpoint
- `POST /api/chat`: Chat endpoint with streaming support
```
data-engineering-agent/
├── Towerfile           # Tower app configuration
├── pyproject.toml      # Python dependencies
├── main.py             # FastAPI application
├── tools.py            # Tool definitions and executor
├── system_prompt.md    # Agent system prompt (easily editable)
├── README.md           # This file
└── static/
    ├── index.html      # Chat UI
    └── app.js          # Frontend logic
```
- Edit the files locally
- Test with `tower run --local`
- Deploy with `tower deploy`
- Changes will be live after redeployment
Modify the System Prompt: Edit `system_prompt.md` to customize the agent's behavior, expertise areas, and instructions. This file is loaded at startup, so changes require redeployment.
Update the UI: Modify `static/index.html` and `static/app.js` to change the appearance or add features.
Change the Model: Update the `model` parameter in `main.py` to use a different Claude model (e.g., Claude Opus for more complex reasoning).
Add or Modify Tools: Edit `tools.py` to add new capabilities or modify existing tool behavior.
If you see "API key not configured":
- Ensure `ANTHROPIC_API_KEY` is set as a Tower secret
- Redeploy the app after adding the secret
If the external URL doesn't work:
- Check that "External Accessibility" is enabled in app settings
- Wait a few moments for the run to start
- Check the Tower logs for any startup errors
- The first response may take a few seconds as the model initializes
- Subsequent responses should be faster with streaming
- Consider using a different model if speed is critical
This is a Tower Apps repository. To contribute:
- Make your changes in a new branch
- Test thoroughly with `tower run --local`
- Submit a pull request with a clear description
This app is part of the Tower Apps collection and follows the repository's license.
For issues or questions:
- Check the Tower documentation: https://docs.tower.dev
- Review the Tower examples in the parent repository
- Contact the Tower team for platform-specific issues