Speak less, show more. A visual-first AI experiment that communicates primarily through generated imagery rather than text, building a unique visual language with every conversation.
Live Demo: bumblebee.fladrycreative.com
Project Bumblebee is an experimental AI interface that inverts the traditional text-heavy chatbot paradigm. Instead of generating walls of text, Bumblebee "thinks" visually, translating user queries into a sequence of meaningful images using Google's Gemini multimodal AI models.
The project explores a fundamental question: What if AI communicated primarily through images, developing its own visual language over time? This concept draws inspiration from linguistic relativity theory and films like Arrival, where understanding emerges through visual and conceptual patterns rather than verbal language.
Unlike DALL-E chat or other multimodal systems that use images as supplements to text responses, Bumblebee:
- Only thinks in images: The model cannot respond with text alone; every output is visual
- Develops visual language: Over conversation turns, it establishes consistent visual metaphors and styles
- Shows its reasoning: A transparent "Thinking Layer" reveals how the AI interprets queries and plans visuals
- Two-stage architecture: Separates reasoning (interpretation) from generation (visualization)

This isn't about generating prettier responses; it's about exploring whether visual-first communication can be more intuitive than walls of text.
Bumblebee flips the traditional chatbot script through a three-layer architecture:
The user's query is first sent to a reasoning model. Instead of answering directly, this model acts as a "Director." It analyzes the intent and outputs a structured Thinking JSON object containing:
- `interpretation`: What the user actually wants
- `visualApproach`: How to best visualize the answer (e.g., "a sequential diagram," "a side-by-side comparison," "a metaphorical illustration")
- `prompts`: A set of optimized image generation prompts tailored to the visual approach
- `styleConsiderations`: Notes on maintaining visual consistency with previous turns
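To make this concrete, here is an illustrative sketch of a Thinking JSON for a hypothetical query. The interface shape is assumed from the fields described above (the project's actual `ThinkingData` interface lives in `types.ts`), and the field values are invented, not actual model output.

```typescript
// Assumed shape of the Thinking JSON, based on the fields described above.
// The project's real ThinkingData interface is defined in types.ts.
interface ThinkingData {
  interpretation: string;
  visualApproach: string;
  prompts: string[];
  styleConsiderations: string;
}

// Illustrative example (invented) for the query "How does photosynthesis work?"
const exampleThinking: ThinkingData = {
  interpretation: 'The user wants a conceptual overview of how plants turn light into energy.',
  visualApproach: 'a sequential diagram',
  prompts: [
    'Minimalist flat-style diagram of sunlight hitting a green leaf, chloroplast highlighted, soft green palette',
    'Flat-style infographic: water and CO2 entering the leaf, oxygen and glucose leaving, same soft green palette',
  ],
  styleConsiderations: 'Reuse the soft green palette and flat iconography established in earlier turns.',
};
```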
The system takes the generated prompts and fires parallel requests to the image generation model.
Model Note: This uses Google Gemini's experimental image generation model, internally nicknamed "Nano Banana" (now evolved to "Nano Banana Pro" in `gemini-3.0-flash-image-preview`), representing the bleeding edge of Google's multimodal capabilities.
The result is delivered as a chat bubble containing a gallery of images. A hidden "Thinking Layer" allows the user to peek behind the curtain and see the AI's reasoning, interpretation, and the exact prompts it used to generate the visuals.
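For orientation, the data behind one of these bubbles might look roughly like the sketch below. The field names are inferred from how the orchestrator reads `ChatMessage` (see the code further down); the authoritative interfaces live in `types.ts`, so treat this as illustrative rather than definitive.

```typescript
// Rough, illustrative sketch of a chat bubble's data, inferred from how
// getVisualResponse consumes history (msg.role, msg.text, msg.images).
// The real interfaces are defined in types.ts.
interface ChatBubbleSketch {
  role: 'user' | 'model';
  text: string;
  images?: string[];        // base64-encoded images rendered in the gallery
  thinking?: ThinkingData;  // model turns only; powers the hidden Thinking Layer
}
```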
| Component | Technology |
| --- | --- |
| Framework | React (v19) |
| Build Tool | Vite |
| Language | TypeScript |
| Styling | Tailwind CSS |
| AI Models | Google Gemini 2.5 Flash (Reasoning)<br>Google Gemini 2.5 Flash Image (Generation) |
| SDK | Google GenAI SDK |
- Node.js (v18 or higher)
- Google Gemini API key (Get one here)
1. Clone the repository

   ```bash
   git clone https://github.com/drftstatic/Project-Bumblebee.git
   cd Project-Bumblebee
   ```

2. Install dependencies

   ```bash
   npm install
   ```

3. Set up environment variables

   Copy the example environment file:

   ```bash
   cp .env.example .env
   ```

   Then edit `.env` and add your API key:

   ```bash
   GEMINI_API_KEY=your_actual_gemini_api_key_here
   ```

4. Run the development server

   ```bash
   npm run dev
   ```

5. Open your browser

   Navigate to `http://localhost:5173` (or the port shown in your terminal)
The heart of Bumblebee is the `getVisualResponse` orchestrator. It manages the two-step "Think then Draw" process, handling context, history, and error states.
```typescript
export const getVisualResponse = async (
  prompt: string,
  history: ChatMessage[],
  imageBase64?: string
): Promise<{
  images: string[];
  thinking: ThinkingData;
  groundingSources?: GroundingSource[];
}> => {
  // 1. Get reasoning and prompts from Gemini 2.5 Flash
  const reasoningModel = 'gemini-2.5-flash';

  const contents = [
    ...history.flatMap(msg => {
      const parts = (msg.images && msg.images.length > 0)
        ? [{ text: msg.text }, base64ToGenerativePart(msg.images[0])]
        : [{ text: msg.text }];
      return { role: msg.role, parts };
    }),
    {
      role: 'user',
      parts: imageBase64
        ? [{ text: prompt }, base64ToGenerativePart(imageBase64)]
        : [{ text: prompt }]
    }
  ];

  const reasoningResponse = await ai.models.generateContent({
    model: reasoningModel,
    contents: contents,
    config: {
      systemInstruction: SYSTEM_PROMPT,
      tools: [{ googleSearch: {} }],
    }
  });

  // ... (JSON Parsing & Error Handling Logic) ...

  // 2. Generate images using Gemini 2.5 Flash Image
  const imageModel = 'gemini-2.5-flash-image';

  const imagePromises = thinking.prompts.map(p =>
    ai.models.generateContent({
      model: imageModel,
      contents: { parts: [{ text: p }] },
      config: {
        responseModalities: [Modality.IMAGE],
      }
    })
  );

  const imageResponses = await Promise.all(imagePromises);

  // ... (Response Formatting) ...

  return { images, thinking, groundingSources };
};
```
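For context, a hypothetical call site could look like the following. The `messages`/`setMessages` state and the exact message shape are illustrative stand-ins, not code from `App.tsx`.

```typescript
// Hypothetical usage from a submit handler. `messages`, `setMessages`, and the
// appended message shape are illustrative assumptions, not the repo's actual code.
const handleSend = async (userText: string) => {
  try {
    const { images, thinking, groundingSources } =
      await getVisualResponse(userText, messages);

    setMessages(prev => [
      ...prev,
      { role: 'user', text: userText },
      { role: 'model', text: '', images, thinking, groundingSources },
    ]);
  } catch (err) {
    console.error('Visual response failed:', err);
  }
};
```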
This project was built to understand how AI "thinks" when it generates images. The visual language concept is inspired by the linguistic challenges in Denis Villeneuve's Arrival, where aliens communicate through complex circular visual symbols rather than linear spoken language.
- Education: Visual explanations for complex concepts
- Accessibility: Communication aid for visual learners or non-verbal individuals
- Creative Workflows: Ideation and concept visualization for designers
- Cross-cultural Communication: Images as a more universal language
- Early Stage: This is experimental research, not production-ready
- Image Generation Costs: Each response generates multiple images via API
- Reasoning Transparency: While the "Thinking Layer" exposes the process, interpreting AI reasoning remains challenging
- Visual Consistency: Maintaining style across turns is difficult without fine-tuning
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-idea`)
- Commit your changes (`git commit -m 'Add amazing idea'`)
- Push to the branch (`git push origin feature/amazing-idea`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Developed by Fladry Creative

- Robb Fladry - Lead Developer & Human-in-the-Loop
- Contact: robb@fladrycreative.com
- Antigravity (Google DeepMind) - Agentic Coding Assistant
- Gemini (Google) - Core Intelligence Engine

This is an AI-assisted development project. We own that.
- [ ] Fine-tune visual consistency across conversation turns
- [ ] Add support for video generation (Gemini Video models)
- [ ] Implement user-defined visual "style profiles"
- [ ] Explore chain-of-thought visualization for complex reasoning
- [ ] Build comparison metrics vs. text-only explanations
---
## Citation
If you reference this project in academic work or research, please use:
```bibtex
@software{fladry2025bumblebee,
  author = {Fladry, Robb},
  title = {Project Bumblebee: A Visual-First AI Communication Experiment},
  year = {2025},
  note = {Experimental multimodal AI interface using Google Gemini}
}
```
---
Questions? Issues? Ideas? Open an issue or reach out at robb@fladrycreative.com
```
Project-Bumblebee/
├── components/      # React components (ChatBubble, ThinkingLayer, etc.)
├── services/        # API service layer (Gemini integration)
├── App.tsx          # Main application component
├── constants.ts     # System prompts and configuration
├── types.ts         # TypeScript interfaces
├── .env.example     # Example environment variables
└── README.md        # You are here
```
This is an open research project. Contributions, ideas, and feedback are welcome!