# Project Bumblebee 🐝

Status · License: MIT · TypeScript

Speak less, show more. A visual-first AI experiment that communicates primarily through generated imagery rather than text, building a unique visual language with every conversation.

🔗 **Live Demo**: bumblebee.fladrycreative.com

Bumblebee Visual Language Builder


## 🎯 Project Overview

Project Bumblebee is an experimental AI interface that inverts the traditional text-heavy chatbot paradigm. Instead of generating walls of text, Bumblebee "thinks" visually, translating user queries into a sequence of meaningful images using Google's Gemini multimodal AI models.

The project explores a fundamental question: What if AI communicated primarily through images, developing its own visual language over time? This concept draws inspiration from linguistic relativity theory and films like Arrival, where understanding emerges through visual and conceptual patterns rather than verbal language.

### What Makes This Different?

Unlike DALL-E chat or other multimodal systems that use images as supplements to text responses, Bumblebee:

• Only thinks in images: The model cannot respond with text alone; every output is visual
• Develops visual language: Over conversation turns, it establishes consistent visual metaphors and styles
• Shows its reasoning: A transparent "Thinking Layer" reveals how the AI interprets queries and plans visuals
• Two-stage architecture: Separates reasoning (interpretation) from generation (visualization)

This isn't about generating prettier responses; it's about exploring whether visual-first communication can be more intuitive, universal, or expressive than text for certain tasks.

Bumblebee vs DALL-E Comparison


## 🧠 How It Works

Bumblebee flips the traditional chatbot script through a three-layer architecture:

Bumblebee Architecture

### 1. Interpretation Layer (gemini-2.5-flash)

The user's query is first sent to a reasoning model. Instead of answering directly, this model acts as a "Director." It analyzes the intent and outputs a structured Thinking JSON object (sketched after the list below) containing:

• interpretation: What the user actually wants
• visualApproach: How to best visualize the answer (e.g., "a sequential diagram," "a side-by-side comparison," "a metaphorical illustration")
• prompts: A set of optimized image generation prompts tailored to the visual approach
• styleConsiderations: Notes on maintaining visual consistency with previous turns
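
A minimal sketch of what this object might look like in TypeScript (field names come from the list above; the exact `ThinkingData` interface lives in types.ts and may differ):

```typescript
// Illustrative only; the authoritative ThinkingData interface is defined in types.ts.
interface ThinkingData {
  interpretation: string;       // what the user actually wants
  visualApproach: string;       // e.g. "a sequential diagram"
  prompts: string[];            // optimized image-generation prompts, one per image
  styleConsiderations: string;  // notes for keeping visual consistency across turns
}

// Hypothetical output for "How does a bill become a law?"
const exampleThinking: ThinkingData = {
  interpretation: 'User wants the legislative process explained step by step',
  visualApproach: 'a sequential diagram',
  prompts: [
    'Flat vector infographic panel: a bill being drafted in a legislature',
    'Same flat vector style: the bill debated in committee and passed by a floor vote',
    'Same flat vector style: the signed bill becoming law, final celebratory panel',
  ],
  styleConsiderations: 'Reuse the flat vector style and palette from earlier turns',
};
```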

### 2. Generation Layer (gemini-2.5-flash-image)

The system takes the generated prompts and fires parallel requests to the image generation model.

Model Note: This uses Google Gemini's experimental image generation model, internally nicknamed "Nano Banana" (now evolved to "Nano Banana Pro" in gemini-3.0-flash-image-preview). This detail represents the bleeding edge of Google's multimodal capabilities.

### 3. Presentation Layer

The result is delivered as a chat bubble containing a gallery of images. A hidden "Thinking Layer" allows the user to peek behind the curtain and see the AI's reasoning, interpretation, and the exact prompts it used to generate the visuals.
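
For a sense of how little UI this requires, here is a stripped-down sketch of such a bubble (hypothetical; the actual ChatBubble and ThinkingLayer components in components/ are more elaborate):

```tsx
// Hypothetical sketch, not the repo's actual components.
import { useState } from 'react';
import type { ThinkingData } from './types'; // path assumed

function VisualBubble({ images, thinking }: { images: string[]; thinking: ThinkingData }) {
  const [open, setOpen] = useState(false);
  return (
    <div className="rounded-2xl bg-neutral-900 p-3">
      {/* The visible answer: a gallery of generated images */}
      <div className="grid grid-cols-2 gap-2">
        {images.map((src, i) => (
          <img key={i} src={`data:image/png;base64,${src}`} alt={thinking.prompts[i]} />
        ))}
      </div>
      {/* The hidden Thinking Layer: reasoning and prompts on demand */}
      <button onClick={() => setOpen(v => !v)}>
        {open ? 'Hide' : 'Show'} thinking
      </button>
      {open && <pre className="text-xs">{JSON.stringify(thinking, null, 2)}</pre>}
    </div>
  );
}
```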


## 🛠 Tech Stack

| Component  | Technology |
|------------|------------|
| Framework  | React (v19) |
| Build Tool | Vite |
| Language   | TypeScript |
| Styling    | Tailwind CSS |
| AI Models  | Google Gemini 2.5 Flash (Reasoning)<br>Google Gemini 2.5 Flash Image (Generation) |
| SDK        | Google GenAI SDK |

## ⚙️ Getting Started

### Prerequisites

• Node.js (v18 or higher)
• Google Gemini API key (Get one here)

### Installation

1. Clone the repository

   ```bash
   git clone https://github.com/drftstatic/Project-Bumblebee.git
   cd Project-Bumblebee
   ```

2. Install dependencies

   ```bash
   npm install
   ```

3. Set up environment variables

   Copy the example environment file:

   ```bash
   cp .env.example .env
   ```

   Then edit `.env` and add your API key:

   ```
   GEMINI_API_KEY=your_actual_gemini_api_key_here
   ```

4. Run the development server

   ```bash
   npm run dev
   ```

5. Open your browser

   Navigate to `http://localhost:5173` (or the port shown in your terminal)
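
Note that the variable is named GEMINI_API_KEY without a `VITE_` prefix, so it is presumably injected at build time via Vite's `define` option rather than exposed through `import.meta.env`. A plausible configuration (the repo's actual vite.config.ts may differ) looks like this:

```typescript
// vite.config.ts: a plausible setup, not necessarily the repo's exact config.
import { defineConfig, loadEnv } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig(({ mode }) => {
  // Load every variable from .env, not just the VITE_-prefixed ones.
  const env = loadEnv(mode, process.cwd(), '');
  return {
    plugins: [react()],
    define: {
      // Makes process.env.GEMINI_API_KEY usable in client-side code at build time.
      'process.env.GEMINI_API_KEY': JSON.stringify(env.GEMINI_API_KEY),
    },
  };
});
```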
                          

## ⚡ Core Logic

The heart of Bumblebee is the `getVisualResponse` orchestrator. It manages the two-step "Think then Draw" process, handling context, history, and error states.

```typescript
// services/ excerpt: `ai` (the GoogleGenAI client), SYSTEM_PROMPT, and
// base64ToGenerativePart are defined elsewhere in the service; the ChatMessage,
// ThinkingData, and GroundingSource types come from types.ts.
import { Modality } from '@google/genai';

export const getVisualResponse = async (
  prompt: string,
  history: ChatMessage[],
  imageBase64?: string
): Promise<{
  images: string[];
  thinking: ThinkingData;
  groundingSources?: GroundingSource[];
}> => {
  // 1. Get reasoning and prompts from Gemini 2.5 Flash
  const reasoningModel = 'gemini-2.5-flash';
  const contents = [
    ...history.flatMap(msg => {
      const parts = (msg.images && msg.images.length > 0)
        ? [{ text: msg.text }, base64ToGenerativePart(msg.images[0])]
        : [{ text: msg.text }];
      return { role: msg.role, parts };
    }),
    {
      role: 'user',
      parts: imageBase64
        ? [{ text: prompt }, base64ToGenerativePart(imageBase64)]
        : [{ text: prompt }]
    }
  ];

  const reasoningResponse = await ai.models.generateContent({
    model: reasoningModel,
    contents: contents,
    config: {
      systemInstruction: SYSTEM_PROMPT,
      tools: [{ googleSearch: {} }],
    }
  });

  // ... (JSON Parsing & Error Handling Logic) ...

  // 2. Generate images using Gemini 2.5 Flash Image
  const imageModel = 'gemini-2.5-flash-image';
  const imagePromises = thinking.prompts.map(p =>
    ai.models.generateContent({
      model: imageModel,
      contents: { parts: [{ text: p }] },
      config: {
        responseModalities: [Modality.IMAGE],
      }
    })
  );

  const imageResponses = await Promise.all(imagePromises);

  // ... (Response Formatting) ...

  return { images, thinking, groundingSources };
};
```
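
A hypothetical call site (the actual wiring in App.tsx may differ) shows how the orchestrator's result feeds the UI:

```typescript
// Hypothetical wiring in a chat hook; names other than getVisualResponse are illustrative.
import { useState } from 'react';
import { getVisualResponse } from './services/geminiService'; // path assumed
import type { ChatMessage } from './types';                   // shape assumed

export function useBumblebeeChat() {
  const [chatHistory, setChatHistory] = useState<ChatMessage[]>([]);

  const send = async (userText: string) => {
    // Pass prior turns so the model can keep its visual language consistent.
    const { images, thinking } = await getVisualResponse(userText, chatHistory);

    // Every assistant turn is a gallery of images plus its hidden reasoning.
    setChatHistory(prev => [
      ...prev,
      { role: 'user', text: userText } as ChatMessage,
      { role: 'model', text: '', images, thinking } as ChatMessage,
    ]);
  };

  return { chatHistory, send };
}
```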

## 🔬 Research Context

This project was built to understand how AI "thinks" when it generates images. The visual language concept is inspired by the linguistic challenges in Denis Villeneuve's Arrival, where aliens communicate through complex circular visual symbols rather than linear spoken language.

### Potential Applications

• Education: Visual explanations for complex concepts
• Accessibility: Communication aid for visual learners or non-verbal individuals
• Creative Workflows: Ideation and concept visualization for designers
• Cross-cultural Communication: Images as a more universal language

### Limitations

• Early Stage: This is experimental research, not production-ready
• Image Generation Costs: Each response generates multiple images via API
• Reasoning Transparency: While the "Thinking Layer" exposes the process, interpreting AI reasoning remains challenging
• Visual Consistency: Maintaining style across turns is difficult without fine-tuning

## 📁 Project Structure

```
Project-Bumblebee/
├── components/          # React components (ChatBubble, ThinkingLayer, etc.)
├── services/            # API service layer (Gemini integration)
├── App.tsx              # Main application component
├── constants.ts         # System prompts and configuration
├── types.ts             # TypeScript interfaces
├── .env.example         # Example environment variables
└── README.md            # You are here
```
                              

## 🤝 Contributing

This is an open research project. Contributions, ideas, and feedback are welcome!

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-idea`)
3. Commit your changes (`git commit -m 'Add amazing idea'`)
4. Push to the branch (`git push origin feature/amazing-idea`)
5. Open a Pull Request


## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 👥 Credits & Attribution

Developed by Fladry Creative

• Robb Fladry - Lead Developer & Human-in-the-Loop
• Contact: robb@fladrycreative.com

### AI Co-Contributors

• Antigravity (Google DeepMind) - Agentic Coding Assistant
• Gemini (Google) - Core Intelligence Engine

This is an AI-assisted development project. We own that.


## 🔮 Future Directions

- [ ] Fine-tune visual consistency across conversation turns
- [ ] Add support for video generation (Gemini Video models)
- [ ] Implement user-defined visual "style profiles"
- [ ] Explore chain-of-thought visualization for complex reasoning
- [ ] Build comparison metrics vs. text-only explanations

---

## 📚 Citation

If you reference this project in academic work or research, please use:

```bibtex
@software{fladry2025bumblebee,
  author = {Fladry, Robb},
  title  = {Project Bumblebee: A Visual-First AI Communication Experiment},
  year   = {2025},
  url    = {https://github.com/drftstatic/Project-Bumblebee},
  note   = {Experimental multimodal AI interface using Google Gemini}
}
```

---

Questions? Issues? Ideas? Open an issue or reach out at robb@fladrycreative.com
