A collection of powerful custom nodes for ComfyUI that connect your local workflows to closed-source AI models via their APIs. Use Google's Gemini, Imagen, and Veo, OpenAI's GPT-Image-1, and Black Forest Labs' FLUX models directly within ComfyUI.
- FLUX Kontext Pro & Max: Image-to-image transformations using the FLUX models via the Replicate API.
- Flux.2 (Replicate): Generate images using the latest FLUX.2 models (Pro, Max, Dev) via Replicate.
- Gemini Chat: Google's powerful multimodal AI. Ask questions about an image, generate detailed descriptions, or create prompts for other models. Supports thinking budget controls for applicable models. Now supports audio input.
- Gemini Segmentation: Generate segmentation masks for objects in an image using Gemini.
- Gemini Speaker Diarization: Separate audio into different speaker tracks using Gemini.
- GPT Image Edit: OpenAI's `gpt-image-1` for prompt-based image editing and inpainting. Simply mask an area and describe the change you want to see.
- OpenAI LLM: Access OpenAI's powerful language models (GPT-4, GPT-5, o1, etc.) for text generation and reasoning.
- OpenAI Text-to-Speech: Generate high-quality speech using OpenAI's TTS models.
- Google Imagen Generator & Edit: Create and edit images with Google's Imagen models, with support for Vertex AI.
- Nano Banana: A creative image generation node using a specialized Gemini model.
- Veo Video Generator: Generate high-quality video clips from text prompts using Google's Veo model via Vertex AI or the Gemini API.
- ElevenLabs TTS: Generate high-quality speech from text using ElevenLabs' diverse range of voices and models.
- Gemini TTS: Create speech from text using Google's Gemini models.
- Navigate to your ComfyUI installation directory.
- Go into the `custom_nodes` folder:
  `cd ComfyUI/custom_nodes/`
- Clone this repository:
  `git clone https://github.com/Aryan185/ComfyUI-ExternalAPI-Helpers.git`
- Install the required Python packages. Navigate into the newly cloned directory and use pip to install the dependencies:
  `cd ComfyUI-ExternalAPI-Helpers`
  `pip install -r requirements.txt`
- Restart ComfyUI. After restarting, you should find the new nodes in the "Add Node" menu.
All nodes in this collection require API keys to function.
- FLUX Nodes (Replicate): You will need a Replicate API Token.
- Gemini, Imagen, Nano Banana, Gemini TTS, Gemini Diarization, and Veo (Gemini API) Nodes: You will need a Google AI Studio API Key.
- OpenAI Nodes (GPT Image Edit, OpenAI LLM, OpenAI TTS): You will need an OpenAI API Key.
- ElevenLabs TTS Node: You will need an ElevenLabs API Key.
- Vertex AI Nodes (Imagen Edit, Veo Vertex AI): You will need a Google Cloud Project ID, a service account with appropriate permissions, and the Google Cloud location (region) for the resources.
You can paste your key directly into the `api_key` field on the corresponding node. For Vertex AI nodes, you will need to provide the project ID, location, and path to your service account JSON file.
These nodes allow you to transform an input image based on a text prompt. They are ideal for applying artistic styles or making significant conceptual changes to an existing image.
- Category: `image/edit`
- Inputs:
  - `image`: The source image to transform.
  - `prompt`: A text description of the desired output (e.g., "A vibrant Van Gogh painting", "Make this a 90s cartoon").
  - `replicate_api_token`: Your API token from Replicate.
  - `aspect_ratio`: The desired output aspect ratio. `match_input_image` is highly recommended to preserve the original composition.
  - `output_format`: `jpg` or `png`.
  - `safety_tolerance`: Adjust the content safety filter level.
- Output:
  - `image`: The generated image.
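If you want to prototype the same transformation outside ComfyUI, the node corresponds roughly to a direct Replicate call. The sketch below is a minimal, hypothetical example using the `replicate` Python client; the model slug and input field names follow Replicate's public FLUX Kontext listing and may not match this node's internals exactly.

```python
# Minimal sketch of the kind of Replicate call this node wraps (assumptions:
# model slug and input names from Replicate's public flux-kontext-pro listing).
# Requires the `replicate` package and the REPLICATE_API_TOKEN environment variable.
import replicate

output = replicate.run(
    "black-forest-labs/flux-kontext-pro",  # or flux-kontext-max
    input={
        "prompt": "A vibrant Van Gogh painting",
        "input_image": open("source.png", "rb"),   # source image for the transformation
        "aspect_ratio": "match_input_image",
        "output_format": "png",
        "safety_tolerance": 2,
    },
)
print(output)  # URL or file-like output pointing at the generated image
```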
Generate images using Black Forest Labs' FLUX.2 models via the Replicate API.
- Category: `image/generation`
- Inputs:
  - `prompt`: The text prompt for image generation.
  - `api_key`: Your Replicate API token.
  - `model`: Choose between `flux-2-max`, `flux-2-pro`, or `flux-2-dev`.
  - `aspect_ratio`: The desired aspect ratio for the generated image.
  - `output_format`: `webp`, `jpg`, or `png`.
  - `output_quality`: Quality of the output image (0-100).
  - `image_1` to `image_5` (Optional): Input images for image-to-image or control tasks.
- Output:
  - `image`: The generated image.
A versatile node for text generation and image/audio analysis. Use it to understand an image's content, analyze audio, or to generate creative text for other nodes.
- Category: `text/generation`
- Inputs:
  - `prompt`: The text prompt or question you want to ask the model.
  - `model`: The Gemini model to use (e.g., `gemini-2.5-pro`, `gemini-2.5-flash`).
  - `temperature`: Controls the creativity of the output.
  - `thinking`: Enables the model's thinking/reasoning process.
  - `seed`: Seed for reproducibility.
  - `api_key`: Your API key from Google AI Studio.
  - `system_instruction` (Optional): Provide context or rules for how the model should behave.
  - `thinking_budget` (Optional): Token budget for thinking.
  - `image` (Optional): An input image for the model to analyze.
  - `audio` (Optional): An input audio for the model to analyze.
- Output:
  - `response`: The text generated by the Gemini model.
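Outside of ComfyUI, the same kind of request can be reproduced with the `google-genai` SDK. This is a minimal sketch assuming a text-plus-image query; the model name and config values are illustrative, not the node's defaults.

```python
# Minimal sketch of a Gemini multimodal request with the google-genai SDK.
# Model name, temperature, and system instruction are illustrative values.
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[Image.open("input.png"), "Describe this image as a detailed prompt."],
    config=types.GenerateContentConfig(
        temperature=0.7,
        system_instruction="You write prompts for image generation models.",
    ),
)
print(response.text)  # the text the node exposes as `response`
```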
This node uses a Gemini model to generate segmentation masks for specified objects within an image.
- Category: `image/generation`
- Inputs:
  - `image`: The source image for segmentation.
  - `segment_prompt`: A text description of the objects to segment (e.g., "the car", "all people").
  - `model`: The Gemini model to use.
  - `temperature`: Controls randomness.
  - `thinking`: Enable thinking process.
  - `seed`: Seed for reproducibility.
  - `api_key`: Your API key from Google AI Studio.
  - `thinking_budget` (Optional): Token budget for thinking.
- Output:
  - `mask`: A black and white mask of the segmented objects.
Separate audio into different speaker tracks using Gemini.
- Category: `audio/diarise`
- Inputs:
  - `audio`: The input audio to process.
  - `num_speakers`: The expected number of speakers.
  - `model`: The Gemini model to use.
  - `api_key`: Your API key from Google AI Studio.
  - `seed`: Seed for reproducibility.
  - `temperature`: Controls randomness.
  - `thinking` (Optional): Enable thinking process.
  - `thinking_budget` (Optional): Token budget for thinking.
- Output:
  - `speaker_1` to `speaker_4`: Audio tracks for up to 4 separated speakers.
This node uses OpenAI's API to perform powerful, prompt-based inpainting and editing.
- Category: `image/edit`
- Inputs:
  - `image`: The source image to edit.
  - `mask` (Optional): A black and white mask. The model will edit the white area of the mask.
  - `prompt`: A description of the edit to perform.
  - `api_key`: Your API key from OpenAI.
  - ...other_params: Various quality and formatting options for the OpenAI API.
- Output:
  - `image`: The edited image.
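The node is roughly equivalent to OpenAI's image edit endpoint. Here is a minimal sketch with the official `openai` Python client, assuming PNG files on disk; the node itself handles converting ComfyUI tensors and masks into the format the API expects.

```python
# Minimal sketch of an OpenAI image edit request with gpt-image-1.
# File paths and the prompt are illustrative; mask handling inside the node
# may differ from this bare-bones example.
import base64
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

result = client.images.edit(
    model="gpt-image-1",
    image=open("source.png", "rb"),
    mask=open("mask.png", "rb"),   # marks the region to be edited
    prompt="Replace the sky with a dramatic sunset",
)

# gpt-image-1 returns base64-encoded image data.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("edited.png", "wb") as f:
    f.write(image_bytes)
```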
Access OpenAI's powerful language models for text generation and reasoning.
- Category: `text/generation`
- Inputs:
  - `prompt`: The text prompt.
  - `model`: The OpenAI model to use (e.g., `gpt-4.1`, `o1`, `gpt-5`).
  - `temperature`: Controls randomness.
  - `reasoning_effort`: Effort level for reasoning models.
  - `api_key`: Your OpenAI API key.
  - `max_output_tokens`: Maximum number of tokens to generate.
  - `system_instruction` (Optional): System level instructions.
  - `image` (Optional): Input image for multimodal models.
- Output:
  - `response`: The generated text response.
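For reference, an equivalent standalone call with the `openai` client looks like the sketch below; the node may use a different endpoint or additional parameters internally.

```python
# Minimal sketch of a plain chat completion request.
# Model name and message contents are illustrative.
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Write a one-line prompt for a cyberpunk street scene."},
    ],
    temperature=0.8,
)
print(completion.choices[0].message.content)
```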
Generate high-quality speech from text using OpenAI's TTS models.
- Category: `audio/generation`
- Inputs:
  - `text`: The text to convert to speech.
  - `model`: The TTS model to use (e.g., `gpt-4o-mini-tts`, `tts-1`).
  - `voice`: The voice to use (e.g., `alloy`, `echo`).
  - `response_format`: Output audio format.
  - `speed`: Speaking speed.
  - `api_key`: Your OpenAI API key.
  - `instructions` (Optional): Instructions for the model (supported by some models).
- Output:
  - `audio`: The generated audio.
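A minimal sketch of the underlying text-to-speech request with the `openai` client follows; the model, voice, and output handling are illustrative, and the node converts the result into ComfyUI's audio format rather than writing a file.

```python
# Minimal sketch of an OpenAI text-to-speech request.
# Model, voice, and output path are illustrative values.
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Welcome to the workflow.",
    response_format="mp3",
)
with open("speech.mp3", "wb") as f:
    f.write(speech.content)  # raw audio bytes returned by the API
```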
Generate images from a text prompt using Google's Imagen models.
- Category: `image/generation`
- Inputs:
  - `prompt`: A text description of the image to generate.
  - `api_key`: Your API key from Google AI Studio.
  - `model`: The Imagen model to use.
  - ...other_params: Options for number of images, aspect ratio, and image size.
- Output:
  - `images`: The generated image(s).
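Outside ComfyUI, a comparable request can be made with the `google-genai` SDK. The sketch below is an assumption based on Google's public Imagen documentation (model name and config fields included), not this node's exact code.

```python
# Minimal sketch of an Imagen generation request via google-genai.
# Model name, aspect ratio, and image count are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

result = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt="A misty pine forest at dawn, photorealistic",
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="16:9",
    ),
)

# Each generated image carries its raw bytes.
with open("imagen_output.png", "wb") as f:
    f.write(result.generated_images[0].image.image_bytes)
```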
Perform advanced image editing, inpainting, outpainting, and background swapping using Imagen on Google's Vertex AI platform.
- Category: `image/edit`
- Inputs:
  - `image`: The source image to edit.
  - `mask`: A mask defining the area to edit.
  - `prompt`: A description of the desired edit.
  - `project_id`: Your Google Cloud Project ID.
  - `location`: The Google Cloud location for the model.
  - `service_account`: Path to your Google Cloud service account JSON file.
  - `edit_mode`: The type of edit to perform (e.g., inpainting, outpainting).
  - ...other_params: Controls for negative prompt, seed, and steps.
- Output:
  - `edited_images`: The edited image(s).
A creative image generation node that can take a combination of text and up to five images as input.
- Category: `image/generation`
- Inputs:
  - `api_key`: Your API key from Google AI Studio.
  - `prompt` (Optional): A text prompt.
  - `image_1` to `image_5` (Optional): Up to five source images.
  - ...other_params: Controls for aspect ratio, temperature, top_p, and seed.
- Output:
  - `image`: The generated image.
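For reference, image output from Gemini can be requested directly with `google-genai`, as sketched below. The model name and the response-parsing pattern follow Google's public documentation for image generation with Gemini and are assumptions here, not taken from this node's source.

```python
# Minimal sketch of an image-output Gemini request (Nano Banana-style).
# Model name and response parsing are assumptions from Google's public docs.
from io import BytesIO
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=["A tiny banana-shaped spaceship over a neon city", Image.open("reference.png")],
)

# The response can mix text and image parts; save the first image part found.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("nano_banana.png")
        break
```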
Generate short, high-quality video clips from a text description using Google's Veo model on Vertex AI.
- Category: `video/generation`
- Inputs:
  - `prompt`: A text description of the video to generate.
  - `project_id`: Your Google Cloud Project ID.
  - `location`: The Google Cloud location for the model.
  - `service_account`: Path to your Google Cloud service account JSON file.
  - ...other_params: Controls for negative prompt, aspect ratio, audio generation, and seed.
- Output:
  - `frames`: The generated video frames, output as an image batch.
Generate videos using Google's Veo 2.0 model via the Gemini API. Supports text-to-video and image-to-video.
- Category: `video/generation`
- Inputs:
  - `prompt`: A text description of the video.
  - `image` (Optional): An input image for image-to-video generation.
  - `api_key`: Your API key from Google AI Studio.
  - `model`: The Veo model to use (e.g., `veo-2.0-generate-001`).
  - `aspect_ratio`: Desired aspect ratio (16:9 or 9:16).
  - `duration_seconds`: Duration of the video (e.g., 5-8 seconds).
  - ...other_params: Controls for negative prompt and seed.
- Output:
  - `frames`: The generated video frames.
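Veo requests through the Gemini API are long-running operations that must be polled until they complete. The sketch below follows Google's published `google-genai` example and is only an approximation of what the node does internally; the model name and config fields are assumptions.

```python
# Minimal sketch of a Veo text-to-video request via google-genai.
# The operation is polled until done; model and config values are assumptions.
import time
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="A paper boat drifting down a rain-soaked street, cinematic",
    config=types.GenerateVideosConfig(aspect_ratio="16:9", duration_seconds=5),
)

while not operation.done:          # video generation takes a while
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_output.mp4")
```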
Generate speech from text using the ElevenLabs API.
- Category: `audio/generation`
- Inputs:
  - `text`: The text to convert to speech.
  - `api_key`: Your API key from ElevenLabs.
  - `voice_id`: The ID of the voice to use for generation.
  - `model_id`: The ElevenLabs model to use.
  - `output_format`: The desired output audio format.
  - `stability`: Controls the stability and variability of the generated speech.
  - `similarity_boost`: Enhances the similarity of the generated speech to the chosen voice.
  - `speed`: Adjusts the speaking rate.
  - `style`: Controls the expressiveness of the speech.
  - `use_speaker_boost`: A boolean to enable or disable speaker boost.
  - `seed`: A seed for ensuring reproducible results.
- Output:
  - `audio`: The generated audio waveform and sample rate.
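This corresponds closely to ElevenLabs' text-to-speech REST endpoint. A minimal sketch using plain `requests` follows; the voice ID, model ID, and settings values are illustrative, and the node decodes the audio into a waveform instead of writing an MP3.

```python
# Minimal sketch of an ElevenLabs text-to-speech request over REST.
# voice_id, model_id, and voice_settings values are illustrative.
import requests

voice_id = "YOUR_VOICE_ID"
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={"xi-api-key": "YOUR_ELEVENLABS_API_KEY"},
    params={"output_format": "mp3_44100_128"},
    json={
        "text": "Hello from ComfyUI.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
    timeout=120,
)
resp.raise_for_status()
with open("elevenlabs.mp3", "wb") as f:
    f.write(resp.content)  # raw audio bytes in the requested format
```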
Generate speech from text using Google's Gemini TTS models.
- Category: `audio/generation`
- Inputs:
  - `text`: The text to be converted into speech.
  - `api_key`: Your API key from Google AI Studio.
  - `model`: The specific Gemini model to use for generation.
  - `voice_id`: The prebuilt voice to use for the output.
  - `temperature`: Controls the randomness and creativity of the output.
  - `seed`: A seed for ensuring reproducible results.
  - `system_prompt` (Optional): A system-level instruction to guide the model's behavior.
- Output:
  - `audio`: The generated audio waveform and sample rate.
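Gemini TTS is served through the same `generate_content` interface with an audio response modality. The sketch below mirrors Google's published TTS example; the model name, config classes, voice name, and raw PCM handling are assumptions about how the node builds its request, not code from this repository.

```python
# Minimal sketch of a Gemini TTS request via google-genai.
# Model, voice name, and the 24 kHz / 16-bit PCM assumption follow Google's docs.
import wave
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GOOGLE_AI_STUDIO_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Welcome to the workflow.",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("gemini_tts.wav", "wb") as wf:   # wrap raw PCM into a WAV container
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(pcm)
```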
- The ComfyUI team for creating such a flexible and powerful platform.
- Google, OpenAI, and Black Forest Labs for developing these incredible models.
- Replicate for providing easy API access to a wide range of models.