diff --git a/README.md b/README.md
index b89832e..0beba0f 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ ## 💻 Links: [DeepFloyd.AI](https://deepfloyd.ai) | [Discord](https://discord.gg/umz62Mgr) | [Twitter](https://twitter.com/deepfloydai)
-We introduce DeepFloyd IF, a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding. DeepFloyd IF is a modular composed of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates 64x64 px image based on text prompt and two super-resolution models, each designed to generate images of increasing resolution: 256x256 px and 1024x1024 px. All stages of the model utilize a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset. Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and depicts a promising future for text-to-image synthesis.
+We introduce DeepFloyd IF, a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding. DeepFloyd IF is a modular system composed of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates a 64x64 px image based on a text prompt and two super-resolution models, each designed to generate images of increasing resolution: 256x256 px and 1024x1024 px. All stages of the model utilize a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset. Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and depicts a promising future for text-to-image synthesis.
@@ -19,10 +19,14 @@ We introduce DeepFloyd IF, a novel state-of-the-art open-source text-to-image mo
*Inspired by* [*Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding*](https://arxiv.org/pdf/2205.11487.pdf)
## Minimum requirements to use all IF models:
-- 16GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module)
-- 24GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module) & Stable x4 (to 1024x1024 upscaler)
- `xformers` and set env variable `FORCE_MEM_EFFICIENT_ATTN=1`
-
+#### The following require 16GB of VRAM
+- IF-I-XL (4.3B text to 64x64 base module)
+- IF-II-L (1.2B to 256x256 upscaler module)
+#### The following require 24GB of VRAM
+- IF-I-XL (4.3B text to 64x64 base module)
+- IF-II-L (1.2B to 256x256 upscaler module)
+- Stable x4 (to 1024x1024 upscaler)
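+
+As a rough sketch of the `xformers` requirement above (the package name and environment variable are taken from this list; exact install steps depend on your CUDA/PyTorch setup):
+
+```sh
+# install the memory-efficient attention package (assumed to come from PyPI)
+pip install xformers
+# enable the memory-efficient attention path expected by this repo
+export FORCE_MEM_EFFICIENT_ATTN=1
+```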
## Quick Start
[](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb)
@@ -46,12 +50,12 @@ The Dream, Style Transfer, Super Resolution or Inpainting modes are avaliable in
IF is also integrated with the 🤗 Hugging Face [Diffusers library](https://github.com/huggingface/diffusers/).
-Diffusers runs each stage individually allowing the user to customize the image generation process as well as allowing to inspect intermediate results easily.
+Diffusers runs each stage individually, allowing the user to customize the image generation process and easily inspect intermediate results.
### Example
Before you can use IF, you need to accept its usage conditions. To do so:
-1. Make sure to have a [Hugging Face account](https://huggingface.co/join) and be loggin in
+1. Make sure to have a [Hugging Face account](https://huggingface.co/join) and be logged in
2. Accept the license on the model card of [DeepFloyd/IF-I-XL-v1.0](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0)
3. Make sure to login locally. Install `huggingface_hub`
```sh
@@ -68,15 +72,15 @@ login()
and enter your [Hugging Face Hub access token](https://huggingface.co/docs/hub/security-tokens#what-are-user-access-tokens).
-Next we install `diffusers` and dependencies:
+Next, we install `diffusers` and other dependencies:
```sh
pip install diffusers accelerate transformers safetensors
```
-And we can now run the model locally.
+We can now run the model locally.
-By default `diffusers` makes use of [model cpu offloading](https://huggingface.co/docs/diffusers/optimization/fp16#model-offloading-for-fast-inference-and-memory-savings) to run the whole IF pipeline with as little as 14 GB of VRAM.
+By default, `diffusers` makes use of [model cpu offloading](https://huggingface.co/docs/diffusers/optimization/fp16#model-offloading-for-fast-inference-and-memory-savings) to run the whole IF pipeline with as little as 14 GB of VRAM.
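+
+A minimal sketch of that setup, assuming the `DeepFloyd/IF-I-XL-v1.0` and `DeepFloyd/IF-II-L-v1.0` checkpoints on the Hub and a recent `diffusers` release (exact arguments may vary between versions):
+
+```python
+import torch
+from diffusers import DiffusionPipeline
+
+# stage 1: text-to-64x64 base model, offloaded to CPU when idle to save VRAM
+stage_1 = DiffusionPipeline.from_pretrained(
+    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
+)
+stage_1.enable_model_cpu_offload()
+
+# stage 2: 64x64 -> 256x256 upscaler; the T5 text encoder is reused from stage 1
+stage_2 = DiffusionPipeline.from_pretrained(
+    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
+)
+stage_2.enable_model_cpu_offload()
+```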
If you are using `torch>=2.0.0`, make sure to **delete all** `enable_xformers_memory_efficient_attention()`
functions.
@@ -131,7 +135,7 @@ image[0].save("./if_stage_III.png")
- 🚀 [Optimizing for inference time](https://huggingface.co/docs/diffusers/api/pipelines/if#optimizing-for-speed)
- ⚙️ [Optimizing for low memory during inference](https://huggingface.co/docs/diffusers/api/pipelines/if#optimizing-for-memory)
-For more in-detail information about how to use IF, please have a look at [the IF blog post](https://huggingface.co/blog/if) and [the documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/if) 📖.
+For more detailed information about how to use IF, please have a look at the [IF blog post](https://huggingface.co/blog/if) and the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/if) 📖.
## Run the code locally
@@ -184,7 +188,8 @@ if_III.show(result['III'], size=14)

-In Style Transfer mode, the output of your prompt comes out at the style of the `support_pil_img`
+In Style Transfer mode, the output of your prompt comes out in the style of the `support_pil_img`.
+
```python
from deepfloyd_if.pipelines import style_transfer
@@ -316,14 +321,13 @@ The link to download the weights as well as the model cards will be available so
The code in this repository is released under the bespoke license (see added [point two](https://github.com/deep-floyd/IF/blob/main/LICENSE#L13)).
-The weights will be available soon via [the DeepFloyd organization at Hugging Face](https://huggingface.co/DeepFloyd) and have their own LICENSE.
+The weights and licenses will be available soon via the [DeepFloyd organization](https://huggingface.co/DeepFloyd) on Hugging Face.
**Disclaimer:** *The initial release of the IF model is under a restricted research-purposes-only license temporarily to gather feedback, and after that we intend to release a fully open-source model in line with other Stability AI models.*
## Limitations and Biases
-The models available in this codebase have known limitations and biases. Please refer to [the model card](https://huggingface.co/DeepFloyd/IF-I-L-v1.0) for more information.
-
+The models available in this codebase have known limitations and biases. Please refer to the [model card](https://huggingface.co/DeepFloyd/IF-I-L-v1.0) for more information.
## 🎓 DeepFloyd IF creators: