[A] Add 24.06 MAR paper #19
Conversation
Pull Request Overview
Adds a full summary of the new paper “Autoregressive Image Generation without Vector Quantization”, covering background, methodology, implementation details, experiments, and references.
- Introduces paper metadata, an author link, and a Chinese translation
- Details the vector quantization background and the proposed diffusion-based autoregressive method
- Summarizes experiments on loss functions, tokenizers, MLP ablations, and system comparisons
Comments suppressed due to low confidence (1)
papers/image-generation/2406-mar/index.md:35
- [nitpick] List indentation is inconsistent here and in subsequent bullet points. Use uniform indent levels for nested lists to improve readability.
+- Take [VQ-VAE, 2017] as an example
- Diffusion Loss: cosine-shaped noise schedule; DDPM uses 1000 steps during training but only 100 at inference (a sketch of this schedule follows the comment below)
- Denoising MLP (small MLP): 3 blocks of 1024 channels; each block contains LayerNorm, a linear layer, and a SiLU activation, joined by a residual connection. In the implementation, AdaLN injects the transformer output z into the LayerNorm layer (see the sketch right after this hunk)
- Tokenizer: uses the public tokenizers released with LDM, namely VQ-16 and KL-16. VQ-16 is a VQ-GAN-based quantization model trained with a GAN loss and a perceptual loss; KL-16 is regularized via KL divergence and does not rely on VQ
- Transformer: a ViT receives the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels
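The Denoising MLP bullet above is concrete enough to sketch. Below is a minimal PyTorch sketch of one plausible reading, not the PR's or the paper's actual code: the op order inside a block and the exact way z modulates the LayerNorm are assumptions, and the timestep embedding plus the input/output projections of the real model are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaLNBlock(nn.Module):
    """One residual block: LayerNorm -> linear -> SiLU, with the
    transformer output z injected as a per-channel scale/shift on
    the normalized activations (AdaLN)."""
    def __init__(self, width: int = 1024):
        super().__init__()
        self.norm = nn.LayerNorm(width, elementwise_affine=False)
        self.linear = nn.Linear(width, width)
        self.ada = nn.Linear(width, 2 * width)  # predicts scale and shift from z

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        scale, shift = self.ada(z).chunk(2, dim=-1)
        h = self.norm(x) * (1 + scale) + shift   # AdaLN modulation
        return x + F.silu(self.linear(h))        # residual connection

class DenoisingMLP(nn.Module):
    """Small denoising MLP: 3 blocks of 1024 channels, conditioned on z."""
    def __init__(self, width: int = 1024, depth: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList([AdaLNBlock(width) for _ in range(depth)])

    def forward(self, x_noisy: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x_noisy = block(x_noisy, z)
        return x_noisy
```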
Copilot AI · May 17, 2025
Typo in 'sequene'; it should be 'sequence'.
- Transformer: a ViT receives the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels
- Transformer: a ViT receives the token sequence produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels
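On the Diffusion Loss bullet in the hunk above: the summary doesn't pin down which cosine-shaped noise schedule is meant; the usual choice is the one from improved DDPM (Nichol & Dhariwal, 2021). A minimal sketch under that assumption:

```python
import math

def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level alpha_bar at normalized time t in [0, 1]
    for the improved-DDPM cosine schedule."""
    return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

def make_betas(num_steps: int, max_beta: float = 0.999) -> list:
    """Per-step variances beta_t derived from alpha_bar; num_steps would
    be 1000 for training and 100 for inference in the setup above."""
    betas = []
    for i in range(num_steps):
        a_now = cosine_alpha_bar(i / num_steps)
        a_next = cosine_alpha_bar((i + 1) / num_steps)
        betas.append(min(1.0 - a_next / a_now, max_beta))
    return betas
```

In practice the 100 inference steps are usually obtained by respacing (subsampling) the 1000-step training schedule rather than by deriving a separate 100-step one.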
- Masked autoregressive models: training uses a masking ratio sampled from [0.7, 1.0], where 0.7 means 70% of the tokens are randomly masked out; to keep the sampled sequence from being too short, 64 [cls] tokens are always padded in. At inference the masking ratio is gradually lowered from 1.0 to 0 on a cosine schedule over the decoding steps, 64 by default (see the sketch after this hunk)
- Baseline Autoregressive Model: a GPT model with casual attention; a [cls] token is appended to the input, and both a kv cache and a temperature parameter are used
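The cosine masking schedule in the first bullet above is easy to make concrete. A small sketch; seq_len = 256 (a 16×16 token grid) is an illustrative value, not something stated in the summary:

```python
import math

def num_unmasked(step: int, num_steps: int = 64, seq_len: int = 256) -> int:
    """Total tokens that should be revealed after decoding iteration
    `step`: the masking ratio falls from 1.0 to 0.0 on a cosine curve."""
    mask_ratio = math.cos(math.pi / 2 * (step + 1) / num_steps)
    return round(seq_len * (1.0 - mask_ratio))

# Per-iteration quota: the model predicts every still-masked position
# but keeps only this many new tokens (few early, many late). A real
# implementation would clamp each quota to at least 1 token.
quotas = [num_unmasked(s) - num_unmasked(s - 1) for s in range(1, 64)]
quotas.insert(0, num_unmasked(0))
assert sum(quotas) == 256
```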
Copilot AI · May 17, 2025
Typo in 'casual attention'; it should be 'causal attention'.
- Baseline Autoregressive Model: a GPT model with casual attention; a [cls] token is appended to the input, and both a kv cache and a temperature parameter are used
- Baseline Autoregressive Model: a GPT model with causal attention; a [cls] token is appended to the input, and both a kv cache and a temperature parameter are used
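Since the fix is "causal", not "casual": causal attention simply forbids each position from attending to later positions, which is also what makes a kv cache valid, because earlier keys and values never change across decoding steps. A self-contained sketch of the mask, independent of the PR's code:

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Boolean attention mask where True marks forbidden entries:
    query position i may only attend to key positions j <= i."""
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

print(causal_mask(4))
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```

Temperature then controls the randomness of each decoding step; with continuous tokens under a Diffusion Loss it acts inside the diffusion sampler rather than dividing softmax logits as in a discrete GPT.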
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new markdown document detailing a paper on autoregressive image generation without using vector quantization.
- Adds a new markdown file with paper details, experimental setups, and comparison figures.
- Provides background, methodology, and implementation details for the proposed approach.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>