[A] Add 24.06 MAR paper #19
Conversation
Pull Request Overview
Adds a full summary of the new paper “Autoregressive Image Generation without Vector Quantization”, covering background, methodology, implementation details, experiments, and references.
- Introduces paper metadata, an author link, and a Chinese translation
- Details the vector quantization background and the proposed diffusion-based autoregressive method
- Summarizes experiments on loss functions, tokenizers, MLP ablations, and system comparisons
Comments suppressed due to low confidence (1)
papers/image-generation/2406-mar/index.md:35
- [nitpick] List indentation is inconsistent here and in subsequent bullet points. Use uniform indent levels for nested lists to improve readability.
+- Take [VQ-VAE, 2017] as an example
- Diffusion Loss: cosine-shaped noise schedule; DDPM uses 1000 steps during training but only 100 at inference (a sketch of this schedule follows the comment below)
- Denoising MLP (small MLP): 3 blocks of 1024 channels; each block contains LayerNorm, a linear layer, and a SiLU activation, joined by a residual connection. In the implementation, AdaLN injects the transformer output z into the LayerNorm layer (see the sketch right after this hunk)
- Tokenizer: uses the public tokenizers released with LDM, namely VQ-16 and KL-16. VQ-16 is a VQ-GAN-based quantization model trained with a GAN loss and a perceptual loss; KL-16 is regularized via KL divergence and does not rely on VQ
- Transformer: a ViT receives the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels
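The Denoising MLP bullet above is concrete enough to sketch. Below is a minimal PyTorch sketch of one plausible reading, not the PR's or the paper's actual code: the op order inside a block and the exact way z modulates the LayerNorm are assumptions, and the timestep embedding plus the input/output projections of the real model are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaLNBlock(nn.Module):
    """One residual block: LayerNorm -> linear -> SiLU, with the
    transformer output z injected as a per-channel scale/shift on
    the normalized activations (AdaLN)."""
    def __init__(self, width: int = 1024):
        super().__init__()
        self.norm = nn.LayerNorm(width, elementwise_affine=False)
        self.linear = nn.Linear(width, width)
        self.ada = nn.Linear(width, 2 * width)  # predicts scale and shift from z

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        scale, shift = self.ada(z).chunk(2, dim=-1)
        h = self.norm(x) * (1 + scale) + shift   # AdaLN modulation
        return x + F.silu(self.linear(h))        # residual connection

class DenoisingMLP(nn.Module):
    """Small denoising MLP: 3 blocks of 1024 channels, conditioned on z."""
    def __init__(self, width: int = 1024, depth: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList([AdaLNBlock(width) for _ in range(depth)])

    def forward(self, x_noisy: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x_noisy = block(x_noisy, z)
        return x_noisy
```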
Copilot AI · May 17, 2025
Typo in 'sequene'; it should be 'sequence'.
- Transformer: a ViT receives the token sequene produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels
- Transformer: a ViT receives the token sequence produced by the tokenizer, adds positional encodings and a class token [CLS], then passes it through 32 transformer blocks of 1024 channels
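On the Diffusion Loss bullet in the hunk above: the summary doesn't pin down which cosine-shaped noise schedule is meant; the usual choice is the one from improved DDPM (Nichol & Dhariwal, 2021). A minimal sketch under that assumption:

```python
import math

def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level alpha_bar at normalized time t in [0, 1]
    for the improved-DDPM cosine schedule."""
    return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

def make_betas(num_steps: int, max_beta: float = 0.999) -> list:
    """Per-step variances beta_t derived from alpha_bar; num_steps would
    be 1000 for training and 100 for inference in the setup above."""
    betas = []
    for i in range(num_steps):
        a_now = cosine_alpha_bar(i / num_steps)
        a_next = cosine_alpha_bar((i + 1) / num_steps)
        betas.append(min(1.0 - a_next / a_now, max_beta))
    return betas
```

In practice the 100 inference steps are usually obtained by respacing (subsampling) the 1000-step training schedule rather than by deriving a separate 100-step one.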
- Masked autoregressive models: training uses a masking ratio sampled from [0.7, 1.0], where 0.7 means 70% of the tokens are randomly masked out; to keep the sampled sequence from being too short, 64 [cls] tokens are always padded in. At inference the masking ratio is gradually lowered from 1.0 to 0 on a cosine schedule over the decoding steps, 64 by default (see the sketch after this hunk)
- Baseline Autoregressive Model: a GPT model with casual attention; a [cls] token is appended to the input, and both a kv cache and a temperature parameter are used
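The cosine masking schedule in the first bullet above is easy to make concrete. A small sketch; seq_len = 256 (a 16×16 token grid) is an illustrative value, not something stated in the summary:

```python
import math

def num_unmasked(step: int, num_steps: int = 64, seq_len: int = 256) -> int:
    """Total tokens that should be revealed after decoding iteration
    `step`: the masking ratio falls from 1.0 to 0.0 on a cosine curve."""
    mask_ratio = math.cos(math.pi / 2 * (step + 1) / num_steps)
    return round(seq_len * (1.0 - mask_ratio))

# Per-iteration quota: the model predicts every still-masked position
# but keeps only this many new tokens (few early, many late). A real
# implementation would clamp each quota to at least 1 token.
quotas = [num_unmasked(s) - num_unmasked(s - 1) for s in range(1, 64)]
quotas.insert(0, num_unmasked(0))
assert sum(quotas) == 256
```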
Copilot AI · May 17, 2025
Typo in 'casual attention'; it should be 'causal attention'.
- Baseline Autoregressive Model: a GPT model with casual attention; a [cls] token is appended to the input, and both a kv cache and a temperature parameter are used
- Baseline Autoregressive Model: a GPT model with causal attention; a [cls] token is appended to the input, and both a kv cache and a temperature parameter are used
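Since the fix is "causal", not "casual": causal attention simply forbids each position from attending to later positions, which is also what makes a kv cache valid, because earlier keys and values never change across decoding steps. A self-contained sketch of the mask, independent of the PR's code:

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Boolean attention mask where True marks forbidden entries:
    query position i may only attend to key positions j <= i."""
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

print(causal_mask(4))
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
```

Temperature then controls the randomness of each decoding step; with continuous tokens under a Diffusion Loss it acts inside the diffusion sampler rather than dividing softmax logits as in a discrete GPT.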
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new markdown document detailing a paper on autoregressive image generation without using vector quantization.
- Adds a new markdown file with paper details, experimental setups, and comparison figures.
- Provides background, methodology, and implementation details for the proposed approach.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>