
Details about CLIP fine-tuning and zero-shot text-guided editing #30

@JamesLong199

Hi,

Could you kindly provide more details on the setup for fine-tuning the model with CLIP and on the zero-shot text-guided expression editing procedure?

For model fine-tuning with CLIP, my understanding is that:

- the same losses as in emotion adaptation are used, in addition to a CLIP loss (a minimal sketch of what I have in mind follows this list);
- fine-tuning is performed on MEAD, where each training video is paired with a fixed text prompt for its emotion category (see the attached screenshot).
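To make my question concrete, here is a minimal sketch of the CLIP loss term I have in mind. It assumes a frozen CLIP ViT-B/32 from the openai/CLIP package; the prompt wording, `generated_frames`, and the surrounding training loop are my assumptions, not something I have confirmed from the paper:

```python
# Hypothetical CLIP loss term for fine-tuning (sketch only; the
# emotion-adaptation losses come from the existing training loop).
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()  # CLIP stays frozen; only the talking-head model is tuned

EMOTION_PROMPTS = {  # one fixed prompt per MEAD emotion category (assumed wording)
    "angry": "an angry face",
    "happy": "a happy face",
    # ... remaining MEAD categories
}

def clip_loss(generated_frames: torch.Tensor, emotion: str) -> torch.Tensor:
    """Cosine distance between generated frames and the emotion prompt.

    generated_frames: (B, 3, 224, 224), already resized and CLIP-normalized.
    """
    tokens = clip.tokenize([EMOTION_PROMPTS[emotion]]).to(generated_frames.device)
    with torch.no_grad():
        text_feat = clip_model.encode_text(tokens)        # (1, 512)
    img_feat = clip_model.encode_image(generated_frames)  # (B, 512)
    img_feat = F.normalize(img_feat, dim=-1)
    text_feat = F.normalize(text_feat, dim=-1)
    return (1.0 - (img_feat * text_feat).sum(dim=-1)).mean()

# total_loss = emotion_adaptation_losses + lambda_clip * clip_loss(frames, "happy")
```

Is this roughly the form of the loss, and if so, what weight is used for the CLIP term?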

For the zero-shot text-guided expression editing, I was wondering how the CLIP text feature is incorporated into the existing model structure (e.g., via a mapping from the CLIP feature to the latent code z, or to the emotion prompt?).
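For the first of these two options, I am imagining something like the sketch below: a small learned mapper from the CLIP text feature to z. The `TextToLatent` module, its architecture, and `z_dim` are purely my guesses for illustration:

```python
# Hypothetical mapper from a CLIP text feature to the latent code z
# (sketch of one of the two options I am asking about).
import torch
import torch.nn as nn
import clip

class TextToLatent(nn.Module):
    def __init__(self, clip_dim: int = 512, z_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, 256), nn.ReLU(),
            nn.Linear(256, z_dim),
        )

    def forward(self, text_feat: torch.Tensor) -> torch.Tensor:
        return self.mlp(text_feat.float())

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
mapper = TextToLatent().to(device)

tokens = clip.tokenize(["a surprised face"]).to(device)
with torch.no_grad():
    text_feat = clip_model.encode_text(tokens)
z = mapper(text_feat)  # candidate replacement for the emotion latent code
```

If instead the CLIP feature replaces or modulates the emotion prompt, I would appreciate any detail on how that is wired in.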

Thank you in advance for your time and help!

[Screenshot: fixed text prompts for each emotion category]

Originally posted by @JamesLong199 in #23 (comment)
