
M3Face Model Card

We introduce M3Face, a unified multi-modal multilingual framework for controllable face generation and editing. Given only text input, the framework first generates the controlling modalities automatically, such as semantic segmentation masks or facial landmarks, and then generates the face image from them.

Getting Started

Installation

  1. Clone our repository:

    git clone https://huggingface.co/m3face/m3face
    cd m3face
    
  2. Install dependencies:

     pip install -r requirements.txt
    

Resources

  • Face generation requires 10 GB+ of VRAM for 512x512 images.
  • Face editing requires 14 GB+ of VRAM for 512x512 images (you can verify your GPU's memory with the check below).
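
A quick way to check your GPU's total memory with PyTorch (a minimal sketch; the device index 0 is an assumption):

import torch

# Print the name and total memory of the first CUDA device (index 0 is an assumption).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA device found.")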

Pre-trained Models

You can find the checkpoints for the ControlNet model at m3face/FaceControlNet and the mask/landmark generator model at m3face/FaceConditioning.
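
If you want to use the ControlNet checkpoint outside of our scripts, and assuming it is stored in the standard diffusers format, you could load it along these lines (a minimal sketch; the base Stable Diffusion model and the exact repository layout are assumptions, and generate.py handles all of this for you):

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the face ControlNet weights (assumes a standard diffusers layout).
controlnet = ControlNetModel.from_pretrained(
    "m3face/FaceControlNet", torch_dtype=torch.float16
)

# Attach it to a Stable Diffusion base model (the base checkpoint is an assumption).
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")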

M3CelebA Dataset

The M3CelebA Dataset is available at m3face/M3CelebA. You can view or download it from there.
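
For example, you can load it with the datasets library (a minimal sketch; the split name is an assumption, so check the dataset card for the available splits and configurations):

from datasets import load_dataset

# Load the training split (split name is an assumption; see the dataset card).
dataset = load_dataset("m3face/M3CelebA", split="train")
print(dataset[0])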

Face Generation

You can generate faces from text, a segmentation mask, facial landmarks, or any combination of them by running the following command:

python generate.py --seed 1111 \
                   --condition "landmark" \
                   --prompt "This attractive woman has narrow eyes, rosy cheeks, and wears heavy makeup." \
                   --save_condition   

You can define the type of conditioning modality with --condition. By default, the framework generates a conditioning modality for you; it is saved if the --save_condition flag is given. Alternatively, you can supply your own condition image with the --condition_path argument.
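
For example, to condition generation on your own segmentation mask instead of a generated one (the "mask" keyword and the file path are assumptions; check generate.py for the exact accepted values):

python generate.py --seed 1111 \
                   --condition "mask" \
                   --condition_path "/path/to/mask.png" \
                   --prompt "This attractive woman has narrow eyes, rosy cheeks, and wears heavy makeup."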

Face Editing

For face editing, you can run the following command:

python edit.py --enable_xformers_memory_efficient_attention \
               --seed 1111 \
               --condition "landmark" \
               --prompt "She is a smiling." \
               --image_path "/path/to/image" \
               --condition_path "/path/to/condition" \
               --edit_condition \
               --embedding_optimize_it 500 \
               --model_finetune_it 1000 \
               --alpha 0.7 1 1.1 \
               --num_inference_steps 30 \
               --unet_layer "2and3"

You need to specify the input image and the original conditioning modality. You can edit the face either with a pre-made edited conditioning modality (via --edit_condition_path) or by letting our framework edit the original conditioning modality (via --edit_condition). The --unet_layer argument specifies which UNet layers of the Stable Diffusion model to fine-tune.
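
For example, to edit with a pre-made edited conditioning modality (all paths are placeholders, and leaving the remaining hyperparameters at their defaults is an assumption):

python edit.py --seed 1111 \
               --condition "landmark" \
               --prompt "She is smiling." \
               --image_path "/path/to/image" \
               --condition_path "/path/to/condition" \
               --edit_condition_path "/path/to/edited_condition" \
               --num_inference_steps 30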

Note: If you don't have the original conditioning modality, you can generate it using the plot_mask.py and plot_landmark.py scripts:

pip install git+https://github.com/mapillary/inplace_abn
python utils/plot_mask.py --image_path "/path/to/image"
python utils/plot_landmark.py --image_path "/path/to/image"

Training

The code and instructions for training our models will be posted soon!
