---
tags:
- text-to-image
- controlnet
---

# M3Face Model Card

We introduce M3Face, a unified multi-modal multilingual framework for controllable face generation and editing. The framework lets users automatically generate controlling modalities (e.g., semantic segmentation masks or facial landmarks) from text alone, and then generate face images from them.

## Getting Started

### Installation

1. Clone our repository:
```bash
git clone https://huggingface.co/m3face/m3face
cd m3face
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```

### Resources

- Face generation requires 10 GB+ of VRAM for 512x512 images.
- Face editing requires 14 GB+ of VRAM for 512x512 images.

### Pre-trained Models

You can find the checkpoints for the ControlNet model at [`m3face/FaceControlNet`](https://huggingface.co/m3face/FaceControlNet) and for the mask/landmark generator model at [`m3face/FaceConditioning`](https://huggingface.co/m3face/FaceConditioning).

### M3CelebA Dataset

The M3CelebA dataset is available at [`m3face/M3CelebA`](https://huggingface.co/m3face/M3CelebA). You can view or download it from there.

## Face Generation

You can generate faces from text, a segmentation mask, facial landmarks, or a combination of them by running the following command:

```bash
python generate.py --seed 1111 \
    --condition "landmark" \
    --prompt "This attractive woman has narrow eyes, rosy cheeks, and wears heavy makeup." \
    --save_condition
```

You can define the type of conditioning modality with `--condition`. By default, a conditioning modality is generated by our framework and is saved if the `--save_condition` flag is given. Alternatively, you can supply your own condition image with the `--condition_path` argument.

## Face Editing

For face editing, run the following command:

```bash
python edit.py --enable_xformers_memory_efficient_attention \
    --seed 1111 \
    --condition "landmark" \
    --prompt "She is smiling." \
    --image_path "/path/to/image" \
    --condition_path "/path/to/condition" \
    --edit_condition \
    --embedding_optimize_it 500 \
    --model_finetune_it 1000 \
    --alpha 0.7 1 1.1 \
    --num_inference_steps 30 \
    --unet_layer "2and3"
```

You need to specify the input image and its original conditioning modality. You can edit the face either with a separate edit conditioning modality (by specifying `--edit_condition_path`) or by letting our framework edit the original conditioning modality (by specifying `--edit_condition`). The `--unet_layer` argument specifies which UNet layers of the Stable Diffusion model to finetune.

> Note: If you don't have the original conditioning modality, you can generate it with the `plot_mask.py` and `plot_landmark.py` scripts:
```bash
pip install git+https://github.com/mapillary/inplace_abn
python utils/plot_mask.py --image_path "/path/to/image"
python utils/plot_landmark.py --image_path "/path/to/image"
```

## Training

The code and instructions for training our models will be posted soon!
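
## Loading the Checkpoints with Diffusers

If you want to experiment with the ControlNet checkpoint outside of `generate.py`, the sketch below shows one way it might be loaded with the `diffusers` library. This is a minimal, untested sketch: it assumes `m3face/FaceControlNet` follows the standard `diffusers` ControlNet format and pairs with a Stable Diffusion 1.5 base; the actual base model and repository layout are defined by our code and checkpoint repos, so check those first.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Assumption: the checkpoint is stored in the standard diffusers
# ControlNet format (it may live in a subfolder instead).
controlnet = ControlNetModel.from_pretrained(
    "m3face/FaceControlNet", torch_dtype=torch.float16
)

# Assumption: SD 1.5 as the base model; see the repo for the real one.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A landmark (or mask) condition image, e.g. one produced by
# utils/plot_landmark.py or saved via --save_condition.
condition = load_image("/path/to/condition.png")

image = pipe(
    "This attractive woman has narrow eyes, rosy cheeks, and wears heavy makeup.",
    image=condition,
    num_inference_steps=30,
).images[0]
image.save("face.png")
```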
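
## Loading the M3CelebA Dataset

Similarly, the dataset can presumably be loaded with the `datasets` library. The split name below is a placeholder; inspect the dataset card for the actual configs, splits, and column schema.

```python
from datasets import load_dataset

# Assumption: a standard Hub dataset layout with a "train" split;
# the real config/split/column names are listed on the dataset card.
dataset = load_dataset("m3face/M3CelebA", split="train")
print(dataset)        # shows the available columns and number of rows
example = dataset[0]  # one record (image plus its annotations)
```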