We present an inversion-free editing (InfEdit) method that allows for consistent editing at both the semantic and spatial levels, handling intricate modifications without compromising the image's integrity and without requiring explicit inversion. In extensive experiments, InfEdit shows strong performance on complex editing tasks while maintaining a seamless workflow (less than 3 seconds on a single A40 GPU), demonstrating its potential for real-time applications.
Example editing prompts ([+text] marks an addition, [a → b] marks a replacement):
A painting of a waterfall [+and angels] in the mountains
A woman in a coat [+and dress] is dancing
[+Oil painting of] a lake with mountains in the background
A woman in a [white → red] dress sitting on a chair with flowers
A man in a white shirt standing in front of [trees → mountains]
A light brown bear [sitting → standing] on the ground
[Muffin → Chihuahua]
A football with [OSU → UMich] logo
A [blue droplet → red fire] emoji with a [smiling → angry] face with yellow dot
Performance in image editing: DDCM matches or exceeds other algorithms, with LCM and UAC bringing further improvement. Notably, it runs about an order of magnitude faster.
Qualitative examples: InfEdit vs prior methods. InfEdit attains editing goals with the best consistency with source images.
We eliminate the inversion process by introducing the Denoising Diffusion Consistent Model (DDCM), a sampling strategy that enables virtual inversion. DDCM leverages a diffusion process that significantly enhances consistency throughout the image generation phases, ensuring fidelity and speed in transforming and refining visual content.
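One way to build intuition for virtual inversion is the following algebraic identity, shown here as a minimal sketch rather than the InfEdit implementation. It assumes a standard DDPM-style forward process z_t = sqrt(ᾱ_t) z_0 + sqrt(1 − ᾱ_t) ε. Under that assumption, if the noise used at a step is derived directly from the known source latent z_0, the one-step estimate of z_0 reproduces the source exactly, so no inversion pass is needed to stay consistent with it. All function names and parameters below are hypothetical and chosen for illustration only.

```python
import math
import random


def add_noise(z0, eps, alpha_bar):
    """Forward diffusion (assumed form): z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + s * e for z, e in zip(z0, eps)]


def consistent_noise(z_t, z0, alpha_bar):
    """Noise computed from the known source z0 instead of predicted by a model."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(zt - a * z) / s for zt, z in zip(z_t, z0)]


def predict_z0(z_t, eps, alpha_bar):
    """Standard one-step estimate of the clean latent from z_t and a noise term."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(zt - s * e) / a for zt, e in zip(z_t, eps)]


def reconstruction_error(seed=0, dim=4, alpha_bar=0.3):
    """Largest absolute difference between z0 and its estimate under consistent noise."""
    rng = random.Random(seed)
    z0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]    # stand-in for a source latent
    eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # arbitrary sampled noise
    z_t = add_noise(z0, eps, alpha_bar)
    eps_cons = consistent_noise(z_t, z0, alpha_bar)
    z0_rec = predict_z0(z_t, eps_cons, alpha_bar)
    return max(abs(x - y) for x, y in zip(z0, z0_rec))
```

Calling `reconstruction_error()` returns a value at floating-point precision for any valid `alpha_bar` in (0, 1), which is the sense in which the source image can be recovered without running an inversion. How an editing method reuses this noise term on the target prompt is beyond what this sketch covers.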
We also present Unified Attention Control (UAC), which integrates cross-attention and self-attention control within a unified framework for tuning-free image editing through natural language.
@inproceedings{xu2023infedit,
title={Inversion-Free Image Editing with Natural Language},
author={Sihan Xu and Yidong Huang and Jiayi Pan and Ziqiao Ma and Joyce Chai},
booktitle={Conference on Computer Vision and Pattern Recognition 2024},
year={2024}
}