Inversion-Free Image Editing
with Natural Language

Sihan Xu¹* Yidong Huang¹* Jiayi Pan² Ziqiao Ma¹∞ Joyce Chai¹

¹University of Michigan ²University of California, Berkeley

* Equal contribution ∞ Correspondence

CVPR 2024

We present an inversion-free editing (InfEdit) method that allows for consistent editing at both the semantic and spatial levels, handling intricate modifications without compromising the image's integrity and without explicit inversion. In extensive experiments, InfEdit shows strong performance on complex editing tasks while maintaining a seamless workflow (less than 3 seconds on one A40 GPU), demonstrating its potential for real-time applications.

A painting of a waterfall [+and angels] in the mountains

A woman in a coat [+and dress] is dancing

[+Oil painting of] a lake with mountains in the background

A woman in a ~~white~~ red dress sitting on a chair with flowers

A man in a white shirt standing in front of ~~trees~~ mountains

A light brown bear ~~sitting~~ standing on the ground

~~Muffin~~ Chihuahua

A football with ~~OSU~~ UMich logo

A ~~blue droplet~~ red fire emoji with a ~~smiling~~ angry face with yellow dot

Experiments

InfEdit in various complex image editing tasks:

Gallery

Comparison

Comparison with inversion-based methods:

Performance in image editing: DDCM matches or exceeds other algorithms, with LCM and UAC bringing further improvement. Notably, it runs about an order of magnitude faster.

Qualitative examples: InfEdit vs prior methods. InfEdit attains editing goals with the best consistency with source images.

Comparison with existing methods:

Qualitative examples: InfEdit vs prior methods. InfEdit attains editing goals with the best consistency with source images.

More Results

Method

We aim to eliminate the inversion process by introducing the Denoising Diffusion Consistent Model (DDCM), a sampling strategy that enables virtual inversion. DDCM leverages a diffusion process that significantly enhances consistency throughout the image generation phases, ensuring fidelity and speed in transforming and refining visual content.
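
One way to picture virtual inversion is through the standard forward-diffusion identity z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps. Because the source latent z_0 is already known, a noisy latent can be built from freshly sampled noise, with no inversion pass, and the noise implied by that pair maps it back to z_0 exactly. The sketch below is a toy illustration under that assumption, not the authors' implementation; the function names, the latent shape, and the noise level `alpha_bar` are all hypothetical.

```python
import numpy as np

def add_noise(z0, eps, alpha_bar):
    """Forward-diffusion identity: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps

def consistent_noise(z_t, z0, alpha_bar):
    """Noise that is exactly consistent with the pair (z_t, z0)."""
    return (z_t - np.sqrt(alpha_bar) * z0) / np.sqrt(1.0 - alpha_bar)

def predict_z0(z_t, eps, alpha_bar):
    """Clean latent implied by a noisy latent and a noise estimate."""
    return (z_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)

rng = np.random.default_rng(0)
z0_src = rng.standard_normal((4, 8, 8))  # hypothetical source latent
alpha_bar = 0.3                          # hypothetical cumulative noise level

# No inversion pass: the noisy latent is built from z0_src and sampled noise.
eps = rng.standard_normal(z0_src.shape)
z_t_src = add_noise(z0_src, eps, alpha_bar)

# The consistent noise equals the sampled noise, so z0_src is recovered
# up to floating-point error.
eps_cons = consistent_noise(z_t_src, z0_src, alpha_bar)
z0_rec = predict_z0(z_t_src, eps_cons, alpha_bar)
reconstruction_error = float(np.max(np.abs(z0_rec - z0_src)))
```

In this toy setting the source branch reconstructs itself by construction, which is the property an inversion step would otherwise have to approximate.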

We also present Unified Attention Control (UAC), which integrates cross-attention and self-attention control within a unified framework for tuning-free image editing through natural language.
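
For readers unfamiliar with attention control, the following is a generic sketch of the underlying idea: an attention map computed in a source branch is injected into a target branch so that the edited output keeps the source layout. It is not the UAC formulation itself; the `override_map` parameter, the tensor sizes, and the random inputs are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, override_map=None):
    """Scaled dot-product attention with an optional injected attention map."""
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    if override_map is not None:
        attn = override_map  # reuse a map computed elsewhere (e.g. source branch)
    return attn @ v, attn

rng = np.random.default_rng(0)
n_pix, n_tok, d = 16, 5, 8  # hypothetical sizes: pixels, prompt tokens, channels

# Hypothetical source-branch and target-branch projections.
q_src, k_src, v_src = (rng.standard_normal(s) for s in [(n_pix, d), (n_tok, d), (n_tok, d)])
q_tgt, k_tgt, v_tgt = (rng.standard_normal(s) for s in [(n_pix, d), (n_tok, d), (n_tok, d)])

# Source branch: compute the attention map normally.
_, attn_src = attention(q_src, k_src, v_src)

# Target branch: keep the source map (layout) but use the target values (content).
out_tgt, attn_used = attention(q_tgt, k_tgt, v_tgt, override_map=attn_src)
```

The same injection pattern applies whether the keys come from prompt tokens (cross-attention) or from the image features themselves (self-attention); only the source of `k` and `v` changes.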

Detail

BibTeX

@inproceedings{xu2023infedit,
  title={Inversion-Free Image Editing with Natural Language}, 
  author={Sihan Xu and Yidong Huang and Jiayi Pan and Ziqiao Ma and Joyce Chai},
  booktitle={Conference on Computer Vision and Pattern Recognition 2024},
  year={2024}
}