How Many Van Goghs Does It Take to Van Gogh?

Finding the Imitation Threshold

1University of Washington, 2Bar-Ilan University, 3University of California, Irvine, 4Allen Institute for AI

Abstract

Text-to-image models are trained using large datasets collected by scraping image-text pairs from the internet. These datasets often include private, copyrighted, and licensed material. Training models on such datasets enables them to generate images with such content, which might violate copyright laws and individuals' privacy. This phenomenon is termed imitation: the generation of images with recognizable similarity to training images. In this work we study the relationship between a concept's frequency in a dataset and the ability of a model to imitate it. We seek to determine the point at which a model was trained on enough instances to imitate a concept, which we call the imitation threshold. We pose this question as a new problem, Finding the Imitation Threshold (FIT), and propose an efficient approach that estimates the imitation threshold without incurring the colossal cost of training multiple models from scratch. We experiment with two domains, human faces and art styles, for which we create three datasets, and evaluate three text-to-image models that were trained on two pre-training datasets. Our results reveal that the imitation threshold of these models is in the range of 200-600 images, depending on the domain and the model. The imitation threshold can provide an empirical basis for copyright violation claims and can act as a guiding principle for providers of text-to-image models that aim to comply with copyright and privacy laws.

MIMETIC2 Overview

An overview of FIT, where we seek the imitation threshold -- the point at which a model was exposed to enough instances of a concept that it can reliably imitate it. The figure shows four concepts (e.g., Van Gogh's art style) that have different frequencies in the training data (213K for Van Gogh). As the frequency of a concept's images increases, the ability of the text-to-image model to imitate it increases (e.g. Piet Mondrian and Van Gogh). We propose an efficient approach, MIMETIC2, that estimates the imitation threshold without training models from scratch.
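To make the idea of a threshold concrete, here is a minimal sketch of one way to locate it once each concept has a frequency and an imitation score. This page does not say how the threshold is located, so the single change-point search below is an assumption for illustration, not necessarily the procedure used in MIMETIC2. The function name `estimate_threshold` and the toy numbers are hypothetical.

```python
from statistics import fmean


def estimate_threshold(frequencies, scores):
    """Return the frequency where the imitation score shifts most sharply.

    Assumption: a single step change. Concepts are sorted by frequency, every
    split point is tried, and the split that minimizes the summed squared
    error of a two-level step function is kept.
    """
    if len(frequencies) != len(scores) or len(scores) < 2:
        raise ValueError("need at least two (frequency, score) pairs")

    pairs = sorted(zip(frequencies, scores))
    ys = [score for _, score in pairs]

    def sse(values):
        # Sum of squared deviations from the segment mean.
        mean = fmean(values)
        return sum((v - mean) ** 2 for v in values)

    best_split, best_cost = 1, float("inf")
    for k in range(1, len(ys)):
        cost = sse(ys[:k]) + sse(ys[k:])
        if cost < best_cost:
            best_split, best_cost = k, cost

    # First frequency on the high-score side of the best split.
    return pairs[best_split][0]


# Synthetic toy values, not results from the paper.
toy_frequencies = [10, 50, 120, 300, 450, 900, 2000]
toy_scores = [0.05, 0.06, 0.04, 0.07, 0.31, 0.35, 0.33]
print(estimate_threshold(toy_frequencies, toy_scores))
```

On real data the score curve is noisy, so a more robust change-detection method would likely be preferable to this brute-force split.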

MIMETIC2 Architecture

An overview of MIMETIC2. There are two main steps: (1) Estimating Concept Frequency: we estimate the frequency of a concept in the training data by using a set of gold-standard images of that concept and counting the number of training images that match those gold-standard images; and (2) Computing Imitation Score: we compute the imitation score of a concept by embedding the training images of that concept (found in Step 1) and the generated images of that concept, and computing the cosine similarity between the two sets of embeddings.
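Step 2 can be sketched as follows. The description above only says that cosine similarity is computed between the two sets of embeddings; averaging over all training/generated pairs is our assumption for this sketch, and the embedding model is left out entirely (embeddings are passed in as plain lists of floats). The names `cosine` and `imitation_score` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def imitation_score(train_embeddings, generated_embeddings):
    """Mean pairwise cosine similarity between the two embedding sets.

    Assumption: the aggregation is a plain mean over all pairs.
    """
    sims = [
        cosine(t, g)
        for g in generated_embeddings
        for t in train_embeddings
    ]
    if not sims:
        raise ValueError("both embedding sets must be non-empty")
    return sum(sims) / len(sims)


# Toy 2-D embeddings: one generated image matches the training image
# exactly, the other is orthogonal to it.
print(imitation_score([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

A higher score means the generated images sit closer to the concept's training images in embedding space.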

Interact with MIMETIC2

[Interactive slider over image count, from 0 to 362K]

Model: Stable Diffusion 1.1
Prompt: "A photorealistic close-up photograph of {celebrity}"
Image Count: We count the number of images of a celebrity by applying a celebrity-specific detector to the images whose captions mention that celebrity.
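The counting procedure above can be sketched as a two-stage filter: first keep images whose caption mentions the celebrity, then keep those the detector accepts. This is a minimal sketch under assumptions: `count_concept_images` is a hypothetical name, the caption match is a simple case-insensitive substring test, and `detector` is a placeholder callable standing in for the celebrity-specific detector, whose actual implementation is not described here.

```python
def count_concept_images(records, name, detector):
    """Count images whose caption mentions `name` and that `detector` accepts.

    records:  iterable of (caption, image) pairs
    name:     celebrity name to look for in captions
    detector: hypothetical callable, image -> bool (True if the image shows
              the celebrity)
    """
    needle = name.lower()
    return sum(
        1
        for caption, image in records
        # The cheap caption filter runs first, so the detector only sees
        # candidate images.
        if needle in caption.lower() and detector(image)
    )


# Toy data: images are stand-in strings and the detector is a stub.
toy_records = [
    ("Portrait of Jane Doe", "img_a"),
    ("jane doe at a premiere", "img_b"),
    ("A sunset over the bay", "img_c"),
    ("Jane Doe fan art", "img_d"),
]
stub_detector = lambda image: image in {"img_a", "img_b", "img_c"}
print(count_concept_images(toy_records, "Jane Doe", stub_detector))
```

Running the caption filter before the detector keeps the expensive per-image check off the bulk of the dataset, which matters at the scale of web-scraped training sets.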

Cite Us

@misc{verma-diffusion-model-imitation-threshold,
      title={How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold}, 
      author={Sahil Verma and Royi Rassin and Arnav Das and Gantavya Bhatt and Preethi Seshadri and Chirag Shah and Jeff Bilmes and Hannaneh Hajishirzi and Yanai Elazar},
      year={2024},
      eprint={2410.15002},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.15002}
}