Generative AI and Deepfake
Generative AI is the umbrella term for systems that create new content, meaning images, videos, text or audio, rather than just sorting or classifying what already exists. Such systems learn patterns from large amounts of example data and recombine them into new outputs. Well-known applications include image generators and language models.
A deepfake is one specific application of this: an image, video or audio recording that has been manipulated or fully generated with AI to make a real person appear to do or say something they never did. The term combines deep learning and fake. Not every piece of AI content is a deepfake, but every deepfake is AI-driven.
How AI Creates Images: Diffusion Model and GAN
A diffusion model is the technique behind most of today's image generators. In simple terms, the model is trained to remove noise from images. To generate a new image, it starts from pure random noise and removes that noise step by step until a coherent image emerges, guided by your text input. These models often deliver highly photorealistic results and are the current standard.
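The noising and denoising idea can be sketched in a few lines. This is a minimal illustration, not a working image generator: the blending formula and the linear noise schedule follow the commonly used DDPM formulation, which is an assumption and not something this text specifies. The 4-pixel "image" and all function names are hypothetical, and the true noise stands in for the prediction that a trained network would normally make.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars = []
    product = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Forward step: blend the clean pixels with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
             for x, e in zip(pixels, noise)]
    return noisy, noise

def estimate_clean(noisy, predicted_noise, alpha_bar):
    """Invert the blend, given a prediction of the noise that was added."""
    return [(y - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for y, e in zip(noisy, predicted_noise)]

rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
image = [0.1, 0.5, 0.9, 0.3]  # hypothetical 4-pixel "image"
noisy, true_noise = add_noise(image, alpha_bars[500], rng)

# A trained network would predict the noise; here the true noise stands in for it.
recovered = estimate_clean(noisy, true_noise, alpha_bars[500])
```

The later the step, the smaller `alpha_bar` becomes and the less of the original image survives in the noisy version. A real generator repeats the denoising over many steps because its noise prediction is only approximate.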
A GAN, or generative adversarial network, is an older approach in which two neural networks work against each other: one creates images, the other tries to tell fakes apart from real images. This contest makes the results ever more convincing. GANs were used for face generation for a long time and played a part in many early deepfakes.
What You Put In and What Comes Out: Prompt, Text-to-Image, Text-to-Video
A prompt is the input you use to tell an AI model what to create. It is usually text, but it can also be an image or a combination of both. How precise and detailed the prompt is strongly influences the result.
Text-to-image refers to generating an image from a text description, and text-to-video accordingly means generating a video clip. Text-to-video is technically far more demanding, because many individual frames have to stay consistent over time. This is exactly where errors often still show up, such as flickering details or objects that jump around.
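One crude way to put a number on flickering is to measure how much pixels change from one frame to the next. The sketch below is only a heuristic under simplifying assumptions: frames are flat lists of brightness values, the function name `flicker_score` is hypothetical, and genuine motion in a scene raises the score just as flicker does.

```python
def flicker_score(frames):
    """Mean absolute pixel change between consecutive frames."""
    if len(frames) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for a, b in zip(prev, curr):
            total += abs(a - b)
            count += 1
    return total / count

# Hypothetical 3-pixel frames: one pixel barely changes, or jumps back and forth.
stable = [[10, 10, 10], [10, 11, 10], [10, 10, 10]]
flickering = [[10, 10, 10], [10, 90, 10], [10, 10, 10]]

stable_score = flicker_score(stable)
flickering_score = flicker_score(flickering)
```

A high score on its own says little. It becomes informative mainly when a region that should be static, such as a wall or a logo, keeps changing between frames.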
Targeted Manipulation: Face-Swap, Inpainting and Outpainting
In a face-swap, one face in an image or video is replaced with another. This is a common technique in deepfakes of celebrities and private individuals, and it can be abused for fraud, bullying or disinformation.
Inpainting means that a selected part of an image is refilled or altered by AI, for example to remove or insert an object. Outpainting extends an image beyond its original edges by having the AI add matching image areas. Both methods make it possible to manipulate only part of a real photo, which makes detection harder.
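The reason partial manipulation is hard to spot can be shown with a mask. In this minimal sketch, generated pixels are used only where the mask is set and every other pixel stays exactly as in the original. The tiny 3x3 grids and the function name `composite` are hypothetical, and the `generated` grid merely stands in for the output of a real inpainting model.

```python
def composite(original, generated, mask):
    """Take generated pixels where mask is 1, original pixels elsewhere."""
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]

original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
generated = [[0, 0, 0], [0, 50, 0], [0, 0, 0]]  # stand-in for model output
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]        # only the centre pixel is selected

result = composite(original, generated, mask)

# Count how many pixels differ from the original photo.
changed = sum(
    r != o
    for result_row, original_row in zip(result, original)
    for r, o in zip(result_row, original_row)
)
```

Only one of the nine pixels differs from the original, so any check that looks at the image as a whole sees mostly authentic material.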
Limits of the Technology: Hallucination, Cheapfake and Shallowfake
Hallucination describes the case where an AI system produces content that looks plausible but is factually wrong or entirely made up. In images this often shows up as impossible details like deformed hands, illegible text or objects that make no physical sense. Such oddities can point to AI generation, but they are not solid proof.
A cheapfake or shallowfake needs no elaborate AI at all. Simple means are enough here: a video is slowed down or sped up, taken out of context, mislabeled or crudely cut. Such fakes are technically simple, yet they spread fast and often do just as much damage as real deepfakes.
Origin and Labeling: C2PA, SynthID, Watermark, Provenance
Provenance, or proof of origin, means that there is a traceable record of where an image comes from and how it was created or edited. C2PA (Coalition for Content Provenance and Authenticity) is an open industry standard for this: it stores so-called Content Credentials, meaning cryptographically signed information about the origin, directly in the file. If the file is manipulated afterwards, this signature can become invalid, which exposes the change.
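Why a signature breaks when the file changes can be illustrated with a small sketch. This is not the C2PA format: the real standard uses certificate-based signatures and its own manifest structure, while this example swaps in a shared-secret HMAC from Python's standard library for simplicity. The key, the claim fields and the function names are all hypothetical.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-signing-key"  # hypothetical; real systems use certificate-based keys

def make_credential(image_bytes, claims):
    """Bind claims to the image by signing them together with the image hash."""
    payload = dict(claims, content_sha256=hashlib.sha256(image_bytes).hexdigest())
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}

def verify_credential(image_bytes, credential):
    """Check that the signature is intact and that the image hash still matches."""
    message = json.dumps(credential["payload"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, credential["signature"])
    current_hash = hashlib.sha256(image_bytes).hexdigest()
    hash_ok = credential["payload"]["content_sha256"] == current_hash
    return signature_ok and hash_ok

image = b"original image bytes"  # stand-in for real file content
credential = make_credential(image, {"tool": "ExampleCam", "edited": False})

valid_for_original = verify_credential(image, credential)
valid_for_tampered = verify_credential(b"edited image bytes", credential)
```

Verification succeeds for the untouched bytes and fails as soon as a single byte of the image or of the signed claims changes. It says nothing about an image that carries no credential at all.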
SynthID is a digital watermark developed by Google that embeds an invisible pattern into AI-generated content, which special software can read back out. Such watermarks are meant to mark the origin without visibly changing the image. Note that origin data and watermarks can be missing, can be removed, or can get lost, for example when a screenshot is taken. Their presence is an indication of origin, but their absence is no proof of authenticity.
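The idea of an invisible pattern, and how easily a fragile one is lost, can be shown with a toy example. SynthID's actual method is different and is designed to be more robust. Least-significant-bit embedding is used here only because it is simple, and the pixel values, the bit pattern and the function names are hypothetical.

```python
def embed_bits(pixels, bits):
    """Write each bit into the least significant bit of one pixel value."""
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked

def extract_bits(pixels, count):
    """Read the least significant bit of the first `count` pixel values."""
    return [p & 1 for p in pixels[:count]]

def requantize(pixels, step=4):
    """Crude stand-in for lossy re-encoding: round values down to a coarser grid."""
    return [(p // step) * step for p in pixels]

pixels = [200, 13, 77, 154, 91, 32, 250, 5]  # hypothetical 8-bit brightness values
pattern = [1, 0, 1, 1, 0, 0, 1, 0]           # hypothetical watermark bits

marked = embed_bits(pixels, pattern)
largest_change = max(abs(m - p) for m, p in zip(marked, pixels))

read_back = extract_bits(marked, len(pattern))
read_after_reencoding = extract_bits(requantize(marked), len(pattern))
```

Each pixel changes by at most one brightness level, which is invisible to the eye, yet the pattern reads back intact from the marked pixels. After the coarse re-encoding the pattern is gone, which is the kind of loss the paragraph above warns about.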
Traces in the Material: Image Metadata, EXIF, Media Forensics and Upscaling
Image metadata is additional information that can be stored in an image file, such as the time of capture, camera model or location. EXIF is the most widespread format for this in photos. This data can be read out, but it can also be removed or faked easily, and many platforms delete it automatically on upload.
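Whether a JPEG file carries EXIF data at all can be checked by walking its segment headers. The following is a minimal sketch using only the standard library. It looks for an APP1 segment that begins with the `Exif` identifier and does not decode any of the fields inside it. The byte strings are synthetic stand-ins rather than real photos, and the helper names are hypothetical.

```python
import struct

def has_exif_segment(jpeg_bytes):
    """Return True if a JPEG byte string contains an APP1 segment tagged 'Exif'."""
    if jpeg_bytes[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return False  # not at a marker; give up rather than guess
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            return False
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False

def segment(marker, payload):
    """Build one JPEG segment: marker, 2-byte length (including itself), payload."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload

# Synthetic examples: one file with an EXIF segment, one with only a JFIF header.
with_exif = b"\xff\xd8" + segment(0xE1, b"Exif\x00\x00fake-tiff-data") + b"\xff\xd9"
stripped = b"\xff\xd8" + segment(0xE0, b"JFIF\x00") + b"\xff\xd9"

found_in_original = has_exif_segment(with_exif)
found_after_stripping = has_exif_segment(stripped)
```

A result of `False` only shows that the metadata is not there. As the paragraph above notes, it may have been removed by a platform on upload, so its absence says nothing about whether the image is authentic.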
Media forensics examines an image or video for technical traces of manipulation or AI generation, for example unusual noise patterns, compression artifacts or inconsistencies in light and shadow. Upscaling, in turn, enlarges or sharpens an image with AI and invents details that were not present in the original. Even a real photo can pick up artificial elements through upscaling, which makes forensic analyses even harder.