The question sounds simple, but the old advice is wearing out fast. A few years ago, people were told to look for bad blinking, mangled hands, or a face that floated oddly on a body. Those clues still catch weak fakes. They do not reliably catch the best ones. Current video models are being built for sharper realism, better motion, and more convincing physical behavior, while standards such as C2PA, Content Credentials, and watermarking systems like SynthID are shifting the debate from instinct to provenance.
The old deepfake checklist is no longer enough
The biggest mistake is to assume there is still one giveaway that settles the matter. There is not. NIST’s current framework on synthetic content is blunt on this point: no single technical approach offers a comprehensive answer, and the value of any method depends on context, implementation, and oversight. That matters because a clip can look polished and still be false, while an authentic clip can look rough, compressed, cropped, or suspicious simply because it passed through social platforms.
Older “spot the fake” tricks are also less dependable than they once were. Microsoft’s recent reporting on AI-enabled deception notes that deepfake techniques can even defeat some liveness checks by simulating behaviors such as blinking and head turns, which tells you how outdated the classic checklist has become. If your entire method is “I looked at the eyes,” you are already behind the tools.
That does not mean visual analysis is useless. It means it belongs lower in the hierarchy. The strongest test is no longer visual polish. It is whether the media carries a believable chain of origin. NIST frames this as digital content transparency: provenance, labeling, detection, and authentication working together rather than pretending one clue will do the job alone.
Start with provenance before you start guessing
If you can access the original file, provenance should be your first stop. C2PA describes Content Credentials as a way to establish the origin and edits of digital content, and Content Credentials itself presents that record as a kind of nutrition label for media, showing how a file was made and how it changed. In practical terms, that is far stronger than squinting at a jawline and trying to feel whether it “looks AI.”
This is no longer abstract. OpenAI says Sora videos include C2PA metadata, and its current Sora 2 system card says downloaded videos also carry a visible moving watermark alongside internal detection tooling. Google says SynthID embeds invisible watermarks in AI-generated image, audio, text, and video content, and its detector is designed to scan for those signals. Adobe, meanwhile, gives users inspection tools for Content Credentials and includes them automatically in certain AI-generated media workflows.
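If you have the original file in hand, you can check for these signals directly rather than taking anyone's word for it. Below is a minimal sketch in Python that shells out to the open-source c2patool CLI from the Content Authenticity Initiative; the tool is real, but treat the exact invocation and output format as assumptions, since they vary across versions:

```python
"""Minimal provenance check: look for a C2PA manifest in a media file.

Assumes the open-source `c2patool` CLI is installed; its flags and
output format vary between versions, so this is a sketch, not a
reference implementation.
"""
import json
import subprocess
import sys

def read_c2pa_manifest(path: str):
    """Return the parsed manifest store as a dict, or None if absent."""
    try:
        result = subprocess.run(
            ["c2patool", path],  # default behavior: print the manifest as JSON
            capture_output=True, text=True, check=True,
        )
        return json.loads(result.stdout)
    except (FileNotFoundError, subprocess.CalledProcessError, json.JSONDecodeError):
        return None  # tool missing, no manifest, or unreadable output

if __name__ == "__main__":
    manifest = read_c2pa_manifest(sys.argv[1])
    if manifest is None:
        print("No readable Content Credentials: switch to verification mode.")
    else:
        # A manifest is evidence of origin, not proof of truth; read the
        # claim generator and edit history before deciding what it means.
        print(json.dumps(manifest, indent=2))
```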
A missing label, though, is not proof that a video is real. NIST explicitly warns that transparency can help but does not guarantee trustworthiness, and that metadata can be missing, inaccurate, manipulated, or stripped entirely when files are copied and redistributed, especially through social platforms. That means provenance is strongest when you have the original file or a platform that preserves it well. Once a clip has been screen-recorded, reposted, and compressed three times, the evidence gets weaker.
So the smartest first question is not “Does this look fake?” but “What proof travels with this file?” If the answer is “none,” move from trust to verification mode immediately.
What the eye can still catch
When provenance is absent or ambiguous, visual analysis still matters, but it works best when you watch for patterns rather than a single flaw. Microsoft’s guidance still flags poor lip sync, strange lighting, odd hand positioning, unnatural hair movement, patchy skin, robotic voices, and suspicious eye behavior as common warning signs. NIST’s content-based detection framework describes the same broad logic more technically: look for regularities and inconsistencies left behind by generation or manipulation.
In practice, the best place to look is not the center of the face. It is the boundary zones. Watch how the mouth connects to the cheeks and jaw when a person hits hard consonants. Watch eyeglass rims, earrings, hairlines, collars, teeth, fingers, microphones, and anything that crosses in front of the face. Synthetic video often holds up in a still frame better than it holds up through motion, occlusion, and transitions. One weird frame proves little. A repeated pattern of small breaks across frames means much more.
Lighting and physics are another fault line. The best generators are getting better at both, but consistency is harder than realism. Ask whether shadows stay coherent from one angle to the next, whether reflections match the room, whether the skin tone shifts with no change in light, and whether camera movement feels optically plausible rather than perfectly frictionless. The right standard is not “does this look cinematic.” It is “does every part of the scene obey the same physical world.”
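To make that kind of inspection practical, step through frames instead of watching at speed. Here is a small sketch using OpenCV, where the clip path and frame window are placeholders, that dumps a run of consecutive stills so boundary zones can be compared across motion:

```python
"""Dump a run of consecutive frames so boundary zones (jawline, hairline,
glasses, hands, anything crossing the face) can be compared across motion.
A sketch using OpenCV; "clip.mp4" and the frame window are placeholders.
"""
import cv2  # pip install opencv-python

def dump_frames(video_path: str, start_frame: int, count: int) -> None:
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)  # jump to the suspect moment
    for i in range(count):
        ok, frame = cap.read()
        if not ok:
            break
        # Consecutive stills make small, repeated breaks at occlusion
        # boundaries far easier to see than normal-speed playback does.
        cv2.imwrite(f"frame_{start_frame + i:05d}.png", frame)
    cap.release()

dump_frames("clip.mp4", start_frame=240, count=30)
```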
Listen as hard as you look
A suspicious video is often betrayed by audio before video. Microsoft’s guide includes robotic voice quality, unnatural speech patterns, and poor lip sync among the common signs of manipulation. That remains useful because many fake clips are really two separate tricks stitched together: a synthetic or cloned voice laid over a generated or altered face.
Listen for mismatch rather than ugliness. Does the room sound like the room you are seeing? Does the voice keep the same distance from the microphone throughout the clip? Do plosive sounds land with believable mouth movement? Does the breathing pattern match the emotion, pace, and body motion? NIST’s framework emphasizes combining cues rather than treating any one modality as decisive, which is exactly right here: a convincing face with strange audio is still suspicious, and a clean voice with unstable facial timing is just as suspicious.
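One way to make this concrete is to pull the audio track out and look at it, since seams the ear half-notices often show up plainly in a spectrogram. A sketch assuming ffmpeg is installed, with placeholder file names:

```python
"""Extract the audio track and render a spectrogram. Hard seams in the
noise floor, or jumps in apparent microphone distance, can betray a
cloned voice laid over real footage. Assumes ffmpeg is installed;
"clip.mp4" is a placeholder.
"""
import subprocess
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

# Pull a mono 16 kHz WAV out of the clip (standard ffmpeg options).
subprocess.run(
    ["ffmpeg", "-y", "-i", "clip.mp4", "-ac", "1", "-ar", "16000", "audio.wav"],
    check=True,
)

rate, samples = wavfile.read("audio.wav")
freqs, times, power = spectrogram(samples, fs=rate)

# Plot on a log scale so quiet room tone is visible alongside speech.
plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12), shading="gouraud")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Look for abrupt seams, not for ugliness")
plt.show()
```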
Check the story around the video
Context is where many people get lazy, and where many fakes survive. NIST specifically notes that detectors and analysts can gain value from surrounding information such as the source URL, the account or user posting the content, other media attached to the same claim, and claims about when the footage was captured. That is editorial common sense, but it is also now part of the technical playbook.
So if a clip claims to show a protest, explosion, arrest, confession, celebrity outburst, or military event, do not judge only the pixels. Ask who posted it first. Ask whether earlier uploads exist. Ask whether local news, eyewitness footage, geolocation clues, weather, landmarks, shadows, signage, or companion videos support the same event. A believable fake often collapses not under magnification but under chronology. It appears nowhere until one anonymous account uploads it, and suddenly everyone else is merely reposting the same source.
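Even a hand-assembled timeline helps here. If you record every sighting of the clip you can find, a few lines of Python make the chronology explicit; all data below is invented for illustration:

```python
"""Make the chronology explicit. Record each sighting of the clip you can
find by hand, then check whether everything traces back to one account.
All data below is invented for illustration.
"""
from datetime import datetime, timedelta, timezone

# (source, URL, first-seen timestamp), gathered manually
sightings = [
    ("anon_account",   "https://example.com/post/1", datetime(2025, 3, 2, 14, 5,  tzinfo=timezone.utc)),
    ("local_news",     "https://example.com/post/2", datetime(2025, 3, 2, 16, 40, tzinfo=timezone.utc)),
    ("repost_account", "https://example.com/post/3", datetime(2025, 3, 2, 16, 55, tzinfo=timezone.utc)),
]

earliest = min(sightings, key=lambda s: s[2])
print(f"Earliest upload: {earliest[0]} at {earliest[2]:%Y-%m-%d %H:%M} UTC")

# A clip with several independent early sources is a different object
# from one that starts with a single anonymous account and spreads.
window = timedelta(hours=1)
early_sources = {s[0] for s in sightings if s[2] - earliest[2] <= window}
if len(early_sources) == 1:
    print(f"Single-source origin: everything traces to {earliest[0]}")
```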
This is also where AI-generated video exploits human weakness. People are often better at reacting than verifying. That is why the most dangerous clips are not always the most technically perfect ones. They are the ones matched to a moment when audiences are primed to believe them.
Use tools, but do not outsource judgment to one detector
There is a temptation to solve the problem with a single detector. That is understandable, and incomplete. NIST says synthetic-content detection broadly falls into multiple categories, including provenance detection, automated content-based detection, and human-assisted detection. It also notes that detectors often perform best on the generators they were trained on and may struggle to generalize to newer models.
That has two consequences. First, specialist tools are useful, especially when they are checking something concrete such as a known watermark or signed provenance record. Google’s SynthID detection and Adobe’s Content Credentials inspection fit that category. Second, a detector score is not a verdict. NIST warns that false positives can be seriously damaging, which is another reason to resist dramatic certainty when the evidence is thin.
The best workflow is layered. Start with provenance. Then inspect the clip manually across multiple frames. Then test the surrounding claim. Then use specialized tools where they fit the platform or model ecosystem. That sequence is slower than gut instinct, but much closer to how reliable verification actually works.
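As an illustration only, that layered sequence can be written down as a structured checklist, which is a useful way to keep any single signal, including a detector score, from becoming a verdict. The field names and thresholds here are assumptions, not any standard's scoring scheme:

```python
"""Illustrative only: the layered workflow written down as a checklist,
so no single signal (including a detector score) becomes a verdict.
The field names and thresholds are assumptions, not a standard.
"""
from dataclasses import dataclass, field

@dataclass
class Evidence:
    provenance: str = "unchecked"                        # "c2pa-manifest", "none", "stripped"
    visual_breaks: list = field(default_factory=list)    # repeated artifacts across frames
    corroboration: list = field(default_factory=list)    # independent uploads, local reporting
    detector_scores: dict = field(default_factory=dict)  # tool name -> score, advisory only

def assess(e: Evidence) -> str:
    flags = []
    if e.provenance in ("none", "stripped"):
        flags.append("no provenance: treat as unverified")
    if len(e.visual_breaks) >= 2:
        flags.append("repeated visual breaks across frames")
    if not e.corroboration:
        flags.append("no independent corroboration")
    # Detector scores join the pile of evidence; they never settle it alone.
    return "; ".join(flags) or "no red flags (still not proof of authenticity)"

print(assess(Evidence(provenance="none", visual_breaks=["jawline seam", "earring flicker"])))
```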
What actually counts as evidence
The deepest shift here is conceptual. “Looks real” is no longer strong evidence. “Has a trustworthy origin and survives cross-checking” is. A low-resolution phone clip from a known witness, with matching time, place, and corroborating uploads, can be more credible than a beautifully lit anonymous video that appears from nowhere. NIST’s language around provenance, integrity, and credibility points in exactly that direction.
That is why the most mature response to AI video is neither panic nor cynicism. You do not need to assume every clip is fake. You also should not grant reality status because a clip triggers the right emotional reflex. The practical question is always the same: what supports this file besides its surface appearance? If the answer is “nothing,” treat it as unverified. If the answer includes provenance, original context, corroboration, and internal consistency, then trust becomes more rational.
AI video will keep getting better. That makes verification more important, not less. The skill worth building now is not mystical intuition. It is disciplined skepticism: check origin, check context, check continuity, check audio, and prefer evidence that survives more than one kind of test. That is how you separate real footage from synthetic persuasion when both are designed to feel immediate.
Author:
Jan Bielik
CEO & Founder of Webiano Digital & Marketing Agency

Sources
Reducing Risks Posed by Synthetic Content
NIST’s framework on synthetic content, including provenance, watermarking, automated detection, human-assisted detection, and the limits of verification.
https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-4.pdf
C2PA
The official Coalition for Content Provenance and Authenticity site explaining the standard for origin and edit history in digital media.
https://c2pa.org/
Content Credentials
An overview of how Content Credentials work as a practical record of how digital media was created and edited.
https://contentcredentials.org/
Inspect Content Credentials
Adobe’s guide to inspecting files for Content Credentials, including creator information and whether generative AI tools were involved.
https://helpx.adobe.com/creative-cloud/apps/adobe-content-authenticity/inspect/inspect-tool.html
SynthID
Google DeepMind’s official page describing SynthID and its watermarking approach for AI-generated content across modalities.
https://deepmind.google/models/synthid/
SynthID Detector
Google’s explanation of its SynthID detector and how it checks uploaded media for embedded watermark signals.
https://blog.google/technology/google-deepmind/synthid-detector/
Sora 2 system card
OpenAI’s technical documentation describing safety measures, provenance signals, watermarking, and metadata for Sora-generated video.
https://cdn.openai.com/pdf/50d5973c-c4ff-4c2d-986f-c72b5d0ff069/sora_2_system_card.pdf
What are deepfakes and how do I spot them
Microsoft’s guide to common visual and audio signs that can help people identify manipulated or synthetic video.
https://www.microsoft.com/en-us/microsoft-365-life-hacks/privacy-and-safety/deepfakes