Social Engineering via Deepfakes MCQ 60 Tests With Answers (2026)

Social Engineering via Deepfakes represents one of the most dangerous emerging threats to organizational security. These 60 carefully structured MCQs guide you from understanding fundamental voice cloning and face-swap technologies through advanced adversarial machine learning, forensic detection, and legal compliance challenges.
Questions are organized into three progressive levels: Basics (20 questions covering definitions, common attack vectors, psychological triggers, synthetic media), Concepts (20 questions covering GANs, audio/video synthesis, enterprise threats, liveness bypass, out-of-band defenses), and Advanced (20 questions covering adversarial perturbations, diffusion models, synthetic identity, spectral forensics, legal frameworks). Each question includes detailed explanations connecting theory to practical attack and defense implementation.
Use Study Mode to build conceptual mastery as you progress, or use Exam Mode to simulate real-world assessment conditions with timed testing and immediate feedback scoring.
Contents
- 1. Basics (20 Questions): Deepfake definitions · Vishing · Voice cloning · GANs · Synthetic media · CEO fraud
- 2. Concepts (20 Questions): GANs · Voice conversion · Liveness detection · Out-of-band authentication · Face swapping
- 3. Advanced (20 Questions): Adversarial attacks · Spectral analysis · Diffusion models · Synthetic identity · GDPR compliance
- 4. Conclusion: summary · next steps · study tips
- 5. Key Takeaways: quick-fire bullet recap of essential facts
- 6. Quick Review Summary: concept · definition · key fact table
- 7. FAQ: common questions answered
Social Engineering via Deepfakes — Basics
1. What is the primary definition of a deepfake in the context of cybersecurity?
Correct answer: C. Synthetic media manipulated by AI to replace someone's likeness or voice
A deepfake is synthetic media created using AI techniques to convincingly replace a person's appearance or voice. It's a primary vector for modern social engineering attacks.
2. Which social engineering tactic relies heavily on cloned audio to deceive victims over the phone?
Correct answer: A. Vishing (Voice Phishing)
Vishing (voice phishing) is weaponized using cloned voices and deepfakes to manipulate victims into divulging sensitive information or executing unauthorized actions over the phone.
3. What is the main objective of using a deepfake in a Business Email Compromise (BEC) scenario?
Correct answer: D. To manipulate employees into executing unauthorized financial wire transfers
BEC attacks amplified by deepfakes impersonate executives to trick finance teams into transferring large sums of money to attacker-controlled accounts.
4. Which underlying technology is predominantly responsible for generating highly realistic deepfake videos?
Correct answer: B. Generative Adversarial Networks (GANs)
GANs train a generator network and a discriminator network against each other, so the generator produces increasingly realistic synthetic media. Alongside autoencoders and, more recently, diffusion models, they power much of deepfake creation.
5. Why do cybercriminals often prefer deepfakes over traditional text-based phishing?
Correct answer: B. Audio and video bypass human suspicion algorithms more effectively by exploiting inherent trust
Humans inherently trust audio and video more than text, making deepfakes highly effective at overriding critical thinking and suspicion mechanisms.
6. In a "CEO Fraud" deepfake attack, what is the attacker primarily weaponizing?
Correct answer: D. The executive authority and hierarchical influence of a high-ranking corporate official
CEO fraud exploits organizational hierarchy—employees are conditioned to respond quickly and unquestioningly to perceived directives from executives.
7. How much original audio is typically required by modern commercial AI tools to create a convincing voice clone for social engineering?
Correct answer: A. A few seconds to a few minutes of high-quality speech
Some modern voice cloning models can generate convincing audio from only a few seconds of sample speech, although longer samples generally improve fidelity. This dramatically lowers the barrier to deepfake voice attacks.
8. Which indicator often betrays a low-quality live video deepfake during a video conference?
Correct answer: C. Unnatural blinking patterns and blurring around the edges of the face
Low-quality deepfakes often show unnatural eye behavior, blending artifacts at facial boundaries, and inconsistent lighting—hallmarks of imperfect rendering.
9. What is the first step an attacker usually takes before launching a targeted deepfake vishing attack?
Correct answer: B. Scraping social media and corporate videos to harvest acoustic training data
Attackers begin with OSINT: gathering public audio and video from LinkedIn, YouTube, company presentations, and social media to train voice and face models.
10. Which psychological trigger is most commonly exploited during an urgent deepfake phone call?
Correct answer: A. Fear of reprimand and the pressure to act quickly without verifying facts
Urgency and authority are powerful social engineering levers. Deepfake calls create time pressure that suppresses verification behaviors.
11. What does the term "synthetic media" encompass?
Correct answer: D. Media generated or manipulated by artificial algorithms rather than recorded naturally
Synthetic media includes deepfakes, AI-generated text, voice clones, and manipulated imagery—any content created or altered by AI rather than captured naturally.
12. How do attackers commonly deliver asynchronous deepfake video payloads to targets?
Correct answer: C. Through manipulated attachments or links embedded in spear-phishing emails
Asynchronous (pre-recorded) deepfakes are delivered via email links or attachments, social media messages, or posted on platforms where targets will discover them.
13. Which defense mechanism is most effective for an employee receiving a suspicious, urgent voice call from their "boss"?
Correct answer: A. Hanging up and calling the boss back on a verified, internally trusted phone number
Out-of-band verification is the gold standard: end the suspicious call and independently contact the claimed caller using a known, trusted phone number.
14. What role does social media play in the proliferation of deepfake social engineering?
Correct answer: C. It acts as a massive repository of open-source audio and visual training material
Social media platforms—LinkedIn, YouTube, TikTok—provide attackers with abundant public audio/video for training deepfake models on targeted victims.
15. Which term describes the malicious use of a deepfake to blackmail an individual by placing their face in compromising situations?
Correct answer: D. Synthetic extortion
Synthetic extortion involves creating deepfake videos of victims in non-consensual, compromising, or explicit scenarios to extort money or leverage.
16. Why are deepfake voice attacks particularly dangerous for helpdesk and IT support teams?
Correct answer: B. Attackers can impersonate authorized users to bypass security questions and reset MFA tokens
Helpdesk staff are social engineering targets because they can reset passwords and disable MFA—deepfake impersonation can grant account access to attackers.
17. What does "liveness detection" attempt to verify in a video authentication system?
Correct answer: A. Whether the user on camera is a physical, present human rather than a pre-recorded or AI-generated stream
Active liveness checks ask the user to perform spontaneous, real-time actions (blink, smile, turn head) that a pre-recorded deepfake cannot reproduce on demand. Real-time deepfake tools can defeat the simplest of these prompts, so they work best as one signal among several.
18. Which demographic is often heavily targeted by deepfake "family emergency" telephone scams?
Correct answer: C. Elderly individuals who may struggle to distinguish AI voices from their actual relatives
Elderly victims are targeted because they may struggle to recognize subtle audio artifacts and lack familiarity with deepfake technology.
19. What is the primary purpose of a "challenge-response" test during a suspected deepfake video call?
Correct answer: D. To force the attacker to perform spontaneous movements that the AI rendering software struggles to track
Challenge-response tests (unusual requests, complex gestures, sudden questions) exploit the difficulty deepfake rendering has in responding convincingly in real time.
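A challenge of this kind is only useful if the prompts cannot be predicted in advance. The following is a minimal sketch, assuming a hypothetical prompt pool and a hypothetical `make_challenge` helper (neither comes from the quiz), of drawing an unpredictable sequence of requests:

```python
import secrets

# Hypothetical prompt pool; a real team would choose prompts suited to its own calls.
PROMPTS = [
    "turn your head fully to the left",
    "cover your mouth with your hand",
    "hold up three fingers beside your face",
    "stand up and step back from the camera",
    "look up at the ceiling",
]

def make_challenge(count=3):
    """Return `count` distinct prompts in an unpredictable order."""
    if count > len(PROMPTS):
        raise ValueError("count exceeds the size of the prompt pool")
    pool = list(PROMPTS)
    challenge = []
    for _ in range(count):
        # secrets.choice draws from the operating system's CSPRNG.
        pick = secrets.choice(pool)
        pool.remove(pick)
        challenge.append(pick)
    return challenge
```

Calling `make_challenge()` at the start of a suspicious call yields three different prompts each time, which keeps an attacker from pre-rendering the responses.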
20. How does a deepfake generally differ from a "cheapfake"?
Correct answer: B. Deepfakes use complex machine learning algorithms, whereas cheapfakes use basic editing tools like speed alteration or splicing
Deepfakes use AI; cheapfakes use crude editing. Cheapfakes are often more detectable but faster to create and still effective against untrained audiences.
Social Engineering via Deepfakes — Concepts
1. Which machine learning concept involves two neural networks contesting against each other to produce realistic synthetic media?
Correct answer: B. Generative Adversarial Networks (GANs)
GANs pit a generator (creates fakes) against a discriminator (detects fakes) in an adversarial game, producing increasingly realistic synthetic media.
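The adversarial game is commonly written as the minimax objective from the original GAN formulation. The symbols below ($D$ for the discriminator, $G$ for the generator, $p_{\text{data}}$ for the real-data distribution, $p_z$ for the noise prior) are standard notation rather than anything defined in this quiz:

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator tries to maximize this value by scoring real samples high and generated samples low, while the generator tries to minimize it by making $D(G(z))$ approach 1.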
2. In the context of a live deepfake injection attack on Microsoft Teams or Zoom, what software tool is typically abused to route the synthetic feed?
Correct answer: C. A virtual camera driver acting as a middleware bridge
Virtual camera tools (OBS, vCam) register as an ordinary camera device, so the conferencing software accepts the deepfake output as if it came from a physical webcam.
3. How do deepfake audio models primarily construct a victim's voice profile?
Correct answer: A. By analyzing phonetic pronunciations, pitch, and cadence to build an acoustic model
Voice cloning models extract prosodic features (pitch, rhythm, timing) and phonetic patterns to synthesize speech matching the victim's unique vocal characteristics.
4. Which vulnerability in standard biometric voice authentication systems do deepfakes directly exploit?
Correct answer: D. The inability of legacy systems to distinguish between natural vocal tract vibrations and synthetic waveforms
Many legacy voice authentication systems match the acoustic features of a voiceprint without checking whether the audio was produced by a live human vocal tract, so a faithful synthetic waveform can pass.
5. What is the "Out-of-Band" authentication strategy when dealing with a suspected deepfake financial request?
Correct answer: A. Verifying the request using a completely different communication channel than the one the request arrived on
Out-of-band verification uses an independent channel (for example, a callback to a known number) to confirm the identity of the alleged sender, so the check never passes through the channel the attacker controls.
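The essential rule is that the callback number comes from a record established beforehand, never from the suspicious call itself. A minimal sketch, assuming a hypothetical `TRUSTED_DIRECTORY` and placeholder identities and numbers:

```python
# Hypothetical directory of numbers verified ahead of time (for example, from HR records).
TRUSTED_DIRECTORY = {
    "cfo@example.com": "+1-555-0100",
}

def callback_number(claimed_identity, inbound_number, directory=TRUSTED_DIRECTORY):
    """Return the number to call back, or None if no trusted number is on file.

    `inbound_number` is accepted only so it can be logged by the caller;
    it is deliberately ignored here because caller ID can be spoofed.
    """
    return directory.get(claimed_identity)
```

If the function returns `None`, the request should be escalated rather than verified on the channel it arrived on.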
6. Why do live deepfake video models frequently struggle with rendering a person's profile (side view) during a call?
Correct answer: C. Most training data consists of forward-facing photos, leading to sparse training for severe angles
Training datasets overwhelmingly consist of frontal camera angles. The model fails when asked to render unfamiliar perspectives like profiles.
7. In a sophisticated corporate espionage attack, how might an attacker combine deepfakes with a watering hole attack?
Correct answer: D. By replacing a legitimate executive presentation video on an industry website with a deepfake containing subtly altered financial metrics
A watering hole attack compromises a trusted resource; adding deepfakes to it increases credibility and exploits the victim's trust in the source.
8. What specific artifact in deepfake audio is a forensic indicator of synthesis?
Correct answer: B. Uniform, mathematical absence of natural breathing or ambient background room noise
Deepfake audio often lacks natural background noise, breathing patterns, or micro-utterances ("um," "uh") that characterize human speech.
9. Which corporate policy is best suited to mitigate the threat of deepfake-enabled wire fraud?
Correct answer: C. Mandating dual-authorization protocols and physical tokens for any transfer exceeding a specific financial threshold
Dual-authorization and hardware tokens ensure that even if one person is impersonated via deepfake, a second verification step prevents fraud.
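The dual-authorization rule can be sketched in a few lines. This is an illustrative model only: the `Transfer` class, the helper names, and the 10,000 threshold are assumptions, not values from the quiz.

```python
from dataclasses import dataclass, field

@dataclass
class Transfer:
    requester: str
    amount: float
    approvals: set = field(default_factory=set)

def approve(transfer, approver):
    """Record an approval; the requester may never approve their own transfer."""
    if approver == transfer.requester:
        raise ValueError("requester cannot approve their own transfer")
    transfer.approvals.add(approver)  # a set ignores repeat approvals by one person

def can_release(transfer, threshold=10_000, required=2):
    """Transfers above the threshold need `required` distinct approvers."""
    if transfer.amount <= threshold:
        return True
    return len(transfer.approvals) >= required
```

Because approvals are stored as a set of distinct people, impersonating a single executive convincingly is not enough to release a large transfer.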
10. How does the "Face Swapping" technique mathematically map a target face onto an actor's body?
Correct answer: B. By establishing facial landmarks and aligning the target's pixel mesh over the actor's spatial coordinates
Face swapping detects facial keypoints (eyes, mouth, jawline) and warps the target face to match the actor's spatial position and rotation in each frame.
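The alignment step is ordinary landmark geometry, the same kind used to normalize faces before recognition or forensic analysis. As a sketch under simplifying assumptions (2D points, a similarity transform only, landmarks already detected by some other tool), the scale, rotation, and translation that map one landmark set onto another can be fitted by least squares. The function names are illustrative.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2x3 similarity transform mapping src landmarks onto dst.

    Model: x' = a*x - b*y + tx,  y' = b*x + a*y + ty
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    # Even rows hold the x' equations, odd rows the y' equations.
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    rhs = dst.reshape(-1)  # interleaved [x0', y0', x1', y1', ...]
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])

def apply_similarity(M, pts):
    """Apply a 2x3 similarity transform to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]
```

Real pipelines typically go further (per-frame tracking, non-rigid warping, blending), which is where the boundary artifacts that detectors look for tend to appear.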
11. What is the primary limitation of relying entirely on software-based deepfake detection tools in an enterprise?
Correct answer: A. Detection algorithms are often reactive and lag behind the rapid generative advancements of attacker models
Deepfake generation tends to advance faster than detection. By the time a detector has been trained on known fakes, attackers have often moved to newer generators or evasion techniques.
12. In what way does a deepfake attack bypass traditional email security gateways (SEGs)?
Correct answer: D. The attack often occurs entirely via voice or external video conferencing, evading textual and attachment scanning completely
Live deepfake calls via phone or video conferencing bypass email completely, making traditional SEG filters irrelevant.
13. What does the term "Voice Conversion" specifically refer to in deepfake terminology?
Correct answer: B. Transforming the spoken audio of an attacker to sound exactly like the target victim in real-time
Voice conversion transforms one person's voice into another's timbre and characteristics while preserving the semantic content of what's spoken.
14. Which authentication protocol is highly resilient against deepfake video impersonation during remote onboarding?
Correct answer: C. Fast IDentity Online (FIDO2) hardware security keys
FIDO2 hardware keys require physical possession of the device and a cryptographic challenge-response. A deepfake video can imitate a face or voice, but it cannot produce a signature from a private key the attacker does not hold.
15. What makes "Deepfake Phishing" (combining synthesized video with malicious links) more successful than standard phishing?
Correct answer: D. It establishes a false sense of physiological familiarity, suppressing the victim's critical thinking faculties
Our brains trust familiar faces and voices deeply. Deepfake videos create a parasocial illusion that disarms rational skepticism toward accompanying links.
16. How do attackers bypass standard "turn your head" liveness checks using modern deepfake software?
Correct answer: A. By utilizing 3D spatial mapping models that dynamically re-render the synthetic face across different yaw and pitch axes
Advanced deepfakes use 3D face models that can render faces at multiple angles. Simple liveness checks (turn head) no longer reliably detect them.
17. What is a "Safe Word" protocol in the context of deepfake vishing defense?
Correct answer: C. A pre-established, secret phrase shared among employees used to verify identity during unusual or high-stakes requests
Safe word protocols verify identity through knowledge that only legitimate insiders should possess. A cloned voice or face alone cannot supply the phrase, provided it has been kept secret.
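Where a safe word is checked by a tool rather than from memory, it should not be stored in plain text. A minimal sketch using only the Python standard library; the `enroll` and `verify` names and the iteration count are illustrative assumptions:

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation from the quiz

def enroll(safe_word, salt=None):
    """Return (salt, digest) to store instead of the safe word itself."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", safe_word.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify(candidate, salt, digest):
    """Check a spoken or typed candidate against the stored digest."""
    attempt = hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"), salt, ITERATIONS)
    # compare_digest runs in constant time to avoid leaking how many bytes matched.
    return hmac.compare_digest(attempt, digest)
```

The phrase still has to be exchanged over a trusted channel in the first place and rotated if it may have been overheard.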
18. Why are synchronized audio-visual deepfakes particularly difficult to execute convincingly in real-time live streams?
Correct answer: A. The intense computational overhead required to render high-resolution frames synchronizing lip movements with audio often causes noticeable latency
Real-time rendering has strict compute budgets. Synchronizing lips with audio at high resolution introduces visible delays and artifacts.
19. What role does a "Decoder" play in a standard deepfake autoencoder architecture?
Correct answer: D. It reconstructs the compressed latent representation of the face back into a high-resolution output image
In autoencoders, the decoder takes the compressed facial representation and expands it back to pixel space, generating the final synthetic output.
20. When training a defensive AI to detect deepfakes, what is the most common feature analyzed by the model?
Correct answer: B. Inconsistencies in lighting, shadowing, and pixel blending at the boundaries of the facial mask
Deepfake detection typically focuses on rendering artifacts—blending errors, lighting inconsistencies, and color aberrations at facial boundaries.
Social Engineering via Deepfakes — Advanced
1. How do adversarial perturbation attacks attempt to bypass forensic deepfake detection models?
Correct answer: D. By introducing microscopic, mathematically calculated pixel noise that forces the detection classifier to output a false negative
Adversarial perturbations add imperceptible noise optimized to fool detection models while remaining visually indistinguishable to humans.
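To see why a small, targeted change can flip a classifier, consider a toy example. The sketch below applies the fast gradient sign method (one well-known perturbation technique, chosen here for illustration) to a three-feature logistic "detector". The weights and input are made up, and real detectors are far larger, but the mechanism is the same.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, w, b, y, eps):
    """Shift each feature by at most eps in the direction that increases the loss."""
    p = sigmoid(w @ x + b)
    grad = (p - y) * w  # gradient of binary cross-entropy with respect to x
    return x + eps * np.sign(grad)

# Toy detector: output above 0.5 means "fake". All values are illustrative.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.4, 0.1, 0.2])           # a sample the detector flags as fake
x_adv = fgsm(x, w, b, y=1.0, eps=0.3)   # perturbed copy, true label still "fake"

score_before = sigmoid(w @ x + b)
score_after = sigmoid(w @ x_adv + b)
```

Here `score_before` is above 0.5 and `score_after` falls below it, even though no feature moved by more than `eps`. Defenses such as adversarial training expose the detector to perturbed samples like `x_adv` during training.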
2. In frequency domain analysis of synthetic audio, what specific anomaly do forensic investigators often look for?
Correct answer: B. Unnatural clipping or smoothing in the high-frequency spectral bands that biological vocal cords do not produce
Deepfake audio often lacks the complex harmonic distortions of natural human vocal production, visible as unnatural smoothing in spectrograms.
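One simple way to quantify high-band smoothing is to measure how much of a clip's spectral energy sits above a cutoff frequency. The sketch below is a rough screening feature, not a detector; the function name is an assumption, and any cutoff would need calibrating against known-genuine recordings from the same channel.

```python
import numpy as np

def high_band_energy_ratio(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz (0.0 for a silent clip)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return float(spectrum[freqs >= cutoff_hz].sum() / total)
```

A clip whose ratio is far lower than comparable genuine speech may warrant closer inspection, though codecs and telephone band-limiting also strip high frequencies and can produce the same pattern.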
3. What is the "Uncanny Valley" effect, and how does it relate to advanced deepfake social engineering?
Correct answer: C. It describes the psychological unease humans feel when viewing near-perfect synthetic media, acting as a subconscious indicator of a deepfake
Uncanny valley is the psychological discomfort when something is almost human but not quite—viewers sense something is "off" with near-perfect deepfakes.
4. How does a "Lip-Sync" deepfake fundamentally differ from a full "Face Reenactment" model?
Correct answer: A. It modifies only the mouth region of an existing video to match a new audio track, leaving the rest of the original face entirely intact
Lip-sync deepfakes only modify mouth movements to match audio, reducing computational cost. Face reenactment generates the entire face dynamically.
5. What is the primary function of a "Diffusion Model" in the next generation of synthetic media generation?
Correct answer: C. It creates highly detailed images by progressively removing mathematical noise from a static signal, bypassing traditional GAN architectures
Diffusion models start from random noise and iteratively denoise it into an image or video frame. They often train more stably than GANs and can produce higher-quality results.
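The "progressive noise removal" can be made concrete with the forward (noising) process that a diffusion model learns to invert. The notation below follows the widely used DDPM formulation, which the quiz itself does not name, so treat the symbols ($\beta_s$ for the per-step noise schedule, $\bar{\alpha}_t$ for the cumulative signal fraction) as assumptions:

```latex
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, I\right),
\qquad
\bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
```

As $t$ grows, $\bar{\alpha}_t$ shrinks toward 0 and $x_t$ approaches pure Gaussian noise. Generation runs this in reverse, with a trained network estimating the noise to remove at each step.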
6. Which forensic technique utilizes photoplethysmography (PPG) to detect video deepfakes?
Correct answer: D. Analyzing the micro-color changes in the skin corresponding to human blood flow and heartbeats
PPG-based detection analyzes subtle color variations in skin caused by subcutaneous blood flow—deepfakes often lack this physiological detail.
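The core of a PPG check is looking for a periodic component at a plausible heart rate in the skin color signal. The sketch below assumes the per-frame mean green-channel value of a skin region has already been extracted by some other tool; the function name and the 0.7 to 4.0 Hz band (roughly 42 to 240 beats per minute) are illustrative choices.

```python
import numpy as np

def estimate_pulse_bpm(green_means, fps, low_hz=0.7, high_hz=4.0):
    """Return the dominant in-band frequency in beats per minute, or None."""
    x = np.asarray(green_means, dtype=float)
    x = x - x.mean()  # remove the constant skin-tone offset
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    if not band.any() or power[band].max() == 0:
        return None  # clip too short, or no variation at all in the band
    peak = freqs[band][np.argmax(power[band])]
    return float(peak * 60.0)
```

A production system would also need motion compensation and a signal-quality measure, since this version returns a peak frequency even when the band contains only noise.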
7. Why is the "Watermarking" of training data an ineffective standalone defense against deepfake social engineering?
Correct answer: B. Attackers typically utilize open-source datasets and independent models that do not adhere to commercial watermarking standards
Watermarking only helps when the tools in the pipeline honor it. Attackers train independent models on publicly available images and video from social media, corporate sites, and YouTube, none of which is bound by commercial watermarking schemes.
8. In the context of an Advanced Persistent Threat (APT) campaign, what is "Synthetic Identity Theft"?
Correct answer: A. Creating a completely fabricated persona utilizing AI-generated faces and voices to infiltrate corporate networks over months
Synthetic identity APTs use deepfakes to build trust over time, gaining access to critical systems through layered deception.
9. How do attackers exploit deepfakes to execute a "Stock Manipulation" (Pump and Dump) scheme?
Correct answer: B. By releasing highly realistic, forged video announcements of a CEO declaring bankruptcy or massive profits to trigger algorithmic trading
Deepfake videos of executives announcing major developments can move stock prices through algorithmic and emotional trading before the public detects the fraud.
10What is the architectural vulnerability of relying on "Eye Catchlight" analysis to detect deepfakes?
CorrectA: Advanced generative models have learned to accurately ray-trace environmental reflections into the synthetic corneas
Modern generative models can reproduce plausible, consistent catchlights on the cornea, so this once-reliable detection cue can no longer be trusted on its own.
11How does "Zero-Shot" voice cloning elevate the threat landscape for vishing attacks?
CorrectC: It allows an attacker to clone a victim's voice perfectly using only a few seconds of audio, eliminating the need for massive, structured training datasets
Zero-shot voice cloning dramatically lowers the barrier—attackers need only brief audio samples from social media or news clips.
12What legal compliance framework is most directly challenged when an attacker uses a deepfake to verbally authorize a massive transfer of European customer data?
CorrectD: The General Data Protection Regulation (GDPR) rules regarding explicit, verifiable consent
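Where processing relies on consent, GDPR requires the controller to be able to demonstrate that valid consent was given. A deepfaked verbal "authorization" does not come from the person it claims to, so it cannot satisfy these consent requirements.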
13In analyzing the latent space of a deepfake autoencoder, what does a "disentangled representation" allow an attacker to do?
CorrectD: Independently control specific facial attributes like lighting, emotion, and head pose without altering the underlying identity
Disentangled representations allow fine-grained control over specific facial features, creating more convincing and flexible deepfakes.
14What is a "Presentation Attack" in the context of biometric security bypass via deepfakes?
CorrectB: Holding up a high-resolution screen displaying a live deepfake to an optical facial recognition scanner
A presentation attack shows a spoofed biometric, here a deepfake video on a screen, directly to the capture sensor in order to pass verification that lacks robust liveness checks.
15How does the "Gossip Protocol" conceptually assist decentralized deepfake detection networks?
CorrectA: It allows distributed nodes to rapidly share hashes of known malicious synthetic media streams to prevent cascading organizational compromises
Gossip protocols enable rapid, peer-to-peer distribution of threat indicators across decentralized systems without a central authority.
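The idea can be sketched as a small push-gossip simulation. This is a minimal illustration, not a production protocol: the `Node` class, the `fanout` value, and the 16-node network are all assumptions made for the example. Note that exact SHA-256 digests only match byte-identical files, so a re-encoded clip would need a different kind of fingerprint.

```python
import hashlib
import random


class Node:
    """A hypothetical detection node that stores hashes of flagged media."""

    def __init__(self, name):
        self.name = name
        self.known_hashes = set()

    def flag(self, media_bytes):
        """Record the SHA-256 digest of a locally detected synthetic clip."""
        digest = hashlib.sha256(media_bytes).hexdigest()
        self.known_hashes.add(digest)
        return digest


def gossip_round(nodes, fanout, rng):
    """Each node pushes its known hashes to `fanout` randomly chosen peers."""
    # Snapshot first so hashes learned this round are forwarded next round.
    snapshots = {node.name: set(node.known_hashes) for node in nodes}
    for node in nodes:
        peers = [p for p in nodes if p is not node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            peer.known_hashes |= snapshots[node.name]


def rounds_until_converged(nodes, fanout=2, seed=0, max_rounds=50):
    """Run gossip rounds until every node holds the same hash set."""
    rng = random.Random(seed)
    everything = set().union(*(n.known_hashes for n in nodes))
    for round_number in range(1, max_rounds + 1):
        gossip_round(nodes, fanout, rng)
        if all(n.known_hashes == everything for n in nodes):
            return round_number
    return None


nodes = [Node(f"node-{i}") for i in range(16)]
digest = nodes[0].flag(b"bytes of a flagged synthetic clip")
rounds = rounds_until_converged(nodes)
print(f"all {len(nodes)} nodes learned the hash after {rounds} rounds")
```

No node acts as a coordinator here: each one only talks to a few random peers per round, which is what lets the indicator spread without a central authority.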
16What is the primary limitation of using Blockchain and cryptographic hashing to verify the authenticity of corporate media?
CorrectC: It cannot authenticate the origin of a live, unrecorded video stream generated dynamically during a Zoom meeting
Blockchain can verify historical records but cannot verify the authenticity of real-time, dynamically generated video streams on a live call.
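A short sketch makes the limitation concrete. The `MediaRegistry` class below is a hypothetical in-memory stand-in for an on-chain record of published digests, and the byte strings are placeholders for real media files. A hash lookup can confirm that a file matches something published earlier, but a frame from a live call has no prior record to match.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


class MediaRegistry:
    """Hypothetical append-only registry standing in for an on-chain record."""

    def __init__(self):
        self._digests = {}

    def publish(self, label: str, media: bytes) -> str:
        """Register the digest of officially released media."""
        digest = sha256_of(media)
        self._digests[digest] = label
        return digest

    def verify(self, media: bytes):
        """Return the label if this exact byte sequence was published, else None."""
        return self._digests.get(sha256_of(media))


registry = MediaRegistry()
official = b"official earnings video bytes"
registry.publish("Q3 earnings announcement", official)

tampered = official + b"\x00"                    # any change alters the digest
live_frame = b"frame captured from a live call"  # never published beforehand

print(registry.verify(official))
print(registry.verify(tampered))
print(registry.verify(live_frame))
```

The last lookup fails for a genuine live frame just as it would for a forged one, which is why hash-based provenance says nothing about who is actually on a live call.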
17How do deepfake operators mitigate the visual artifacting caused by inter-frame temporal inconsistencies?
CorrectD: By employing advanced temporal smoothing algorithms and optical flow tracking to ensure pixel stability between consecutive video frames
Optical flow and temporal smoothing ensure that pixel movements between frames are physically plausible, reducing flicker and jitter.
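To see why smoothing reduces jitter, here is a minimal sketch using an exponential moving average over one landmark coordinate. It is only an illustration under simple assumptions: the coordinate values and the `alpha` weight are made up, and optical flow tracking is not implemented. The same frame-to-frame jitter measure is one that a forensic analyst could compute on suspect footage.

```python
def exponential_smooth(track, alpha=0.3):
    """Exponentially smooth a sequence of per-frame landmark coordinates."""
    smoothed = [track[0]]
    for value in track[1:]:
        # Blend the new observation with the previous smoothed position.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def mean_jitter(track):
    """Average absolute frame-to-frame change of a coordinate."""
    steps = [abs(b - a) for a, b in zip(track, track[1:])]
    return sum(steps) / len(steps)


# Hypothetical x-coordinate of one facial landmark across eight frames.
raw_x = [100.0, 103.0, 99.0, 104.0, 98.0, 103.0, 100.0, 105.0]
smooth_x = exponential_smooth(raw_x)

print(f"raw jitter:      {mean_jitter(raw_x):.2f} px/frame")
print(f"smoothed jitter: {mean_jitter(smooth_x):.2f} px/frame")
```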
18What specific biometric feature is notoriously difficult for current open-source GANs to render accurately, often betraying the deepfake to forensic analysis?
CorrectB: The precise structural geometry and asymmetric details of human teeth and oral cavities
Teeth have complex geometry and asymmetries difficult for models to render convincingly—dental details often betray deepfakes.
19In a highly targeted "Whaling" deepfake attack, what is the purpose of conducting a "Dry Run"?
CorrectA: To test the acoustic properties of the synthetic voice model over a cellular network to ensure compression algorithms do not distort the clone
A dry run confirms that the cloned voice still sounds convincing after cellular codec compression and network transmission.
20How does the concept of "Data Poisoning" act as an offensive defense against deepfake model trainers?
CorrectC: Injecting imperceptible noise into public corporate headshots that mathematically breaks facial alignment algorithms during the attacker's training phase
Data poisoning injects noise into public media so attackers' models learn corrupted facial landmarks, degrading their deepfake output quality.
Conclusion: Mastering Deepfake Threats & Defenses
These 60 MCQs span the full deepfake spectrum — from recognizing voice cloning and face-swap attacks, to understanding the underlying GAN and diffusion model architectures, to defending against sophisticated synthetic identity exploitation and biometric bypass attempts. Deepfakes weaponize humans' inherent trust in audio and video, making them uniquely powerful social engineering vectors.
The key defense principle is behavioral verification and out-of-band authentication. Technology alone (liveness detection, spectral analysis) lags behind attacker capabilities. Instead, organizations must implement multi-layered defenses: security awareness, challenge-response testing, hardware authentication tokens, dual authorization, and safe word protocols.
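As a rough sketch of how procedural checks like these can be encoded, the example below gates a transfer on an out-of-band callback and on independent approvers. The `TransferRequest` type, the field names, and the 10,000 threshold are all hypothetical and chosen only for illustration. The point is that the decision never depends on how convincing the caller looked or sounded.

```python
from dataclasses import dataclass, field


@dataclass
class TransferRequest:
    requester: str
    amount: float
    # True only after calling back on a number from the internal directory.
    callback_confirmed: bool = False
    approvers: set = field(default_factory=set)


def may_execute(request: TransferRequest, dual_auth_threshold: float = 10_000.0) -> bool:
    """Apply out-of-band and dual-authorization checks to a request."""
    if not request.callback_confirmed:
        return False
    # The requester cannot count as their own approver.
    independent = request.approvers - {request.requester}
    required = 2 if request.amount >= dual_auth_threshold else 1
    return len(independent) >= required


urgent_call = TransferRequest(requester="ceo", amount=250_000.0, approvers={"ceo"})
verified = TransferRequest(
    requester="ceo",
    amount=250_000.0,
    callback_confirmed=True,
    approvers={"cfo", "controller"},
)

print(may_execute(urgent_call))
print(may_execute(verified))
```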
After completing this MCQ set, deepen your knowledge with the full Social Engineering via Deepfakes theory notes and practice with AI & Cloud Security interview questions to see how deepfake defense strategies integrate into comprehensive organizational security architectures.
📌 Key Takeaways — Social Engineering via Deepfakes
- Deepfakes ≠ Deepfake Phishing: Deepfakes are synthetic media. When combined with social engineering (vishing, BEC), they become powerful attack vectors.
- Voice Cloning Cost-Benefit: Modern zero-shot voice cloning needs only seconds of audio, often taken from social media, and no per-target training. Detection is hard.
- GANs vs. Diffusion Models: GANs train a generator against a discriminator. Diffusion models generate media by progressively denoising random noise. Diffusion often produces higher quality.
- Liveness Detection Limits: Early liveness (turn your head, blink) is defeated by 3D facial reenactment. Advanced liveness exists but is resource-intensive.
- Out-of-Band Verification is Strongest: Calling back on a known, independently sourced phone number moves the check to a channel the impersonator does not control. No specialized technology required.
- Uncanny Valley Fades: As deepfake quality improves, the psychological warning signal weakens. Users must rely on procedural verification, not intuition.
- Biometric Bypass is Real: Deepfakes defeat basic facial and voice recognition. Deeper biometrics (PPG, heartbeat detection) help but can also be gamed.
- Forensic Detection is Reactive: Detection models catch older attack types. Adversarial perturbations and diffusion models are outpacing forensic tools.
- Social Engineering is the Real Threat: Deepfakes are a tool. The real danger is psychological manipulation—urgency, authority, fear. Technical deepfakes are just delivery vehicles.
- Legal & Compliance: GDPR, NIST guidance, and emerging deepfake legislation are tightening expectations. Organizations must document authentication and consent to survive audits.
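For the GANs vs. Diffusion Models takeaway, the difference can be written out using the standard textbook formulations. The symbols are assumptions introduced here, not notation from this quiz: $G$ is the generator, $D$ the discriminator, $z$ the input noise, $x_t$ the sample at noise step $t$, and $\beta_t$ the noise schedule.

```latex
% GAN: generator G and discriminator D play a minimax game
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% Diffusion: a fixed forward process adds Gaussian noise step by step
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\big)

% Diffusion: a learned reverse process removes it, starting from pure noise x_T
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)
```

A GAN produces a sample in a single pass through $G$, while a diffusion model applies the learned reverse step repeatedly from $x_T$ down to $x_0$.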
Quick Review & Summary
Use this table to consolidate key Deepfake attack and defense concepts before or after attempting the questions above.
| Attack / Concept | Technology Behind It | Primary Defense |
|---|---|---|
| Voice Phishing (Vishing) | Zero-shot voice cloning / TTS | Out-of-band callback verification |
| BEC / CEO Fraud | Real-time face swap on a video call | Safe word + dual authorization |
| Synthetic Identity Fraud | GAN-generated facial images | KYC with liveness detection |
| Biometric Bypass | 3D facial reenactment | PPG / heartbeat-based liveness |
| Diffusion Model Deepfakes | Latent diffusion denoising | Adversarial perturbation watermarking |
| Deepfake Detection | Spectral / temporal forensics | Multi-layer detection pipelines |
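As a toy illustration of the spectral forensics row, the sketch below measures how much of a signal's energy sits in the upper part of its spectrum. Everything here is an assumption made for the example: the signals are synthetic sine waves, the `cutoff_fraction` is arbitrary, and real forensic pipelines rely on far richer, usually learned, features rather than a single ratio.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform for bins 0..N/2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags


def high_band_energy_ratio(signal, cutoff_fraction=0.5):
    """Share of spectral energy in the upper part of the one-sided spectrum."""
    energies = [m * m for m in dft_magnitudes(signal)]
    cutoff = int(len(energies) * cutoff_fraction)
    total = sum(energies)
    return sum(energies[cutoff:]) / total if total else 0.0


n = 64
# A clean low-frequency tone, and the same tone with an added high-frequency component.
low_tone = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
with_artifact = [s + 0.5 * math.sin(2 * math.pi * 28 * t / n) for t, s in enumerate(low_tone)]

ratio_clean = high_band_energy_ratio(low_tone)
ratio_artifact = high_band_energy_ratio(with_artifact)
print(f"clean: {ratio_clean:.3f}  with artifact: {ratio_artifact:.3f}")
```

A single feature like this would normally be one input among many in a multi-layer detection pipeline, not a verdict on its own.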
Frequently Asked Questions
Q. Can deepfakes be created with just a smartphone?
Q. How can I tell if a video of someone is authentic?
Q. What is the difference between a deepfake and a deep synthesis?
Q. Can voice cloning fool biometric systems?
Q. Are deepfakes illegal?
Q. How do enterprises detect deepfakes in real-time video calls?
Q. What is synthetic identity fraud?
Q. Can a court accept deepfake video as evidence?