What is the core mechanism of a direct "Prompt Injection" attack?

Maliciously altering the user input to override the LLM's original system instructions

Physically altering the hardware memory sectors where the model's weights reside

Overloading the API gateway with millions of simultaneous conversational requests

Stealing the user's authentication tokens by intercepting the network packet

What is the primary difference between a "System Prompt" and a "User Prompt"?

The system prompt sets the foundational rules and persona, while the user prompt contains the human's immediate query

The system prompt is written in binary code, while the user prompt is written in natural language

The system prompt is executed on the client's machine, while the user prompt is executed in the cloud

The system prompt determines the financial cost of the API call, while the user prompt determines the latency

In the context of LLMs, what is a "Jailbreak"?

Crafting complex inputs specifically designed to bypass the model's ethical safety filters and generate prohibited content

Rooting the physical mobile device running the chatbot application

Transferring a proprietary cloud model to a local, offline hard drive

Bypassing the billing API to utilize a premium AI service for free

Which of the following best describes an "LLM Hallucination"?

The model confidently generates grammatically correct but factually inaccurate or entirely fabricated information

The model correctly predicts the next word but outputs it in a foreign language

The server running the model experiences a memory leak and crashes

The user experiences visual artifacts on their monitor due to heavy GPU processing

What makes prompt injection fundamentally difficult to solve compared to traditional SQL injection?

Natural language lacks strict syntactic boundaries between code and data, making it hard to parse malicious intent

LLMs operate exclusively offline, making centralized patching impossible

Prompt injections utilize heavily encrypted network tunnels

Traditional firewalls immediately delete any traffic containing natural language

What is the most basic, classic phrase used in early prompt injection attacks?

Ignore all previous instructions and do the following...

Execute command: DROP TABLE users;

Format the hard drive immediately.

Bypass the firewall using port 8080.

Why is deploying an LLM with access to corporate databases considered a severe security risk if unmitigated?

The LLM could be manipulated via prompt injection to exfiltrate or delete sensitive database records

The LLM will automatically encrypt the database, locking out human administrators

The LLM requires an entirely separate, massive cooling system that drains power

The database will automatically reject queries formatted in JSON

Which entity published the definitive "Top 10" list detailing the most critical security vulnerabilities in LLM applications?

OWASP (Open Worldwide Application Security Project)

The United States Department of Defense (DoD)

The Internet Engineering Task Force (IETF)

The World Wide Web Consortium (W3C)

What is "Data Leakage" when interacting with public LLM providers?

Employees accidentally sharing proprietary corporate code or trade secrets within their chat queries

The physical deterioration of the solid-state drives hosting the model

The AI model intentionally broadcasting its source code to the public

A flaw in the network interface card dropping TCP packets

What is an "Agentic" LLM workflow?

An architecture where the model is given access to external tools and APIs to autonomously execute actions

A model deployed strictly to analyze and sort internal emails

A legal framework governing the copyright of AI-generated content

A conversational bot that only responds with predefined, hardcoded answers

How does "Role-Playing" act as a vulnerability vector in LLMs?

Attackers command the AI to adopt a persona (like a villain) that is theoretically exempt from standard safety guardrails

It forces the model to generate text significantly slower to mimic human typing

It requires the user to input a valid credit card before proceeding

It permanently alters the underlying neural network weights

What is a "System Message" in the OpenAI API architecture?

A distinct, highly weighted message sent at the beginning of a conversation to dictate the assistant's behavior

A push notification sent to the user's phone when a generation is complete

An automated email sent to administrators if the API rate limit is exceeded

A hardware interrupt triggered by the motherboard during high thermal load

Which basic mitigation strategy limits the amount of text a user can send to the LLM?

Input length restriction and rate limiting

Multi-Factor Authentication (MFA)

Advanced Encryption Standard (AES-256)

Domain Name System (DNS) sinkholing

Why do conversational AIs sometimes accidentally reveal their own hidden instructions?

Because the model processes its own system prompt as part of the standard conversational context window and can be tricked into summarizing it

Because developers intentionally program them to share their source code with premium users

Because the cloud provider automatically appends the instructions to every output for debugging

Because hardware failures cause random sectors of memory to be printed to the screen

What is the risk of using LLMs to automatically generate and execute code (e.g., Python REPL)?

The model could be tricked into generating and running malicious OS-level commands that compromise the host server

The generated code is always written in obsolete programming languages

The compiler will refuse to execute code that lacks human comments

The model will consume too much RAM and format the local hard drive

What does the term "Guardrails" mean in LLM deployment?

Additional filtering mechanisms or secondary models designed to catch and block unsafe inputs or outputs before they reach the user

The physical metal racks that secure the AI servers in a data center

The legal contracts signed by developers before accessing an API

The mathematical boundaries that prevent the model from generating images

How can an attacker exploit an LLM's summarizing feature?

By uploading a document containing hidden malicious instructions that the LLM reads during the summarization process

By disconnecting the server's internet connection during the summarization process

By formatting the document with massive font sizes to crash the visual parser

By compressing the document into a ZIP file secured with a password

What is the primary purpose of a "Deny List" in basic LLM security?

To block user prompts that contain specific, known malicious keywords or phrases

To permanently ban users who access the application from a mobile device

To prevent the LLM from outputting text in foreign languages

To restrict the AI from answering questions during non-business hours

In LLM terminology, what is a "Token"?

A sub-word piece of text that acts as the fundamental unit of data processed by the neural network

A physical hardware USB key required to access the administrative dashboard

The encrypted password used to authenticate API requests

The graphical avatar displayed next to the AI's chat bubbles

Why is it dangerous to directly output raw, unsanitized LLM responses into a web application?

The LLM might generate malicious JavaScript, leading to a Cross-Site Scripting (XSS) attack against the end user

The output will be completely unreadable due to binary encoding

The web application will immediately shut down to prevent bandwidth overages

The generated text will automatically bypass the user's local antivirus software

What is the defining characteristic of an "Indirect Prompt Injection"?

The malicious instructions are embedded in a third-party source (like a website or email) that the LLM retrieves, rather than typed by the user

The attack targets the database server directly using raw SQL queries

The injection utilizes invisible characters to crash the web browser UI

The attacker physically accesses an unlocked terminal to type the prompt

In a Retrieval-Augmented Generation (RAG) architecture, what is "Data Poisoning"?

Attackers corrupt the underlying vector database so the LLM retrieves and trusts malicious context documents

The database automatically deletes older files to prevent the AI from accessing them

The AI model intentionally corrupts its own training data to free up memory space

Attackers flood the network to ensure the RAG system times out and defaults to an offline mode

What is the "Confused Deputy" problem in the context of LLM agents?

A privileged AI agent is manipulated by an unprivileged user's input into executing a harmful action on their behalf

The agent forgets its previous instructions and begins outputting randomized binary data

Two separate AI agents get stuck in an infinite loop communicating with each other

The agent's access token expires mid-transaction, leaving the database in a locked state

How do security teams use a "Semantic Firewall"?

By utilizing a secondary, smaller AI model to evaluate the safety and intent of the primary model's inputs and outputs in real-time

To block all incoming traffic from specific geographic regions based on IP addresses

To physically separate the database servers from the web application servers

To encrypt data stored on physical backup tapes

What is the primary danger of "Overreliance" on LLM outputs (OWASP LLM09)?

Developers or users blindly trusting the AI's generated code or security configurations without human verification

The hardware server continuously rebooting due to excessive thermal output

The AI model consuming excessive amounts of electrical power, driving up operational costs

The chatbot generating repetitive, looping phrases that fill up the database

Which attack vector involves exploiting the LLM's ability to fetch external URLs to scan internal network ports?

Server-Side Request Forgery (SSRF)

Man-in-the-Middle (MitM) interception

Cross-Site Request Forgery (CSRF)

Distributed Denial of Service (DDoS)

How does "Few-Shot Prompting" inadvertently increase the attack surface of an LLM application?

Providing internal API examples in the prompt gives attackers a blueprint of the backend architecture if the prompt is leaked

It forces the agent to completely ignore any instructions provided by the end user

It makes the agent's responses significantly slower and less coherent

It drastically increases the financial cost of running the LLM in the cloud

What is a "Do-Anything-Now" (DAN) exploit?

A highly specific, complex role-playing jailbreak that forces the model to simulate a state of absolute freedom from its creators' policies

A hardware device used to instantly decrypt network traffic

A virus that overwrites the operating system's boot sector

A feature in standard LLMs that allows administrators to pause execution

How can an attacker execute a "Data Extraction via Markdown" attack using an LLM?

By tricking the LLM into formatting a response with an external image link containing appended sensitive data, forcing the user's browser to execute a blind GET request

By forcing the agent to write a malicious macro in a Microsoft Word document

By manipulating the agent into outputting its entire neural network architecture as a text file

By using bold and italic formatting to bypass strict keyword filters

What is the primary goal of "Model Inversion" or "Training Data Extraction" attacks?

To repeatedly query the API to reverse-engineer and extract sensitive, memorized data that was used during the model's initial training phase

The process of migrating a cloud-based AI model to an on-premise hardware server

The accidental flipping of logical operators within the model's codebase

A technique used to optimize the model's speed by shrinking its parameter size

Why is an LLM-based "Human-in-the-Loop" (HITL) system occasionally flawed?

Human operators suffer from alert fatigue and may begin blindly approving the LLM's execution requests without reading them

Humans lack the keyboard speed required to approve fast transactions

The system automatically logs out the user after ten seconds of inactivity

The graphical user interface cannot display the complex mathematical logic of the LLM

What is an "Adversarial Suffix" attack?

Appending mathematically derived, nonsensical strings of characters to a prompt that reliably bypass the model's safety alignments

Attaching a massive, multi-gigabyte file to the prompt to crash the server

Translating the prompt into an obscure, ancient language to confuse the parser

Compressing the document into a ZIP file secured with a password

In the context of LLM vulnerability, what is "Prompt Leaking"?

A specific type of injection designed solely to force the model to regurgitate its proprietary, hidden system instructions to the user

The accidental publication of a company's database schema on a public blog

A physical attack where the server's hard drive is stolen and reverse-engineered

The slow degradation of the model's performance over time due to hardware wear

How does an "Insecure Output Handling" vulnerability manifest in an LLM application?

The backend system implicitly trusts the LLM's JSON output and parses it directly into an `eval()` function or database query without validation

The application displays the output using a font size that is too small to read

The user interface crashes when the LLM outputs characters from a foreign language

The system automatically emails the LLM's response to the entire company directory

What is "Sponge Poisoning" (or an Energy Latency Attack) against an LLM?

Crafting inputs specifically designed to force the model into maximum computational complexity, causing a Denial of Service (DoS) and burning API credits

Uploading malware that physically destroys the cooling fans on the server

Intercepting and altering the training data to make the model biased against certain demographics

Deleting the model's memory cache so it has to relearn basic tasks

How do defenders use "Delimiters" (like `"""` or `###`) to secure prompts?

They enclose the untrusted user input within specific characters and instruct the LLM to treat everything inside the delimiters strictly as data, not commands

They permanently encrypt the prompt so that network sniffers cannot read it

They instruct the web browser to render the text in bold formatting

They act as a physical stop command that cuts the server's power if an attack is detected

What is the "Dual-LLM Pattern" designed to achieve in secure Agentic architectures?

Separating the workload so one sandboxed LLM parses untrusted user input, while a completely isolated, highly privileged LLM executes the verified API commands

Preventing the server from overheating by splitting the computational load across two physical data centers

Stopping users from initiating more than two chat sessions at the exact same time

Preventing the AI from simultaneously processing text and image data

Which mitigation technique involves appending a random, secret string to the system prompt and requiring the LLM to output it before executing actions?

Honeypot Tokens or Canary Tokens

Cryptographic Hashing Algorithms

Named Entity Recognition Masking

Role-Based Access Control Filtering

Why is "Model Theft" (OWASP LLM10) a critical concern for open-weight models?

Malicious actors can download the model, completely strip out its safety alignments locally, and use it to generate massive phishing campaigns

Attackers can resell the physical servers on secondary hardware markets

The model might begin generating highly accurate financial advice for competitors

It forces the original developers to pay excessive cloud hosting fees

What is the primary limitation of using basic Regex (Regular Expressions) to detect prompt injection?

Natural language is infinitely variable, allowing attackers to easily bypass static word filters using synonyms, translations, or spacing

Regex requires a dedicated, secondary GPU to process efficiently

Regex automatically encrypts the payload, making it unreadable to the LLM

Regex can only scan files that are smaller than one kilobyte

What is a "Crescendo Attack" in the context of LLM jailbreaking?

A multi-turn conversation where the attacker slowly steers the context window through benign, escalating questions until the model is tricked into generating harmful content

Utilizing massive botnets to overwhelm the API rate limits simultaneously

Injecting raw hexadecimal machine code directly into the chat interface

Embedding malicious instructions within an audio file uploaded to the agent

How do attackers utilize "ASCII Smuggling" to bypass visual and text-based LLM guardrails?

Encoding malicious instructions using invisible Unicode tags or specific control characters that the LLM interprets but human reviewers cannot see

Sending a physical letter containing malicious code to the data center administrators

Hiding executable malware within the pixel data of an innocent-looking JPEG image

Rapidly flashing text on the screen to bypass the user's visual perception

What is the specific security flaw in the "LLM-as-a-Judge" mitigation strategy?

The evaluating LLM is a neural network itself, making it susceptible to the exact same prompt injections and hallucination biases as the model it is policing

The evaluating LLM inherently requires ten times more processing power than the target LLM

The evaluating LLM will automatically delete any data it deems suspicious without warning

The evaluating LLM cannot process inputs that originate from outside its local network

In advanced adversarial machine learning, what is a "Universal Transferable Jailbreak"?

A specific, mathematically optimized string of characters that successfully bypasses the safety filters of multiple completely different, unrelated LLM models

A hardware tool that can decrypt the traffic of any local AI server

A legal loophole allowing users to claim copyright on AI outputs

An algorithm that rapidly migrates a cloud database to an on-premise server

How does a "Sleepwalking" attack affect autonomous, multi-step LLM agents?

The agent's internal state is corrupted via a malicious tool response, causing it to execute repetitive, resource-draining API calls without ever reaching a termination state

The agent is forced to execute operations at exactly midnight to bypass active monitoring

The attack physically dims the brightness of the user's monitor to hide unauthorized transactions

The AI model begins to randomly delete its own internal training weights

What is the architectural goal of "Cryptographic Alignment" in future LLM systems?

Requiring the agent to possess cryptographically signed authorization tokens that dynamically expire based on the risk level of the generated execution plan

Aligning the physical servers geographically to maximize satellite signal reception

Ensuring the AI's internal logic tree perfectly matches a predefined flowchart

Utilizing quantum computing to instantly decrypt all incoming internet traffic

How do "Adversarial Perturbations" successfully attack Vision-Language Models (VLMs)?

By introducing microscopic, calculated pixel noise into an uploaded image that forces the model to fundamentally misclassify the visual data and execute a hidden instruction

By shining a bright physical laser directly into the camera lens

By uploading an image that is completely black, forcing a processing error

By wrapping the image file in a password-protected zip folder

In the context of LLM security, what does the "Walrus Attack" specifically target?

It targets commercial chatbots to extract their highly engineered, proprietary system prompts and internal tool descriptions

It causes the physical server housing the agent to experience a stack overflow and crash

It intercepts the billing API to allow the attacker to use the AI for free

It alters the aesthetic user interface of the chatbot to display phishing links

What makes securing a "Multi-Agent System" (like AutoGen) exponentially more difficult than a single-agent system?

They introduce complex inter-agent communication protocols where a compromised, low-privilege sub-agent can socially engineer a highly privileged master agent

They require a constant connection to a centralized quantum computer to function

They natively generate their own programming languages, making log analysis impossible

They physically consume too much electrical power, creating frequent brownouts

Which technique relies on creating fake API endpoints to detect if an LLM has been compromised and is exploring the internal network?

Deploying LLM-specific Honeypots or Canary Endpoints

Implementing rigorous Rate Limiting and Token Quotas

Utilizing the K-Means clustering algorithm for anomaly detection

Formatting the hard drive using a 7-pass military wipe

What is "Membership Inference" in the context of LLM privacy attacks?

An attack designed to mathematically determine whether a specific, exact piece of data (like a person's medical record) was included in the model's training dataset

Extracting the usernames and passwords of the administrative team

Determining the exact geographical location of the server farm

Calculating the exact financial cost of the model's training phase

MCQ Practice Test

Prompt Injection & LLM Vulnerabilities MCQ 60 Tests With Answers (2026)

PerfectNotes TeamUpdated: May 2026~15 min practice 60 Questions · 3 Levels 60 Questions · 3 Levels

Prompt Injection & LLM Vulnerabilities MCQ practice questions are essential for AI security certifications, LLM engineering roles, and enterprise AI threat modeling. This platform provides 60 carefully curated practice questions covering direct and indirect injection, hallucinations, jailbreaks, RAG poisoning, model inversion, and advanced adversarial ML techniques.

These questions are organized into three progressive difficulty levels of 20 questions each: Basics (covering core definitions, prompt structures, direct injection, jailbreaks, and guardrails), Concepts (covering indirect injection, RAG poisoning, OWASP LLM threats, semantic firewalls, and model inversion), and Advanced (covering Crescendo attacks, adversarial perturbations, cryptographic alignment, multi-agent vectors, and ZK-ML). Each question includes a verified, in-depth explanation bridging theory and practical attack/defense implementation.

Practice in Study Mode to build conceptual understanding as you go, or use Exam Mode for timed testing to simulate real AI security certification conditions. Track your progress and identify gaps in prompt injection defense, jailbreak detection, and LLM hardening strategies.

Contents

1.
Basics (20 Questions)Prompt injection · Jailbreaks · Hallucinations · Guardrails · Direct injection
2.
Concepts (20 Questions)Indirect injection · RAG poisoning · OWASP threats · Model inversion · Semantic firewalls
3.
Advanced (20 Questions)Crescendo attacks · Adversarial perturbations · Cryptographic alignment · Multi-agent · ZK-ML
4.
Conclusionsummary · next steps · study tips
5.
Key Takeawaysquick-fire bullet recap of essential facts
6.
Quick Review Summaryconcept · definition · key fact table
7.
FAQcommon questions answered

Back to Theory Notes

Level:

Conclusion: Master Prompt Injection & LLM Security MCQs

These 60 MCQs cover the full spectrum of LLM vulnerability knowledge — from recognizing basic direct injection attacks, to understanding sophisticated indirect injection through RAG poisoning, to mastering advanced adversarial ML techniques like Universal Transferable Jailbreaks and Crescendo attacks.

The key to securing LLMs is understanding that natural language inherently lacks the syntactic boundaries that protect traditional software. Attackers have infinite variability in how they phrase attacks, making defense layered, continuous, and imperfect. Guardrails, semantic firewalls, and HITL systems are necessary but insufficient on their own.

After completing this MCQ set, deepen your knowledge with the full Prompt Injection & LLM Vulnerabilities theory notes and practice with AI & Cloud Security interview questions to see how these concepts apply to real-world system design, threat modeling, and defense strategies.

📌 Key Takeaways — Prompt Injection & LLM Vulnerabilities

Direct vs. Indirect Injection: Direct: attacker types malicious text. Indirect: malicious instructions embedded in documents the LLM retrieves.
Jailbreak: Sophisticated prompts designed to bypass safety filters. Examples: role-playing, encoding obfuscation, adversarial suffixes.
Hallucination: LLM confidently generates false, fabricated, or contradictory information. Not always a security issue but can be.
Prompt Leaking: Attacker tricks LLM into revealing its hidden system prompt. Example: "What are your instructions above?"
RAG Poisoning: Attackers corrupt retrieval documents so the LLM retrieves and trusts malicious context.
Confused Deputy: Privileged agent tricked into performing unauthorized actions on behalf of attacker.
Model Inversion: Extracting sensitive training data from the model via repeated queries and analysis of probabilities.
Guardrails are not perfect: Keyword blockers, deny lists, semantic filters — all can be bypassed with synonyms, encoding, obfuscation.
Input validation matters: Delimiters, context windows, rate limiting, input length restrictions all reduce attack surface.
Defense-in-depth required: No single defense layer stops all attacks. Combine: guardrails, HITL, semantic firewalls, monitoring, canary tokens.

Quick Review & Summary

Use this table to consolidate LLM vulnerability mappings before or after attempting the questions above.

Attack / Vulnerability	Category	Primary Defense
Direct Prompt Injection	Input Manipulation	Input delimiters, context separation, output validation
Indirect Prompt Injection	Data-layer Manipulation	RBAC on retrieval, provenance tracking, source validation
Jailbreaking	Safety Bypass	Multi-layer guardrails, LLM classifier, adversarial training
Hallucination	Reliability / Trust	Grounding via RAG, RLHF, output confidence scoring
Prompt Leaking	Data Exfiltration	Canary tokens, output monitoring, system prompt hardening
RAG Poisoning	Knowledge Corruption	Signed document provenance, RBAC on vector DB, freshness checks
Model Inversion	Training Data Extraction	Differential privacy, output perturbation, rate limiting
Crescendo Attack	Gradual Safety Bypass	Conversation-level monitoring, HITL checkpoints
Adversarial Suffix	Optimized Token Injection	Semantic classifiers, input normalization, adversarial training
Unconstrained Feedback Loop	Autonomous Escalation	Human-in-the-loop gates, loop iteration limits, sandboxing

Frequently Asked Questions

Q. What is the difference between Direct and Indirect Prompt Injection?

Direct injection: attacker types malicious text directly into the chat. Indirect injection: malicious instructions are embedded in a document/webpage that the LLM retrieves—the attacker doesn't directly interact with the chat.

Q. Why are hallucinations a security vulnerability?

Hallucinations breed trust in false information. If an LLM is compromised or hallucinating, users might trust the false output for security decisions (e.g., "use this encryption algorithm" which doesn't exist).

Q. Can prompt injection be completely prevented?

Not with 100% certainty. Natural language is inherently ambiguous. Defenses (guardrails, semantic firewalls, delimiters) reduce risk significantly but aren't foolproof.

Q. How does RAG poisoning differ from training data poisoning?

RAG poisoning: attacker corrupts retrieval documents used during inference. Training poisoning: attacker corrupts data used during model training (pre-deployment).

Q. What is the "Confused Deputy" problem?

A privileged service (AI agent) is tricked by lesser-privileged input (attacker) into performing actions the lesser privilege couldn't do directly.

Q. Why is code execution via LLM particularly dangerous?

If an LLM can run code (Python REPL, etc.), prompt injection becomes arbitrary code execution. Attacker can delete files, steal data, install backdoors.

Q. What are "Guardrails" and why aren't they perfect?

Guardrails are secondary filters/classifiers to block unsafe content. However, attackers continuously find new phrasings and obfuscation techniques to bypass them.

Q. How do adversarial suffixes evade detection?

Adversarial suffixes are mathematically optimized token sequences that trigger harmful outputs. They're nonsensical to humans, so pattern-matching filters miss them.

Struggling with some questions? Re-read the full Theory Guide: Prompt Injection & LLM Vulnerabilities

Back to Theory: Prompt Injection & LLM Vulnerabilities

Identity & Access Management for AI MCQs

Social Engineering via Deepfakes MCQs