Skip to content

LLM-mediated SQL Injection

Improper output handling is a vulnerability in LLM applications that occurs when responses generated by the model are missing filtration and sanitization. This vulnerability is especially prevalent in web applications where outputs can be dynamically inserted into the DOM and/or used by other services. This issue can lead to a wide range of exploitation scenarios, such as XSS, code injection, command injection, and privilege escalation.

The Hackergram application contains endpoints that allow attackers to explore vulnerabilities specific to the integration of Large Language Models (LLMs) in web applications, including /generate_post and /leaderboard.

Exercise: /leaderboard

The /leaderboard endpoint of Hackergram demonstrates improper output handling vulnerabilities. In this attack, you will explore the lack of output sanitization in the LLM's response. The objective is to attempt to change another user's password using the LLM integrated into the leaderboard endpoint. To perform the attack, follow these steps:

  1. Experiment with the functionality by submitting a normal prompt, such as “Give me the count of leaderboard members.”
  2. Observe how the prompt generates a query to the application’s database.
  3. Next, try to find a way to change another user's password through the endpoint.
  4. If you succeed, you should see the SQL query that was executed.
  5. To confirm the success of the attack, log out of your current account and log in to the affected account.

Additional exercise

Drop a table from the Hackergram database.

Countermeasures

The root cause of LLM-mediated SQL injection is that model output flows directly into a database interpreter without any structural validation. A representative vulnerable pattern is:

query = llm.generate(f"Write SQL to find user {username}")
db.execute(query)

If the model outputs a destructive statement such as DELETE FROM users, the database executes it with full server privileges. Preventing this requires constraining the model's output to a predefined structure and ensuring that only parameterized queries or allowlisted operations are permitted. The model should express intent, not executable SQL.

The recommended fix is to require the model to produce structured data and map that to a safe, parameterized query:

import json

command = llm.generate(user_prompt)
parsed = json.loads(command)

if parsed.get("action") not in ["search", "summarize"]:
    abort(400)

# Map the parsed intent to a safe parameterized query
if parsed["action"] == "search":
    cursor.execute("SELECT * FROM users WHERE username = %s", (parsed["target"],))

This approach prevents the model from generating arbitrary SQL. The application enforces an allowlist of permissible actions, and all values from model output are passed as bound parameters rather than interpolated into query text.

Broader context

LLM-mediated SQL injection is an instance of output-to-interpreter violations: the model becomes an untrusted code generator whose output flows directly into another interpreter. The same class of failures underlies LLM-mediated stored XSS. Both are mitigated by ensuring model output cannot directly reach an interpreter without structural validation and that any values passed to interpreters are properly parameterized or escaped.