When File Uploads Break XSS Defenses

Something I’ve come across repeatedly over the years is a stored cross-site scripting (XSS) vulnerability that triggers when content comes from a file upload, but not when the exact same payload is entered manually into a web form.

At first glance, this seems strange. If the platform’s XSS defenses are working, why would the source of the input matter? The answer is a classic development mistake: different code paths for different input sources, with inconsistent sanitization and encoding.

How it Happens

In most platforms, user-controlled content can enter the system through several routes:

  • Web forms (text boxes, comments, profile fields, etc.)

  • File uploads (CSV, XML, JSON, Markdown, HTML fragments, or even images with metadata)

  • APIs or background import processes

It’s common for developers to tightly validate and sanitize form input, because that’s where testers and security scanners usually focus. The file-import pipeline, however, is often forgotten: the app trusts the data, parses it with a looser parser, or runs it through different libraries.

That mismatch creates an escape hatch for attackers. Content that would be stripped or encoded if typed into a form slips right through when delivered in a file.

Common Causes

Split validation paths (most common): Form input goes through a robust sanitizer (commonly DOMPurify, or Bleach in the Python example below), while file uploads often get no sanitization at all. In practice, it might look like this:

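# Web form submission where form data is sanitized using bleach
import csv
import io

from bleach import clean
from flask import Flask, request

app = Flask(__name__)

@app.route("/submit_form", methods=["POST"])
def submit_form():
    raw = request.form["comment"]
    # Allowlist: only <b>, <i>, <u> survive, with no attributes
    sanitized = clean(raw, tags=['b', 'i', 'u'], attributes={}, strip=True)
    db.save_comment(sanitized)  # db: the app's existing storage layer
    return "", 204

# File upload route (CSV): the same comment data, but never sanitized

@app.route("/upload_csv", methods=["POST"])
def upload_csv():
    file = request.files["file"]
    # Uploads arrive as bytes; decode them so csv.reader gets text
    text = io.StringIO(file.read().decode("utf-8"), newline="")
    for row in csv.reader(text):
        db.save_comment(row[1])  # raw cell stored as-is: no sanitizer
    return "", 204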
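A minimal sketch of the fix, assuming both routes can be pointed at one shared helper: the form handler and the CSV import hand their values to a single hypothetical `store_comment()` choke point, so the two paths can't drift apart. To keep it runnable with only the standard library, `html.escape` stands in for the real sanitizer (in the Flask app above, that would be the same bleach `clean()` allowlist), and an in-memory `COMMENTS` list stands in for `db`. All function names here are illustrative.

```python
import csv
import html
import io

COMMENTS: list[str] = []  # hypothetical stand-in for db


def sanitize_comment(raw: str) -> str:
    # Placeholder for the app's one real sanitizer (e.g. the bleach allowlist).
    return html.escape(raw, quote=True)


def store_comment(raw: str) -> None:
    # The single choke point: every input source goes through the same sanitizer.
    COMMENTS.append(sanitize_comment(raw))


def handle_form(form: dict[str, str]) -> None:
    store_comment(form["comment"])


def handle_csv_upload(data: bytes) -> None:
    # Decode explicitly so csv.reader receives text, then reuse the same choke point.
    for row in csv.reader(io.StringIO(data.decode("utf-8"), newline="")):
        if len(row) > 1:
            store_comment(row[1])


payload = "<script>alert(1)</script>"
handle_form({"comment": payload})
handle_csv_upload(f'1,"{payload}"\n'.encode("utf-8"))
print(COMMENTS[0] == COMMENTS[1])  # True
```

Because neither handler can write to storage without passing through `store_comment()`, adding a third input source later (an API, a background import) inherits the same defense by default.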

Lack of normalization before validation: Uploaded data may arrive encoded differently (double-encoded HTML entities, hex character references, padding, etc.). If the sanitizer doesn’t normalize the data first, malicious input can bypass filtering. Content supplied via a web form, by contrast, is usually HTML-encoded at most once. It might look something like this:

# Form path in PHP: input is HTML-encoded before it is used
$comment = $_POST['comment'];
$comment = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

# File upload path in PHP: no normalization or encoding at all
$xml = simplexml_load_file($_FILES['file']['tmp_name']);
$comment = (string) $xml->comment; # the XML parser has already turned &#x3C; into <

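In a payload like the one below, the angle brackets are hex-encoded as character references (&#x3C; and &#x3E;), so a filter scanning the raw file never sees a literal <. But the XML parser decodes those references back to < and > as soon as it reads the file (browsers decode them too), so the value that reaches the page is live markup.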

<comment>&#x3C;script&#x3E;alert(1)&#x3C;/script&#x3E;</comment>

I couldn’t even put this payload in Squarespace’s code editor without it being decoded. See? Browsers will do it.
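To see why a raw-text check fails, here is a small sketch in Python rather than PHP, using the standard library’s `xml.etree.ElementTree` in place of SimpleXML. A hypothetical filter that scans the raw upload for `<script` lets the payload above through, while the parsed value is plain `<script>`. The `normalize()` helper is an assumed approach, not a library function: it decodes character references until the text stops changing, and the result is then HTML-encoded for output. For real uploads, a hardened parser such as defusedxml is a safer choice than ElementTree on untrusted XML.

```python
import html
import xml.etree.ElementTree as ET

# The hex-encoded payload from the example, as it would arrive in an uploaded XML file
UPLOAD = "<comment>&#x3C;script&#x3E;alert(1)&#x3C;/script&#x3E;</comment>"


def naive_filter_passes(raw: str) -> bool:
    """Hypothetical filter that inspects the raw upload for a literal <script tag."""
    return "<script" not in raw.lower()


def normalize(text: str, max_rounds: int = 5) -> str:
    """Decode character references repeatedly until the text stops changing.

    Repeating the decode also collapses double-encoded input such as &amp;#x3C;.
    """
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


def render_comment(raw_xml: str) -> str:
    # Parse (which already resolves &#x3C; to <), normalize what is left,
    # then HTML-encode for the HTML body context right before output.
    value = ET.fromstring(raw_xml).text or ""
    return html.escape(normalize(value), quote=True)


parsed = ET.fromstring(UPLOAD).text
print(naive_filter_passes(UPLOAD))  # True: no literal "<script" in the raw file
print(parsed)                       # <script>alert(1)</script>
print(render_comment(UPLOAD))       # &lt;script&gt;alert(1)&lt;/script&gt;
```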

For more filter evasion techniques, check out the OWASP XSS Filter Evasion Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html

Relying on file “type” instead of content: Developers often put a lot of emphasis on file-type validation through MIME types, extensions, and magic bytes, but forget that what’s inside the file matters just as much once it’s included in the HTML. The result is often no sanitization of the file’s contents at all. That’s similar to the first example (split validation paths), but with a different spin: sneaking a polyglot file or malicious metadata into a file that passes content-type validation. For example:

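# Trusting the file extension: the uploaded SVG is inlined into the page as-is
from flask import Flask

app = Flask(__name__)

@app.route("/profile/<user_id>")
def profile(user_id):
    # Whatever is inside the .svg file becomes live markup in the response
    with open(f"uploads/{user_id}.svg", encoding="utf-8") as f:
        svg = f.read()
    return f"<div class='avatar'>{svg}</div>"

# The attacker uploads a file containing:
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <script>alert('xss from svg')</script>
</svg>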
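One way to avoid inlining, sketched under assumptions: reference the upload through an `<img>` element, since browsers don’t run scripts inside an SVG loaded as an image, and serve the file itself with restrictive headers in case someone is sent straight to its URL. The `/uploads/` URL layout and both helper names are hypothetical, and the CSP value is one reasonable choice, not the only one.

```python
import html
from urllib.parse import quote


def avatar_html(user_id: str) -> str:
    # Reference the upload instead of inlining it. Browsers don't execute
    # scripts inside an SVG that is loaded through an <img> element.
    src = "/uploads/" + quote(user_id, safe="") + ".svg"  # hypothetical URL layout
    return f"<div class='avatar'><img src='{html.escape(src, quote=True)}' alt=''></div>"


def svg_response_headers() -> dict[str, str]:
    # Headers for the route that serves the raw file, in case a victim is sent
    # straight to the SVG URL, where it would render as a document.
    return {
        "Content-Type": "image/svg+xml",
        "X-Content-Type-Options": "nosniff",
        # default-src 'none' and sandbox both stop scripts if the file is opened directly
        "Content-Security-Policy": "default-src 'none'; style-src 'unsafe-inline'; sandbox",
    }


print(avatar_html("alice"))
# <div class='avatar'><img src='/uploads/alice.svg' alt=''></div>
```

The `Content-Type` stays `image/svg+xml` here so the `<img>` keeps working; for files that are only ever downloaded, the `application/octet-stream` approach from the checklist at the end of this post fits better.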

(Screenshot: XSS via inlined SVG)

We also see this when uploaded content containing valid HTML is loaded in an iframe on the page, or otherwise included without any sanitization applied.
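If uploaded HTML genuinely has to be shown in the page, a sandboxed frame is one option. A minimal sketch, assuming the fragment is small enough to pass inline through the `srcdoc` attribute (`sandboxed_frame()` is a hypothetical helper): an empty `sandbox` attribute disables scripts and gives the frame an opaque origin, and the markup is attribute-encoded because it lives inside `srcdoc="..."`.

```python
import html


def sandboxed_frame(uploaded_html: str) -> str:
    # An empty sandbox attribute disables scripts, forms, and popups and gives the
    # frame a unique opaque origin, so it can't reach the parent page's cookies or DOM.
    # srcdoc is an attribute value, so the uploaded markup must be attribute-encoded.
    return f'<iframe sandbox srcdoc="{html.escape(uploaded_html, quote=True)}"></iframe>'


print(sandboxed_frame("<p>Hello</p><script>alert(1)</script>"))
```

Adding `allow-scripts` together with `allow-same-origin` would undo most of this protection, because the framed content could then remove its own sandbox.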

Blacklisting instead of whitelisting: Instead of allowing only safe tags and attributes, the import filter tries to strip dangerous ones. The filter may block obvious tags like <script> but leave <body> alone, or strip event handlers like onerror= but allow onkeyup=. I keep large wordlists dedicated to finding the tag and attribute combinations that slip past blacklist filters. Blacklisting simply isn’t an effective tactic, which shouldn’t be news, yet we still see it happening. Example:

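// Regex-based "bad word" filter
const express = require("express");

const app = express();
app.use(express.urlencoded({ extended: false }));

function sanitize(input) {
  // Naive blacklist: only removes <script>...</script> pairs on a single line
  return input.replace(/<script.*?>.*?<\/script>/gi, "");
}

app.post("/submit_form", (req, res) => {
  db.saveComment(sanitize(req.body.comment)); // db: the app's storage layer
  res.sendStatus(204);
});

// Can simply be bypassed with a tag the regex never looks at:
<svg onload=alert(1)>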
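For contrast, here is a sketch of the allowlist idea using only Python’s standard `html.parser`: anything not explicitly allowed is dropped, every attribute is discarded, and text is re-encoded. It’s a teaching sketch with a hypothetical `AllowlistSanitizer` class, not a replacement for a maintained library like DOMPurify, OWASP Java HTML Sanitizer, or Bleach.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "u"}  # same allowlist as the bleach example


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        # convert_charrefs=True decodes &#x3C; etc. before handle_data sees the text
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep allowlisted tags but drop every attribute (no on*=, no href=)
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-encoded, so decoded entities can't turn back into markup
        self.out.append(escape(data))


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize("<svg onload=alert(1)>"))         # prints an empty line
print(sanitize("<b onclick=alert(1)>hi</b>"))    # <b>hi</b>
```

The filter never has to know about `<svg>`, `onload=`, or the next handler nobody has thought of yet: anything outside the three allowed tags simply never makes it into the output.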

Why this matters

Bug hunters, pentesters, and real-world adversaries love inconsistent defenses. Unfortunately, in cybersecurity, the defenders (in this case, developers) have to be right every time, while attackers only have to be right once. File upload paths are one of the most consistent ways I’ve seen to find that one opening in an application that is otherwise well hardened when it comes to sanitization and encoding.

How to address the issues shown here

The rule of thumb: all untrusted input must flow through the same defenses. Whether it comes from a text box, a JSON API, or a file upload, the processing steps should be identical:

  1. Normalize: Decode to UTF-8, collapse entities, and standardize formats before analysis.

  2. Sanitize once, centrally: Use a proven library (DOMPurify, OWASP Java HTML Sanitizer, Bleach) with a strict allowlist.

  3. Output-encode per context: HTML body vs. attribute vs. URL all require different encoders.

  4. Treat uploads as untrusted: Never trust file extensions. Set Content-Type: application/octet-stream and X-Content-Type-Options: nosniff for downloads. Avoid inline embedding unless absolutely necessary.

  5. Defense in depth: Enforce a strong Content Security Policy (CSP) to limit script execution even if something slips through.

  6. Test both paths: Testing should send identical payloads through both forms and file uploads, verifying that the sanitized output matches.
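The last step can be automated. A sketch of a parity test, assuming the two ingestion paths can be called from a test harness: `via_form()`, `via_csv_upload()`, and `find_mismatches()` are hypothetical names, and `sanitize()` is a stand-in that decodes entities once and then HTML-encodes. Any payload whose stored output differs between the paths points at a split pipeline.

```python
import csv
import html
import io
from typing import Callable

PAYLOADS = [
    "<script>alert(1)</script>",
    "<svg onload=alert(1)>",
    "&#x3C;script&#x3E;alert(1)&#x3C;/script&#x3E;",
    '"><img src=x onerror=alert(1)>',
]


def sanitize(value: str) -> str:
    # Stand-in for the app's real pipeline: normalize (decode entities once), then encode.
    return html.escape(html.unescape(value), quote=True)


def via_form(payload: str) -> str:
    # Simulates the form path: the field value is sanitized before storage.
    return sanitize(payload)


def via_csv_upload(payload: str, sanitize_uploads: bool = True) -> str:
    # Simulates the upload path: build a one-row CSV, parse it back, then store the cell.
    buf = io.StringIO()
    csv.writer(buf).writerow(["1", payload])
    row = next(csv.reader(io.StringIO(buf.getvalue(), newline="")))
    return sanitize(row[1]) if sanitize_uploads else row[1]


def find_mismatches(
    payloads: list[str], form: Callable[[str], str], upload: Callable[[str], str]
) -> list[str]:
    # Return every payload whose stored result differs between the two paths.
    return [p for p in payloads if form(p) != upload(p)]


print(find_mismatches(PAYLOADS, via_form, via_csv_upload))  # []
broken_upload = lambda p: via_csv_upload(p, sanitize_uploads=False)
print(len(find_mismatches(PAYLOADS, via_form, broken_upload)))  # 4
```

Running the same check against an upload path with sanitization switched off flags every payload, which is exactly the split-path bug from the first example.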
