Critical · CWE-502 · OWASP A08:2021 (Software and Data Integrity Failures)

How to fix Insecure Deserialization

Deserializing untrusted data is one of the highest-impact vulnerabilities in modern apps, typically RCE with a one-line payload. Java's `ObjectInputStream`, Python's `pickle.loads`, Ruby's `YAML.load`, PHP's `unserialize`, .NET's `BinaryFormatter`: each can be driven to execute attacker-chosen code when fed attacker-controlled bytes. The fix is rarely to 'sanitize input'; it's to use a different format.

What is Insecure Deserialization?

Insecure deserialization happens when an application deserializes attacker-controlled data using a format that can reconstruct objects and invoke code while doing so. Python's pickle is the canonical example: `pickle.loads(open('/tmp/cookie.pkl', 'rb').read())` will call whatever callable and arguments the attacker encoded in the stream, typically by crafting an object whose `__reduce__` method returns them. Java's `ObjectInputStream` invokes `readObject` and related hooks on the classes named in the stream, so any classpath that contains a usable gadget chain (Commons Collections, Spring, etc.) can be driven to code execution.

The vulnerability is fundamentally about format choice: serialization formats that support 'object construction with arbitrary callbacks' (pickle, Java native, PHP serialize, Ruby YAML.load, .NET BinaryFormatter) are inherently dangerous on untrusted input. JSON, MessagePack, Protobuf and other 'data-only' formats are safe by design — they describe data structures, not code.

OWASP lists this under A08:2021, and the matching weakness ID is CWE-502. The best-known demonstration is ysoserial, a Java tool that generates `ObjectInputStream` payloads from known gadget chains. Its existence means any Java app that deserializes untrusted bytes with a vulnerable library on its classpath should be treated as remotely exploitable. Python's pickle has the same property; the attack is `import os; os.system('id')` packaged as a pickle.
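
To make the pickle claim concrete, here is a minimal sketch of how a crafted `__reduce__` turns loading into a function call. The class name `Payload` is hypothetical, and `eval("6 * 7")` is a harmless stand-in for the `os.system` call a real attacker would use.

```python
import pickle


class Payload:
    """Stand-in for an attacker-crafted object (hypothetical name)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and invokes it on load.
        # eval("6 * 7") is a benign placeholder for an attacker's command.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# The loader never rebuilds a Payload instance; it runs the stored call.
result = pickle.loads(malicious_bytes)
print(result)  # prints 42
```

Note that the receiving side does not need the `Payload` class at all: the stream only references `builtins.eval`, which every Python process has.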

What an attacker can do

The concrete impact of leaving Insecure Deserialization unpatched.

Remote Code Execution (almost always)

Most exploits result in RCE — full server takeover from a single deserialization call.

Auth bypass via crafted session objects

Apps that store sessions as serialized objects (Java, .NET) can be coerced into accepting forged sessions.

Privilege escalation

Gadget chains often give direct file system and process control as the web server's user, which attackers then use as a foothold for escalating to root.

Persistence and lateral movement

RCE via deserialization is reliable — attackers use it to install backdoors and pivot to internal networks.

How do I know if I'm vulnerable?

Manual: search for deserialization calls — `pickle.loads`, `pickle.load`, `cPickle.loads`, `yaml.load` (vs safe `yaml.safe_load`), `unserialize` (PHP), `Marshal.load` (Ruby), `ObjectInputStream` (Java), `BinaryFormatter` / `NetDataContractSerializer` (.NET). Each call needs proof the input is from a trusted source.
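
The manual search can be scripted. The sketch below is one possible approach, not a replacement for a real static analyzer: the pattern table, the suffix list, and the function name `find_sinks` are illustrative assumptions, and regex matching will produce both false positives and misses.

```python
import re
from pathlib import Path

# Illustrative sink patterns taken from the list above; extend for your stack.
SINK_PATTERNS = {
    "python": re.compile(r"\b(?:cPickle|pickle|marshal)\.loads?\s*\(|\byaml\.load\s*\("),
    "php": re.compile(r"\bunserialize\s*\("),
    "ruby": re.compile(r"\bMarshal\.load\b"),
    "java": re.compile(r"\bObjectInputStream\b"),
    "dotnet": re.compile(r"\b(?:BinaryFormatter|NetDataContractSerializer)\b"),
}

# File types worth scanning (an assumption; adjust to your repository).
SOURCE_SUFFIXES = {".py", ".php", ".rb", ".java", ".cs"}


def find_sinks(root):
    """Return (path, line_number, label, source_line) for each suspicious call."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in SINK_PATTERNS.items():
                if pattern.search(line):
                    findings.append((str(path), lineno, label, line.strip()))
    return findings
```

Calling `find_sinks(".")` from the repository root returns a list you can review by hand. `yaml.safe_load` is deliberately not matched, since only `yaml.load(` is treated as a sink here.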

Automated: Semgrep's security ruleset flags all of the above with high confidence. AuditCore runs Semgrep when you provide a repo URL. For runtime detection, the AI Context Scanner reasons about deserialization patterns in your code; the SSTI scanner catches related code-execution paths.

How to fix Insecure Deserialization

5 ordered steps. Apply them in order — each builds on the previous.

Don't deserialize untrusted data — use JSON instead

If input comes from the network, a cookie, a request body, or any user-controllable source — use JSON (or another data-only format). Never pickle/yaml.load/unserialize.

JSON is restrictive by design — it describes strings, numbers, booleans, arrays, objects. No code execution paths. Same for MessagePack, Protobuf, BSON.

Step 1 (python)

# ❌ Deserializing user-controlled bytes — RCE
import pickle
data = pickle.loads(request.body)  # attacker controls request.body

# ❌ yaml.load with the full Loader is also dangerous (constructs arbitrary Python objects)
import yaml
config = yaml.load(request.body, Loader=yaml.Loader)

# ✅ JSON for arbitrary data
import json
data = json.loads(request.body)

# ✅ yaml.safe_load only loads built-in types
config = yaml.safe_load(request.body)
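
Parsing with JSON removes the code-execution path, but the result is still attacker-shaped data, so it helps to check its structure before use. A minimal sketch, assuming a hypothetical `Profile` payload with two fields; the names `Profile` and `parse_profile` are illustrative and not part of any framework.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical DTO used only for illustration."""
    username: str
    age: int


def parse_profile(body: bytes) -> Profile:
    # json.loads only builds dict/list/str/int/float/bool/None values.
    doc = json.loads(body)
    if not isinstance(doc, dict) or set(doc) != {"username", "age"}:
        raise ValueError("unexpected shape")
    username, age = doc["username"], doc["age"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(username, str) or isinstance(age, bool) or not isinstance(age, int):
        raise ValueError("unexpected field type")
    return Profile(username=username, age=age)
```

Malformed JSON raises `json.JSONDecodeError`, which is itself a `ValueError`, so callers can handle both failure modes with one `except ValueError`.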

If you must use a binary format, sign and verify

When JSON isn't an option (binary protocols, performance), sign the serialized data with HMAC and verify before deserializing. Tampering becomes detectable.

An HMAC keyed with a server-side secret means attackers can't forge a valid signature for modified data; the most they can do is replay a blob you originally signed. Putting an expiry timestamp (TTL) inside the signed payload limits how long a captured blob can be replayed.

Step 2 (python)

import base64, hashlib, hmac, os, pickle

SECRET = os.environ["SERIALIZE_SECRET"].encode()
SIG_LEN = hashlib.sha256().digest_size  # 32 bytes

def sign(data: bytes) -> bytes:
    # Prepend the HMAC-SHA256 tag so the verifier can split it off again.
    sig = hmac.new(SECRET, data, hashlib.sha256).digest()
    return base64.b64encode(sig + data)

def verify_and_load(blob: bytes):
    raw = base64.b64decode(blob)
    sig, data = raw[:SIG_LEN], raw[SIG_LEN:]
    expected = hmac.new(SECRET, data, hashlib.sha256).digest()
    # Constant-time comparison; never unpickle before this check passes.
    if not hmac.compare_digest(sig, expected):
        raise ValueError("Tampered or unsigned data")
    return pickle.loads(data)  # OK: signature verified first

# Even better: don't pickle. Use msgpack with strict types.
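
The TTL mentioned in this step only helps if the expiry is covered by the signature. Below is a minimal sketch of that idea using JSON instead of pickle as the payload format. The envelope layout (`exp` plus `data`), the function names, and the hard-coded `SECRET` are assumptions for illustration; in practice the key would come from a secret store.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"replace-with-a-real-secret"  # assumption: injected from config in practice
SIG_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign_with_expiry(payload: dict, ttl_seconds: int, now=None) -> bytes:
    issued = time.time() if now is None else now
    # The expiry lives inside the signed body, so it cannot be edited.
    envelope = {"exp": issued + ttl_seconds, "data": payload}
    body = json.dumps(envelope, separators=(",", ":")).encode()
    sig = hmac.new(SECRET, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig + body)


def verify_and_load_json(blob: bytes, now=None) -> dict:
    current = time.time() if now is None else now
    raw = base64.urlsafe_b64decode(blob)
    sig, body = raw[:SIG_LEN], raw[SIG_LEN:]
    expected = hmac.new(SECRET, body, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered or unsigned data")
    envelope = json.loads(body)  # parsed only after the signature check
    if envelope["exp"] < current:
        raise ValueError("expired")
    return envelope["data"]
```

The optional `now` parameter exists only to make the expiry logic easy to exercise in tests. Even if the key leaked, a forged blob here would yield plain JSON data rather than code execution, which is the main reason to prefer this over signed pickle.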

Java: replace ObjectInputStream with safer alternatives

Java's native serialization is fundamentally broken on untrusted input. Use Jackson (JSON), Protobuf, or MessagePack.

If you must use ObjectInputStream (legacy code), use a class allow-list via Java 9's `ObjectInputFilter` or NotSoSerial library. Otherwise: full RCE via gadget chains.

Step 3 (java)

// ❌ Vulnerable
ObjectInputStream ois = new ObjectInputStream(request.getInputStream());
Object obj = ois.readObject();

// ✅ Use Jackson for JSON
ObjectMapper mapper = new ObjectMapper();
MyDto dto = mapper.readValue(request.getInputStream(), MyDto.class);

// ✅ If ObjectInputStream is unavoidable: allow-list classes (Java 9+)
ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(
    "com.example.SafeDto;com.example.OtherSafeDto;!*"
);
ObjectInputStream filteredIn = new ObjectInputStream(request.getInputStream());
filteredIn.setObjectInputFilter(filter);  // must be set before the first readObject()
Object safeObj = filteredIn.readObject();

PHP: replace unserialize() with json_decode()

PHP's `unserialize()` triggers `__wakeup` and `__destruct` methods of any class. Magic methods become attack vectors.

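PHP 7+ supports `unserialize($data, ['allowed_classes' => false])`, which refuses to instantiate any class: objects in the payload come back as `__PHP_Incomplete_Class`, and only scalars and arrays are restored. Better: use json_decode for new code and convert legacy unserialize calls.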

Step 4 (php)

<?php
// ❌ Vulnerable — any class with __wakeup is a gadget
$data = unserialize($_POST['payload']);

// ✅ Use JSON
$data = json_decode($_POST['payload'], true);

// ✅ If unserialize is unavoidable, allow-list classes (PHP 7+)
$data = unserialize($_POST['payload'], [
    'allowed_classes' => ['MyDto', 'OtherSafeClass']
]);

// Or false to accept no classes (objects become __PHP_Incomplete_Class):
$data = unserialize($_POST['payload'], ['allowed_classes' => false]);

Audit cookies, query params, and stored data for serialized payloads

Deserialization vulns hide in places you don't expect: a 'state' cookie that's actually a base64'd pickle, a '__VIEWSTATE' parameter, a Java session cookie. Audit any opaque blob your app accepts.

If you see base64 strings in cookies/params and your code calls deserialize on them — that's the bug. Replace with signed JSON.

Step 5 (python)

import base64
import pickle

from flask import Flask, make_response, request
from itsdangerous import URLSafeTimedSerializer

app = Flask(__name__)
app.config["SECRET_KEY"] = "change-me"  # placeholder; load from the environment in practice

# Common pattern: encode app state in a cookie
# ❌ Vulnerable cookie scheme
@app.route("/setstate", methods=["POST"])
def set_state():
    state = pickle.dumps(request.get_json())
    resp = make_response("ok")
    resp.set_cookie("state", base64.b64encode(state).decode("ascii"))
    return resp

@app.route("/getstate")
def get_state():
    blob = base64.b64decode(request.cookies["state"])
    return pickle.loads(blob)  # RCE if attacker forges cookie

# ✅ Use itsdangerous or signed JWT instead
s = URLSafeTimedSerializer(app.config["SECRET_KEY"])
# encode (JSON payload plus a timestamped signature):
token = s.dumps({"theme": "dark"})  # example state dict
# decode (verifies signature and age; raises BadSignature or SignatureExpired):
state_dict = s.loads(token, max_age=3600)
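
For the audit part of this step, one option is to classify opaque cookie or parameter values by the leading bytes that native serializers emit. The sketch below is a heuristic under stated assumptions: the function names are hypothetical, it only recognizes binary pickle protocols 2 to 5, and a `None` result does not prove a value is safe.

```python
import base64
import binascii
import re

# Well-known leading bytes of native serialization formats.
JAVA_MAGIC = b"\xac\xed\x00\x05"  # Java object stream header ("rO0AB" in base64)
DOTNET_MAGIC = b"\x00\x01\x00\x00\x00\xff\xff\xff\xff"  # BinaryFormatter header
PHP_SERIALIZED = re.compile(rb'^(?:[OC]:\d+:"|a:\d+:\{)')


def _candidates(value: str):
    """Yield the raw value and, if it decodes, its base64-decoded form."""
    yield value.encode("utf-8", errors="replace")
    normalized = value.replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)
    try:
        yield base64.b64decode(normalized, validate=True)
    except (binascii.Error, ValueError):
        return


def classify_blob(value: str):
    """Return a label for a suspected serialized payload, or None."""
    for raw in _candidates(value):
        if raw.startswith(JAVA_MAGIC):
            return "java-serialized"
        if raw.startswith(DOTNET_MAGIC):
            return "dotnet-binaryformatter"
        # Binary pickles start with PROTO (0x80) + version and end with STOP (".").
        if len(raw) >= 2 and raw[0] == 0x80 and 2 <= raw[1] <= 5 and raw.endswith(b"."):
            return "python-pickle"
        if PHP_SERIALIZED.match(raw):
            return "php-serialized"
    return None
```

Running `classify_blob` over the cookies and parameters captured from a proxy log gives a quick list of values that deserve a closer look at the code that consumes them.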

How to verify the fix

Run AuditCore — Semgrep flags all common deserialization sinks. The runtime scanners detect exposed endpoints accepting binary blobs.

Manual: grep your codebase for the language-specific function names (pickle.loads, yaml.load, unserialize, ObjectInputStream, Marshal.load, BinaryFormatter). Each occurrence is a potential RCE — verify the input source. If it could ever be user-controlled, fix immediately.

FAQ

Frequently asked questions

Is JSON deserialization also vulnerable?

JSON parsing itself is safe: it only constructs primitives, arrays, and objects. The vulnerability appears when a framework lets the JSON choose which class to instantiate, for example Jackson's default typing or `@JsonTypeInfo(use = Id.CLASS)`, or Json.NET's `TypeNameHandling` set to anything other than `None`. Disable polymorphic deserialization or bind to explicit, fixed types.

What about Python's marshal module?

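marshal exists to read and write Python bytecode (`.pyc` files) and is not intended for untrusted input; the Python documentation warns that it is not secure against erroneous or maliciously constructed data. Treat it like pickle: don't use marshal for anything user-controlled.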

Is YAML always dangerous?

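Only if you use an unsafe loader. `yaml.load()` with `Loader=yaml.Loader` or `yaml.UnsafeLoader` can construct arbitrary Python objects; `yaml.safe_load()` only builds built-in types. PyYAML has warned about calling `yaml.load()` without an explicit `Loader` since version 5.1, and version 6.0 made the argument mandatory. Always use safe_load.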

Can I sandbox the deserializer?

Limited. Allow-listing classes (Java ObjectInputFilter, PHP unserialize allowed_classes) helps but bypasses are continually discovered. Better: don't deserialize untrusted input at all. Sandbox is defense-in-depth, not the fix.
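
For Python, the counterpart to those class allow-lists is overriding `pickle.Unpickler.find_class`, a technique described in the standard library documentation. A minimal sketch follows; the contents of `SAFE_GLOBALS` and the helper name `restricted_loads` are illustrative assumptions, and like the other allow-list approaches this is defense-in-depth rather than a guarantee.

```python
import builtins
import io
import pickle

# (module, name) pairs the loader may resolve. Illustrative only; keep it tiny.
SAFE_GLOBALS = {("builtins", "set"), ("builtins", "frozenset"), ("builtins", "range")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream references a global (class or function).
        if (module, name) in SAFE_GLOBALS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses any global outside SAFE_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers such as dicts, lists, and strings never reach `find_class`, so they load normally, while a stream that references `builtins.eval` or `os.system` fails with `UnpicklingError` before anything is called.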

How do I migrate legacy code with deep pickle usage?

Replace at the boundary first — never deserialize input from the network. Internal pickle (caching, persistence between trusted services) can stay if signed. Then migrate stored data: write a one-shot script that loads old pickles and re-saves as JSON. Once all stored data is converted, remove pickle imports.
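
The one-shot conversion could look roughly like the sketch below. It assumes the legacy files use a `.pkl` suffix, were written by your own services, and contain only JSON-compatible values; the function name `migrate_pickles` is hypothetical.

```python
import json
import pickle
from pathlib import Path


def migrate_pickles(directory) -> list:
    """Convert every *.pkl under `directory` to a sibling *.json file.

    Run this only on files your own services wrote: it still calls
    pickle.load, so it must never touch data from an untrusted source.
    """
    converted = []
    for src in sorted(Path(directory).rglob("*.pkl")):
        with src.open("rb") as fh:
            obj = pickle.load(fh)  # trusted legacy data only
        dst = src.with_suffix(".json")
        # json.dumps raises TypeError on non-JSON types, so objects that
        # cannot be represented fail loudly instead of being dropped.
        dst.write_text(json.dumps(obj, sort_keys=True), encoding="utf-8")
        converted.append(dst)
    return converted
```

The original `.pkl` files are left in place so the conversion can be checked before they are deleted. Objects that fail with `TypeError` need a hand-written mapping to plain dicts first.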

Related fix guides

How to fix command injection

Deserialization usually escalates to command injection via gadget chains.

How to fix SSTI

Both are 'untrusted input becomes code' patterns. Same root cause family.