Semgrep
p/security-audit rules. Part of AuditCore's automated security audit pipeline — runs on every scan in the Enterprise tier and above, with findings normalized into a single severity-rated table.
What is Semgrep?
Semgrep is the SAST (static application security testing) workhorse for AuditCore's static analysis phase. It pattern-matches your source code against a library of security rules — finding things like hardcoded secrets, dangerous eval() usage, SQL queries built via string concatenation, and unsafe deserialization patterns. Unlike traditional SAST tools (Veracode, Checkmarx) that take hours and produce thousands of false positives, Semgrep runs in seconds and uses precise AST-based matching for ~5× lower noise.
AuditCore runs Semgrep with the `p/security-audit` ruleset, Semgrep's curated security pack covering the OWASP Top 10 plus language-specific anti-patterns. Languages supported: JavaScript/TypeScript, Python, Go, Ruby, Java, PHP, C/C++, Rust, Kotlin, Scala. We add the `p/secrets` pack for credential detection. It overlaps with gitleaks, but Semgrep matches on code structure in the checked-out files, so it can catch inline string assignments that a gitleaks scan of git history misses.
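As a rough illustration of the invocation described above, the sketch below builds a Semgrep command line with both rulesets and JSON output. The helper names (`build_semgrep_command`, `run_semgrep`), the default job count, and the overall structure are assumptions made for illustration; this is not the contents of AuditCore's scanner. The 300-second default simply mirrors the runtime cap mentioned in the FAQ.

```python
import shutil
import subprocess
from typing import List

# Rulesets named in the text above.
RULESETS = ["p/security-audit", "p/secrets"]


def build_semgrep_command(repo_path: str, jobs: int = 4) -> List[str]:
    """Build a Semgrep CLI invocation as an argument list (no shell involved)."""
    cmd = ["semgrep", "scan", "--json", "--jobs", str(jobs)]
    for ruleset in RULESETS:
        cmd += ["--config", ruleset]
    cmd.append(repo_path)
    return cmd


def run_semgrep(repo_path: str, timeout_s: int = 300) -> str:
    """Run Semgrep and return its raw JSON output as text.

    Hypothetical helper: raises subprocess.TimeoutExpired if the scan
    exceeds timeout_s seconds.
    """
    if shutil.which("semgrep") is None:
        raise RuntimeError("semgrep is not installed or not on PATH")
    completed = subprocess.run(
        build_semgrep_command(repo_path),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # the exit code is left for the caller to interpret
    )
    return completed.stdout
```

Passing the arguments as a list keeps the repository path from being interpreted by a shell, which is the same mitigation the command-injection finding on this page recommends.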
Static analysis is included in the Pro+ tiers when you provide a repository URL during scan setup. For URL-only scans, we skip Semgrep (no source code to analyze). The trade-off: SAST findings have less context than dynamic findings (we know the code is dangerous, not whether it's reachable in production), but they catch issues *before* they ship — earlier in the SDLC means cheaper to fix.
Common gotchas: the OSS edition of Semgrep won't trace data flow across files. Its taint mode works within a single function; cross-function and cross-file analysis require the commercial edition. It also misses logic bugs (broken auth, race conditions), which need our dynamic scanners. Use Semgrep findings as a hit list, not as the final word; some rules have caveats that require human judgment.
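To make the "hit list" idea concrete, here is a minimal sketch that turns Semgrep's JSON output into severity-sorted rows. The field names read from the JSON (`results`, `check_id`, `path`, `start.line`, `extra.severity`, `extra.message`) are part of Semgrep's output format. The severity labels, the sort order, and the sample rule IDs are assumptions for illustration and are not AuditCore's actual normalization.

```python
import json
from typing import Dict, List

# Hypothetical mapping from Semgrep rule severities to report labels.
SEVERITY_LABELS = {"ERROR": "High", "WARNING": "Medium", "INFO": "Low"}
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2, "Unrated": 3}


def normalize_findings(semgrep_json: str) -> List[Dict[str, object]]:
    """Flatten Semgrep JSON results into rows sorted by severity, file, line."""
    data = json.loads(semgrep_json)
    rows = []
    for result in data.get("results", []):
        extra = result.get("extra", {})
        rows.append({
            "severity": SEVERITY_LABELS.get(extra.get("severity", ""), "Unrated"),
            "rule": result.get("check_id", ""),
            "file": result.get("path", ""),
            "line": result.get("start", {}).get("line", 0),
            "message": extra.get("message", ""),
        })
    rows.sort(key=lambda r: (SEVERITY_ORDER[r["severity"]], r["file"], r["line"]))
    return rows


# Made-up sample shaped like Semgrep output; the rule IDs are placeholders.
SAMPLE_OUTPUT = json.dumps({
    "results": [
        {"check_id": "example.weak-hash", "path": "app/auth.py",
         "start": {"line": 10}, "extra": {"severity": "WARNING", "message": "MD5 used"}},
        {"check_id": "example.sql-fstring", "path": "app/db.py",
         "start": {"line": 42}, "extra": {"severity": "ERROR", "message": "SQL via f-string"}},
    ],
    "errors": [],
})

for row in normalize_findings(SAMPLE_OUTPUT):
    print(row["severity"], row["file"], row["line"], row["rule"])
```

Each row still needs human review: the label reflects the rule's severity, not whether the flagged code is reachable in production.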
What it tests
- Hardcoded secrets in source code (API keys, tokens, passwords)
- SQL injection via string concatenation (separate from runtime SQLi detection)
- Command injection via `exec`, `shell_exec`, `subprocess.run(shell=True)`
- Unsafe deserialization (`pickle.loads`, Jackson polymorphic types, Ruby `YAML.load`)
- Path traversal via unvalidated file paths
- Insecure crypto (MD5/SHA1 for passwords, ECB mode, hardcoded IVs)
- Server-side request forgery (SSRF) patterns
- XML external entity (XXE) configurations in popular parsers
- JWT misuse (alg:none acceptance, hardcoded secrets, missing verification)
- OWASP Top 10 patterns specific to React, Vue, Angular, Express, FastAPI, Spring, Django, Rails, Laravel
Where it runs in the AuditCore pipeline
Phase 5/5 · Static / Code & Mobile
Source-code, dependency and mobile-binary analysis — Semgrep rules, gitleaks secrets, Trivy CVEs, APK / IPA manifest, permissions, strings, network and native-binary hardening.
Source: scanners/semgrep_scanner.py
Sample findings
SQL query built via f-string in Python
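High. `cursor.execute(f"SELECT * FROM users WHERE id={user_id}")` is vulnerable even if `user_id` looks numeric: a value such as `1 OR 1=1` changes what the query matches, and on drivers that allow stacked statements `1; DROP TABLE users` runs as a second statement. Mitigation: use the parameterized form, `cursor.execute("SELECT * FROM users WHERE id=%s", (user_id,))`. The placeholder syntax depends on the driver (`%s` for psycopg2 and the common MySQL drivers, `?` for `sqlite3`).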
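The difference between the two forms can be reproduced with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table, the rows, and the function names are invented for the demo, and `sqlite3` uses `?` placeholders rather than the `%s` shown above. Because `sqlite3` refuses to run more than one statement per `execute` call, the payload here is `1 OR 1=1` instead of a `DROP TABLE`.

```python
import sqlite3

# In-memory demo database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)


def lookup_unsafe(conn, user_id):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT name FROM users WHERE id={user_id} ORDER BY id"
    return conn.execute(query).fetchall()


def lookup_safe(conn, user_id):
    # Parameterized: the input is bound as a value and never parsed as SQL.
    query = "SELECT name FROM users WHERE id=? ORDER BY id"
    return conn.execute(query, (user_id,)).fetchall()


payload = "1 OR 1=1"  # attacker-controlled input
print(lookup_unsafe(conn, payload))  # every row matches the injected condition
print(lookup_safe(conn, payload))    # no row has this literal string as its id
```

With the f-string the payload rewrites the `WHERE` clause and returns every user; with the bound parameter the same payload is compared as a single value and matches nothing.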
Hardcoded AWS access key in committed file
Critical. `AWS_ACCESS_KEY_ID = 'AKIA…'` found in `config/dev.py`. Even if it's a 'dev' key, it stays in git history until the history is rewritten, and it likely carries IAM permissions that should never be publicly exposed. Mitigation: rotate the key immediately, move it to an environment variable or AWS Secrets Manager, and scrub it from git history with `git filter-repo`.
subprocess call with `shell=True` and user input
Critical. `subprocess.run(f'convert {filename} output.jpg', shell=True)` allows command injection if `filename` is `'a.jpg; rm -rf /'`. Mitigation: pass arguments as a list (`subprocess.run(['convert', filename, 'output.jpg'])`) which never invokes a shell.
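A small experiment shows why the list form is safe. The sketch below does not call `convert`; as a stand-in, it launches a child Python process that prints the first argument it received, so the demo runs anywhere. The function name is hypothetical.

```python
import subprocess
import sys


def echo_first_argument(filename: str) -> str:
    """Pass filename to a child process as one argv entry; return what the child saw."""
    child_code = "import sys; print(sys.argv[1])"
    completed = subprocess.run(
        [sys.executable, "-c", child_code, filename],  # list form: no shell
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.rstrip("\n")


# The hostile value arrives as a single literal argument; nothing after
# the semicolon is executed.
print(echo_first_argument("a.jpg; rm -rf /"))
```

The list form removes shell injection but not option injection: a filename that starts with `-` can still be read as a flag by the target program, so validating the filename remains worthwhile.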
JWT decoded with signature verification disabled
Critical. `jwt.decode(token, options={'verify_signature': False})` skips signature checking entirely, so anyone can forge a JWT claiming any user identity. The effect is the same as accepting `alg: none` tokens. Mitigation: `jwt.decode(token, secret, algorithms=['HS256'])` with an explicit algorithm allow-list. Never set `verify_signature` to `False` in production.
FAQ
Does Semgrep replace my existing SAST tool (Veracode, SonarQube)?
It can replace SonarQube's security rules: Semgrep is faster (~30s vs ~30min on a medium repo), produces fewer false positives, and is free and open source. Veracode and Checkmarx have deeper enterprise features (compliance reporting, ticketing integrations, taint analysis across files) that Semgrep OSS doesn't match. If you want enterprise-grade SAST, Semgrep Pro/Enterprise exists; AuditCore uses the OSS version.
How long does Semgrep take on a typical repo?
10-60 seconds for repos up to ~100K lines. Larger monorepos can take 2-5 minutes. We run it with `--jobs` set to the number of available CPU cores. AuditCore caps Semgrep runtime at 5 minutes; past that, we report whatever it found and move on (rare for typical web apps).
Can I add custom Semgrep rules for AuditCore to run?
Not in the current self-serve flow. Custom rules support is on the Enterprise roadmap — useful for codifying internal coding standards (e.g. 'never call our deprecated `legacyAuth()` function'). Contact us if needed.
Why does Semgrep flag patterns that I know are safe?
Pattern-matching can't always tell whether user input actually reaches a sink. A finding like 'SQL via string concat' might be safe if the string is hardcoded or comes from a trusted enum. Suppress these by adding a `nosemgrep` comment on the flagged line (for example `# nosemgrep` in Python); Semgrep respects inline suppressions. For broader exclusions, add a `.semgrepignore` file. AuditCore reports raw findings; tuning happens in your repo.
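As an example of a finding that is reasonable to suppress, the sketch below interpolates a column name into SQL, but only after looking it up in a fixed allow-list. The table, the allow-list, and the function name are hypothetical. The comment is the bare `# nosemgrep` form, which silences every rule on that line; Semgrep also accepts `# nosemgrep: <rule-id>` to suppress a single rule.

```python
import sqlite3

# Column names come from a fixed allow-list, never directly from user input.
SORTABLE_COLUMNS = {"name": "name", "newest": "created_at"}


def list_users(conn: sqlite3.Connection, sort_key: str):
    """Return user names ordered by an allow-listed column."""
    column = SORTABLE_COLUMNS.get(sort_key, "name")  # unknown keys fall back to "name"
    # Identifiers cannot be bound as parameters, so the column is interpolated.
    # The inline comment goes on the first line of the reported match.
    return conn.execute(f"SELECT name FROM users ORDER BY {column}").fetchall()  # nosemgrep
```

Keeping a short justification next to each suppression, as in the comments above, makes it easier to revisit the decision when the surrounding code changes.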
Does Semgrep work on minified JavaScript or compiled binaries?
No. Semgrep needs human-readable source code with stable structure. For minified bundles, run Semgrep against the original source before bundling. For compiled binaries, AuditCore uses Trivy (dependency CVE) and our APK Binary Analyzer (mobile binaries) instead.