Autonomous web application penetration testing — OWASP Top 10 exploitation, white-box source-aware scanning, CI/CD security gates, vulnerability report interpretation, and remediation workflows. Powered by Shannon pentest framework.
Autonomous web application penetration testing. Source-aware scanning that only reports vulnerabilities it can prove with a working exploit.
No Exploit, No Report. Every finding includes a reproducible proof-of-concept. PoC validation significantly reduces false positives, but Critical/High findings should always be manually verified (see section 4 — False Positive Identification).
| Category | What Shannon Tests | Techniques |
|---|---|---|
| SQL Injection | Union-based, blind (boolean/time), error-based, second-order | Payload fuzzing, source-guided parameter discovery |
| Command Injection | OS command injection via user input | Backtick, pipe, semicolon, $() injection patterns |
| XSS | Reflected, stored, DOM-based | Context-aware payload generation, filter bypass |
| SSRF | Internal network access, cloud metadata | http://169.254.169.254, internal service probing |
| Broken Authentication | Credential stuffing, session fixation, JWT attacks | Brute force, token manipulation, 2FA bypass |
| Broken Authorization | IDOR, privilege escalation, role bypass | Horizontal/vertical access control testing |
| WSTG Category | Area | Shannon Coverage |
|---|---|---|
| WSTG-INFO | Information Gathering | ✓ Automated |
| WSTG-CONF | Configuration Management | ✓ Automated |
| WSTG-IDNT | Identity Management | ✓ Automated |
| WSTG-ATHN | Authentication Testing | ✓ Automated |
| WSTG-ATHZ | Authorization Testing | ✓ Automated |
| WSTG-SESS | Session Management | ✓ Automated |
| WSTG-INPV | Input Validation | ✓ Automated |
| WSTG-ERRH | Error Handling | ✓ Automated |
| WSTG-CRYP | Cryptography | ◐ Partial (TLS config, weak hashing) |
| WSTG-BUSN | Business Logic | ✗ Pro only |
| WSTG-CLNT | Client-Side Testing | ✓ Automated (DOM XSS, open redirects) |
| WSTG-APIT | API Testing | ✓ Automated (REST, limited GraphQL) |
# Clone Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon
# Set API key (use >> to append if .env already exists)
echo "ANTHROPIC_API_KEY=your-key-here" >> .env
# Run against a target (black-box: no source code in the workspace)
./shannon start URL=https://target-app.example.com REPO=my-app
# Run with source code (white-box, recommended: finds more vulns)
# Place source code in workspaces/my-app/repo/ before running; the command itself is unchanged
./shannon start URL=https://target-app.example.com REPO=my-app
# Authentication config: tell Shannon how to log in
auth:
  login_url: /login
  credentials:
    - username: testuser@example.com
      password: TestPass123!
      role: user
    - username: admin@example.com
      password: AdminPass456!
      role: admin
# Scope rules
rules:
  avoid:
    - /api/admin/delete-all   # Don't hit destructive endpoints
    - /api/billing/*          # Skip billing endpoints
    - /logout                 # Don't log yourself out
  focus:
    - /api/*                  # Prioritize API endpoints
    - /dashboard/*            # Focus on authenticated surfaces
# 2FA support (if app uses TOTP)
totp:
  secret: JBSWY3DPEHPK3PXP    # PLACEHOLDER: replace with your test account's actual TOTP secret
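The `avoid` and `focus` rules above read as simple path patterns. The sketch below shows one way such rules could be evaluated. It is an assumption about the semantics, not Shannon's documented matcher: a trailing `/*` is treated as "everything under this prefix", any other pattern must match exactly, and `avoid` wins over `focus`. The names `ScopeRules`, `matchesPattern`, and `classifyPath` are hypothetical.

```typescript
// Hypothetical model of the `rules` block; not Shannon's actual schema or matcher.
type ScopeRules = { avoid: string[]; focus: string[] };
type ScopeDecision = "avoid" | "focus" | "default";

// Assumption: a pattern ending in "/*" matches any path under that prefix;
// any other pattern must equal the path exactly.
function matchesPattern(pattern: string, path: string): boolean {
  if (pattern.endsWith("/*")) {
    const prefix = pattern.slice(0, -1); // drop "*", keep the trailing slash
    return path.startsWith(prefix);
  }
  return path === pattern;
}

// Assumption: `avoid` takes precedence over `focus`.
function classifyPath(rules: ScopeRules, path: string): ScopeDecision {
  if (rules.avoid.some((p) => matchesPattern(p, path))) return "avoid";
  if (rules.focus.some((p) => matchesPattern(p, path))) return "focus";
  return "default";
}

// The rules from the config above.
const exampleRules: ScopeRules = {
  avoid: ["/api/admin/delete-all", "/api/billing/*", "/logout"],
  focus: ["/api/*", "/dashboard/*"],
};
```

Under these assumptions, `/api/billing/invoices` is classified as `avoid` even though it also matches the `/api/*` focus pattern, which is the safer default for destructive or billing endpoints.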
./shannon start URL=<url> REPO=<name> # Start full pentest
./shannon start URL=<url> REPO=<name> CONFIG=shannon.yaml # With config
./shannon workspaces # List all workspaces
./shannon logs ID=<workflow-id> # Tail live logs
./shannon query ID=<workflow-id> # Check progress
./shannon stop # Stop containers (preserves data)
./shannon stop CLEAN=true # Full cleanup — DELETES all workspace data
# WARNING: Export reports before CLEAN=true — it removes reports, PoCs, and logs
Phase 1: RECONNAISSANCE
├── Pre-Recon (source code analysis with configured LLM)
│   └── Outputs: code_analysis_deliverable.md
└── Recon (attack surface mapping with Playwright + Nmap)
    └── Outputs: recon_deliverable.md
Phase 2: VULNERABILITY ANALYSIS (5 parallel agents)
├── Injection Analysis → injection_analysis.md + exploitation_queue.json
├── XSS Analysis → xss_analysis.md + exploitation_queue.json
├── Auth Analysis → auth_analysis.md + exploitation_queue.json
├── SSRF Analysis → ssrf_analysis.md + exploitation_queue.json
└── AuthZ Analysis → authz_analysis.md + exploitation_queue.json
Phase 3: EXPLOITATION (5 parallel agents, conditional)
├── Injection Exploit → injection_exploitation_evidence.md
├── XSS Exploit → xss_exploitation_evidence.md
├── Auth Exploit → auth_exploitation_evidence.md
├── SSRF Exploit → ssrf_exploitation_evidence.md
└── AuthZ Exploit → authz_exploitation_evidence.md
Phase 4: REPORTING
└── comprehensive_security_assessment_report.md
Pre-Recon reads source code to understand the application architecture, identify entry points, map data flows, and find potential vulnerability patterns before any network interaction.
Recon maps the live attack surface: crawls the app with a headless browser, enumerates API endpoints, identifies technologies, scans for open ports.
Vulnerability Analysis agents work in parallel, each specializing in one category. They combine source code knowledge with recon data to hypothesize specific vulnerabilities and create exploitation queues.
Exploitation agents receive the queues and attempt real attacks using browser automation (Playwright) and HTTP requests. Only proven exploits are included in the final report.
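The hand-off between Phase 2 and Phase 3 can be sketched as two small filters. This is an illustration only: the real `exploitation_queue.json` format is not shown here, so `QueueItem`, `ExploitationQueues`, and `Attempt` are hypothetical shapes, and the reading of "conditional" as "run only when the queue is non-empty" is an assumption.

```typescript
// Hypothetical shapes; the real exploitation_queue.json format is not shown in this document.
type Category = "injection" | "xss" | "auth" | "ssrf" | "authz";
type QueueItem = { endpoint: string; hypothesis: string };
type ExploitationQueues = Record<Category, QueueItem[]>;

// Assumption: "conditional" means an exploit agent only starts
// when its analysis agent queued at least one hypothesis.
function planExploitation(queues: ExploitationQueues): Category[] {
  const categories: Category[] = ["injection", "xss", "auth", "ssrf", "authz"];
  return categories.filter((c) => queues[c].length > 0);
}

// "No exploit, no report": only attempts with a working exploit are kept.
type Attempt = { category: Category; endpoint: string; proven: boolean };

function reportable(attempts: Attempt[]): Attempt[] {
  return attempts.filter((a) => a.proven);
}
```

Keeping the two steps separate mirrors the pipeline: a hypothesis earns an exploitation attempt, and only a successful attempt earns a place in the report.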
| Severity | Definition | Action |
|---|---|---|
| Critical | Direct data breach, RCE, full authentication bypass | Fix immediately, consider taking app offline |
| High | Significant data exposure, privilege escalation, stored XSS | Fix within 24-48 hours |
| Medium | Limited data exposure, CSRF, reflected XSS, information disclosure | Fix within 1-2 weeks |
| Low | Minor information leaks, missing headers, verbose errors | Fix in next sprint |
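For ticketing, the Action column can be turned into a due date. The sketch below takes the upper bound of each window from the table (48 hours for High, 2 weeks for Medium); treating Critical as due immediately and Low as having no fixed date is an interpretation of "fix immediately" and "next sprint". The names `FIX_WINDOW_HOURS` and `fixDeadline` are hypothetical.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low";

const HOUR_MS = 60 * 60 * 1000;

// Upper bound of each fix window from the severity table, in hours.
// "Low" is tied to the next sprint rather than a fixed window, so it maps to null.
const FIX_WINDOW_HOURS: Record<Severity, number | null> = {
  Critical: 0, // fix immediately
  High: 48,
  Medium: 14 * 24,
  Low: null,
};

// Returns the latest acceptable fix time, or null when no fixed deadline applies.
function fixDeadline(severity: Severity, reportedAt: Date): Date | null {
  const hours = FIX_WINDOW_HOURS[severity];
  return hours === null ? null : new Date(reportedAt.getTime() + hours * HOUR_MS);
}
```

For a High finding reported at `2024-01-01T00:00:00Z`, `fixDeadline` returns `2024-01-03T00:00:00Z`.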
Each finding in the report includes:
## [CRITICAL] SQL Injection in /api/users/search
**Endpoint:** GET /api/users/search?q=
**Parameter:** q
**Type:** Union-based SQL injection
### Proof of Concept
GET /api/users/search?q=' UNION SELECT username,password,NULL FROM users--
### Response Evidence
HTTP/1.1 200 OK
[{"username":"admin","password":"$2b$12$...","3":null}]
### Source Code Reference
File: src/routes/users.ts:42
const results = await db.query(`SELECT * FROM users WHERE name LIKE '%${req.query.q}%'`);
### Remediation
Use parameterized queries:
const results = await db.query('SELECT * FROM users WHERE name LIKE $1', [`%${req.query.q}%`]);
Shannon's "no exploit, no report" policy minimizes false positives but does not eliminate them.
Always verify the PoC manually for Critical/High findings before filing tickets.
# .github/workflows/security.yml
name: Security Pentest
on:
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly, Monday 02:00 UTC
jobs:
  pentest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start test application
        run: docker compose -f docker-compose.test.yml up -d
      - name: Wait for app
        run: |
          # -f makes curl fail on HTTP errors, so an unhealthy app is not treated as ready
          for i in $(seq 1 30); do
            curl -sf http://localhost:3000/health && exit 0
            sleep 2
          done
          echo "::error::App did not become healthy within 60 seconds"
          exit 1
      - name: Run Shannon pentest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git clone https://github.com/KeygraphHQ/shannon.git /tmp/shannon
          cd /tmp/shannon
          # Scheduled runs have no PR number, so fall back to a fixed workspace name
          ./shannon start URL=http://host.docker.internal:3000 REPO=pr-${{ github.event.pull_request.number || 'scheduled' }}
      - name: Check for critical findings
        run: |
          REPORT="/tmp/shannon/workspaces/pr-${{ github.event.pull_request.number || 'scheduled' }}/comprehensive_security_assessment_report.md"
          if [ ! -f "$REPORT" ]; then
            echo "::error::Security report not found at $REPORT; pentest may have failed. Blocking deploy."
            exit 1
          fi
          # Count severity headings (format: ## [CRITICAL] or ## [HIGH])
          CRITICAL_COUNT=$(grep -c '^##.*\[CRITICAL\]' "$REPORT" || true)
          HIGH_COUNT=$(grep -c '^##.*\[HIGH\]' "$REPORT" || true)
          if [ "$CRITICAL_COUNT" -gt 0 ]; then
            echo "::error::$CRITICAL_COUNT critical vulnerabilities found! Review the security report."
            cat "$REPORT"
            exit 1
          fi
          if [ "$HIGH_COUNT" -gt 0 ]; then
            echo "::warning::$HIGH_COUNT high-severity vulnerabilities found. Manual review required."
          fi
      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: /tmp/shannon/workspaces/pr-*/comprehensive_security_assessment_report.md
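The gate logic in the "Check for critical findings" step is easier to unit-test outside of shell. The sketch below mirrors the two `grep -c` calls and the fail/warn decision. It assumes findings use the `## [SEVERITY] Title` heading format shown in the example finding; `countHeadings`, `evaluateGate`, and `GateResult` are hypothetical names, not part of Shannon.

```typescript
type GateResult = {
  critical: number;
  high: number;
  outcome: "fail" | "warn" | "pass";
};

// Count heading lines such as "## [CRITICAL] ...", line by line,
// using the same pattern as the grep calls in the workflow.
function countHeadings(report: string, severity: "CRITICAL" | "HIGH"): number {
  const pattern = new RegExp(`^##.*\\[${severity}\\]`);
  return report.split("\n").filter((line) => pattern.test(line)).length;
}

// Any Critical finding fails the gate; High findings only warn.
function evaluateGate(report: string): GateResult {
  const critical = countHeadings(report, "CRITICAL");
  const high = countHeadings(report, "HIGH");
  const outcome: GateResult["outcome"] =
    critical > 0 ? "fail" : high > 0 ? "warn" : "pass";
  return { critical, high, outcome };
}
```

A missing report is deliberately not modeled here; the workflow treats that case as a hard failure before any counting happens.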
| Pattern | When | Cost | Coverage |
|---|---|---|---|
| Full pentest on PR | Every pull request to main | ~$50/run | Complete |
| Weekly scheduled | Cron job on staging | ~$200/month | Complete |
| Quick single-category | Pre-merge for risky changes | ~$10/run | One vuln type |
| Pre-release gate | Before production deploy | ~$50/run | Complete |
Estimated costs per run (varies with model selection and pricing):
- Simple app (5-10 endpoints): ~$15-25
- Medium app (20-50 endpoints): ~$30-50
- Complex app (100+ endpoints): ~$50-100
Note: Costs are approximate and depend on the configured model. Check your
API provider's current pricing for accurate estimates.
Cost reduction strategies:
1. Use CONFIG to narrow scope (focus/avoid rules)
2. Run single-category scans for targeted checks
3. Use named workspaces to resume interrupted scans
4. Schedule full scans weekly, quick scans on PRs
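Strategy 4 can be checked with simple arithmetic. The sketch below uses the approximate per-run figures from the table (~$50 for a full pentest, ~$10 for a single-category scan). The workload of 4 weekly scans and 12 pull requests per month is a made-up example, and `monthlyCost` is a hypothetical helper.

```typescript
// Per-run figures come from the table above and are rough estimates only.
function monthlyCost(runsPerMonth: number, costPerRunUsd: number): number {
  return runsPerMonth * costPerRunUsd;
}

// Hypothetical month: 4 weekly full scans plus 12 PRs with a single-category scan each.
const weeklyFullScans = monthlyCost(4, 50); // 4 * 50 = 200
const prQuickScans = monthlyCost(12, 10); // 12 * 10 = 120
const combinedMonthly = weeklyFullScans + prQuickScans; // 320

// The same 12 PRs with a full pentest on each, for comparison.
const prFullScans = monthlyCost(12, 50); // 12 * 50 = 600
```

With these example numbers, weekly full scans plus quick PR scans cost roughly half of what running a full pentest on every pull request would.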
1. TRIAGE (Day 0)
├── Read the full report
├── Verify all Critical/High PoCs manually
├── Create tickets with severity labels
├── Assign owners and deadlines
└── Notify stakeholders for Critical findings
2. FIX (Day 1-14, based on severity)
├── Critical: same day
├── High: within 48 hours
├── Medium: within 2 weeks
└── Low: next sprint
3. VERIFY (After fix)
├── Re-run Shannon against the same target in a new workspace
│   └── ./shannon start URL=<url> REPO=<same-name> WORKSPACE=verify
├── Use a new workspace name: resuming an existing one skips completed agents
├── Confirm the PoC no longer works
└── Update ticket status
4. DOCUMENT
├── Archive the report
├── Update security runbook with new patterns
├── Add regression tests for each finding
└── Schedule next pentest
For each finding, create a permanent test:
// tests/security/sql-injection.test.ts
import request from 'supertest';
// Assumption: the app is exported from src/app; adjust the path to your project
import { app } from '../../src/app';

describe('SQL Injection regression', () => {
  it('should not be vulnerable to union-based injection in /api/users/search', async () => {
    const res = await request(app)
      .get("/api/users/search")
      .query({ q: "' UNION SELECT username,password,NULL FROM users--" });
    // Should NOT return other users' data
    expect(res.body).not.toEqual(
      expect.arrayContaining([
        expect.objectContaining({ username: 'admin' })
      ])
    );
  });

  it('should still handle a normal search', async () => {
    const res = await request(app)
      .get("/api/users/search")
      .query({ q: "test" });
    // Normal search should still work after switching to parameterized queries
    expect(res.status).toBe(200);
  });
});
Supplement with manual testing or other tools:
| Gap | Alternative |
|---|---|
| Business logic flaws | Manual review, threat modeling |
| Mobile app testing | OWASP MAS, Frida, Objection |
| Infrastructure/cloud | ScoutSuite, Prowler, CloudSploit |
| Container security | Trivy, Grype, Docker Bench |
| API rate limiting | Custom load testing (k6, Artillery) |
| GraphQL deep testing | InQL, graphql-cop |
| WebSocket testing | OWASP ZAP WebSocket plugin |
| Dependency vulnerabilities | npm audit, Snyk, Socket.dev |
| Secrets in source code | TruffleHog, GitLeaks, detect-secrets |
| CORS misconfiguration | CORScanner, manual review |
| HTTP request smuggling | smuggler, h2csmuggler |
| Race conditions / TOCTOU | Turbo Intruder, manual testing |
| Cache poisoning | Web Cache Deception Scanner |
| Host header injection | Manual review of password reset flows |
# Run alongside Shannon for full coverage:
# Dependency scanning
npm audit --production
npx snyk test
# Secret detection
trufflehog git file://. --only-verified
# Container scanning
trivy image myapp:latest
# Infrastructure
prowler aws --severity critical high
# API fuzzing
schemathesis run http://localhost:3000/openapi.json
DO:
✓ Only test applications you own or have written authorization to test
✓ Use staging/test environments, never production
✓ Create dedicated test accounts with known credentials
✓ Set scope rules to avoid destructive endpoints
✓ Review reports before sharing (may contain sensitive data)
✓ Keep API keys secure (Shannon uses significant API credits)
DON'T:
✗ Point Shannon at production systems
✗ Test third-party services without explicit written permission
✗ Share reports containing valid credentials or PII
✗ Run without scope rules on apps with destructive endpoints
✗ Ignore the cost — monitor API spend during runs
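The "never production" rule is easier to enforce with a check than by convention. A minimal sketch, assuming the team maintains an explicit allowlist of staging and test hosts: a wrapper script could call a guard like this before invoking `./shannon start`. The functions `hostOf` and `isAuthorizedTarget` are hypothetical and not part of Shannon.

```typescript
// Extract the host from an http(s) URL; returns null for anything else.
function hostOf(target: string): string | null {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(target);
  const host = match?.[1];
  return host ? host.toLowerCase() : null;
}

// Assumption: only hosts on an explicit allowlist may be scanned.
// Unknown hosts, including production, are rejected by default.
function isAuthorizedTarget(target: string, allowedHosts: string[]): boolean {
  const host = hostOf(target);
  return host !== null && allowedHosts.indexOf(host) !== -1;
}
```

An allowlist is used instead of a blocklist of production names so that a mistyped or newly created hostname fails closed.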
# docker-compose.test.yml: isolated test environment
services:
  app:
    build: .
    environment:
      - NODE_ENV=test
      - DATABASE_URL=postgres://test:test@db:5432/testdb
    ports:
      - "3000:3000"
    networks:
      - pentest-net
  db:
    image: postgres:16
    environment:
      - POSTGRES_DB=testdb
      - POSTGRES_USER=test
      - POSTGRES_PASSWORD=test
    networks:
      - pentest-net
networks:
  pentest-net:
    driver: bridge
    # Dedicated network for the test stack. A plain bridge network still allows
    # outbound internet access; set `internal: true` to block it, noting that
    # published ports are then typically unreachable from the host.