
security-pentester

dev · v1.0.0

Autonomous web application penetration testing — OWASP Top 10 exploitation, white-box source-aware scanning, CI/CD security gates, vulnerability report interpretation, and remediation workflows. Powered by Shannon pentest framework.


Security Pentester

Autonomous web application penetration testing. Source-aware scanning that only reports vulnerabilities it can prove with a working exploit.

Core Principle

No Exploit, No Report. Every finding includes a reproducible proof-of-concept. PoC validation significantly reduces false positives, but Critical/High findings should always be manually verified (see section 4 — False Positive Identification).


1. Vulnerability Coverage

OWASP Top 10 Testing Matrix

| Category | What Shannon Tests | Techniques |
|---|---|---|
| SQL Injection | Union-based, blind (boolean/time), error-based, second-order | Payload fuzzing, source-guided parameter discovery |
| Command Injection | OS command injection via user input | Backtick, pipe, semicolon, $() injection patterns |
| XSS | Reflected, stored, DOM-based | Context-aware payload generation, filter bypass |
| SSRF | Internal network access, cloud metadata | http://169.254.169.254, internal service probing |
| Broken Authentication | Credential stuffing, session fixation, JWT attacks | Brute force, token manipulation, 2FA bypass |
| Broken Authorization | IDOR, privilege escalation, role bypass | Horizontal/vertical access control testing |

OWASP Web Security Testing Guide (WSTG) Coverage

WSTG-INFO  — Information Gathering            ✓ Automated
WSTG-CONF  — Configuration Management         ✓ Automated
WSTG-IDNT  — Identity Management              ✓ Automated
WSTG-ATHN  — Authentication Testing           ✓ Automated
WSTG-ATHZ  — Authorization Testing            ✓ Automated
WSTG-SESS  — Session Management               ✓ Automated
WSTG-INPV  — Input Validation                 ✓ Automated
WSTG-ERRH  — Error Handling                   ✓ Automated
WSTG-CRYP  — Cryptography                     ◐ Partial (TLS config, weak hashing)
WSTG-BUSL  — Business Logic                   ✗ Pro only
WSTG-CLNT  — Client-Side Testing              ✓ Automated (DOM XSS, open redirects)
WSTG-APIT  — API Testing                      ✓ Automated (REST, limited GraphQL)

2. Running a Pentest

Quick Start

# Clone Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

# Set API key (use >> to append if .env already exists)
echo "ANTHROPIC_API_KEY=your-key-here" >> .env

# Run against a target (black-box: no source code in the workspace)
./shannon start URL=https://target-app.example.com REPO=my-app

# Run with source code (white-box; recommended, finds more vulns)
# Same command: place the source code in workspaces/my-app/repo/ before running
./shannon start URL=https://target-app.example.com REPO=my-app

Configuration (shannon.yaml)

# Authentication config — tell Shannon how to log in
auth:
  login_url: /login
  credentials:
    - username: testuser@example.com
      password: TestPass123!
      role: user
    - username: admin@example.com
      password: AdminPass456!
      role: admin

# Scope rules
rules:
  avoid:
    - /api/admin/delete-all    # Don't hit destructive endpoints
    - /api/billing/*           # Skip billing endpoints
    - /logout                  # Don't log yourself out
  focus:
    - /api/*                   # Prioritize API endpoints
    - /dashboard/*             # Focus on authenticated surfaces

# 2FA support (if app uses TOTP)
totp:
  secret: JBSWY3DPEHPK3PXP   # PLACEHOLDER — replace with your test account's actual TOTP secret
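
To make the scope rules concrete, here is a minimal sketch of how `avoid` and `focus` patterns could be matched against request paths. It is an illustration only: this page does not specify how Shannon interprets these patterns, so `globToRegExp`, `classifyPath`, the "`*` matches any run of characters" semantics, and the avoid-before-focus precedence are all assumptions.

```typescript
// Hypothetical illustration of avoid/focus scope rules; not Shannon's actual matcher.
type ScopeRules = { avoid: string[]; focus: string[] };
type ScopeDecision = "avoid" | "focus" | "default";

// Convert a simple glob, where `*` matches any run of characters, into an anchored RegExp.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Assumption: avoid rules take precedence over focus rules.
function classifyPath(path: string, rules: ScopeRules): ScopeDecision {
  if (rules.avoid.some((p) => globToRegExp(p).test(path))) return "avoid";
  if (rules.focus.some((p) => globToRegExp(p).test(path))) return "focus";
  return "default";
}

// Same patterns as the shannon.yaml example above.
const rules: ScopeRules = {
  avoid: ["/api/admin/delete-all", "/api/billing/*", "/logout"],
  focus: ["/api/*", "/dashboard/*"],
};

console.log(classifyPath("/api/billing/invoices", rules)); // avoid
console.log(classifyPath("/api/users/search", rules)); // focus
console.log(classifyPath("/about", rules)); // default
```

Under these assumed semantics, `/api/billing/invoices` matches both `/api/billing/*` and `/api/*`, and the avoid rule wins. Confirm the real precedence against Shannon's own documentation before relying on overlapping patterns.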

CLI Commands

./shannon start URL=<url> REPO=<name>    # Start full pentest
./shannon start URL=<url> REPO=<name> CONFIG=shannon.yaml  # With config
./shannon workspaces                      # List all workspaces
./shannon logs ID=<workflow-id>           # Tail live logs
./shannon query ID=<workflow-id>          # Check progress
./shannon stop                            # Stop containers (preserves data)
./shannon stop CLEAN=true                 # Full cleanup — DELETES all workspace data
# WARNING: Export reports before CLEAN=true — it removes reports, PoCs, and logs

3. Understanding the Pipeline

4-Phase Architecture

Phase 1: RECONNAISSANCE
  ├── Pre-Recon (source code analysis with configured LLM)
  │   └── Outputs: code_analysis_deliverable.md
  └── Recon (attack surface mapping with Playwright + Nmap)
      └── Outputs: recon_deliverable.md

Phase 2: VULNERABILITY ANALYSIS (5 parallel agents)
  ├── Injection Analysis   → injection_analysis.md + exploitation_queue.json
  ├── XSS Analysis         → xss_analysis.md + exploitation_queue.json
  ├── Auth Analysis        → auth_analysis.md + exploitation_queue.json
  ├── SSRF Analysis        → ssrf_analysis.md + exploitation_queue.json
  └── AuthZ Analysis       → authz_analysis.md + exploitation_queue.json

Phase 3: EXPLOITATION (5 parallel agents, conditional)
  ├── Injection Exploit    → injection_exploitation_evidence.md
  ├── XSS Exploit          → xss_exploitation_evidence.md
  ├── Auth Exploit         → auth_exploitation_evidence.md
  ├── SSRF Exploit         → ssrf_exploitation_evidence.md
  └── AuthZ Exploit        → authz_exploitation_evidence.md

Phase 4: REPORTING
  └── comprehensive_security_assessment_report.md

What Each Phase Does

Pre-Recon reads source code to understand the application architecture, identify entry points, map data flows, and find potential vulnerability patterns before any network interaction.

Recon maps the live attack surface: crawls the app with a headless browser, enumerates API endpoints, identifies technologies, scans for open ports.

Vulnerability Analysis agents work in parallel, each specializing in one category. They combine source code knowledge with recon data to hypothesize specific vulnerabilities and create exploitation queues.

Exploitation agents receive the queues and attempt real attacks using browser automation (Playwright) and HTTP requests. Only proven exploits are included in the final report.
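
The hand-off between phases can be sketched as a small data model. This is a hypothetical illustration, not Shannon's real schema: the `QueueEntry` and `ExploitResult` fields, the function names, and the sample data are invented. It also assumes that "conditional" in Phase 3 means an exploit agent runs only when its category has queued hypotheses.

```typescript
// Hypothetical model of Phases 2-4; field names are illustrative, not Shannon's real schema.
type Category = "injection" | "xss" | "auth" | "ssrf" | "authz";

interface QueueEntry {
  category: Category;
  endpoint: string;
  hypothesis: string;
}

interface ExploitResult {
  entry: QueueEntry;
  exploited: boolean; // true only if a working proof-of-concept was produced
}

// Phase 3 (assumed): an exploit agent is needed only for categories with queued hypotheses.
function categoriesToExploit(queue: QueueEntry[]): Category[] {
  const seen: Category[] = [];
  for (const entry of queue) {
    if (seen.indexOf(entry.category) === -1) seen.push(entry.category);
  }
  return seen;
}

// Phase 4: "No Exploit, No Report" keeps only hypotheses that were proven.
function reportableFindings(results: ExploitResult[]): QueueEntry[] {
  return results.filter((r) => r.exploited).map((r) => r.entry);
}

// Sample data (made up for this sketch).
const queue: QueueEntry[] = [
  { category: "injection", endpoint: "/api/users/search", hypothesis: "q is concatenated into SQL" },
  { category: "xss", endpoint: "/dashboard/profile", hypothesis: "display name rendered unescaped" },
];
const results: ExploitResult[] = [
  { entry: queue[0], exploited: true },
  { entry: queue[1], exploited: false },
];

console.log(categoriesToExploit(queue)); // injection and xss agents would run
console.log(reportableFindings(results).map((f) => f.endpoint)); // only /api/users/search is reported
```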


4. Interpreting Reports

Severity Levels

| Severity | Definition | Action |
|---|---|---|
| Critical | Direct data breach, RCE, full authentication bypass | Fix immediately, consider taking app offline |
| High | Significant data exposure, privilege escalation, stored XSS | Fix within 24-48 hours |
| Medium | Limited data exposure, CSRF, reflected XSS, information disclosure | Fix within 1-2 weeks |
| Low | Minor information leaks, missing headers, verbose errors | Fix in next sprint |

Reading a Finding

Each finding in the report includes:

## [CRITICAL] SQL Injection in /api/users/search

**Endpoint:** GET /api/users/search?q=
**Parameter:** q
**Type:** Union-based SQL injection

### Proof of Concept
GET /api/users/search?q=' UNION SELECT username,password,NULL FROM users--

### Response Evidence
HTTP/1.1 200 OK
[{"username":"admin","password":"$2b$12$...","3":null}]

### Source Code Reference
File: src/routes/users.ts:42
const results = await db.query(`SELECT * FROM users WHERE name LIKE '%${req.query.q}%'`);

### Remediation
Use parameterized queries:
const results = await db.query('SELECT * FROM users WHERE name LIKE $1', [`%${req.query.q}%`]);
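
When triaging a long report, it can help to pull out just the finding headings. The sketch below assumes every finding heading follows the `## [SEVERITY] Title` format of the sample above. Only `[CRITICAL]` and `[HIGH]` appear on this page, so the `[MEDIUM]` and `[LOW]` labels, the function names, and the sample report lines are assumptions.

```typescript
// Assumes finding headings look like "## [SEVERITY] Title", as in the sample finding.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface FindingHeading {
  severity: Severity;
  title: string;
}

// Extract one entry per finding heading; sub-headings such as "### Proof of Concept" are skipped.
function parseFindingHeadings(report: string): FindingHeading[] {
  const headingPattern = /^##\s*\[(CRITICAL|HIGH|MEDIUM|LOW)\]\s*(.+)$/;
  const findings: FindingHeading[] = [];
  for (const line of report.split(/\r?\n/)) {
    const match = headingPattern.exec(line);
    if (match) {
      findings.push({ severity: match[1] as Severity, title: match[2].trim() });
    }
  }
  return findings;
}

// Tally findings per severity level.
function countBySeverity(findings: FindingHeading[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { CRITICAL: 0, HIGH: 0, MEDIUM: 0, LOW: 0 };
  for (const finding of findings) counts[finding.severity] += 1;
  return counts;
}

// Sample report excerpt; the second finding is made up for this sketch.
const sampleReport = [
  "## [CRITICAL] SQL Injection in /api/users/search",
  "### Proof of Concept",
  "## [HIGH] Stored XSS in /dashboard/profile",
].join("\n");

const counts = countBySeverity(parseFindingHeadings(sampleReport));
console.log(counts.CRITICAL, counts.HIGH); // 1 1
```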

False Positive Identification

Shannon's "no exploit, no report" policy minimizes false positives, but review for:

  • Environment-specific: Exploit only works in test environment (different DB, debug mode)
  • Already mitigated: WAF or middleware blocks the attack in production but not staging
  • Intended behavior: Feature that looks like a vulnerability (e.g., admin search returns all users by design)
  • LLM hallucination: Report claims a vulnerability but the PoC doesn't actually demonstrate impact

Always verify the PoC manually for Critical/High findings before filing tickets.


5. CI/CD Integration

Pre-Deploy Security Gate

# .github/workflows/security.yml
name: Security Pentest
on:
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly Monday 2am

jobs:
  pentest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start test application
        run: docker compose -f docker-compose.test.yml up -d

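      - name: Wait for app
        run: |
          # -f makes curl fail on HTTP errors; fail the job if the app never becomes healthy
          for i in $(seq 1 30); do
            if curl -sf http://localhost:3000/health > /dev/null; then
              exit 0
            fi
            sleep 2
          done
          echo "::error::App did not become healthy within 60 seconds"
          exit 1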

      - name: Run Shannon pentest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          # Scheduled runs have no PR number, so fall back to the run ID
          WORKSPACE_NAME: pr-${{ github.event.pull_request.number || github.run_id }}
        run: |
          git clone https://github.com/KeygraphHQ/shannon.git /tmp/shannon
          cd /tmp/shannon
          ./shannon start URL=http://host.docker.internal:3000 REPO="$WORKSPACE_NAME"

      - name: Check for critical findings
        env:
          WORKSPACE_NAME: pr-${{ github.event.pull_request.number || github.run_id }}
        run: |
          REPORT="/tmp/shannon/workspaces/$WORKSPACE_NAME/comprehensive_security_assessment_report.md"
          if [ ! -f "$REPORT" ]; then
            echo "::error::Security report not found at $REPORT — pentest may have failed. Blocking deploy."
            exit 1
          fi
          # Count severity headings (format: ## [CRITICAL] or ## [HIGH])
          CRITICAL_COUNT=$(grep -c '^##.*\[CRITICAL\]' "$REPORT" || true)
          HIGH_COUNT=$(grep -c '^##.*\[HIGH\]' "$REPORT" || true)
          if [ "$CRITICAL_COUNT" -gt 0 ]; then
            echo "::error::$CRITICAL_COUNT critical vulnerabilities found! Review the security report."
            cat "$REPORT"
            exit 1
          fi
          if [ "$HIGH_COUNT" -gt 0 ]; then
            echo "::warning::$HIGH_COUNT high-severity vulnerabilities found. Manual review required."
          fi

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: /tmp/shannon/workspaces/pr-*/comprehensive_security_assessment_report.md

Integration Patterns

| Pattern | When | Cost | Coverage |
|---|---|---|---|
| Full pentest on PR | Every pull request to main | ~$50/run | Complete |
| Weekly scheduled | Cron job on staging | ~$200/month | Complete |
| Quick single-category | Pre-merge for risky changes | ~$10/run | One vuln type |
| Pre-release gate | Before production deploy | ~$50/run | Complete |

Cost Management

Estimated costs per run (varies with model selection and pricing):
- Simple app (5-10 endpoints):     ~$15-25
- Medium app (20-50 endpoints):    ~$30-50
- Complex app (100+ endpoints):    ~$50-100

Note: Costs are approximate and depend on the configured model. Check your
API provider's current pricing for accurate estimates.

Cost reduction strategies:
1. Use CONFIG to narrow scope (focus/avoid rules)
2. Run single-category scans for targeted checks
3. Use named workspaces to resume interrupted scans
4. Schedule full scans weekly, quick scans on PRs
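
As a worked example of the arithmetic behind these estimates, the sketch below multiplies the per-run ranges above by a run count. The per-run figures are copied from this section; the 30 PRs per month and the names `PER_RUN_COST` and `monthlyCost` are assumptions for illustration.

```typescript
// Per-run cost ranges in USD, copied from the estimates above; real costs depend on model and pricing.
interface CostRange {
  low: number;
  high: number;
}

const PER_RUN_COST = {
  simple: { low: 15, high: 25 }, // 5-10 endpoints
  medium: { low: 30, high: 50 }, // 20-50 endpoints
  complex: { low: 50, high: 100 }, // 100+ endpoints
};

// Scale a per-run range by the number of runs in a month.
function monthlyCost(perRun: CostRange, runsPerMonth: number): CostRange {
  return { low: perRun.low * runsPerMonth, high: perRun.high * runsPerMonth };
}

// Weekly scheduled scan of a medium app: roughly 4 runs per month.
const weekly = monthlyCost(PER_RUN_COST.medium, 4);
console.log(weekly.low, weekly.high); // 120 200

// Full pentest on every PR, assuming a hypothetical 30 PRs per month.
const perPr = monthlyCost(PER_RUN_COST.medium, 30);
console.log(perPr.low, perPr.high); // 900 1500
```

The gap between the two totals shows why full scans fit a weekly schedule while per-PR checks are better kept narrow.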

6. Post-Pentest Workflow

Triage → Fix → Verify

1. TRIAGE (Day 0)
   ├── Read the full report
   ├── Verify all Critical/High PoCs manually
   ├── Create tickets with severity labels
   ├── Assign owners and deadlines
   └── Notify stakeholders for Critical findings

2. FIX (Day 1-14, based on severity)
   ├── Critical: same day
   ├── High: within 48 hours
   ├── Medium: within 2 weeks
   └── Low: next sprint

3. VERIFY (After fix)
   ├── Re-run Shannon against the same target in a separate workspace
   │   └── ./shannon start URL=<url> REPO=<same-name> WORKSPACE=verify
   ├── Completed agents are skipped when a workspace is resumed, so a re-test needs its own workspace name
   ├── Confirm the PoC no longer works
   └── Update ticket status

4. DOCUMENT
   ├── Archive the report
   ├── Update security runbook with new patterns
   ├── Add regression tests for each finding
   └── Schedule next pentest
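
The fix windows in step 2 can be turned into ticket due dates. This is a minimal sketch: `FIX_WINDOW_DAYS` and `fixDueDate` are hypothetical names. "Same day" is modeled as zero days, and "next sprint" as `null` because its length depends on the team.

```typescript
// Fix windows from the FIX step above, in days after triage.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

const FIX_WINDOW_DAYS: Record<Severity, number | null> = {
  CRITICAL: 0, // same day
  HIGH: 2, // within 48 hours
  MEDIUM: 14, // within 2 weeks
  LOW: null, // next sprint: no fixed date
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the due date for a finding, or null when the fix is deferred to the next sprint.
function fixDueDate(severity: Severity, triagedAt: Date): Date | null {
  const days = FIX_WINDOW_DAYS[severity];
  if (days === null) return null;
  return new Date(triagedAt.getTime() + days * MS_PER_DAY);
}

const triagedAt = new Date("2025-01-06T09:00:00Z");
console.log(fixDueDate("HIGH", triagedAt)?.toISOString()); // 2025-01-08T09:00:00.000Z
console.log(fixDueDate("MEDIUM", triagedAt)?.toISOString()); // 2025-01-20T09:00:00.000Z
console.log(fixDueDate("LOW", triagedAt)); // null
```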

Regression Testing

For each finding, create a permanent test:

// tests/security/sql-injection.test.ts
// Assumes Jest and supertest; the app import path is hypothetical, adjust it to your project.
import request from 'supertest';
import { app } from '../../src/app';

describe('SQL Injection regression', () => {
  it('should not be vulnerable to union-based injection in /api/users/search', async () => {
    // Replay the exact payload from the report's proof of concept
    const res = await request(app)
      .get("/api/users/search")
      .query({ q: "' UNION SELECT username,password,NULL FROM users--" });

    // Should NOT return other users' data
    expect(res.body).not.toEqual(
      expect.arrayContaining([
        expect.objectContaining({ username: 'admin' })
      ])
    );
  });

  it('should still return results for a normal search', async () => {
    // Guards against a fix that blocks the endpoint entirely
    const res = await request(app)
      .get("/api/users/search")
      .query({ q: "test" });

    expect(res.status).toBe(200);
  });
});

7. What Shannon Doesn't Cover

Supplement with manual testing or other tools:

| Gap | Alternative |
|---|---|
| Business logic flaws | Manual review, threat modeling |
| Mobile app testing | OWASP MAS, Frida, Objection |
| Infrastructure/cloud | ScoutSuite, Prowler, CloudSploit |
| Container security | Trivy, Grype, Docker Bench |
| API rate limiting | Custom load testing (k6, Artillery) |
| GraphQL deep testing | InQL, graphql-cop |
| WebSocket testing | OWASP ZAP WebSocket plugin |
| Dependency vulnerabilities | npm audit, Snyk, Socket.dev |
| Secrets in source code | TruffleHog, GitLeaks, detect-secrets |
| CORS misconfiguration | CORScanner, manual review |
| HTTP request smuggling | smuggler, h2csmuggler |
| Race conditions / TOCTOU | Turbo Intruder, manual testing |
| Cache poisoning | Web Cache Deception Scanner |
| Host header injection | Manual review of password reset flows |

Complementary Tool Stack

# Run alongside Shannon for full coverage:

# Dependency scanning
npm audit --omit=dev   # replaces the deprecated --production flag
npx snyk test

# Secret detection
trufflehog git file://. --only-verified

# Container scanning
trivy image myapp:latest

# Infrastructure
prowler aws --severity critical high

# API fuzzing
schemathesis run http://localhost:3000/openapi.json

8. Safe Testing Practices

Rules of Engagement

DO:
  ✓ Only test applications you own or have written authorization to test
  ✓ Use staging/test environments, never production
  ✓ Create dedicated test accounts with known credentials
  ✓ Set scope rules to avoid destructive endpoints
  ✓ Review reports before sharing (may contain sensitive data)
  ✓ Keep API keys secure (Shannon uses significant API credits)

DON'T:
  ✗ Point Shannon at production systems
  ✗ Test third-party services without explicit written permission
  ✗ Share reports containing valid credentials or PII
  ✗ Run without scope rules on apps with destructive endpoints
  ✗ Ignore the cost — monitor API spend during runs
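
One way to enforce the "never production" rule mechanically is a pre-flight check that refuses any target outside an explicit allowlist. The sketch below is hypothetical: `isAllowedTarget` and the host names are illustrative and not part of Shannon.

```typescript
// Hypothetical pre-flight guard: only hosts on an explicit allowlist may be scanned.
function isAllowedTarget(targetUrl: string, allowedHosts: string[]): boolean {
  let host: string;
  try {
    host = new URL(targetUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are rejected
  }
  // Exact host match only, so www.example.com is not implied by staging.example.com.
  return allowedHosts.some((allowed) => host === allowed.toLowerCase());
}

// Example allowlist (made-up hosts).
const allowedHosts = ["staging.example.com", "localhost"];

console.log(isAllowedTarget("https://staging.example.com/login", allowedHosts)); // true
console.log(isAllowedTarget("http://localhost:3000", allowedHosts)); // true
console.log(isAllowedTarget("https://www.example.com", allowedHosts)); // false
```

A wrapper script could run such a check before invoking `./shannon start` and abort when it returns `false`.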

Test Environment Setup

# docker-compose.test.yml — isolated test environment
services:
  app:
    build: .
    environment:
      - NODE_ENV=test
      - DATABASE_URL=postgres://test:test@db:5432/testdb
    ports:
      - "3000:3000"
    networks:
      - pentest-net

  db:
    image: postgres:16
    environment:
      - POSTGRES_DB=testdb
      - POSTGRES_USER=test
      - POSTGRES_PASSWORD=test
    networks:
      - pentest-net

networks:
  pentest-net:
    driver: bridge
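    # Dedicated network for the test stack. A plain bridge network still allows
    # outbound internet access; set `internal: true` to block egress (published
    # ports on an internal-only network are then not reachable from the host).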