How do I install the security-pentester skill?

Run: npx skills-ws install security-pentester. This works with OpenClaw, Claude Code, Cursor, and Codex.

security-pentester

dev · v1.0.0

Autonomous web application penetration testing — OWASP Top 10 exploitation, white-box source-aware scanning, CI/CD security gates, vulnerability report interpretation, and remediation workflows. Powered by Shannon pentest framework.


Security Pentester

Autonomous web application penetration testing. Source-aware scanning that only reports vulnerabilities it can prove with a working exploit.

Core Principle

No Exploit, No Report. Every finding includes a reproducible proof-of-concept. PoC validation significantly reduces false positives, but Critical/High findings should always be manually verified (see section 4 — False Positive Identification).

1. Vulnerability Coverage

OWASP Top 10 Testing Matrix

Category	What Shannon Tests	Techniques
SQL Injection	Union-based, blind (boolean/time), error-based, second-order	Payload fuzzing, source-guided parameter discovery
Command Injection	OS command injection via user input	Backtick, pipe, semicolon, `$()` injection patterns
XSS	Reflected, stored, DOM-based	Context-aware payload generation, filter bypass
SSRF	Internal network access, cloud metadata	`http://169.254.169.254`, internal service probing
Broken Authentication	Credential stuffing, session fixation, JWT attacks	Brute force, token manipulation, 2FA bypass
Broken Authorization	IDOR, privilege escalation, role bypass	Horizontal/vertical access control testing

OWASP Web Security Testing Guide (WSTG) Coverage

WSTG-INFO  — Information Gathering            ✓ Automated
WSTG-CONF  — Configuration Management         ✓ Automated
WSTG-IDNT  — Identity Management              ✓ Automated
WSTG-ATHN  — Authentication Testing           ✓ Automated
WSTG-ATHZ  — Authorization Testing            ✓ Automated
WSTG-SESS  — Session Management               ✓ Automated
WSTG-INPV  — Input Validation                 ✓ Automated
WSTG-ERRH  — Error Handling                   ✓ Automated
WSTG-CRYP  — Cryptography                     ◐ Partial (TLS config, weak hashing)
WSTG-BUSL  — Business Logic                   ✗ Pro only
WSTG-CLNT  — Client-Side Testing              ✓ Automated (DOM XSS, open redirects)
WSTG-APIT  — API Testing                      ✓ Automated (REST, limited GraphQL)

2. Running a Pentest

Quick Start

# Clone Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

# Set API key (use >> to append if .env already exists)
echo "ANTHROPIC_API_KEY=your-key-here" >> .env

# Run against a target (black-box)
./shannon start URL=https://target-app.example.com REPO=my-app

# Run with source code (white-box, recommended: finds more vulnerabilities)
# Same command; place the source code in workspaces/my-app/repo/ before running
./shannon start URL=https://target-app.example.com REPO=my-app

Configuration (shannon.yaml)

# Authentication config — tell Shannon how to log in
auth:
  login_url: /login
  credentials:
    - username: testuser@example.com
      password: TestPass123!
      role: user
    - username: admin@example.com
      password: AdminPass456!
      role: admin

# Scope rules
rules:
  avoid:
    - /api/admin/delete-all    # Don't hit destructive endpoints
    - /api/billing/*           # Skip billing endpoints
    - /logout                  # Don't log yourself out
  focus:
    - /api/*                   # Prioritize API endpoints
    - /dashboard/*             # Focus on authenticated surfaces

# 2FA support (if app uses TOTP)
totp:
  secret: JBSWY3DPEHPK3PXP   # PLACEHOLDER — replace with your test account's actual TOTP secret
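
How `avoid` and `focus` rules could be applied to a discovered path can be sketched as follows. This is an illustration only: the `ScopeRules` type, the `classifyPath` function, and the matching semantics (a `*` matches any run of characters, `avoid` wins over `focus`) are assumptions, not Shannon's documented behavior.

```typescript
// Hypothetical model of the `rules` section in shannon.yaml.
type ScopeRules = { avoid: string[]; focus: string[] };
type ScopeDecision = "avoid" | "focus" | "default";

// Convert a simple glob ("/api/*") into an anchored regular expression.
// Assumption: "*" matches any run of characters, everything else is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Assumption: avoid rules take precedence over focus rules.
function classifyPath(path: string, rules: ScopeRules): ScopeDecision {
  if (rules.avoid.some((g) => globToRegExp(g).test(path))) return "avoid";
  if (rules.focus.some((g) => globToRegExp(g).test(path))) return "focus";
  return "default";
}

const rules: ScopeRules = {
  avoid: ["/api/admin/delete-all", "/api/billing/*", "/logout"],
  focus: ["/api/*", "/dashboard/*"],
};

const decisions = ["/api/billing/invoices", "/api/users/search", "/about"].map(
  (p) => classifyPath(p, rules)
);
// decisions is ["avoid", "focus", "default"]
```

Checking `avoid` first is the conservative choice here: a path such as `/api/billing/invoices` matches both `/api/billing/*` and `/api/*`, and it is safer to skip it than to prioritize it.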

CLI Commands

./shannon start URL=<url> REPO=<name>    # Start full pentest
./shannon start URL=<url> REPO=<name> CONFIG=shannon.yaml  # With config
./shannon workspaces                      # List all workspaces
./shannon logs ID=<workflow-id>           # Tail live logs
./shannon query ID=<workflow-id>          # Check progress
./shannon stop                            # Stop containers (preserves data)
./shannon stop CLEAN=true                 # Full cleanup — DELETES all workspace data
# WARNING: Export reports before CLEAN=true — it removes reports, PoCs, and logs

3. Understanding the Pipeline

4-Phase Architecture

Phase 1: RECONNAISSANCE
  ├── Pre-Recon (source code analysis with configured LLM)
  │   └── Outputs: code_analysis_deliverable.md
  └── Recon (attack surface mapping with Playwright + Nmap)
      └── Outputs: recon_deliverable.md

Phase 2: VULNERABILITY ANALYSIS (5 parallel agents)
  ├── Injection Analysis   → injection_analysis.md + exploitation_queue.json
  ├── XSS Analysis         → xss_analysis.md + exploitation_queue.json
  ├── Auth Analysis        → auth_analysis.md + exploitation_queue.json
  ├── SSRF Analysis        → ssrf_analysis.md + exploitation_queue.json
  └── AuthZ Analysis       → authz_analysis.md + exploitation_queue.json

Phase 3: EXPLOITATION (5 parallel agents, conditional)
  ├── Injection Exploit    → injection_exploitation_evidence.md
  ├── XSS Exploit          → xss_exploitation_evidence.md
  ├── Auth Exploit         → auth_exploitation_evidence.md
  ├── SSRF Exploit         → ssrf_exploitation_evidence.md
  └── AuthZ Exploit        → authz_exploitation_evidence.md

Phase 4: REPORTING
  └── comprehensive_security_assessment_report.md

What Each Phase Does

Pre-Recon reads source code to understand the application architecture, identify entry points, map data flows, and find potential vulnerability patterns before any network interaction.

Recon maps the live attack surface: crawls the app with a headless browser, enumerates API endpoints, identifies technologies, scans for open ports.

Vulnerability Analysis agents work in parallel, each specializing in one category. They combine source code knowledge with recon data to hypothesize specific vulnerabilities and create exploitation queues.

Exploitation agents receive the queues and attempt real attacks using browser automation (Playwright) and HTTP requests. Only proven exploits are included in the final report.
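
The hand-off from analysis to exploitation can be sketched as below. The types (`QueueItem`, `ExploitResult`), the `exploitAndFilter` function, and the stand-in `fakeAttempt` are hypothetical and only illustrate the flow described above: queues per category, agents running in parallel, empty queues skipped, and unproven results dropped before reporting. Shannon's real queue and evidence formats may differ.

```typescript
// Hypothetical shapes for illustration.
type Category = "injection" | "xss" | "auth" | "ssrf" | "authz";
interface QueueItem { category: Category; endpoint: string; hypothesis: string }
interface ExploitResult { item: QueueItem; proven: boolean; evidence?: string }
type Attempt = (item: QueueItem) => Promise<ExploitResult>;

// Phase 3: one agent per category, started only if its queue is non-empty.
// Only results with a working proof of concept are passed on to reporting.
async function exploitAndFilter(
  queues: Record<Category, QueueItem[]>,
  attempt: Attempt
): Promise<ExploitResult[]> {
  const categories = Object.keys(queues) as Category[];
  const perCategory = await Promise.all(
    categories
      .filter((c) => queues[c].length > 0) // conditional: skip empty queues
      .map((c) => Promise.all(queues[c].map(attempt)))
  );
  const proven: ExploitResult[] = [];
  for (const results of perCategory) {
    for (const r of results) {
      if (r.proven) proven.push(r); // no exploit, no report
    }
  }
  return proven;
}

// Stand-in for a real exploitation attempt: "proves" only the search endpoint.
const fakeAttempt: Attempt = async (item) => ({
  item,
  proven: item.endpoint === "/api/users/search",
  evidence: item.endpoint === "/api/users/search" ? "HTTP 200 with leaked rows" : undefined,
});

const sampleQueues: Record<Category, QueueItem[]> = {
  injection: [{ category: "injection", endpoint: "/api/users/search", hypothesis: "SQLi in q" }],
  xss: [{ category: "xss", endpoint: "/profile", hypothesis: "stored XSS in bio" }],
  auth: [],
  ssrf: [],
  authz: [],
};

// Resolves to a single proven finding for /api/users/search.
const reportable = exploitAndFilter(sampleQueues, fakeAttempt);
```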

4. Interpreting Reports

Severity Levels

Severity	Definition	Action
Critical	Direct data breach, RCE, full authentication bypass	Fix immediately, consider taking app offline
High	Significant data exposure, privilege escalation, stored XSS	Fix within 24-48 hours
Medium	Limited data exposure, CSRF, reflected XSS, information disclosure	Fix within 1-2 weeks
Low	Minor information leaks, missing headers, verbose errors	Fix in next sprint
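
For ticketing, the table can be turned into due dates. The following is a minimal sketch; `fixDeadline` and `FIX_WINDOW_HOURS` are illustrative names, the upper bound of each window is used, and "next sprint" is assumed to mean two weeks.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low";

// Upper bound of each window from the table, in hours.
// Assumption: "next sprint" is modelled as 14 days; adjust to your cadence.
const FIX_WINDOW_HOURS: Record<Severity, number> = {
  Critical: 0,     // fix immediately
  High: 48,        // within 24-48 hours
  Medium: 14 * 24, // within 1-2 weeks
  Low: 14 * 24,    // next sprint (assumed two-week sprint)
};

// Returns the latest acceptable fix time for a finding reported at `reportedAt`.
function fixDeadline(severity: Severity, reportedAt: Date): Date {
  return new Date(reportedAt.getTime() + FIX_WINDOW_HOURS[severity] * 60 * 60 * 1000);
}

const reported = new Date("2025-01-06T09:00:00Z");
const highDue = fixDeadline("High", reported);
// highDue.toISOString() === "2025-01-08T09:00:00.000Z"
```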

Reading a Finding

Each finding in the report includes:

## [CRITICAL] SQL Injection in /api/users/search

**Endpoint:** GET /api/users/search?q=
**Parameter:** q
**Type:** Union-based SQL injection

### Proof of Concept
GET /api/users/search?q=' UNION SELECT username,password,NULL FROM users--

### Response Evidence
HTTP/1.1 200 OK
[{"username":"admin","password":"$2b$12$...","3":null}]

### Source Code Reference
File: src/routes/users.ts:42
const results = await db.query(`SELECT * FROM users WHERE name LIKE '%${req.query.q}%'`);

### Remediation
Use parameterized queries:
const results = await db.query('SELECT * FROM users WHERE name LIKE $1', [`%${req.query.q}%`]);
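
If findings need to be fed into a ticketing system, the heading and field lines can be extracted with a small parser. This sketch assumes the layout shown in the example finding (a `## [SEVERITY] Title` heading followed by `**Field:** value` lines); `parseFinding` and the `Finding` interface are hypothetical names, not part of Shannon.

```typescript
interface Finding {
  severity: string;
  title: string;
  endpoint?: string;
  parameter?: string;
  type?: string;
}

// Parses one finding section; returns null if no severity heading is present.
function parseFinding(markdown: string): Finding | null {
  const heading = /^## \[([A-Z]+)\] (.+)$/m.exec(markdown);
  if (!heading) return null;
  // Looks up a "**Name:** value" line anywhere in the section.
  const field = (name: string): string | undefined => {
    const m = new RegExp(`^\\*\\*${name}:\\*\\* (.+)$`, "m").exec(markdown);
    return m ? m[1].trim() : undefined;
  };
  return {
    severity: heading[1],
    title: heading[2].trim(),
    endpoint: field("Endpoint"),
    parameter: field("Parameter"),
    type: field("Type"),
  };
}

const sample = [
  "## [CRITICAL] SQL Injection in /api/users/search",
  "",
  "**Endpoint:** GET /api/users/search?q=",
  "**Parameter:** q",
  "**Type:** Union-based SQL injection",
].join("\n");

const finding = parseFinding(sample);
// finding.severity is "CRITICAL" and finding.parameter is "q"
```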

False Positive Identification

Shannon's "no exploit, no report" policy minimizes false positives, but review for:

- Environment-specific: Exploit only works in test environment (different DB, debug mode)
- Already mitigated: WAF or middleware blocks the attack in production but not staging
- Intended behavior: Feature that looks like a vulnerability (e.g., admin search returns all users by design)
- LLM hallucination: Report claims a vulnerability but the PoC doesn't actually demonstrate impact

Always verify the PoC manually for Critical/High findings before filing tickets.
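
When replaying a PoC by hand, the raw payload from the report has to be URL-encoded first, otherwise spaces and commas can be altered by the client before the request is sent. A minimal sketch follows; `buildPocUrl` is a hypothetical helper and `staging.example.com` is a placeholder host for an environment you are authorized to test.

```typescript
// Builds a replayable URL from a reported endpoint and raw parameter values.
// encodeURIComponent percent-encodes spaces, commas and other reserved characters.
function buildPocUrl(baseUrl: string, path: string, params: Record<string, string>): string {
  const query = Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return query ? `${base}${path}?${query}` : `${base}${path}`;
}

// Payload copied from the example finding.
const payload = "' UNION SELECT username,password,NULL FROM users--";
const stagingUrl = buildPocUrl("https://staging.example.com/", "/api/users/search", { q: payload });
```

Replaying the same URL against each authorized environment helps separate environment-specific results from findings that reproduce everywhere.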

5. CI/CD Integration

Pre-Deploy Security Gate

# .github/workflows/security.yml
name: Security Pentest
on:
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly Monday 2am

jobs:
  pentest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start test application
        run: docker compose -f docker-compose.test.yml up -d

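      - name: Wait for app
        run: |
          # Poll the health endpoint for up to ~60s; fail the job if the app never comes up
          for i in $(seq 1 30); do
            curl -sf http://localhost:3000/health && exit 0
            sleep 2
          done
          echo "::error::App did not become healthy in time"
          exit 1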

      - name: Run Shannon pentest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git clone https://github.com/KeygraphHQ/shannon.git /tmp/shannon
          cd /tmp/shannon
          # Scheduled runs have no PR number, so fall back to a fixed workspace name
          ./shannon start URL=http://host.docker.internal:3000 REPO=pr-${{ github.event.pull_request.number || 'scheduled' }}

      - name: Check for critical findings
        run: |
          REPORT="/tmp/shannon/workspaces/pr-${{ github.event.pull_request.number || 'scheduled' }}/comprehensive_security_assessment_report.md"
          if [ ! -f "$REPORT" ]; then
            echo "::error::Security report not found at $REPORT — pentest may have failed. Blocking deploy."
            exit 1
          fi
          # Count severity headings (format: ## [CRITICAL] or ## [HIGH])
          # grep -c exits non-zero when the count is 0, hence the "|| true"
          CRITICAL_COUNT=$(grep -c '^##.*\[CRITICAL\]' "$REPORT" || true)
          HIGH_COUNT=$(grep -c '^##.*\[HIGH\]' "$REPORT" || true)
          if [ "$CRITICAL_COUNT" -gt 0 ]; then
            echo "::error::$CRITICAL_COUNT critical vulnerabilities found! Review the security report."
            cat "$REPORT"
            exit 1
          fi
          if [ "$HIGH_COUNT" -gt 0 ]; then
            echo "::warning::$HIGH_COUNT high-severity vulnerabilities found. Manual review required."
          fi

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: /tmp/shannon/workspaces/pr-*/comprehensive_security_assessment_report.md
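
The gate logic of the "Check for critical findings" step can also be expressed as a small function, which is easier to unit-test than inline shell. This is a sketch under the same assumption as the workflow, namely that findings appear as `## [CRITICAL] ...` and `## [HIGH] ...` headings; `evaluateGate` and `GateResult` are hypothetical names, and the sample report content is made up for illustration.

```typescript
interface GateResult { exitCode: 0 | 1; critical: number; high: number; message: string }

// Counts "## [SEVERITY] ..." headings, mirroring the grep in the workflow.
function countHeadings(report: string, severity: string): number {
  return report
    .split("\n")
    .filter((line) => line.startsWith("##") && line.includes(`[${severity}]`)).length;
}

// Critical findings block the deploy; High findings only warn.
// A missing report (null) is treated as a failure, as in the workflow.
function evaluateGate(report: string | null): GateResult {
  if (report === null) {
    return { exitCode: 1, critical: 0, high: 0, message: "Security report not found" };
  }
  const critical = countHeadings(report, "CRITICAL");
  const high = countHeadings(report, "HIGH");
  if (critical > 0) {
    return { exitCode: 1, critical, high, message: `${critical} critical findings` };
  }
  if (high > 0) {
    return { exitCode: 0, critical, high, message: `${high} high-severity findings, review manually` };
  }
  return { exitCode: 0, critical, high, message: "No critical or high findings" };
}

// Illustrative report; the XSS and IDOR entries are invented sample data.
const sampleReport = [
  "# Security Assessment",
  "## [CRITICAL] SQL Injection in /api/users/search",
  "## [HIGH] Stored XSS in /profile",
  "## [HIGH] IDOR in /api/orders",
].join("\n");

const gate = evaluateGate(sampleReport);
// gate.exitCode is 1 because one critical finding is present
```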

Integration Patterns

Pattern	When	Cost	Coverage
Full pentest on PR	Every pull request to main	~$50/run	Complete
Weekly scheduled	Cron job on staging	~$200/month	Complete
Quick single-category	Pre-merge for risky changes	~$10/run	One vuln type
Pre-release gate	Before production deploy	~$50/run	Complete

Cost Management

Estimated costs per run (varies with model selection and pricing):
- Simple app (5-10 endpoints):     ~$15-25
- Medium app (20-50 endpoints):    ~$30-50
- Complex app (100+ endpoints):    ~$50-100

Note: Costs are approximate and depend on the configured model. Check your
API provider's current pricing for accurate estimates.

Cost reduction strategies:
1. Use CONFIG to narrow scope (focus/avoid rules)
2. Run single-category scans for targeted checks
3. Use named workspaces to resume interrupted scans
4. Schedule full scans weekly, quick scans on PRs
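
The effect of strategy 4 on a monthly budget can be estimated with simple arithmetic. The sketch below uses the approximate per-run figures from the Integration Patterns table; `estimateMonthlyCostUsd` is an illustrative helper and the count of 20 pull requests per month is an assumed example value.

```typescript
interface ScanPlan { fullRunsPerMonth: number; quickRunsPerMonth: number }

// Per-run figures taken from the Integration Patterns table; they are rough estimates.
const COST_PER_FULL_RUN_USD = 50;
const COST_PER_QUICK_RUN_USD = 10;

function estimateMonthlyCostUsd(plan: ScanPlan): number {
  return (
    plan.fullRunsPerMonth * COST_PER_FULL_RUN_USD +
    plan.quickRunsPerMonth * COST_PER_QUICK_RUN_USD
  );
}

// Weekly full scan only: 4 runs x $50 = $200, matching the table.
const weeklyOnly = estimateMonthlyCostUsd({ fullRunsPerMonth: 4, quickRunsPerMonth: 0 });

// Weekly full scan plus a quick scan on each of 20 PRs: 200 + 20 x 10 = 400.
const weeklyPlusPrs = estimateMonthlyCostUsd({ fullRunsPerMonth: 4, quickRunsPerMonth: 20 });

// For comparison, a full pentest on each of those 20 PRs: 20 x 50 = 1000.
const fullOnEveryPr = estimateMonthlyCostUsd({ fullRunsPerMonth: 20, quickRunsPerMonth: 0 });
```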

6. Post-Pentest Workflow

Triage → Fix → Verify

1. TRIAGE (Day 0)
   ├── Read the full report
   ├── Verify all Critical/High PoCs manually
   ├── Create tickets with severity labels
   ├── Assign owners and deadlines
   └── Notify stakeholders for Critical findings

2. FIX (Day 1-14, based on severity)
   ├── Critical: same day
   ├── High: within 48 hours
   ├── Medium: within 2 weeks
   └── Low: next sprint

3. VERIFY (After fix)
   ├── Re-run Shannon against the same target in a dedicated workspace
   │   └── ./shannon start URL=<url> REPO=<same-name> WORKSPACE=verify
   ├── If that run is interrupted, re-running it skips completed agents (resumable)
   ├── Confirm the PoC no longer works
   └── Update ticket status

4. DOCUMENT
   ├── Archive the report
   ├── Update security runbook with new patterns
   ├── Add regression tests for each finding
   └── Schedule next pentest
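
For the VERIFY step, comparing the original report with the re-run report shows which findings are gone. The sketch below matches findings by heading title, which is an assumption (titles may change between runs); `compareRuns` and `findingTitles` are hypothetical helpers, and the sample reports are invented.

```typescript
// Extracts finding titles from "## [SEVERITY] Title" headings.
function findingTitles(report: string): string[] {
  const titles: string[] = [];
  for (const line of report.split(/\r?\n/)) {
    const m = /^## \[[A-Z]+\] (.+)$/.exec(line);
    if (m) titles.push(m[1].trim());
  }
  return titles;
}

// Classifies findings as fixed, still present, or newly introduced.
function compareRuns(before: string, after: string) {
  const old = findingTitles(before);
  const cur = findingTitles(after);
  return {
    fixed: old.filter((t) => cur.indexOf(t) === -1),
    remaining: old.filter((t) => cur.indexOf(t) !== -1),
    introduced: cur.filter((t) => old.indexOf(t) === -1),
  };
}

// Invented sample reports for illustration.
const beforeFix = [
  "## [CRITICAL] SQL Injection in /api/users/search",
  "## [HIGH] Stored XSS in /profile",
].join("\n");
const afterFix = "## [HIGH] Stored XSS in /profile";

const verification = compareRuns(beforeFix, afterFix);
// verification.fixed contains the SQL injection; verification.remaining contains the XSS
```

A ticket should only be closed when its finding shows up under `fixed` and the PoC has been confirmed not to work.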

Regression Testing

For each finding, create a permanent test:

// tests/security/sql-injection.test.ts
import request from 'supertest';
// Assumed location of the app export; adjust to your project layout
import { app } from '../../src/app';

describe('SQL Injection regression', () => {
  it('should not be vulnerable to union-based injection in /api/users/search', async () => {
    const res = await request(app)
      .get("/api/users/search")
      .query({ q: "' UNION SELECT username,password,NULL FROM users--" });

    // Should NOT return other users' data
    expect(res.body).not.toEqual(
      expect.arrayContaining([
        expect.objectContaining({ username: 'admin' })
      ])
    );
  });

  it('should still serve a normal search after the fix', async () => {
    const res = await request(app)
      .get("/api/users/search")
      .query({ q: "test" });

    // The parameterized query must not break legitimate searches
    expect(res.status).toBe(200);
  });
});

7. What Shannon Doesn't Cover

Supplement with manual testing or other tools:

Gap	Alternative
Business logic flaws	Manual review, threat modeling
Mobile app testing	OWASP MAS, Frida, Objection
Infrastructure/cloud	ScoutSuite, Prowler, CloudSploit
Container security	Trivy, Grype, Docker Bench
API rate limiting	Custom load testing (k6, Artillery)
GraphQL deep testing	InQL, graphql-cop
WebSocket testing	OWASP ZAP WebSocket plugin
Dependency vulnerabilities	npm audit, Snyk, Socket.dev
Secrets in source code	TruffleHog, GitLeaks, detect-secrets
CORS misconfiguration	CORScanner, manual review
HTTP request smuggling	smuggler, h2csmuggler
Race conditions / TOCTOU	Turbo Intruder, manual testing
Cache poisoning	Web Cache Deception Scanner
Host header injection	Manual review of password reset flows

Complementary Tool Stack

# Run alongside Shannon for full coverage:

# Dependency scanning
npm audit --omit=dev   # replaces the deprecated --production flag
npx snyk test

# Secret detection
trufflehog git file://. --only-verified

# Container scanning
trivy image myapp:latest

# Infrastructure
prowler aws --severity critical high

# API fuzzing
schemathesis run http://localhost:3000/openapi.json

8. Safe Testing Practices

Rules of Engagement

DO:
  ✓ Only test applications you own or have written authorization to test
  ✓ Use staging/test environments, never production
  ✓ Create dedicated test accounts with known credentials
  ✓ Set scope rules to avoid destructive endpoints
  ✓ Review reports before sharing (may contain sensitive data)
  ✓ Keep API keys secure (Shannon uses significant API credits)

DON'T:
  ✗ Point Shannon at production systems
  ✗ Test third-party services without explicit written permission
  ✗ Share reports containing valid credentials or PII
  ✗ Run without scope rules on apps with destructive endpoints
  ✗ Ignore the cost — monitor API spend during runs
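
The "never production" rule can be enforced mechanically with an allowlist check that runs before a scan is started. This is a minimal sketch; `assertSafeTarget` is a hypothetical guard and the host names in `ALLOWED_HOSTS` are placeholders for environments you are authorized to test.

```typescript
// Hosts you are authorized to test; the example.com name is a placeholder.
const ALLOWED_HOSTS = ["localhost", "host.docker.internal", "staging.example.com"];

// Extracts the hostname from an http(s) URL. URLs with embedded credentials
// do not yield an allowlisted name, so they are rejected by the guard below.
function hostnameOf(url: string): string | null {
  const m = /^https?:\/\/([^\/:?#]+)/i.exec(url);
  return m ? m[1].toLowerCase() : null;
}

// Throws unless the target is on the allowlist, so a typo cannot point a scan at production.
function assertSafeTarget(url: string, allowedHosts: string[] = ALLOWED_HOSTS): void {
  const host = hostnameOf(url);
  if (host === null || allowedHosts.indexOf(host) === -1) {
    throw new Error(`Refusing to scan ${url}: host is not on the allowlist`);
  }
}

// Passes silently for an allowlisted target.
assertSafeTarget("http://localhost:3000");
```

Calling the guard in a wrapper script before `./shannon start` keeps the decision in one place instead of relying on each person to check the URL.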

Test Environment Setup

# docker-compose.test.yml — isolated test environment
services:
  app:
    build: .
    environment:
      - NODE_ENV=test
      - DATABASE_URL=postgres://test:test@db:5432/testdb
    ports:
      - "3000:3000"
    networks:
      - pentest-net

  db:
    image: postgres:16
    environment:
      - POSTGRES_DB=testdb
      - POSTGRES_USER=test
      - POSTGRES_PASSWORD=test
    networks:
      - pentest-net

networks:
  pentest-net:
    driver: bridge
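    # Dedicated bridge network for the test stack.
    # Note: a plain bridge network still allows outbound internet access.
    # Setting `internal: true` blocks it, but may also make published ports unreachable from the host.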