# Vibe Testing (Not finished yet)

Before writing tests, understand what exists, what to test, and what’s needed to test correctly.


## Phase 1: Codebase Understanding

Understand structure before mapping details.

### Prompt Template

Scan the codebase and identify:

1. External Dependencies
   - All external API calls (LLM providers, third-party services)
   - Database connections and operations
   - File system operations
   - Network/HTTP operations

2. Major Internal Modules
   - List each module/package
   - Identify its primary responsibility
   - Note its dependencies on other modules

3. Architectural Pattern Analysis
   - What pattern is in use? (hexagonal, layered, clean, etc.)
   - Where are the boundaries? (ports, adapters, interfaces)
   - What is core logic vs infrastructure?

Put this analysis in devdocs/testing/plan/phase1_codebase_understanding.md

Focus on structure only. No operation details yet.

### Expected Output

```markdown
# Phase 1: Codebase Understanding

## External Dependencies
| Dependency | Type | Location |
|------------|------|----------|
| OpenAI API | LLM Provider | adapters/llm/ |
| PostgreSQL | Database | adapters/storage/ |
| Redis | Cache | adapters/cache/ |

## Internal Modules
| Module | Responsibility | Depends On |
|--------|---------------|------------|
| core/ | Business logic | ports/ |
| adapters/ | External integrations | (external) |
| ports/ | Interfaces | (none) |

## Architecture
- Pattern: Hexagonal
- Core: core/
- Ports: ports/
- Adapters: adapters/
- Boundaries: All external access through adapters
```
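
To ground the agent's structural claims in something checkable, a small script can dump each package's imports. This is a minimal sketch, assuming a Python codebase with a `src/` root containing packages like `core/`, `ports/`, and `adapters/` as in the example above; the root path and layout are assumptions to adjust for your project.

```python
# import_inventory.py - rough helper to cross-check the Phase 1 analysis.
# The src/ root and package layout are assumptions; adjust for your project.
import ast
import pathlib
from collections import defaultdict

ROOT = pathlib.Path("src")  # hypothetical source root


def imports_of(path: pathlib.Path) -> set[str]:
    """Return the top-level names imported by a single Python file."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names


def main() -> None:
    by_package: dict[str, set[str]] = defaultdict(set)
    for path in ROOT.rglob("*.py"):
        package = path.relative_to(ROOT).parts[0]
        by_package[package] |= imports_of(path)
    for package, deps in sorted(by_package.items()):
        print(f"{package}: {', '.join(sorted(deps))}")


if __name__ == "__main__":
    main()
```

The per-package lists make it easy to spot external dependencies (anything that is not the standard library or one of your own packages) and to sanity-check the module table the agent produces.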

## Phase 2: Operation Mapping

Comprehensive identification of all operations. This needs depth.

### Prompt Template

Based on the codebase understanding from phase1, create a comprehensive operation map.

For EACH operation in the codebase:

1. Operation Identification
   - Name and location
   - What it does (one sentence)
   - Input → Output

2. Sub-operations
   - What operations does this depend on?
   - What operations depend on this?

3. Architectural Layer
   - pure logic / adapter / port / core integration
   - Can this be tested in isolation?

4. Coupling Analysis
   - Is this truly isolated?
   - Or does it only make sense composed with other operations?

Put this in devdocs/testing/plan/phase2_operation_mapping.md

Be comprehensive. Every operation matters.

### Expected Output

```markdown
# Phase 2: Operation Mapping

## Operations Inventory

### LLM Operations
| Operation | Location | Does | Layer | Isolated? |
|-----------|----------|------|-------|-----------|
| send_prompt | adapters/llm/openai.py | Sends prompt to OpenAI | Adapter | Yes |
| parse_response | adapters/llm/parser.py | Extracts structured data | Pure logic | Yes |
| agent_decide | core/agent.py | LLM + tools + state | Core | No - coupled |

### Storage Operations
| Operation | Location | Does | Layer | Isolated? |
|-----------|----------|------|-------|-----------|
| save_session | adapters/storage/pg.py | Persists session | Adapter | Yes |
| ... | ... | ... | ... | ... |

## Dependency Graph

agent_decide
├── send_prompt
├── parse_response
├── tool_executor
│   └── (various tools)
└── state_manager
    └── save_session

## Coupling Groups
Operations that must be tested together:
- Group 1: agent_decide + send_prompt + parse_response (agent loop)
- Group 2: ...
```
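
To make the "Isolated?" column concrete: a pure-logic operation such as `parse_response` can be unit-tested directly, while a coupling group like the agent loop only makes sense with its collaborators faked at the port boundary. A rough sketch follows; the import paths, signatures, and return shapes are assumptions taken from the example tables, not the real API.

```python
# test_operation_isolation.py - illustrating the isolated vs. coupled split.
# parse_response and agent_decide are the names from the example map above;
# their signatures and return values here are assumptions for illustration.
from adapters.llm.parser import parse_response  # pure logic: test directly
from core.agent import agent_decide             # coupled: test the composed loop


def test_parse_response_extracts_fields():
    # Isolated operation: no network, no database, plain input/output assertion.
    raw = '{"action": "search", "query": "cats"}'
    result = parse_response(raw)
    assert result["action"] == "search"


class FakeLLM:
    """Stands in for send_prompt so the coupled group runs without OpenAI."""

    def send_prompt(self, prompt: str) -> str:
        return '{"action": "finish", "answer": "done"}'


def test_agent_decide_with_fake_llm():
    # Coupled group: agent_decide only makes sense together with an LLM and
    # state, so exercise it through a fake at the port boundary.
    decision = agent_decide(llm=FakeLLM(), state={})
    assert decision["action"] == "finish"
```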

## Phase 3: Concept Mapping

Test abstract qualities, not just operations.

### Prompt Template

Identify concepts that need testing beyond individual operations.

1. Performance Concepts
   - Response time requirements
   - Throughput expectations
   - Resource usage limits

2. Architecture Concepts
   - Are boundaries respected?
   - Is core independent of adapters?
   - Are ports properly abstracted?

3. Reliability Concepts
   - Error recovery
   - Retry behavior
   - Graceful degradation

4. Data Concepts
   - Data integrity across operations
   - Transformation correctness
   - State consistency

5. Security Concepts
   - Authentication boundaries
   - Authorization checks
   - Data isolation

Put this in devdocs/testing/plan/phase3_concept_mapping.md

For each concept, note:
- What does "correct" look like?
- How would we know if it's broken?
- What operations are involved?

### Expected Output

```markdown
# Phase 3: Concept Mapping

## Performance Concepts
| Concept | Correct Looks Like | Broken Looks Like | Operations Involved |
|---------|-------------------|-------------------|---------------------|
| LLM Response Time | < 5s for simple queries | > 10s or timeout | send_prompt, parse_response |
| DB Write Speed | < 100ms per write | > 500ms | save_session, save_message |

## Architecture Concepts
| Concept | Correct | Broken | How to Verify |
|---------|---------|--------|---------------|
| Core Independence | core/ imports only from ports/ | core/ imports from adapters/ | Import analysis |
| Adapter Isolation | Adapters don't know about each other | Cross-adapter imports | Dependency check |

## Reliability Concepts
| Concept | Correct | Broken | Test Approach |
|---------|---------|--------|---------------|
| LLM Retry | 3 retries on timeout, then fail gracefully | Crash on first failure | Kill connection mid-request |
| DB Reconnect | Auto-reconnect after connection drop | Permanent failure | Restart DB during test |

## Data Concepts
| Concept | Correct | Broken | Test Approach |
|---------|---------|--------|---------------|
| Session Integrity | All messages in order | Missing or duplicated messages | Concurrent write test |
| State Consistency | State reflects all operations | Stale or partial state | Crash recovery test |

## Security Concepts
| Concept | Correct | Broken | Test Approach |
|---------|---------|--------|---------------|
| User Isolation | User A cannot see User B data | Data leakage | Cross-user query test |
```
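
The "Import analysis" and "Dependency check" verifications can be ordinary tests rather than one-off reviews. Below is a minimal sketch for the Core Independence row, assuming the `core/` and `adapters/` package names from the example layout; it walks the core package and fails if anything imports an adapter.

```python
# test_architecture.py - codifies the "Core Independence" concept as a test.
# Package names follow the example layout in this document; adjust as needed.
import ast
import pathlib

CORE = pathlib.Path("core")
FORBIDDEN = ("adapters",)  # core/ must never reach into infrastructure


def test_core_never_imports_adapters():
    offenders = []
    for path in CORE.rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                modules = [node.module or ""]
            else:
                continue
            if any(m.split(".")[0] in FORBIDDEN for m in modules):
                offenders.append(str(path))
    assert not offenders, f"core/ imports from adapters/: {offenders}"
```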

## Phase 4: Test Requirements Document

What’s needed for tests to be meaningful.

### Prompt Template

Based on the operations and concepts identified, document what's required for correct testing.

1. Real Data Requirements
   - What real files are needed? (videos, images, documents)
   - What format and specifications?
   - Where should they be located?

2. External Service Requirements
   - What services must be running?
   - What credentials are needed?
   - Can we use sandboxed/test versions?

3. Environment Requirements
   - Minimum hardware specs
   - Required environment variables
   - Network access needs

4. Test Data Specifications
   - What edge cases must be covered?
   - What data variations are needed?
   - Sample data schemas

5. State Requirements
   - What must exist before tests run?
   - Database seeds needed?
   - Files that must be present?

Put this in devdocs/testing/plan/requirements_for_correct_testing.md

Without these requirements met, tests pass but prove nothing.

### Expected Output

````markdown
# Requirements for Correct Testing

## Real Data Requirements

### Video Files
| Requirement | Specification | Location | Purpose |
|-------------|--------------|----------|---------|
| Test video | MP4, 10MB, 30s, 720p | tests/fixtures/video/ | Video processing tests |
| Corrupt video | MP4 with broken headers | tests/fixtures/video/ | Error handling tests |
| Large video | MP4, 500MB | tests/fixtures/video/ | Performance tests |

### Documents
| Requirement | Specification | Location | Purpose |
|-------------|--------------|----------|---------|
| PDF samples | Various sizes, 1-100 pages | tests/fixtures/docs/ | Document parsing |
| Unicode text | UTF-8 with emojis, RTL text | tests/fixtures/docs/ | Encoding tests |

## External Service Requirements

| Service | Required For | Test Version Available? | Credentials |
|---------|-------------|------------------------|-------------|
| OpenAI API | LLM tests | No - use real with low quota | OPENAI_API_KEY |
| PostgreSQL | Storage tests | Yes - Docker container | tests/docker-compose.yml |
| Redis | Cache tests | Yes - Docker container | tests/docker-compose.yml |

## Environment Requirements

| Requirement | Minimum | Recommended | Notes |
|-------------|---------|-------------|-------|
| RAM | 4GB | 8GB | For load tests |
| Disk | 2GB free | 10GB free | For video fixtures |
| Network | Internet access | Low latency | For LLM API calls |

### Environment Variables
```bash
# Required for all tests
DATABASE_URL=postgresql://test:test@localhost:5432/test
REDIS_URL=redis://localhost:6379

# Required for LLM tests
OPENAI_API_KEY=sk-...

# Optional
LOG_LEVEL=DEBUG
TEST_TIMEOUT=30
```

## Test Data Specifications

### User Data Edge Cases
Must test with:
- Empty username
- Unicode username (日本語, العربية)
- Maximum length username (255 chars)
- Special characters (!@#$%^&*)
- SQL injection attempts
- XSS attempts

### Session Data
Must include:
- Empty session
- Session with 1 message
- Session with 1000 messages
- Session with large attachments
- Corrupted session data

## Pre-Test State Requirements

### Database
```sql
-- Must exist before tests
CREATE DATABASE test_db;

-- Seed data required
INSERT INTO users (id, name) VALUES ('test-user', 'Test User');
```

### File System
```
tests/
├── fixtures/
│   ├── video/
│   │   └── (required video files)
│   ├── docs/
│   │   └── (required document files)
│   └── data/
│       └── (required JSON/CSV files)
└── tmp/
    └── (writable directory for test outputs)
```

## Checklist Before Running Tests
- Docker containers running (PostgreSQL, Redis)
- Environment variables set
- Test fixtures present
- Network access available
- Sufficient disk space
- API keys valid and have quota
````
---

## Test Types Summary

After completing Phases 1-4, you understand:

| What | Phase | Produces |
|------|-------|----------|
| Structure | Phase 1 | Codebase map |
| Operations | Phase 2 | What to test mechanically |
| Concepts | Phase 3 | What to test abstractly |
| Requirements | Phase 4 | What's needed to test correctly |

Now you can create two types of tests:

### Probe Tests (Discovery)
- Used during development
- Human observed, verbose output
- Establishes "what works"
- Teaches you how the system behaves
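
A probe test is usually just a script you run and watch. The sketch below assumes the `send_prompt` and `parse_response` operations from the Phase 2 example; the import paths and behaviors are illustrative assumptions. The point is verbose output for a human, not assertions.

```python
# probe_llm_roundtrip.py - run by hand, read the output, learn the behavior.
# send_prompt / parse_response and their import paths are assumptions taken
# from the Phase 2 example; nothing here is pass/fail.
import time

from adapters.llm.openai import send_prompt
from adapters.llm.parser import parse_response

PROMPTS = [
    "Summarize: the cat sat on the mat.",
    "",              # empty input - what happens?
    "x" * 20_000,    # oversized input - truncate, error, or hang?
]

for prompt in PROMPTS:
    start = time.perf_counter()
    try:
        raw = send_prompt(prompt)
        parsed = parse_response(raw)
        print(f"[{time.perf_counter() - start:.2f}s] ok   {prompt[:30]!r} -> {parsed}")
    except Exception as exc:  # broad on purpose: we are probing, not asserting
        print(f"[{time.perf_counter() - start:.2f}s] FAIL {prompt[:30]!r} -> {exc!r}")
```

The value is in what you observe and write down, not in a green checkmark.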

### CI/CD Tests (Reliability)
- Automated, runs on every commit
- Pass/fail gates
- Maintains "what must keep working"
- Codified knowledge from probe tests
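
Once a probe run has established the behavior, the discovery becomes a strict pass/fail check. A minimal sketch, reusing the same assumed `parse_response` operation; the asserted behaviors are placeholders for whatever your probe actually established.

```python
# test_llm_roundtrip.py - the CI/CD version of the probe: no humans, no real
# network, strict assertions. Signatures and behaviors are assumptions.
import pytest

from adapters.llm.parser import parse_response


def test_parse_response_rejects_empty_payload():
    # Placeholder for a probe discovery, e.g. "empty responses must raise".
    with pytest.raises(ValueError):
        parse_response("")


def test_parse_response_roundtrip():
    # Placeholder for a probe discovery: well-formed payloads parse into fields.
    raw = '{"action": "search", "query": "cats"}'
    assert parse_response(raw) == {"action": "search", "query": "cats"}
```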

```
Phase 1-4    → Understand what to test
     ↓
Probe Tests  → Discover what works, learn behavior
     ↓
Codify       → Turn discoveries into CI/CD tests
     ↓
CI/CD Tests  → Maintain reliability over time
```