# Test Approval Workflow
The Hive framework implements a "Human-in-the-Loop" (HITL) testing philosophy. Because Hive agents are self-improving and can generate their own test cases to validate code evolution, the framework requires explicit approval before these tests are committed to the permanent test suite.
This workflow ensures that LLM-generated tests are accurate, secure, and align with the intended goal requirements.
## Overview
The approval workflow supports two primary interfaces:
- Interactive CLI: A step-by-step terminal interface for manual review.
- Batch/MCP Interface: A programmatic interface used by the Model Context Protocol (MCP) or external dashboards to process multiple approvals at once.
## Approval Actions

Every generated test must be processed with one of the following `ApprovalAction` states:
| Action | Description |
| :--- | :--- |
| APPROVE | Accepts the test exactly as generated. |
| MODIFY | Accepts the test after the user provides updated test code. |
| REJECT | Declines the test. Requires a reason string. |
| SKIP | Leaves the test in a pending state for later review. |
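Assuming `ApprovalAction` is a standard Python enum (the string values below are illustrative, not confirmed by the framework source), it might look like:

```python
from enum import Enum


class ApprovalAction(str, Enum):
    """Decision states for a generated test (illustrative sketch)."""

    APPROVE = "approve"  # accept the test exactly as generated
    MODIFY = "modify"    # accept with user-supplied code changes
    REJECT = "reject"    # decline; requires a reason string
    SKIP = "skip"        # defer; test stays in the pending state
```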
## Interactive CLI Workflow
The interactive workflow is the default method for developers to review tests locally. It displays the test metadata, description, expected inputs/outputs, and the generated source code.
```python
from framework.testing.approval_cli import interactive_approval
from framework.testing.test_storage import TestStorage

# Initialize storage and fetch pending tests
storage = TestStorage(path="./tests/hive")
pending_tests = storage.get_pending_tests(goal_id="research_agent_v1")

# Launch the interactive terminal UI
results = interactive_approval(
    tests=pending_tests,
    storage=storage,
)
```
During the interactive session, the CLI prompts the user for input (`[a]pprove, [r]eject, [e]dit, [s]kip`). If edit is chosen, the framework opens the system's default editor (via `$EDITOR`) with the test code.
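The edit flow can be sketched with the standard library: write the test code to a temporary file, launch the editor named in `$EDITOR`, and read the result back. Note that `open_in_editor` is a hypothetical helper for illustration, not part of the framework's API:

```python
import os
import subprocess
import tempfile


def open_in_editor(code: str) -> str:
    """Open `code` in the user's $EDITOR and return the edited text.

    Hypothetical helper sketching the [e]dit flow; falls back to `vi`
    when $EDITOR is unset.
    """
    editor = os.environ.get("EDITOR", "vi")
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", delete=False
    ) as tmp:
        tmp.write(code)
        path = tmp.name
    try:
        # Block until the editor process exits
        subprocess.run([editor, path], check=True)
        with open(path) as f:
            return f.read()
    finally:
        os.unlink(path)
```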
## Programmatic & Batch Approval

For integrations with external tools or the Hive dashboard, the `batch_approval` function allows for processing multiple requests using Pydantic-validated models.
### Data Models
#### ApprovalRequest
The core model for submitting a decision on a test.
```python
class ApprovalRequest(BaseModel):
    test_id: str
    action: ApprovalAction
    modified_code: str | None = None  # Required if action is MODIFY
    reason: str | None = None         # Required if action is REJECT
    approved_by: str = "user"
```
#### BatchApprovalResult
Returned after processing a batch, providing a summary of the operations.
```python
class BatchApprovalResult(BaseModel):
    goal_id: str
    total: int
    approved: int
    modified: int
    rejected: int
    skipped: int
    errors: int
    results: list[ApprovalResult]
```
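The usage example below prints `batch_result.summary()`; the framework presumably implements it along these lines. The sketch uses a plain dataclass stand-in for the Pydantic model and omits the `results` field to stay self-contained:

```python
from dataclasses import dataclass


@dataclass
class BatchApprovalResultSketch:
    """Dataclass stand-in for BatchApprovalResult (summary() is assumed)."""

    goal_id: str
    total: int
    approved: int
    modified: int
    rejected: int
    skipped: int
    errors: int

    def summary(self) -> str:
        # Formats the counters into the one-line report shown in the docs
        return (
            f"Processed {self.total} tests: {self.approved} approved, "
            f"{self.modified} modified, {self.rejected} rejected, "
            f"{self.skipped} skipped, {self.errors} errors"
        )
```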
### Usage Example
```python
from framework.testing.approval_cli import batch_approval
from framework.testing.approval_types import ApprovalRequest, ApprovalAction

requests = [
    ApprovalRequest(
        test_id="test_001",
        action=ApprovalAction.APPROVE,
    ),
    ApprovalRequest(
        test_id="test_002",
        action=ApprovalAction.REJECT,
        reason="Redundant with existing integration tests",
    ),
]

batch_result = batch_approval(
    goal_id="research_agent_v1",
    requests=requests,
    storage=storage,
)

print(batch_result.summary())
# Output: Processed 2 tests: 1 approved, 0 modified, 1 rejected, 0 skipped, 0 errors
```
## Validation Logic

The framework enforces strict validation before an approval is processed:

- **Modify Validation**: If the action is `MODIFY`, `modified_code` must be present and non-empty.
- **Reject Validation**: If the action is `REJECT`, a `reason` must be provided to help the agent understand why the test was unsuitable, which informs future self-improvement cycles.
- **Persistence**: Tests are only moved from the `pending` state to the `active` (or `rejected`) state in the `TestStorage` after a successful `ApprovalResult` is generated.