# Test Approval Workflow
The Hive framework implements a "Human-in-the-Loop" (HITL) testing philosophy. Because Hive agents are self-improving and can generate their own test cases to validate code evolution, the framework requires explicit approval before these tests are committed to the permanent test suite.
This workflow ensures that LLM-generated tests are accurate, secure, and align with the intended goal requirements.
## Overview
The approval workflow supports two primary interfaces:
- Interactive CLI: A step-by-step terminal interface for manual review.
- Batch/MCP Interface: A programmatic interface used by the Model Context Protocol (MCP) or external dashboards to process multiple approvals at once.
## Approval Actions

Every generated test must be processed with one of the following `ApprovalAction` states:
| Action | Description |
| :--- | :--- |
| APPROVE | Accepts the test exactly as generated. |
| MODIFY | Accepts the test after the user provides updated test code. |
| REJECT | Declines the test. Requires a reason string. |
| SKIP | Leaves the test in a pending state for later review. |
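Assuming `ApprovalAction` is a standard Python enum (the string values below are illustrative, not confirmed by the framework source), it might look like:

```python
from enum import Enum


class ApprovalAction(str, Enum):
    """Decision states for a generated test (illustrative sketch)."""

    APPROVE = "approve"  # accept the test exactly as generated
    MODIFY = "modify"    # accept with user-supplied code changes
    REJECT = "reject"    # decline; requires a reason string
    SKIP = "skip"        # defer; test stays in the pending state
```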
## Interactive CLI Workflow
The interactive workflow is the default method for developers to review tests locally. It displays the test metadata, description, expected inputs/outputs, and the generated source code.
```python
from framework.testing.approval_cli import interactive_approval
from framework.testing.test_storage import TestStorage

# Initialize storage and fetch pending tests
storage = TestStorage(path="./tests/hive")
pending_tests = storage.get_pending_tests(goal_id="research_agent_v1")

# Launch the interactive terminal UI
results = interactive_approval(
    tests=pending_tests,
    storage=storage,
)
```
During the interactive session, the CLI prompts the user for input (`[a]pprove, [r]eject, [e]dit, [s]kip`). If edit is chosen, the framework opens the system's default editor (via `$EDITOR`) with the test code.
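The edit flow can be sketched with the standard library: write the test code to a temporary file, launch the editor named in `$EDITOR`, and read the result back. Note that `open_in_editor` is a hypothetical helper for illustration, not part of the framework's API:

```python
import os
import subprocess
import tempfile


def open_in_editor(code: str) -> str:
    """Open `code` in the user's $EDITOR and return the edited text.

    Hypothetical helper sketching the [e]dit flow; falls back to `vi`
    when $EDITOR is unset.
    """
    editor = os.environ.get("EDITOR", "vi")
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", delete=False
    ) as tmp:
        tmp.write(code)
        path = tmp.name
    try:
        # Block until the editor process exits
        subprocess.run([editor, path], check=True)
        with open(path) as f:
            return f.read()
    finally:
        os.unlink(path)
```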
## Programmatic & Batch Approval

For integrations with external tools or the Hive dashboard, the `batch_approval` function allows for processing multiple requests using Pydantic-validated models.
### Data Models
#### ApprovalRequest
The core model for submitting a decision on a test.
```python
class ApprovalRequest(BaseModel):
    test_id: str
    action: ApprovalAction
    modified_code: str | None = None  # Required if action is MODIFY
    reason: str | None = None         # Required if action is REJECT
    approved_by: str = "user"
```
#### BatchApprovalResult
Returned after processing a batch, providing a summary of the operations.
```python
class BatchApprovalResult(BaseModel):
    goal_id: str
    total: int
    approved: int
    modified: int
    rejected: int
    skipped: int
    errors: int
    results: list[ApprovalResult]
```
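The usage example below prints `batch_result.summary()`; the framework presumably implements it along these lines. The sketch uses a plain dataclass stand-in for the Pydantic model and omits the `results` field to stay self-contained:

```python
from dataclasses import dataclass


@dataclass
class BatchApprovalResultSketch:
    """Dataclass stand-in for BatchApprovalResult (summary() is assumed)."""

    goal_id: str
    total: int
    approved: int
    modified: int
    rejected: int
    skipped: int
    errors: int

    def summary(self) -> str:
        # Formats the counters into the one-line report shown in the docs
        return (
            f"Processed {self.total} tests: {self.approved} approved, "
            f"{self.modified} modified, {self.rejected} rejected, "
            f"{self.skipped} skipped, {self.errors} errors"
        )
```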
### Usage Example
```python
from framework.testing.approval_cli import batch_approval
from framework.testing.approval_types import ApprovalRequest, ApprovalAction

requests = [
    ApprovalRequest(
        test_id="test_001",
        action=ApprovalAction.APPROVE,
    ),
    ApprovalRequest(
        test_id="test_002",
        action=ApprovalAction.REJECT,
        reason="Redundant with existing integration tests",
    ),
]

batch_result = batch_approval(
    goal_id="research_agent_v1",
    requests=requests,
    storage=storage,
)

print(batch_result.summary())
# Output: Processed 2 tests: 1 approved, 0 modified, 1 rejected, 0 skipped, 0 errors
```
## Validation Logic

The framework enforces strict validation before an approval is processed:

- **Modify Validation**: If the action is `MODIFY`, `modified_code` must be present and non-empty.
- **Reject Validation**: If the action is `REJECT`, a `reason` must be provided to help the agent understand why the test was unsuitable, which informs future self-improvement cycles.
- **Persistence**: Tests are only moved from the `pending` state to the `active` (or `rejected`) state in the `TestStorage` after a successful `ApprovalResult` is generated.