Adaptive review sampling-rate tuning¶
Autonomously retune bounded QA, audit, or spot-check sampling rates using review yield and miss signals so oversight coverage stays efficient without changing case disposition, queue order, or live operational execution.
Metadata¶
- Pattern id: adaptive-review-sampling-rate-tuning
- Pattern family: Optimize / Adapt
- Problem structure: Feedback-driven optimization (feedback-driven-optimization)
- Domains: Support (support), Research (research), Operations (operations)
Workflow goal¶
Continuously adjust internal quality-review, audit, or spot-check sampling rates within preapproved bounds so oversight effort tracks observed defect yield, escaped-issue risk, and reviewer capacity while keeping protected cohorts, rollback, privacy, and auditability explicit.
Inputs¶
Current review sampling policy¶
- Description: The active sampling configuration covering baseline rates, stratified review quotas, protected-cohort floors, cooldown windows, and any currently frozen review classes.
- Kind: sampling-policy
- Required: Yes
- Examples:
- Support QA policy with distinct sample rates for severity-one escalations, routine closures, and outage follow-ups
- Research artifact spot-check policy with separate rates for embargoed studies, rerun-failure cohorts, and routine replication packets
Review outcome and escape history¶
- Description: Historical findings, miss signals, reviewer overrides, reopened quality concerns, and backfilled audit discoveries showing where the current sampling policy is over- or under-covering risk.
- Kind: outcome-history
- Required: Yes
- Examples:
- Quality-review findings showing that identity-recovery tickets reopened after low-sample weeks
- Spot-check results showing that low-volume benchmark artifacts with disclosure-sensitive annexes produced defects at a higher rate than the default sample assumed
Sampling guardrails and delegated bounds¶
- Description: The approved minimum and maximum sampling rates, protected strata, maximum step size, reviewer-load ceilings, and freeze conditions that constrain autonomous tuning.
- Kind: policy
- Required: Yes
- Examples:
- Policy requiring a nonzero sample floor for executive escalations and security-adjacent support cases
- Governance rule limiting any one tuning cycle to a five-point sampling-rate shift unless a human owner approves more
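One way to make these delegated bounds enforceable is to encode them per stratum and clamp every proposed rate against them. The sketch below is illustrative only: `Guardrail`, `bound_rate`, and the example numbers are hypothetical, rates are assumed to be fractions between 0 and 1, and the five-point shift is read as a step of 0.05.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """Delegated bounds for one stratum (all names are hypothetical)."""
    min_rate: float  # protected floor, as a fraction of items sampled
    max_rate: float  # approved ceiling
    max_step: float  # largest change allowed in a single tuning cycle


def bound_rate(current: float, proposed: float, g: Guardrail) -> float:
    """Clamp a proposed rate to the step limit first, then to the rate band."""
    step = max(-g.max_step, min(g.max_step, proposed - current))
    return max(g.min_rate, min(g.max_rate, current + step))


guardrail = Guardrail(min_rate=0.05, max_rate=0.40, max_step=0.05)
print(bound_rate(0.10, 0.30, guardrail))  # large increase is cut to one step
print(bound_rate(0.08, 0.00, guardrail))  # decrease stops at the floor
```

Clamping is only one possible policy. A deployment could instead block any out-of-band proposal and emit a freeze or escalation packet.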
Review operating context¶
- Description: Current reviewer capacity, workflow mix, recent process changes, and policy updates that may justify a temporary sampling adjustment or invalidate older outcome windows.
- Kind: operating-context
- Required: No
- Examples:
- Temporary loss of senior QA reviewers after a major support migration
- New artifact-packaging rules that make prior benchmark-review yield no longer comparable
Outputs¶
Updated review sampling policy¶
- Description: The applied configuration artifact with revised rates, protected-cohort floors, and any temporary hold or cooldown metadata, kept inside approved bounds.
- Kind: sampling-policy
- Required: Yes
- Examples:
- Increased sample rate for outage-related identity recoveries while holding routine low-risk closure reviews at baseline
- Raised spot-check coverage for embargoed benchmark studies with rerun instability while leaving standard internal baselines unchanged
Sampling change record¶
- Description: Structured audit record of the signals examined, bounds checked, prior and new rates, reviewer-load impact estimate, and rollback trigger status for each tuning cycle.
- Kind: audit-log
- Required: Yes
- Examples:
- Log showing that a support QA sample increase was driven by reopen clusters and remained below the delegated reviewer-capacity ceiling
- Record explaining why one proposed decrease in benchmark spot checks was blocked because a protected annex cohort floor would have been crossed
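As a rough illustration of what such a record might contain, the sketch below models one tuning cycle as a dataclass and serializes it to a single JSON line. The class name, field names, and example values are assumptions made for this example; the pattern does not prescribe a schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class SamplingChangeRecord:
    """Audit entry for one tuning cycle (field names are hypothetical)."""
    cycle_id: str
    stratum: str
    prior_rate: float
    new_rate: float
    signals_examined: List[str]
    bounds_checked: Dict[str, bool]       # bound name -> did the proposal pass?
    reviewer_load_delta_hours: float      # estimated change in review hours
    rollback_triggered: bool = False
    blocked_reason: Optional[str] = None  # set when a proposal was not applied


def to_audit_line(record: SamplingChangeRecord) -> str:
    """Serialize with sorted keys so entries diff cleanly between cycles."""
    return json.dumps(asdict(record), sort_keys=True)


blocked = SamplingChangeRecord(
    cycle_id="cycle-0001",
    stratum="protected-annex",
    prior_rate=0.10,
    new_rate=0.10,  # unchanged because the proposal was blocked
    signals_examined=["spot_check_yield", "escaped_issue_count"],
    bounds_checked={"protected_floor": False, "max_step": True},
    reviewer_load_delta_hours=0.0,
    blocked_reason="proposed decrease would cross the protected-cohort floor",
)
print(to_audit_line(blocked))
```

Recording blocked proposals alongside applied ones keeps the reasoning behind an unchanged rate reconstructable later.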
Sampling freeze or escalation packet¶
- Description: Optional packet emitted when evidence is sparse, the proposed change would cross a protected bound, or recent misses suggest the workflow should stop autonomous tuning until human review occurs.
- Kind: review-packet
- Required: No
- Examples:
- Freeze notice generated after maintenance-documentation defects rise despite several downward sample-rate moves
- Escalation packet requesting governance review because new privacy rules changed which transcript features may be used for sampling strata
Environment¶
Operates in governed internal oversight workflows where the main artifact being changed is a review-sampling policy rather than the underlying operational queue, case disposition, or remedial action.
Systems¶
- QA, audit, or spot-check configuration store
- Case, artifact, or work-record system that supplies sample candidates
- Review findings and escaped-issue history store
- Governance dashboard for freeze, rollback, and sampled audit review
- Audit log or policy versioning system
Actors¶
- Quality or audit owner
- Reviewer or assurance analyst
- Operations or program lead responsible for oversight coverage
- Auditor or governance steward
Constraints¶
- Autonomous tuning may change only bounded sampling rates, stratified quotas, or protected-cohort floors that were preapproved for delegated adjustment.
- The workflow must stop at updating the sampling policy and its audit trace; it must not adjudicate reviewed items, reprioritize live work, assign specific reviewers, or trigger remediation.
- Sampling changes must remain quickly reversible, and any evidence of hidden miss growth or protected-cohort undercoverage must force rollback or freeze.
- Audit packets and configuration history must remain explainable enough for ex post oversight without exposing unnecessary sensitive case detail.
Assumptions¶
- Review findings and escaped-issue signals are available with enough fidelity to distinguish persistent drift from short-term noise.
- Protected strata, reviewer-load ceilings, and maximum autonomous step sizes are versioned in a policy source the tuning loop can read and enforce.
- The surrounding review program can backfill or temporarily increase sample coverage if a later audit shows that a recent autonomous decrease was too aggressive.
Capability requirements¶
- Monitoring (monitoring): The workflow depends on continuous observation of review yield, escape signals, override rates, and reviewer capacity rather than one-off calibration.
- Optimization (optimization): The core task is retuning future oversight coverage so review effort follows the highest-yield or highest-risk strata within explicit limits.
- Policy and constraint checking (policy-and-constraint-checking): Every proposed rate change must be checked against protected-cohort floors, step-size limits, reviewer-load caps, and freeze conditions before it is applied.
- Verification (verification): The loop must validate that apparent changes in review yield are supported by trusted findings and not explained only by stale policy, tiny samples, or data quality defects.
- Memory and state tracking (memory-and-state-tracking): Durable state is needed to compare current and prior sampling policies, track cumulative drift, and preserve rollback lineage across tuning cycles.
- Tool use (tool-use): The workflow must update configuration stores and append audit records rather than merely suggest a rate change in prose.
- Exception handling (exception-handling): Sparse evidence, protected-floor conflicts, unexpected reviewer saturation, or repeated escaped defects must trigger freeze or escalation instead of forced tuning.
Execution architecture¶
- Event-driven monitoring (event-driven-monitoring): New review findings, escaped defects, override clusters, or reviewer-capacity shifts naturally trigger bounded reevaluation of the active sampling policy.
- Tool-using single agent (tool-using-single-agent): One governed agent can usually inspect review telemetry, compare candidate sampling moves against bounds, update the approved policy artifact, and write the audit record in one loop.
Autonomy profile¶
- Level: Autonomous with audit (autonomous-with-audit)
- Reversibility: Sampling-rate changes are usually easy to roll back by restoring the previous configuration version. Harm from a temporary undersampling period may be only partially reversible because some quality issues can escape review until the next audit or backfill sweep detects them.
- Escalation: Escalate when a proposed move would cross a protected floor or reviewer-load ceiling, when evidence quality is too sparse or contradictory to justify a change, when cumulative drift approaches a delegated bound, or when escaped-issue patterns rise after a recent decrease in coverage.
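The escalation conditions above can be sketched as a single check that returns every reason that applies, so an empty list means the move may proceed. All parameter names, the 0.8 "approaching" fraction, and the use of a simple findings count as a proxy for evidence quality are assumptions for illustration; detecting contradictory evidence is left out.

```python
from typing import List


def escalation_reasons(
    proposed_rate: float,
    protected_floor: float,
    projected_load_hours: float,
    load_ceiling_hours: float,
    findings_in_window: int,
    min_findings: int,
    cumulative_drift: float,
    drift_bound: float,
    escapes_rose_after_decrease: bool,
    drift_warning_fraction: float = 0.8,  # hypothetical "approaching" threshold
) -> List[str]:
    """Collect every escalation reason that applies to a proposed move."""
    reasons = []
    if proposed_rate < protected_floor:
        reasons.append("protected floor would be crossed")
    if projected_load_hours > load_ceiling_hours:
        reasons.append("reviewer-load ceiling would be crossed")
    if findings_in_window < min_findings:
        reasons.append("evidence too sparse")
    if abs(cumulative_drift) >= drift_warning_fraction * drift_bound:
        reasons.append("cumulative drift approaching delegated bound")
    if escapes_rose_after_decrease:
        reasons.append("escaped issues rose after a coverage decrease")
    return reasons


print(escalation_reasons(0.03, 0.05, 30.0, 40.0, 12, 50, 0.09, 0.10, False))
```

Returning all reasons rather than the first match gives the human reviewer the full picture in one packet.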
Human checkpoints¶
- Humans define the protected strata, minimum and maximum rates, reviewer-load ceilings, and freeze conditions that bound autonomous tuning.
- Governance owners review sampled tuning runs and periodic drift summaries to confirm that escaped defects, fairness posture, privacy handling, and reviewer burden remain acceptable.
- Humans take over when policy changes, repeated misses, or protected-cohort concerns make the delegated bounds no longer trustworthy.
Risk and governance¶
- Risk level: Low (low)
- Failure impact: Mis-tuned sampling changes oversight efficiency and may delay detection of quality issues, but it does not alter underlying case outcomes or execute external actions, and it can usually be corrected by restoring prior rates and backfilling spot checks.
- Auditability: Preserve the current and updated sampling-policy versions, findings windows analyzed, reviewer-capacity assumptions, blocked proposals, freeze events, rollback actions, and sampled human-audit results for each tuning cycle.
Approval requirements¶
- Autonomous tuning may apply only within preapproved rate bands, protected-stratum floors, and maximum step sizes defined by the oversight owner.
- Human approval is required for any change that would alter protected cohort definitions, materially increase reviewer-load ceilings, or reduce coverage below the delegated minimum for a monitored class.
Privacy¶
- Use aggregated findings and minimally necessary review metadata when computing sampling changes so audit packets do not expose unnecessary customer, participant, or worker detail.
- Keep any sensitive case excerpts or artifact references in restricted annexes instead of general tuning logs when authorized reviewers need supporting detail.
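A minimal sketch of the aggregation idea: reduce per-case findings to per-stratum counts before they reach the tuning loop, so identifiers never enter the general tuning log. The field names (`case_id`, `customer`, `stratum`, `defect_found`) and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def aggregate_findings(findings: Iterable[Mapping]) -> Dict[str, Dict[str, int]]:
    """Keep only per-stratum counts; every other field is dropped."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"reviewed": 0, "defects": 0}
    )
    for finding in findings:
        bucket = totals[finding["stratum"]]
        bucket["reviewed"] += 1
        bucket["defects"] += 1 if finding["defect_found"] else 0
    return dict(totals)


raw = [
    {"case_id": "C-1", "customer": "a", "stratum": "identity-recovery", "defect_found": True},
    {"case_id": "C-2", "customer": "b", "stratum": "identity-recovery", "defect_found": False},
    {"case_id": "C-3", "customer": "c", "stratum": "routine-closure", "defect_found": False},
]
print(aggregate_findings(raw))
```

Very small strata can still be identifying even after aggregation, so a real deployment may need additional safeguards that this sketch does not attempt.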
Security¶
- Restrict who can edit protected-stratum rules, rate limits, and service accounts that apply sampling updates.
- Log every autonomous configuration change, rollback, and manual freeze so unauthorized manipulation of oversight coverage is detectable.
Notes: Low-risk posture is appropriate because the workflow changes only bounded internal oversight coverage and remains reversible, audit-ready, and separate from the primary operational or adjudicative workflow being sampled.
Why agentic¶
- Useful tuning requires interpreting delayed review yield and escape signals across multiple strata rather than relying on one static sample rate.
- The workflow must decide whether observed defect changes reflect true drift, temporary regime change, or noisy sample windows and then adapt or freeze accordingly.
- Durable memory, constraint checking, and rollback awareness matter because repeated small sampling moves can create hidden undercoverage even when each individual change seems harmless.
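The last point can be made concrete by measuring drift against the last human-approved baseline rather than against the previous cycle. In the sketch below every individual move is well under a hypothetical 0.05 step limit, yet the cumulative drift eventually exceeds a hypothetical 0.09 bound. The function name and all numbers are assumptions.

```python
from typing import Optional, Sequence


def first_drift_breach(
    baseline: float, rates: Sequence[float], drift_bound: float
) -> Optional[int]:
    """Index of the first cycle whose rate is more than drift_bound
    away from the approved baseline, or None if no cycle breaches it."""
    for i, rate in enumerate(rates):
        if abs(rate - baseline) > drift_bound:
            return i
    return None


baseline = 0.30
history = [0.28, 0.26, 0.24, 0.22, 0.20, 0.18]  # six small downward moves

# Each move on its own looks harmless: about 0.02 per cycle.
largest_step = max(abs(b - a) for a, b in zip([baseline] + history, history))
print(largest_step < 0.05)

# Measured against the baseline, the bound is breached partway through.
print(first_drift_breach(baseline, history, drift_bound=0.09))
```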
Failure modes¶
The workflow lowers coverage after a short quiet period and misses a latent quality drift¶
- Impact: Escaped defects or audit failures rise because the loop treated temporary calm as evidence that a risky cohort needed less review.
- Severity: medium
- Detectability: medium
- Mitigations:
- Require minimum evidence windows and backstop protected-floor rates for cohorts with historically severe misses.
- Trigger rollback or freeze when escaped-issue indicators rise after a downward tuning move.
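One possible way to express a minimum-evidence requirement is to allow a decrease only when an upper confidence bound on the defect rate is below a tolerable level. The sketch below uses the Wilson score interval for that purpose. This is a technique chosen for illustration, not one the pattern prescribes, and the function names, `z = 1.96` (roughly a 95% two-sided interval), and the 5% tolerance are assumptions.

```python
import math


def wilson_upper_bound(defects: int, reviewed: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for the defect rate."""
    if reviewed == 0:
        return 1.0  # no evidence at all: assume the worst case
    p = defects / reviewed
    z2 = z * z
    centre = p + z2 / (2 * reviewed)
    margin = z * math.sqrt(p * (1 - p) / reviewed + z2 / (4 * reviewed * reviewed))
    return (centre + margin) / (1 + z2 / reviewed)


def decrease_is_supported(defects: int, reviewed: int, tolerable_rate: float) -> bool:
    """Allow a downward move only if the defect rate is plausibly below tolerance."""
    return wilson_upper_bound(defects, reviewed) <= tolerable_rate


# A quiet period of 20 clean reviews is weak evidence of low risk...
print(decrease_is_supported(defects=0, reviewed=20, tolerable_rate=0.05))
# ...while 400 clean reviews is much stronger.
print(decrease_is_supported(defects=0, reviewed=400, tolerable_rate=0.05))
```

With zero defects in 20 reviews the upper bound is still about 16%, which is why a short calm window alone should not justify lowering coverage for a risky cohort.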
Reviewer-load optimization crowds out protected cohorts¶
- Impact: The system preserves reviewer capacity by reducing sampling in lower-volume or less visible strata that still need consistent oversight.
- Severity: medium
- Detectability: high
- Mitigations:
- Encode protected-cohort floors as hard constraints rather than soft preferences.
- Report separate fairness or coverage checks for protected cohorts before each autonomous update is applied.
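Treating floors as hard constraints could look like the sketch below: protected floors are funded first, and only the remaining reviewer capacity is shared across discretionary sampling. If floors alone exceed capacity, the function raises instead of trimming them. `allocate_rates`, `FloorConflict`, the proportional scaling rule, and the example volumes are all hypothetical.

```python
from typing import Dict


class FloorConflict(Exception):
    """Protected floors alone exceed capacity: escalate instead of tuning."""


def allocate_rates(
    capacity: float,              # reviews available in the period
    volumes: Dict[str, float],    # expected items per stratum in the period
    floors: Dict[str, float],     # protected minimum rate per stratum
    desired: Dict[str, float],    # rate the tuner would like per stratum
) -> Dict[str, float]:
    floor_reviews = sum(volumes[s] * floors.get(s, 0.0) for s in volumes)
    if floor_reviews > capacity:
        raise FloorConflict("protected floors need more reviews than capacity allows")

    # Discretionary demand is whatever the tuner wants above each floor.
    extra = {s: max(0.0, desired[s] - floors.get(s, 0.0)) for s in volumes}
    extra_reviews = sum(volumes[s] * extra[s] for s in volumes)
    remaining = capacity - floor_reviews

    # Scale discretionary sampling down uniformly if capacity is short.
    scale = 1.0 if extra_reviews <= remaining else remaining / extra_reviews
    return {s: floors.get(s, 0.0) + scale * extra[s] for s in volumes}


volumes = {"security-escalation": 100, "routine-closure": 1000}
floors = {"security-escalation": 0.20}
desired = {"security-escalation": 0.20, "routine-closure": 0.10}
print(allocate_rates(70, volumes, floors, desired))
```

In this example the capacity shortfall is absorbed entirely by the routine stratum, while the protected stratum keeps its full floor.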
Sampling rates oscillate because the workflow chases noisy short-term findings¶
- Impact: Reviewers experience unstable workload and the audit trail becomes hard to interpret because rates move up and down without converging.
- Severity: low
- Detectability: high
- Mitigations:
- Limit step size and require cooldown periods before revisiting the same stratum.
- Compare candidate moves against longer baseline windows and cumulative drift, not only the latest batch.
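The cooldown mitigation, together with a simple oscillation signal, might be sketched as follows. Counting direction reversals in the rate history is an assumption made here as one cheap way to notice non-convergence; the function names and the seven-day cooldown are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def in_cooldown(last_changed: datetime, now: datetime, cooldown: timedelta) -> bool:
    """True while the stratum was changed too recently to be revisited."""
    return now - last_changed < cooldown


def direction_reversals(rates: Sequence[float]) -> int:
    """Count how often consecutive rate changes flip between up and down."""
    deltas = [b - a for a, b in zip(rates, rates[1:]) if b != a]
    return sum(1 for d1, d2 in zip(deltas, deltas[1:]) if (d1 > 0) != (d2 > 0))


print(in_cooldown(datetime(2024, 1, 1), datetime(2024, 1, 5), timedelta(days=7)))
print(direction_reversals([0.10, 0.15, 0.10, 0.15, 0.12]))  # up, down, up, down
print(direction_reversals([0.10, 0.12, 0.15]))              # steadily up
```

A high reversal count over a short history suggests the loop is reacting to noise, which could be treated as a reason to lengthen the cooldown or freeze the stratum.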
Policy or privacy changes invalidate the current strata definition¶
- Impact: The workflow continues tuning on outdated cohort definitions or features that should no longer influence sampling.
- Severity: medium
- Detectability: high
- Mitigations:
- Version strata definitions and freeze autonomous tuning when the governing policy source changes materially.
- Route policy-adjacent changes to human review instead of silently reusing stale grouping logic.
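A small sketch of the versioning idea: fingerprint the strata definition the tuner last worked against and freeze when the current definition no longer matches. The function names and the example definition are hypothetical, and this version freezes on any change, leaving the judgment of what counts as material to human review.

```python
import hashlib
import json


def strata_fingerprint(definition: dict) -> str:
    """Stable hash of a strata definition, independent of key order."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_freeze(tuned_against: str, current_definition: dict) -> bool:
    """Freeze when the definition differs from the one used for tuning."""
    return strata_fingerprint(current_definition) != tuned_against


approved = {"stratum": "identity-recovery", "features": ["severity", "channel"]}
tuned_against = strata_fingerprint(approved)

# Same content with keys in a different order: no freeze.
reordered = {"features": ["severity", "channel"], "stratum": "identity-recovery"}
print(should_freeze(tuned_against, reordered))

# A feature was removed by a policy update: freeze.
updated = {"stratum": "identity-recovery", "features": ["severity"]}
print(should_freeze(tuned_against, updated))
```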
Evaluation¶
Success metrics¶
- Increased defect yield or meaningful finding rate per review hour without a corresponding rise in escaped issues for protected cohorts.
- Percentage of autonomous tuning cycles applied within delegated bounds without later rollback for preventable reasons.
- Reduction in reviewer overload or idle review effort caused by static sampling that no longer matches current risk.
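The first two metrics reduce to simple ratios. The sketch below shows one way to compute them from per-cycle records; the record keys (`applied`, `within_bounds`, `rolled_back`) and the sample values are assumptions, and distinguishing preventable from unavoidable rollbacks is left to the caller.

```python
from typing import Iterable, Mapping


def yield_per_review_hour(findings: int, review_hours: float) -> float:
    """Meaningful findings per hour of review effort."""
    return findings / review_hours if review_hours else 0.0


def clean_cycle_share(cycles: Iterable[Mapping]) -> float:
    """Share of applied cycles that stayed in bounds and were not rolled back."""
    applied = [c for c in cycles if c["applied"]]
    if not applied:
        return 0.0
    clean = [c for c in applied if c["within_bounds"] and not c["rolled_back"]]
    return len(clean) / len(applied)


cycles = [
    {"applied": True, "within_bounds": True, "rolled_back": False},
    {"applied": True, "within_bounds": True, "rolled_back": True},
    {"applied": True, "within_bounds": True, "rolled_back": False},
    {"applied": False, "within_bounds": False, "rolled_back": False},  # blocked
]
print(yield_per_review_hour(findings=12, review_hours=40.0))
print(clean_cycle_share(cycles))
```

Yield per review hour should be read alongside escaped-issue counts for protected cohorts, since a rising yield alone does not show that coverage is adequate.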
Quality criteria¶
- Sampling-policy changes remain explainable in terms of findings, protected floors, workload limits, and recent drift rather than opaque score shifts.
- The workflow never expands into case adjudication, reviewer assignment, queue management, or remedial execution.
- Prior policy versions, freeze events, and rollback lineage remain reconstructable for every autonomous change.
Robustness checks¶
- Test a quiet-period scenario followed by a hidden quality regression and confirm protected-floor coverage prevents dangerous undersampling.
- Simulate reviewer-capacity loss and verify the workflow reduces discretionary sampling before touching protected cohorts.
- Introduce a policy change that invalidates the existing strata map and ensure autonomous tuning freezes pending human review.
Benchmark notes: Evaluate review-sampling quality on oversight effectiveness, boundary discipline, and rollback readiness together; lower review volume is not success if meaningful misses or protected-cohort blind spots rise.
Implementation notes¶
Orchestration notes¶
- Separate telemetry intake, candidate-rate generation, guardrail checking, configuration update, and audit-log writing so each stage is inspectable and replayable.
- Preserve cumulative drift and post-change outcome markers in the same state history used for later sampled human audits.
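The stage separation described above could be sketched as a list of small functions, each taking and returning a state dictionary, with a snapshot kept after every stage so a cycle can be inspected or replayed. The stage names, state keys, and the toy proposal rule (step up when observed yield beats a target, otherwise step down) are assumptions for illustration only.

```python
from typing import Callable, List, Tuple


def intake(state: dict) -> dict:
    reviewed = state["reviewed"]
    state["yield"] = state["defects"] / reviewed if reviewed else 0.0
    return state


def propose(state: dict) -> dict:
    # Toy rule (assumption): step up when yield beats the target, else step down.
    direction = 1 if state["yield"] > state["target_yield"] else -1
    state["candidate"] = state["current_rate"] + direction * state["max_step"]
    return state


def check(state: dict) -> dict:
    state["approved"] = state["floor"] <= state["candidate"] <= state["ceiling"]
    return state


def apply_update(state: dict) -> dict:
    # A blocked proposal leaves the active rate untouched.
    state["new_rate"] = state["candidate"] if state["approved"] else state["current_rate"]
    return state


def write_audit(state: dict) -> dict:
    state["audit"] = {
        "prior": state["current_rate"],
        "new": state["new_rate"],
        "approved": state["approved"],
    }
    return state


STAGES: List[Callable[[dict], dict]] = [intake, propose, check, apply_update, write_audit]


def run_cycle(state: dict) -> Tuple[dict, List[Tuple[str, dict]]]:
    """Run every stage in order, keeping a snapshot after each one."""
    trace = []
    for stage in STAGES:
        state = stage(dict(state))  # copy so earlier snapshots stay intact
        trace.append((stage.__name__, dict(state)))
    return state, trace


final, trace = run_cycle({
    "reviewed": 200, "defects": 30, "target_yield": 0.10,
    "current_rate": 0.10, "floor": 0.05, "ceiling": 0.40, "max_step": 0.05,
})
print(final["audit"])
print([name for name, _ in trace])
```

Because each snapshot records the state after a single stage, a later human audit can see exactly where a proposal was generated, blocked, or applied.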
Integration notes¶
- Common implementations integrate QA or audit tools, case or artifact repositories, findings stores, policy registries, and versioned configuration services.
- Keep the pattern neutral about whether the sampled items are tickets, benchmark artifacts, maintenance records, or another governed review substrate.
Deployment notes¶
- Start with narrow strata and conservative delegated ranges until oversight owners trust the loop's freeze and rollback behavior.
- Review early autonomous decreases especially closely because hidden undercoverage often appears later than reviewer-load savings.
References¶
- docs/patterns/optimize-adapt.md
- data/vocabularies/problem-structures.yaml
- data/vocabularies/autonomy-levels.yaml
Example domains¶
- Support (support): Raise the spot-check rate for outage-related enterprise support closures after reopened identity incidents spike, while preserving fixed minimum coverage for security-sensitive escalations.
- Research (research): Increase artifact spot-check coverage for embargoed benchmark studies with rerun instability and disclosure-sensitive annexes while leaving routine internal baselines at the default rate.
- Operations (operations): Retune maintenance-documentation review sampling toward assets, sites, or vendors with rising documentation escapes without changing dispatch schedules or maintenance execution itself.
Related patterns¶
- Adaptive threshold calibration (adjacent)
- Both patterns make bounded feedback-driven adjustments autonomously, but this one tunes oversight sampling coverage rather than detection sensitivity thresholds.
- Queue prioritization optimization (contrasts-with)
- Queue optimization changes the order of live work, while this pattern changes only how much review coverage future work receives.
- Anomaly detection review (can-monitor)
- Drift or miss signals surfaced by anomaly review can become feedback inputs for future review-sampling adjustments.
Grounded instances¶
- Payroll change audit spot-check sampling-rate tuning
- Cold-chain deviation closure spot-check sampling-rate tuning
- Maintenance documentation spot-check sampling-rate tuning
- Embargoed benchmark artifact spot-check sampling-rate tuning
- Enterprise support quality-review sampling-rate tuning
Canonical source¶
data/patterns/optimize-adapt/adaptive-review-sampling-rate-tuning.yaml