AI Decision Review Guidelines#

This document outlines the procedures for human analysts to review AI-driven alert triage and investigation decisions. It also defines a standardized feedback format to support the “Identification” and “Learning” phases of the PICERL AI Performance Framework, specifically targeting “Feedback Loop Metrics” and “Escalation-to-Accuracy Ratio.”

Purpose#

  • To establish a consistent process for human oversight of AI agent decisions.

  • To capture structured feedback on AI performance, enabling continuous improvement.

  • To build trust in the AI's capabilities by understanding and refining its decision-making.

  • To provide data for measuring AI Decision Accuracy, Escalation Rate & Accuracy, and Auto-Closed Alert Reversal Rate.

Scope of Review#

Human analysts should periodically review a sample of AI agent decisions, including:

  1. AI Auto-Closed Alerts: Alerts that the AI determined to be benign or false positive and closed without human intervention.

  2. AI Auto-Escalated Alerts: Alerts that the AI determined required human attention and escalated.

  3. AI-Driven Investigation Summaries/Findings: Key findings or summaries generated by AI during an investigation.

  4. AI-Initiated Automated Actions (Post-Facto Review): Review of automated containment or remediation actions triggered by AI (cross-reference with automated_response_playbook_criteria.md).

Review Frequency & Sampling#

  • Daily Spot Checks: A small, random sample of AI auto-closed and auto-escalated alerts from the past 24 hours; see the sampling sketch after this list.

  • Weekly Thematic Review: Focus on specific alert types, AI modules, or high-volume AI decisions to identify systemic issues or areas for improvement.

  • Post-Incident Review (for AI-involved incidents): Detailed review of all AI actions and decisions related to significant incidents.
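
Where tooling permits, the daily spot check can be automated. Below is a minimal sketch, assuming a hypothetical `fetch_ai_decisions` helper that queries the SOAR or alert store; the helper, its parameters, and the sample size are illustrative, not a prescribed API.

```python
import random
from datetime import datetime, timedelta, timezone

SAMPLE_SIZE = 10  # illustrative; size to available review capacity


def daily_spot_check(fetch_ai_decisions):
    """Draw a random sample of AI auto-closed and auto-escalated alerts
    from the past 24 hours for human review.

    `fetch_ai_decisions(since, decision_types)` is a hypothetical helper
    returning a list of decision records from the SOAR/alert store.
    """
    since = datetime.now(timezone.utc) - timedelta(hours=24)
    decisions = fetch_ai_decisions(
        since=since,
        decision_types=("auto_closed", "auto_escalated"),
    )
    # Sample without replacement; review everything if volume is low.
    return random.sample(decisions, min(SAMPLE_SIZE, len(decisions)))
```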

Standardized Feedback Format#

When a human analyst reviews an AI decision and finds a discrepancy or an area for improvement (or, optionally, wishes to affirm a correct decision), the following structured feedback should be logged (e.g., in a dedicated SOAR case field, a review checklist, or a feedback database); a worked example record follows the list:

  1. Reviewed Item ID: (e.g., Alert ID, Case ID, AI Action ID)

  2. AI Decision/Action Taken: (e.g., “Auto-closed alert as False Positive”, “Escalated to Tier 2”, “Recommended blocking IP X.X.X.X”)

  3. AI Confidence (if available): The confidence score the AI assigned to its decision.

  4. Analyst Assessment of AI Decision:

    • AI_Decision_Correct: Yes / No

  5. If AI_Decision_Correct = No (Override/Correction Details):

    • Human_Corrected_Outcome: (e.g., “Reopened as True Positive - Low Severity”, “Downgraded Escalation to Info Only”, “Did not block IP - found legitimate business use”)

    • Reason_for_Override: (Free text; prefer the predefined categories where available, e.g., “Incorrect FP assessment by AI”, “AI missed critical context”, “AI misinterpreted log data”, “Policy violation by AI suggestion”)

    • Confidence_of_Human_Correction: High / Medium / Low

    • Missing_Context_Identified_by_Human: (Specific data points or contextual information the AI lacked or misinterpreted, e.g., “AI did not consider that IP X.X.X.X is a known business partner from whitelists.md,” “AI misclassified PowerShell script as benign, but it matches TTP Y from internal_threat_profile.md”)

  6. If AI_Decision_Correct = Yes (Affirmation Details - Optional but useful):

    • Reason_for_Agreement: (e.g., “AI correctly identified benign scanner activity as per common_benign_alerts.md,” “AI accurately summarized key IOCs”)

  7. Analyst Name/ID:

  8. Review Timestamp:

  9. Suggested Improvement for AI (if any):

    • (e.g., “Update common_benign_alerts.md with this new scanner IP,” “Refine indicator_handling_protocols.md for this specific malware variant,” “Consider adding X field to data_normalization_map.md for better EDR correlation”)
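
As a concrete illustration, the fields above map naturally onto a single structured record. The sketch below assumes feedback is appended to a JSON Lines file; the dataclass, field names, file path, and example values are one possible encoding, not a required schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AIDecisionFeedback:
    # Field names mirror the standardized feedback format above.
    reviewed_item_id: str
    ai_decision_action: str
    ai_confidence: Optional[float]                         # None if the AI exposes no score
    ai_decision_correct: bool
    human_corrected_outcome: Optional[str] = None          # populated only on override
    reason_for_override: Optional[str] = None
    confidence_of_human_correction: Optional[str] = None   # "High" / "Medium" / "Low"
    missing_context_identified_by_human: Optional[str] = None
    reason_for_agreement: Optional[str] = None             # populated on affirmation
    analyst_id: str = ""
    review_timestamp: str = ""                             # ISO 8601, UTC
    suggested_improvement: Optional[str] = None


def log_feedback(record: AIDecisionFeedback, path: str = "ai_review_feedback.jsonl") -> None:
    """Append one review record to a JSON Lines feedback log (an illustrative
    sink; a SOAR case field or a database serves the same purpose)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


# Example: logging an override of an auto-closed alert (all values hypothetical).
log_feedback(AIDecisionFeedback(
    reviewed_item_id="ALERT-48213",
    ai_decision_action="Auto-closed alert as False Positive",
    ai_confidence=0.91,
    ai_decision_correct=False,
    human_corrected_outcome="Reopened as True Positive - Low Severity",
    reason_for_override="AI missed critical context",
    confidence_of_human_correction="High",
    missing_context_identified_by_human="Script matches TTP Y from internal_threat_profile.md",
    analyst_id="analyst_042",
    review_timestamp="2025-05-20T09:30:00Z",
))
```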

Feedback Integration & Learning Loop#

  1. Collection: Structured feedback is collected in a centralized system (SOAR, dedicated database, or even a structured log file).

  2. Analysis:

    • Periodically (e.g., weekly/monthly), designated personnel (e.g., SOC Lead, AI Operations Lead, Detection Engineering) analyze the aggregated feedback.

    • Identify trends: Common reasons for AI errors, frequently missed context, and types of alerts the AI struggles with.

  3. Action & Improvement:

    • Update rules-bank: Based on feedback, update relevant documents like common_benign_alerts.md, whitelists.md, indicator_handling_protocols.md, automated_response_playbook_criteria.md, etc. This is the primary mechanism for “teaching” the AI through its knowledge base.

    • Refine AI Logic/Models (if applicable): For more advanced AI systems, feedback may be used to retrain models or adjust internal decision logic (this is outside the direct scope of rules-bank updates but is the ultimate goal).

    • Update Detection Rules: If AI feedback highlights issues with upstream detection quality, this feeds into the detection_improvement_process.md.

    • SOAR Playbook Adjustments: Modify SOAR playbooks that interact with or are triggered by AI.

  4. Metric Tracking: Use the collected feedback data to calculate the following (a computation sketch follows this list):

    • AI Decision Accuracy (proportion of the AI's TP/FP triage calls confirmed correct on human review).

    • Auto-Closed Alert Reversal Rate.

    • Escalation Rate & Escalation-to-Accuracy Ratio.

    • Track trends in these metrics over time to measure improvement.
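
A minimal sketch of how these metrics can be derived from collected feedback, assuming the JSON Lines log sketched above. The formulas are one straightforward reading of the metric names (in particular, Escalation-to-Accuracy Ratio is computed here as the share of reviewed escalations judged correct) and should be adjusted to the organization's official definitions.

```python
import json
from collections import Counter


def compute_review_metrics(path: str = "ai_review_feedback.jsonl") -> dict:
    """Aggregate review feedback records into the metrics listed above."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    total = len(records)
    correct = sum(1 for r in records if r["ai_decision_correct"])

    auto_closed = [r for r in records if r["ai_decision_action"].startswith("Auto-closed")]
    reversed_closures = [r for r in auto_closed if not r["ai_decision_correct"]]

    escalated = [r for r in records if "Escalated" in r["ai_decision_action"]]
    correct_escalations = [r for r in escalated if r["ai_decision_correct"]]

    return {
        # Share of reviewed AI triage decisions confirmed by a human.
        "ai_decision_accuracy": correct / total if total else None,
        # Share of reviewed auto-closed alerts a human reopened.
        "auto_closed_reversal_rate": len(reversed_closures) / len(auto_closed) if auto_closed else None,
        # Share of reviewed decisions the AI escalated ...
        "escalation_rate": len(escalated) / total if total else None,
        # ... and share of those escalations judged warranted.
        "escalation_to_accuracy_ratio": (
            len(correct_escalations) / len(escalated) if escalated else None
        ),
        # Trend input for the weekly/monthly analysis step.
        "top_override_reasons": Counter(
            r["reason_for_override"] for r in records if r.get("reason_for_override")
        ).most_common(5),
    }
```

Computing these from the same feedback store that analysts write to keeps the metrics and the review records in lockstep, which makes the trend tracking described above straightforward.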

Responsibilities#

  • SOC Analysts (Tier 1-3): Responsible for conducting reviews and providing detailed, structured feedback.

  • SOC Lead / AI Operations Lead: Responsible for overseeing the review process, analyzing feedback trends, and coordinating improvement actions.

  • Detection Engineering: Responsible for acting on feedback related to detection rule quality or gaps.

  • AI Development Team (if applicable): Responsible for incorporating feedback into AI model/logic refinements.

By implementing these guidelines, the organization can foster a collaborative environment where human expertise continually refines and improves the performance and trustworthiness of AI security agents.


References and Inspiration#