AI Explainability Standards#

This document defines standards for AI agent explanations to improve clarity, trustworthiness, and the speed at which human analysts can understand AI-driven decisions. This supports the “Explainability Time/Score” metric within the PICERL AI Performance Framework (“Identification” and “Learning” phases).

Purpose#

  • To ensure AI agent decisions are transparent and understandable by human analysts.

  • To build trust in AI by providing clear rationale for its actions and conclusions.

  • To reduce the “Explainability Time” – the time it takes an analyst to comprehend why an AI agent made a particular decision.

  • To provide a basis for scoring the quality of AI explanations (“Explainability Score”).

Core Principles of AI Explanations#

  1. Clarity: Explanations should be concise, use clear language, and avoid jargon where possible.

  2. Accuracy: The explanation must accurately reflect the AI’s decision-making process and the data it used.

  3. Actionability: Where appropriate, the explanation should guide the analyst toward next steps or explain why a certain action was taken or recommended.

  4. Contextualization: Explanations should reference relevant information from the rules-bank (e.g., specific protocols, asset criticality, known benign activity) that influenced the decision.

  5. Confidence Indication: The AI should clearly state its confidence level in the decision or finding.

Standards for “Good” AI Explanations#

A good explanation from an AI agent should generally include the following elements (see the sketch after this list):

  1. Summary of Decision/Finding: A brief statement of what the AI concluded or did.

    • Example: “Alert XYZ closed as False Positive due to matching known benign scanner activity.”

    • Example: “Escalated Alert ABC due to correlation of high-risk IOC with critical asset communication.”

    • Example: “Recommended isolating host DEF based on EDR detection of malware and subsequent C2 communication.”

  2. Key Evidence/Triggers: The primary data points or rules that led to the decision.

    • Example: “Matched IP 1.2.3.4 against common_benign_alerts.md entry for ‘Qualys Scanner Range’.”

    • Example: “File Hash [HASH] (from Alert ABC) found in GTI with malicious_score: 9/10 and category: Ransomware. Host [HOSTNAME] (Critical Asset per asset_inventory_guidelines.md) communicated with C2 IP [IP_ADDRESS] (associated with hash via GTI).”

  3. Referenced rules-bank Documents/Protocols: Explicitly name any rules-bank documents that guided the logic.

    • Example: “Following indicator_handling_protocols.md for IP Address.”

    • Example: “Criteria for automated block met as per automated_response_playbook_criteria.md for ‘Block Malicious IP’.”

  4. Confidence Score: The AI’s assessed confidence in its decision.

    • Example: “Confidence: High (90%)”

  5. Key Unprocessed or Ambiguous Factors (If Any): If the AI encountered conflicting data or had to make assumptions, these should be noted.

    • Example: “Low confidence data point: Geolocation of source IP was inconclusive but did not contradict other findings.”
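
As a minimal, non-normative sketch of how these five elements might be captured before an explanation is rendered, an agent could populate a structure like the one below. The class and field names are illustrative assumptions, not a mandated schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AIExplanation:
    """Illustrative container for the five elements of a 'good' explanation."""

    summary: str                         # 1. Summary of decision/finding
    key_evidence: List[str]              # 2. Key evidence/triggers
    referenced_docs: List[str]           # 3. Referenced rules-bank documents/protocols
    confidence_label: str                # 4. Confidence, e.g. "High"
    confidence_pct: Optional[int] = None #    Optional percentage, e.g. 90
    caveats: List[str] = field(default_factory=list)  # 5. Unprocessed/ambiguous factors, if any

    def is_complete(self) -> bool:
        """Check that the mandatory elements are present (caveats are optional)."""
        return all([self.summary, self.key_evidence,
                    self.referenced_docs, self.confidence_label])
```

Keeping the explanation as structured data before rendering it to text makes it easier to validate completeness and to reuse the same content in both SOAR case comments and AI interaction UIs.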

Templates for AI-Generated Summaries / Explanations#

AI agents should strive to use a structured template for their explanations, which can be logged in SOAR case comments or presented in AI interaction UIs.

Template 1: Alert Triage Explanation (e.g., Auto-Close or Escalation)#

**AI Triage Decision for Alert [Alert ID/Name]:**

*   **Decision:** [e.g., Auto-Closed as Benign, Escalated to Tier 2, Marked for Further AI Investigation]
*   **Confidence:** [e.g., High, Medium, Low] - [Percentage if available, e.g., 85%]
*   **Primary Reason / Key Evidence:**
    *   [Bullet point 1: e.g., Matched IOC `X` against `whitelists.md` entry `Y`.]
    *   [Bullet point 2: e.g., Alert pattern consistent with `common_benign_alerts.md` profile for 'Nightly Backup Activity'.]
    *   [Bullet point 3: e.g., Severity of correlated GTI report for domain `Z` is 'Critical'.]
*   **`rules-bank` Protocol(s) Applied:** [e.g., `indicator_handling_protocols.md` for IP, `incident_severity_matrix.md`]
*   **Next Steps (if any suggested by AI):** [e.g., "Recommend human review due to conflicting low-confidence indicators."]
*   **AI Agent ID:** [Agent Identifier]
*   **Timestamp:** [Decision Timestamp]
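
A hedged sketch of how Template 1 might be rendered into a markdown string for a SOAR case comment follows. The function and parameter names are illustrative assumptions, and the actual SOAR posting call is omitted.

```python
from datetime import datetime, timezone
from typing import List, Optional


def render_triage_comment(alert_id: str, decision: str, confidence: str,
                          evidence: List[str], protocols: List[str],
                          agent_id: str, next_steps: Optional[str] = None) -> str:
    """Render Template 1 as markdown suitable for a SOAR case comment."""
    lines = [
        f"**AI Triage Decision for Alert {alert_id}:**",
        "",
        f"*   **Decision:** {decision}",
        f"*   **Confidence:** {confidence}",
        "*   **Primary Reason / Key Evidence:**",
    ]
    lines += [f"    *   {item}" for item in evidence]
    lines.append(f"*   **`rules-bank` Protocol(s) Applied:** {', '.join(protocols)}")
    if next_steps:
        lines.append(f"*   **Next Steps (if any suggested by AI):** {next_steps}")
    lines.append(f"*   **AI Agent ID:** {agent_id}")
    lines.append(f"*   **Timestamp:** {datetime.now(timezone.utc).isoformat()}")
    return "\n".join(lines)
```

The same approach applies to other presentation surfaces; only the rendering step changes, while the underlying structured explanation stays identical.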

Template 2: Automated Action Explanation (e.g., Host Isolation)#

**AI Automated Action Log for Case [Case ID]:**

*   **Action Taken:** [e.g., Initiated host isolation for `[HOSTNAME]` (IP: `[IP_ADDRESS]`)]
*   **Triggering Condition(s):**
    *   [Bullet point 1: e.g., EDR Alert `[EDR_ALERT_ID]` (Severity: Critical, Type: Ransomware_Detected)]
    *   [Bullet point 2: e.g., GTI report for file hash `[HASH]` indicates `malicious_score: 10/10`.]
*   **`rules-bank` Criteria Met:** [e.g., `automated_response_playbook_criteria.md` - Playbook: Isolate Host (EDR) - All trigger conditions met.]
    *   [Detail specific criteria met, e.g., "Host not in `critical_servers_do_not_isolate.lst`."]
*   **Confidence in Action:** [e.g., Very High (95%)]
*   **Outcome (if immediately available):** [e.g., EDR API confirmed host isolation successful.]
*   **Rollback Procedure Reference:** [e.g., See `automated_response_playbook_criteria.md` for manual rollback.]
*   **AI Agent ID:** [Agent Identifier]
*   **Timestamp:** [Action Timestamp]
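
One plausible guardrail, sketched below under the assumption that the action log is assembled as a dictionary (the field names mirror Template 2 but are not a mandated schema), is to verify the entry is complete before the comment is posted or the action proceeds.

```python
from typing import Dict, List

# Fields an automated-action log entry should carry before it is posted.
# "Outcome" is omitted because Template 2 marks it "if immediately available".
REQUIRED_ACTION_LOG_FIELDS = (
    "action_taken",
    "triggering_conditions",
    "criteria_met",
    "confidence",
    "rollback_reference",
    "agent_id",
    "timestamp",
)


def missing_action_log_fields(log_entry: Dict[str, object]) -> List[str]:
    """Return the Template 2 fields that are missing or empty in a log entry."""
    return [name for name in REQUIRED_ACTION_LOG_FIELDS if not log_entry.get(name)]
```

An empty return value means the entry is complete enough to post; a non-empty value should block the post (or, depending on policy, the automated action itself) until the gaps are filled.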

Measuring Explainability#

  • Explainability Time: Measured by analysts during the review process (see ai_decision_review_guidelines.md). How long did it take to understand the AI’s reasoning?

  • Explainability Score (Qualitative):

    • Analysts can provide a score (e.g., 1-5) during reviews based on the clarity, accuracy, and completeness of the AI’s explanation against the standards above.

    • Rubric Example (5-point scale):

      • 5: Excellent - Clear, accurate, all key elements present, immediately understandable.

      • 4: Good - Mostly clear and accurate, minor omissions but understandable.

      • 3: Fair - Understandable with some effort, key elements may be missing or unclear.

      • 2: Poor - Difficult to understand, significant omissions or inaccuracies.

      • 1: Unacceptable - Incomprehensible or misleading.

  • Feedback Loop: Trends in Explainability Time and Scores should be tracked. Consistently low scores for certain decision types indicate areas where AI explanation generation needs improvement (see the aggregation sketch below).
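
A short sketch of how review records might be aggregated for this feedback loop follows. The field names (`decision_type`, `explainability_score`, `explainability_time_sec`) and the threshold are illustrative assumptions, not values defined elsewhere in the rules-bank.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List


def summarize_reviews(reviews: List[dict]) -> Dict[str, dict]:
    """Aggregate analyst review records per decision type."""
    by_type: Dict[str, List[dict]] = defaultdict(list)
    for review in reviews:
        by_type[review["decision_type"]].append(review)

    return {
        decision_type: {
            "mean_score": mean(r["explainability_score"] for r in items),
            "median_time_sec": median(r["explainability_time_sec"] for r in items),
            "review_count": len(items),
        }
        for decision_type, items in by_type.items()
    }


def flag_low_scoring_types(summary: Dict[str, dict], threshold: float = 3.0) -> List[str]:
    """Decision types whose mean rubric score falls below the threshold."""
    return sorted(t for t, stats in summary.items() if stats["mean_score"] < threshold)
```

Decision types that are repeatedly flagged are the primary candidates for improving how the agent generates its explanations.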

By adhering to these standards, AI agents can become more effective collaborators with human analysts, fostering trust and improving overall SOC efficiency.

