Runbook: Data Lake Queries#
Objective#
To query security data lakes (e.g., BigQuery) for specific historical data, large-scale analysis, or information not readily available through standard SIEM searches. This runbook facilitates constructing and executing SQL queries against defined datasets and optionally saving or documenting the results.
Scope#
This runbook covers:
Defining a BigQuery SQL query based on a specified objective, target datasets, time range, and filter conditions.
Optionally using
describe-tableorlist-tablesto understand schema or table names before query construction.Executing the constructed SQL query using the
bigquery.execute-querytool.Analyzing the results of the query.
Optionally formatting and saving the query results to a file.
Optionally documenting the query and a summary of its results in a SOAR case.
This runbook explicitly excludes:
Real-time alerting based on data lake queries.
Complex data engineering or ETL processes for data lakes.
Automated actions based on query results (though results might inform manual actions).
Inputs#
${QUERY_OBJECTIVE}: A description of the data needed or the question the query aims to answer. This is mandatory.${TARGET_DATASETS}: A comma-separated list of BigQuery tables or datasets to query (e.g.,my_project.my_dataset.my_table1,my_project.my_dataset.my_table2). This is mandatory.${TIME_RANGE_START}: The start timestamp for the query’s time range (e.g., ISO 8601 format like “YYYY-MM-DDTHH:MM:SSZ”). This is mandatory.${TIME_RANGE_END}: The end timestamp for the query’s time range (e.g., ISO 8601 format). This is mandatory.(Optional)
${SPECIFIC_FIELDS}: A comma-separated list of specific fields to retrieve (e.g., “field1,field2,field3”). If not provided,SELECT *might be assumed or all relevant fields based on the objective.(Optional)
${FILTER_CONDITIONS}: Specific SQL WHERE clause conditions (e.g.,field1 = 'value' AND field2 > 100).(Optional)
${SOAR_CASE_ID}: If results need to be documented in a specific SOAR case.(Derived)
${SQL_QUERY}: The fully constructed BigQuery SQL query.(Derived)
${QUERY_RESULTS}: The raw results returned bybigquery.execute-query.(Derived)
${FORMATTED_RESULTS}: (Optional) Query results formatted for saving to a file (e.g., CSV, JSON, Markdown).
Outputs#
${QUERY_EXECUTION_STATUS}: Status of the BigQuery query execution (e.g., Success, Failure, Error message).${ANALYSIS_SUMMARY}: A brief summary of the analysis performed on the query results.(Optional)
${SAVED_FILE_PATH}: If results are saved, the path to the file.(Optional)
${SOAR_COMMENT_STATUS}: If documented in SOAR, the status of the comment posting.
Tools#
bigquery:execute-query,describe-table,list-tableswrite_to_file(Optional, for saving results to a file, replaceswrite_report)secops-soar:post_case_comment(Optional, for documenting query/results in a SOAR case)
Workflow Steps & Diagram#
Define Query: Based on
${QUERY_OBJECTIVE},${TARGET_DATASETS}, time range (${TIME_RANGE_START},${TIME_RANGE_END}),${SPECIFIC_FIELDS}, and${FILTER_CONDITIONS}, construct the BigQuery SQL query (${SQL_QUERY}). Usebigquery.describe-tableorbigquery.list-tablesif needed to confirm schema/table names.Execute Query: Run the
${SQL_QUERY}usingbigquery.execute-query. Store results in${QUERY_RESULTS}and status in${QUERY_EXECUTION_STATUS}.Analyze Results: Review the
${QUERY_RESULTS}. Summarize findings in${ANALYSIS_SUMMARY}.Format/Save Results (Optional): If needed, format the
${QUERY_RESULTS}(e.g., as Markdown, CSV, or JSON, let this be${FORMATTED_RESULTS}) and save them usingwrite_to_filewithpath="./reports/query_results_${QUERY_OBJECTIVE_Sanitized}_${timestamp}.md"(or other appropriate extension) andcontent=${FORMATTED_RESULTS}. Store path in${SAVED_FILE_PATH}.Document (Optional): If
${SOAR_CASE_ID}is provided, document the${SQL_QUERY}executed and the${ANALYSIS_SUMMARY}in the relevant SOAR case usingsoar-mcp_post_case_comment. Store status in${SOAR_COMMENT_STATUS}.
sequenceDiagram
participant Analyst/User
participant AutomatedAgent as Automated Agent (MCP Client)
participant BigQuery as bigquery
participant SOAR as secops-soar (Optional)
Analyst/User->>AutomatedAgent: Start Data Lake Query\nInput: QUERY_OBJECTIVE, TARGET_DATASETS, TIME_RANGE...
%% Step 1: Define Query
opt Need Schema/Table Info
AutomatedAgent->>BigQuery: list-tables() / describe-table(table_name=...)
BigQuery-->>AutomatedAgent: Table/Schema Info
end
Note over AutomatedAgent: Construct BigQuery SQL Query (SQL_QUERY)
%% Step 2: Execute Query
AutomatedAgent->>BigQuery: execute-query(query=SQL_QUERY)
BigQuery-->>AutomatedAgent: Query Results (QUERY_RESULTS), Status (QUERY_EXECUTION_STATUS)
%% Step 3: Analyze Results
Note over AutomatedAgent: Analyze query results. Store summary in ANALYSIS_SUMMARY.
%% Step 4: Format/Save Results (Optional)
opt Save Results
Note over AutomatedAgent: Format results (e.g., CSV, JSON) (FORMATTED_RESULTS)
AutomatedAgent->>AutomatedAgent: write_to_file(path="./reports/query_results...", content=FORMATTED_RESULTS)
Note over AutomatedAgent: Results saved to file (SAVED_FILE_PATH)
end
%% Step 5: Document (Optional)
opt Document in SOAR and SOAR_CASE_ID provided
AutomatedAgent->>SOAR: post_case_comment(case_id=SOAR_CASE_ID, comment="Data Lake Query Executed: SQL_QUERY, Summary: ANALYSIS_SUMMARY")
SOAR-->>AutomatedAgent: Comment Confirmation (SOAR_COMMENT_STATUS)
end
AutomatedAgent->>Analyst/User: attempt_completion(result="Data lake query executed. Status: QUERY_EXECUTION_STATUS. Analysis: ANALYSIS_SUMMARY. File: SAVED_FILE_PATH. SOAR Update: SOAR_COMMENT_STATUS.")
Completion Criteria#
The BigQuery SQL query (
${SQL_QUERY}) has been successfully constructed based on the provided inputs.The query has been executed using
bigquery.execute-query, and the${QUERY_EXECUTION_STATUS}is available.The
${QUERY_RESULTS}have been returned and an${ANALYSIS_SUMMARY}has been formulated.(Optional) If requested, the results have been formatted and saved to a file, and
${SAVED_FILE_PATH}is available.(Optional) If requested and
${SOAR_CASE_ID}was provided, the query and summary have been documented in the SOAR case, and${SOAR_COMMENT_STATUS}is available.
Rubrics#
The following rubric is used to evaluate the execution of this Threat Hunt/Analysis runbook by an LLM agent.
Grading Scale (0-100 Points)#
Criteria |
Points |
Description |
|---|---|---|
Scope & Query |
25 |
Defined a clear scope and executed effective queries (UDM, search). |
Data Analysis |
30 |
Analyzed results to identify patterns, anomalies, or malicious behavior. |
Findings |
15 |
Accurately identified and filtered findings (True Positives vs. False Positives). |
Documentation |
15 |
Documented the hunt methodology and results clearly. |
Operational Artifacts |
15 |
Produced required artifacts: Sequence diagram, execution metadata (date/cost), and summary. |
Evaluation Criteria Details#
1. Scope & Query (25 Points)#
10 pts: Correctly defined the time range and entities/indicators for the hunt.
15 pts: Constructed and executed valid, efficient queries to retrieve relevant data.
2. Data Analysis (30 Points)#
15 pts: Effectively analyzed the returned data for the hypothesized threat.
15 pts: Correlated events or indicators to strengthen the analysis.
3. Findings (15 Points)#
15 pts: Correctly classified the findings and provided evidence for the conclusion.
4. Documentation (15 Points)#
15 pts: Recorded the hunt process, queries used, and findings in the system of record.
5. Operational Artifacts (15 Points)#
5 pts: Sequence Diagram: Produced a Mermaid sequence diagram visualizing the steps taken.
5 pts: Execution Metadata: Recorded the date, duration, and estimated token cost.
5 pts: Summary Report: Generated a concise summary of the actions and outcomes.