Runbook: Data Lake Queries#
Objective#
To query security data lakes (e.g., BigQuery) for specific historical data, large-scale analysis, or information not readily available through standard SIEM searches. This runbook facilitates constructing and executing SQL queries against defined datasets and optionally saving or documenting the results.
Scope#
This runbook covers:
Defining a BigQuery SQL query based on a specified objective, target datasets, time range, and filter conditions.
Optionally using
describe-table
orlist-tables
to understand schema or table names before query construction.Executing the constructed SQL query using the
bigquery.execute-query
tool.Analyzing the results of the query.
Optionally formatting and saving the query results to a file.
Optionally documenting the query and a summary of its results in a SOAR case.
This runbook explicitly excludes:
Real-time alerting based on data lake queries.
Complex data engineering or ETL processes for data lakes.
Automated actions based on query results (though results might inform manual actions).
Inputs#
${QUERY_OBJECTIVE}
: A description of the data needed or the question the query aims to answer. This is mandatory.${TARGET_DATASETS}
: A comma-separated list of BigQuery tables or datasets to query (e.g.,my_project.my_dataset.my_table1,my_project.my_dataset.my_table2
). This is mandatory.${TIME_RANGE_START}
: The start timestamp for the query’s time range (e.g., ISO 8601 format like “YYYY-MM-DDTHH:MM:SSZ”). This is mandatory.${TIME_RANGE_END}
: The end timestamp for the query’s time range (e.g., ISO 8601 format). This is mandatory.(Optional)
${SPECIFIC_FIELDS}
: A comma-separated list of specific fields to retrieve (e.g., “field1,field2,field3”). If not provided,SELECT *
might be assumed or all relevant fields based on the objective.(Optional)
${FILTER_CONDITIONS}
: Specific SQL WHERE clause conditions (e.g.,field1 = 'value' AND field2 > 100
).(Optional)
${SOAR_CASE_ID}
: If results need to be documented in a specific SOAR case.(Derived)
${SQL_QUERY}
: The fully constructed BigQuery SQL query.(Derived)
${QUERY_RESULTS}
: The raw results returned bybigquery.execute-query
.(Derived)
${FORMATTED_RESULTS}
: (Optional) Query results formatted for saving to a file (e.g., CSV, JSON, Markdown).
Outputs#
${QUERY_EXECUTION_STATUS}
: Status of the BigQuery query execution (e.g., Success, Failure, Error message).${ANALYSIS_SUMMARY}
: A brief summary of the analysis performed on the query results.(Optional)
${SAVED_FILE_PATH}
: If results are saved, the path to the file.(Optional)
${SOAR_COMMENT_STATUS}
: If documented in SOAR, the status of the comment posting.
Tools#
bigquery
:execute-query
,describe-table
,list-tables
write_to_file
(Optional, for saving results to a file, replaceswrite_report
)secops-soar
:post_case_comment
(Optional, for documenting query/results in a SOAR case)
Workflow Steps & Diagram#
Define Query: Based on
${QUERY_OBJECTIVE}
,${TARGET_DATASETS}
, time range (${TIME_RANGE_START}
,${TIME_RANGE_END}
),${SPECIFIC_FIELDS}
, and${FILTER_CONDITIONS}
, construct the BigQuery SQL query (${SQL_QUERY}
). Usebigquery.describe-table
orbigquery.list-tables
if needed to confirm schema/table names.Execute Query: Run the
${SQL_QUERY}
usingbigquery.execute-query
. Store results in${QUERY_RESULTS}
and status in${QUERY_EXECUTION_STATUS}
.Analyze Results: Review the
${QUERY_RESULTS}
. Summarize findings in${ANALYSIS_SUMMARY}
.Format/Save Results (Optional): If needed, format the
${QUERY_RESULTS}
(e.g., as Markdown, CSV, or JSON, let this be${FORMATTED_RESULTS}
) and save them usingwrite_to_file
withpath="./reports/query_results_${QUERY_OBJECTIVE_Sanitized}_${timestamp}.md"
(or other appropriate extension) andcontent=${FORMATTED_RESULTS}
. Store path in${SAVED_FILE_PATH}
.Document (Optional): If
${SOAR_CASE_ID}
is provided, document the${SQL_QUERY}
executed and the${ANALYSIS_SUMMARY}
in the relevant SOAR case usingsecops-soar.post_case_comment
. Store status in${SOAR_COMMENT_STATUS}
.
sequenceDiagram participant Analyst/User participant AutomatedAgent as Automated Agent (MCP Client) participant BigQuery as bigquery participant SOAR as secops-soar (Optional) Analyst/User->>AutomatedAgent: Start Data Lake Query\nInput: QUERY_OBJECTIVE, TARGET_DATASETS, TIME_RANGE... %% Step 1: Define Query opt Need Schema/Table Info AutomatedAgent->>BigQuery: list-tables() / describe-table(table_name=...) BigQuery-->>AutomatedAgent: Table/Schema Info end Note over AutomatedAgent: Construct BigQuery SQL Query (SQL_QUERY) %% Step 2: Execute Query AutomatedAgent->>BigQuery: execute-query(query=SQL_QUERY) BigQuery-->>AutomatedAgent: Query Results (QUERY_RESULTS), Status (QUERY_EXECUTION_STATUS) %% Step 3: Analyze Results Note over AutomatedAgent: Analyze query results. Store summary in ANALYSIS_SUMMARY. %% Step 4: Format/Save Results (Optional) opt Save Results Note over AutomatedAgent: Format results (e.g., CSV, JSON) (FORMATTED_RESULTS) AutomatedAgent->>AutomatedAgent: write_to_file(path="./reports/query_results...", content=FORMATTED_RESULTS) Note over AutomatedAgent: Results saved to file (SAVED_FILE_PATH) end %% Step 5: Document (Optional) opt Document in SOAR and SOAR_CASE_ID provided AutomatedAgent->>SOAR: post_case_comment(case_id=SOAR_CASE_ID, comment="Data Lake Query Executed: SQL_QUERY, Summary: ANALYSIS_SUMMARY") SOAR-->>AutomatedAgent: Comment Confirmation (SOAR_COMMENT_STATUS) end AutomatedAgent->>Analyst/User: attempt_completion(result="Data lake query executed. Status: QUERY_EXECUTION_STATUS. Analysis: ANALYSIS_SUMMARY. File: SAVED_FILE_PATH. SOAR Update: SOAR_COMMENT_STATUS.")
Completion Criteria#
The BigQuery SQL query (
${SQL_QUERY}
) has been successfully constructed based on the provided inputs.The query has been executed using
bigquery.execute-query
, and the${QUERY_EXECUTION_STATUS}
is available.The
${QUERY_RESULTS}
have been returned and an${ANALYSIS_SUMMARY}
has been formulated.(Optional) If requested, the results have been formatted and saved to a file, and
${SAVED_FILE_PATH}
is available.(Optional) If requested and
${SOAR_CASE_ID}
was provided, the query and summary have been documented in the SOAR case, and${SOAR_COMMENT_STATUS}
is available.