Incident Documentation

NOTE:
  • Screenshots of the Jira ticket and the PostgreSQL audit logs are still needed.
  • Update the report to match the collected evidence.
  • Dates, names, etc. may change; review by Taha is required.

Dev Cloud SQL Noisy‑Query Table‑Top – Incident Report (DEV‑337)

1  Overview

This report documents a table‑top security incident executed on 2 July 2025 to validate Optimsync’s incident‑response program and provide formal evidence for HIPAA (§ 164.306 / 308 / 316) and SOC 2 (CC 7.3‑7.5) controls. The exercise intentionally generated abnormal load against a development Cloud SQL instance, allowing the team to demonstrate detection, triage, containment, root‑cause analysis (RCA), and lessons‑learned processes. The artefacts referenced here will be uploaded to Drata and linked to controls DCF‑28 (Security Events Tracked & Evaluated) and DCF‑30 (Incident‑Response Lessons‑Learned Documented).

2  Incident Summary

Field | Details
Incident Title | Dev Cloud SQL Noisy‑Query Table‑Top
Incident ID | DEV‑337
Incident Date | 2025‑07‑02
Reported / Declared By | Saqib (Incident Commander)
Environment | Development only – dev‑db Cloud SQL (PostgreSQL)
Data Involved | Sample PHI (non‑production)
Severity | High (table‑top test)
Status | Resolved (post‑mortem completed, CAPA in progress)
Tracking Link |

Systems Involved

  • Google Cloud Platform – Cloud SQL for PostgreSQL
  • Optimsync Node.js/Express backend
  • Optimsync React web application
  • GCP IAM & Cloud Monitoring/Logging

3  Internal Tracking & Communication

  • Jira Ticket: DEV‑337 – Incident workflow from Open → Investigating → RCA → Resolved.
  • Slack Channel: #alert (private) – declaration & updates.
  • PagerDuty Alert: Fired via Cloud Monitoring webhook (High CPU / Execution‑Time policy).

The attached evidence shows that every stage was logged and time‑stamped in these systems.

4  Timeline of Events (UTC)

Time | Event
11:57 | Cloud Monitoring alert fires (CPU > 80 %, exec_time spike).
11:57 | Saqib declares an incident in Slack; Jira DEV‑337 created.
12:00 | Developer Abdul Manan confirms abnormal query load; loop still running.
12:02 | CPU = 85 %, 23 active connections – screenshots captured.
12:03 | Triage comment added – scope dev only, sample PHI, prod safe.
12:05 | Test user incident_test deleted via gcloud sql users delete; containment confirmed.
12:08 | Metrics return to baseline; post‑containment chart captured.
12:15 | Cloud SQL audit logs exported (downloaded-logs-20250702-200100.json).
12:30 | Terraform PR opened to tighten DB IAM roles.
13:00 | Incident moved to RCA phase; post‑mortem scheduled for 2025‑07‑03.
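
The intervals implied by this timeline can be checked with a few lines of Python. This is a minimal sketch: the times are copied from the timeline, while the event labels and the helper name are shorthand introduced here for illustration.

```python
from datetime import datetime

# Times (UTC, 2025-07-02) copied from the timeline. The keys are
# illustrative labels, not names used by any tool in the exercise.
EVENTS = {
    "alert_fired": "11:57",
    "incident_declared": "11:57",
    "user_deleted": "12:05",
    "metrics_baseline": "12:08",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end (same day, HH:MM)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


time_to_declare = minutes_between(EVENTS["alert_fired"], EVENTS["incident_declared"])
time_to_contain = minutes_between(EVENTS["alert_fired"], EVENTS["user_deleted"])
time_to_baseline = minutes_between(EVENTS["alert_fired"], EVENTS["metrics_baseline"])

print(time_to_declare, time_to_contain, time_to_baseline)  # 0 8 11
```

Measured from the alert, containment took 8 minutes and metrics returned to baseline after 11 minutes.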

5  Root‑Cause Analysis (RCA) & Post‑Mortem

5.1 Scenario Description

A deliberate workload of ~300 000 SELECT COUNT(*) statements was executed against the development Cloud SQL instance as user incident_test. The objective was to trigger monitoring thresholds and exercise the full incident workflow.
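
The script used in the exercise is not part of this report. As a rough illustration of such a loop, the sketch below repeats SELECT COUNT(*) over any DB-API connection. It uses the standard-library sqlite3 module as a stand-in so that it runs anywhere; the function name and the table name sample_records are hypothetical. Against Cloud SQL, a PostgreSQL DB-API connection opened as incident_test would presumably be passed instead.

```python
import sqlite3


def run_noisy_queries(conn, table: str, iterations: int) -> int:
    """Run SELECT COUNT(*) on `table` `iterations` times; return the number executed."""
    # Table names cannot be bound as query parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    executed = 0
    try:
        for _ in range(iterations):
            cur.execute(f"SELECT COUNT(*) FROM {table}")
            cur.fetchone()  # consume the result so each query completes
            executed += 1
    finally:
        cur.close()
    return executed


# Stand-in database: an in-memory SQLite table with three rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO sample_records (id) VALUES (?)", [(i,) for i in range(3)])

executed = run_noisy_queries(conn, "sample_records", iterations=1_000)
print(executed)  # 1000
conn.close()
```

The exercise used roughly 300 000 iterations; the demo runs 1 000 to keep it quick.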

5.2 Findings

  • Alert policies correctly detected CPU and query‑execution spikes within two minutes.
  • The offending account (test user incident_test) had unrestricted read access in dev, revealing a gap in our least‑privilege controls.
  • Query Insights was disabled by default, adding overhead to log analysis.
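
The CPU condition from the timeline (CPU > 80 %) can be illustrated with a simple threshold check. This is only a sketch of the comparison, not the Cloud Monitoring policy itself, whose exact configuration (evaluation window, duration) is not given in this report. The sample values are invented, except 0.85, which matches the 12:02 reading.

```python
from typing import Optional, Sequence

CPU_THRESHOLD = 0.80  # from the timeline: alert condition is CPU > 80 %


def first_breach(samples: Sequence[float], threshold: float = CPU_THRESHOLD) -> Optional[int]:
    """Return the index of the first sample strictly above the threshold, or None."""
    for index, value in enumerate(samples):
        if value > threshold:
            return index
    return None


# Illustrative utilisation samples (fractions of one CPU); only 0.85 comes from the report.
samples = [0.22, 0.31, 0.78, 0.85, 0.84]
breach_index = first_breach(samples)
print(breach_index)  # 3
```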

5.3 Root Cause

Over‑privileged dev database role combined with intentionally generated high‑volume queries.

5.4 Containment & Remediation

  1. Deleted incident_test user (CLI output attached).
  2. Enabled Query Insights on all dev Cloud SQL instances.
  3. Merged Terraform changes enforcing least‑privilege DB roles.
  4. Added CI lint rule to block unrestricted DB roles in future PRs.
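
The report does not include the lint rule itself. The sketch below shows one way a CI check for unrestricted DB roles could work, assuming grants have already been parsed into dictionaries. The grant format, the wildcard markers, and the role app_reader with its scope are all hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical markers of an unrestricted grant; the real rule set is not given in the report.
WILDCARD_SCOPES = {"*", "ALL TABLES"}
BROAD_PRIVILEGES = {"ALL", "ALL PRIVILEGES"}


def find_unrestricted_grants(grants: Iterable[Dict]) -> List[str]:
    """Return one message per grant that uses a wildcard scope or a blanket privilege."""
    violations = []
    for grant in grants:
        role = grant.get("role", "<unknown>")
        scope = str(grant.get("scope", "")).strip().upper()
        privileges = {str(p).strip().upper() for p in grant.get("privileges", [])}
        if scope in WILDCARD_SCOPES:
            violations.append(f"{role}: wildcard scope {grant.get('scope')!r}")
        elif privileges & BROAD_PRIVILEGES:
            violations.append(f"{role}: blanket privilege")
    return violations


# Example input (hypothetical): the test user with read access to everything,
# and a narrowly scoped reader role.
grants = [
    {"role": "incident_test", "privileges": ["SELECT"], "scope": "*"},
    {"role": "app_reader", "privileges": ["SELECT"], "scope": "public.appointments"},
]

violations = find_unrestricted_grants(grants)
for message in violations:
    print(message)

# A CI job would fail the build on a non-zero exit code.
exit_code = 1 if violations else 0
```

In a pipeline, the script would end with sys.exit(exit_code) so that a PR containing an unrestricted grant fails the check.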

6  Lessons Learned & Corrective / Preventive Actions (CAPA)

6.1 Lessons Learned

  • Monitoring thresholds and on‑call escalation functioned as designed.
  • Role‑scoping gaps in dev can still present compliance risk.
  • Post‑incident automation (CI lint, Query Insights) shortens investigation time.

6.2 CAPA Tracker

CAPA‑# | Action | Owner | Due | Status
101 | Add Terraform least‑privilege module for Cloud SQL roles | DevOps | 2025‑07‑10 | Open
102 | Enable CI lint rule for DB IAM scopes | SecOps | 2025‑07‑12 | Open
103 | Enforce Query Insights on all dev Cloud SQL instances | DBA | 2025‑07‑08 | Open
104 | Add automatic rollback plan to incident runbook | Compliance | 2025‑07‑15 | Open

7  Compliance Mapping

Evidence Section | Drata Control | HIPAA | SOC 2 CC
Timeline & alert screenshots | DCF‑28 | 164.306(a‑d); 164.308(a)(1)(i) | 7.3
Containment actions & audit logs | DCF‑28 | 164.306(c‑d) | 7.3, 7.4
RCA & Lessons Learned (this section) | DCF‑30 | 164.316(a); 164.306(d‑e) | 7.4, 7.5
CAPA tracker & follow‑up tasks | DCF‑30 | 164.306(e); 164.316(a) | 7.5

8  Evidence

  • Jira export PDF (DEV‑337_incident.pdf)

9  Post‑Mortem & Lessons Learned

Root‑Cause Analysis (5 Whys)

  1. Why did the alert fire? High‑volume SELECT workload generated by test user incident_test.
  2. Why was that workload possible? The test role had unrestricted read access to the dev database.
  3. Why was the role unrestricted? The Terraform module for Cloud SQL roles lacked least‑privilege guardrails.
  4. Why did Terraform lack guardrails? The CI pipeline had no lint rule to check DB role scopes.
  5. Why was the lint rule missing? The requirement wasn’t captured in the SDLC security checklist.

Direct Cause: over‑privileged dev DB role persisted in IaC.
Contributing Factors: no automated expiry for test credentials; Query Insights disabled in dev; missing CI lint rule for IAM changes.

What Went Well

  • Monitoring alert triggered within 2 minutes.
  • Slack/Jira workflow executed smoothly; responders and responsibilities were clear.
  • Containment completed within 10 minutes; no production impact.

What Didn’t Go Well

  • Over‑privileged IAM role slipped through code review.
  • No automated rollback for mistaken IAM changes.
  • Initial log review was slower because Query Insights was disabled.

Corrective & Preventive Actions (CAPA)

ID | Action | Owner | Due | Status
CAPA‑101 | Add least‑privilege guardrails to Terraform Cloud SQL module | DevOps | 2025‑07‑10 | Open
CAPA‑102 | Implement CI lint rule for DB role scopes | SecOps | 2025‑07‑12 | Open
CAPA‑103 | Enable Query Insights on all dev Cloud SQL instances | DBA | 2025‑07‑08 | Open
CAPA‑104 | Introduce auto‑expiry mechanism for test credentials | SecOps | 2025‑07‑15 | Open
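
CAPA‑104 calls for an auto‑expiry mechanism for test credentials, but the report does not specify its design. One possible shape is sketched below, under stated assumptions: test users are identified by a "_test" suffix, the maximum age is 24 hours, and creation times are tracked somewhere the job can read. All three are hypothetical, as are the creation times and the user load_test.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

MAX_AGE = timedelta(hours=24)  # assumed policy, not stated in the report
TEST_SUFFIX = "_test"          # assumed naming convention for test users


def expired_test_users(
    created_at: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_AGE,
) -> List[str]:
    """Return test users whose credentials are older than max_age, sorted by name."""
    return sorted(
        name
        for name, created in created_at.items()
        if name.endswith(TEST_SUFFIX) and now - created > max_age
    )


# Hypothetical creation times; only the user name incident_test comes from the report.
created_at = {
    "incident_test": datetime(2025, 6, 30, 9, 0, tzinfo=timezone.utc),
    "load_test": datetime(2025, 7, 2, 10, 0, tzinfo=timezone.utc),
    "app_reader": datetime(2025, 1, 15, 8, 0, tzinfo=timezone.utc),
}
now = datetime(2025, 7, 2, 12, 5, tzinfo=timezone.utc)

to_delete = expired_test_users(created_at, now)
print(to_delete)  # ['incident_test']
```

A scheduled job could pass each returned name to gcloud sql users delete, the command used for containment in this exercise.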



Sign‑Off

Post‑mortem held 2025‑07‑03; attendees: Saqib (Incident Commander), Abdul Manan (Developer), Taha (SecOps Lead). Approved by the Compliance Officer on 2025‑07‑03.


10  Approvals

Role | Name | Date
Incident Commander | Saqib | 2025‑07‑03
DevOps Lead | Abdul Manan | 2025‑07‑03
Compliance Officer | Taha | 2025‑07‑03

Prepared by Optimsync Security & Compliance Team – 2025‑07‑03