1. Detection: Quality Over Quantity
Detection starts with alerts from sources such as your SIEM, EDR, firewalls, and cloud logs.
What to check immediately
- Alert type (malware, login, network, policy)
- Asset affected (user laptop, server, cloud VM)
- Severity and confidence score
- Is this alert repeating?
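These intake fields translate directly into a small record the analyst fills in before triage. A minimal sketch in Python, assuming alerts arrive as dictionaries from a SIEM export (all field names here are assumptions, not a specific vendor schema):

```python
from dataclasses import dataclass

@dataclass
class AlertIntake:
    """Fields an analyst confirms before triage begins."""
    alert_type: str      # e.g. "malware", "login", "network", "policy"
    asset: str           # affected asset: user laptop, server, cloud VM
    severity: str        # vendor-assigned severity
    confidence: float    # vendor confidence score, 0.0-1.0
    repeat_count: int    # how many times this alert has fired recently

def from_siem_event(event: dict) -> AlertIntake:
    # Field names below are assumptions; adjust to your SIEM's export schema.
    return AlertIntake(
        alert_type=event.get("rule_category", "unknown"),
        asset=event.get("host_name", "unknown"),
        severity=event.get("severity", "low"),
        confidence=float(event.get("confidence", 0.0)),
        repeat_count=int(event.get("occurrence_count", 1)),
    )
```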
Important metric
- Alert Volume per Day
- Example: 40–60 alerts/day is manageable
- 300+ alerts/day usually means poor tuning
Good SOC teams focus on reducing noise, not reacting to everything.
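To track the alert-volume metric, here is a short sketch that counts alerts per calendar day from a SIEM CSV export and flags days above the 300-alert mark; the file layout and the "created_at" column name are assumptions:

```python
import csv
from collections import Counter
from datetime import datetime

def alerts_per_day(csv_path: str) -> Counter:
    """Count alerts per calendar day from a SIEM CSV export."""
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # "created_at" is a hypothetical column; use your export's field.
            day = datetime.fromisoformat(row["created_at"]).date()
            counts[day] += 1
    return counts

for day, total in sorted(alerts_per_day("alerts.csv").items()):
    flag = "  <- likely poor tuning" if total >= 300 else ""
    print(f"{day}: {total} alerts{flag}")
```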
2. Triage: Decide Fast, Decide Right
Triage is the most important step.
Your job is to answer three questions quickly:
- Is this real or false?
- Is this isolated or spreading?
- How urgent is this?
Practical triage checks
- Compare activity with user’s normal behavior
- Check login source IP and location
- Review command-line or process tree
- Look for similar alerts on other hosts
Decision outcomes
- False Positive → close with reason
- Suspicious → escalate
- Confirmed Incident → contain
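The triage checks and the three outcomes above can be wired into a simple decision helper. A minimal sketch, assuming the individual checks have already been answered as booleans (the signal names are illustrative, not a vendor API):

```python
def triage_decision(
    matches_user_baseline: bool,   # activity consistent with the user's normal behavior
    unusual_login_source: bool,    # login IP / geolocation outside the norm
    suspicious_process_tree: bool, # odd command line or parent/child processes
    seen_on_other_hosts: bool,     # similar alerts firing elsewhere
) -> str:
    """Map triage checks to one of the three outcomes."""
    # Spreading activity, or a bad process tree on an anomalous login -> contain now.
    if seen_on_other_hosts or (suspicious_process_tree and unusual_login_source):
        return "Confirmed Incident -> contain"
    # Normal behavior and no anomalous signals -> close with a reason.
    if matches_user_baseline and not (unusual_login_source or suspicious_process_tree):
        return "False Positive -> close with reason"
    # Anything in between needs a second pair of eyes.
    return "Suspicious -> escalate"
```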
Key metrics
- MTTD (Mean Time to Detect): target is minutes, not hours
- False Positive Rate: high rate = analyst burnout
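A minimal sketch for computing both metrics from closed tickets, assuming each ticket records ISO timestamps for the original event and its detection plus an outcome label (hypothetical field names):

```python
from datetime import datetime

def triage_metrics(tickets: list[dict]) -> dict:
    """Compute MTTD in minutes and the false positive rate from closed tickets."""
    # Assumes a non-empty list; "event_time", "detected_time", and "outcome"
    # are placeholder field names.
    detect_minutes = []
    false_positives = 0
    for t in tickets:
        event = datetime.fromisoformat(t["event_time"])
        detected = datetime.fromisoformat(t["detected_time"])
        detect_minutes.append((detected - event).total_seconds() / 60)
        if t["outcome"] == "false_positive":
            false_positives += 1
    return {
        "mttd_minutes": sum(detect_minutes) / len(detect_minutes),
        "false_positive_rate": false_positives / len(tickets),
    }
```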
3. Containment: Stop the Damage First
Containment is not about fixing the problem; it's about stopping the spread.
Common containment actions
- Isolate endpoint from network
- Disable or reset user account
- Block IP, domain, or hash
- Remove active sessions or tokens
Rule to remember
Contain first, investigate later.
Delaying containment to “collect more data” often makes things worse.
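Most of these containment actions can be scripted against your EDR or identity platform. A minimal sketch, assuming a hypothetical internal EDR REST endpoint; the URL, token handling, and payload are placeholders, not a real vendor API:

```python
import os
import requests

EDR_BASE = "https://edr.example.internal/api/v1"  # hypothetical endpoint

def isolate_endpoint(host_id: str, incident_id: str) -> None:
    """Isolate a host from the network and record the action against the incident."""
    resp = requests.post(
        f"{EDR_BASE}/hosts/{host_id}/isolate",
        headers={"Authorization": f"Bearer {os.environ['EDR_TOKEN']}"},
        json={"reason": f"containment for incident {incident_id}"},
        timeout=30,
    )
    resp.raise_for_status()
    print(f"{host_id} isolated; investigation continues while it is contained.")
```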
4. Eradication: Remove the Root Cause
Once contained, remove what caused the incident.
Examples
- Delete malware files
- Remove malicious scheduled tasks
- Patch vulnerable services
- Revoke compromised credentials
- Remove unauthorized admin accounts
Analyst checklist
- Was persistence created?
- Any new users or services added?
- Any lateral movement signs?
Missed eradication steps lead to incident recurrence.
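Parts of that checklist can be scripted. A minimal sketch for a Windows host that snapshots scheduled tasks and local accounts with the built-in `schtasks` and `net user` commands, so new persistence or accounts stand out against a pre-incident baseline (the baseline itself is assumed to exist):

```python
import subprocess

def current_state() -> dict:
    """Snapshot scheduled tasks and local user accounts on a Windows host."""
    tasks = subprocess.run(
        ["schtasks", "/query", "/fo", "CSV"], capture_output=True, text=True
    ).stdout.splitlines()
    users = subprocess.run(
        ["net", "user"], capture_output=True, text=True
    ).stdout.splitlines()
    return {"tasks": set(tasks), "users": set(users)}

def diff_against_baseline(baseline: dict, current: dict) -> None:
    """Print anything present now that was not in the pre-incident baseline."""
    for key in ("tasks", "users"):
        for item in sorted(current[key] - baseline[key]):
            print(f"NEW {key.upper()}: {item}")
```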
5. Recovery & Lessons Learned
Recovery means returning systems to normal safely.
Recovery steps
- Reconnect isolated systems
- Restore files from clean backups
- Monitor closely for 24–72 hours
- Validate system and user activity
Post-incident review (very important)
Ask:
- Why did this alert trigger?
- Could it be detected earlier?
- Was escalation smooth?
- What control failed?
Useful metrics
- MTTR (Mean Time to Respond)
- Number of repeated incidents
- Time taken per incident type
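A minimal sketch for these metrics, assuming closed incidents are exported with ISO open/close timestamps and a category field (field names are placeholders):

```python
from collections import defaultdict
from datetime import datetime

def recovery_metrics(incidents: list[dict]) -> None:
    """Report overall MTTR and average handling time per incident type."""
    durations_by_type = defaultdict(list)
    for inc in incidents:
        opened = datetime.fromisoformat(inc["opened_at"])
        closed = datetime.fromisoformat(inc["closed_at"])
        hours = (closed - opened).total_seconds() / 3600
        durations_by_type[inc["incident_type"]].append(hours)

    all_hours = [h for hs in durations_by_type.values() for h in hs]
    print(f"MTTR: {sum(all_hours) / len(all_hours):.1f} hours")
    for itype, hs in sorted(durations_by_type.items()):
        # Repeated incidents of the same type show up as a high count here.
        print(f"{itype}: {len(hs)} incidents, avg {sum(hs) / len(hs):.1f} hours")
```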
Good teams improve after incidents, not just close tickets.
Documentation: Evidence Matters
Every incident must be documented clearly.
What good documentation includes
- Timeline (who, what, when)
- Logs and screenshots
- Actions taken
- Final outcome
- Recommendations
This helps with:
- Audits
- Compliance
- Training new analysts
- Improving detection rules
Poor documentation = poor SOC maturity.
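One way to keep documentation consistent is a fixed record every analyst fills in the same way. A minimal sketch mirroring the fields above; the structure is an illustration, not a mandated format:

```python
from dataclasses import dataclass, field

@dataclass
class TimelineEntry:
    when: str   # ISO timestamp
    who: str    # analyst or system
    what: str   # observed event or action taken

@dataclass
class IncidentReport:
    incident_id: str
    timeline: list[TimelineEntry] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)        # paths to logs, screenshots
    actions_taken: list[str] = field(default_factory=list)
    final_outcome: str = ""
    recommendations: list[str] = field(default_factory=list)
```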
Common Mistakes in Incident Response
Avoid these:
- Treating every alert as critical
- Skipping containment
- Over-escalating without context
- Closing incidents without evidence
- Ignoring metrics
Incident response is decision-making, not panic handling.
Final Thoughts
Incident response is not only about knowing tools.
It is about:
- Logical thinking
- Prioritization
- Clear communication
- Learning from mistakes
A good analyst:
- Reduces noise
- Acts fast but carefully
- Documents clearly
- Improves the system after every incident
That is what real incident response looks like.
