NOC Critical Incidents
Date
December 2025
About the project
Service: NOC monitoring and critical incident response (SEV1) + ticket management + SIEM alert analysis
Incident type: customer portal unavailability in multiple regions (suspected firewall blocking)
Environment: remote operation with coordination between network, security, and operations teams
Scope: triage of alerts and emails, SEV1 ticket creation and updates, bridge call activation, evidence collection, and monitoring until stabilization
Main actions: immediate escalation, review of firewall policies and recent changes, traffic path testing, evidence collection (traffic analysis), and rollback for service restoration
Deliverables: ticket with complete work notes + SEV1 timeline + bridge notes prepared for internal record
Operational requirements: SEV1 prioritization, noise control (parallel alerts), objective communication, and auditable documentation
Result: service restored via safe rollback; technical follow-up opened with the vendor for root cause investigation
Location
Florianópolis, Brazil (service exported internationally)
Project type
Remote - Long-term client
Case Study: Critical Incident Response (SEV1) for Firewall Blocking Affecting Customer Portals
Overview
During a NOC shift, we received alerts indicating that customer portals were unavailable in multiple regions. The critical point was that "global health" dashboards appeared normal, while actual impact was confirmed from the user side. This type of scenario requires rapid triage, coordination between teams, and rigorous documentation of what was done, in what order, and why.
Date: December 2025
Location: remote execution (Brazil - international support)
Service: Remote NOC / Incident Response / SIEM & Ticket Management
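
When the dashboard says "ok" and users say "down", the first move is to confirm impact from the user side. Below is a minimal sketch of that kind of external probe, with hypothetical regional portal URLs standing in for the real endpoints.

```python
# User-side availability probe: confirm real impact from outside,
# not from internal "global health" dashboards.
import urllib.request
import urllib.error

PORTALS = {  # hypothetical regional endpoints, for illustration only
    "us-east": "https://portal-us.example.com/health",
    "eu-west": "https://portal-eu.example.com/health",
    "sa-east": "https://portal-sa.example.com/health",
}

for region, url in PORTALS.items():
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{region}: HTTP {resp.status}")
    except (urllib.error.URLError, TimeoutError) as exc:
        # Timeouts or resets here, while internal dashboards stay green,
        # point to something in the path (e.g., a firewall block).
        print(f"{region}: UNREACHABLE ({exc})")
```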
The Challenge
• Portal unavailability in more than one region simultaneously.
• Global indicators offering no clear evidence of the problem, increasing the risk of misdiagnosis.
• High volume of alerts and emails during the incident, requiring noise control.
• Occurrence of parallel incidents (memory/network alerts) that needed to be managed without losing focus on SEV1.
What We Did
1. Rapid triage and definition of "main event"
We separated the real incident (portal unavailability) from alerts that didn't require immediate action, keeping everything registered in the ticketing system.
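
The rule we applied can be summarized as a small routing function. The sketch below uses hypothetical alert fields and signatures; the point is the discipline, not the tooling: correlated alerts join the SEV1, everything else is registered without stealing focus.

```python
# Triage sketch: route each alert to the SEV1, its own track, or the queue.
SEV1_SIGNATURES = ("portal", "http 5xx", "connection timeout")  # illustrative

def triage(alert: dict) -> str:
    text = f"{alert.get('source', '')} {alert.get('message', '')}".lower()
    if any(sig in text for sig in SEV1_SIGNATURES):
        return "attach-to-sev1"       # goes into the main incident ticket
    if alert.get("severity") in ("critical", "high"):
        return "escalate-separately"  # real issue, but its own track
    return "log-and-queue"            # registered, reviewed after stabilization

alerts = [
    {"source": "synthetics", "message": "Portal connection timeout eu-west", "severity": "critical"},
    {"source": "host42", "message": "Memory usage 92%", "severity": "high"},
]
for a in alerts:
    print(triage(a), "<-", a["message"])
```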
2. Critical incident opening and immediate escalation
We created the SEV1 ticket and triggered the response flow, involving teams responsible for network, firewall, and operations.
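
As an illustration, this is roughly what the ticket-creation step looks like against a generic REST-style ITSM API. The endpoint, payload fields, and token are hypothetical; real platforms (ServiceNow, Jira Service Management, etc.) have their own schemas.

```python
# SEV1 ticket creation sketch against a hypothetical ITSM endpoint.
import json
import urllib.request

payload = {
    "severity": "SEV1",
    "short_description": "Customer portal unavailable in multiple regions",
    "assignment_groups": ["network", "firewall-security", "operations"],
    "bridge": True,  # in this sketch, triggers the bridge-call workflow
}
req = urllib.request.Request(
    "https://itsm.example.com/api/incidents",  # hypothetical URL
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer <token>"},
    method="POST",
)
# urllib.request.urlopen(req, timeout=10) would submit it; the endpoint
# here is illustrative, so we only show the prepared request.
print(json.dumps(payload, indent=2))
```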
3. Diagnosis guided by technical hypotheses, not assumptions
Since "global health" didn't reflect the impact, we treated it as a probable case of firewall policies/blocking and/or traffic path. We reviewed recent changes, validated related rules, and tested rollback to previous paths to isolate the behavior.
4. Evidence collection during the bridge
We ran traffic analysis to identify where the flow was being interrupted and to support decisions with data rather than gut feeling.
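
A bounded packet capture is the kind of evidence that gets attached to the ticket. The sketch below assumes a Linux host with tcpdump installed and root access; the interface name and filter address are placeholders.

```python
# Evidence-collection sketch: capture 500 packets on the suspected path
# and write a pcap that can be attached to the SEV1 ticket.
import subprocess

subprocess.run(
    [
        "tcpdump",
        "-i", "eth0",                  # assumed capture interface
        "-nn",                         # no name/port resolution
        "-c", "500",                   # stop after 500 packets
        "-w", "sev1_portal_block.pcap",
        "host", "203.0.113.10", "and", "port", "443",  # placeholder address
    ],
    check=True,
)
```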
5. Fast restoration with safe rollback
At the end of the bridge, a rollback was applied to a path known to be stable, prioritizing service restoration. Root cause investigation continued in parallel.
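
In outline, the restore-first logic looks like this. apply_policy() and probe_all_regions() are hypothetical helpers standing in for the vendor's change-management tooling and the user-side probe shown earlier.

```python
# Restore-first sketch: apply the last known-good policy, then verify
# from the user side before declaring the service restored.
def apply_policy(version: str) -> None:
    print(f"[change] rolling firewall policy back to {version}")

def probe_all_regions() -> bool:
    print("[verify] re-running user-side probes in every affected region")
    return True  # in the sketch, pretend the rollback worked

KNOWN_GOOD = "policy-known-good"  # hypothetical version label

apply_policy(KNOWN_GOOD)
if probe_all_regions():
    print("[sev1] service restored; root-cause track continues in parallel")
else:
    print("[sev1] rollback did not restore service; escalate to vendor now")
```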
6. Management of parallel incidents without compromising SEV1
Additional alerts (e.g., memory and switch interface) were monitored and escalated when necessary, without interrupting the main critical incident response line.
7. Complete documentation ready for audit
The ticket was updated with clear work notes, and bridge notes were organized for internal record, with timeline, actions executed, and final result.
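
Keeping the timeline as structured entries and rendering it into the work notes keeps the record consistent and audit-ready. The entries below are illustrative, not the real incident log.

```python
# Audit-ready timeline sketch: each bridge action logged with UTC time,
# actor, and outcome, then rendered as a ticket work note.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TimelineEntry:
    actor: str
    action: str
    outcome: str
    at: datetime

timeline = [  # illustrative entries only
    TimelineEntry("NOC", "SEV1 opened, bridge activated", "teams engaged",
                  datetime(2025, 12, 4, 14, 2, tzinfo=timezone.utc)),
    TimelineEntry("Network", "Firewall policy rollback applied", "portals reachable",
                  datetime(2025, 12, 4, 15, 37, tzinfo=timezone.utc)),
]

for e in timeline:
    print(f"{e.at:%Y-%m-%d %H:%MZ} | {e.actor}: {e.action} -> {e.outcome}")
```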
Result
• Service restored by rolling back the path/change that was causing the block.
• Priority control: full focus on SEV1 without losing visibility of parallel alerts.
• Better operational governance: well-documented ticket, consistent timeline, and record ready for internal handoff.
• Follow-up opened with the vendor for root cause investigation into policies/API integration, avoiding "just putting out fires" and closing without a known cause.
Why This Matters for Technology Companies
Because it shows real operations:
• When the dashboard says "ok" and the user says "down," the response needs to be method-based: isolate, prove, restore, and document.
And all of this with ticket discipline, coordination, and focus on business impact.



