Question 1 of 7: Production Incident

3am. Payments down. 6 engineers on a call.

It is 3:17am. PagerDuty fires for the payments service, and you are on call. Monitoring shows 40% of POST /payments/process requests returning 500 errors; the other 60% succeed. Slack is busy with the client-facing team asking for updates. Three engineers join the bridge call within 2 minutes.

ERROR 03:14:52 PaymentProcessor: NullReferenceException at OrderService.cs:247
WARN 03:14:52 DB connection pool: 89/100 connections active
INFO 03:14:53 Retry succeeded for txn_8a2f1c
ERROR 03:14:54 PaymentProcessor: NullReferenceException at OrderService.cs:247
WARN 03:14:55 Circuit breaker: HALF_OPEN state

The bridge call starts. One engineer says to roll back the last deployment. Another says to check the database first. You are leading. What is your first action?