Preparation

Set your team up for success before anything breaks. This chapter covers how to define stakeholders, notification rules, messaging templates, communication channels, update cadence, distribution, roles, and dry runs.

The best time to prepare for incident communication is before anything breaks. When an outage hits, you don’t want to be scrambling to find contacts, writing messages from scratch, or arguing about what channel to use. This section gives you the building blocks to have everything ready.

Define Who to Notify

During an incident, speed matters — but so does relevance. Not everyone needs to hear about every service outage, and over-notifying can cause confusion or unnecessary panic. To avoid this, prepare a recipients list that specifies exactly who should be notified, for which severities, and for which services.

1. Identify Stakeholders

List everyone who could receive incident communications. Typical categories include:

  • Internal: Engineering, Support, Customer Success, Executives, Finance, Legal/PR.

  • External: Customers, key accounts, partners, regulators.

2. Document Notification Rules

For each stakeholder, record:

  • Severities to notify (SEV1, SEV2, SEV3).

  • Relevant services (which outages actually affect them).

  • Contact methods (email, phone, Slack, PagerDuty, etc.).

  • Notes (special conditions — e.g., SLA thresholds, VIP outreach).

Example recipients list:

| Stakeholder | Severities to Notify | Relevant Services | Contact Method(s) | Notes |
|---|---|---|---|---|
| Engineering On-Call | SEV1, SEV2, SEV3 | All services | PagerDuty, Slack | Primary: Alice, Backup: Bob |
| Finance | SEV1, SEV2 | Billing, Payments | Email: finance@... | Notify only if >1h downtime |
| PR / Marketing | SEV1 | Public-facing services | Phone + Email | Prepares external messaging |
| VIP Account A | SEV1, SEV2 | Core API, Integrations | Direct CSM call | SLA requires <15m notification |
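
A recipients list like the one above can also be kept as structured data so that lookups are quick during an incident. The following is a minimal sketch, not a prescribed format: the `Recipient` class, the `recipients_for` helper, and the `"*"` wildcard for "all services" are illustrative assumptions, and only three of the example rows are encoded.

```python
from dataclasses import dataclass

ALL_SERVICES = "*"  # assumed wildcard meaning "every service"


@dataclass(frozen=True)
class Recipient:
    name: str
    severities: frozenset  # severities this stakeholder is notified for
    services: frozenset    # services that affect them, or ALL_SERVICES
    contact: str
    note: str = ""         # special conditions, kept as free text


# Three rows from the example recipients list.
RECIPIENTS = [
    Recipient("Engineering On-Call", frozenset({"SEV1", "SEV2", "SEV3"}),
              frozenset({ALL_SERVICES}), "PagerDuty, Slack",
              "Primary: Alice, Backup: Bob"),
    Recipient("Finance", frozenset({"SEV1", "SEV2"}),
              frozenset({"Billing", "Payments"}), "Email: finance@...",
              "Notify only if >1h downtime"),
    Recipient("VIP Account A", frozenset({"SEV1", "SEV2"}),
              frozenset({"Core API", "Integrations"}), "Direct CSM call",
              "SLA requires <15m notification"),
]


def recipients_for(severity: str, service: str) -> list:
    """Return the names of stakeholders to notify, in list order."""
    return [
        r.name
        for r in RECIPIENTS
        if severity in r.severities
        and (ALL_SERVICES in r.services or service in r.services)
    ]
```

With this data, `recipients_for("SEV2", "Billing")` returns Engineering On-Call and Finance. Conditions in the Notes column, such as the downtime threshold for Finance, are carried as text only and still need a human decision.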

3. Keep It Current

  • Review the list regularly or after major changes (new services, new stakeholders).

  • Store it in a place that’s always accessible during incidents (status page tool, internal wiki, or incident management platform).

  • Assign ownership for keeping it updated.


Prepare Messaging Templates

You should never start an incident update with a blank page. Prepare:

  • Pre-approved templates:

    • One set for each severity: SEV1, SEV2, SEV3.

    • Separate versions for internal and external audiences.

    • One template for each update type: identified, update, monitoring, and resolved.

  • Tone guidelines:

    • ✅ Be clear and empathetic, and use plain English.

    • ❌ Don’t downplay the impact (“only a few users”) or use heavy technical jargon.

Example starter template for external acknowledgment:

“We are investigating an issue affecting [service(s)]. Some customers may experience [impact]. We’ll share an update within the next [20 minutes].”
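
If templates are stored in a tool or script, the bracketed placeholders can be filled programmatically. This is a minimal sketch using Python's standard `string.Template`; the placeholder names (`services`, `impact`, `next_update_minutes`) and the `render_ack` helper are assumptions chosen for illustration.

```python
from string import Template

# The external acknowledgment template, with the bracketed fields
# turned into named placeholders.
ACK_TEMPLATE = Template(
    "We are investigating an issue affecting $services. "
    "Some customers may experience $impact. "
    "We'll share an update within the next $next_update_minutes minutes."
)


def render_ack(services: list, impact: str, next_update_minutes: int) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return ACK_TEMPLATE.substitute(
        services=", ".join(services),
        impact=impact,
        next_update_minutes=next_update_minutes,
    )
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing field fails loudly instead of publishing a message with an unfilled placeholder.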


Define Communication Channels

Decide in advance where and how updates will be published. Typical options:

  • Status page: the primary source of truth.

  • Email: good for customer alerts, especially for SEV1/2.

  • In-app notifications: useful for SaaS platforms with logged-in users.

  • Chat tools (Slack/MS Teams): for internal coordination.

  • Social media (Twitter/X, LinkedIn): optional, but effective for high-impact public outages.

Document which channel is used for which severity. Example:

  • SEV1: Status page + email to all customers + internal Slack + exec briefing.

  • SEV2: Status page + targeted email (affected accounts).

  • SEV3: Status page only.
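
The severity-to-channel mapping above is small enough to keep as a lookup table in a runbook script. A minimal sketch follows; the channel labels and the `channels_for` helper are illustrative names, not part of any specific tool.

```python
# Channels per severity, mirroring the example mapping above.
CHANNELS_BY_SEVERITY = {
    "SEV1": ["status page", "email: all customers", "internal Slack", "exec briefing"],
    "SEV2": ["status page", "email: affected accounts"],
    "SEV3": ["status page"],
}


def channels_for(severity: str) -> list:
    """Return the channels to use, rejecting severities that are not defined."""
    try:
        return list(CHANNELS_BY_SEVERITY[severity])
    except KeyError:
        raise ValueError(f"Unknown severity: {severity!r}") from None
```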

Define Update Cadence

Set clear expectations for how often you’ll send incident updates, based on the severity and the audience. This helps reduce uncertainty for everyone affected.

Example Update Cadence Matrix

| Severity | Internal Stakeholders | External Stakeholders |
|---|---|---|
| SEV1 | Every 15 minutes | Every 20 minutes |
| SEV2 | Every 30 minutes | Every 45 minutes |
| SEV3 | Every 60 minutes | Every 90 minutes |

  • Internal stakeholders: Teams who rely on the affected service to do their work (e.g., Support, Sales, Product).

  • External stakeholders: Customers, partners, or regulators.

Document the agreed cadence in your incident runbook and review it regularly.
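
To make the cadence actionable, the matrix can be turned into a "next update due" calculation. The sketch below uses the intervals from the example matrix; the function names and the `"internal"`/`"external"` audience keys are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Minutes between updates, keyed by (severity, audience).
CADENCE_MINUTES = {
    ("SEV1", "internal"): 15, ("SEV1", "external"): 20,
    ("SEV2", "internal"): 30, ("SEV2", "external"): 45,
    ("SEV3", "internal"): 60, ("SEV3", "external"): 90,
}


def next_update_due(severity: str, audience: str, last_update: datetime) -> datetime:
    """Return the time by which the next update should be sent."""
    return last_update + timedelta(minutes=CADENCE_MINUTES[(severity, audience)])


def is_overdue(severity: str, audience: str,
               last_update: datetime, now: datetime) -> bool:
    """True if the agreed interval has elapsed without a new update."""
    return now > next_update_due(severity, audience, last_update)


# Usage: a SEV1 external update sent at 12:00 UTC is due again at 12:20 UTC.
last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
due = next_update_due("SEV1", "external", last)
```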


Automate Distribution

It’s not enough to know the channel — decide how the message actually gets there:

  • Manual posting (who has access, who’s trained).

  • Automated distribution via status page tools (status page → email/SMS).

  • Pre-configured integrations (PagerDuty/Pingdom → status page → email/SMS).

Confirm access rights: make sure multiple people (not just the founder or CTO) can publish updates.
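
Whichever distribution path you choose, it helps if a failure in one channel does not block the others. The following is a rough sketch of that idea only: the `distribute` function and the sender callables are hypothetical stand-ins, and a real setup would call your status page or email provider's own integration instead.

```python
from typing import Callable, Dict

Sender = Callable[[str], None]  # hypothetical: takes the message, sends it


def distribute(message: str, senders: Dict[str, Sender]) -> Dict[str, str]:
    """Send the message through every channel and report per-channel results."""
    results = {}
    for channel, send in senders.items():
        try:
            send(message)
            results[channel] = "ok"
        except Exception as exc:  # keep going so other channels still publish
            results[channel] = f"failed: {exc}"
    return results
```

The returned dictionary gives the Comms Lead a quick view of which channels need a manual retry.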


Assign Roles & Responsibilities

Clearly assign who owns what:

  • Incident Commander (IC): Focuses on the technical response, confirms facts for comms.

  • Comms Lead: Writes and publishes updates. In small teams, the IC may also take on this responsibility.

  • Support/CSM: Relays info directly to customers, handles VIP outreach.

Document backups for each role. Incidents often happen at night or on weekends.
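
A role roster can be checked automatically for missing backups. This is a minimal sketch under assumed names: the `ROLES` layout, the `roles_missing_backup` helper, and all people listed are placeholders.

```python
# Hypothetical roster: every name here is a placeholder.
ROLES = {
    "Incident Commander": {"primary": "Alice", "backup": "Bob"},
    "Comms Lead": {"primary": "Carol", "backup": None},
    "Support/CSM": {"primary": "Dan", "backup": ""},
}


def roles_missing_backup(roles: dict) -> list:
    """Return the roles with no backup assigned, sorted by name."""
    return sorted(
        role for role, people in roles.items() if not people.get("backup")
    )
```

Running a check like this as part of the regular review catches gaps before a night or weekend incident does.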


Dry Run & Review

Preparation isn’t “set it and forget it.” Test your process:

  • Run a tabletop exercise at least once a quarter.

  • Simulate a SEV1 outage: does everyone know their role? Can the first update go out within 10 minutes?

  • After the exercise, update templates, contact lists, and access rights based on what broke down.
