# Preparation

The best time to prepare for incident communication is **before anything breaks**. When an outage hits, you don’t want to be scrambling to find contacts, writing messages from scratch, or arguing about which channel to use. This section covers the building blocks to have in place ahead of time.

## Define Who to Notify

During an incident, speed matters — but so does **relevance**. Not everyone needs to hear about every service outage, and over-notifying can cause confusion or unnecessary panic. To avoid this, prepare a **recipients list** that specifies exactly **who should be notified, for which severities, and for which services**.

### 1. Identify Stakeholders

List everyone who could receive incident communications. Typical categories include:

* **Internal:** Engineering, Support, Customer Success, Executives, Finance, Legal/PR.
* **External:** Customers, key accounts, partners, regulators.

### 2. Document Notification Rules

For each stakeholder, record:

* **Severities to notify** (`SEV1`, `SEV2`, `SEV3`).
* **Relevant services** (which outages actually affect them).
* **Contact methods** (email, phone, Slack, PagerDuty, etc.).
* **Notes** (special conditions — e.g., SLA thresholds, VIP outreach).

Example recipients list:

<table><thead><tr><th>Stakeholder</th><th>Severities to Notify</th><th width="148.5">Relevant Services</th><th>Contact Method(s)</th><th>Notes</th></tr></thead><tbody><tr><td>Engineering On-Call</td><td><code>SEV1</code>, <code>SEV2</code>, <code>SEV3</code></td><td>All services</td><td>PagerDuty, Slack</td><td>Primary: Alice, Backup: Bob</td></tr><tr><td>Finance</td><td><code>SEV1</code>, <code>SEV2</code></td><td>Billing, Payments</td><td>Email: finance@...</td><td>Notify only if >1h downtime</td></tr><tr><td>PR / Marketing</td><td><code>SEV1</code></td><td>Public-facing services</td><td>Phone + Email</td><td>Prepares external messaging</td></tr><tr><td>VIP Account A</td><td><code>SEV1</code>, <code>SEV2</code></td><td>Core API, Integrations</td><td>Direct CSM call</td><td>SLA requires &#x3C;15m notification</td></tr></tbody></table>
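If you keep the recipients list in a machine-readable form, the same rules can drive who actually gets paged or emailed. The sketch below is a minimal, hypothetical Python encoding of the example table. The field names, the `"*"` wildcard for "All services", and the concrete service names standing in for "Public-facing services" are assumptions for illustration, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional

ALL = "*"  # wildcard: the stakeholder cares about every service


@dataclass(frozen=True)
class NotificationRule:
    stakeholder: str
    severities: frozenset
    services: frozenset
    contacts: tuple
    # Notify only once downtime exceeds this many minutes (None = immediately).
    notify_after_minutes: Optional[int] = None


RULES = [
    NotificationRule("Engineering On-Call", frozenset({"SEV1", "SEV2", "SEV3"}),
                     frozenset({ALL}), ("PagerDuty", "Slack")),
    NotificationRule("Finance", frozenset({"SEV1", "SEV2"}),
                     frozenset({"Billing", "Payments"}), ("Email",),
                     notify_after_minutes=60),  # ">1h downtime" note
    # Hypothetical stand-ins for "Public-facing services".
    NotificationRule("PR / Marketing", frozenset({"SEV1"}),
                     frozenset({"Core API", "Web App"}), ("Phone", "Email")),
    NotificationRule("VIP Account A", frozenset({"SEV1", "SEV2"}),
                     frozenset({"Core API", "Integrations"}), ("Direct CSM call",)),
]


def recipients_for(severity, affected_services, downtime_minutes, rules=RULES):
    """Return the stakeholders to notify for this incident, in list order."""
    selected = []
    for rule in rules:
        if severity not in rule.severities:
            continue
        # Skip stakeholders whose services are not affected (unless wildcard).
        if ALL not in rule.services and not (rule.services & set(affected_services)):
            continue
        # Respect minimum-downtime conditions such as Finance's ">1h" rule.
        if rule.notify_after_minutes is not None and downtime_minutes <= rule.notify_after_minutes:
            continue
        selected.append(rule.stakeholder)
    return selected
```

For example, a 30-minute `SEV2` on Billing selects only Engineering On-Call; once the outage passes one hour, Finance is added, matching the `>1h` note in the table.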

### 3. Keep It Current

* Review the list **on a regular schedule** and after major changes (new services, new stakeholders).
* Store it in a place that’s always accessible during incidents (status page tool, internal wiki, or incident management platform).
* Assign ownership for keeping it updated.
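Ownership and review dates can be checked automatically, for example from a scheduled job that posts to the owning team. This is a hypothetical sketch: the `ListEntry` fields and the 90-day review window are assumptions, so substitute the interval your team agrees on.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed review window; not prescribed by this guide.
MAX_AGE = timedelta(days=90)


@dataclass
class ListEntry:
    stakeholder: str
    owner: Optional[str]  # person responsible for keeping the entry current
    last_reviewed: date


def review_problems(entries, today, max_age=MAX_AGE):
    """Return human-readable problems: missing owners and stale entries."""
    problems = []
    for e in entries:
        if not e.owner:
            problems.append(f"{e.stakeholder}: no owner assigned")
        if today - e.last_reviewed > max_age:
            problems.append(f"{e.stakeholder}: last reviewed {e.last_reviewed.isoformat()}")
    return problems
```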

***

## Prepare Messaging Templates

You should never start an incident update with a blank page. Prepare:

* **Pre-approved templates** covering:
  * Each severity: SEV1, SEV2, and SEV3.
  * Each audience: separate internal and external versions.
  * Each update type: identified, update, monitoring, and resolved.
* **Tone guidelines:**
  * ✅ Be clear and empathetic, and write in plain English.
  * ❌ Don’t downplay the impact (“only a few users”), and avoid heavy technical jargon.

{% hint style="info" %}
Example starter template for external acknowledgment:
{% endhint %}

> “We are investigating an issue affecting \[service(s)]. Some customers may experience \[impact]. We’ll share an update within the next \[20 minutes].”
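Templates are easiest to reuse when placeholders are filled programmatically and a missing value fails loudly instead of going out as a literal `[impact]`. A minimal sketch using Python's standard `string.Template`; the template key `("external", "acknowledgment")` and the placeholder names are illustrative assumptions.

```python
from string import Template

# Hypothetical template store keyed by (audience, update type).
TEMPLATES = {
    ("external", "acknowledgment"): Template(
        "We are investigating an issue affecting $services. "
        "Some customers may experience $impact. "
        "We'll share an update within the next $next_update."
    ),
}


def render(audience, update_type, **fields):
    """Fill a pre-approved template.

    Template.substitute raises KeyError if any placeholder is left unfilled,
    so a half-finished message cannot be published by accident.
    """
    return TEMPLATES[(audience, update_type)].substitute(**fields)
```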

***

## Define Communication Channels

Decide in advance where and how updates will be published. Typical options:

* **Status page:** the primary source of truth.
* **Email:** good for customer alerts, especially for SEV1/2.
* **In-app notifications:** useful for SaaS platforms with logged-in users.
* **Chat tools (Slack/MS Teams):** for internal coordination.
* **Social media (Twitter/X, LinkedIn):** optional, but effective for high-impact public outages.

{% hint style="info" %}
Document **which channel is used for which severity**. Example:
{% endhint %}

* SEV1: Status page + email to all customers + internal Slack + exec briefing.
* SEV2: Status page + targeted email (affected accounts).
* SEV3: Status page only.
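Writing the channel policy down as data makes it easy to check and to feed into tooling. A hypothetical sketch of the example above; the channel identifiers are placeholder names, not real integrations.

```python
# Channel policy from the example above; identifiers are hypothetical.
CHANNELS_BY_SEVERITY = {
    "SEV1": ["status_page", "email_all_customers", "internal_slack", "exec_briefing"],
    "SEV2": ["status_page", "email_affected_accounts"],
    "SEV3": ["status_page"],
}


def channels_for(severity):
    """Return the documented channels for a severity, or fail loudly."""
    if severity not in CHANNELS_BY_SEVERITY:
        raise ValueError(f"No channel policy documented for {severity!r}")
    return list(CHANNELS_BY_SEVERITY[severity])  # copy so callers can't mutate policy


def severities_without_status_page(policy=CHANNELS_BY_SEVERITY):
    """The status page is the primary source of truth, so every severity should use it."""
    return [sev for sev, chans in policy.items() if "status_page" not in chans]
```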

***

## Define Update Cadence

Set clear expectations for how often you’ll send incident updates, based on the severity and the audience. This helps reduce uncertainty for everyone affected.

**Example Update Cadence Matrix**

<table><thead><tr><th valign="top">Severity</th><th valign="top">Internal Stakeholders</th><th valign="top">External Stakeholders</th></tr></thead><tbody><tr><td valign="top"><code>SEV1</code></td><td valign="top">Every 15 minutes</td><td valign="top">Every 20 minutes</td></tr><tr><td valign="top"><code>SEV2</code></td><td valign="top">Every 30 minutes</td><td valign="top">Every 45 minutes</td></tr><tr><td valign="top"><code>SEV3</code></td><td valign="top">Every 60 minutes</td><td valign="top">Every 90 minutes</td></tr></tbody></table>

* **Internal stakeholders**: Teams who rely on the affected service to do their work (e.g., Support, Sales, Product).
* **External stakeholders**: Customers, partners, or regulators.
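The matrix translates directly into a "next update due" check that a bot or the Comms Lead could use as a reminder. A minimal sketch, assuming the example intervals above and `datetime` values that share one timezone; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Minutes between updates, taken from the example cadence matrix.
CADENCE_MINUTES = {
    ("SEV1", "internal"): 15, ("SEV1", "external"): 20,
    ("SEV2", "internal"): 30, ("SEV2", "external"): 45,
    ("SEV3", "internal"): 60, ("SEV3", "external"): 90,
}


def next_update_due(severity, audience, last_update: datetime) -> datetime:
    """When the next update to this audience should go out."""
    return last_update + timedelta(minutes=CADENCE_MINUTES[(severity, audience)])


def is_overdue(severity, audience, last_update: datetime, now: datetime) -> bool:
    """True once the promised update time has passed without a new update."""
    return now > next_update_due(severity, audience, last_update)
```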

{% hint style="info" %}
Document the agreed cadence in your incident runbook and review it regularly.
{% endhint %}

***

## Automate Distribution

It’s not enough to know the channel — decide **how the message actually gets there**:

* Manual posting (who has access, who’s trained).
* Automated distribution via status page tools (status page → email/SMS).
* Pre-configured integrations (PagerDuty/Pingdom → status page → email/SMS).

{% hint style="info" %}
Confirm **access rights**: make sure multiple people (not just the founder or CTO) can publish updates.
{% endhint %}
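One way to picture automated distribution is a fan-out: one approved message, several channel senders, and a guard on who may publish. The sketch below is hypothetical; the `Distributor` class and the sender callables stand in for whatever your status page tool or integrations actually provide. It also enforces the access-rights advice above by refusing a setup with fewer than two publishers.

```python
from typing import Callable, Dict, Iterable, List

# A sender takes the message text and delivers it (hypothetical stand-in for
# a status page API, an email/SMS gateway, a chat webhook, etc.).
Sender = Callable[[str], None]


class Distributor:
    """Fans one approved update out to several channels."""

    def __init__(self, senders: Dict[str, Sender], publishers: Iterable[str]):
        self.senders = senders
        self.publishers = set(publishers)
        # Access-rights guard: never depend on a single person to publish.
        if len(self.publishers) < 2:
            raise ValueError("At least two people must be able to publish updates")

    def publish(self, author: str, channels: List[str], message: str) -> List[str]:
        """Send `message` to each channel and return the channels that failed."""
        if author not in self.publishers:
            raise PermissionError(f"{author} is not allowed to publish updates")
        failed = []
        for name in channels:
            try:
                # Unknown channel names raise KeyError and are reported as failed.
                self.senders[name](message)
            except Exception:
                # Keep going so one broken channel doesn't block the others.
                failed.append(name)
        return failed
```

Returning the failed channels, rather than stopping at the first error, means the status page still gets updated when, say, the email provider is down; someone can then retry or post manually.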

***

## Assign Roles & Responsibilities

Clearly assign who owns what:

* **Incident Commander (IC):** Leads the technical response and confirms facts for communications.
* **Comms Lead:** Writes and publishes updates. In small teams, the **IC** may also take on this responsibility.
* **Support/CSM:** Relays information directly to customers and handles VIP outreach.

{% hint style="info" %}
Document backups for each role. Incidents often happen at night or on weekends.
{% endhint %}
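A roster with explicit backups can be sanity-checked before an incident and consulted during one. A hypothetical sketch; `RoleAssignment` and the people's names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoleAssignment:
    role: str
    primary: str
    backup: Optional[str]


def on_duty(assignment: RoleAssignment, unavailable: set) -> Optional[str]:
    """Return who covers the role, falling back to the backup; None if nobody can."""
    if assignment.primary not in unavailable:
        return assignment.primary
    if assignment.backup and assignment.backup not in unavailable:
        return assignment.backup
    return None


def missing_backups(roster):
    """Roles with no backup, or whose backup is the same person as the primary."""
    return [a.role for a in roster if not a.backup or a.backup == a.primary]
```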

***

## Dry Run & Review

Preparation isn’t “set it and forget it.” Test your process:

* Run a **tabletop exercise** at least once a quarter.
* Simulate a SEV1 outage: does everyone know their role? Can updates go out in 10 minutes?
* After the exercise, update templates, contact lists, and access rights based on what broke down.
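To make the 10-minute question measurable, record when the simulated incident was declared and when each update was published, then compare. A minimal sketch; the timestamps would come from your exercise notes or tooling, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

TARGET = timedelta(minutes=10)  # the "updates out in 10 minutes" goal


def time_to_first_update(declared_at: datetime, update_times):
    """Time from incident declaration to the first published update, or None."""
    after = [t for t in update_times if t >= declared_at]
    return min(after) - declared_at if after else None


def exercise_passed(declared_at: datetime, update_times, target=TARGET) -> bool:
    """True if the first update went out within the target time."""
    elapsed = time_to_first_update(declared_at, update_times)
    return elapsed is not None and elapsed <= target
```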
