Add support for PagerDuty
README.md
@@ -17,14 +17,16 @@ core applications: https://status.twinnation.org/
- [Usage](#usage)
- [Configuration](#configuration)
- [Conditions](#conditions)
- [Alerting](#alerting)
  - [Configuring Slack alerts](#configuring-slack-alerts)
  - [Configuring PagerDuty alerts](#configuring-pagerduty-alerts)
  - [Configuring Twilio alerts](#configuring-twilio-alerts)
  - [Configuring custom alerts](#configuring-custom-alerts)
- [Docker](#docker)
- [Running the tests](#running-the-tests)
- [Using in Production](#using-in-production)
- [FAQ](#faq)
  - [Sending a GraphQL request](#sending-a-graphql-request)

## Features
@@ -72,35 +74,36 @@ Note that you can also add environment variables in the configuration file (i.e.
### Configuration

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| `debug` | Whether to enable debug logs | `false` |
| `metrics` | Whether to expose metrics at /metrics | `false` |
| `services` | List of services to monitor | Required `[]` |
| `services[].name` | Name of the service. Can be anything. | Required `""` |
| `services[].url` | URL to send the request to | Required `""` |
| `services[].conditions` | Conditions used to determine the health of the service | `[]` |
| `services[].interval` | Duration to wait between every status check | `60s` |
| `services[].method` | Request method | `GET` |
| `services[].graphql` | Whether to wrap the body in a query param (`{"query":"$body"}`) | `false` |
| `services[].body` | Request body | `""` |
| `services[].headers` | Request headers | `{}` |
| `services[].alerts[].type` | Type of alert. Valid types: `slack`, `pagerduty`, `twilio`, `custom` | Required `""` |
| `services[].alerts[].enabled` | Whether to enable the alert | `false` |
| `services[].alerts[].threshold` | Number of failures in a row needed before triggering the alert | `3` |
| `services[].alerts[].description` | Description of the alert. Will be included in the alert sent | `""` |
| `services[].alerts[].send-on-resolved` | Whether to send a notification once a triggered alert subsides | `false` |
| `services[].alerts[].success-before-resolved` | Number of successes in a row needed before sending a resolved notification | `2` |
| `alerting` | Configuration for alerting | `{}` |
| `alerting.slack` | Webhook to use for alerts of type `slack` | `""` |
| `alerting.pagerduty` | Integration key to use for alerts of type `pagerduty` | `""` |
| `alerting.twilio` | Settings for alerts of type `twilio` | `""` |
| `alerting.twilio.sid` | Twilio account SID | Required `""` |
| `alerting.twilio.token` | Twilio auth token | Required `""` |
| `alerting.twilio.from` | Number to send Twilio alerts from | Required `""` |
| `alerting.twilio.to` | Number to send Twilio alerts to | Required `""` |
| `alerting.custom` | Configuration for custom actions on failure or alerts | `""` |
| `alerting.custom.url` | Custom alerting request URL | `""` |
| `alerting.custom.body` | Custom alerting request body | `""` |
| `alerting.custom.headers` | Custom alerting request headers | `{}` |

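To tie these parameters together, here is a minimal sketch of a configuration file; the service name and endpoint URL are illustrative rather than taken from a real deployment:

```yaml
metrics: true                             # Expose metrics at /metrics
services:
  - name: example-api                     # Illustrative service name
    url: "https://api.example.org/health" # Placeholder endpoint
    interval: 30s                         # Check every 30s instead of the default 60s
    method: GET
    headers:
      Accept: application/json
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 500"
```
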
### Conditions
@@ -121,6 +124,136 @@ Here are some examples of conditions you can use:
| `len([BODY].name) == 8` | String at jsonpath `$.name` has a length of 8 | `{"name":"john.doe"}` | `{"name":"bob"}` |
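As a sketch of how such conditions fit into a service definition, the configuration below assumes a hypothetical endpoint that returns `{"name":"john.doe"}`:

```yaml
services:
  - name: user-service                   # Hypothetical service
    url: "https://api.example.org/user"  # Placeholder endpoint
    interval: 60s
    conditions:
      - "[STATUS] == 200"
      - "[BODY].name == john.doe"        # Value at JSONPath $.name
      - "len([BODY].name) == 8"          # Length of the string at $.name
      - "[RESPONSE_TIME] < 300"
```
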
### Alerting
#### Configuring Slack alerts

```yaml
alerting:
  slack: "https://hooks.slack.com/services/**********/**********/**********"
services:
  - name: twinnation
    interval: 30s
    url: "https://twinnation.org/health"
    alerts:
      - type: slack
        enabled: true
        description: "healthcheck failed 3 times in a row"
        send-on-resolved: true
      - type: slack
        enabled: true
        threshold: 5
        description: "healthcheck failed 5 times in a row"
        send-on-resolved: true
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == UP"
      - "[RESPONSE_TIME] < 300"
```

Here's an example of what the notifications look like:

![Slack notifications](.github/assets/slack-alerts.png)

#### Configuring PagerDuty alerts

It is highly recommended to set `services[].alerts[].send-on-resolved` to `true` for alerts
of type `pagerduty`: unlike other alert types, the resolved notification does not create
another incident, but instead marks the existing incident as resolved on PagerDuty.

```yaml
alerting:
  pagerduty: "********************************"
services:
  - name: twinnation
    interval: 30s
    url: "https://twinnation.org/health"
    alerts:
      - type: pagerduty
        enabled: true
        threshold: 3
        description: "healthcheck failed 3 times in a row"
        send-on-resolved: true
        success-before-resolved: 5
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == UP"
      - "[RESPONSE_TIME] < 300"
```

#### Configuring Twilio alerts

```yaml
alerting:
  twilio:
    sid: "..."
    token: "..."
    from: "+1-234-567-8901"
    to: "+1-234-567-8901"
services:
  - name: twinnation
    interval: 30s
    url: "https://twinnation.org/health"
    alerts:
      - type: twilio
        enabled: true
        threshold: 5
        description: "healthcheck failed 5 times in a row"
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == UP"
      - "[RESPONSE_TIME] < 300"
```

#### Configuring custom alerts

While they're called alerts, you can use this feature to call anything.

For instance, you could automate rollbacks by having an application that keeps track of new deployments, and by
leveraging Gatus, you could have Gatus call that application's endpoint when a service starts failing. Your application
would then check whether the service that started failing was recently deployed and, if it was, automatically
roll it back (a sketch of that scenario follows the example below).

The values `[ALERT_DESCRIPTION]` and `[SERVICE_NAME]` are automatically substituted with the alert description and the
service name respectively, in both the body (`alerting.custom.body`) and the URL (`alerting.custom.url`).

If you have `send-on-resolved` set to `true`, you may want to use `[ALERT_TRIGGERED_OR_RESOLVED]` to differentiate
the notifications. It will be replaced with either `TRIGGERED` or `RESOLVED`, depending on the situation.

For all intents and purposes, we'll configure the custom alert with a Slack webhook, but you can call anything you want.

```yaml
alerting:
  custom:
    url: "https://hooks.slack.com/services/**********/**********/**********"
    method: "POST"
    body: |
      {
        "text": "[ALERT_TRIGGERED_OR_RESOLVED]: [SERVICE_NAME] - [ALERT_DESCRIPTION]"
      }
services:
  - name: twinnation
    interval: 30s
    url: "https://twinnation.org/health"
    alerts:
      - type: custom
        enabled: true
        threshold: 10
        send-on-resolved: true
        description: "healthcheck failed 10 times in a row"
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == UP"
      - "[RESPONSE_TIME] < 300"
```
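
The rollback scenario described earlier would look something like the sketch below; the `deploy.example.com` endpoint and its rollback API are hypothetical, but the `[SERVICE_NAME]`, `[ALERT_DESCRIPTION]` and `[ALERT_TRIGGERED_OR_RESOLVED]` substitutions work exactly as described above:

```yaml
alerting:
  custom:
    # Hypothetical internal deployment service; [SERVICE_NAME] is substituted by Gatus
    url: "https://deploy.example.com/api/rollback/[SERVICE_NAME]"
    method: "POST"
    body: |
      {
        "event": "[ALERT_TRIGGERED_OR_RESOLVED]",
        "reason": "[ALERT_DESCRIPTION]"
      }
services:
  - name: twinnation
    interval: 30s
    url: "https://twinnation.org/health"
    alerts:
      - type: custom
        enabled: true
        threshold: 10
        send-on-resolved: true
        description: "healthcheck failed 10 times in a row"
    conditions:
      - "[STATUS] == 200"
      - "[BODY].status == UP"
      - "[RESPONSE_TIME] < 300"
```
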
## Docker
@@ -173,101 +306,3 @@ will send a `POST` request to `http://localhost:8080/playground` with the following body:

```json
{"query":" {\n user(gender: \"female\") {\n id\n name\n gender\n avatar\n }\n }"}
```
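
For reference, a service definition along these lines would produce that request body; because `graphql` is set to `true`, the raw query in `body` is wrapped in `{"query":"$body"}`. The service name and condition are illustrative:

```yaml
services:
  - name: graphql-playground            # Illustrative name
    url: "http://localhost:8080/playground"
    method: POST
    graphql: true                       # Wraps the body below in {"query":"$body"}
    body: |
      {
        user(gender: "female") {
          id
          name
          gender
          avatar
        }
      }
    conditions:
      - "[STATUS] == 200"
```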