docs(alerting): Add Matrix alerts to README
README.md
							| @ -50,6 +50,7 @@ Have any feedback or questions? [Create a discussion](https://github.com/TwiN/ga | ||||
|     - [Configuring Teams alerts](#configuring-teams-alerts) | ||||
|     - [Configuring Telegram alerts](#configuring-telegram-alerts) | ||||
|     - [Configuring Twilio alerts](#configuring-twilio-alerts) | ||||
|     - [Configuring Matrix alerts](#configuring-matrix-alerts) | ||||
|     - [Configuring custom alerts](#configuring-custom-alerts) | ||||
|     - [Setting a default alert](#setting-a-default-alert) | ||||
|   - [Maintenance](#maintenance) | ||||
| @ -276,7 +277,7 @@ See [examples/docker-compose-postgres-storage](.examples/docker-compose-postgres | ||||
|  | ||||
|  | ||||
| ### Client configuration | ||||
| In order to support a wide range of environments, each monitored endpoint has a unique configuration for  | ||||
| In order to support a wide range of environments, each monitored endpoint has a unique configuration for | ||||
| the client used to send the request. | ||||
|  | ||||
| | Parameter                     | Description                                                                | Default         | | ||||
| @ -377,7 +378,7 @@ ignored. | ||||
|  | ||||
| ```yaml | ||||
| alerting: | ||||
|   discord:  | ||||
|   discord: | ||||
|     webhook-url: "https://discord.com/api/webhooks/**********/**********" | ||||
|  | ||||
| endpoints: | ||||
| @ -420,7 +421,7 @@ alerting: | ||||
|     host: "mail.example.com" | ||||
|     port: 587 | ||||
|     to: "recipient1@example.com,recipient2@example.com" | ||||
|     # You can also add group-specific to keys, which will  | ||||
|     # You can also add group-specific to keys, which will | ||||
|     # override the to key above for the specified groups | ||||
|     overrides: | ||||
|       - group: "core" | ||||
| @ -470,7 +471,7 @@ endpoints: | ||||
|  | ||||
| ```yaml | ||||
| alerting: | ||||
|   googlechat:  | ||||
|   googlechat: | ||||
|     webhook-url: "https://chat.googleapis.com/v1/spaces/*******/messages?key=**********&token=********" | ||||
|  | ||||
| endpoints: | ||||
| @ -501,7 +502,7 @@ endpoints: | ||||
|  | ||||
| ```yaml | ||||
| alerting: | ||||
|   mattermost:  | ||||
|   mattermost: | ||||
|     webhook-url: "http://**********/hooks/**********" | ||||
|     client: | ||||
|       insecure: true | ||||
| @ -601,9 +602,9 @@ Behavior: | ||||
|  | ||||
| ```yaml | ||||
| alerting: | ||||
|   pagerduty:  | ||||
|   pagerduty: | ||||
|     integration-key: "********************************" | ||||
|     # You can also add group-specific integration keys, which will  | ||||
|     # You can also add group-specific integration keys, which will | ||||
|     # override the integration key above for the specified groups | ||||
|     overrides: | ||||
|       - group: "core" | ||||
| @ -653,7 +654,7 @@ endpoints: | ||||
| | `alerting.slack.overrides[].webhook-url`  | Slack Webhook URL                                                                          | `""`          | | ||||
| ```yaml | ||||
| alerting: | ||||
|   slack:  | ||||
|   slack: | ||||
|     webhook-url: "https://hooks.slack.com/services/**********/**********/**********" | ||||
|  | ||||
| endpoints: | ||||
| @ -696,7 +697,7 @@ Here's an example of what the notifications look like: | ||||
| alerting: | ||||
|   teams: | ||||
|     webhook-url: "https://********.webhook.office.com/webhookb2/************" | ||||
|     # You can also add group-specific webhook URLs, which will  | ||||
|     # You can also add group-specific webhook URLs, which will | ||||
|     # override the webhook-url above for the specified groups | ||||
|     overrides: | ||||
|       - group: "core" | ||||
| @ -745,7 +746,7 @@ Here's an example of what the notifications look like: | ||||
|  | ||||
| ```yaml | ||||
| alerting: | ||||
|   telegram:  | ||||
|   telegram: | ||||
|     token: "123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11" | ||||
|     id: "0123456789" | ||||
|  | ||||
| @ -801,6 +802,36 @@ endpoints: | ||||
|         description: "healthcheck failed" | ||||
| ``` | ||||
|  | ||||
| #### Configuring Matrix alerts | ||||
| | Parameter                          | Description                                                                                | Default                            | | ||||
| |:-----------------------------------|:-------------------------------------------------------------------------------------------|:-----------------------------------| | ||||
| | `alerting.matrix`                  | Settings for alerts of type `matrix`                                                       | `{}`                               | | ||||
| | `alerting.matrix.homeserver-url`   | Custom homeserver URL                                                                      | `https://matrix-client.matrix.org` | | ||||
| | `alerting.matrix.access-token`     | Bot user access token                                                                      | Required `""`                      | | ||||
| | `alerting.matrix.internal-room-id` | Internal room ID of the room that the bot user can send messages to                        | Required `""`                      | | ||||
| | `alerting.matrix.default-alert`    | Default alert configuration. <br />See [Setting a default alert](#setting-a-default-alert) | N/A                                | | ||||
|  | ||||
| ```yaml | ||||
| alerting: | ||||
|   matrix: | ||||
|     homeserver-url: "..." | ||||
|     access-token: "..." | ||||
|     internal-room-id: "..." | ||||
|  | ||||
| endpoints: | ||||
|   - name: website | ||||
|     interval: 30s | ||||
|     url: "https://twin.sh/health" | ||||
|     conditions: | ||||
|       - "[STATUS] == 200" | ||||
|       - "[BODY].status == UP" | ||||
|       - "[RESPONSE_TIME] < 300" | ||||
|     alerts: | ||||
|       - type: matrix | ||||
|         enabled: true | ||||
|         send-on-resolved: true | ||||
|         description: "healthcheck failed" | ||||
| ``` | ||||
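To make the three Matrix settings above more concrete, here is a minimal sketch of how they could map onto a request to the Matrix Client-Server API's send-message endpoint. This is an illustration only and not necessarily how Gatus sends the alert: the function name `build_matrix_message_request` is hypothetical, and the request is only built, not sent.

```python
import json
import uuid
from urllib.parse import quote
from urllib.request import Request


def build_matrix_message_request(homeserver_url, access_token, internal_room_id, text):
    """Build (but do not send) a request that posts a text message to a room.

    Hypothetical helper for illustration; the arguments mirror the
    homeserver-url, access-token and internal-room-id settings.
    """
    # Each message needs a client-chosen transaction ID.
    txn_id = uuid.uuid4().hex
    # Room IDs contain characters such as "!" and ":" that must be URL-encoded.
    url = "{}/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        homeserver_url.rstrip("/"), quote(internal_room_id, safe=""), txn_id
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )
```

The internal room ID is the `!`-prefixed identifier of the room, not its human-readable alias, which is why it is URL-encoded before being placed in the path.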
|  | ||||
| #### Configuring custom alerts | ||||
| | Parameter                       | Description                                                                                | Default       | | ||||
| @ -813,9 +844,9 @@ endpoints: | ||||
| | `alerting.custom.client`        | Client configuration. <br />See [Client configuration](#client-configuration).             | `{}`          | | ||||
| | `alerting.custom.default-alert` | Default alert configuration. <br />See [Setting a default alert](#setting-a-default-alert) | N/A           | | ||||
|  | ||||
| While they're called alerts, you can use this feature to call anything.  | ||||
| While they're called alerts, you can use this feature to call anything. | ||||
|  | ||||
| For instance, you could automate rollbacks by having an application that keeps track of new deployments, and by  | ||||
| For instance, you could automate rollbacks by having an application that keeps track of new deployments, and by | ||||
| leveraging Gatus, you could have Gatus call that application endpoint when an endpoint starts failing. Your application | ||||
| would then check if the endpoint that started failing was part of the recently deployed application, and if it was, | ||||
| then automatically roll it back. | ||||
| @ -827,7 +858,7 @@ Furthermore, you may use the following placeholders in the body (`alerting.custo | ||||
| - `[ENDPOINT_URL]` (resolved from `endpoints[].url`) | ||||
|  | ||||
| If you have an alert using the `custom` provider with `send-on-resolved` set to `true`, you can use the | ||||
| `[ALERT_TRIGGERED_OR_RESOLVED]` placeholder to differentiate the notifications.  | ||||
| `[ALERT_TRIGGERED_OR_RESOLVED]` placeholder to differentiate the notifications. | ||||
| The aforementioned placeholder will be replaced by `TRIGGERED` or `RESOLVED` accordingly, though it can be modified | ||||
| (details at the end of this section). | ||||
|  | ||||
| @ -867,7 +898,7 @@ alerting: | ||||
|         TRIGGERED: "partial_outage" | ||||
|         RESOLVED: "operational" | ||||
| ``` | ||||
| As a result, the `[ALERT_TRIGGERED_OR_RESOLVED]` in the body of the first example of this section would be replaced by  | ||||
| As a result, the `[ALERT_TRIGGERED_OR_RESOLVED]` in the body of the first example of this section would be replaced by | ||||
| `partial_outage` when an alert is triggered and `operational` when an alert is resolved. | ||||
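The placeholder behavior described above can be sketched as a simple string substitution. This is an illustrative model rather than Gatus' actual code: `render_custom_body` and its parameters are hypothetical names, and only the two placeholders mentioned in this section are handled.

```python
def render_custom_body(template, endpoint_url, resolved, placeholders=None):
    """Substitute alert placeholders in a custom alert body (illustrative only)."""
    # Default values for [ALERT_TRIGGERED_OR_RESOLVED], as described above.
    mapping = {"TRIGGERED": "TRIGGERED", "RESOLVED": "RESOLVED"}
    # Optional overrides, e.g. {"TRIGGERED": "partial_outage", "RESOLVED": "operational"}.
    mapping.update(placeholders or {})
    state = mapping["RESOLVED" if resolved else "TRIGGERED"]
    return (
        template
        .replace("[ALERT_TRIGGERED_OR_RESOLVED]", state)
        .replace("[ENDPOINT_URL]", endpoint_url)
    )
```

With the overrides from the example above, a template containing `[ALERT_TRIGGERED_OR_RESOLVED]` would render as `partial_outage` for a triggered alert and `operational` for a resolved one.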
|  | ||||
|  | ||||
| @ -886,7 +917,7 @@ long configuration file. | ||||
| To avoid such a problem, you can use the `default-alert` parameter present in each provider configuration: | ||||
| ```yaml | ||||
| alerting: | ||||
|   slack:  | ||||
|   slack: | ||||
|     webhook-url: "https://hooks.slack.com/services/**********/**********/**********" | ||||
|     default-alert: | ||||
|       enabled: true | ||||
| @ -963,7 +994,7 @@ endpoints: | ||||
| ``` | ||||
|  | ||||
| ### Maintenance | ||||
| If you have maintenance windows, you may not want to be annoyed by alerts.  | ||||
| If you have maintenance windows, you may not want to be annoyed by alerts. | ||||
| To do that, you'll have to use the maintenance configuration: | ||||
|  | ||||
| | Parameter              | Description                                                                                                                            | Default       | | ||||
| @ -1069,8 +1100,8 @@ To run Gatus locally with Docker: | ||||
| docker run -p 8080:8080 --name gatus twinproduction/gatus | ||||
| ``` | ||||
|  | ||||
| Other than using one of the examples provided in the [.examples](.examples) folder, you can also try it out locally by  | ||||
| creating a configuration file, we'll call it `config.yaml` for this example, and running the following  | ||||
| Other than using one of the examples provided in the [.examples](.examples) folder, you can also try it out locally by | ||||
| creating a configuration file, we'll call it `config.yaml` for this example, and running the following | ||||
| command: | ||||
| ```console | ||||
| docker run -p 8080:8080 --mount type=bind,source="$(pwd)"/config.yaml,target=/config/config.yaml --name gatus twinproduction/gatus | ||||
| @ -1154,26 +1185,26 @@ will send a `POST` request to `http://localhost:8080/playground` with the follow | ||||
| To ensure that Gatus provides reliable and accurate results (i.e. response time), Gatus only evaluates one endpoint at a time. | ||||
| In other words, even if you have multiple endpoints with the same interval, they will not execute at the same time. | ||||
|  | ||||
| You can test this yourself by running Gatus with several endpoints configured with a very short, unrealistic interval,  | ||||
| You can test this yourself by running Gatus with several endpoints configured with a very short, unrealistic interval, | ||||
| such as 1ms. You'll notice that the response time does not fluctuate - that is because while endpoints are evaluated on | ||||
| different goroutines, there's a global lock that prevents multiple endpoints from running at the same time. | ||||
|  | ||||
| Unfortunately, there is a drawback. If you have a lot of endpoints, including some that are very slow or prone to timing out  | ||||
| Unfortunately, there is a drawback. If you have a lot of endpoints, including some that are very slow or prone to timing out | ||||
| (the default timeout is 10s), then it means that for the entire duration of the request, no other endpoint can be evaluated. | ||||
|  | ||||
| The interval does not include the duration of the request itself, which means that if an endpoint has an interval of 30s  | ||||
| and the request takes 2s to complete, the timestamp between two evaluations will be 32s, not 30s.  | ||||
| The interval does not include the duration of the request itself, which means that if an endpoint has an interval of 30s | ||||
| and the request takes 2s to complete, the timestamp between two evaluations will be 32s, not 30s. | ||||
|  | ||||
| While this does not prevent Gatus from performing health checks on all other endpoints, it may cause Gatus to be unable  | ||||
| While this does not prevent Gatus from performing health checks on all other endpoints, it may cause Gatus to be unable | ||||
| to respect the configured interval, for instance: | ||||
| - Endpoint A has an interval of 5s, and times out after 10s  | ||||
| - Endpoint A has an interval of 5s, and times out after 10s | ||||
| - Endpoint B has an interval of 5s, and takes 1ms to complete | ||||
| - Endpoint B will be unable to run every 5s, because endpoint A's health evaluation takes longer than its interval | ||||
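The effect of the global lock in the scenario above can be approximated with a small simulation. This is a simplified model under stated assumptions, not Gatus' scheduler: each endpoint waits for the lock, runs for a fixed duration, then sleeps for its interval, and the lock is granted in the order endpoints become ready. The function name `simulate_start_times` is hypothetical.

```python
import heapq


def simulate_start_times(endpoints, horizon):
    """Simulate evaluations serialized by one global lock (simplified model).

    endpoints: dict mapping name -> (interval_seconds, duration_seconds),
               both assumed to be positive.
    horizon:   stop recording evaluations that would start after this time.
    Returns a dict mapping name -> list of evaluation start times.
    """
    # Every endpoint is ready at t=0; ties are broken by declaration order.
    ready = [(0.0, order, name) for order, name in enumerate(endpoints)]
    heapq.heapify(ready)
    lock_free_at = 0.0
    starts = {name: [] for name in endpoints}
    while ready:
        ready_at, order, name = heapq.heappop(ready)
        # An endpoint starts once it is ready AND the lock has been released.
        start = max(ready_at, lock_free_at)
        if start > horizon:
            continue  # past the horizon: stop scheduling this endpoint
        interval, duration = endpoints[name]
        starts[name].append(start)
        lock_free_at = start + duration
        # The interval only starts counting after the request completes.
        heapq.heappush(ready, (lock_free_at + interval, order, name))
    return starts
```

In this model, with endpoint A at `(5.0, 10.0)` and endpoint B at `(5.0, 0.001)`, B ends up being evaluated about every 15s instead of every 5s, because it spends most of each cycle waiting for A to release the lock.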
|  | ||||
| To sum it up, while Gatus can handle any interval you throw at it, you're better off having slow requests with  | ||||
| To sum it up, while Gatus can handle any interval you throw at it, you're better off having slow requests with | ||||
| a higher interval. | ||||
|  | ||||
| As a rule of thumb, I personally set the interval for more complex health checks to `5m` (5 minutes) and  | ||||
| As a rule of thumb, I personally set the interval for more complex health checks to `5m` (5 minutes) and | ||||
| simple health checks used for alerting (PagerDuty/Twilio) to `30s`. | ||||
|  | ||||
|  | ||||
| @ -1199,18 +1230,18 @@ endpoints: | ||||
|       - "[CONNECTED] == true" | ||||
| ``` | ||||
|  | ||||
| Placeholders `[STATUS]` and `[BODY]` as well as the fields `endpoints[].body`, `endpoints[].headers`,  | ||||
| Placeholders `[STATUS]` and `[BODY]` as well as the fields `endpoints[].body`, `endpoints[].headers`, | ||||
| `endpoints[].method` and `endpoints[].graphql` are not supported for TCP endpoints. | ||||
|  | ||||
| This works for applications such as databases (Postgres, MySQL, etc.) and caches (Redis, Memcached, etc.). | ||||
|  | ||||
| **NOTE**: `[CONNECTED] == true` does not guarantee that the endpoint itself is healthy - it only guarantees that there's  | ||||
| something at the given address listening to the given port, and that a connection to that address was successfully  | ||||
| **NOTE**: `[CONNECTED] == true` does not guarantee that the endpoint itself is healthy - it only guarantees that there's | ||||
| something at the given address listening to the given port, and that a connection to that address was successfully | ||||
| established. | ||||
|  | ||||
|  | ||||
| ### Monitoring an endpoint using ICMP | ||||
| By prefixing `endpoints[].url` with `icmp://`, you can monitor endpoints at a very basic level using ICMP, more  | ||||
| By prefixing `endpoints[].url` with `icmp://`, you can monitor endpoints at a very basic level using ICMP, more | ||||
| commonly known as "ping" or "echo": | ||||
|  | ||||
| ```yaml | ||||
| @ -1242,12 +1273,12 @@ endpoints: | ||||
|  | ||||
| There are two placeholders that can be used in the conditions for endpoints of type DNS: | ||||
| - The placeholder `[BODY]` resolves to the output of the query. For instance, a query of type `A` would return an IPv4. | ||||
| - The placeholder `[DNS_RCODE]` resolves to the name associated to the response code returned by the query, such as  | ||||
| - The placeholder `[DNS_RCODE]` resolves to the name associated to the response code returned by the query, such as | ||||
| `NOERROR`, `FORMERR`, `SERVFAIL`, `NXDOMAIN`, etc. | ||||
|  | ||||
|  | ||||
| ### Monitoring an endpoint using STARTTLS | ||||
| If you have an email server that you want to ensure there are no problems with, monitoring it through STARTTLS  | ||||
| If you have an email server that you want to ensure there are no problems with, monitoring it through STARTTLS | ||||
| will serve as a good initial indicator: | ||||
| ```yaml | ||||
| endpoints: | ||||
| @ -1280,11 +1311,11 @@ endpoints: | ||||
| ### disable-monitoring-lock | ||||
| Setting `disable-monitoring-lock` to `true` means that multiple endpoints could be monitored at the same time. | ||||
|  | ||||
| While this behavior wouldn't generally be harmful, conditions using the `[RESPONSE_TIME]` placeholder could be impacted  | ||||
| While this behavior wouldn't generally be harmful, conditions using the `[RESPONSE_TIME]` placeholder could be impacted | ||||
| by the evaluation of multiple endpoints at the same time, therefore, the default value for this parameter is `false`. | ||||
|  | ||||
| There are three main reasons why you might want to disable the monitoring lock: | ||||
| - You're using Gatus for load testing (each endpoint is periodically evaluated on a different goroutine, so  | ||||
| - You're using Gatus for load testing (each endpoint is periodically evaluated on a different goroutine, so | ||||
| technically, if you create 100 endpoints with a 1-second interval, Gatus will send 100 requests per second) | ||||
| - You have a _lot_ of endpoints to monitor | ||||
| - You want to test multiple endpoints at very short intervals (< 5s) | ||||
| @ -1381,7 +1412,7 @@ web: | ||||
|  | ||||
|  | ||||
| Gatus can automatically generate an SVG badge for one of your monitored endpoints. | ||||
| This allows you to put badges in your individual applications' README or even create your own status page if you  | ||||
| This allows you to put badges in your individual applications' README or even create your own status page if you | ||||
| desire. | ||||
|  | ||||
| The path to generate a badge is the following: | ||||
| @ -1392,7 +1423,7 @@ Where: | ||||
| - `{duration}` is `7d`, `24h` or `1h` | ||||
| - `{key}` has the pattern `<GROUP_NAME>_<ENDPOINT_NAME>` in which both variables have ` `, `/`, `_`, `,` and `.` replaced by `-`. | ||||
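As a sketch of the key pattern just described, the following helper builds `{key}` and an uptime badge URL. It only applies the character replacement stated above; the function names `badge_key` and `uptime_badge_url` are hypothetical, and any further normalization Gatus may perform is not modeled here.

```python
def badge_key(group_name, endpoint_name):
    """Build <GROUP_NAME>_<ENDPOINT_NAME> with the documented replacements."""
    def sanitize(value):
        # Replace ' ', '/', '_', ',' and '.' with '-' in each part.
        for ch in (" ", "/", "_", ",", "."):
            value = value.replace(ch, "-")
        return value

    return sanitize(group_name) + "_" + sanitize(endpoint_name)


def uptime_badge_url(base_url, group_name, endpoint_name, duration):
    """Build the uptime badge URL for one of the supported durations."""
    if duration not in ("7d", "24h", "1h"):
        raise ValueError("duration must be 7d, 24h or 1h")
    return "{}/api/v1/endpoints/{}/uptimes/{}/badge.svg".format(
        base_url.rstrip("/"), badge_key(group_name, endpoint_name), duration
    )
```

Because `_` inside a group or endpoint name is replaced by `-`, the single `_` left in the key always separates the group from the endpoint.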
|  | ||||
| For instance, if you want the uptime during the last 24 hours from the endpoint `frontend` in the group `core`,  | ||||
| For instance, if you want the uptime during the last 24 hours from the endpoint `frontend` in the group `core`, | ||||
| the URL would look like this: | ||||
| ``` | ||||
| https://example.com/api/v1/endpoints/core_frontend/uptimes/24h/badge.svg | ||||
| @ -1418,7 +1449,7 @@ The path to generate a badge is the following: | ||||
| Where: | ||||
| - `{key}` has the pattern `<GROUP_NAME>_<ENDPOINT_NAME>` in which both variables have ` `, `/`, `_`, `,` and `.` replaced by `-`. | ||||
|  | ||||
| For instance, if you want the current status of the endpoint `frontend` in the group `core`,  | ||||
| For instance, if you want the current status of the endpoint `frontend` in the group `core`, | ||||
| the URL would look like this: | ||||
| ``` | ||||
| https://example.com/api/v1/endpoints/core_frontend/health/badge.svg | ||||
| @ -1456,7 +1487,7 @@ Example: https://status.twin.sh/api/v1/endpoints/core_blog-home/statuses | ||||
|  | ||||
| Gzip compression will be used if the `Accept-Encoding` HTTP header contains `gzip`. | ||||
|  | ||||
| The API will return a JSON payload with the `Content-Type` response header set to `application/json`.  | ||||
| The API will return a JSON payload with the `Content-Type` response header set to `application/json`. | ||||
| No such header is required to query the API. | ||||
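For a client consuming the API, the compression behavior above means the response body may or may not be gzip-encoded depending on the request's `Accept-Encoding` header. A minimal sketch of decoding either form follows; `decode_api_payload` is a hypothetical helper, and it assumes the caller passes in the raw body and the `Content-Encoding` response header value.

```python
import gzip
import json


def decode_api_payload(body, content_encoding=""):
    """Decode a JSON API response body that may be gzip-compressed.

    body:             raw response bytes
    content_encoding: value of the Content-Encoding response header, if any
    """
    if "gzip" in content_encoding.lower():
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))
```

A client that does not send `Accept-Encoding: gzip` can call the helper with the default empty encoding and get the same parsed result.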
|  | ||||
|  | ||||
|  | ||||