From ef9ca05fae51ec2d1bb86966120d5128bc155c2c Mon Sep 17 00:00:00 2001
From: Nikolay Stanchev <ns17@it-innovation.soton.ac.uk>
Date: Tue, 16 Oct 2018 12:23:45 +0100
Subject: [PATCH] Updates alerts documentation

---
 docs/AlertsSpecification.md | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/docs/AlertsSpecification.md b/docs/AlertsSpecification.md
index e97729a..069bf04 100644
--- a/docs/AlertsSpecification.md
+++ b/docs/AlertsSpecification.md
@@ -114,6 +114,7 @@ topology_template:
 condition:
   threshold: 100 # requests have increased by at least 100
   granularity: 120
+  aggregation_method: mean
   resource_type:
     flame_sfp: storage
     flame_sf: storage-users
@@ -132,6 +133,7 @@ topology_template:
 condition:
   threshold: -100 # requests have decreased by at least 100
   granularity: 120
+  aggregation_method: mean
   resource_type:
     flame_sfp: storage
     flame_sf: storage-users
@@ -224,7 +226,7 @@ the format is still the same for consistency. Therefore, using `<measurement>.*`
 * **threshold** -
     * for **threshold** event type, this is the critical value the queried metric is compared to.
-    * for **relative** event type, this is the critical value the difference (between the current metric value and the past metric value) is compared to.
+    * for **relative** event type, this is the critical value the difference (between the current aggregated metric value and the past aggregated metric value) is compared to.
     * for **deadman** event type, this is the critical value the number of measurement points (received in InfluxDB) is compared to.
 * **granularity** - period in seconds
@@ -233,7 +235,7 @@ the format is still the same for consistency. Therefore, using `<measurement>.*`
     * for **deadman** event type, this value specifies how long the span in time (in which the number of measurement points are checked) is
 * **aggregation_method** - the function to use when querying InfluxDB, e.g. median, mean, etc.
 This value is only used when
-the event_type is set to **threshold**.
+the event_type is set to **threshold** or **relative**.
 * **resource_type** - provides context for the given event - key-value pairs for the global tags of the CLMC Information Model. This includes any of the following: `"flame_sfp", "flame_sf", "flame_sfe", "flame_server", "flame_location"`.
@@ -294,7 +296,7 @@ result of the comparison operation is true, an alert is triggered. For example:
 "neq" : "not equal"
 ```
-* **relative** - A relative event type is an alert in which Kapacitor computes the difference between the current value of a metric and the value
+* **relative** - A relative event type is an alert in which Kapacitor computes the difference between the current aggregated value of a metric and the aggregated value
 reported a given period of time ago. The difference between the current and the past value is then compared against a given
 threshold. If the result of the comparison operation is true, an alert is triggered. For example:
@@ -308,6 +310,7 @@ threshold. If the result of the comparison operation is true, an alert is trigge
 condition:
   threshold: -100
   granularity: 120
+  aggregation_method: mean
   resource_type:
     flame_sfp: storage
     flame_sf: storage-users
@@ -318,8 +321,8 @@ threshold. If the result of the comparison operation is true, an alert is trigge
     - flame_sfemc
 ```
- This trigger specification will create an alert task in Kapacitor, which compares every **requests** value reported in
- measurement **storage** with the value received **120** seconds ago. If the difference between the current and the past
+ This trigger specification will create an alert task in Kapacitor, which compares the mean **requests** value reported in measurement **storage**
+ with the mean value received **120** seconds ago. If the difference between the current and the past
  value is less than or equal to (comparison operator is **lte**) **-100**, an alert is triggered.
  Simply explained, an alert is triggered if the **requests** current value has decreased by at least 100 relative to the value reported 120 seconds ago. The queried value is contextualised for service function **storage-users** (using service function package **storage**)
@@ -329,6 +332,7 @@ threshold. If the result of the comparison operation is true, an alert is trigge
 * **aggregation_method** is not required here - the alert task compares the actual value that's being reported (stream mode)
 * if **aggregation_method** is provided, it will be ignored
+ * if X is the current timestamp, the current aggregated value refers to the period {X - granularity; X} while the past aggregated value refers to the period {X - 2*granularity; X - granularity}
 * **deadman** - A deadman event type is an alert in which Kapacitor computes the number of reported points in a measurement for a given period of time. This number is then compared to a given threshold value. If less number of points have been
@@ -352,7 +356,7 @@ For example:
  This trigger specification will create an alert task in Kapacitor, which monitors the number of points reported in measurement **storage** and having tag **sfp** set as **storage**. This value is computed every 60 seconds.
- If the number of reported points is less than **0** (no points have been reported for the last 60 seconds), an alert
+ If the number of reported points is less than or equal to **0** (no points have been reported for the last 60 seconds), an alert
  will be triggered. Triggered alerts will be sent through an HTTP POST message to the URLs listed in the **implementation** section.
 *Notes*:
-- 
GitLab
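
Reviewer sketch (not part of the patch): the relative event type documented above compares the aggregate over {X - granularity; X} with the aggregate over {X - 2*granularity; X - granularity}. The plain-Python sketch below illustrates that arithmetic only. It is not Kapacitor or TICKscript code. The function name `relative_alert`, the `(timestamp, value)` point format, the half-open window boundaries, the "no alert when a window is empty" rule, and every operator key other than `lte` and `neq` are assumptions made for illustration.

```python
import operator
from statistics import mean

# "lte" and "neq" appear in the specification excerpt; the other keys are assumed.
OPERATORS = {
    "lt": operator.lt, "lte": operator.le,
    "gt": operator.gt, "gte": operator.ge,
    "eq": operator.eq, "neq": operator.ne,
}


def relative_alert(points, now, granularity, threshold, comparison_operator, aggregate=mean):
    """Return True if the change between two adjacent windows satisfies the condition.

    points: iterable of (timestamp_seconds, value) pairs (hypothetical format).
    Current window: (now - granularity, now]
    Past window:    (now - 2*granularity, now - granularity]
    """
    points = list(points)
    current = [v for t, v in points if now - granularity < t <= now]
    past = [v for t, v in points if now - 2 * granularity < t <= now - granularity]
    if not current or not past:
        return False  # assumption: no alert unless both windows contain data
    difference = aggregate(current) - aggregate(past)
    return OPERATORS[comparison_operator](difference, threshold)


# Values mirror the example trigger: threshold -100, granularity 120, mean, lte.
requests = [(60, 500), (120, 480), (180, 390), (240, 370)]
fired = relative_alert(requests, now=240, granularity=120,
                       threshold=-100, comparison_operator="lte")
```

With these sample points the past mean is 490 and the current mean is 380, so the difference of -110 satisfies `lte -100` and `fired` is `True`.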