Commit ef9ca05f authored by Nikolay Stanchev

Updates alerts documentation

parent 17ba3cb4
@@ -114,6 +114,7 @@ topology_template:
 condition:
 threshold: 100 # requests have increased by at least 100
 granularity: 120
+aggregation_method: mean
 resource_type:
 flame_sfp: storage
 flame_sf: storage-users
@@ -132,6 +133,7 @@ topology_template:
 condition:
 threshold: -100 # requests have decreased by at least 100
 granularity: 120
+aggregation_method: mean
 resource_type:
 flame_sfp: storage
 flame_sf: storage-users
@@ -224,7 +226,7 @@ the format is still the same for consistency. Therefore, using `<measurement>.*`
 * **threshold** -
 * for **threshold** event type, this is the critical value the queried metric is compared to.
-* for **relative** event type, this is the critical value the difference (between the current metric value and the past metric value) is compared to.
+* for **relative** event type, this is the critical value the difference (between the current aggregated metric value and the past aggregated metric value) is compared to.
 * for **deadman** event type, this is the critical value the number of measurement points (received in InfluxDB) is compared to.
 * **granularity** - period in seconds
@@ -233,7 +235,7 @@ the format is still the same for consistency. Therefore, using `<measurement>.*`
 * for **deadman** event type, this value specifies how long the span in time (in which the number of measurement points are checked) is
 * **aggregation_method** - the function to use when querying InfluxDB, e.g. median, mean, etc. This value is only used when
-the event_type is set to **threshold**.
+the event_type is set to **threshold** or **relative**.
 * **resource_type** - provides context for the given event - key-value pairs for the global tags of the CLMC Information Model.
 This includes any of the following: `"flame_sfp", "flame_sf", "flame_sfe", "flame_server", "flame_location"`.
@@ -294,7 +296,7 @@ result of the comparison operation is true, an alert is triggered. For example:
 "neq" : "not equal"
``` ```
-* **relative** - A relative event type is an alert in which Kapacitor computes the difference between the current value of a metric and the value
+* **relative** - A relative event type is an alert in which Kapacitor computes the difference between the current aggregated value of a metric and the aggregated value
 reported a given period of time ago. The difference between the current and the past value is then compared against a given
 threshold. If the result of the comparison operation is true, an alert is triggered. For example:
@@ -308,6 +310,7 @@ threshold. If the result of the comparison operation is true, an alert is trigge
 condition:
 threshold: -100
 granularity: 120
+aggregation_method: mean
 resource_type:
 flame_sfp: storage
 flame_sf: storage-users
@@ -318,8 +321,8 @@ threshold. If the result of the comparison operation is true, an alert is trigge
 - flame_sfemc
``` ```
-This trigger specification will create an alert task in Kapacitor, which compares every **requests** value reported in
-measurement **storage** with the value received **120** seconds ago. If the difference between the current and the past
+This trigger specification will create an alert task in Kapacitor, which compares the mean **requests** value reported in measurement **storage**
+with the mean value received **120** seconds ago. If the difference between the current and the past
 value is less than or equal to (comparison operator is **lte**) **-100**, an alert is triggered. Simply explained, an alert
 is triggered if the **requests** current value has decreased by at least 100 relative to the value reported 120 seconds ago.
 The queried value is contextualised for service function **storage-users** (using service function package **storage**)
@@ -329,6 +332,7 @@ threshold. If the result of the comparison operation is true, an alert is trigge
 * **aggregation_method** is not required here - the alert task compares the actual value that's being reported (stream mode)
 * if **aggregation_method** is provided, it will be ignored
+* if X is the current timestamp, the current aggregated value refers to the period {X - granularity; X} while the past aggregated value refers to the period {X - 2*granularity; X - granularity}
 * **deadman** - A deadman event type is an alert in which Kapacitor computes the number of reported points in a measurement
 for a given period of time. This number is then compared to a given threshold value. If less number of points have been
@@ -352,7 +356,7 @@ For example:
 This trigger specification will create an alert task in Kapacitor, which monitors the number of points reported in
 measurement **storage** and having tag **sfp** set as **storage**. This value is computed every 60 seconds.
-If the number of reported points is less than **0** (no points have been reported for the last 60 seconds), an alert
+If the number of reported points is less than or equal to **0** (no points have been reported for the last 60 seconds), an alert
 will be triggered. Triggered alerts will be sent through an HTTP POST message to the URLs listed in the **implementation** section.
 *Notes*:
......
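Reviewer illustration (not part of the commit): the relative-event semantics described in the updated documentation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Kapacitor's implementation. The function name `relative_alert`, the `(timestamp, value)` point format, the half-open window boundaries, and the handling of empty windows are all hypothetical. Only the `lte` and `neq` operator names and the `mean` and `median` aggregation names come from the documentation; the other operator keys are assumed by analogy.

```python
from statistics import mean, median

# Comparison operators keyed by name. "lte" and "neq" appear in the
# documentation; the remaining keys are assumptions made by analogy.
COMPARISONS = {
    "lt": lambda a, b: a < b,
    "lte": lambda a, b: a <= b,
    "gt": lambda a, b: a > b,
    "gte": lambda a, b: a >= b,
    "eq": lambda a, b: a == b,
    "neq": lambda a, b: a != b,
}

# Aggregation functions named in the documentation ("median, mean, etc.").
AGGREGATIONS = {"mean": mean, "median": median}


def relative_alert(points, now, granularity, threshold,
                   operator="lte", aggregation="mean"):
    """Return True if the relative condition holds at timestamp `now`.

    `points` is an iterable of (timestamp_in_seconds, value) pairs.
    The current window is {now - granularity; now} and the past window is
    {now - 2*granularity; now - granularity}. Treating each window as
    open on the left and closed on the right is an assumption.
    """
    aggregate = AGGREGATIONS[aggregation]
    current = [v for t, v in points if now - granularity < t <= now]
    past = [v for t, v in points
            if now - 2 * granularity < t <= now - granularity]
    if not current or not past:
        # Assumption: with no data in either window there is nothing to compare.
        return False
    difference = aggregate(current) - aggregate(past)
    return COMPARISONS[operator](difference, threshold)
```

With the values from the documented example (`granularity` 120, `threshold` -100, operator `lte`), points whose mean falls from 500 in the past window to 380 in the current window give a difference of -120, so the condition holds. A threshold of -150 would not trigger for the same data.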