Commit f7f03c64 authored by Nikolay Stanchev

Updates AlertsConfiguration documentation

parent 96901f83
#### Description
This document outlines the TOSCA alert specification used to configure alerts within CLMC. Alerts are configured through a YAML-based
TOSCA-compliant document according to the TOSCA simple profile. This document is passed to the CLMC service, which parses and validates the document.
Subsequently, the CLMC service creates and activates the alerts within Kapacitor, then registers the HTTP alert handlers specified in the document.
The specification is compliant with the TOSCA policy template as implemented by the OpenStack TOSCA parser. See the example policy template at:
https://github.com/openstack/tosca-parser/blob/master/toscaparser/tests/data/policies/tosca_policy_template.yaml
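The HTTP alert handlers registered by the CLMC service are ordinary HTTP endpoints that receive POST messages from Kapacitor. Purely as an illustration, a minimal handler could be sketched with the Python standard library as below. It assumes the request body is Kapacitor's JSON alert message carrying fields such as `id`, `level` and `message`; the names `summarise_alert`, `AlertHandler` and `serve`, and the default port, are hypothetical and not part of CLMC.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def summarise_alert(payload: Dict[str, Any]) -> str:
    """Build a one-line summary of a Kapacitor alert message.

    Assumes the usual Kapacitor alert fields ("id", "level", "message");
    missing fields are replaced by placeholders.
    """
    alert_id = payload.get("id", "<unknown id>")
    level = payload.get("level", "<unknown level>")
    message = payload.get("message", "")
    summary = f"[{level}] {alert_id}"
    return f"{summary}: {message}" if message else summary


class AlertHandler(BaseHTTPRequestHandler):
    """Accepts the POST messages that Kapacitor sends when an alert triggers."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # body is not a JSON object
            self.end_headers()
            return
        print(summarise_alert(payload))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the handler until interrupted (blocking call)."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Calling `serve(8080)` and listing `http://<host>:8080/` under **implementation** would route triggered alerts to this handler; host and port are placeholders.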
#### TOSCA Alerts Specification Document
The TOSCA Alerts Specification Document consists of two main sections - **metadata** and **policies**. Each **policy** contains a number
of triggers. A **trigger** is a fully qualified specification for an alert. Full definitions and clarification of the structure of the document
are given in the following sections. An example of a valid alert specification document looks like:
```yaml
...
    event_type: <threshold | relative | deadman>
    metric: <measurement>.<field>
    condition:
      threshold: <critical value - semantics depend on the event type>
      granularity: <period in seconds - semantics depend on the event type>
      aggregation_method: <aggregation function supported by InfluxDB - e.g. 'mean'>
      resource_type:
        <CLMC Information Model Tag Name>: <CLMC Information Model Tag Value>
        <CLMC Information Model Tag Name>: <CLMC Information Model Tag Value>
        ...
      comparison_operator: <logical operator to use for comparison - e.g. 'gt', 'lt', 'gte', etc.>
    action:
      implementation:
        - <HTTP Alert Handler URL - receives POST messages from Kapacitor when alerts trigger>
        - <HTTP Alert Handler URL - receives POST messages from Kapacitor when alerts trigger>
        ...
  ...
```
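As a rough illustration of the kind of checks involved when a trigger is validated, the Python sketch below checks a single trigger (already loaded from YAML into a dictionary) against the values listed in this document. The function name `validate_trigger` and the exact rules (for example, treating **aggregation_method** as mandatory for **threshold** triggers) are assumptions for illustration, not the actual CLMC validator.

```python
from typing import Any, Dict, List

# Allowed values as listed in this document; the real CLMC validator may differ.
EVENT_TYPES = {"threshold", "relative", "deadman"}
COMPARISON_OPERATORS = {"lt", "gt", "lte", "gte", "eq", "neq"}
AGGREGATION_METHODS = {"count", "mean", "median", "mode", "sum", "first", "last", "max", "min"}


def validate_trigger(trigger: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one trigger definition (empty if valid)."""
    errors = []
    event_type = trigger.get("event_type")
    if event_type not in EVENT_TYPES:
        errors.append(f"unsupported event_type: {event_type!r}")

    # metric must look like <measurement>.<field>; deadman triggers may use '*' as the field.
    measurement, dot, field = str(trigger.get("metric", "")).partition(".")
    if not (measurement and dot and field):
        errors.append("metric must have the form <measurement>.<field>")

    condition = trigger.get("condition") or {}
    for key in ("threshold", "granularity"):
        if key not in condition:
            errors.append(f"condition.{key} is required")

    # aggregation_method only matters for threshold triggers (assumed mandatory there).
    if event_type == "threshold" and condition.get("aggregation_method") not in AGGREGATION_METHODS:
        errors.append("threshold triggers need a supported aggregation_method")

    # comparison_operator is ignored by deadman triggers.
    if event_type in ("threshold", "relative") and condition.get("comparison_operator") not in COMPARISON_OPERATORS:
        errors.append("comparison_operator must be one of: " + ", ".join(sorted(COMPARISON_OPERATORS)))

    action = trigger.get("action") or {}
    if not action.get("implementation"):
        errors.append("action.implementation needs at least one HTTP alert handler URL")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a submitted document at once.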
##### Definitions
* **policy_identifier** - policy label which should match with a StateChange policy in the TOSCA resource specification document
...
specification document submitted to the FLAME Orchestrator.
* **event_type** - the type of TICK Script template used to create the alert (the options are described under *Event types*).
The most common type is expected to be **threshold**; the other supported types are **relative** and **deadman**.
* **metric** - the metric to query in InfluxDB; it must include the measurement name and the field name in the
format `<measurement>`.`<field>`. The only exception is the **deadman** event type, where the `<field>` part is ignored and only the measurement name is used.
* **threshold** -
    * for the **threshold** event type, this is the critical value that the queried metric is compared to.
    * for the **relative** event type, this is the critical value that the difference between the current metric value and the past metric value is compared to.
    * for the **deadman** event type, this is the critical value that the number of measurement points received in InfluxDB is compared to.
* **granularity** - period in seconds
    * for the **threshold** event type, this value specifies how often Kapacitor queries InfluxDB (and the period over which the metric is aggregated) to check whether the alert condition is true.
    * for the **relative** event type, this value specifies how far back in time the past value is taken, i.e. the current metric value is compared with the value reported **granularity** seconds earlier.
    * for the **deadman** event type, this value specifies the length of the time span in which the number of measurement points is counted.
* **aggregation_method** - the function to use when querying InfluxDB, e.g. median, mean, etc. This value is only used when
the **event_type** is set to **threshold**.
* **resource_type** - provides context for the given event - key-value pairs for the global tags of the CLMC Information Model.
* **comparison_operator** - the logical operator to use for comparison - lt (less than), gt (greater than), lte (less than or equal to), etc.
* **implementation** - a list of the URLs of alert handlers to which alert data is sent when the event condition is true.
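To show how **metric**, **aggregation_method**, **granularity** and **resource_type** combine for a **threshold** trigger, the hypothetical helper below renders them as a roughly equivalent InfluxQL query. It is only a sketch under assumptions: CLMC generates TICK scripts rather than this string, the default database and retention policy are assumed, and a Kapacitor batch task would normally express the time window through the query node's `.period()` and `.every()` properties rather than a `time` condition in `WHERE`. The name `build_threshold_query` is made up for this example.

```python
from typing import Any, Dict

# InfluxQL functions listed in this document.
SUPPORTED_FUNCTIONS = {"count", "mean", "median", "mode", "sum", "first", "last", "max", "min"}


def build_threshold_query(metric: str, aggregation_method: str,
                          granularity: int, resource_type: Dict[str, Any]) -> str:
    """Aggregate `metric` over the last `granularity` seconds for the given tag values."""
    if aggregation_method not in SUPPORTED_FUNCTIONS:
        raise ValueError(f"unsupported aggregation_method: {aggregation_method!r}")
    measurement, _, field = metric.partition(".")
    # Tags are sorted so the query is deterministic; single quotes in tag
    # values are escaped because InfluxQL string literals use single quotes.
    conditions = [
        "\"{}\" = '{}'".format(tag, str(value).replace("'", "\\'"))
        for tag, value in sorted(resource_type.items())
    ]
    conditions.append(f"time > now() - {granularity}s")
    return (f'SELECT {aggregation_method}("{field}") FROM "{measurement}" '
            f"WHERE {' AND '.join(conditions)}")
```

For the **high_latency** example in the next section, `build_threshold_query("network.latency", "mean", 120, {"location": "watershed"})` yields `SELECT mean("latency") FROM "network" WHERE "location" = 'watershed' AND time > now() - 120s`.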
##### Event types
* **threshold** - A threshold event type is an alert in which Kapacitor queries InfluxDB for a specific metric over a given period of time,
using a query function such as *mean*, *median*, *mode*, etc. The result is then compared against a given threshold. If the
result of the comparison operation is true, an alert is triggered. For example:
```yaml
high_latency:
  description: This event triggers when the mean network latency in a given location exceeds a given threshold (in ms).
  event_type: threshold
  metric: network.latency
  condition:
    threshold: 45
    granularity: 120
    aggregation_method: mean
    resource_type:
      location: watershed
    comparison_operator: gt
  action:
    implementation:
      - http://sfemc.flame.eu/notify
      - http://companyA.alert-handler.flame.eu/high-latency
```
This trigger specification will create an alert task in Kapacitor, which queries the **latency** field of the **network**
measurement for location **watershed** every **120** seconds and compares the mean value over the last 120 seconds with the threshold value **45**.
If the mean latency exceeds 45 (**gt** operator is used, which stands for **greater than**), an alert is triggered. This alert will
be sent through an HTTP POST message to the URLs listed in the **implementation** section.
The currently included InfluxQL functions are:
`"count", "mean", "median", "mode", "sum", "first", "last", "max", "min"`
The comparison operator mappings are as follows:
```
"lt" : "less than",
"gt" : "greater than",
"lte" : "less than or equal to",
"gte" : "greater than or equal to",
"eq" : "equal",
"neq" : "not equal"
```
* **relative** - A relative event type is an alert in which Kapacitor computes the difference between the current value of a metric and the value
reported a given period of time ago. The difference between the current and the past value is then compared against a given
threshold. If the result of the comparison operation is true, an alert is triggered. For example:
```yaml
decrease_in_requests:
  description: |
    This event triggers when the number of requests has decreased relative to the number of requests received
    120 seconds ago.
  event_type: relative
  metric: storage.requests
  condition:
    threshold: -100
    granularity: 120
    resource_type:
      sf_package: storage
      sf: storage-users
      location: watershed
    comparison_operator: lte
  action:
    implementation:
      - http://sfemc.flame.eu/notify
```
This trigger specification will create an alert task in Kapacitor, which compares every **requests** value reported in
measurement **storage** with the value received **120** seconds ago. If the difference between the current and the past
value is less than or equal to **-100** (the comparison operator is **lte**), an alert is triggered. Put simply, an alert
is triggered if the current **requests** value has decreased by at least 100 relative to the value reported 120 seconds ago.
The queried value is contextualised for service function **storage-users** (using service function package **storage**)
at location **watershed**. Triggered alerts will be sent through an HTTP POST message to the URLs listed in the **implementation** section.
*Notes*:
* **aggregation_method** is not required here - the alert task compares the actual value that's being reported (stream mode)
* if **aggregation_method** is provided, it will be ignored
* **deadman** - A deadman event type is an alert in which Kapacitor counts the number of points reported in a measurement
over a given period of time. This number is then compared to a given threshold value. If the number of reported points is
less than or equal to the threshold value, an alert is triggered.
For example:
```yaml
missing_storage_measurements:
  description: This event triggers when the number of storage measurements reported falls below the threshold value.
  event_type: deadman
  metric: storage.*
  condition:
    threshold: 0
    granularity: 60
    resource_type:
      sf_package: storage
  action:
    implementation:
      - http://sfemc.flame.eu/notify
```
This trigger specification will create an alert task in Kapacitor, which monitors the number of points reported in
measurement **storage** with tag **sf_package** set to **storage**. This value is computed every 60 seconds.
If the number of reported points is less than or equal to **0** (i.e. no points have been reported for the last 60 seconds), an alert
will be triggered. Triggered alerts will be sent through an HTTP POST message to the URLs listed in the **implementation** section.
*Notes*:
* **metric** only requires the measurement name in this event type and doesn't require a field name
* the trigger specification still needs to be consistent with the parsing rule for **metric**: `<measurement>`.`<field>`
* simply putting a `*` for the field is sufficient, e.g. `storage.*`
* even if you put something else as the field value, it will be ignored - only the **measurement** name is used
* **aggregation_method** is not required in this event type; any value provided will be ignored
* **comparison_operator** is not required in this event type; any value provided will be ignored
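To make the comparison semantics of the three event types concrete, the Python sketch below evaluates each rule on values assumed to have already been fetched from InfluxDB. It is not how CLMC or Kapacitor execute alerts (that happens in the generated TICK scripts); the function names are hypothetical, and the Python aggregates only approximate the corresponding InfluxQL functions. For **deadman** the sketch uses `<=`, matching Kapacitor's built-in `deadman` helper, which alerts when the point count is at or below the threshold; this is what lets a threshold of 0 fire when no points arrive.

```python
import operator
import statistics
from typing import Sequence

# Comparison operator names from this document mapped onto Python operators.
OPERATORS = {
    "lt": operator.lt, "gt": operator.gt, "lte": operator.le,
    "gte": operator.ge, "eq": operator.eq, "neq": operator.ne,
}

# InfluxQL functions from this document, approximated with Python equivalents.
AGGREGATES = {
    "count": len, "mean": statistics.mean, "median": statistics.median,
    "mode": statistics.mode, "sum": sum, "first": lambda v: v[0],
    "last": lambda v: v[-1], "max": max, "min": min,
}


def threshold_fires(window: Sequence[float], aggregation_method: str,
                    comparison_operator: str, threshold: float) -> bool:
    """threshold: aggregate the points from the last `granularity` seconds, then compare."""
    value = AGGREGATES[aggregation_method](window)
    return OPERATORS[comparison_operator](value, threshold)


def relative_fires(current: float, past: float,
                   comparison_operator: str, threshold: float) -> bool:
    """relative: compare (current value - value reported `granularity` seconds ago)."""
    return OPERATORS[comparison_operator](current - past, threshold)


def deadman_fires(points_in_window: int, threshold: float) -> bool:
    """deadman: compare the number of points received in the last `granularity` seconds."""
    return points_in_window <= threshold
```

With the examples above, `threshold_fires([40, 50, 60], "mean", "gt", 45)` models **high_latency**, `relative_fires(150, 300, "lte", -100)` models **decrease_in_requests**, and `deadman_fires(0, 0)` models **missing_storage_measurements**; all three return `True`.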
```ini
[run]
source = clmcservice
omit =
    *test*
    *__init__*
    clmcservice\aggregation\influx_data_interface.py
    clmcservice\configapi\views.py
    clmcservice\whoamiapi\views.py
# configapi\views and whoami\views are currently omitted since there is no implementation there, yet
```