Commit 21d62515 authored by Nikolay Stanchev

Alerts configuration initial documentation

parent 817d7cd6
<!--
// © University of Southampton IT Innovation Centre, 2018
//
// Copyright in this software belongs to University of Southampton
// IT Innovation Centre of Gamma House, Enterprise Road,
// Chilworth Science Park, Southampton, SO16 7NS, UK.
//
// This software may not be used, sold, licensed, transferred, copied
// or reproduced in whole or in part in any manner or form or in or
// on any media by any person other than in accordance with the terms
// of the Licence Agreement supplied with the software, or otherwise
// without the prior written consent of the copyright owners.
//
// This software is distributed WITHOUT ANY WARRANTY, without even the
// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE, except where stated in the Licence Agreement supplied with
// the software.
//
// Created By : Nikolay Stanchev
// Created Date : 15-08-2018
// Created for Project : FLAME
-->
# **FLAME - Integration of alerts, topics and handlers**
#### **Authors**
|Authors|Organisation|
|:---:|:---:|
|[Nikolay Stanchev](mailto:ns17@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
#### Description
This document outlines the configuration of alerts within CLMC. Alerts are configured through a YAML-based
TOSCA-compliant document. This document is passed to the CLMC service, which parses and validates the document. Subsequently, the CLMC service
creates and activates the alerts within Kapacitor, then registers the HTTP alert handlers specified in the document.
#### TOSCA Alerts Configuration Document
The TOSCA Alerts Configuration Document consists of two main sections: **metadata** and **triggers**. Full definitions and
clarification of the structure of the document are given in the following sections. An example alert configuration
document looks like this:
```yaml
metadata:
  sfc: companyA-VR
  sfci: companyA-VR-premium
triggers:
  high_latency:
    description: This event triggers when the mean network latency in a given location exceeds a given threshold (in ms).
    event_type: threshold
    metric: network.latency
    condition:
      threshold: 45
      granularity: 120
      aggregation_method: mean
      resource_type:
        location: watershed
      comparison_operator: gt
    action:
      implementation:
        - http://sfemc.flame.eu/notify
        - http://companyA.alert-handler.flame.eu/high-latency
  low_requests:
    description: |
      This event triggers when the last reported number of requests for a given service function
      falls behind a given threshold.
    event_type: threshold
    metric: storage.requests
    condition:
      threshold: 5
      granularity: 60
      aggregation_method: last
      resource_type:
        sf_package: storage
        sf: storage-users
        location: watershed
      comparison_operator: lt
    action:
      implementation:
        - http://sfemc.flame.eu/notify
        - http://companyA.alert-handler.flame.eu/low-requests
```
##### Metadata
The ***metadata*** section specifies the service function chain ID and the service function chain instance ID to which this
alerts configuration relates. The format is the following:
```yaml
metadata:
  sfc: <sfc_id>
  sfci: <sfc_i_id>
```
##### Triggers
The ***triggers*** section defines a set of trigger-type nodes, keyed by event identifier, each representing a fully qualified
configuration for an alert within CLMC. The format is the following:
```yaml
triggers:
  <event identifier>:
    description: <optional description for the given event trigger>
    event_type: <threshold | relative | deadman>
    metric: <measurement>.<field>
    condition:
      threshold: <critical value>
      granularity: <period in seconds - how often to check whether the event condition is true>
      aggregation_method: <aggregation function supported by InfluxDB - e.g. 'mean'>
      resource_type:
        <CLMC Information Model Tag Name>: <CLMC Information Model Tag Value>
        <CLMC Information Model Tag Name>: <CLMC Information Model Tag Value>
        ...
      comparison_operator: <logical operator to use for comparison, e.g. 'gt', 'lt'>
    action:
      implementation:
        - <HTTP Alert Handler URL>
        - <HTTP Alert Handler URL>
        ...
  ...
```
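The structure above lends itself to a simple structural check before any alert is created. The following is a minimal sketch of such a check, not the CLMC service's actual validation code: the function name `validate_alerts_spec` is hypothetical, the document is assumed to be already parsed into a Python dictionary, and treating every key except `description` as mandatory is an assumption.

```python
ALLOWED_EVENT_TYPES = {"threshold", "relative", "deadman"}
# Assumption: these keys are treated as mandatory; only 'description' is
# stated to be optional in the format above.
REQUIRED_TRIGGER_KEYS = ("event_type", "metric", "condition", "action")


def validate_alerts_spec(spec):
    """Return a list of problems found in a parsed alerts document (empty if none)."""
    problems = []

    metadata = spec.get("metadata") or {}
    for key in ("sfc", "sfci"):
        if not metadata.get(key):
            problems.append(f"metadata.{key} is missing")

    triggers = spec.get("triggers") or {}
    if not triggers:
        problems.append("triggers section is missing or empty")

    for name, trigger in triggers.items():
        for key in REQUIRED_TRIGGER_KEYS:
            if key not in trigger:
                problems.append(f"{name}: missing '{key}'")

        event_type = trigger.get("event_type")
        if event_type is not None and event_type not in ALLOWED_EVENT_TYPES:
            problems.append(f"{name}: unknown event_type '{event_type}'")

        metric = trigger.get("metric")
        if metric is not None:
            # The metric must have the form <measurement>.<field>.
            measurement, _, field = str(metric).partition(".")
            if not measurement or not field:
                problems.append(f"{name}: metric must be <measurement>.<field>")

        action = trigger.get("action") or {}
        if "action" in trigger and not action.get("implementation"):
            problems.append(f"{name}: no alert handler URLs given")

    return problems


example = {
    "metadata": {"sfc": "companyA-VR", "sfci": "companyA-VR-premium"},
    "triggers": {
        "high_latency": {
            "event_type": "threshold",
            "metric": "network.latency",
            "condition": {"threshold": 45, "granularity": 120,
                          "aggregation_method": "mean",
                          "resource_type": {"location": "watershed"},
                          "comparison_operator": "gt"},
            "action": {"implementation": ["http://sfemc.flame.eu/notify"]},
        }
    },
}
print(validate_alerts_spec(example))  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a submitted document at once.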
##### Definitions
* **event_identifier** - the name of the event, which **MUST** match the *constraint* event name referenced in the TOSCA resource
specification document submitted to the FLAME Orchestrator.
* **event_type** - the type of TICK Script template to use to create the alert. More information will be provided about
the different options here, but we assume the most common one will be **threshold**.
* **metric** - the metric to query in InfluxDB; it must include the measurement name and the field name in the
format `<measurement>.<field>`.
* **threshold** - when using the **threshold** event type, this is the critical value the actual metric is compared to.
* **granularity** - the period in seconds that tells Kapacitor how often to query InfluxDB and check whether the
event condition is true.
* **aggregation_method** - the aggregation function to use when querying InfluxDB, e.g. `mean` or `last`.
* **resource_type** - provides context for the given event: key-value pairs for the global tags of the
CLMC Information Model.
* **comparison_operator** - the logical operator to use for comparison - less than, greater than, less than or equal to, etc.
* **implementation** - a list of the URLs of alert handlers to which alert data is sent when the event condition is true.
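
To make these definitions concrete, here is a small sketch of how one trigger could be resolved into the pieces needed to build an alert. It is an illustration under stated assumptions, not the CLMC implementation: the function names are hypothetical, only the `gt` and `lt` tokens come from this document (the remaining operator tokens are assumed), and the format of the filter string is likewise an assumption.

```python
# 'gt' and 'lt' appear in the document; the other tokens are assumed names.
COMPARISON_OPERATORS = {
    "lt": "<", "gt": ">", "lte": "<=", "gte": ">=", "eq": "==", "neq": "!=",
}


def split_metric(metric):
    """Split '<measurement>.<field>' into its two parts."""
    measurement, _, field = metric.partition(".")
    if not measurement or not field:
        raise ValueError(f"metric must be <measurement>.<field>, got {metric!r}")
    return measurement, field


def build_where_clause(resource_type):
    """Turn the resource_type tag pairs into a filter string (assumed format)."""
    return " AND ".join(f"{tag}='{value}'" for tag, value in resource_type.items())


def resolve_trigger(trigger):
    """Collect everything needed to create one alert from a trigger node."""
    condition = trigger["condition"]
    measurement, field = split_metric(trigger["metric"])
    return {
        "measurement": measurement,
        "field": field,
        "aggregation": condition["aggregation_method"],
        "operator": COMPARISON_OPERATORS[condition["comparison_operator"]],
        "threshold": condition["threshold"],
        "period_seconds": condition["granularity"],
        "where": build_where_clause(condition.get("resource_type", {})),
        "handlers": list(trigger["action"]["implementation"]),
    }


low_requests = {
    "event_type": "threshold",
    "metric": "storage.requests",
    "condition": {
        "threshold": 5,
        "granularity": 60,
        "aggregation_method": "last",
        "resource_type": {"sf_package": "storage", "sf": "storage-users"},
        "comparison_operator": "lt",
    },
    "action": {"implementation": ["http://sfemc.flame.eu/notify"]},
}
resolved = resolve_trigger(low_requests)
```

An unknown `comparison_operator` raises a `KeyError` here, which a real service would presumably turn into a validation error for the submitter.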
<!--
// © University of Southampton IT Innovation Centre, 2018
//
// Copyright in this software belongs to University of Southampton
// IT Innovation Centre of Gamma House, Enterprise Road,
// Chilworth Science Park, Southampton, SO16 7NS, UK.
//
// This software may not be used, sold, licensed, transferred, copied
// or reproduced in whole or in part in any manner or form or in or
// on any media by any person other than in accordance with the terms
// of the Licence Agreement supplied with the software, or otherwise
// without the prior written consent of the copyright owners.
//
// This software is distributed WITHOUT ANY WARRANTY, without even the
// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE, except where stated in the Licence Agreement supplied with
// the software.
//
// Created By : Nikolay Stanchev
// Created Date : 09-08-2018
// Created for Project : FLAME
-->
# **FLAME - Integration of alerts, topics and handlers**
#### **Authors**
|Authors|Organisation|
|:---:|:---:|
|[Nikolay Stanchev](mailto:ns17@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
#### Description
This document outlines an internal proposal for the implementation of a CLMC Notification API - in relation to Kapacitor's alerts,
topics and handlers.
#### Terminology
1) Alert (a.k.a. Task) - some work for Kapacitor to do periodically. Essentially, this is a query that Kapacitor
executes and whose result it checks. If the result matches a given condition, an alert is fired.
2) Handler (a.k.a. Event Handler or Alert Handler) - a piece of software responsible for handling triggered alerts. Currently,
we are only considering the HTTP POST handler: an HTTP server (or simply a socket) that listens on a given URL for POST
messages.
3) Topic (a.k.a. Event Name) - a namespace to which an alert publishes data and to which a handler subscribes to receive alert data. Topics are used
to decouple Alerts from Handlers and are created on demand: whether an alert publishes to a non-existing topic or a handler
subscribes to one, Kapacitor automatically creates the topic.
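
The decoupling role of topics can be illustrated with a toy in-memory model. This is not Kapacitor code and does not use any Kapacitor API; the class `TopicRegistry` and its methods are hypothetical and only mimic the on-demand topic creation described above.

```python
from collections import defaultdict


class TopicRegistry:
    """Toy model: alerts publish to topics, handlers subscribe to topics."""

    def __init__(self):
        # Looking up an unknown topic creates it, mirroring on-demand creation.
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every event published to the topic."""
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver an event to all subscribed handlers; return how many received it."""
        handlers = self._handlers[topic]
        for handler in handlers:
            handler(event)
        return len(handlers)

    def topics(self):
        return sorted(self._handlers)


registry = TopicRegistry()
received = []

# A handler may subscribe before any alert has published to the topic...
registry.subscribe("storage_sf_delay_exceeded", received.append)
# ...and an alert may publish to a topic that has no handler yet.
registry.publish("high_latency", {"message": "TRUE"})
registry.publish("storage_sf_delay_exceeded", {"message": "TRUE"})
```

Neither side needs to know about the other: the alert only names a topic, and handlers can be attached or removed without touching the alert definition.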
#### Proposal
After an extensive analysis of Kapacitor, I suggest that we manage **alerts** using task templates with
placeholders for MSP-specific values. This is feasible because, as we found out from the CLMC information model analysis,
everything apart from the ***ipendpoint*** identifier is already described in TOSCA by an MSP. Here is an example of what a simple
task template might look like:
```tickscript
// Alert template ID - threshold_exceeded

var db string

var rp = 'autogen' // default value for the retention policy

var measurement string

var field string

var whereCondition = 'TRUE' // default value is TRUE, hence no filtering of the query result

var messageValue = 'TRUE' // default value is TRUE, as this is what SFEMC expects as a notification for an event rule

var criticalValue float

var alertPeriod = 60s // this value is read from TOSCA and is measured in seconds, default value is 60 seconds

var topicID string

batch
    |query('SELECT mean(' + field + ') AS mean_value FROM "' + db + '"."' + rp + '"."' + measurement + '" WHERE ' + whereCondition)
        .period(alertPeriod)
        .every(alertPeriod)
    |alert()
        .crit(lambda: "mean_value" >= criticalValue)
        .message(messageValue)
        .topic(topicID)
```
And here is an example of what a configuration for the template above might look like:
```json
{
    "db": {"type": "string", "value": "CLMCMetrics"},
    "rp": {"type": "string", "value": "autogen"},
    "measurement": {"type": "string", "value": "storage_sf_measurement"},
    "field": {"type": "string", "value": "service_delay"},
    "criticalValue": {"type": "float", "value": 10.0},
    "topicID": {"type": "string", "value": "storage_sf_delay_exceeded"},
    "whereCondition": {"type": "string", "value": "sf_package='storage' and sf='storage-users'"}
}
```
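Such a configuration object could be generated rather than written by hand. The sketch below assumes a hypothetical helper, `build_template_vars`, whose name and parameters are illustrative only; it reproduces the typed-value layout of the example above and nothing more.

```python
import json


def build_template_vars(db, measurement, field, critical_value, topic_id,
                        where_condition="TRUE", rp="autogen"):
    """Build the typed variable mapping expected by the threshold_exceeded template."""
    return {
        "db": {"type": "string", "value": db},
        "rp": {"type": "string", "value": rp},
        "measurement": {"type": "string", "value": measurement},
        "field": {"type": "string", "value": field},
        # The template declares criticalValue as a float, so coerce integers.
        "criticalValue": {"type": "float", "value": float(critical_value)},
        "topicID": {"type": "string", "value": topic_id},
        "whereCondition": {"type": "string", "value": where_condition},
    }


template_vars = build_template_vars(
    db="CLMCMetrics",
    measurement="storage_sf_measurement",
    field="service_delay",
    critical_value=10,
    topic_id="storage_sf_delay_exceeded",
    where_condition="sf_package='storage' and sf='storage-users'",
)
print(json.dumps(template_vars, indent=4))
```

The defaults for `rp` and `where_condition` mirror the defaults declared in the template, so a caller only has to supply the values that the template leaves without a default.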
Alert configurations are received and managed by the CLMC service, with Kapacitor running in the background. Therefore, the CLMC
service must provide an API endpoint for receiving this data, e.g. /alerts/configuration. As a starting point, we might want to
provide templates for the three types offered when building alerts in Chronograf - *threshold*, *relative* and *deadman*.
These would then be adjusted/extended based on common use cases from the experiments. An example request to configure alerts might
look like this:
**HTTP POST Request**
**Request URL** - ***http://clmc.flame.eu/alerts/configuration***
**Request Body** - must contain a list of configuration objects (similar to the one defined above), each paired with the type of
template to use:
```json
[
    {
        "type": "threshold_exceeded",
        "configuration": {
            "db": {"type": "string", "value": "CLMCMetrics"},
            "rp": {"type": "string", "value": "autogen"},
            "measurement": {"type": "string", "value": "storage_sf_measurement"},
            "field": {"type": "string", "value": "service_delay"},
            "criticalValue": {"type": "float", "value": 10.0},
            "topicID": {"type": "string", "value": "storage_sf_delay_exceeded"},
            "whereCondition": {"type": "string", "value": "sf_package='storage' and sf='storage-users'"}
        }
    }
]
```
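A client could assemble this request with the Python standard library alone. The sketch below only builds the request object and does not send it; the helper name `build_alerts_request` is hypothetical, and the `Content-Type` header is an assumption, since the proposal does not specify request headers.

```python
import json
import urllib.request

CLMC_ALERTS_URL = "http://clmc.flame.eu/alerts/configuration"


def build_alerts_request(configurations, url=CLMC_ALERTS_URL):
    """Wrap a list of {'type', 'configuration'} objects in an HTTP POST request."""
    body = json.dumps(configurations).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


configurations = [
    {
        "type": "threshold_exceeded",
        "configuration": {
            "db": {"type": "string", "value": "CLMCMetrics"},
            "measurement": {"type": "string", "value": "storage_sf_measurement"},
            "field": {"type": "string", "value": "service_delay"},
            "criticalValue": {"type": "float", "value": 10.0},
            "topicID": {"type": "string", "value": "storage_sf_delay_exceeded"},
        },
    }
]
request = build_alerts_request(configurations)
# Sending would be: urllib.request.urlopen(request), once the endpoint exists.
```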
The alerts configuration must be sent along with the TOSCA template so that *topic* validation can be performed, i.e. checking that there
is an alert publishing data to each TOSCA-specific event, and so that the period value for each event can be read from the template (this point is open to discussion).
For handling alerts defined with the aforementioned methodology, and following the discussion with IDE about what they prefer,
we would automatically subscribe a well-known SFEMC endpoint handler, which will receive HTTP POST messages when alerts trigger,
to every TOSCA-specific notification event.
In case we decide to also allow an MSP to provide MSP-specific alert handlers, the CLMC service will expose an
additional API endpoint - /alerts/configuration/\<topicID\>/handlers - which will allow an MSP to subscribe a
webhook handler to the given topic when the MSP wishes to receive alert notifications, too.
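
As a rough sketch of how an MSP might call the handlers endpoint, the snippet below builds (but does not send) a subscription request. Only the URL path comes from this proposal; the helper name `build_handler_subscription`, the use of POST, the JSON body with a `url` key and the `Content-Type` header are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_handler_subscription(topic_id, handler_url, base_url="http://clmc.flame.eu"):
    """Build a request that subscribes a webhook handler to the given topic."""
    # Escape the topic ID so it is safe to embed as a single path segment.
    path = "/alerts/configuration/" + urllib.parse.quote(topic_id, safe="") + "/handlers"
    body = json.dumps({"url": handler_url}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",  # assumed method
    )


subscription = build_handler_subscription(
    "storage_sf_delay_exceeded",
    "http://companyA.alert-handler.flame.eu/low-requests",
)
```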
#### Sequence diagram of the interactions between SFEMC, CLMC and MSP
![sequence_diagram](image/CLMC-Notifications-SequenceD-v4.png)