Alerts configuration initial documentation

Commit 21d62515 authored 6 years ago by Nikolay Stanchev
--- a/docs/AlertsConfiguration.md
+++ b/docs/AlertsConfiguration.md
+<!--
+// © University of Southampton IT Innovation Centre, 2018
+//
+// Copyright in this software belongs to University of Southampton
+// IT Innovation Centre of Gamma House, Enterprise Road, 
+// Chilworth Science Park, Southampton, SO16 7NS, UK.
+//
+// This software may not be used, sold, licensed, transferred, copied
+// or reproduced in whole or in part in any manner or form or in or
+// on any media by any person other than in accordance with the terms
+// of the Licence Agreement supplied with the software, or otherwise
+// without the prior written consent of the copyright owners.
+//
+// This software is distributed WITHOUT ANY WARRANTY, without even the
+// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
+// PURPOSE, except where stated in the Licence Agreement supplied with
+// the software.
+//
+//      Created By :            Nikolay Stanchev
+//      Created Date :          15-08-2018
+//      Created for Project :   FLAME
+-->
+
+# **FLAME - Integration of alerts, topics and handlers**
+
+#### **Authors**
+
+|Authors|Organisation|                    
+|:---:|:---:|  
+|[Nikolay Stanchev](mailto:ns17@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
+
+
+#### Description
+
+This document outlines the configuration of alerts within CLMC. Alerts are configured through a YAML-based,
+TOSCA-compliant document. The configuration document is passed to the CLMC service, which parses and validates it. The CLMC service
+then creates and activates the alerts within Kapacitor and registers the HTTP alert handlers specified in the document.
+
+
+#### TOSCA Alerts Configuration Document
+
+The TOSCA Alerts Configuration Document consists of two main sections: **metadata** and **triggers**. Full definitions and
+clarification of the structure of the document are given in the following sections. An example alerts configuration
+document looks like this:
+
+```yaml
+metadata:
+    sfc: companyA-VR
+    sfci: companyA-VR-premium
+triggers:
+    high_latency:
+      description: This event triggers when the mean network latency in a given location exceeds a given threshold (in ms).
+      event_type: threshold
+      metric: network.latency
+      condition:
+        threshold: 45
+        granularity: 120
+        aggregation_method: mean
+        resource_type:
+          location: watershed
+        comparison_operator: gt
+      action:
+        implementation:
+          - http://sfemc.flame.eu/notify
+          - http://companyA.alert-handler.flame.eu/high-latency
+    low_requests:
+      description: |
+        This event triggers when the last reported number of requests for a given service function
+        falls below a given threshold.
+      event_type: threshold
+      metric: storage.requests
+      condition:
+        threshold: 5
+        granularity: 60
+        aggregation_method: last
+        resource_type:
+          sf_package: storage
+          sf: storage-users
+          location: watershed  
+        comparison_operator: lt
+      action:
+        implementation:
+          - http://sfemc.flame.eu/notify
+          - http://companyA.alert-handler.flame.eu/low-requests
+```
+
+
+##### Metadata
+
+The ***metadata*** section specifies the service function chain ID and the service function chain instance ID to which this
+alerts configuration relates. The format is the following:
+
+```yaml
+metadata:
+    sfc: <sfc_id>
+    sfci: <sfc_i_id>
+```
+
+##### Triggers
+
+The ***triggers*** section defines a collection of trigger-type nodes, each keyed by its event identifier and representing a
+fully qualified configuration for an alert within CLMC. The format is the following:
+
+```yaml
+triggers:
+    <event identifier>:
+      description: <optional description for the given event trigger>
+      event_type: <threshold | relative | deadman>
+      metric: <measurement>.<field>
+      condition:
+        threshold: <critical value>
+        granularity: <period in seconds - how often to check whether the event condition is true>
+        aggregation_method: <aggregation function supported by InfluxDB - e.g. 'mean'>
+        resource_type:
+          <CLMC Information Model Tag Name>: <CLMC Information Model Tag Value>
+          <CLMC Information Model Tag Name>: <CLMC Information Model Tag Value>
+          ...
+        comparison_operator: <logical operator to use for comparison, e.g. 'gt', 'lt'>
+      action:
+        implementation:
+          - <HTTP Alert Handler URL>
+          - <HTTP Alert Handler URL>
+          ...
+    ...
+```
+
+##### Definitions
+
+* **event_identifier** - the name of the event, which **MUST** match the *constraint* event name referenced in the TOSCA resource
+specification document submitted to the FLAME Orchestrator.
+
+* **event_type** - the type of TICKscript template to use to create the alert. More information about the different
+options will be provided here; we assume the most common one will be **threshold**.
+
+* **metric** - the metric to query in InfluxDB; it must include the measurement name and the field name in the
+format `<measurement>.<field>`.
+
+* **threshold** - when using the **threshold** event type, this is the critical value the actual metric is compared to.
+
+* **granularity** - the period in seconds, which instructs Kapacitor how often to query InfluxDB and check whether the
+event condition is true.
+
+* **aggregation_method** - the function to use when querying InfluxDB
+
+* **resource_type** - provides context for the given event - key-value pairs for the global tags of the 
+CLMC Information Model
+
+* **comparison_operator** - the logical operator to use for comparison - less than, greater than, less than or equal to, etc.
+
+* **implementation** - a list of the URLs of alert handlers to which alert data is sent when the event condition is true.
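
As a rough illustration of the definitions above, the following Python sketch checks a single trigger node that has already been parsed from YAML into a dictionary. It is not taken from the CLMC service: the helper names (`split_metric`, `where_clause`, `check_trigger`) are hypothetical, only the `gt` and `lt` operator identifiers shown in the document are accepted, splitting the metric on the first dot is an assumption, and the filter string simply mirrors the `whereCondition` style used in the earlier proposal.

```python
# Hypothetical helpers for illustration only; not part of the CLMC code base.
EVENT_TYPES = {"threshold", "relative", "deadman"}  # event types listed in the trigger format
COMPARISON_OPERATORS = {"gt": ">", "lt": "<"}       # only the identifiers shown in the document


def split_metric(metric):
    """Split '<measurement>.<field>' on the first dot (an assumption)."""
    measurement, separator, field = metric.partition(".")
    if not separator or not measurement or not field:
        raise ValueError(f"metric must have the form <measurement>.<field>: {metric!r}")
    return measurement, field


def where_clause(resource_type):
    """Join tag key-value pairs into an InfluxQL-style filter string."""
    return " AND ".join(f"\"{tag}\"='{value}'" for tag, value in resource_type.items())


def check_trigger(name, trigger):
    """Return a list of problems found in one trigger node (empty if none)."""
    problems = []
    if trigger.get("event_type") not in EVENT_TYPES:
        problems.append(f"{name}: unknown event_type {trigger.get('event_type')!r}")
    try:
        split_metric(trigger.get("metric", ""))
    except ValueError as error:
        problems.append(f"{name}: {error}")
    condition = trigger.get("condition", {})
    if condition.get("comparison_operator") not in COMPARISON_OPERATORS:
        problems.append(f"{name}: unsupported comparison_operator")
    granularity = condition.get("granularity")
    if not isinstance(granularity, int) or granularity <= 0:
        problems.append(f"{name}: granularity must be a positive number of seconds")
    return problems


# The 'high_latency' trigger from the example document, as a parsed dictionary.
high_latency = {
    "event_type": "threshold",
    "metric": "network.latency",
    "condition": {
        "threshold": 45,
        "granularity": 120,
        "aggregation_method": "mean",
        "resource_type": {"location": "watershed"},
        "comparison_operator": "gt",
    },
    "action": {"implementation": ["http://sfemc.flame.eu/notify"]},
}

print(check_trigger("high_latency", high_latency))
print(split_metric(high_latency["metric"]))
print(where_clause(high_latency["condition"]["resource_type"]))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a trigger at once; whether the CLMC service reports validation errors this way is not stated in the document.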
--- a/docs/NotificationAPI-proposal.md
+++ b/docs/NotificationAPI-proposal.md
-<!--
-// © University of Southampton IT Innovation Centre, 2018
-//
-// Copyright in this software belongs to University of Southampton
-// IT Innovation Centre of Gamma House, Enterprise Road, 
-// Chilworth Science Park, Southampton, SO16 7NS, UK.
-//
-// This software may not be used, sold, licensed, transferred, copied
-// or reproduced in whole or in part in any manner or form or in or
-// on any media by any person other than in accordance with the terms
-// of the Licence Agreement supplied with the software, or otherwise
-// without the prior written consent of the copyright owners.
-//
-// This software is distributed WITHOUT ANY WARRANTY, without even the
-// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
-// PURPOSE, except where stated in the Licence Agreement supplied with
-// the software.
-//
-//      Created By :            Nikolay Stanchev
-//      Created Date :          09-08-2018
-//      Created for Project :   FLAME
-->
-
-# **FLAME - Integration of alerts, topics and handlers**
-
-#### **Authors**
-
-|Authors|Organisation|                    
-|:---:|:---:|  
-|[Nikolay Stanchev](mailto:ns17@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
-
-
-#### Description
-
-This document outlines an internal proposal for the implementation of a CLMC Notification API - in relation to Kapacitor's alerts,
-topics and handlers.
-
-#### Terminology
-
-1) Alert (a.k.a Task) - some work for Kapacitor to do periodically over time. Essentially, this is a query for Kapacitor to
-execute and check the result of. If the result matches a given condition, an alert has to be fired.
-
-2) Handler (a.k.a Event Handler or Alert Handler) - a software, which is responsible for handling triggered alerts. Currently,
-we are only considering the HTTP Post handler - a HTTP server (or simply a socket) that listens on a given url for POST
-messages.
-
-3) Topic (a.k.a. Event Name) - a namespace to which an alert publishes data and from which a handler subscribes for alert data. Topics are used
-to decouple Alerts from Handlers and are created on demand - an alert that publishes to non-existing topic will cause Kapacitor
-to automatically create the topic and a handler which subscribes to a non-existing topic will cause Kapacitor to automatically
-create the topic.
-
-#### Proposal
-
-After doing some extensive analysis on Kapacitor, I suggest that for managing **alerts**, we use task templates with
-placeholders for MSP-specific values. This is feasible because as we found out from the CLMC infomation model analysis
-everything apart from the ***ipendpoint*** identifier is already described in TOSCA by a MSP. Here is an example of what a simple
-task template might look like:
-
-```tickscript
-// Alert template ID - threshold_exceeded
-
-var db string
-
-var rp = 'autogen'  // default value for the retention policy
-
-var measurement string
-
-var field string
-
-var whereCondition = 'TRUE'  // default value is TRUE, hence no filtering of the query result
-
-var messageValue = 'TRUE'  // default value is TRUE, as this is what SFEMC expects as a notification for an event rule
-
-var criticalValue float
-
-var alertPeriod = 60s  // this value is read from TOSCA and is measured in seconds, default value is 60 seconds
-
-var topicID string
-
-batch
-    |query('SELECT mean(' + field + ') AS mean_value FROM "' + db + '"."' + rp + '"."' + measurement + '" WHERE ' + whereCondition)
-        .period(alertPeriod)
-        .every(alertPeriod)
-    |alert()
-        .crit(lambda: "mean_value" >= criticalValue)
-        .message(messageValue)
-        .topic(topicID)
-```
-
-And here is an example of what a configuration for the template above might look like:
-
-```json
-{
-  "db": {"type": "string", "value": "CLMCMetrics"},
-  "rp": {"type": "string", "value": "autogen"},
-  "measurement": {"type": "string", "value": "storage_sf_measurement"},
-  "field": {"type": "string", "value": "service_delay"},
-  "criticalValue": {"type": "float", "value": 10.0},
-  "topicID": {"type": "string", "value": "storage_sf_delay_exceeded"},
-  "whereCondition": {"type": "string", "value": "sf_package='storage' and sf='storage-users'"}
-}
-```
-
-Alerts configurations are received and managed by the CLMC service with Kapacitor on the background. Therefore, the CLMC
-service must provide an API endpoint for receiving this data, e.g. /alerts/configuration. As a starting point we might want to
-provide templates for the three types that are given when building alerts in Chronograf - *threshold*, *relative* and *deadman*. 
-These would then be adjusted/extended based on common use cases from the experiments. An example request to configure alerts might
-look like this:
-
-**HTTP POST Request**  
-**Request URL** - ***http://clmc.flame.eu/alerts/configuration***  
-**Request Body** - must contain a list of configuration objects (similar to the one defined above) along with the type of
-template to use - 
-```json
-[
-  {
-    "type": "threshold_exceeded",
-    "configuration": {
-      "db": {"type": "string", "value": "CLMCMetrics"},
-      "rp": {"type": "string", "value": "autogen"},
-      "measurement": {"type": "string", "value": "storage_sf_measurement"},
-      "field": {"type": "string", "value": "service_delay"},
-      "criticalValue": {"type": "float", "value": 10.0},
-      "topicID": {"type": "string", "value": "storage_sf_delay_exceeded"},
-      "whereCondition": {"type": "string", "value": "sf_package='storage' and sf='storage-users'"}
-    }
-  }
-]
-```
-
-The alerts configuration must be sent along with the TOSCA template so that a *topic* validation is made of whether there
-is an alert publishing data to the TOSCA-specific events and to also read the period value for each event (worth arguing on this).
-
-For handling alerts defined with the aforementioned methodology, following from discussion with IDE and what they prefer,
-we would automatically subscribe a well-known SFEMC endpoint handler, which will receive HTTP POST messages when alerts trigger,
-to every TOSCA-specific notification event.
-
-In case we decide to also allow MSP to provide MSP-specific alert handlers, the CLMC service will expose an 
-additional API endpoint - /alerts/configuration/\<topicID\>/handlers which will allow a MSP to subscribe with a
-webhook handler to the given topic in a situation where the MSP wishes to receive alert notifications, too.
-
-#### Sequence diagram of the interactions between SFEMC, CLMC and MSP
-
-![sequence_diagram](image/CLMC-Notifications-SequenceD-v4.png)
\ No newline at end of file
--- a/docs/image/CLMC-Notifications-SequenceD-v4.png
+++ b/docs/image/CLMC-Notifications-SequenceD-v4.png