diff --git a/docs/AlertsConfiguration.md b/docs/AlertsConfiguration.md new file mode 100644 index 0000000000000000000000000000000000000000..e6cb341af80a601619481163b9f6f616845ba866 --- /dev/null +++ b/docs/AlertsConfiguration.md @@ -0,0 +1,150 @@ +<!-- +// © University of Southampton IT Innovation Centre, 2018 +// +// Copyright in this software belongs to University of Southampton +// IT Innovation Centre of Gamma House, Enterprise Road, +// Chilworth Science Park, Southampton, SO16 7NS, UK. +// +// This software may not be used, sold, licensed, transferred, copied +// or reproduced in whole or in part in any manner or form or in or +// on any media by any person other than in accordance with the terms +// of the Licence Agreement supplied with the software, or otherwise +// without the prior written consent of the copyright owners. +// +// This software is distributed WITHOUT ANY WARRANTY, without even the +// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR +// PURPOSE, except where stated in the Licence Agreement supplied with +// the software. +// +// Created By : Nikolay Stanchev +// Created Date : 15-08-2018 +// Created for Project : FLAME +--> + +# **FLAME - Integration of alerts, topics and handlers** + +#### **Authors** + +|Authors|Organisation| +|:---:|:---:| +|[Nikolay Stanchev](mailto:ns17@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)| + + +#### Description + +This document outlines the configuration of alerts within CLMC. Alerts are configured through a YAML-based +TOSCA-compliant document. This document is passed to the CLMC service, which parses and validates the document. Subsequently, the CLMC service +creates and activates the alerts within Kapacitor, then registers the HTTP alert handlers specified in the document. + + +#### TOSCA Alerts Configuration Document + +The TOSCA Alerts Configuration Document consists of two main sections - **metadata** and **triggers**. 
Full definitions and +clarification of the structure of the document are given in the following sections. An example of an alert configuration +document looks like this: + +```yaml +metadata: + sfc: companyA-VR + sfci: companyA-VR-premium +triggers: + high_latency: + description: This event triggers when the mean network latency in a given location exceeds a given threshold (in ms). + event_type: threshold + metric: network.latency + condition: + threshold: 45 + granularity: 120 + aggregation_method: mean + resource_type: + location: watershed + comparison_operator: gt + action: + implementation: + - http://sfemc.flame.eu/notify + - http://companyA.alert-handler.flame.eu/high-latency + low_requests: + description: | + This event triggers when the last reported number of requests for a given service function + falls below a given threshold. + event_type: threshold + metric: storage.requests + condition: + threshold: 5 + granularity: 60 + aggregation_method: last + resource_type: + sf_package: storage + sf: storage-users + location: watershed + comparison_operator: lt + action: + implementation: + - http://sfemc.flame.eu/notify + - http://companyA.alert-handler.flame.eu/low-requests +``` + + +##### Metadata + +The ***metadata*** section specifies the service function chain ID and the service function chain instance ID to which this +alerts configuration relates. The format is the following: + +```yaml +metadata: + sfc: <sfc_id> + sfci: <sfc_i_id> +``` + +##### Triggers + +The ***triggers*** section defines a sequence of trigger-type nodes, each representing a fully qualified configuration for an +alert within CLMC. 
The format is the following: + +```yaml +triggers: + <event identifier>: + description: <optional description for the given event trigger> + event_type: <threshold | relative | deadman> + metric: <measurement>.<field> + condition: + threshold: <critical value> + granularity: <period in seconds - how often to check whether the event condition is true> + aggregation_method: <aggregation function supported by InfluxDB - e.g. 'mean'> + resource_type: + <CLMC Information Model Tag Name>: <CLMC Information Model Tag Value> + <CLMC Information Model Tag Name>: <CLMC Information Model Tag Value> + ... + comparison_operator: <comparison operator to use, e.g. 'gt', 'lt'> + action: + implementation: + - <HTTP Alert Handler URL> + - <HTTP Alert Handler URL> + ... + ... +``` + +##### Definitions + +* **event_identifier** - the name of the event, which **MUST** match the *constraint* event name referenced in the TOSCA resource +specification document submitted to the FLAME Orchestrator. + +* **event_type** - the type of TICKscript template to use to create the alert - more information will be provided about +the different options here, but we assume the most common one will be **threshold**. + +* **metric** - the metric to query in InfluxDB; it must include the measurement name and the field name in the +format `<measurement>.<field>`. + +* **threshold** - when using the **threshold** event type, this is the critical value the actual metric is compared to. + +* **granularity** - the period in seconds, which instructs Kapacitor how often to query InfluxDB and check whether the +event condition is true. + +* **aggregation_method** - the aggregation function to use when querying InfluxDB, e.g. `mean` or `last`. + +* **resource_type** - provides context for the given event - key-value pairs for the global tags of the +CLMC Information Model. + +* **comparison_operator** - the comparison operator to use - less than, greater than, less than or equal to, etc. 
+ +* **implementation** - a list of the URLs of alert handlers to which alert data is sent when the event condition is true. diff --git a/docs/NotificationAPI-proposal.md b/docs/NotificationAPI-proposal.md deleted file mode 100644 index c3c908dc31f66bf0a076059103f93cdb18adeab9..0000000000000000000000000000000000000000 --- a/docs/NotificationAPI-proposal.md +++ /dev/null @@ -1,144 +0,0 @@ -<!-- -// © University of Southampton IT Innovation Centre, 2018 -// -// Copyright in this software belongs to University of Southampton -// IT Innovation Centre of Gamma House, Enterprise Road, -// Chilworth Science Park, Southampton, SO16 7NS, UK. -// -// This software may not be used, sold, licensed, transferred, copied -// or reproduced in whole or in part in any manner or form or in or -// on any media by any person other than in accordance with the terms -// of the Licence Agreement supplied with the software, or otherwise -// without the prior written consent of the copyright owners. -// -// This software is distributed WITHOUT ANY WARRANTY, without even the -// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR -// PURPOSE, except where stated in the Licence Agreement supplied with -// the software. -// -// Created By : Nikolay Stanchev -// Created Date : 09-08-2018 -// Created for Project : FLAME ---> - -# **FLAME - Integration of alerts, topics and handlers** - -#### **Authors** - -|Authors|Organisation| -|:---:|:---:| -|[Nikolay Stanchev](mailto:ns17@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)| - - -#### Description - -This document outlines an internal proposal for the implementation of a CLMC Notification API - in relation to Kapacitor's alerts, -topics and handlers. - -#### Terminology - -1) Alert (a.k.a Task) - some work for Kapacitor to do periodically over time. Essentially, this is a query for Kapacitor to -execute and check the result of. 
If the result matches a given condition, an alert has to be fired. - -2) Handler (a.k.a Event Handler or Alert Handler) - a software, which is responsible for handling triggered alerts. Currently, -we are only considering the HTTP Post handler - a HTTP server (or simply a socket) that listens on a given url for POST -messages. - -3) Topic (a.k.a. Event Name) - a namespace to which an alert publishes data and from which a handler subscribes for alert data. Topics are used -to decouple Alerts from Handlers and are created on demand - an alert that publishes to non-existing topic will cause Kapacitor -to automatically create the topic and a handler which subscribes to a non-existing topic will cause Kapacitor to automatically -create the topic. - -#### Proposal - -After doing some extensive analysis on Kapacitor, I suggest that for managing **alerts**, we use task templates with -placeholders for MSP-specific values. This is feasible because as we found out from the CLMC infomation model analysis -everything apart from the ***ipendpoint*** identifier is already described in TOSCA by a MSP. 
Here is an example of what a simple -task template might look like: - -```tickscript -// Alert template ID - threshold_exceeded - -var db string - -var rp = 'autogen' // default value for the retention policy - -var measurement string - -var field string - -var whereCondition = 'TRUE' // default value is TRUE, hence no filtering of the query result - -var messageValue = 'TRUE' // default value is TRUE, as this is what SFEMC expects as a notification for an event rule - -var criticalValue float - -var alertPeriod = 60s // this value is read from TOSCA and is measured in seconds, default value is 60 seconds - -var topicID string - -batch - |query('SELECT mean(' + field + ') AS mean_value FROM "' + db + '"."' + rp + '"."' + measurement + '" WHERE ' + whereCondition) - .period(alertPeriod) - .every(alertPeriod) - |alert() - .crit(lambda: "mean_value" >= criticalValue) - .message(messageValue) - .topic(topicID) -``` - -And here is an example of what a configuration for the template above might look like: - -```json -{ - "db": {"type": "string", "value": "CLMCMetrics"}, - "rp": {"type": "string", "value": "autogen"}, - "measurement": {"type": "string", "value": "storage_sf_measurement"}, - "field": {"type": "string", "value": "service_delay"}, - "criticalValue": {"type": "float", "value": 10.0}, - "topicID": {"type": "string", "value": "storage_sf_delay_exceeded"}, - "whereCondition": {"type": "string", "value": "sf_package='storage' and sf='storage-users'"} -} -``` - -Alerts configurations are received and managed by the CLMC service with Kapacitor on the background. Therefore, the CLMC -service must provide an API endpoint for receiving this data, e.g. /alerts/configuration. As a starting point we might want to -provide templates for the three types that are given when building alerts in Chronograf - *threshold*, *relative* and *deadman*. -These would then be adjusted/extended based on common use cases from the experiments. 
An example request to configure alerts might -look like this: - -**HTTP POST Request** -**Request URL** - ***http://clmc.flame.eu/alerts/configuration*** -**Request Body** - must contain a list of configuration objects (similar to the one defined above) along with the type of -template to use - -```json -[ - { - "type": "threshold_exceeded", - "configuration": { - "db": {"type": "string", "value": "CLMCMetrics"}, - "rp": {"type": "string", "value": "autogen"}, - "measurement": {"type": "string", "value": "storage_sf_measurement"}, - "field": {"type": "string", "value": "service_delay"}, - "criticalValue": {"type": "float", "value": 10.0}, - "topicID": {"type": "string", "value": "storage_sf_delay_exceeded"}, - "whereCondition": {"type": "string", "value": "sf_package='storage' and sf='storage-users'"} - } - } -] -``` - -The alerts configuration must be sent along with the TOSCA template so that a *topic* validation is made of whether there -is an alert publishing data to the TOSCA-specific events and to also read the period value for each event (worth arguing on this). - -For handling alerts defined with the aforementioned methodology, following from discussion with IDE and what they prefer, -we would automatically subscribe a well-known SFEMC endpoint handler, which will receive HTTP POST messages when alerts trigger, -to every TOSCA-specific notification event. - -In case we decide to also allow MSP to provide MSP-specific alert handlers, the CLMC service will expose an -additional API endpoint - /alerts/configuration/\<topicID\>/handlers which will allow a MSP to subscribe with a -webhook handler to the given topic in a situation where the MSP wishes to receive alert notifications, too. 
- -#### Sequence diagram of the interactions between SFEMC, CLMC and MSP - - \ No newline at end of file diff --git a/docs/image/CLMC-Notifications-SequenceD-v4.png b/docs/image/CLMC-Notifications-SequenceD-v4.png deleted file mode 100644 index 686e2e9cc4fc76432543acb0aa1f53faae2a71a6..0000000000000000000000000000000000000000 Binary files a/docs/image/CLMC-Notifications-SequenceD-v4.png and /dev/null differ