Commit f7413af8 authored by Stephen Phillips's avatar Stephen Phillips
Merge branch '103-add-user-documentation-for-the-end-to-end-delay' into 'integration'

Resolve "Add user documentation for the end to end delay"

Closes #103

See merge request FLAME/consortium/3rdparties/flame-clmc!71
<!--
// © University of Southampton IT Innovation Centre, 2018
//
// Copyright in this software belongs to University of Southampton
// IT Innovation Centre of Gamma House, Enterprise Road,
// Chilworth Science Park, Southampton, SO16 7NS, UK.
//
// This software may not be used, sold, licensed, transferred, copied
// or reproduced in whole or in part in any manner or form or in or
// on any media by any person other than in accordance with the terms
// of the Licence Agreement supplied with the software, or otherwise
// without the prior written consent of the copyright owners.
//
// This software is distributed WITHOUT ANY WARRANTY, without even the
// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE, except where stated in the Licence Agreement supplied with
// the software.
//
// Created By : Nikolay Stanchev
// Created Date : 17-05-2019
// Created for Project : FLAME
-->
## CLMC - Graph-based measurements of service end-to-end delay
### Input requirements
CLMC offers API endpoints to build and query a layer-based graph data structure starting from the infrastructure network layer
up to the logical abstraction layer of a media service. This graph can then be further used to measure an aggregation of the end-to-end delay
from a particular user equipment to a given service function endpoint without putting additional load on the deployed services. For a detailed analysis
of the calculations CLMC performs to derive this metric, see the [documentation](https://gitlab.it-innovation.soton.ac.uk/FLAME/consortium/3rdparties/flame-clmc/blob/master/docs/total-service-request-delay.md),
particularly its [conclusions](https://gitlab.it-innovation.soton.ac.uk/FLAME/consortium/3rdparties/flame-clmc/blob/master/docs/total-service-request-delay.md#conclusion) section.
In order to use the API, three metrics must first be measured for each service function:
* **response_time** – how much time it takes for a service to process a request (seconds)
* **request_size** – the size of incoming requests for this service (bytes)
* **response_size** – the size of outgoing responses from this service (bytes)
An example is a Tomcat-based service monitored with the Tomcat Telegraf input plugin. The plugin reports the fields
**bytes_sent**, **bytes_received** and **processing_time** under the measurement name **tomcat_connector**.
* **processing_time** is the total time spent processing incoming requests since the server started, so it is a
continuously increasing value.
* **bytes_sent** and **bytes_received** are measured in the same cumulative way.
The graph monitoring process runs every X seconds, where X is configurable (e.g. 30 seconds). The media service provider
must define how to derive the aggregated value of each of the three fields above for this X-second window. For example,
if the media service provider decides to use **mean** values, the following definitions can be used for a Tomcat-based service:
* **response_time** - `(max(processing_time) - min(processing_time)) / ((count(processing_time) -1)*1000)`
* **request_size** - `(max(bytes_received) - min(bytes_received)) / (count(bytes_received) - 1)`
* **response_size** - `(max(bytes_sent) - min(bytes_sent)) / (count(bytes_sent) - 1)`
Simply put, since the Tomcat plugin reports these values as continuously increasing counters, we take the difference between
the maximum and the minimum value received in the time window and divide it by the number of intervals between measurements
(one less than the number of measurements), which gives us the average per-request value. The response time is additionally
divided by 1000 to convert milliseconds to seconds.
To demonstrate this, let's say that the measurements received in the time window for **processing_time** are 21439394, 21439399 and 21439406 milliseconds.
Therefore, the average processing time would be (21439406 - 21439394) / ((3 - 1) * 1000) = 0.006 seconds. The same procedure is followed
for the request size and response size fields.
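The worked example above can be reproduced with a short Python sketch. `windowed_mean` is a hypothetical helper, not part of CLMC; it applies the same `(max - min) / (count - 1)` formula to one window of samples from a cumulative counter:

```python
def windowed_mean(samples, unit_divisor=1):
    """Mean per-request value from a cumulative counter sampled over one window.

    `samples` are the raw counter readings received in the window;
    `unit_divisor` converts units (e.g. 1000 for milliseconds -> seconds).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples per window")
    # difference over the window, divided by the number of intervals between samples
    return (max(samples) - min(samples)) / ((len(samples) - 1) * unit_divisor)

# the processing_time readings from the example, in milliseconds
processing_time = [21439394, 21439399, 21439406]
print(windowed_mean(processing_time, unit_divisor=1000))  # 0.006 seconds
```

The same call with `unit_divisor=1` computes the **request_size** and **response_size** aggregations from `bytes_received` and `bytes_sent` samples.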
### Running a graph monitoring process
There is a dedicated endpoint which starts an automated graph monitoring script, running in the background on CLMC and
repeatedly executing a full processing pipeline: build the temporal graph, query it for end-to-end delay, write the results back
into InfluxDB, then delete the temporal graph. The pipeline uses the supplied configuration to periodically build the temporal graph,
query for the end-to-end delay from every possible UE to every deployed service function endpoint, and write the results into a dedicated measurement in the time-series database (InfluxDB).
For more information on the graph monitoring pipeline, see the [graph RTT slides](https://owncloud.it-innovation.soton.ac.uk/remote.php/webdav/Shared/FLAME/Project%20Reviews/2nd%20EC%20Review%20(technical)/drafts/WP4_FLAME_Graph_RTT.pptx).
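The pipeline loop can be pictured roughly as follows. This is a sketch only: the four step callables stand in for CLMC internals and are not part of any real API, and `max_cycles` is added here just to bound the demonstration (the real process runs until explicitly stopped):

```python
import time

def run_graph_pipeline(config, build_graph, query_delays, write_results, delete_graph,
                       max_cycles=None):
    """Repeat the build -> query -> write -> delete cycle every `query_period` seconds."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        graph = build_graph(config)          # temporal graph for the current window
        delays = query_delays(graph)         # end-to-end delay per UE/endpoint pair
        write_results(config["results_measurement_name"], delays)
        delete_graph(graph)                  # the temporal graph is discarded each cycle
        cycles += 1
        time.sleep(config["query_period"])   # wait for the next window
    return cycles
```

Each cycle is self-contained, which is why the temporal graph can be deleted after every query without losing results: the end-to-end delay values have already been written to InfluxDB.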
* `POST http://<clmc-host>/clmc/clmc-service/graph/monitor`
* Expected JSON body serving as the configuration of the graph monitoring script:
```json
{
"query_period": "<how often is the graph pipeline executed - defines the length of the time window mentioned above>",
"results_measurement_name": "<where to write the end-to-end delay measurements>",
"service_function_chain": "<SFC identifier>",
"service_function_chain_instance": "<SFC identifier>_1",
"service_functions": {
"<service function package>": {
"response_time_field": "<field measuring the service delay of a service function - as described above>",
"request_size_field": "<field measuring the request size of a service function - as described above>",
"response_size_field": "<field measuring the response size of a service function - as descirbed above>",
"measurement_name": "<the name of the measurement which contains the fields above>"
},
...
}
}
```
* Example request with curl:
`curl -X POST -d <JSON body> http://<clmc-host>/clmc/clmc-service/graph/monitor`
* Example JSON body for the tomcat-based service described above:
```json
{
"query_period": 30,
"results_measurement_name": "graph_measurements",
"service_function_chain": "fms-sfc",
"service_function_chain_instance": "fms-sfc_1",
"service_functions": {
"fms-storage": {
"response_time_field": "(max(processing_time) - min(processing_time)) / ((count(processing_time) -1)*1000)",
"request_size_field": "(max(bytes_received) - min(bytes_received)) / (count(bytes_received) - 1)",
"response_size_field": "(max(bytes_sent) - min(bytes_sent)) / (count(bytes_sent) - 1)",
"measurement_name": "tomcat_connector"
}
}
}
```
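The same request can also be assembled programmatically. A minimal sketch using only the Python standard library; the host name is a placeholder and `make_monitor_request` is a hypothetical helper, not part of the CLMC API:

```python
import json
import urllib.request

def make_monitor_request(clmc_host, config):
    """Build (but do not send) the POST request that starts graph monitoring."""
    url = f"http://{clmc_host}/clmc/clmc-service/graph/monitor"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")

# the example configuration for the Tomcat-based service
config = {
    "query_period": 30,
    "results_measurement_name": "graph_measurements",
    "service_function_chain": "fms-sfc",
    "service_function_chain_instance": "fms-sfc_1",
    "service_functions": {
        "fms-storage": {
            "response_time_field": "(max(processing_time) - min(processing_time)) / ((count(processing_time) -1)*1000)",
            "request_size_field": "(max(bytes_received) - min(bytes_received)) / (count(bytes_received) - 1)",
            "response_size_field": "(max(bytes_sent) - min(bytes_sent)) / (count(bytes_sent) - 1)",
            "measurement_name": "tomcat_connector"
        }
    }
}

request = make_monitor_request("clmc.example.org", config)
# urllib.request.urlopen(request) would submit it; the JSON response carries
# the process "uuid" and the target "database".
```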
An example response will look like this:
```json
{
"uuid": "75df6f8d-3829-4fd8-a3e6-b3e917010141",
"database": "fms-sfc"
}
```
The configuration described above starts a graph monitoring process that executes every 30 seconds and writes the end-to-end delay results
to the measurement named **graph_measurements** in database **fms-sfc**. To stop the graph monitoring process, use the UUID received in
the response to the previous request:
`curl -X DELETE http://<clmc-host>/clmc/clmc-service/graph/monitor/75df6f8d-3829-4fd8-a3e6-b3e917010141`
To view the status of the graph monitoring process, send the same request but with the GET method rather than DELETE:
`curl -X GET http://<clmc-host>/clmc/clmc-service/graph/monitor/75df6f8d-3829-4fd8-a3e6-b3e917010141`
Keep in mind that since this process only executes once per period, it is normal to see the status **sleeping** in the response.
Example response:
```json
{
"status": "sleeping",
"msg": "Successfully fetched status of graph pipeline process."
}
```