Commit f7413af8 authored by Stephen Phillips's avatar Stephen Phillips
Merge branch '103-add-user-documentation-for-the-end-to-end-delay' into 'integration'

Resolve "Add user documentation for the end to end delay"

Closes #103

See merge request FLAME/consortium/3rdparties/flame-clmc!71
<!--
// © University of Southampton IT Innovation Centre, 2018
//
// Copyright in this software belongs to University of Southampton
// IT Innovation Centre of Gamma House, Enterprise Road,
// Chilworth Science Park, Southampton, SO16 7NS, UK.
//
// This software may not be used, sold, licensed, transferred, copied
// or reproduced in whole or in part in any manner or form or in or
// on any media by any person other than in accordance with the terms
// of the Licence Agreement supplied with the software, or otherwise
// without the prior written consent of the copyright owners.
//
// This software is distributed WITHOUT ANY WARRANTY, without even the
// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE, except where stated in the Licence Agreement supplied with
// the software.
//
// Created By : Nikolay Stanchev
// Created Date : 17-05-2019
// Created for Project : FLAME
-->
## CLMC - Graph-based measurements of service end-to-end delay
### Input requirements
CLMC offers API endpoints to build and query a layer-based graph data structure starting from the infrastructure network layer
up to the logical abstraction layer of a media service. This graph can then be further used to measure an aggregation of the end-to-end delay
from a particular user equipment to a given service function endpoint without putting additional load on the deployed services. For a detailed analysis
of the calculations CLMC performs to derive this metric, see the [documentation](https://gitlab.it-innovation.soton.ac.uk/FLAME/consortium/3rdparties/flame-clmc/blob/master/docs/total-service-request-delay.md),
particularly its [conclusions](https://gitlab.it-innovation.soton.ac.uk/FLAME/consortium/3rdparties/flame-clmc/blob/master/docs/total-service-request-delay.md#conclusion) section.
In order to use the API, three metrics must first be measured for each service function:
* **response_time** – how much time it takes for a service to process a request (seconds)
* **request_size** – the size of incoming requests for this service (bytes)
* **response_size** – the size of outgoing responses from this service (bytes)
An example is a Tomcat-based service monitored with the Tomcat Telegraf input plugin. The plugin reports the fields
**bytes_sent**, **bytes_received** and **processing_time** under the measurement name **tomcat_connector**.
* **processing_time** is the total time spent processing incoming requests since the server started, so it is a
continuously increasing value.
* **bytes_sent** and **bytes_received** are measured in the same cumulative way.
The graph monitoring process runs every X seconds, where X is configurable (e.g. 30 seconds). The media service provider
must define how to derive the aggregated value of each of the three fields above for this X-second window. For example,
if the media service provider decides to use **mean** values, the following definitions can be used for a Tomcat-based service:
* **response_time** - `(max(processing_time) - min(processing_time)) / ((count(processing_time) -1)*1000)`
* **request_size** - `(max(bytes_received) - min(bytes_received)) / (count(bytes_received) - 1)`
* **response_size** - `(max(bytes_sent) - min(bytes_sent)) / (count(bytes_sent) - 1)`
Simply put, since the Tomcat plugin reports these values as continuously increasing counters, we take the difference between
the maximum and the minimum value received in the time window and divide it by the number of intervals between measurements
(one less than the number of measurements), which gives us the average per-request value. The response time is additionally
divided by 1000 to convert milliseconds to seconds.
To demonstrate this, let's say that the measurements received in the time window for **processing_time** are 21439394, 21439399 and 21439406 milliseconds.
Therefore, the average processing time would be (21439406 - 21439394) / ((3 - 1) * 1000) = 0.006 seconds. The same procedure is followed
for the request size and response size fields.
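The worked example above can be reproduced with a short Python sketch. `windowed_mean` is a hypothetical helper, not part of CLMC; it applies the same `(max - min) / (count - 1)` formula to one window of samples from a cumulative counter:

```python
def windowed_mean(samples, unit_divisor=1):
    """Mean per-request value from a cumulative counter sampled over one window.

    `samples` are the raw counter readings received in the window;
    `unit_divisor` converts units (e.g. 1000 for milliseconds -> seconds).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples per window")
    # difference over the window, divided by the number of intervals between samples
    return (max(samples) - min(samples)) / ((len(samples) - 1) * unit_divisor)

# the processing_time readings from the example, in milliseconds
processing_time = [21439394, 21439399, 21439406]
print(windowed_mean(processing_time, unit_divisor=1000))  # 0.006 seconds
```

The same call with `unit_divisor=1` computes the **request_size** and **response_size** aggregations from `bytes_received` and `bytes_sent` samples.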
### Running a graph monitoring process
There is a dedicated endpoint which starts an automated graph monitoring script, running in the background on CLMC and
repeatedly executing a full processing pipeline: build the temporal graph, query it for end-to-end delay, write the results back
into InfluxDB, then delete the temporal graph. The pipeline uses the supplied configuration to periodically build the temporal graph,
query for the end-to-end delay from every possible UE to every deployed service function endpoint, and write the results into a dedicated measurement in the time-series database (InfluxDB).
For more information on the graph monitoring pipeline, see the [graph RTT slides](https://owncloud.it-innovation.soton.ac.uk/remote.php/webdav/Shared/FLAME/Project%20Reviews/2nd%20EC%20Review%20(technical)/drafts/WP4_FLAME_Graph_RTT.pptx).
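The pipeline loop can be pictured roughly as follows. This is a sketch only: the four step callables stand in for CLMC internals and are not part of any real API, and `max_cycles` is added here just to bound the demonstration (the real process runs until explicitly stopped):

```python
import time

def run_graph_pipeline(config, build_graph, query_delays, write_results, delete_graph,
                       max_cycles=None):
    """Repeat the build -> query -> write -> delete cycle every `query_period` seconds."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        graph = build_graph(config)          # temporal graph for the current window
        delays = query_delays(graph)         # end-to-end delay per UE/endpoint pair
        write_results(config["results_measurement_name"], delays)
        delete_graph(graph)                  # the temporal graph is discarded each cycle
        cycles += 1
        time.sleep(config["query_period"])   # wait for the next window
    return cycles
```

Each cycle is self-contained, which is why the temporal graph can be deleted after every query without losing results: the end-to-end delay values have already been written to InfluxDB.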
* `POST http://<clmc-host>/clmc/clmc-service/graph/monitor`
* Expected JSON body serving as the configuration of the graph monitoring script:
```json
{
"query_period": "<how often is the graph pipeline executed - defines the length of the time window mentioned above>",
"results_measurement_name": "<where to write the end-to-end delay measurements>",
"service_function_chain": "<SFC identifier>",
"service_function_chain_instance": "<SFC identifier>_1",
"service_functions": {
"<service function package>": {
"response_time_field": "<field measuring the service delay of a service function - as described above>",
"request_size_field": "<field measuring the request size of a service function - as described above>",
"response_size_field": "<field measuring the response size of a service function - as descirbed above>",
"measurement_name": "<the name of the measurement which contains the fields above>"
},
...
}
}
```
* Example request with curl:
`curl -X POST -d <JSON body> http://<clmc-host>/clmc/clmc-service/graph/monitor`
* Example JSON body for the tomcat-based service described above:
```json
{
"query_period": 30,
"results_measurement_name": "graph_measurements",
"service_function_chain": "fms-sfc",
"service_function_chain_instance": "fms-sfc_1",
"service_functions": {
"fms-storage": {
"response_time_field": "(max(processing_time) - min(processing_time)) / ((count(processing_time) -1)*1000)",
"request_size_field": "(max(bytes_received) - min(bytes_received)) / (count(bytes_received) - 1)",
"response_size_field": "(max(bytes_sent) - min(bytes_sent)) / (count(bytes_sent) - 1)",
"measurement_name": "tomcat_connector"
}
}
}
```
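The same request can also be assembled programmatically. A minimal sketch using only the Python standard library; the host name is a placeholder and `make_monitor_request` is a hypothetical helper, not part of the CLMC API:

```python
import json
import urllib.request

def make_monitor_request(clmc_host, config):
    """Build (but do not send) the POST request that starts graph monitoring."""
    url = f"http://{clmc_host}/clmc/clmc-service/graph/monitor"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")

# the example configuration for the Tomcat-based service
config = {
    "query_period": 30,
    "results_measurement_name": "graph_measurements",
    "service_function_chain": "fms-sfc",
    "service_function_chain_instance": "fms-sfc_1",
    "service_functions": {
        "fms-storage": {
            "response_time_field": "(max(processing_time) - min(processing_time)) / ((count(processing_time) -1)*1000)",
            "request_size_field": "(max(bytes_received) - min(bytes_received)) / (count(bytes_received) - 1)",
            "response_size_field": "(max(bytes_sent) - min(bytes_sent)) / (count(bytes_sent) - 1)",
            "measurement_name": "tomcat_connector"
        }
    }
}

request = make_monitor_request("clmc.example.org", config)
# urllib.request.urlopen(request) would submit it; the JSON response carries
# the process "uuid" and the target "database".
```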
An example response will look like this:
```json
{
"uuid": "75df6f8d-3829-4fd8-a3e6-b3e917010141",
"database": "fms-sfc"
}
```
The configuration described above starts a graph monitoring process that executes every 30 seconds and writes the end-to-end delay results
to the measurement named **graph_measurements** in database **fms-sfc**. To stop the graph monitoring process, use the UUID received in
the response to the previous request:
`curl -X DELETE http://<clmc-host>/clmc/clmc-service/graph/monitor/75df6f8d-3829-4fd8-a3e6-b3e917010141`
To view the status of the graph monitoring process, send the same request but with the GET method rather than DELETE:
`curl -X GET http://<clmc-host>/clmc/clmc-service/graph/monitor/75df6f8d-3829-4fd8-a3e6-b3e917010141`
Keep in mind that since this process only executes once per period, it is normal to see the status **sleeping** in the response.
Example response:
```json
{
"status": "sleeping",
"msg": "Successfully fetched status of graph pipeline process."
}
```