Extends user documentation for the end to end delay

5300765f · Nikolay Stanchev · Stephen Phillips · f7413af8 · 5300765f
Commit 5300765f authored 6 years ago by Nikolay Stanchev Committed by Stephen Phillips 6 years ago
--- a/docs/graph-monitoring-user-guide.md
+++ b/docs/graph-monitoring-user-guide.md
@@ -73,9 +73,9 @@ There is a dedicated endpoint which starts an automated graph monitoring script,
 constantly executing a full processing pipeline - build temporal graph, query for end-to-end delay, write results back in InfluxDB, delete
 temporal graph. The pipeline uses the defined configuration to periodically build the temporal graph and query for the end-to-end delay
 from all possible UEs to every deployed service function endpoint and writes the result back into a dedicated measurement in the time-series database (InfluxDB).
-For more information on the graph monitoring pipeline, see the [graph RTT slides](https://owncloud.it-innovation.soton.ac.uk/remote.php/webdav/Shared/FLAME/Project%20Reviews/2nd%20EC%20Review%20(technical)/drafts/WP4_FLAME_Graph_RTT.pptx).
+For more information on the graph monitoring pipeline, see the relevant section below.

-* `POST http://<clmc-host>/clmc/clmc-service/graph/monitor`
+* `POST http://platform/clmc/clmc-service/graph/monitor`

 * Expected JSON body serving as the configuration of the graph monitoring script:

@@ -89,7 +89,7 @@ For more information on the graph monitoring pipeline, see the [graph RTT slides
    "<service function package>": {
      "response_time_field": "<field measuring the service delay of a service function - as described above>",
      "request_size_field": "<field measuring the request size of a service function - as described above>",
-      "response_size_field": "<field measuring the response size of a service function - as descirbed above>",
+      "response_size_field": "<field measuring the response size of a service function - as described above>",
      "measurement_name": "<the name of the measurement which contains the fields above>"
    },
    ...
@@ -99,7 +99,7 @@ For more information on the graph monitoring pipeline, see the [graph RTT slides

 * Example request with curl:

-`curl -X POST -d <JSON body> http://<clmc-host>/clmc/clmc-service/graph/monitor`
+`curl -X POST -d <JSON body> http://platform/clmc/clmc-service/graph/monitor`

 * Example JSON body for the tomcat-based service described above:

@@ -133,11 +133,11 @@ The configuration described above will start a graph monitoring process executin
 in the measurement named **graph_measurements**, database **fms-sfc**. To stop the graph monitoring process, use the request ID received in 
 the response of the previous request:

-`curl -X DELETE http://<clmc-host>/clmc/clmc-service/graph/monitor/75df6f8d-3829-4fd8-a3e6-b3e917010141` 
+`curl -X DELETE http://platform/clmc/clmc-service/graph/monitor/75df6f8d-3829-4fd8-a3e6-b3e917010141` 

 To view the status of the graph monitoring process, send the same request, but using a GET method rather than DELETE.

-`curl -X GET http://<clmc-host>/clmc/clmc-service/graph/monitor/75df6f8d-3829-4fd8-a3e6-b3e917010141` 
+`curl -X GET http://platform/clmc/clmc-service/graph/monitor/75df6f8d-3829-4fd8-a3e6-b3e917010141` 

 Keep in mind that since this process is executing once in a given period, it is expected to see status **sleeping** in the response.
 Example response:
@@ -148,3 +148,94 @@ Example response:
  "msg": "Successfully fetched status of graph pipeline process."
 }
 ```
+
+### Graph monitoring pipeline - technical details
+
+In order for service graph-based monitoring to be possible, the network topology graph must be built with the relevant network link latencies.
+This network graph can be created/updated/deleted by sending a POST/PUT/DELETE request to the **/clmc/clmc-service/graph/network** API endpoint:
+
+```
+curl -X POST http://platform/clmc/clmc-service/graph/network
+curl -X PUT http://platform/clmc/clmc-service/graph/network
+curl -X DELETE http://platform/clmc/clmc-service/graph/network
+```
+
+After the network graph is built, a graph monitoring process can execute the following steps:
+
+1) Build a temporal graph for a particular service function chain
+2) Query the temporal graph for round-trip-time
+3) Write results in the time-series database (InfluxDB)
+4) Clean up and delete the temporal graph
+
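
The four steps above can be sketched as a single function. This is a minimal illustration, not the CLMC implementation: the four callables are hypothetical stand-ins for the HTTP requests described in the following sections, and the use of `try`/`finally` to guarantee clean-up is an assumption about desirable behaviour rather than something the guide states.

```python
from typing import Callable, Iterable, List, Tuple


def run_pipeline_once(
    build_graph: Callable[[], str],
    query_rtt: Callable[[str, str, str], dict],
    write_results: Callable[[List[dict]], None],
    delete_graph: Callable[[str], None],
    pairs: Iterable[Tuple[str, str]],
) -> List[dict]:
    """Run steps 1-4 once; every callable is a hypothetical stand-in for an HTTP call."""
    graph_uuid = build_graph()  # step 1: build the temporal graph, get its UUID
    try:
        # step 2: query round-trip time for every (startpoint, endpoint) pair
        results = [query_rtt(graph_uuid, start, end) for start, end in pairs]
        write_results(results)  # step 3: write the results to InfluxDB
        return results
    finally:
        delete_graph(graph_uuid)  # step 4: delete the temporal graph, even on failure
```

Passing the steps in as callables keeps the sketch independent of any particular HTTP client.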
+
+#### Building a temporal graph
+
+The temporal graph can be built by sending a POST request to the **/clmc/clmc-service/graph/temporal** API endpoint. The request body
+follows the same format as the one used to start the automated graph monitoring script described above, the only difference being that the
+**from** and **to** timestamps must be specified, defining the time window to which this temporal graph relates. For example:
+
+```json
+{
+  "from": "<start of the time window, UNIX timestamp, e.g. 1549881060>",
+  "to": "<end of the time window, UNIX timestamp, e.g. 1550151600>",
+  "service_function_chain": "<SFC identifier>",
+  "service_function_chain_instance": "<SFC identifier>_1",
+  "service_functions": {
+    "<service function package>": {
+      "response_time_field": "<field measuring the service delay of a service function - as described above>",
+      "request_size_field": "<field measuring the request size of a service function - as described above>",
+      "response_size_field": "<field measuring the response size of a service function - as described above>",
+      "measurement_name": "<the name of the measurement which contains the fields above>"
+    },
+    ...
+  }
+}
+```
+
+`curl -X POST -d <JSON body> http://platform/clmc/clmc-service/graph/temporal`
+
+The CLMC would then build the temporal graph in its graph database (Neo4j) and populate it with the time-series data valid for the defined time window.
+
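
The request body described above could be assembled from an existing monitoring configuration roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the timestamps are assumed to be sent as integer UNIX seconds (the guide shows them only as placeholders), and the check that **to** is later than **from** is an added sanity check, not documented API behaviour.

```python
import json
from typing import Any, Dict


def temporal_graph_body(monitor_config: Dict[str, Any], start: int, end: int) -> str:
    """Return a JSON body for POST /clmc/clmc-service/graph/temporal (hypothetical helper)."""
    if end <= start:
        raise ValueError("'to' must be later than 'from'")
    body = dict(monitor_config)  # shallow copy so the caller's config is left untouched
    body["from"] = start  # assumed: integer UNIX timestamp in seconds
    body["to"] = end
    return json.dumps(body)


config = {
    "service_function_chain": "fms-sfc",
    "service_function_chain_instance": "fms-sfc_1",
    "service_functions": {},  # filled in as shown in the JSON example above
}
payload = temporal_graph_body(config, 1549881060, 1550151600)
```

The resulting string can be passed to curl with `-d`, as in the example above.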
+
+#### Querying the temporal graph
+
+The temporal graph built in the previous step can be used to retrieve the end-to-end delay by sending a GET request to
+**/clmc/clmc-service/graph/temporal/{uuid}/round-trip-time?startpoint={ue, cluster or switch}&endpoint={service function endpoint}**.
+This endpoint requires the UUID of the temporal graph received in the response from the previous step, as well as the identifiers of a start point (e.g. a UE) and a service function endpoint.
+The query thus returns the end-to-end delay from a particular UE (User Equipment) to a particular service function endpoint deployed on the FLAME platform.
+For example:
+
+`curl -X GET "http://platform/clmc/clmc-service/graph/temporal/ac2cd21c-9c36-44ea-a923-51ca3f72bf7a/round-trip-time?startpoint=ue20&endpoint=fms-storage-endpoint"`
+
+The automated graph monitoring process (described in the previous sections) executes this query for every possible pair of a UE and a service function endpoint to
+ensure that all metrics are collected.
+
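
Querying every pair of UE and service function endpoint amounts to generating one URL per combination. The sketch below only builds the URLs and sends nothing. The function names are hypothetical, and the `platform` host and query parameter names are taken from the curl example above.

```python
from itertools import product
from typing import Iterable, Iterator
from urllib.parse import quote, urlencode

BASE_URL = "http://platform/clmc/clmc-service"  # host name as used in the examples above


def rtt_query_url(graph_uuid: str, startpoint: str, endpoint: str) -> str:
    """Build the round-trip-time query URL for one (startpoint, endpoint) pair."""
    query = urlencode({"startpoint": startpoint, "endpoint": endpoint})
    return f"{BASE_URL}/graph/temporal/{quote(graph_uuid, safe='')}/round-trip-time?{query}"


def all_rtt_query_urls(
    graph_uuid: str, startpoints: Iterable[str], endpoints: Iterable[str]
) -> Iterator[str]:
    """Yield one query URL per combination of start point and endpoint."""
    for startpoint, endpoint in product(startpoints, endpoints):
        yield rtt_query_url(graph_uuid, startpoint, endpoint)
```

With two UEs and three endpoints, `all_rtt_query_urls` yields six URLs, one per pair.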
+
+#### Writing results in InfluxDB
+
+The responses to the previous requests contain metrics such as round-trip time, network delay and service delay. These are returned in JSON format
+and must then be converted to the InfluxDB line protocol format. For example:
+
+```
+graph_measurement,flame_server=DC3,flame_sfci=fms-sfc-1,flame_location=DC3,flame_sfe=fms-storage-second-endpoint,flame_sfp=fms-storage,flame_sfc=fms-sfc,flame_sf=fms-storage-ns,traffic_source=ue24 round_trip_time=0.029501264137931037,service_delay=0.0195,network_delay=0.005 1550499460000000000
+```
+
+This measurement line could then be reported to InfluxDB with a POST request to **/clmc/influxdb/write?db={SFC identifier}**:
+
+`curl -X POST "http://platform/clmc/influxdb/write?db=fms-sfc" --data-binary <measurement line>`
+
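
The conversion to line protocol can be sketched as below. The exact shape of the JSON response is not shown in this guide, so the hypothetical function `to_line_protocol` takes tags and fields that have already been extracted from it. All field values are assumed to be floats, as in the example line above.

```python
from typing import Dict


def _escape(text: str) -> str:
    """Backslash-escape commas, equals signs and spaces in tag/field keys and tag values."""
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def to_line_protocol(
    measurement: str, tags: Dict[str, str], fields: Dict[str, float], timestamp_ns: int
) -> str:
    """Format one point as an InfluxDB line protocol string (float fields only)."""
    if not fields:
        raise ValueError("at least one field is required")
    # Measurement names need commas and spaces escaped, but not equals signs
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in tags.items())
    field_part = ",".join(f"{_escape(k)}={float(v)!r}" for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "graph_measurement",
    {"flame_sfc": "fms-sfc", "traffic_source": "ue24"},
    {"service_delay": 0.0195, "network_delay": 0.005},
    1550499460000000000,
)
```

Tags and fields are written in dictionary insertion order. InfluxDB accepts tags in any order, although its documentation recommends sorting them by key for write performance.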
+
+#### Clean up
+
+Once the temporal graph is no longer used, or the time window it relates to is no longer viable, it can be deleted with a DELETE
+request to **/clmc/clmc-service/graph/temporal/{uuid}**. The UUID parameter is the same as in the round-trip time query request,
+i.e. the UUID received when building the temporal graph. For example:
+
+`curl -X DELETE http://platform/clmc/clmc-service/graph/temporal/ac2cd21c-9c36-44ea-a923-51ca3f72bf7a`
+
+
+#### Summary
+
+The graph monitoring process described at the beginning of this document automates the steps described above. Given a query period, e.g. 30 seconds,
+the process executes the pipeline every 30 seconds, defining non-overlapping, contiguous time windows. For each time window, a temporal graph is built,
+then queried for end-to-end delay and finally deleted.
\ No newline at end of file
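
The non-overlapping, contiguous time windows can be illustrated with a small generator. This is a sketch only: the function name is hypothetical, and how the monitoring process aligns its windows and treats their boundaries is an assumption here, not something this guide specifies.

```python
from itertools import islice
from typing import Iterator, Tuple


def time_windows(first_start: int, period: int) -> Iterator[Tuple[int, int]]:
    """Yield endless (from, to) pairs where each window starts where the previous one ended."""
    if period <= 0:
        raise ValueError("period must be positive")
    start = first_start
    while True:
        end = start + period
        yield start, end
        start = end  # the next window begins exactly where this one ended


# First three 30-second windows starting at UNIX time 1549881060
first_three = list(islice(time_windows(1549881060, 30), 3))
```

Each yielded pair would serve as the **from** and **to** values of one temporal graph request.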