Commit ae027d2a authored by Nikolay Stanchev

Issue #68 - updated documentation's E2E measurement related sections

parent 51e7973c
......@@ -46,29 +46,45 @@ Readers of this document are assumed to have at least read the [CLMC information
### **Assumptions**
* Network measurement - assumption is that we have a measurement for the network link delays, called **network_delays**, providing the following information:
Here, we list the assumptions we make for measuring and understanding E2E performance of media components:
| path (tag) | delay | time |
| --- | --- | --- |
| path identifier | e2e delay for the given path | time of measurement |
* Network measurement - the assumption is that we have a measurement for the network path delays between service function routers, called **network_delays**, providing the following information:
Here, the **path** tag value is the identifier of the path between two nodes in the network topology obtained from FLIPS. The assumption is that those identifiers
will be structured in such a way that we can obtain the source and target endpoint IDs from the path identifier itself. For example:
**endpoint1.ms-A.ict-flame.eu---endpoint2.ms-A.ict-flame.eu**
We can easily split the string on **'---'** and, thus, find the source endpoint is **endpoint1.ms-A.ict-flame.eu**, while the target endpoint is
**endpoint2.ms-A.ict-flame.eu**.
The delay field value is the network end-to-end delay in milliseconds for the path identified in the tag value.
| path (tag) | source (tag) | target (tag) | delay | time |
| --- | --- | --- | --- | --- |
| path identifier | source SFR | target SFR | e2e delay for the given path (ms) | timestamp of measurement |
* A response will traverse the same network path as the request, but in reverse direction.
Here, the **path** tag value is the identifier of the path between two nodes (service routers) in the network topology obtained from FLIPS.
The **source** tag value is the source service router for the identified path, while the **target** tag value is the target service router.
The delay field value is the network end-to-end delay in milliseconds that a packet would experience when traversing the path between the two SFRs identified in the tag values.
* Media service measurement - assumption is that we have a measurement for media services' response time, called **service_delays**, providing the following information:
An example row would be:
| FQDN (tag) | sf_instance (tag) | endpoint (tag) | response_time | time |
| path (tag) | source (tag) | target (tag) | delay | time |
| --- | --- | --- | --- | --- |
| SFR-A---S1---S2---S3---SFR-B | SFR-A | SFR-B | 10 | 1525334761282000 |
The semantics of the row is that a packet traversing the path from SFR-A (service router) through S1, S2, S3 (switches) to SFR-B (service router) will experience an averaged delay of 10ms.
* Request/Response path - the assumption is that a response will traverse the same network path as the request, but in reverse direction.
* Media service measurement - the assumption is that we have a measurement for media service components' response time, called **service_delays**, providing the following information:
| FQDN (tag) | sf_instance (tag) | sfr (tag) | response_time | time |
| --- | --- | --- | --- | --- |
| media service FQDN | ID of the service function instance | SFR that connects the MC endpoint to the Flame network | response time for the media service (ms) | timestamp of measurement |
Here, the **FQDN**, **sf_instance** and **sfr** tag values identify a unique response time measurement.
The response time field value is the response time (measured in milliseconds) for the media service component only, and it does not take into account any of the network measurements.
An example row would be:
| FQDN (tag) | sf_instance (tag) | sfr (tag) | response_time | time |
| --- | --- | --- | --- | --- |
| media service FQDN | ID of the service function instance | endpoint identifier | response time for the media service (s) | time of measurement |
| ms-A.ict-flame.eu | ms-A-sf_INSTANCE | SFR-B | 27 | 1525334761282000 |
Here, the **FQDN**, **sf_instance** and **endpoint** tag values identify a unique response time measurement. The response time field value is the
response time (measured in seconds) for the media service only, and it does not take into account any of the network measurements.
The semantics of the row is that the service function instance with ID *ms-A-sf_INSTANCE*, serving media service
*ms-A.ict-flame.eu* and connected to the FLAME network through service router *SFR-B*, has an averaged response time of 27 ms.
## E2E Model
......@@ -155,26 +171,25 @@ __TO DO__
### **Idea**
The idea is to aggregate platform measurement points with media service measurement points and obtain a third measurement from which we can easily
The idea is to aggregate network measurement points with media service measurement points and obtain a third measurement from which we can easily
understand both end-to-end and round-trip performance of a media service. This is achieved by having a Python script running in the background and aggregating
the data from both measurements on a given sample period, e.g. every 10 seconds. The script then posts the aggregated data back to Influx in a new measurement.
the data from the two measurements on a given sample period, e.g. every 10 seconds. The script then posts the aggregated data back to Influx in a new measurement.
### **Goal**
The ultimate goal is to populate a new measurement, called **e2e_delays**, which will be provided with the following information:
| pathID_F (tag) | pathID_R (tag) | FQDN (tag) | sf_instance (tag) | D_path_F | D_path_R | D_service | time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| path_ID (tag) | source_SFR (tag) | target_SFR (tag) | FQDN (tag) | sf_instance (tag) | delay_forward | delay_reverse | delay_service | time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
* *pathID_F* - tag used to identify the path in forward direction, e.g. **endpoint1.ms-A.ict-flame.eu---endpoint2.ms-A.ict-flame.eu**
* *pathID_R* - tag used to identify the path in reverse direction, e.g. **endpoint2.ms-A.ict-flame.eu---endpoint1.ms-A.ict-flame.eu**
* *FQDN* - tag used to identify the media service
* *sf_instance* - tag used to identify the media service
* *D_path_F* - network delay for path in forward direction
* *D_path_R* - network delay for path in reverse direction
* *D_service* - media service response time
* *path_ID* - tag used to identify the network path (bidirectional path identifier)
* *source_SFR* - tag used to identify the source service function router (the start of the network path)
* *target_SFR* - tag used to identify the target service function router (the end of the network path)
* *FQDN* - tag used to identify the media service
* *sf_instance* - tag used to identify the media component instance ID
* *delay_forward* - network delay for the path in the forward direction
* *delay_reverse* - network delay for the path in the reverse direction
* *delay_service* - media service component response time
Then we can easily query on this measurement to obtain different performance indicators, such as end-to-end overall delays,
round-trip response time or any of the contributing parts in those performance indicators.
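
As a minimal sketch of such a query result being turned into a performance indicator, the snippet below derives a round-trip figure from one **e2e_delays** row. It assumes the row has already been fetched and is available as a plain dictionary, and that the round-trip response time is the sum of the forward network delay, the service response time and the reverse network delay. That formula and the function name `round_trip_time` are assumptions made for illustration; they are not taken from the aggregation script.

```python
def round_trip_time(row):
    """Return an assumed round-trip response time in milliseconds.

    Assumption: round trip = forward network delay + service response time
    + reverse network delay. This formula is illustrative only.
    """
    return row["delay_forward"] + row["delay_service"] + row["delay_reverse"]


# Hypothetical e2e_delays row, using the field names of the table above.
example_row = {
    "path_ID": "SFR-A---SFR-B",
    "source_SFR": "SFR-A",
    "target_SFR": "SFR-B",
    "FQDN": "ms-A.ict-flame.eu",
    "sf_instance": "test-sf-clmc-agent-build_INSTANCE",
    "delay_forward": 9.2,
    "delay_reverse": 10.3,
    "delay_service": 11,
}

rtt_ms = round_trip_time(example_row)
```

Keeping the three contributing delays as separate fields means any one of them can still be inspected on its own when the combined figure looks wrong.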
......@@ -182,37 +197,40 @@ round-trip response time or any of the contributing parts in those performance i
### **Aggregation script**
What the aggregation script does is very similat to the functionality of a continuous query. Given a sample report period, e.g. 10s,
What the aggregation script does is very similar to the functionality of a continuous query. Given a sample report period, e.g. 10s,
the script runs every 10 seconds and queries the averaged data for the last 10 seconds. The executed queries are:
* Network delays query - to obtain the network delay values and group them by their **path** identifier:
* Network delays query - to obtain the network delay values and group them by their **path**, **source** and **target** identifiers:
```
SELECT mean(delay) as "Dnet" FROM "E2EMetrics"."autogen".network_delays WHERE time >= now() - 10s and time < now() GROUP BY path
SELECT mean(delay) as "net_delay" FROM "E2EMetrics"."autogen"."network_delays" WHERE time >= now() - 10s and time < now() GROUP BY path, source, target
```
* Media service response time query - to obtain the response time values of the media service instances and group them by **FQDN**, **sf_instance** and **endpoint** identifiers:
* Media service response time query - to obtain the response time values of the media service instances and group them by **FQDN**, **sf_instance** and **sfr** identifiers:
```
SELECT mean(response_time) as "Dresponse" FROM "E2EMetrics"."autogen".service_delays WHERE time >= now() - 10s and time < now() GROUP BY FQDN, sf_instance, endpoint
SELECT mean(response_time) as "response_time" FROM "E2EMetrics"."autogen"."service_delays" WHERE time >= now() - 10s and time < now() GROUP BY FQDN, sf_instance, sfr
```
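
The two queries above differ only in the measurement, the aggregated field and the grouping tags, so they could be generated from the report period. The following is a sketch under that assumption; the helper name `build_queries` and its parameters are hypothetical and are not part of the actual script.

```python
def build_queries(period_seconds=10, database="E2EMetrics", retention_policy="autogen"):
    """Return the (network, service) InfluxQL query strings for one report period."""
    # Fully qualified prefix, e.g. "E2EMetrics"."autogen"
    source = '"{0}"."{1}"'.format(database, retention_policy)
    # Time window covering the last report period
    window = "time >= now() - {0}s and time < now()".format(period_seconds)

    network_query = (
        'SELECT mean(delay) as "net_delay" FROM {0}."network_delays" '
        "WHERE {1} GROUP BY path, source, target"
    ).format(source, window)

    service_query = (
        'SELECT mean(response_time) as "response_time" FROM {0}."service_delays" '
        "WHERE {1} GROUP BY FQDN, sf_instance, sfr"
    ).format(source, window)

    return network_query, service_query


network_query, service_query = build_queries(period_seconds=10)
```

With the default arguments, the generated strings are identical to the two queries listed above.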
The results of the queries are then matched against each other on endpoint ID: on every match of the **endpoint** tag of the **service_delays** measurement with
the target endpoint ID of the **network_delays** measurement, the rows are combined to obtain an **e2e_delay** measurement row, which is posted back to influx.
The results of the two queries are then matched on service router: whenever the **sfr** tag of a **service_delays** row equals the **target** tag of a **network_delays** row,
the two rows are combined into a single **e2e_delays** measurement row, which is posted back to Influx.
Example:
* Result from first query:
Let's assume we have these results from the two queries:
* Result from first query
```
name: network_delays
tags: path=endpoint1.ms-A.ict-flame.eu---endpoint2.ms-A.ict-flame.eu
time Dnet
---- ----
tags: path=SFR-A---SFR-B, source=SFR-A, target=SFR-B
time net_delay
---- ---------
1524833145975682287 9.2
name: network_delays
tags: path=endpoint2.ms-A.ict-flame.eu---endpoint1.ms-A.ict-flame.eu
time Dnet
---- ----
tags: path=SFR-A---SFR-B, source=SFR-B, target=SFR-A
time net_delay
---- ---------
1524833145975682287 10.3
```
......@@ -220,22 +238,17 @@ time Dnet
```
name: service_delays
tags: FQDN=ms-A.ict-flame.eu, endpoint=endpoint2.ms-A.ict-flame.eu, sf_instance=test-sf-clmc-agent-build_INSTANCE
time Dresponse
---- ---------
tags: FQDN=ms-A.ict-flame.eu, sfr=SFR-B, sf_instance=test-sf-clmc-agent-build_INSTANCE
time response_time
---- -------------
1524833145975682287 11
```
The script will parse the path identifier **endpoint1.ms-A.ict-flame.eu---endpoint2.ms-A.ict-flame.eu** and find the target endpoint being
**endpoint2.ms-A.ict-flame.eu**. Then the script checks if there is service delay measurement row matching this endpoint. Since there is one,
those values will be merged, so the result will be a row like this:
The script will merge those rows because the target SFR of the network delay row matches the SFR of the service delay row, namely **SFR-B**.
| pathID_F (tag) | pathID_R (tag) | FQDN (tag) | sf_instance (tag) | D_path_F | D_path_R | D_service | time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| endpoint1.ms-A.ict-flame.eu---endpoint2.ms-A.ict-flame.eu | endpoint2.ms-A.ict-flame.eu---endpoint1.ms-A.ict-flame.eu | ms-A.ict-flame.eu | test-sf-clmc-agent-build_INSTANCE | 9.2 | 10.3 | 11 | 1524833145975682287 |
Here, another assumption is made that we can reverse the path identifier of a network delay row and that the reverse path delay would also
be reported in the **network_delays** measurement.
| path_ID (tag) | source_SFR (tag) | target_SFR (tag) | FQDN (tag) | sf_instance (tag) | delay_forward | delay_reverse | delay_service | time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SFR-A---SFR-B | SFR-A | SFR-B | ms-A.ict-flame.eu | test-sf-clmc-agent-build_INSTANCE | 9.2 | 10.3 | 11 | 1524833145975682287 |
The resulting row would then be posted back to Influx in the **e2e_delays** measurement.
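
The matching step in this example can be sketched in Python as follows. This is an illustration, not the actual aggregation script: it assumes the query results have already been flattened into lists of dictionaries, that the reverse delay is taken from the **network_delays** row whose source and target are swapped, and that the timestamp of the service row is reused for the merged row. The function name `merge_delays` and the dictionary layout are hypothetical.

```python
def merge_delays(network_rows, service_rows):
    """Combine network and service delay rows into e2e_delays rows.

    A network row is merged with a service row when the network row's
    target SFR equals the service row's sfr tag.
    """
    # Index network rows by direction so the reverse delay can be looked up.
    by_direction = {(row["source"], row["target"]): row for row in network_rows}

    merged = []
    for net in network_rows:
        reverse = by_direction.get((net["target"], net["source"]))
        if reverse is None:
            continue  # assumption: skip paths with no reported reverse delay
        for svc in service_rows:
            if svc["sfr"] != net["target"]:
                continue
            merged.append({
                "path_ID": net["path"],
                "source_SFR": net["source"],
                "target_SFR": net["target"],
                "FQDN": svc["FQDN"],
                "sf_instance": svc["sf_instance"],
                "delay_forward": net["net_delay"],
                "delay_reverse": reverse["net_delay"],
                "delay_service": svc["response_time"],
                "time": svc["time"],  # assumption: reuse the service row timestamp
            })
    return merged


# Rows taken from the example query results above.
network_rows = [
    {"path": "SFR-A---SFR-B", "source": "SFR-A", "target": "SFR-B",
     "net_delay": 9.2, "time": 1524833145975682287},
    {"path": "SFR-A---SFR-B", "source": "SFR-B", "target": "SFR-A",
     "net_delay": 10.3, "time": 1524833145975682287},
]
service_rows = [
    {"FQDN": "ms-A.ict-flame.eu", "sfr": "SFR-B",
     "sf_instance": "test-sf-clmc-agent-build_INSTANCE",
     "response_time": 11, "time": 1524833145975682287},
]

e2e_rows = merge_delays(network_rows, service_rows)
```

Only the SFR-A to SFR-B row produces a merged result here, because no service row is attached to SFR-A.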
\ No newline at end of file