@@ -46,29 +46,45 @@ Readers of this document are assumed to have at least read the [CLMC information
### **Assumptions**
* Network measurement - assumption is that we have a measurement for the network link delays, called **network_delays**, providing the following information:
Here, we list the assumptions we make for measuring and understanding E2E performance of media components:
| path (tag) | delay | time |
| --- | --- | --- |
| path identifier | e2e delay for the given path | time of measurement |
* Network measurement - the assumption is that we have a measurement for the network path delays between service function routers, called **network_delays**, providing the following information:
Here, the **path** tag value is the identifier of the path between two nodes in the network topology obtained from FLIPS. The assumption is that those identifiers
will be structured in such a way that we can obtain the source and target endpoint IDs from the path identifier itself. For example:
| path identifier | source SFR | target SFR | e2e delay for the given path (ms) | timestamp of measurement |
* A response will traverse the same network path as the request, but in reverse direction.
Here, the **path** tag value is the identifier of the path between two nodes (service routers) in the network topology obtained from FLIPS.
The **source** tag value is the source service router for the identified path, while the **target** tag value is the target service router.
The delay field value is the network end-to-end delay in milliseconds that a packet would experience when traversing the path between the two SFRs identified in the tag values.
* Media service measurement - assumption is that we have a measurement for media services' response time, called **service_delays**, providing the following information:
The semantics of the row is that a packet traversing the path from SFR-A (service router) through S1, S2, S3 (switches) to SFR-B (service router) will experience an averaged delay of 10 ms.
* Request/Response path - the assumption is that a response will traverse the same network path as the request, but in reverse direction.
* Media service measurement - the assumption is that we have a measurement for media service components' response time, called **service_delays**, providing the following information:
| media service FQDN | ID of the service function instance | SFR that connects the MC endpoint to the Flame network | response time for the media service (ms) | timestamp of measurement |
Here, the **FQDN**, **sf_instance** and **sfr** tag values identify a unique response time measurement.
The response time field value is the response time (measured in milliseconds) for the media service component only, and it does not take into account any of the network measurements.
Here, the **FQDN**, **sf_instance** and **endpoint** tag values identify a unique response time measurement. The response time field value is the
response time (measured in seconds) for the media service only, and it does not take into account any of the network measurements.
The semantics of the row is that the service function instance with ID *ms-A-sf_INSTANCE*, serving media service
*ms-A.ict-flame.eu* and connected to the FLAME network through service router *SFR-B*, has an averaged response time of 27 ms.
## E2E Model
...
...
@@ -155,26 +171,25 @@ __TO DO__
### **Idea**
The idea is to aggregate platform measurement points with media service measurement points and obtain a third measurement from which we can easily
The idea is to aggregate network measurement points with media service measurement points and obtain a third measurement from which we can easily
understand both end-to-end and round-trip performance of a media service. This is achieved by having a Python script running in the background and aggregating
the data from both measurements on a given sample period, e.g. every 10 seconds. The script then posts the aggregated data back to Influx in a new measurement.
the data from the two measurements on a given sample period, e.g. every 10 seconds. The script then posts the aggregated data back to Influx in a new measurement.
### **Goal**
The ultimate goal is to populate a new measurement, called **e2e_delays**, which will be provided with the following information:
* **pathID_F** - tag used to identify the path in forward direction, e.g. **endpoint1.ms-A.ict-flame.eu---endpoint2.ms-A.ict-flame.eu**
* **pathID_R** - tag used to identify the path in reverse direction, e.g. **endpoint2.ms-A.ict-flame.eu---endpoint1.ms-A.ict-flame.eu**
* **FQDN** - tag used to identify the media service
* **sf_instance** - tag used to identify the media service
* **D_path_F** - network delay for path in forward direction
* **D_path_R** - network delay for path in reverse direction
* **D_service** - media service response time
* **pathID** - tag used to identify the network path (bidirectional path identifier)
* **source_SFR** - tag used to identify the source service function router (the start of the network path)
* **target_SFR** - tag used to identify the target service function router (the end of the network path)
* **FQDN** - tag used to identify the media service
* **sf_instance** - tag used to identify the media component instance ID
* **delay_forward** - network delay for the path in forward direction
* **delay_reverse** - network delay for the path in reverse direction
* **delay_service** - media service component response time
Then we can easily query on this measurement to obtain different performance indicators, such as end-to-end overall delays,
round-trip response time or any of the contributing parts in those performance indicators.
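
As a rough illustration of how such indicators could be derived from a single **e2e_delays** row, here is a minimal Python sketch. The two definitions used are assumptions made for this example and are not taken from the model description: the round-trip time is taken as forward network delay plus service response time plus reverse network delay, and the end-to-end delay as forward network delay plus service response time. The function names are hypothetical, and all values are assumed to be in milliseconds.

```python
def round_trip_time(row):
    """Assumed definition: request delay + service response time + response delay (ms)."""
    return row["delay_forward"] + row["delay_service"] + row["delay_reverse"]


def end_to_end_delay(row):
    """Assumed definition: request delay + service response time (ms)."""
    return row["delay_forward"] + row["delay_service"]


# Hypothetical e2e_delays row, reduced to its field values
example_row = {"delay_forward": 10.0, "delay_reverse": 12.0, "delay_service": 27.0}
print(round_trip_time(example_row), end_to_end_delay(example_row))
```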
...
...
@@ -182,37 +197,40 @@ round-trip response time or any of the contributing parts in those performance i
### **Aggregation script**
What the aggregation script does is very similat to the functionality of a continuous query. Given a sample report period, e.g. 10s,
What the aggregation script does is very similar to the functionality of a continuous query. Given a sample report period, e.g. 10s,
the script runs every 10 seconds and queries the averaged data for the last 10 seconds. The executed queries are:
* Network delays query - to obtain the network delay values and group them by their **path** identifier:
* Network delays query - to obtain the network delay values and group them by their **path**, **source** and **target** identifiers:
```
SELECT mean(delay) as "Dnet" FROM "E2EMetrics"."autogen".network_delays WHERE time >= now() - 10s and time < now() GROUP BY path
SELECT mean(delay) as "net_delay" FROM "E2EMetrics"."autogen"."network_delays" WHERE time >= now() - 10s and time < now() GROUP BY path, source, target
```
* Media service response time query - to obtain the response time values of the media service instances and group them by **FQDN**, **sf_instance** and **endpoint** identifiers:
* Media service response time query - to obtain the response time values of the media service instances and group them by **FQDN**, **sf_instance** and **sfr** identifiers:
```
SELECT mean(response_time) as "Dresponse" FROM "E2EMetrics"."autogen".service_delays WHERE time >= now() - 10s and time < now() GROUP BY FQDN, sf_instance, endpoint
SELECT mean(response_time) as "response_time" FROM "E2EMetrics"."autogen"."service_delays" WHERE time >= now() - 10s and time < now() GROUP BY FQDN, sf_instance, sfr
```
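
The two queries differ from run to run only in the sample period, so the script could build them from a single parameter. The sketch below shows one possible way to do that; the function name `build_queries` and its parameters are hypothetical, and the query text follows the updated queries above.

```python
def build_queries(period_seconds=10, database="E2EMetrics", retention_policy="autogen"):
    """Return the (network, service) InfluxQL query strings for the given sample period."""
    # Time window covering the last sample period
    window = "time >= now() - {0}s and time < now()".format(period_seconds)
    # Fully qualified prefix: "database"."retention policy"
    prefix = '"{0}"."{1}"'.format(database, retention_policy)

    network_query = (
        'SELECT mean(delay) as "net_delay" FROM {0}."network_delays" '
        "WHERE {1} GROUP BY path, source, target"
    ).format(prefix, window)

    service_query = (
        'SELECT mean(response_time) as "response_time" FROM {0}."service_delays" '
        "WHERE {1} GROUP BY FQDN, sf_instance, sfr"
    ).format(prefix, window)

    return network_query, service_query


for query in build_queries(10):
    print(query)
```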
The results of the queries are then matched against each other on endpoint ID: on every match of the **endpoint** tag of the **service_delays** measurement with
the target endpoint ID of the **network_delays** measurement, the rows are combined to obtain an **e2e_delay** measurement row, which is posted back to influx.
The results of the queries are then matched against each other on the **target** and **sfr** tag values (of **network_delays** and **service_delays** respectively):
whenever the **sfr** tag of a **service_delays** row matches the **target** service router of a **network_delays** row, the two rows are combined
into an **e2e_delays** measurement row, which is posted back to Influx.
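
The matching step can be sketched in Python as a simple join on the service router. This is a minimal illustration and not the actual aggregation script: the function name `match_delays` and the plain-dictionary row format are hypothetical, and the reverse delay is assumed to come from the **network_delays** row with **source** and **target** swapped (left as `None` when no such row exists).

```python
def match_delays(network_rows, service_rows):
    """Join network_delays rows with service_delays rows where target == sfr."""
    # Index the service rows by the SFR they are connected through
    services_by_sfr = {}
    for service in service_rows:
        services_by_sfr.setdefault(service["sfr"], []).append(service)

    # Index network delays by (source, target) so the reverse direction can be looked up
    delay_by_endpoints = {
        (net["source"], net["target"]): net["net_delay"] for net in network_rows
    }

    e2e_rows = []
    for net in network_rows:
        for service in services_by_sfr.get(net["target"], []):
            e2e_rows.append({
                "pathID": net["path"],
                "source_SFR": net["source"],
                "target_SFR": net["target"],
                "FQDN": service["FQDN"],
                "sf_instance": service["sf_instance"],
                "delay_forward": net["net_delay"],
                # Assumption: reverse delay is the delay measured for the swapped endpoints
                "delay_reverse": delay_by_endpoints.get((net["target"], net["source"])),
                "delay_service": service["response_time"],
            })
    return e2e_rows
```

Each returned dictionary corresponds to one **e2e_delays** row; posting it back to Influx is left out of the sketch.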
Example:
* Result from first query:
Let's assume we have these results from the two queries: