Commit badd9336 authored by Simon Crowle

Clarifies definitions

parent 0a18e2d1
......@@ -37,16 +37,16 @@ Readers of this document are assumed to have at least read the [CLMC information
| term | definition |
| --- | --- |
| *client* | an end-user of a FLAME media service - typically somebody accessing the service via a mobile computing device connected to an _EP router_ |
| *endpoint* | an endpoint (EP) is a virtual machine (VM) connected to the FLAME network by a _service function router_ |
| *service function router* | a SFR is a VM that allows EPs to communicate with one another using fully qualified domain names (FQDN), rather than IP addresses |
| *network node* | a _service function router_ or other hardware that receives and sends network traffic along network connections attached to it |
| *network node* | is a _service function router_ or other hardware that receives and sends network traffic along network connections attached to it |
| *service function router* | a _service function router_ (SFR) is a VM that allows _clients_ or _endpoints_ to communicate with one another using fully qualified domain names (FQDN), rather than IP addresses |
| *service function instance* | a _service function instance_ (SFI) is a process that in part or wholly realizes the functionality of a media service |
| *endpoint* | an endpoint (EP) is a virtual machine (VM) that implements an SFI and is connected to the FLAME network by a _service function router_ |
| *E2E path* | the directed, acyclic traversal of FLAME network nodes, beginning with a source _EP_ and moving to a target _EP_ via network nodes in the FLAME network |
| *round trip time* | the total time taken for a service request to i) traverse an _E2E path_, ii) be processed by the media service, iii) be returned as a response via an _E2E path_ |
### **Assumptions**
Here, we list the assumptions we make for measuring and understanding E2E performance of SFIs:
Here, we list the assumptions we make for measuring and understanding E2E performance of an EP that implements an SFI:
* Network measurement - the assumption is that we have a measurement for the network path delays between service function routers, called **network_delays**, providing the following information:
......@@ -54,7 +54,7 @@ Here, we list the assumptions we make for measuring and understanding E2E perfor
| --- | --- | --- | --- | --- |
| path identifier | source SFR | target SFR | e2e delay for the given path (ms) | timestamp of measurement |
Here, the **path_ID** tag value is the identifier of the path between two nodes (service function routers) in the network topology obtained from FLIPS. The **source_SFR** tag value is the source service router for the identified path, while the **target_SFR** tag value is the target service router. The delay field value is the network end-to-end delay in milliseconds that a packet would experience when traversing the path between the two SFRs identified in the tag values.
Here, the **path_ID** tag value is the identifier of the path between two service function routers in the network topology obtained from FLIPS. The **source_SFR** tag value is the source service router for the identified path, while the **target_SFR** tag value is the target service router. The delay field value is the network end-to-end delay in milliseconds that a packet would experience when traversing the path between the two SFRs identified in the tag values.
An example row would be:
......@@ -62,23 +62,23 @@ An example row would be:
| --- | --- | --- | --- | --- |
| SFR-A---S1---S2---S3---SFR-B | SFR-A | SFR-B | 10 | 1525334761282000 |
The semantics of the row is that a packet traversing the path from SFR-A (service function router) through S1, S2, S3 (switches) to SFR-B (service function router) will experience an averaged delay of 10ms.
The semantics of the row is that a packet traversing the path from SFR-A through S1, S2, S3 (switches) to SFR-B will experience an averaged delay of 10ms.
* Request/Response path - the assumption is that a response will traverse the same network path as the request, but in reverse direction.
* Media service measurement - assumption is that we have a measurement for media service response time, called **service_delays**, providing the following information:
* Media service measurement - assumption is that we have a measurement for media service response time, containing at least the following information:
| FQDN (tag) | sf_instance (tag) | sfr (tag) | response_time | time |
| --- | --- | --- | --- | --- |
| media service FQDN | ID of the service function instance | SFR that connects the SFI endpoint to the FLAME network | response time for the media service (ms) | timestamp of measurement |
| sf_instance (tag) | sfr (tag) | endpoint (tag) | response_time | time |
| --- | --- | --- | --- | --- |
| media SF instance ID (FQDN) | SFR that connects the SFI endpoint to the FLAME network | SFI EP identifier | response time for the media service (ms) | timestamp of measurement |
Here, the **FQDN**, **sf_instance** and **sfr** tag values identify a unique response time measurement. The response time field value is the time elapsed (measured in milliseconds) for a media service instance only, and it does not take into account any of the network measurements. An example row would be:
Note that all FLAME service function EPs are expected to contain this and other decision context related data in their global tags, see the [CLMC monitoring documentation](monitoring.md) for further information. Above, the **sf_instance**, **sfr** and **endpoint** tag values identify a unique response time measurement. The response time field value is the time elapsed (measured in milliseconds) for a specific SFI/EP implementation only, and it does not take into account any of the network measurements. An example row would be:
| FQDN (tag) | sf_instance (tag) | sfr (tag) | response_time | time |
| --- | --- | --- | --- | --- |
| ms-A.ict-flame.eu | ms-A-sf_INSTANCE | SFR-B | 27 | 1525334761282000 |
| sf_instance (tag) | sfr (tag) | endpoint (tag) | response_time | time |
| --- | --- | --- | --- | --- |
| media-service.ict-flame.eu | SFR-B | server1 | 27 | 1525334761282000 |
The semantics of the row is that the response time for a service function instance with ID *ms-A-sf_INSTANCE* serving media service *ms-A.ict-flame.eu* and connected to the FLAME network through service function router *SFR-B* will have an averaged response time of 27 ms.
The semantics of the row is that an SFI with an identity of _media-service.ict-flame.eu_ that is implemented by endpoint _server1_ and connected to the FLAME network through service function router *SFR-B* will have an averaged response time of 27 ms.
## E2E Model
......@@ -90,21 +90,21 @@ Let us begin by identifying some simple, generic interactions within a media ser
```
// simple chain
Client --> data storage SFI
Client --> data storage SFI/EP1
// sequential chain
Client --> data processor SFI --> data storage SFI
Client --> data processor SFI/EP1 --> data storage SFI/EP1
// complex chain
Client --> data processor SFI_A --> data processor SFI_B
|-> data storage SFI <-|
Client --> data processor SFI_A/EP1 --> data processor SFI_B/EP1
|-> data storage SFI/EP1 <-|
```
The first example above imagines a client simply requesting some data be stored in (or retrieved from) a database managed by the SFI responsible for persistence. In the second case, the client requests some processing of some data held in the data store, the results of which are also stored. Finally, the third case outlines a more complex scenario in which the client requests some processing of data which in turn generates further requests for additional data processing in other SFIs which also may depend on storage I/O functionality. Here additional data processing by related SFIs could include job scheduling or task decomposition and distribution to worker nodes. An advanced media service, such as a modern computer game, is a useful example of such a service in which graphics rendering; game state modelling; artificial intelligence and network communications are handled in parallel using varying problem decomposition methods.
### E2E simple chain
Next we will define a very simple network into which we will place a data processing EP and a data storage EP - we assert the clients could connect to any of the _service function routers_ that link these SFIs together.
Next we will define a very simple network into which we will place a data processing EP and a data storage EP - we assert the clients could connect to any of the _service function routers_ that link these SFI implementations together.
![Simple chain E2E network](image/e2e-simple-chain-network.png)
......@@ -149,11 +149,11 @@ Up until this point we have considered an elementary SFC in which there is only
![Extended chain E2E network](image/e2e-extended-chain-network.png)
Imagine a media service that both stores and processes high volumes of complex media streams. Consider as well a distributed population of clients making demands on this service. Successfully handling high demand for this service could mean deploying several instances of its SFIs (storage and processing) across multiple VMs which interoperate and share the demand load. Since clients and SFIs are distributed, service function requests (made by both) will likely give rise to propagating waves of activity, load (and delay) from multiple nodes across the FLAME platform. For simplicity, let us assume our multimedia service implements a request by processing some media data from the client and then storing it (returning some result to client). Here is client 1's request as it passes through the FLAME network and its SFIs:
Imagine a media service that both stores and processes high volumes of complex media streams. Consider as well a distributed population of clients making demands on this service. Successfully handling high demand for this service could mean deploying several EPs that implement its SFIs (storage and processing) across multiple VMs that interoperate and share the demand load. Since clients and EPs are distributed, service function requests (made by both) will likely give rise to propagating waves of activity, load (and delay) from multiple nodes across the FLAME platform. For simplicity, let us assume our multimedia service implements a request by processing some media data from the client and then storing it (returning some result to client). Here is client 1's request as it passes through the FLAME network and its SFIs:
![Extended client 1 path](image/e2e-extended-client1-path.png)
In the figure above the green arcs indicate service request travel whilst the blue denotes the response path. The shortest route directs the request to SFR 'B' and the consequent storage request travels on to SFR 'C'. __Responses return along the path used by the request__. Indicative service response times are provided by numeric values in the active SFI boxes. Let's see the same request from client 2, who has just joined the network:
In the figure above the green arcs indicate service request travel whilst the blue denotes the response path. The shortest route directs the request to SFR 'B' and the consequent storage request travels on to SFR 'C'. __Responses return along the path used by the request__. Indicative service response times are provided by numeric values in the active SFI/EP boxes. Let's see the same request from client 2, who has just joined the network:
![Extended client 2 path](image/e2e-extended-client2-path.png)
......@@ -165,7 +165,7 @@ Client 3 joins the network:
![Extended client 3 path](image/e2e-extended-client3-path.png)
In calculating a service function route that optimizes for the complete _round trip_ delay, we need to take into account the likely delays that are incurred from both network related latencies and also _all_ SFI response times. The orange route illustrated above shows how the gains made by selecting a fast route through the network are offset by penalties in using a processor SFI that is overloaded; conversely a slower route that selects an SFI with computational resources to spare resolves to an overall faster round-trip response time.
In calculating a service function route that optimizes for the complete _round trip_ delay, we need to take into account the likely delays that are incurred from both network related latencies and also _all_ SFI response times. The orange route illustrated above shows how the gains made by selecting a fast route through the network are offset by penalties in using an EP processor for the SFI that is overloaded; conversely a slower route that selects an SFI with computational resources to spare resolves to an overall faster round-trip response time.
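As a rough illustration of this trade-off, the Python sketch below compares two hypothetical candidate routes. The names `CandidateRoute` and `fastest_route` and all delay values are assumptions made for this example and are not taken from the CLMC code base. The sketch also assumes that the response retraces the request path with the same network delay, so the one-way network delay is counted twice.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CandidateRoute:
    """A hypothetical candidate route through the FLAME network."""
    name: str
    network_delay_ms: float                    # one-way network delay of the route
    sfi_response_times_ms: Tuple[float, ...]   # response time of every SFI used

    def round_trip_ms(self) -> float:
        # Request and response are assumed to see the same network delay,
        # and every SFI on the chain contributes its response time once.
        return 2 * self.network_delay_ms + sum(self.sfi_response_times_ms)


def fastest_route(routes: Iterable[CandidateRoute]) -> CandidateRoute:
    """Return the route with the smallest estimated round trip time."""
    return min(routes, key=lambda route: route.round_trip_ms())


# Illustrative values only: a fast network path to an overloaded processor
# versus a slower network path to a processor with resources to spare.
fast_network = CandidateRoute("fast path, overloaded processor", 5, (80, 10))
slow_network = CandidateRoute("slow path, idle processor", 15, (20, 10))
best = fastest_route([fast_network, slow_network])
```

With these illustrative numbers the slower network path has the smaller estimated round trip time, because the saving in SFI response time outweighs the extra network delay.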
## E2E Measurement
......@@ -175,14 +175,14 @@ Our aim is to aggregate network measurement points with media service measuremen
The ultimate goal is to populate a new measurement, called **e2e_delays**, which will be provided with the following information:
| path_ID (tag) | source_SFR (tag) | target_SFR (tag) | FQDN (tag) | sf_instance (tag) | delay_forward | delay_reverse | delay_service | time |
| path_ID (tag) | source_SFR (tag) | target_SFR (tag) | sf_instance (tag) | endpoint (tag) | delay_forward | delay_reverse | delay_service | time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
* *path_ID* - tag ID used to identify the network path (bidirectional path identifier)
* *source_SFR* - tag used to identify the source service function router (the start of the network path)
* *target_SFR* - tag used to identify the target service function router (the end of the network path)
* *FQDN*- tag used to identify the media service
* *sf_instance* - tag used to identify the SFI
* *endpoint* - tag used to identify the EP implementing the SFI
* *delay_forward* - network delay for the path in forward direction
* *delay_reverse* - network delay for path in reverse direction
* *delay_service* - media service component response time
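The aggregation into **e2e_delays** can be sketched in Python as a join between network and service measurements. This is a minimal sketch under stated assumptions, not the CLMC implementation: the class and function names are hypothetical, a service row is matched to a path when its **sfr** tag equals the path's **target_SFR** tag, the reverse delay is taken from the row with source and target swapped, and the latest of the three timestamps is used for the new row.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NetworkDelay:
    path_id: str
    source_sfr: str
    target_sfr: str
    delay: float  # ms
    time: int


@dataclass(frozen=True)
class ServiceDelay:
    sf_instance: str
    sfr: str
    endpoint: str
    response_time: float  # ms
    time: int


def build_e2e_rows(network_delays: List[NetworkDelay],
                   service_delays: List[ServiceDelay]) -> List[Dict]:
    """Combine network and service rows into e2e_delays rows."""
    # Index network rows by direction so the reverse delay can be looked up.
    by_direction = {(n.source_sfr, n.target_sfr): n for n in network_delays}
    rows = []
    for forward in network_delays:
        reverse = by_direction.get((forward.target_sfr, forward.source_sfr))
        if reverse is None:
            continue  # no measurement for the reverse direction
        for service in service_delays:
            if service.sfr != forward.target_sfr:
                continue  # service is not attached to the target SFR
            rows.append({
                "path_ID": forward.path_id,
                "source_SFR": forward.source_sfr,
                "target_SFR": forward.target_sfr,
                "sf_instance": service.sf_instance,
                "endpoint": service.endpoint,
                "delay_forward": forward.delay,
                "delay_reverse": reverse.delay,
                "delay_service": service.response_time,
                "time": max(forward.time, reverse.time, service.time),
            })
    return rows


# Example rows from this document; the reverse-direction row is an assumption.
network = [
    NetworkDelay("SFR-A---S1---S2---S3---SFR-B", "SFR-A", "SFR-B", 10, 1525334761282000),
    NetworkDelay("SFR-A---S1---S2---S3---SFR-B", "SFR-B", "SFR-A", 10, 1525334761282000),
]
services = [
    ServiceDelay("media-service.ict-flame.eu", "SFR-B", "server1", 27, 1525334761282000),
]
e2e_rows = build_e2e_rows(network, services)
```

Summing `delay_forward`, `delay_reverse` and `delay_service` of a resulting row gives an estimate of the round trip time for that path and endpoint.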
......@@ -224,11 +224,11 @@ Overall, from this analysis, the following data will be reported to Influx in th
| SFR3-SFR2 | SFR3 | SFR2 | 21 | 1525334761282000 |
| SFR3-SFR2 | SFR2 | SFR3 | 21 | 1525334761282000 |
### Monitoring SFI response times
### Monitoring SFI/EP response times
Readers of the [CLMC information model](clmc-information-model.md) will already be aware of the approach to identifying and reporting SFI performance metrics in the FLAME project. The global measurement tags that help in a decision context are used in this case to provide the mapping between network measurements and a specific service response time. Specifically, we use the SFR tag encapsulated in the media service global tags to cross-reference against target SFR tags (described above).
In its simplest case, a media service function's response time could be defined as a single value that derives from the (average) time spent processing requests in local memory and/or on disk. Indeed, a number of the FLAME foundation media service metrics sent to the CLMC could be described as such. In more advanced cases (such as for clients 1 and 3 in our example above) the full service function chain is implemented across more than one endpoint. Here we have at least two options:
In its simplest case, a media SFI's response time could be defined as a single value that derives from the (average) time spent processing requests in local memory and/or on disk. Indeed, a number of the FLAME foundation media service metrics sent to the CLMC could be described as such. In more advanced cases (such as for clients 1 and 3 in our example above) the full service function chain is implemented across more than one endpoint. Here we have at least two options:
1. Let the first SFI in an SFC be representative of the entire service function delay (making opaque the sub-calls to other SFIs required to fulfil the client's request)
......