Commit 59a0fa32 authored by Nikolay Stanchev

Merged integration into clmcservice

parents 8ec0b5b2 a7cb9107
Showing
with 1399 additions and 14 deletions
......@@ -7,4 +7,3 @@ scripts/* text eol=lf
*.png binary
*.jpg binary
*.build-config/
\ No newline at end of file
*.pytest_cache/
\ No newline at end of file
......@@ -5,10 +5,11 @@
*egg-info*
*git-commit-ref*
*_version.py*
*reporc
ubuntu-xenial-16.04-cloudimg-console.log
.idea/
*.egg
*.pyc
.pytest_cache
.tox
*$py.class
**/.pytest_cache/
\ No newline at end of file
......@@ -97,7 +97,8 @@ Vagrant.configure("2") do |config|
instance_config.vm.provision :shell, :path => "clmctest/services/#{host["service_name"]}/install.sh", env: {"REPO_ROOT" => "/vagrant"}
# CLMC agent install
instance_config.vm.provision :shell, :path => "scripts/clmc-agent/install.sh"
instance_config.vm.provision "file", source: "reporc", destination: "/vagrant/reporc"
instance_config.vm.provision :shell, :path => "scripts/clmc-agent/install.sh", env: {"REPO_ROOT" => "/vagrant"}
# CLMC agent service specific input configuration
instance_config.vm.provision :shell, inline: <<-SHELL
......@@ -113,7 +114,7 @@ Vagrant.configure("2") do |config|
# CLMC agent general and output configuration
#instance_config.vm.provision :shell, :path => "scripts/clmc-agent/configure_template.sh"
instance_config.vm.provision :shell, :path => "scripts/clmc-agent/configure.sh", :args => "#{host["location"]} #{host["sfc_id"]} #{host["sfc_id_instance"]} #{host["sf_id"]} #{host["sf_id_instance"]} #{host["ipendpoint_id"]} #{host["influxdb_url"]} #{host["database_name"]}"
instance_config.vm.provision :shell, :path => "scripts/clmc-agent/configure.sh", :args => "#{host["location"]} #{host["sfc_id"]} #{host["sfc_id_instance"]} #{host["sf_id"]} #{host["sf_id_instance"]} #{host["ipendpoint_id"]} #{host["sr_id"]} #{host["influxdb_url"]} #{host["database_name"]}"
# CLMC start agent
instance_config.vm.provision :shell, inline: "service telegraf restart"
......
## (c) University of Southampton IT Innovation Centre, 2018
##
## Copyright in this software belongs to University of Southampton
## IT Innovation Centre of Gamma House, Enterprise Road,
## Chilworth Science Park, Southampton, SO16 7NS, UK.
##
## This software may not be used, sold, licensed, transferred, copied
## or reproduced in whole or in part in any manner or form or in or
## on any media by any person other than in accordance with the terms
## of the Licence Agreement supplied with the software, or otherwise
## without the prior written consent of the copyright owners.
##
## This software is distributed WITHOUT ANY WARRANTY, without even the
## implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
## PURPOSE, except where stated in the Licence Agreement supplied with
## the software.
##
## Created By : Michael Boniface
## Created Date : 02-02-2018
## Created for Project : FLAME
hosts:
- name: clmc-service
cpus: 1
memory: 2048
disk: "10GB"
forward_ports:
- guest: 8086
host: 8086
- guest: 8888
host: 8888
- guest: 9092
host: 9092
ip_address: "172.40.231.51"
- name: minio
service_name: "minio"
cpus: 1
memory: 2048
disk: "10GB"
forward_ports:
- guest: 9000
host: 9000
ip_address: "172.40.231.155"
location: "DC1"
sfc_id: "MS_Template_1"
sfc_id_instance: "MS_I1"
sf_id: "adaptive_streaming"
sf_id_instance: "adaptive_streaming_I1"
ipendpoint_id: "adaptive_streaming_I1_minio"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
- name: test-runner
cpus: 1
memory: 2048
disk: "10GB"
ip_address: "172.40.231.200"
......@@ -47,6 +47,7 @@ hosts:
sf_id: "adaptive_streaming"
sf_id_instance: "adaptive_streaming_I1"
ipendpoint_id: "adaptive_streaming_I1_apache1"
sr_id: "service_router"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
- name: nginx
......@@ -64,6 +65,7 @@ hosts:
sf_id: "adaptive_streaming"
sf_id_instance: "adaptive_streaming_nginx_I1"
ipendpoint_id: "adaptive_streaming_nginx_I1_apache1"
sr_id: "service_router"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
- name: mongo
......@@ -81,6 +83,7 @@ hosts:
sf_id: "metadata_database"
sf_id_instance: "metadata_database_I1"
ipendpoint_id: "metadata_database_I1_apache1"
sr_id: "service_router"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
- name: ffmpeg
......@@ -98,6 +101,7 @@ hosts:
sf_id: "metadata_database"
sf_id_instance: "metadata_database_I1"
ipendpoint_id: "metadata_database_I1_apache1"
sr_id: "service_router"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
- name: host
......@@ -115,11 +119,12 @@ hosts:
sf_id: "adaptive_streaming"
sf_id_instance: "adaptive_streaming_I1"
ipendpoint_id: "adaptive_streaming_I1_apache1"
sr_id: "service_router"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
- name: test-runner
cpus: 1
memory: 2048
cpus: 2
memory: 4096
disk: "10GB"
ip_address: "172.40.231.200"
- name: minio
......@@ -137,5 +142,6 @@ hosts:
sf_id: "adaptive_streaming"
sf_id_instance: "adaptive_streaming_I1"
ipendpoint_id: "adaptive_streaming_I1_minio"
sr_id: "service_router"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
\ No newline at end of file
#!/usr/bin/python3
"""
## © University of Southampton IT Innovation Centre, 2018
##
## Copyright in this software belongs to University of Southampton
## IT Innovation Centre of Gamma House, Enterprise Road,
## Chilworth Science Park, Southampton, SO16 7NS, UK.
##
## This software may not be used, sold, licensed, transferred, copied
## or reproduced in whole or in part in any manner or form or in or
## on any media by any person other than in accordance with the terms
## of the Licence Agreement supplied with the software, or otherwise
## without the prior written consent of the copyright owners.
##
## This software is distributed WITHOUT ANY WARRANTY, without even the
## implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
## PURPOSE, except where stated in the Licence Agreement supplied with
## the software.
##
## Created By : Michael Boniface
## Created Date : 29-04-2018
## Created for Project : FLAME
"""
import pytest
import time
import random
import logging
import sys
from config_collector import ConfigCollector
STATE_INDEX = 0
TIME_INDEX = 1
samples = [[['active', 0], ['active', 2]],
[['active', 0], ['active', 2], ['active', 4]],
[['active', 0], ['failed', 2]],
[['active', 0], ['active', 2], ['inactive', 4], ['active', 6], ['failed', 8], ['inactive', 10]],
[['active', 0], ['inactive', 2], ['failed', 4], ['active', 6], ['inactive', 8], ['failed', 10]]]
def get_sample_test():
    global sample_set
    global current_index
    # Return the next canned state, stamped with the current wall-clock time
    sample = (samples[sample_set][current_index][STATE_INDEX], time.time())
    sample_count = len(samples[sample_set])
    # Wrap around to the first sample once the set is exhausted
    if current_index < sample_count - 1:
        current_index += 1
    else:
        current_index = 0
    return sample
def write_output(measurement):
    print("Writing measurement output {0}".format(measurement))
sample_set = 0
current_index = 0
def test_agg():
t = ConfigCollector(get_sample_test, write_output, "resource")
measurement = t.create_measurement(samples[0], 10, 12)
assert measurement[0]['fields']['current_state'] == 'active'
assert measurement[0]['fields']['current_state_time'] == 12
assert measurement[0]['fields']['active_sum'] == 12
assert measurement[0]['fields']['active_count'] == 1
assert measurement[0]['time'] == 12000000000
t = ConfigCollector(get_sample_test, write_output, "resource")
measurement = t.create_measurement(samples[1], 10, 14)
assert measurement[0]['fields']['current_state'] == 'active'
assert measurement[0]['fields']['current_state_time'] == 14
assert measurement[0]['fields']['active_sum'] == 14
assert measurement[0]['fields']['active_count'] == 1
assert measurement[0]['time'] == 14000000000
t = ConfigCollector(get_sample_test, write_output, "resource")
measurement = t.create_measurement(samples[2], 8, 10)
assert measurement[0]['fields']['current_state'] == 'failed'
assert measurement[0]['fields']['current_state_time'] == 0
assert measurement[0]['fields']['active_sum'] == 2
assert measurement[0]['fields']['active_count'] == 1
assert measurement[0]['fields']['failed_sum'] == 0
assert measurement[0]['fields']['failed_count'] == 1
assert measurement[0]['time'] == 10000000000
t = ConfigCollector(get_sample_test, write_output, "resource")
measurement = t.create_measurement(samples[3], 2, 12)
assert measurement[0]['fields']['current_state'] == 'inactive'
assert measurement[0]['fields']['current_state_time'] == 0
assert measurement[0]['fields']['active_sum'] == 6
assert measurement[0]['fields']['active_count'] == 2
assert measurement[0]['fields']['inactive_sum'] == 2
assert measurement[0]['fields']['inactive_count'] == 2
assert measurement[0]['fields']['failed_sum'] == 2
assert measurement[0]['fields']['failed_count'] == 1
assert measurement[0]['time'] == 12000000000
t = ConfigCollector(get_sample_test, write_output, "resource")
measurement = t.create_measurement(samples[4], 4, 14)
assert measurement[0]['fields']['current_state'] == 'failed'
assert measurement[0]['fields']['current_state_time'] == 0
assert measurement[0]['fields']['active_sum'] == 4
assert measurement[0]['fields']['active_count'] == 2
assert measurement[0]['fields']['inactive_sum'] == 4
assert measurement[0]['fields']['inactive_count'] == 2
assert measurement[0]['fields']['failed_sum'] == 2
assert measurement[0]['fields']['failed_count'] == 2
assert measurement[0]['time'] == 14000000000
def test_one_period_collection():
global sample_set
global current_index
# one measurement period
sample_set = 1
current_index = 0
t = ConfigCollector(get_sample_test, write_output, "resource", 2, 6)
t.start()
time.sleep(8)
t.stop()
print("Current measurement: {0}".format(str(t.current_measurement)))
assert t.current_measurement[0]['fields']['current_state'] == 'active'
assert int(round(t.current_measurement[0]['fields']['current_state_time'])) == 6
assert int(round(t.current_measurement[0]['fields']['active_sum'])) == 6
assert int(round(t.current_measurement[0]['fields']['active_count'])) == 1
def test_multi_period_single_state_collection():
global sample_set
global current_index
# two measurement periods
sample_set = 1
current_index = 0
t = ConfigCollector(get_sample_test, write_output, "resource", 1, 3)
t.start()
time.sleep(7)
t.stop()
print("Current measurement: {0}".format(str(t.current_measurement)))
assert t.current_measurement[0]['fields']['current_state'] == 'active'
assert int(round(t.current_measurement[0]['fields']['current_state_time'])) == 6
assert int(round(t.current_measurement[0]['fields']['active_sum'])) == 6
assert int(round(t.current_measurement[0]['fields']['active_count'])) == 1
# [['active', 0], ['inactive', 2], ['failed', 4], ['active', 6], ['inactive', 8], ['failed', 10]]
def test_multi_period_multi_state_collection():
global sample_set
global current_index
# 6 samples and 2 measurement periods
sample_set = 4
current_index = 0
t = ConfigCollector(get_sample_test, write_output, "resource", 2, 10)
t.start()
time.sleep(13)
t.stop()
print("Current measurement: {0}".format(str(t.current_measurement)))
assert t.current_measurement[0]['fields']['current_state'] == 'failed'
assert int(round(t.current_measurement[0]['fields']['current_state_time'])) == 0
assert int(round(t.current_measurement[0]['fields']['active_sum'])) == 4
assert int(round(t.current_measurement[0]['fields']['active_count'])) == 2
assert int(round(t.current_measurement[0]['fields']['inactive_sum'])) == 4
assert int(round(t.current_measurement[0]['fields']['inactive_count'])) == 2
assert int(round(t.current_measurement[0]['fields']['failed_sum'])) == 2
assert int(round(t.current_measurement[0]['fields']['failed_count'])) == 2
\ No newline at end of file
#!/usr/bin/python3
"""
## © University of Southampton IT Innovation Centre, 2018
##
## Copyright in this software belongs to University of Southampton
## IT Innovation Centre of Gamma House, Enterprise Road,
## Chilworth Science Park, Southampton, SO16 7NS, UK.
##
## This software may not be used, sold, licensed, transferred, copied
## or reproduced in whole or in part in any manner or form or in or
## on any media by any person other than in accordance with the terms
## of the Licence Agreement supplied with the software, or otherwise
## without the prior written consent of the copyright owners.
##
## This software is distributed WITHOUT ANY WARRANTY, without even the
## implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
## PURPOSE, except where stated in the Licence Agreement supplied with
## the software.
##
## Created By : Michael Boniface
## Created Date : 29-04-2018
## Created for Project : FLAME
"""
import pytest
import time
import random
import logging
import sys
from systemctl_monitor import SystemctlMonitor
URL = "localhost"
PORT = "8186"
DATABASE = "CLMCMetrics"
@pytest.mark.parametrize("service_name", [('nginx')])
def test_create_measurement(telegraf_agent_config, service_name):
    # Look up the host entry for the service under test
    service = 'unknown'
    for s in telegraf_agent_config['hosts']:
        if s['name'] == service_name:
            service = s
            break
    assert service != 'unknown', "{0} not in list of hosts".format(service_name)
    mon = SystemctlMonitor(service_name, 2, 10, service['ip_address'], 8186, service['database_name'])
    report = {'time': 1526042434.1773288, 'fields': {'loaded.active.running_sum': 231.85903143882751, 'current_state_time': 231.85903143882751, 'current_state': 'loaded.active.running', 'loaded.active.running_count': 1}}
    measurement = mon.create_measurement(report)
    assert measurement[0]['tags']['resource_name'] == service_name
    assert measurement[0]['fields']['current_state'] == report['fields']['current_state']
def test_get_systemctl_status(telegraf_agent_config):
    mon = SystemctlMonitor('nginx', 2, 10, URL, PORT, DATABASE)
    state = mon.get_systemctl_status('nginx')
    assert state == 'loaded.active.running'
def test_monitor(telegraf_agent_config):
    mon = SystemctlMonitor('nginx', 2, 10, URL, PORT, DATABASE)
    mon.start()
    time.sleep(21)
    mon.stop()
    measurement = mon.get_current_measurement()
    print("Current measurement: {0}".format(str(measurement)))
\ No newline at end of file
......@@ -392,19 +392,41 @@ class Sim(object):
Calculates the network delay. Declared as static method since it doesn't need access to any instance variables.
:param distance: distance metres
:param bandwidth: bandwidth Mbps
:param bandwidth: bandwidth Mb/s
:param packet_size: packet size bytes
:param tx_video_bit_rate: bp/sec
:param tx_video_bit_rate: Mb/s
:return: the calculated network delay
"""
# propogation delay = distance/speed () (e.g 2000 metres * 2*10^8 for optical fibre)
# This is how long it takes to get a signal from one end to the other
# We are using 2E8 here rather than the speed of light (3E8) presumably because in optical fibre there are other delays with the repeaters etc?
# Isn't propogation_delay the same as "latency"?
# http://www.m2optics.com/blog/bid/70587/Calculating-Optical-Fiber-Latency suggests optical fibre is 5 microseconds / km
# 5 microseconds / km is 1/5 km / microsecond = 1000/5 m / microsecond = 1000000000/5 m/s = 2 * 100000000
# In reality we would measure the latency of a link.
propogation_delay = distance / (2 * 100000000)
# packetisation delay = ip packet size (bits)/tx rate (e.g. 100Mbp with 0% packet loss)
# This is how long it takes a whole data packet to pass a point in the wire
packetisation_delay = (packet_size * 8) / (bandwidth * 1000000)
# total number of packets to be sent
# This claims to be the number of packets, but is actually some sort of rate: the units are per second.
# If the tx_video_bit_rate was actually the size of the data to be sent then the formula is correct.
packets = (tx_video_bit_rate * 1000000) / (packet_size * 8)
# This should be time for a data point to get from one end to the other (propogation_delay) +
# time delay from the start of the data passing a point until the end of the data passes the point
# = propogation_delay + (total data size / bandwidth)
#
# A packet on the network comprises a data payload and a header. The header size is constant for a particular protocol
# and the total packet size can be configured (e.g. 1500 bytes).
# packet_size = packet_header_size + packet_payload_size
# The total data size ideally would be (data size / (packet size - header size)) * packet size
# response_delay = propogation_delay + {[(data size / (packet size - header size)) * packet size] / bandwidth}
# response_delay = latency + {[(data_size / (packet_size - packet_header_size)) * packet_size] / bandwidth}
response_delay = packets * (propogation_delay + packetisation_delay)
return response_delay
......
......@@ -47,6 +47,7 @@ hosts:
sf_id: "test-sf-clmc-agent-build"
sf_id_instance: "ms-A.ict-flame.eu"
ipendpoint_id: "endpoint1.ms-A.ict-flame.eu"
sr_id: "service_router"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
- name: ipendpoint2
......@@ -64,6 +65,7 @@ hosts:
sf_id: "test-sf-clmc-agent-build"
sf_id_instance: "ms-A.ict-flame.eu"
ipendpoint_id: "endpoint2.ms-A.ict-flame.eu"
sr_id: "service_router"
influxdb_url: "http://172.40.231.51:8086"
database_name: "CLMCMetrics"
- name: test-runner
......
{
"test_config_telegraf.py::test_write_telegraf_conf": true
}
\ No newline at end of file
......@@ -50,3 +50,32 @@ fi
nginx -s reload
systemctl start nginx
## install a configuration monitoring service, this needs to be in a venv with the rest of the CLMC
sudo apt-get install python3 python3-pip -y
sudo pip3 install pyaml influxdb
svc="nginxmon"
echo "install systemctl monitoring service"
svc_file="${svc}.service"
echo "[Unit]" > $svc_file
echo "Description=nginxmon" >> $svc_file
echo "After=network-online.target" >> $svc_file
echo "" >> $svc_file
echo "[Service]" >> $svc_file
echo "WorkingDirectory=${inst}/${dir}" >> $svc_file
echo "ExecStart=/usr/bin/python3 ${REPO_ROOT}/src/monitoring/systemctl_monitor.py -service nginx -rate 2 -agg 10 -host localhost -port 8186 -db CLMCMetrics" >> $svc_file
echo "ExecStop=/usr/bin/bash ${REPO_ROOT}/src/monitoring/stop_systemctl_monitor.sh" >> $svc_file
echo "" >> $svc_file
echo "[Install]" >> $svc_file
echo "WantedBy=network-online.target" >> $svc_file
sudo cp $svc_file /lib/systemd/system
rm $svc_file
echo "enable"
sudo systemctl daemon-reload
sudo systemctl enable ${svc}
echo "start"
sudo systemctl start ${svc}
\ No newline at end of file
......@@ -26,3 +26,8 @@
## HTTP response timeout (default: 5s)
# response_timeout = "5s"
# # Influx HTTP write listener
[[inputs.http_listener]]
## Address and port to host HTTP listener on
service_address = ":8186"
\ No newline at end of file
......@@ -47,6 +47,7 @@ hosts:
sf_id: "adaptive_streaming"
sf_id_instance: "adaptive_streaming_I1"
ipendpoint_id: "adaptive_streaming_I1_nginx1"
sr_id: "service_router"
influxdb_url: "http://192.168.50.10:8086"
database_name: "CLMCMetrics"
- name: nginx2
......@@ -64,6 +65,7 @@ hosts:
sf_id: "adaptive_streaming"
sf_id_instance: "adaptive_streaming_I1"
ipendpoint_id: "adaptive_streaming_I1_nginx2"
sr_id: "service_router"
influxdb_url: "http://192.168.50.10:8086"
database_name: "CLMCMetrics"
- name: loadtest-streaming
......@@ -81,5 +83,6 @@ hosts:
sf_id: "adaptive_streaming_client"
sf_id_instance: "adaptive_streaming_I1"
ipendpoint_id: "adaptive_streaming_I1_client1"
sr_id: "service_router"
influxdb_url: "http://192.168.50.10:8086"
database_name: "CLMCMetrics"
This diff is collapsed.
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
# Round Trip Time of a Service Request
The Round Trip Time (RTT) of a network is the time taken from sending a packet to receiving the acknowledgement. We are also interested in factoring in the size of the data being sent over the network and the delay caused by the service processing the request.
```
total_delay = forward_network_delay + service_delay + reverse_network_delay
```
## Network delay
Time to send complete payload over network
network_delay = time delay from first byte leaving source to final byte arriving at destination
If we ignore the application-layer protocol (OSI L7, e.g. HTTP, FTP, Tsunami) then we are modelling a chunk of data moving along a wire. The network delay is then:
```
network_delay = latency + (time difference from start of the data to the end of the data)
```
### Latency
The latency (or propagation delay) of the network path is the time taken for a particular bit of data to get from one end to the other. If we are just modelling one wire (with no switches) then this can be modelled using:
latency = distance / speed
For optical fibre (or even an electric wire), the speed naively would be the speed of light. In fact, the speed is slower than this (in optical fibre this is because of the internal refraction that occurs, which is different for different wavelengths). According to http://www.m2optics.com/blog/bid/70587/Calculating-Optical-Fiber-Latency the delay (1/speed) is approximately 5 microseconds / km.
```
if
distance is in m
delay is in s/m
latency is in s
then
latency = distance * 5 / 1E9
```
(this matches MJB's "propogation_delay" formula)
Normally we would just measure the latency of a link. Most real-life connections comprise many network links and many switches, each of which introduces some latency.
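As a minimal sketch of the fibre approximation above, assuming the 5 microseconds / km figure and a single uninterrupted link (the name `fibre_latency_s` is illustrative and is not part of the CLMC code):

```python
# Assumed constant: roughly 5 microseconds of delay per km of optical fibre,
# which is 5e-9 seconds per metre (the figure quoted above).
FIBRE_DELAY_S_PER_M = 5 / 1e9


def fibre_latency_s(distance_m: float) -> float:
    """Return the one-way propagation latency in seconds for a single link."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m * FIBRE_DELAY_S_PER_M


if __name__ == "__main__":
    # Latency of a hypothetical 2 km fibre link
    print(fibre_latency_s(2000))
```

For a measured link, the measured latency would simply replace the value returned by this function.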
### Data delay
The time difference from start of the data to the end of the data (or "data delay" for want of a better term) is dependent on the bandwidth of the network and the amount of data.
```
if
data_size is in Bytes
bandwidth is in Mb/s
data_delay is in s
then
data_delay = data_size * 8 / (bandwidth * 1E6)
```
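The unit conversion is easy to get wrong, so here is a small sketch of the data delay calculation. The function name `data_delay_s` is an assumption for illustration; the divisor is the whole bandwidth expressed in bits per second.

```python
def data_delay_s(data_size_bytes: float, bandwidth_mbps: float) -> float:
    """Time in seconds between the first and last bit passing a point on the link."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    bits = data_size_bytes * 8               # bytes -> bits
    bits_per_second = bandwidth_mbps * 1e6   # Mb/s -> b/s
    return bits / bits_per_second


if __name__ == "__main__":
    # 1.25 MB over a 100 Mb/s link
    print(data_delay_s(1_250_000, 100))
```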
The data_size naively is the size of the data you want to send over the network (call this the "file_size"). However, the data is split into packets and each packet has a header on it so the amount of data going over the network is actually more than the amount sent.
```
let
packet_size = packet_header_size + packet_payload_size
then
data_size = (packet_size / packet_payload_size) * file_size
or
data_size = (packet_size / (packet_size - packet_header_size)) * file_size
```
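A hedged sketch of the header overhead calculation follows. The function name and the example sizes (1500 byte packets with a 40 byte header) are assumptions for illustration. The ratio is treated as continuous, so a partially filled final packet is not modelled.

```python
def on_wire_size_bytes(file_size_bytes: float,
                       packet_size_bytes: float,
                       packet_header_size_bytes: float) -> float:
    """Approximate bytes sent on the wire once per-packet headers are included."""
    payload_size = packet_size_bytes - packet_header_size_bytes
    if payload_size <= 0:
        raise ValueError("packet size must exceed header size")
    # Each payload_size bytes of file data costs packet_size bytes on the wire
    return file_size_bytes * packet_size_bytes / payload_size


if __name__ == "__main__":
    # Hypothetical example: 1,460,000 bytes of data in 1500 byte packets
    print(on_wire_size_bytes(1_460_000, 1500, 40))
```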
### Total delay
```
delay = latency + data_delay
      = (distance * 5 / 1E9) + {[(packet_size / (packet_size - packet_header_size)) * file_size] * 8 / (bandwidth * 1E6)}
```
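Putting the pieces together, the sketch below combines latency and data delay for one direction and then sums the forward, service and reverse delays as in the first formula of this document. All names and the default packet sizes (1500 byte packets, 40 byte headers) are assumptions for illustration, not values taken from the simulator.

```python
def network_delay_s(distance_m: float,
                    file_size_bytes: float,
                    bandwidth_mbps: float,
                    packet_size_bytes: float = 1500,
                    packet_header_size_bytes: float = 40) -> float:
    """One-way delay in seconds: propagation latency plus data delay."""
    # Latency: about 5 microseconds per km of fibre
    latency = distance_m * 5 / 1e9
    # Inflate the file size by the per-packet header overhead
    payload_size = packet_size_bytes - packet_header_size_bytes
    data_size = file_size_bytes * packet_size_bytes / payload_size
    # Data delay: bits to send divided by link rate in bits per second
    data_delay = data_size * 8 / (bandwidth_mbps * 1e6)
    return latency + data_delay


def total_delay_s(forward_network_delay_s: float,
                  service_delay_s: float,
                  reverse_network_delay_s: float) -> float:
    """Round trip of a service request: request, processing, response."""
    return forward_network_delay_s + service_delay_s + reverse_network_delay_s


if __name__ == "__main__":
    # Hypothetical request: small upload, 50 ms of processing, large download
    forward = network_delay_s(2000, 1460, 100)
    reverse = network_delay_s(2000, 1_460_000, 100)
    print(total_delay_s(forward, 0.05, reverse))
```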
### Effect of Protocol
The choice of protocol has a large effect in networks with a high bandwidth-delay product.
In data communications, bandwidth-delay product is the product of a data link's capacity (in bits per second) and its round-trip delay time (in seconds). The result, an amount of data measured in bits (or bytes), is equivalent to the maximum amount of data on the network circuit at any given time, i.e., data that has been transmitted but not yet acknowledged.
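As a quick worked example of the bandwidth-delay product, the sketch below multiplies capacity by round-trip time. The function name and the example link (100 Mb/s with a 50 ms round trip) are assumptions chosen for illustration.

```python
def bandwidth_delay_product_bits(capacity_bps: float, rtt_s: float) -> float:
    """Maximum amount of unacknowledged data in flight, in bits."""
    return capacity_bps * rtt_s


if __name__ == "__main__":
    bdp_bits = bandwidth_delay_product_bits(100e6, 0.05)
    # Divide by 8 to express the same quantity in bytes
    print(bdp_bits, bdp_bits / 8)
```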
TCP for instance expects acknowledgement of every packet sent and if the sender has not received an acknowledgement within a specified time period then the packet will be retransmitted. Furthermore, TCP uses a flow-control method whereby the receiver specifies how much data it is willing to buffer and the sending host must pause sending and wait for acknowledgement once that amount of data is sent.
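One common simplification of this flow-control behaviour is that a sender can have at most one receive window of data in flight per round trip, so throughput is bounded by window size divided by RTT. The sketch below illustrates that bound only; it ignores retransmission, congestion control and window scaling, and the function names are assumptions.

```python
def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    if rtt_s <= 0:
        raise ValueError("round-trip time must be positive")
    return window_bytes * 8 / rtt_s


def effective_throughput_bps(capacity_bps: float,
                             window_bytes: float,
                             rtt_s: float) -> float:
    """Throughput is capped by whichever is smaller: the link or the window."""
    return min(capacity_bps, window_limited_throughput_bps(window_bytes, rtt_s))


if __name__ == "__main__":
    # Hypothetical 100 Mb/s link, 65535 byte window, 100 ms round trip
    print(effective_throughput_bps(100e6, 65535, 0.1))
```

With these example numbers the window, not the link capacity, is the limiting factor, which is why protocol choice matters on links with a high bandwidth-delay product.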
## Service Delay
# Understanding end-to-end media service performance in the FLAME platform
© University of Southampton IT Innovation Centre, 2018
This document describes the FLAME model of end-to-end (E2E) media service performance as it is observed and measured using the CLMC on the FLAME platform.
#### **Authors**
|Authors|Organisation|
|-|-|
|[Simon Crowle](mailto:sgc@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
## Introduction
Readers of this document are assumed to have at least read the [CLMC information model](clmc-information-model.md). Here we explore the requirements which inform the definition of metrics that determine *'end-to-end'* media service performance. Before continuing, some terms are defined:
| term | definition |
| --- | --- |
| *client* | an end-user of a FLAME media service - typically somebody accessing the service via a mobile computing device connected to a _service router_ |
| *endpoint* | an endpoint (EP) is a virtual machine (VM) connected to the FLAME network |
| *service router* | an EP that allows other EPs to communicate with one another using fully qualified domain names (FQDN), rather than IP addresses |
| *network node* | an _EP_, _service router_ or other hardware that receives and sends network traffic along network connections attached to it |
| *media component* | a media component (MC) is a process that in part or wholly realizes the functionality of a media service |
| *E2E path* | the directed, acyclic traversal of FLAME network nodes, beginning with a source _EP_ and moving to a target _EP_ via network nodes in the FLAME network |
| *E2E response time* | the total time taken for a service request to i) traverse an _E2E path_, ii) be processed at the _MC_, iii) be returned as a response via an _E2E path_ |
In the sections that follow we set out some basic properties of a potential media service and then explore these in more detail with a concrete example. Following on from this analysis we provide a test-based approach to the specification of E2E media service performance measures.
## E2E SFC chains
Let us begin by identifying some simple, generic interactions within a media service function chain (SFC):
```
// simple chain
Client --> data storage MC
// sequential chain
Client --> data processor MC --> data storage MC
// complex chain
Client --> data processor MC_A --> data processor MC_B
|-> data storage MC <-|
```
The first example above imagines a client requesting data be stored in (or retrieved from) a database managed by the MC responsible for persistence. In the second case, the client requests some processing of some data held in the data store, the results of which are also stored. Finally, the third case outlines a more complex scenario in which the client requests some processing of data which in turn generates further requests for additional data processing in other MCs which also may depend on storage I/O functionality. Here additional data processing by related MCs could include job scheduling or task decomposition and distribution to worker nodes. An advanced media service, such as a game server, is a useful example of such a service in which graphics rendering; game state modelling; artificial intelligence and network communications are handled in parallel using varying problem decomposition methods.
## E2E simple chain
Next we will define a simple network into which we will place a data processing EP and a data storage EP - we assert the clients could connect to any of the _service routers_ that link these MCs together.
![E2E network](image/e2e-simple-chain-network.png)
Our simple network consists of three _service routers_ that connect clients with MC data and storage functionality; each demand from client 1 for the storage function could be routed in one network hop from router 'A' to router 'C' or in two from routers 'A' -> 'B' -> 'C'. A demand for storage function from _client 2_ would include zero network hops.
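The routing options described above can be enumerated with a short sketch. It assumes the three service routers are fully connected and that the storage function (and _client 2_) sit at router 'C'; both are inferred from the text rather than read from a FLAME configuration, and the names `LINKS`, `simple_paths` and `hop_count` are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed topology: links A-B, B-C and A-C between the three service routers
LINKS: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
}


def simple_paths(links: Dict[str, List[str]], source: str, target: str,
                 visited: Optional[List[str]] = None) -> List[List[str]]:
    """Depth-first enumeration of all cycle-free paths from source to target."""
    visited = (visited or []) + [source]
    if source == target:
        return [visited]
    paths: List[List[str]] = []
    for neighbour in links[source]:
        if neighbour not in visited:
            paths.extend(simple_paths(links, neighbour, target, visited))
    return paths


def hop_count(path: List[str]) -> int:
    """Number of network hops is one less than the number of routers visited."""
    return len(path) - 1


if __name__ == "__main__":
    # Client 1 enters at router A; storage is reached via router C
    for path in sorted(simple_paths(LINKS, "A", "C"), key=len):
        print(path, hop_count(path))
    # Client 2 enters at router C, so no network hop is needed
    print(simple_paths(LINKS, "C", "C"))
```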
### E2E simple chain metrics
A principal metric we use is the _E2E response time_: the average time taken between a request or response being transmitted and received _within the FLAME network_. Scoping the E2E response time to within the FLAME network is an important qualification since it is only within this network that all necessary measurements can reliably be taken.
An out-going simple E2E request chain looks like this:
![E2E request steps](image/e2e-simple-chain-request-steps.png)
the delay associated with the processing of the service request is isolated to within the storage MC:
![E2E MC processing](image/e2e-simple-chain-mc-processing.png)
whilst for the response E2E delay, we see this:
![E2E response steps](image/e2e-simple-chain-response-steps.png)
Above we denote the time required for a service router to handle (or pass on) an in-coming message as _handle request_ or _handle response_. When a message is first encountered by a service router, an optimized path through the FLAME network must also be determined; this is labelled above as _route specification_. The _E2E response time_ is the sum of the request, service processing and response delays.
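The summation can be sketched as follows. The `Step` type, the step names and all durations are hypothetical placeholders for illustration; real values would come from CLMC measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One timed step on the request, processing or response leg."""
    name: str
    duration_s: float


def e2e_response_time_s(request: List[Step],
                        processing: List[Step],
                        response: List[Step]) -> float:
    """Sum of the request, service processing and response delays."""
    return sum(step.duration_s for step in request + processing + response)


if __name__ == "__main__":
    # Hypothetical durations for a single request through one service router
    request = [Step("handle request", 0.002),
               Step("route specification", 0.001),
               Step("network transfer", 0.004)]
    processing = [Step("storage MC processing", 0.050)]
    response = [Step("handle response", 0.002),
                Step("network transfer", 0.004)]
    print(e2e_response_time_s(request, processing, response))
```

Keeping the steps as separate records, rather than a single total, makes it possible to report which leg dominates the response time.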
> __Side note:__
> To understand _delay_ more robustly, we may also consider the rate at which requests or responses arrive (_arrival rate_) at each node in the network since message management (queuing, for example) will have an effect at scale. Similarly, the _payload size_ of the messages being handled could also be observed since the quantity of data traversing the SFC will also impact delay in similar, large scale scenarios.
>
\ No newline at end of file