Commit 0634e702 authored by Michael Boniface

updated scenarios

parent 2c0a506f
......@@ -36,26 +36,46 @@
##### Information Model
The information model describes the structure and format of configuration and monitoring information collected by the CLMC and how the information is used to support service management decision making
https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/monitoring.md
##### Usecase Scenario
##### Adaptive Streaming Use Case Scenario
The use case scenario provides an example usage of the information model for an MPEG-DASH adaptive streaming service
https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/adaptive-streaming-usecase-scenario.md
https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/CLMC%20monitoring%20specification%20for%20a%20basic%20scenario.md
#### Development Environment
tbd
#### Configuration and Monitoring Specification Test Framework
#### Installation
To set up a test version of the TICK Stack, run the following command from the root directory
To set up the adaptive streaming use case scenario
`vagrant up`
This will create a VM with InfluxDB, Kapacitor, Telegraf and Chronograf installed with the following ports forwarded on the host machine
This will provision the following VMs: clmc, ipendpoint1, ipendpoint2, nap1, nap2
The **clmc** VM includes InfluxDB, Kapacitor and Chronograf. The following ports are forwarded from the host machine to the clmc VM:
* Influx: 8086
* Chronograf: 8888
* Kapacitor: 9092
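Once `vagrant up` has finished, the forwarded ports can be checked from the host. The following is a minimal sketch, assuming the ports are forwarded to `localhost` with the same numbers as listed above; the function names are illustrative and not part of the project.

```python
import socket

# Ports forwarded from the host to the clmc VM, as listed above.
CLMC_PORTS = {"Influx": 8086, "Chronograf": 8888, "Kapacitor": 9092}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_clmc_ports(host="localhost"):
    """Map each service name to whether its forwarded port accepts connections."""
    return {name: port_open(host, port) for name, port in CLMC_PORTS.items()}


if __name__ == "__main__":
    # Assumption: the host side of each forwarded port is on localhost.
    for name, reachable in check_clmc_ports().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```

A port that reports as not reachable usually means the VM is still provisioning or the host port was remapped by Vagrant because of a collision.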
#### Running the simulation
Influx: 8086
Chronograf: 8888
Kapacitor: 9092
SSH into the CLMC server
#### Configuration and Monitoring Specification Test Framework
`vagrant ssh clmc`
Run a python script to generate the test data sets
`python3 vagrant/src/mediaServiceSim/simulator_v2.py`
#### Java/Unit Test Framework (Not currently used)
A Java/JUnit test framework has been developed to provide concrete examples of the CLMC monitoring specification. To build and run this test framework you will need:
1. The CLMC TICK stack installed and running (provided as a Vagrant solution in this project)
......@@ -64,7 +84,6 @@ A Java/JUnit test framework has been developed to provide concrete examples of t
3. Maven 3+ installed
- Optionally a Java IDE installed, such as NetBeans
##### Building the test framework
1. Clone this project (obviously)
......@@ -76,7 +95,6 @@ A Java/JUnit test framework has been developed to provide concrete examples of t
4. Build the project (this should automatically build and run the tests)
> From the command line: mvn test
##### Extending the test framework
This test framework is easily extendible. There are two simple tests already ready for you to explore:
......
......@@ -60,7 +60,7 @@ Vagrant.configure("2") do |config|
v.customize ["modifyvm", :id, "--cpus", 1]
end
# Install CLMC agent
# Install CLMC agent 1
config.vm.provision :shell, :path => 'scripts/influx/install-clmc-agent.sh', :args => "/vagrant/scripts/influx/telegraf_ipendpoint1.conf"
end
......@@ -73,7 +73,7 @@ Vagrant.configure("2") do |config|
v.customize ["modifyvm", :id, "--cpus", 1]
end
# Install CLMC agent
# Install CLMC agent 2
config.vm.provision :shell, :path => 'scripts/influx/install-clmc-agent.sh', :args => "/vagrant/scripts/influx/telegraf_ipendpoint2.conf"
end
......
# CLMC monitoring specification for a basic scenario
# Adaptive Streaming Use Case Scenario
## CONFIGURATION: SLICE
### Compute node configuration
## Infrastructure Slice
#### Common context
| measurement | tag |
| --- | --- |
| compute_node_config, | slice_id="SLICE1", |
### *compute_node_config*
#### Specific context
| tag | tag |
| --- | --- |
| location='DC1', | comp_node_id='c1' |
| compute_node_config | slice | location | comp_node | cpu | memory | storage | timestamp |
| --- | --- | --- | --- | --- | --- |--- | --- |
| compute_node_config | SLICE1 | locA | dc1 | 4 | 8 | 16 | 1515583926868000000 |
| compute_node_config | SLICE1 | locB | dc2 | 8 | 16 | 64 | 1515583926868000000 |
| compute_node_config | SLICE1 | locC | dc3 | 48 | 128 | 4000 | 1515583926868000000 |
#### Configurations
| field | field | field | timestamp |
| --- | --- | --- | --- |
| cpus=4, | memory=8, | storage=16 | 1515583926868000000 |
### *network_config*
| network_config | slice | network | bandwidth | timestamp |
| --- | --- | --- | --- | --- |
| network_config | SLICE1 | data1 | 100 | 1515583926868000000 |
### Network configuration
__How do we describe network configuration ?__
__What is a format of an infrastructure slices ?__
__What is the relevant information ?__
#### Common context: network
| measurement | tag |
| --- | --- |
| network_config,| slice_id='SLICE1', |
#### Specific context: network
| tag |
| --- |
| network_id="NET1" |
#### Configurations: network
| field | timestamp |
| --- | --- |
| bandwidth=400 | 1515583926868000000 |
#### Common context: Network interfaces
| measurement | tag |
| --- | --- |
| network_interface_config,| slice_id='SLICE1', |
#### Specific context: Network interfaces
| tag | tag |
| --- | --- |
| comp_node_id='c1', | port_id='enps03' |
#### Configurations: Network interfaces
| field | field | timestamp |
| --- | --- | --- |
| rx_constraint=1000, | tx_constraint=1000 | 1515583926868000000 |
## CONFIGURATION: SFC template (TOSCA)
### Media Service SFC states
__What are the SFC states ?__
### *network_interface_config*
| network_interface_config | slice | comp_node | port | network | rx_constraint | tx_constraint | timestamp |
| --- | --- | --- | --- | --- | --- |--- |--- |
| network_interface_config | SLICE1 | dc1 | enps03 | data1 | 1000 | 1000 | 1515583926868000000 |
| network_interface_config | SLICE1 | dc2 | enps03 | data1 | 1000 | 1000 | 1515583926868000000 |
| network_interface_config | SLICE1 | dc3 | enps03 | data1 | 1000 | 1000 | 1515583926868000000 |
### CONFIGURATION: Media Service SF states
__What are the SF states ?__
## NAP
### ipendpoint_route
### CONFIGURATION: Media Service SF Instance states
#### Common context
| measurement | tag | tag | tag | tag | tag |
| --- | --- | --- | --- | --- | --- |
| sf_instance_surrogate_config, | location='DC1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', |
| ipendpoint_route | \<common tags> | location | ipendpoint_id | cont_nav | avg_http_requests_fqdn_rate | avg_network_fqdn_latency | time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ipendpoint_route | \<common tags> | DC1 | ipendpoint1 | http://netflix.com/scream | 386 | 50 | 1515583926868000000 |
#### Specific context
| tag |
| --- |
| surrogate_id='MS_STREAM_1_SURROGATE_1' |
## Media Service
#### Configurations
| field | field | field | field | timestamp |
| --- | --- | --- | --- | --- |
| state='placed', | cpus=2, | memory=4, | storage=8 | 1515583926868000000 |
There are various aggregated metrics we can calculate but in the use case scenario we postpone that till later.
### CONFIGURATION: Media Service Function Instance Surrogates
#### Common context
| measurement | tag | tag | tag | tag | tag |
| --- | --- | --- | --- | --- | --- |
| \<measurement label> | location='DC1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', |
### sfc_instance_config
#### Specific context
| tag |
| --- |
| surrogate_id='MS_STREAM_1_SURROGATE_1' |
`sfc_i_config,<common_tags>,state <fields> timestamp`
#### Configurations
__QUESTION__: Do we only allow a 1-to-1 mapping between Media Service SF Instances and Surrogates w.r.t. configurations (i.e: I asked for 2 CPUs, I got 2 CPUs). If yes, we could cut some of the fields below.
### sf_i_config
| field | field | field | field | timestamp |
| --- | --- | --- | --- | --- |
| state='booted', | cpus=2, | memory=4, | storage=8 | 1515583926868000000 |
`sf_i_config,<common_tags>,state <fields> timestamp`
## IPEndpoint
## MONITORING
### Common context: Usage and Performance
All of the specific context measurements below carry the following common context (this has not been replicated, for brevity) for both usage and performance measurements. In this example, we illustrate using two surrogate VMs.
All IPEndpoint measurements have the following global tags injected by a configured Telegraf agent
| measurement | tag | tag | tag | tag | tag | tag | tag |
| --- | --- | --- | --- | --- | --- | --- | --- |
| \<measurement>, | location='DC1', | comp_node_id='c1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', | surrogate_id ='MS_STREAM_1_SURROGATE_1' |
| \<measurement>, | location='DC2', | comp_node_id='c2', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', | surrogate_id ='MS_STREAM_1_SURROGATE_2' |
* location
* compute_node
* sfc
* sfc_i
* sf
* sf_i
* ipendpoint
Also NOTE: the metrics provided in the measurements below are effectively a 'snapshot' of usage over a relatively small period of time. The length of this snapshot may vary, depending on the underlying implementation of the instrumentation, so we might have to assume this snapshot is essentially an average of a period of 1 second. Measuring 'usage' is dependent on the units, for example as a proportion of a resource or as a proportion of time.
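To make the 'snapshot as an average over 1 second' assumption concrete, the sketch below buckets raw samples into fixed windows and averages each bucket. It is an illustration only: the sample format (nanosecond timestamp, value) and the function name are assumptions, not part of the CLMC specification.

```python
from collections import defaultdict

NS_PER_SECOND = 1_000_000_000


def average_per_window(samples, window_ns=NS_PER_SECOND):
    """Average (timestamp_ns, value) samples per fixed-size time window.

    Returns a dict mapping each window's start timestamp to the mean of the
    values that fall inside that window.
    """
    buckets = defaultdict(list)
    for timestamp_ns, value in samples:
        window_start = timestamp_ns - timestamp_ns % window_ns
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical CPU user-time samples, two in the first second, one in the next.
    samples = [(0, 10), (500_000_000, 30), (1_000_000_000, 50)]
    print(average_per_window(samples))
```

If the instrumentation reports at a different interval, only `window_ns` needs to change; the unit of the averaged value (proportion of a resource or proportion of time) stays whatever the source metric uses.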
### Monitoring values: Monitor Service Function Instance Surrogate (VMs)
#### Monitoring values: CPU (Telegraf system metrics)
| measurement | \<common tags> | field | field | timestamp |
| --- | --- | --- | --- | --- |
| cpu | \<common tags> | avg_cpu_time_user=40, | avg_cpu_time_idle=5 | 1515583926868000000 |
#### Monitoring values: RAM (Telegraf system metrics)
| measurement | \<common tags> | field | field | timestamp |
| --- | --- | --- | --- | --- |
| mem | \<common tags> | avg_free=880, | total=2048 | 1515583926868000000 |
### ipendpoint_config
#### Monitoring values: Storage (Telegraf system metrics)
| measurement | \<common tags> | field | field | timestamp |
| --- | --- | --- | --- | --- |
| disk | \<common tags> | avg_free=8144, | total=1576 | 1515583926868000000 |
| ipendpoint_config | location | sfc | sfc_i | sf | sf_i | ipendpoint | state | cpu| memory | storage |timestamp |
| --- | --- | --- | --- | --- | --- |--- | --- | --- | --- | --- | --- |
| ipendpoint_config | dc1 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint1 | placed | 2 | 4 | 16 | 1515583926868000000 |
| ipendpoint_config | dc2 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint2 | placed | 8 | 16 | 64 | 1515583926868000000 |
| ipendpoint_config | dc3 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint3 | placed | 48 | 128 | 4000 | 1515583926868000000 |
| ipendpoint_config | dc1 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint1 | booted | 2 | 4 | 16 | 1515583926868000000 |
| ipendpoint_config | dc2 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint2 | booted | 8 | 16 | 64 | 1515583926868000000 |
| ipendpoint_config | dc3 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint3 | booted | 48 | 128 | 4000 | 1515583926868000000 |
#### Monitoring values: Network (FLIPS network metrics)
__Can we measure network usage for a specific VM from FLIPS monitoring?__
__Some metrics from FLIPS contain 'port' label, others not, is this intended?__
### cpu_usage
| measurement | \<common tags> | field | field | field | field | field | field | timestamp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| net_port_io | \<common tags> | avg_packet_drop_rate=0.3, | avg_packet_error_rate=0.1, | rx_bytes_port_m=13567, | rx_packets_m=768, | tx_bytes_port_m=8102, | tx_packets_port_m=356, | 1515583926868000000 |
| cpu_usage | \<common tags> | cpu | avg_cpu_time_user | avg_cpu_time_idle | timestamp |
| --- | --- | --- | --- | --- |--- |
| cpu_usage | \<common tags> | 1 | 40 | 5 | 1515583926868000000 |
### net_port_io
### Monitoring values: Surrogate Service
| net_port_io | \<common tags> | avg_packet_drop_rate | avg_packet_error_rate | rx_bytes_port_m | rx_packets_m | tx_bytes_port_m | tx_packets_port_m | timestamp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| net_port_io | \<common tags> | 0.3 | 0.1 | 13567 | 768 | 8102 | 356 | 1515583926868000000 |
QUESTIONS
1. Is the content navigation tag a fully qualified domain name (SDN based)? [Most likely: yes]
### mpegdash_service
#### Monitoring values: service demand and response
| measurement | \<common tags> | tag | tag | field | field | field | field | field | field | field | field | field | timestamp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mpegdash_service_mon | \<common tags> | cont_nav='http://netflix.com/scream' | cont_rep='h264' | avg_req_rate=10, | avg_resp_time=40, | peak_resp_time=230, | avg_error_rate=0.2, | avg_throughput=200, | \<userProfileField>=\<value> | avg_quality_delivered=5, | avg_startup_delay=1200, | avg_dropped_segments=2 | 1515583926868000000 |
| mpegdash_service_mon | \<common tags> | cont_nav | cont_rep | user_profile | avg_req_rate | avg_resp_time | peak_resp_time | avg_error_rate | avg_throughput | avg_quality_delivered | avg_startup_delay | avg_dropped_segments | timestamp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mpegdash_service_mon | \<common tags> | http://netflix.com/scream | h264 | profileA | 10 | 40 | 230 | 0.2 | 200 | 5 | 1200 | 2 | 1515583926868000000 |
| measurement | \<common tags> | tag | field | field | field | time |
| --- | --- | --- | --- | --- | --- | --- |
| surrogate_route_mon | \<common tags> | src_location='DC3', | cont_nav='http://netflix.com/scream', | avg_http_requests_fqdn_rate=386, | avg_network_fqdn_latency=50 | 1515583926868000000 |
......@@ -466,7 +466,7 @@ network_config measures the overall capacity of the network available to the pla
network_interface_config measures the connection between a compute node and a network along with any constraints on that connection.
`network_interface_config,comp_node_id,port_id rx_constraint,tx_constraint timestamp`
`network_interface_config,comp_node_id,port_id,network_id rx_constraint,tx_constraint timestamp`
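The measurement templates in this document follow the general shape of InfluxDB line protocol (`measurement,tags fields timestamp`). As a sketch of how one such point could be serialised, the helper below builds a line from a measurement name, tags and fields. The helper names are hypothetical, the example values are taken from the use case tables, and sorting tags by key is a convenience choice rather than a requirement stated in this document.

```python
def _escape(text, chars):
    """Backslash-escape each character in chars that appears in text."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Serialise one point as 'measurement,tags fields timestamp'."""
    tag_part = "".join(
        f",{_escape(key, ',= ')}={_escape(str(value), ',= ')}"
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}" for key, value in fields.items()
    )
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    print(
        to_line(
            "network_interface_config",
            {"comp_node_id": "dc1", "port_id": "enps03", "network_id": "data1"},
            {"rx_constraint": 1000, "tx_constraint": 1000},
            1515583926868000000,
        )
    )
```

Writing the constraints as integers (with the `i` suffix) is an assumption; if the fields are stored as floats the values should be passed as `1000.0` instead.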
## Platform Measurements
......@@ -491,7 +491,7 @@ NAP data usage measurement
NAP service request and response metrics
`surrogate_route_mon,node_id,cont_nav=FQDN HTTP_REQUESTS_FQDN_M, NETWORK_FQDN_LATENCY timestamp`
`ipendpoint_route,ipendpoint_id,cont_nav=FQDN HTTP_REQUESTS_FQDN_M, NETWORK_FQDN_LATENCY timestamp`
**clmc**
......@@ -517,40 +517,40 @@ Aggregate measurement derived from VM/container measurements, most likely calcul
**sf_i_monitoring**
Aggregate measurement derived from surrogate measurements, most likely calculated using a continuous query over a specific time interval
Aggregate measurement derived from ipendpoint measurements, most likely calculated using a continuous query over a specific time interval
**surrogates**
**ipendpoints**
Aggregate measurement derived from surrogate measurements, most likely calculated using a continuous query over a specific time interval
Aggregate measurement derived from ipendpoint measurements, most likely calculated using a continuous query over a specific time interval
`surrogates,<common_tags>, placed, unplaced, booted, connected`
`ipendpoints,<common_tags>, placed, unplaced, booted, connected`
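The document expects this aggregate to come from a continuous query. Purely as an illustration of the intended result, the sketch below derives the four counts in Python from a list of `ipendpoint_config` state records. The record layout (ipendpoint id, state, timestamp) and the function names are assumptions made for the example.

```python
from collections import Counter

# States reported by the ipendpoints aggregate measurement.
STATES = ("placed", "unplaced", "booted", "connected")


def latest_states(records):
    """Return the most recent state per ipendpoint.

    records is an iterable of (ipendpoint_id, state, timestamp) tuples; a
    later record with an equal timestamp overrides an earlier one.
    """
    latest = {}
    for ipendpoint_id, state, timestamp in records:
        if ipendpoint_id not in latest or timestamp >= latest[ipendpoint_id][1]:
            latest[ipendpoint_id] = (state, timestamp)
    return {endpoint: state for endpoint, (state, _) in latest.items()}


def count_states(records):
    """Count how many ipendpoints are currently in each known state."""
    counts = Counter(latest_states(records).values())
    return {state: counts.get(state, 0) for state in STATES}


if __name__ == "__main__":
    # Hypothetical records: ipendpoint1 has booted, ipendpoint2 is only placed.
    records = [
        ("ipendpoint1", "placed", 1),
        ("ipendpoint2", "placed", 1),
        ("ipendpoint1", "booted", 2),
    ]
    print(count_states(records))
```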
### Surrogate Measurements
### IPEndpoint Measurements
Surrogate measurements measure the configuration, usage and performance of VM/Container instances deployed by the platform within the context of a media service.
ipendpoint measurements measure the configuration, usage and performance of VM/Container instances deployed by the platform within the context of a media service.
Common tags
* location – a physical or virtual server for hosting node instances
* server – the location of the server
* sfc – an orchestration template
* sfc_i – an instance of the orchestration template
* sf – a SF package identifier indicating the type and version of SF
* sf_i – an instance of the SF type
* surrogate – an authoritative copy of the SF instance, either a container or VM
* server – a physical or virtual server for hosting node instances
* location – the location of the server
* ipendpoint – an authoritative copy of the SF instance, either a container or VM
#### Surrogate Measurements
#### ipendpoint Measurements
SF Host Resource Measurements measure the host resources allocated to a service function deployed by the platform. All measurements have the following global tags to allow the data to be sliced and diced according to dimensions.
**sf_instance_surrogate_config**
**ipendpoint_config**
The resources allocated to a VM/Container
`sf_instance_surrogate_config,<common_tags>,vm_state cpu,memory,storage timestamp`
`ipendpoint_config,<common_tags>,state cpu,memory,storage timestamp`
Specific tags
* vm_state
* state
**cpu_usage**
......@@ -608,7 +608,7 @@ https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/SYSTEM_
#### Network Measurements
**sf_instance_surrogate_net_port_config**
**net_port_config**
network config is concerned with any network io allocation/constraints for network rx/tx. Possible fields (but these are not available from the FLIPS monitoring specification)
......@@ -618,7 +618,7 @@ Specific tags
* port_state
* port_id
**sf_instance_surrogate_net_port_io**
**net_port_io**
All net_port_io measurements are monitored by FLIPS. Note that RX_PACKETS_M seems to have an inconsistent naming convention unless we are mistaken
......@@ -733,6 +733,14 @@ Some questions
* Can a single value of jitter (e.g. avg jitter) be calculated from the set of measurements in PACKET_JITTER_CID_M message? What is the time period for the list of jitter measurements?
* What does CID mean? consecutive identical digits
__Can we measure network usage for a specific VM from FLIPS monitoring?__
__Some metrics from FLIPS contain 'port' label, others not, is this intended?__
QUESTIONS
1. Is the content navigation tag a fully qualified domain name (SDN based)? [Most likely: yes] although this may only be part of the URL?
#### Link Measurements
Links are established between VM/container instances; we need to discuss what measurements make sense. Also, the context for links could be between media services, therefore a link measurement should be within the platform context and NOT the media service context. We need a couple of scenarios to work this one out.
......