updated scenarios

Commit 0634e702 authored 7 years ago by Michael Boniface
--- a/README.md
+++ b/README.md
@@ -36,26 +36,46 @@

 ##### Information Model

+The information model describes the structure and format of the configuration and monitoring information collected by the CLMC, and how that information is used to support service management decision making.
+
 https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/monitoring.md

-##### Usecase Scenario
+##### Adaptive Streaming Use Case Scenario
+
+The use case scenario provides an example usage of the information model for an MPEG-DASH adaptive streaming service.
+
+https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/adaptive-streaming-usecase-scenario.md

-https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/CLMC%20monitoring%20specification%20for%20a%20basic%20scenario.md
+#### Development Environment

+tbd

-#### Configuration and Monitoring Specification Test Framework
+#### Installation

-To set up a test version of Tick Stack run the following command from the root diec
+To set up the adaptive streaming use case scenario, run:

 `vagrant up`

-The will create a VM with InfluxDB, Kapacitor, Telegraf and Capacitor installed with the following ports forwarded on the host machine
+This will provision the following VMs: clmc, ipendpoint1, ipendpoint2, nap1, nap2.
+
+The **clmc** VM includes InfluxDB, Kapacitor and Chronograf. The following ports are forwarded from the host machine to the clmc VM:
+
+* Influx: 8086 
+* Chronograf: 8888
+* Kapacitor: 9092
+
+#### Running the simulation

-Influx: 8086 
-Chronograf: 8888
-Kapacitor: 9092
+SSH into the CLMC server

-#### Configuration and Monitoring Specification Test Framework
+`vagrant ssh clmc`
+
+Run the Python script that generates the test data sets
+
+`python3 vagrant/src/mediaServiceSim/simulator_v2.py`
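Once the simulator has run, one way to check that data arrived is to query InfluxDB through the forwarded port 8086. The sketch below only builds the query URL; it assumes the InfluxDB 1.x HTTP `/query` endpoint, and the database name `example_db` is a hypothetical placeholder, since the README does not name the database the simulator writes to.

```python
from urllib.parse import urlencode

# Port forwarded to the clmc VM for InfluxDB, as listed in the README.
INFLUX_PORT = 8086


def influx_query_url(query, db, host="localhost", port=INFLUX_PORT):
    """Build a URL for the InfluxDB 1.x HTTP /query endpoint.

    `db` is whatever database the simulator populated (an assumption here).
    """
    params = urlencode({"db": db, "q": query})
    return f"http://{host}:{port}/query?{params}"


if __name__ == "__main__":
    # "example_db" is hypothetical; substitute the real database name.
    print(influx_query_url("SHOW MEASUREMENTS", db="example_db"))
```

The resulting URL can be opened with `curl` or `urllib.request.urlopen` from the host machine while the VMs are running.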
+
+
+#### Java/JUnit Test Framework (Not currently used)
 A Java/JUnit test framework has been developed to provide concrete examples of the CLMC monitoring specification. To build and run this test framework you will need:

 1. The CLMC TICK stack installed and running (provided as a Vagrant solution in this project)
@@ -64,7 +84,6 @@ A Java/JUnit test framework has been developed to provide concrete examples of t
 3. Maven 3+ installed
  - Optionally a Java IDE installed, such as NetBeans

-
 ##### Building the test framework

 1. Clone this project (obviously)
@@ -76,7 +95,6 @@ A Java/JUnit test framework has been developed to provide concrete examples of t
 4.  Build the project (this should automatically build and run the tests)
    > From the command line: mvn test

-
 ##### Extending the test framework
 This test framework is easily extendible. There are two simple tests already ready for you to explore:


--- a/Vagrantfile
+++ b/Vagrantfile
@@ -60,7 +60,7 @@ Vagrant.configure("2") do |config|
        v.customize ["modifyvm", :id, "--cpus", 1]
      end

-      # Install CLMC agent
+      # Install CLMC agent 1
      config.vm.provision :shell, :path => 'scripts/influx/install-clmc-agent.sh', :args => "/vagrant/scripts/influx/telegraf_ipendpoint1.conf"
  end    

@@ -73,7 +73,7 @@ Vagrant.configure("2") do |config|
        v.customize ["modifyvm", :id, "--cpus", 1]
      end

-      # Install CLMC agent
+      # Install CLMC agent 2
      config.vm.provision :shell, :path => 'scripts/influx/install-clmc-agent.sh', :args => "/vagrant/scripts/influx/telegraf_ipendpoint2.conf"
  end
  

--- a/docs/CLMC monitoring specification for a basic scenario.md
+++ b/docs/CLMC monitoring specification for a basic scenario.md
-# CLMC monitoring specification for a basic scenario
+# Adaptive Streaming Use Case Scenario

-## CONFIGURATION: SLICE
-### Compute node configuration
+## Infrastructure Slice

-#### Common context
-| measurement | tag |
-| --- | --- | --- |
-| compute_node_config, | slice_id="SLICE1", | 
+### *compute_node_config*

-#### Specific context
-| tag | tag |
-| --- | --- |
-| location='DC1', | comp_node_id='c1' |
+| compute_node_config | slice | location | comp_node | cpu | memory | storage | timestamp |
+| --- | --- | --- | --- | --- | --- |--- | --- |
+| compute_node_config | SLICE1 | locA | dc1 | 4 | 8 | 16 | 1515583926868000000 |
+| compute_node_config | SLICE1 | locB | dc2 | 8 | 16 | 64 | 1515583926868000000 |
+| compute_node_config | SLICE1 | locC | dc3 | 48 | 128 | 4000 | 1515583926868000000 |

-#### Configurations
-| field | field | field | timestamp |
-| --- | --- | --- | --- |
-| cpus=4, | memory=8, | storage=16 | 1515583926868000000 |
+### *network_config*
+
+| network_config | slice | network | bandwidth | timestamp |
+| --- | --- | --- | --- | --- |
+| network_config | SLICE1 | data1 | 100 | 1515583926868000000 |

-### Network configuration
 __How do we describe network configuration ?__
 __What is a format of an infrastructure slices ?__
 __What is the relevant information ?__

-#### Common context: network
-| measurement | tag |
-| --- | --- | --- |
-| network_config,| slice_id='SLICE1', |
-
-#### Specific context: network
-| tag |
-| --- | 
-| network_id="NET1" |
-
-#### Configurations: network
-| field | timestamp |
-| --- | --- | --- | --- | --- |
-| bandwidth=400 | 1515583926868000000 |
-
+### *network_interface_config*

-#### Common context: Network interfaces
-| measurement | tag |
-| --- | --- | --- |
-| network_interface_config,| slice_id='SLICE1', |
+| network_interface_config | slice | comp_node | port | network | rx_constraint | tx_constraint | timestamp |
+| --- | --- | --- | --- | --- | --- |--- |--- | 
+| network_interface_config | SLICE1 | dc1 | enps03 | data1 | 1000 | 1000 | 1515583926868000000 |
+| network_interface_config | SLICE1 | dc2 | enps03 | data1 | 1000 | 1000 | 1515583926868000000 |
+| network_interface_config | SLICE1 | dc3 | enps03 | data1 | 1000 | 1000 | 1515583926868000000 |

-#### Specific context: Network interfaces
-| tag | tag |
-| --- | --- |
-| comp_node_id='c1', | port_id='enps03' |
+## NAP

-#### Configurations: Network interfaces
-| field | field | timestamp |
-| --- | --- | --- | --- |
-| rx_constraint=1000, | tx_constraint=1000 | 1515583926868000000 |
+### ipendpoint_route

+| ipendpoint_route | \<common tags> | location | ipendpoint_id | cont_nav | avg_http_requests_fqdn_rate | avg_network_fqdn_latency | time |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| ipendpoint_route | \<common tags> | DC1 | ipendpoint1 | http://netflix.com/scream | 386 | 50 | 1515583926868000000 |

-## CONFIGURATION: SFC template (TOSCA)
-### Media Service SFC states
-__What are the SFC states ?__
-
+## Media Service 

-### CONFIGURATION: Media Service SF states
-__What are the SF states ?__
+Various aggregated metrics could be calculated, but this use case scenario defers them until later.

+### sfc_instance_config

-### CONFIGURATION: Media Service SF Instance states
-#### Common context
-| measurement | tag | tag | tag | tag | tag |
-| --- | --- | --- | --- | --- | --- |
-| sf_instance_surrogate_config, | location='DC1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', | 
+`sfc_i_config,<common_tags>,state <fields> timestamp`

-#### Specific context
-| tag | 
-| --- |
-| surrogate_id='MS_STREAM_1_SURROGATE_1' | 
+### sf_i_config

-#### Configurations
-| field | field | field | field | timestamp |
-| --- | --- | --- | --- | --- |
-| state='placed', | cpus=2, | memory=4, | storage=8 | 1515583926868000000 |
-
-### CONFIGURATION: Media Service Function Instance Surrogates
-#### Common context
-| measurement | tag | tag | tag | tag | tag |
-| --- | --- | --- | --- | --- | --- | --- |
-| \<measurement label> | location='DC1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', | 
+`sf_i_config,<common_tags>,state <fields> timestamp`

-#### Specific context
-| tag |
-| --- |
-| surrogate_id='MS_STREAM_1_SURROGATE_1' | 
+## IPEndpoint

-#### Configurations
-__QUESTION__: Do we only allow a 1-to-1 mapping between Media Service SF Instances and Surrogates w.r.t. configurations (i.e: I asked for 2 CPUs, I got 2 CPUs). If yes, we could cut some of the fields below.
+All IPEndpoint measurements have the following global tags injected by a configured Telegraf agent

-| field | field | field | field | timestamp |
-| --- | --- | --- | --- | --- |
-| state='booted', | cpus=2, | memory=4, | storage=8 | 1515583926868000000 |
+* location
+* compute_node
+* sfc
+* sfc_i
+* sf
+* sf_i
+* ipendpoint

+Also NOTE: the metrics provided in the measurements below are effectively a 'snapshot' of usage over a relatively small period of time. The length of this snapshot may vary, depending on the underlying implementation of the instrumentation, so we might have to assume this snapshot is essentially an average of a period of 1 second. Measuring 'usage' is dependent on the units, for example as a proportion of a resource or as a proportion of time.

-## MONITORING
-### Common context: Usage and Performance
-All of the specific context measurements below carry the following common context (this has not be replicated for brevity) for both usage and performance measurements. In this example, we illustrate using two surrogate VMs.
+### ipendpoint_config

-| measurement | tag | tag | tag | tag | tag | tag | tag |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| \<measurement>, | location='DC1', | comp_node_id='c1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', |  surrogate_id ='MS_STREAM_1_SURROGATE_1' |
-| \<measurement>, | location='DC2', | comp_node_id='c2', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', |  surrogate_id ='MS_STREAM_1_SURROGATE_2' |
+| ipendpoint_config | location | sfc | sfc_i | sf | sf_i | ipendpoint | state | cpu | memory | storage | timestamp |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| ipendpoint_config | dc1 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint1 | placed | 2 | 4 | 16 | 1515583926868000000 |
+| ipendpoint_config | dc2 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint2 | placed | 8 | 16 | 64 | 1515583926868000000 |
+| ipendpoint_config | dc3 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint3 | placed | 48 | 128 | 4000 | 1515583926868000000 |
+| ipendpoint_config | dc1 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint1 | booted | 2 | 4 | 16 | 1515583926868000000 |
+| ipendpoint_config | dc2 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint2 | booted | 8 | 16 | 64 | 1515583926868000000 |
+| ipendpoint_config | dc3 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint3 | booted | 48 | 128 | 4000 | 1515583926868000000 |

-Also NOTE: the metrics provided in the measurements below are effectively a 'snapshot' of usage over a relatively small period of time. The length of this snapshot may vary, depending on the underlying implementation of the instrumentation, so we might have to assume this snapshot is essentially an average of a period of 1 second. Measuring 'usage' is dependent on the units, for example as a proportion of a resource or as a proportion of time.
+### cpu_usage

-### Monitoring values: Monitor Service Function Instance Surrogate (VMs)
-#### Monitoring values: CPU (Telegraf system metrics)
-| measurement | \<common tags> | field | field | timestamp |
-| --- | --- | --- | --- | --- |
-| cpu | \<common tags> | avg_cpu_time_user=40, | avg_cpu_time_idle=5 | 1515583926868000000 |
-
-#### Monitoring values: RAM (Telegraf system metrics)
-| measurement | \<common tags> | field | field | timestamp |
+| cpu_usage | \<common tags> | cpu | avg_cpu_time_user | avg_cpu_time_idle | timestamp |
 | --- | --- | --- | --- | --- |--- |
-| mem | \<common tags> | avg_free=880, | total=2048 | 1515583926868000000 |
+| cpu_usage | \<common tags> | 1 | 40 | 5 | 1515583926868000000 |

-#### Monitoring values: Storage (Telegraf system metrics)
-| measurement | \<common tags> | field | field | timestamp |
-| --- | --- | --- | --- | --- | --- | --- |
-| disk | \<common tags> | avg_free=8144, | total=1576 | 1515583926868000000 |
+### net_port_io

-#### Monitoring values: Network (FLIPS network metrics)
-__Can we measure network usage for a specific VM from FLIPS monitoring?__
-__Some metrics from FLIPS contain 'port' label, others not, is this intended?__
-
-| measurement | \<common tags> | field | field | field | field | field | field | timestamp |
+| net_port_io | \<common tags> | avg_packet_drop_rate | avg_packet_error_rate | rx_bytes_port_m | rx_packets_m | tx_bytes_port_m | tx_packets_port_m | timestamp |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| net_port_io | \<common tags> | avg_packet_drop_rate=0.3, | avg_packet_error_rate=0.1, | rx_bytes_port_m=13567, | rx_packets_m=768, | tx_bytes_port_m=8102, | tx_packets_port_m=356, | 1515583926868000000 |
-
+| net_port_io | \<common tags> | 0.3 | 0.1 | 13567 | 768 | 8102 | 356 | 1515583926868000000 |

-### Monitoring values: Surrogate Service
+### mpegdash_service

-QUESTIONS
-1. Is the content navigation tag and fully qualified domain name (SDN based)? [Most likely: yes]
-
-#### Monitoring values: service demand and response
-| measurement | \<common tags> | tag | tag | field | field | field | field | field | field | field | field | field | timestamp |
+| mpegdash_service_mon | \<common tags> | cont_nav | cont_rep | user_profile |avg_req_rate | avg_resp_time | peak_resp_time | avg_error_rate | avg_throughput | avg_quality_delivered | avg_startup_delay | avg_dropped_segments |  timestamp |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |--- |
-| mpegdash_service_mon | \<common tags> | cont_nav='http://netflix.com/scream' | cont_rep='h264' | avg_req_rate=10, | avg_resp_time=40, | peak_resp_time=230, | avg_error_rate=0.2, | avg_throughput=200, | \<userProfileField>=\<value> | avg_quality_delivered=5, | avg_startup_delay=1200, | avg_dropped_segments=2 | 1515583926868000000 |
+| mpegdash_service_mon | \<common tags> | http://netflix.com/scream | h264 | profileA | 10 | 40 | 230 | 0.2 | 200 | 5 | 1200 | 2 | 1515583926868000000 |

-| measurement | \<common tags> | tag | field | field | field | time |
-| --- | --- | --- | --- | --- | --- | --- |
-| surrogate_route_mon | \<common tags> | src_location='DC3', | cont_nav='http://netflix.com/scream', | avg_http_requests_fqdn_rate=386, | avg_network_fqdn_latency=50 | 1515583926868000000 |
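Several tables in this scenario document pair one header row with one or more value rows, and a row with more or fewer cells than its header is easy to miss in review. As a minimal sketch, assuming the diff prefix (`+`, `-` or space) has already been stripped from each line, the hypothetical helper `parse_table_row` below maps header names to row values and raises an error when the column counts differ.

```python
def split_cells(line):
    """Split one markdown table line into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table_row(header, row):
    """Map header names to row values, rejecting mismatched column counts."""
    names = split_cells(header)
    values = split_cells(row)
    if len(names) != len(values):
        raise ValueError(
            f"header has {len(names)} columns but row has {len(values)}"
        )
    return dict(zip(names, values))


if __name__ == "__main__":
    header = "| network_config | slice | network | bandwidth | timestamp |"
    row = "| network_config | SLICE1 | data1 | 100 | 1515583926868000000 |"
    print(parse_table_row(header, row))
```

The first column of each table holds the measurement name, so the first key and its value are identical for well-formed rows.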

--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -466,7 +466,7 @@ network_config measures the overall capacity of the network available to the pla

 network_interface_config measures the connection between a compute node and a network along with any constraints on that connection.  

-`network_interface_config,comp_node_id,port_id rx_constraint,tx_constraint timestamp`
+`network_interface_config,comp_node_id,port_id,network_id rx_constraint,tx_constraint timestamp`

 ## Platform Measurements 

@@ -491,7 +491,7 @@ NAP data usage measurement

 NAP service request and response metrics

-`surrogate_route_mon,node_id,cont_nav=FQDN HTTP_REQUESTS_FQDN_M, NETWORK_FQDN_LATENCY timestamp`
+`ipendpoint_route,ipendpoint_id,cont_nav=FQDN HTTP_REQUESTS_FQDN_M, NETWORK_FQDN_LATENCY timestamp`

 **clmc**

@@ -517,40 +517,40 @@ Aggregate measurement derived from VM/container measurements, most likely calcul

 **sf_i_monitoring**

-Aggregate measurement derived from surrogate measurements, most likely calculated using a continuous query over a specific time interval
+Aggregate measurement derived from ipendpoint measurements, most likely calculated using a continuous query over a specific time interval

-**surrogates**
+**ipendpoints**

-Aggregate measurement derived from surrogate measurements, most likely calculated using a continuous query over a specific time interval
+Aggregate measurement derived from ipendpoint measurements, most likely calculated using a continuous query over a specific time interval

-`surrogates,<common_tags>, placed, unplaced, booted, connected`
+`ipendpoints,<common_tags>, placed, unplaced, booted, connected`

-### Surrogate Measurements
+### IPEndpoint Measurements

-Surrogate measurements measure the configuration, usage and performance of VM/Container instances deployed by the platform within the context of a media service.
+ipendpoint measurements measure the configuration, usage and performance of VM/Container instances deployed by the platform within the context of a media service.

 Common tags

+* location – the location of the server
+* server – a physical or virtual server for hosting node instances
 * sfc – an orchestration template
 * sfc_i – an instance of the orchestration template
 * sf – a SF package identifier indicating the type and version of SF
 * sf_i – an instance of the SF type
-* surrogate – an authoritive copy of the SF instance either a container or VM
-* server – a physical or virtual server for hosting nodes instances
-* location – the location of the server
+* ipendpoint – an authoritative copy of the SF instance, either a container or a VM

-#### Surrogate Measurements
+#### ipendpoint Measurements

 SF Host Resource Measurements measures the host resources allocated to a service function deployed by the platform. All measurements have the following global tags to allow the data to be sliced and diced according to dimensions.

-**sf_instance_surrogate_config**
+**ipendpoint_config**

 The resources allocated to a VM/Container

-`sf_instance_surrogate_config,<common_tags>,vm_state cpu,memory,storage timestamp`
+`ipendpoint_config,<common_tags>,state cpu,memory,storage timestamp`

 Specific tags
-* vm_state
+* state
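The `ipendpoint_config` template above follows the shape of InfluxDB line protocol: measurement and tags, then fields, then a timestamp. The sketch below shows one way to render such a point as a string. The function name `to_line_protocol` is hypothetical, only a subset of the common tags is used, and the tag and field values are taken from the adaptive streaming scenario tables; treating cpu, memory and storage as integer fields is an assumption.

```python
def _escape_key(text):
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return str(text).replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(text):
    """Escape commas and spaces in a measurement name."""
    return str(text).replace(",", r"\,").replace(" ", r"\ ")


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line-protocol string; tags are sorted by key."""
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line_protocol(
        "ipendpoint_config",
        {"location": "dc1", "ipendpoint": "ipendpoint1", "state": "placed"},
        {"cpu": 2, "memory": 4, "storage": 16},
        1515583926868000000,
    ))
```

Keeping `state` as a tag rather than a field matches the template, which lists it among the specific tags.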

 **cpu_usage**

@@ -608,7 +608,7 @@ https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/SYSTEM_

 #### Network Measurements

-**sf_instance_surrogate_net_port_config**
+**net_port_config**

 network config is concerned with any network io allocation/constraints for network rx/tx. Possible fields (but these are not available from the FLIPS monitoring specification)

@@ -618,7 +618,7 @@ Specific tags
 * port_state
 * port_id

-**sf_instance_surrogate_net_port_io**
+**net_port_io**

 All net_port_io measurements are monitored by FLIPS. Note that RX_PACKETS_M seems to have an inconsistent naming convention, unless we are mistaken

@@ -733,6 +733,14 @@ Some questions
 * Can a single value of jitter (e.g. avg jitter) be calculated from the set of measurements in PACKET_JITTER_CID_M message? What is the time period for the list of jitter measurements?
 * What does CID  mean? consecutive identical digits

+
+__Can we measure network usage for a specific VM from FLIPS monitoring?__
+__Some metrics from FLIPS contain 'port' label, others not, is this intended?__
+
+
+QUESTIONS
+1. Is the content navigation tag a fully qualified domain name (SDN based)? [Most likely: yes], although it may be only part of the URL.
+
 #### Link Measurements

 Links are established between VM/container instances, need to discuss what measurements make sense. Also the context for links could be between media services, therefore a link measurement should be within the platform context and NOT the media service context. Need a couple of scenarios to work this one out.