Commit 8d5d1f99 authored by Michael Boniface's avatar Michael Boniface
update to docs

parent 0f47ec01
All of the measurements on a surrogate share a common context that includes the following tags:
* sfc – an orchestration template
* sfc_i – an instance of the orchestration template
* sf – a SF type
* sf_i – an instance of the SF type
* surrogate – an authoritative copy of the SF instance, either a VM or a container
* server – a physical or virtual server for hosting VM or container instances
The pub/sub protocol still needs some work, as we don't want the CLMC to have to …
|Decision Context|Measurement|Description|
|---|---|---|
|Capacity|compute_node_config|the compute infrastructure slice allocation to the platform|
|Capacity|network_resource|the network infrastructure slice allocation to the platform|
|Platform|topology_manager|specific metrics tbd|
|Media Service|sfc_config|specific metrics tbd|
## Capacity Measurements
Capacity measurements measure the size of the infrastructure slice available to the platform that can be allocated on demand to tenants.
*What is the format of the infrastructure slice and what data is available?*
Common tags
* slice_id – an identification id for the tenant infrastructure slice within OpenStack
**compute_node_config**
The *compute_node_config* measurement measures the wholesale host resources available to the platform that can be allocated to media services.
`compute_node_config,slice_id,server_id,location cpu,memory,storage timestamp`
**network_config**
network_config measures the overall capacity of the network available to the platform for allocation to tenants. There are currently no metrics defined for this in the FLIPS monitoring specification, although we can envisage usage metrics such as bandwidth being part of this measurement.
`network_config,slice_id,network_id bandwidth,X,Y,Z timestamp`
**network_interface_config**
network_interface_config measures the connection between a compute node and a network, along with any constraints on that connection.
`network_interface_config,comp_node_id,port_id rx_constraint,tx_constraint timestamp`
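For illustration, the sketch below (assuming an InfluxDB 1.x instance at localhost:8086 and a database named CLMCMetrics, both hypothetical, with invented slice and server identifiers) shows how these capacity measurements could be rendered as line-protocol points and written via the HTTP /write endpoint:

```python
import time

import requests  # any HTTP client would do; used here for the InfluxDB 1.x /write API

INFLUX_WRITE_URL = "http://localhost:8086/write"  # hypothetical CLMC InfluxDB endpoint
DATABASE = "CLMCMetrics"                          # hypothetical database name

def write_points(lines):
    """POST a batch of line-protocol points with second-precision timestamps."""
    response = requests.post(
        INFLUX_WRITE_URL,
        params={"db": DATABASE, "precision": "s"},
        data="\n".join(lines).encode("utf-8"),
    )
    response.raise_for_status()

now = int(time.time())

# Tag and field names follow the capacity schemas above; the values are invented.
points = [
    f"compute_node_config,slice_id=slice1,server_id=C1,location=DC1 cpu=16i,memory=65536i,storage=1000i {now}",
    f"network_config,slice_id=slice1,network_id=net1 bandwidth=10000i {now}",
    f"network_interface_config,comp_node_id=C1,port_id=enps03 rx_constraint=500i,tx_constraint=500i {now}",
]

if __name__ == "__main__":
    write_points(points)
```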
## Platform Measurements
Platform measurements measure the configuration, usage and performance of platform components.
**nap**
nap measurements are the platform's view on IP endpoints such as user equipment and services. A NAP is therefore the boundary of the platform. NAP also measures aspects of multicast performance. Some NAP multicast metrics require further understanding, although the NAP's contribution towards understanding the source of requests is important in decisions regarding the placement of surrogates. The following fields require some clarification:
* CHANNEL_AQUISITION_TIME_M
* CMC_GROUP_SIZE_M
NAP data usage measurement
`nap_data_io,node_id,ip_version RX_BYTES_HTTP_M,TX_BYTES_HTTP_M,RX_PACKETS_HTTP_M,TX_PACKETS_HTTP_M,RX_BYTES_IP_M,TX_BYTES_IP_M,RX_BYTES_IP_MULTICAST_M,TX_BYTES_IP_MULTICAST_M,RX_PACKETS_IP_MULTICAST_M,TX_PACKETS_IP_MULTICAST_M timestamp`
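For illustration only (the counter values are invented), a single nap_data_io point for one NAP could be assembled like this:

```python
import time

def to_line(measurement, tags, fields, timestamp):
    """Render one line-protocol point: measurement,<tags> <fields> timestamp."""
    tag_str = ",".join(f"{k}={v}" for k, v in tags.items())
    field_str = ",".join(f"{k}={v}i" for k, v in fields.items())  # integer counters
    return f"{measurement},{tag_str} {field_str} {timestamp}"

line = to_line(
    "nap_data_io",
    {"node_id": "nap1", "ip_version": "IPv4"},
    {
        "RX_BYTES_HTTP_M": 120000, "TX_BYTES_HTTP_M": 450000,
        "RX_PACKETS_HTTP_M": 800, "TX_PACKETS_HTTP_M": 950,
        "RX_BYTES_IP_M": 130000, "TX_BYTES_IP_M": 470000,
        "RX_BYTES_IP_MULTICAST_M": 20000, "TX_BYTES_IP_MULTICAST_M": 22000,
        "RX_PACKETS_IP_MULTICAST_M": 60, "TX_PACKETS_IP_MULTICAST_M": 64,
    },
    int(time.time()),
)
print(line)
```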
NAP service request and response metrics
`surrogate_route_mon,node_id,cont_nav=FQDN HTTP_REQUESTS_FQDN_M,NETWORK_FQDN_LATENCY timestamp`
**clmc**
Common tags
* sfc – an orchestration template
* sfc_i – an instance of the orchestration template
* sf – a SF package identifier indicating the type and version of SF
* sf_i – an instance of the SF type
* surrogate – an authoritative copy of the SF instance, either a container or a VM
* server – a physical or virtual server for hosting node instances
* location – the location of the server
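To make this common context concrete, the sketch below holds one surrogate's common tag set in a single place so it can be prefixed onto every measurement for that surrogate; all identifier values are invented for illustration:

```python
# Hypothetical common tag values for a single surrogate.
COMMON_TAGS = {
    "sfc": "media_service_template",        # orchestration template
    "sfc_i": "media_service_1",             # instance of the orchestration template
    "sf": "mpegdash_streamer:1.0",          # SF package type and version
    "sf_i": "mpegdash_streamer_1",          # instance of the SF
    "surrogate": "mpegdash_streamer_1_c1",  # deployed copy (container or VM)
    "server": "C1",                         # hosting server
    "location": "DC1",                      # location of the server
}

def common_tag_string(extra_tags=None):
    """Join the common tags plus any measurement-specific tags for line protocol."""
    tags = dict(COMMON_TAGS, **(extra_tags or {}))
    return ",".join(f"{k}={v}" for k, v in tags.items())

# e.g. "sfc=media_service_template,...,location=DC1,vm_state=booted"
print(common_tag_string({"vm_state": "booted"}))
```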
#### Surrogate Measurements
SF host resource measurements measure the host resources allocated to a service function deployed by the platform. All measurements carry the common tags listed above so that the data can be sliced and diced along those dimensions.
**sf_instance_surrogate_config**
The resources allocated to a VM/Container
`sf_instance_surrogate_config,<common_tags>,vm_state cpu,memory,storage timestamp`
Specific tags
* vm_state
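A brief sketch of how the same surrogate's configuration could be reported at successive lifecycle states (mirroring the placed/booted/connected states used in the worked scenario later in this document); the `<common_tags>` string and the resource values are illustrative only:

```python
import time

timestamp = int(time.time())
# Stand-in for the surrogate's <common_tags>; see the common tag sketch above.
common_tags = (
    "sfc=media_service_template,sfc_i=media_service_1,sf=mpegdash_streamer:1.0,"
    "sf_i=mpegdash_streamer_1,surrogate=mpegdash_streamer_1_c1,server=C1,location=DC1"
)

# One point per lifecycle state; cpu in cores, memory in MB, storage in GB (assumed units).
for vm_state in ("placed", "booted", "connected"):
    print(f"sf_instance_surrogate_config,{common_tags},vm_state={vm_state} "
          f"cpu=1i,memory=2048i,storage=100i {timestamp}")
```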
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/SYSTEM_
`system,<common_tags>,host load1,load5,load15,n_users,n_cpus timestamp`
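In practice the Telegraf system input plugin reports these fields; the sketch below merely illustrates what they contain, using Python's standard library plus the third-party psutil package for the logged-in user count (the load-average and uname calls are Unix-only):

```python
import os
import time

import psutil  # third-party; used only to count logged-in users

load1, load5, load15 = os.getloadavg()  # Unix-only
fields = {
    "load1": round(load1, 2),
    "load5": round(load5, 2),
    "load15": round(load15, 2),
    "n_users": len(psutil.users()),
    "n_cpus": os.cpu_count(),
}

# Floats stay bare, integer fields get the 'i' suffix required by line protocol.
field_str = ",".join(
    f"{k}={v}" if isinstance(v, float) else f"{k}={v}i" for k, v in fields.items()
)
# <common_tags> would be supplied by the collector configuration.
print(f"system,host={os.uname().nodename} {field_str} {int(time.time())}")
```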
#### Network Measurements
**sf_instance_surrogate_net_port_config**
Network config is concerned with any network I/O allocations/constraints for network rx/tx. Possible fields (these are not currently available from the FLIPS monitoring specification):
`net_port_config,<common_tags>,port_id,port_state RX_USAGE_CONSTRAINT, TX_USAGE_CONSTRAINT, RX_THROUGHPUT_CONSTRAINT, TX_THROUGHPUT_CONSTRAINT timestamp`
Specific tags
* port_state
* port_id
**sf_instance_surrogate_net_port_io**
All net_port_io measurements are monitored by FLIPS. Note that RX_PACKETS_M appears to follow an inconsistent naming convention (the other port metrics include PORT in the name), unless we are mistaken.
`net_port_io,<common_tags>,port_id PACKET_DROP_RATE_M, PACKET_ERROR_RATE_M, RX_PACKETS_M, TX_PACKETS_PORT_M, RX_BYTES_PORT_M, TX_BYTES_PORT_M timestamp`
Specific tags
* port_id
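Once these points are in the CLMC's time-series store, decision-making components can read them back. A hedged example (endpoint and database name are assumptions) querying the mean packet drop rate per port over the last hour through the InfluxDB 1.x /query endpoint:

```python
import requests  # any HTTP client would do

INFLUX_QUERY_URL = "http://localhost:8086/query"  # hypothetical CLMC InfluxDB endpoint
DATABASE = "CLMCMetrics"                          # hypothetical database name

query = (
    'SELECT mean("PACKET_DROP_RATE_M") AS mean_drop_rate '
    'FROM "net_port_io" WHERE time > now() - 1h GROUP BY "port_id"'
)

response = requests.get(INFLUX_QUERY_URL, params={"db": DATABASE, "q": query})
response.raise_for_status()

for result in response.json().get("results", []):
    for series in result.get("series", []):
        port = series["tags"]["port_id"]
        _, mean_drop_rate = series["values"][0]  # columns: [time, mean_drop_rate]
        print(f"port {port}: mean packet drop rate over 1h = {mean_drop_rate}")
```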
#### Service Measurements
**<prefix>_service_config**
Specific Tags
* service_state
**<prefix>_service_mond**
Each SF developed will report service-specific usage and performance measurements.
Specific Tags
* cont_rep: the content representation requested
* user: a user profile classification
## **CLMC Use Case Scenario**
**Worked Usage Scenario - MPEG-DASH**
The scenario aims to verify two aspects:
* CLMC monitoring specification & data acquisition
* Support for initial decision making processes for FLAME (re)orchestration
The scenario is being developed further in the following document:
https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/CLMC%20monitoring%20specification%20for%20a%20basic%20scenario.md
The FLAME platform acquires a slice of the infrastructure resources (compute, RAM & storage [C1, C2, C3] and networking). A media service provider offers an MPEG-DASH service to end-users (via their video clients connected to NAPs on the FLAME platform). The service provider deploys surrogates of the MPEG-DASH service on all compute nodes [C1-C3]. All services (including NAPs) are monitored by the CLMC.
Over time a growing number of video clients use an MPEG-DASH service to stream movies on demand. As clients connect and make requests, the platform makes decisions and takes actions in order to maintain quality of service for the increasing number of clients demanding an MPEG-DASH service.
Service actions
* Lower overall service quality to clients… reduce overall resource usage
Goal: Explore QoE under two different resource configurations
KPI targets over a 1 hr period:
* Avg quality met: the ratio of average delivered quality to requested quality
* Avg start-up time: the average time taken before a video stream starts playing, which should be below a threshold
* Avg video stalls: the percentage of stalls (dropped video segments that require re-sending), which should be below a threshold
Configuration Measurements
`vm_res_alloc,<common_tags>,vm_state=placed cpu=1,memory=2048,storage=100G timestamp`
`vm_res_alloc,<common_tags>,vm_state=booted cpu=1,memory=2048,storage=100G timestamp`
`vm_res_alloc,<common_tags>,vm_state=connected cpu=1,memory=2048,storage=100G timestamp`
`net_port_config,<common_tags>,port_id=enps03,port_state=up RX_USAGE_CONSTRAINT=500G,TX_USAGE_CONSTRAINT=500G timestamp`
`mpegdash_service_config,service_state=running connected_clients=10 timestamp`
Monitoring Measurements
`mpegdash_service,<common_tags>,cont_nav=url,cont_rep=video_quality requests=100,response_time=200mS,peak_response_time=5s timestamp`
`cpu_usage,<common_tags>,cpu cpu_usage_user,cpu_usage_system timestamp`
`network_io,<common_tags>,port_id PACKET_DROP_RATE_M, PACKET_ERROR_RATE_M, RX_PACKETS_M, TX_PACKETS_PORT_M, RX_BYTES_PORT_M, TX_BYTES_PORT_M timestamp`
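As one way of checking the KPI targets above against the monitoring measurements, the hedged sketch below (same hypothetical InfluxDB endpoint and database as earlier) computes the average and peak response time per requested video quality over the 1 hr window; deriving start-up delay and stall percentages would need additional fields that are still to be defined:

```python
import requests

INFLUX_QUERY_URL = "http://localhost:8086/query"  # hypothetical CLMC InfluxDB endpoint
DATABASE = "CLMCMetrics"                          # hypothetical database name

# Field and tag names follow the mpegdash_service monitoring measurement above.
query = (
    'SELECT mean("response_time") AS avg_response, max("peak_response_time") AS worst_response '
    'FROM "mpegdash_service" WHERE time > now() - 1h GROUP BY "cont_rep"'
)

response = requests.get(INFLUX_QUERY_URL, params={"db": DATABASE, "q": query})
response.raise_for_status()

for result in response.json().get("results", []):
    for series in result.get("series", []):
        quality = series["tags"]["cont_rep"]
        _, avg_response, worst_response = series["values"][0]  # [time, avg, worst]
        print(f"{quality}: avg response = {avg_response}, worst = {worst_response}")
```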
Start-up time delay:
Video stalls:
# **MISC Measurements and Further Questions**
The following data points require further analysis