* Allocation (Media Service Instance, Service Function Instance, Surrogate Instance)
* Basic State (up, down, etc.)
...
...
@@ -29,14 +30,13 @@ Briefly descirbe:
* the lifecycle of monitoring data within the platform and how it is used
* the type of monitoring data
Monitoring includes usage metrics
Usage metrics
* network resource usage
* host resource usage
* service usage
Monitoring includes performance metrics:
Performance metrics:
* cpu/sec
* throughput
...
...
@@ -140,7 +140,7 @@ All of the measurements on a specific VM/Container instance share a common conte
* server – a physical or virtual server for hosting VM instances
* location – the location of the server
By including this context with service, network and host measurements it is possible to support a wide range of queries associated with SFC’s whether they are Media Services or the Platform components themselves. By adopting the same convention for identifiers it is possible to combine measurements across service, network and host to create new series that allows exploration of diffeent aspects of the VM instance.
By including this context with service, network and host measurements it is possible to support a wide range of temporal queries associated with SFCs, whether they are Media Services or the Platform components. By adopting the same convention for identifiers it is possible to combine measurements across service, network and host to create new series that allow exploration of different aspects of the VM instance.
Give a worked example across service and network measurements
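As a starting point for the worked example, the sketch below joins a service measurement and a network measurement on their shared context and derives a new series. Only the `server` and `location` tags come from this document; the `requests` and `tx_bytes` fields, the sample values and the function names are hypothetical placeholders.

```python
# Hypothetical sample points; field names and values are illustrative only.
service_points = [
    {"time": 0, "server": "s1", "location": "loc1", "requests": 50},
    {"time": 10, "server": "s1", "location": "loc1", "requests": 100},
]
network_points = [
    {"time": 0, "server": "s1", "location": "loc1", "tx_bytes": 5000},
    {"time": 10, "server": "s1", "location": "loc1", "tx_bytes": 20000},
]


def join_on_context(left, right, keys=("time", "server", "location")):
    """Merge points from two measurements that share the same context keys."""
    index = {tuple(p[k] for k in keys): p for p in right}
    joined = []
    for point in left:
        match = index.get(tuple(point[k] for k in keys))
        if match is not None:
            joined.append({**point, **match})
    return joined


def bytes_per_request(points):
    """Derive a new series combining a network field with a service field."""
    return [
        {
            "time": p["time"],
            "server": p["server"],
            "bytes_per_request": p["tx_bytes"] / p["requests"],
        }
        for p in points
        if p["requests"]  # skip intervals with no requests
    ]


series = bytes_per_request(join_on_context(service_points, network_points))
```

The join only works because both measurements use the same identifiers for the same context, which is the reason for adopting a common convention.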
...
...
@@ -154,13 +154,13 @@ Discuss specific tags
### Architecture
The monitoring model using an agent based approach. The general architecture is shown in the diagram below.
The monitoring model uses an agent-based approach, with hierarchical aggregation used as required. The general architecture is shown in the diagram below.
An agent is deployed on each of the container/VM implementing a SF. The agent is deployed by the orchestrator when the SF is provisioned. The agent is configured with a set of input plugins that collect measurements from three aspects of the SF including network, host and SF usage/perf. The agent is configured with a set of global tags that are inserted for all measurements made by the agent on the host.
For monitoring a service function, an agent is deployed on each container/VM implementing an SF. The agent is deployed by the orchestrator when the SF is provisioned. The agent is configured with a set of input plugins that collect measurements from three aspects of the SF: network, host and SF usage/perf. The agent is also configured with a set of global tags that are inserted for all measurements made by the agent on the host.
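To illustrate how global tags end up on every measurement, the sketch below renders a point in InfluxDB line protocol with the agent's global tags merged in. It is a simplified illustration, not Telegraf's implementation: `to_line_protocol` is a hypothetical helper, escaping of special characters is omitted, and the `cpu` tag and `usage_idle` field are example names. Letting measurement-specific tags win on a name clash is an assumption.

```python
def format_field(value):
    """Format a field value using line protocol type conventions."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'  # strings are quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns, global_tags=None):
    """Render one point, merging global tags into the point's own tags."""
    # Assumption: measurement-specific tags override global tags on conflict.
    merged = {**(global_tags or {}), **tags}
    tag_part = "".join(f",{k}={v}" for k, v in sorted(merged.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Global tags taken from the common context described in this document.
global_tags = {"server": "s1", "location": "loc1"}
line = to_line_protocol(
    "cpu_usage", {"cpu": "cpu0"}, {"usage_idle": 97.5},
    1500000000000000000, global_tags,
)
```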
Telegraf agent-based monitoring
Telegraf agent-based monitoring with the following plugins potentially relevant for integration with FLAME
If the agent is deployed in a VM/container that a tenant has root access then a tenant could change the configuration to fake measuremnents associated with network and host in an attempt gain benefit. This is a security risk. Some ideas include
If the agent is deployed in a VM/container to which a tenant has root access, then the tenant could change the configuration to fake measurements associated with network and host in an attempt to gain benefit. This is a security risk. Some ideas include
* Deploy additional agents on hosts, rather than relying on agents inside the VM/container, to measure network and VM performance. It could be hard to differentiate between the different SFs deployed on a host
* Generate a hash from the agent configuration file that's checked within the monitoring message. Probably too costly and not part of the telegraf protocol
* Use unix permissions (e.g. surrogates are deployed without root access to them)
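The configuration hash idea from the list above could be sketched as follows. This is a minimal illustration using Python's standard `hashlib` and `hmac` modules; the function names and the way the digest would be carried in a monitoring message are assumptions, since this is not part of the Telegraf protocol.

```python
import hashlib
import hmac


def config_digest(config_text: str) -> str:
    """Return the SHA-256 hex digest of an agent configuration."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def is_config_untampered(reported_digest: str, expected_config_text: str) -> bool:
    """Compare the digest reported by an agent with the one the platform expects."""
    expected = config_digest(expected_config_text)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(reported_digest, expected)


# Hypothetical configuration deployed by the orchestrator.
deployed = '[global_tags]\n  server = "s1"\n  location = "loc1"\n'
tampered = deployed.replace("s1", "s2")

ok = is_config_untampered(config_digest(deployed), deployed)
bad = is_config_untampered(config_digest(tampered), deployed)
```

A tenant with root access could still report a stale digest while running a modified configuration, so this check would need to be combined with one of the other ideas.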
## Configuration Measurements
## Configuration Measurement Summary
|Context|Measurement|Description|
|---|---|---|
|Capacity|host_resource|the compute infrastructure allocation to the platform|
|Capacity|network_resource|the network infrastructure allocation to the platform|
|Platform|topology_manager|tbd|
|Media Service|sfc_config|tbd|
|Media Service|sf_config|tbd|
|Platform|topology_manager|specific metrics tbd|
|Media Service|sfc_config|specific metrics tbd|
|Media Service|sf_config|specific metrics tbd|
|Media Service|vm_host_config|compute resources allocated to a VM|
|Media Service|net_port_config|networking constraints on port on a VM|
## Monitoring Measurements
*Need to refer to TOSCA here*
## Usage and Performance Measurement Summary
|Context|Measurement|Description|
|---|---|---|
|Platform|nap_data_io|nap data io at byte, ip and http levels|
|Platform|nap_fqdn_perf|fqdn request rate and latency|
|Platform|orchestrator|tbd|
|Platform|clmc|tbd|
|Media Service|cpu_usage|vm desc|
|Media Service|disk_usage|vm desc|
|Media Service|disk_IO|vm desc|
|Media Service|kernel_stats|vm desc|
|Media Service|memory_usage|vm desc|
|Media Service|process_status|vm desc|
|Media Service|swap_memory_usage|vm desc|
|Media Service|system_load_uptime|vm desc|
|Platform|orchestrator|specific metrics tbd|
|Platform|clmc|specific metrics tbd|
|Media Service|cpu_usage|vm metrics|
|Media Service|disk_usage|vm metrics|
|Media Service|disk_IO|vm metrics|
|Media Service|kernel_stats|vm metrics|
|Media Service|memory_usage|vm metrics|
|Media Service|process_status|vm metrics|
|Media Service|swap_memory_usage|vm metrics|
|Media Service|system_load_uptime|vm metrics|
|Media Service|net_port_io|vm port network io and error at L2|
|Media Service|service|vm service perf|
|Media Service|service|vm service perf metrics|
#### Infrastructure Capacity Measurements
## Capacity
Capacity measurements measure the size of the infrastructure slice available to the platform that can be allocated on demand to tenants.
...
...
@@ -252,9 +258,9 @@ network_resource measures the overall capacity of the network available to the p
Platform measurements measure the usage and performance of platform components.
Platform measurements measure the configuration, usage and performance of platform components.
**topology_manager**
...
...
@@ -301,11 +307,19 @@ Fields
**clmc**
#### Media Service Measurements
## Media Service
**media_service**
Media service measurements measure the configuration, usage and performance of media service instances deployed by the platform.
Aggregate measurement derived from VM/container measurements, most likely calculated using a continuous query of a specific time interval
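The aggregation described above can be sketched in plain Python as a stand-in for a continuous query that groups points by a fixed time interval and takes the mean. The `vm` tag, the sample values and the choice of mean as the aggregate are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_interval(points, field, interval_s):
    """Group points into fixed time buckets and average one field per bucket."""
    buckets = defaultdict(list)
    for point in points:
        # Align each timestamp to the start of its interval.
        start = point["time"] - point["time"] % interval_s
        buckets[start].append(point[field])
    return [
        {"time": start, f"mean_{field}": mean(values)}
        for start, values in sorted(buckets.items())
    ]


# Hypothetical VM/container measurements belonging to one media service.
vm_points = [
    {"time": 0, "vm": "a", "cpu_usage": 20.0},
    {"time": 5, "vm": "b", "cpu_usage": 40.0},
    {"time": 12, "vm": "a", "cpu_usage": 60.0},
]

media_service_series = aggregate_by_interval(vm_points, "cpu_usage", 10)
```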
### Service Function Chain
**sfc_config**
tbd
**sf_config**
tbd
**sfc**
...
...
@@ -323,7 +337,9 @@ Aggregate measurement derived from VM/container measurements, most likely calcul
Aggregate measurement derived from VM/container measurements, most likely calculated using a continuous query of a specific time interval
#### VM/Container Measurements
### VM/Container Measurements
VM/Container Measurements measure the configuration, usage and performance of VM/Container instances deployed by the platform within the context of a media service.
Common tags
...
...
@@ -335,7 +351,7 @@ Common tags
* server – a physical or virtual server for hosting VM instances
* location – the location of the server
##### Network Measurements
#### Network Measurements
**net_port_config**
...
...
@@ -369,7 +385,7 @@ Fields
Note that RX_PACKETS_M seems to have inconsistent naming convention.
##### VM Host Measurements
#### VM Host Measurements
SF Host Resource Measurements measure the host resources allocated to a service function deployed by the platform. All measurements have the following global tags to allow the data to be sliced and diced according to dimensions.
...
...
@@ -415,7 +431,7 @@ Specific tags
[[inputs.system]]
##### Service Measurements
#### Service Measurements
**<prefix>_service_config**
...
...
@@ -445,17 +461,17 @@ Specific Tags
* cont_rep: the content representation requested
* user: the pseudonym of an individual user or a user classification
##### MISC Measurements and Questions
# MISC Measurements and Questions
The following data points require further analysis
* CPU_UTILISATION_M: will be replaced by other metrics provided directly by Telegraf plugins
* END_TO_END_LATENCY_M (not clear what this measurement means)
* CPU_UTILISATION_M: likely to be replaced by other metrics provided directly by Telegraf plugins
* END_TO_END_LATENCY_M: not clear what this measurement means, so needs clarification
* BUFFER_SIZES_M: needs clarification
* RX_PACKETS_IP_M: is this just NAP or all Nodes
* TX_PACKETS_IP_M: is this just NAP or all Nodes
The following fields need further analysis as they seem to relate to core ICN
The following fields need further analysis as they seem to relate to core ICN, most likely fields/measurements related to platform components
* FILE_DESCRIPTORS_TYPE_M
* MATCHES_NAMESPACE_M
...
...
@@ -465,18 +481,18 @@ The following fields need further analysis as they seem to relate to core ICN
The following fields relate to CID which I don't understand but jitter is an important metric so we need to find out.
* Can a single value of jitter (e.g. avg jitter) be calculated from the set of measurements in PACKET_JITTER_CID_M message? What is the time period for the list of jitter measurements?
* What does CID mean? consecutive identical digits
* PACKET_JITTER_CID_M
* RX_BYTES_CID_M
* TX_BYTES_CID_M
What about links? What about links between different media service nodes
Some questions
* Can a single value of jitter (e.g. avg jitter) be calculated from the set of measurements in PACKET_JITTER_CID_M message? What is the time period for the list of jitter measurements?
* What does CID mean? consecutive identical digits
#### Link Measurements
links are established between VM/container instances, need to discuss what measurements make sense. Also the context for links could be between media services, therefore a link measurement should be within the platform context and NOT the media service context. Need a couple of scenarios to work this one out.
Links are established between VM/container instances; we need to discuss what measurements make sense. Also, the context for links could be between media services, so a link measurement should be within the platform context and NOT the media service context. A couple of scenarios are needed to work this one out.
**link_config**
...
...
@@ -489,4 +505,6 @@ Link Tags
* link_type
* link_state
**link_perf**
\ No newline at end of file
**link_perf**
link perf is measured at the nodes, related to end_to_end_latency. Needs further work.