Commit c0c86678 authored by MJB

update to docs

parent 5c3e7bb6
......@@ -135,7 +135,7 @@ By including this context with service, network and host measurements it is poss
Give a worked example across service and network measurements
* Decide on the KPI of interest and how it's calculated from a series of measurements
* Decide on the measurement of interest and how it's calculated from a series of one or more other measurements (i.e. the function)
* Decide on time window for the series and sample rate
* Decide on interpolation approach for data points in the series
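The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the KPI is taken to be a simple mean, the interpolation is linear, and the function names `resample_linear` and `mean_kpi` are hypothetical, not part of the specification.

```python
from bisect import bisect_left


def resample_linear(samples, start, end, step):
    """Resample (timestamp, value) pairs onto a regular grid.

    Assumption: linear interpolation between neighbouring samples.
    Grid points outside the measured range are skipped, not extrapolated.
    """
    samples = sorted(samples)
    times = [t for t, _ in samples]
    grid = []
    t = start
    while t <= end:
        i = bisect_left(times, t)
        if i < len(times) and times[i] == t:
            # A real sample exists at this grid point.
            grid.append((t, samples[i][1]))
        elif 0 < i < len(times):
            # Interpolate between the samples either side of t.
            (t0, v0), (t1, v1) = samples[i - 1], samples[i]
            grid.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += step
    return grid


def mean_kpi(samples, start, end, step):
    """Hypothetical KPI function: mean of the resampled series in the window."""
    grid = resample_linear(samples, start, end, step)
    if not grid:
        return None
    return sum(v for _, v in grid) / len(grid)
```

Here `start` and `end` define the time window and `step` the sample rate; a different KPI would replace only the final aggregation.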
......@@ -194,21 +194,36 @@ If the agent is deployed in a VM/container that a tenant has root access then a
* Generate a hash from the agent configuration file that's checked within the monitoring message. This is probably too costly and is not part of the Telegraf protocol.
* Use Unix permissions (e.g. surrogates are deployed without root access to them)
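As a rough illustration of the hashing option, the following Python sketch computes a digest of the configuration contents and compares it with an expected value. The choice of SHA-256 and the function names `config_digest` and `config_matches` are assumptions; how the digest would be carried in a monitoring message is not defined here.

```python
import hashlib
import hmac


def config_digest(config_bytes: bytes) -> str:
    """Return a hex digest of the agent configuration contents.

    Assumption: SHA-256 is used; the document does not name an algorithm.
    """
    return hashlib.sha256(config_bytes).hexdigest()


def config_matches(config_bytes: bytes, expected_digest: str) -> bool:
    """Check the configuration against a digest recorded at deployment time."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(config_digest(config_bytes), expected_digest)
```

The digest would need to be computed once when the agent is deployed and stored somewhere the tenant cannot modify, otherwise the check adds no protection.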
A couple of comments
* CPU_UTILISATION_M: will be replaced by other metrics provided directly by Telegraf plugins
* END_TO_END_LATENCY_M (not clear what this measurement means)
## Measurements
#### Infrastructure Slice Capacity Measurements
Capacity measurements measure the size of the infrastructure slice available to the platform that can be allocated on demand to tenants.
|Decision Context|Measurement|Description|
|---|---|---|
|Capacity|host_resource|the compute infrastructure allocation to the platform|
|Capacity|network_resource|the network infrastructure allocation to the platform|
|Platform|topology_manager|tbd|
|Platform|nap_data_io|nap data io at byte, ip and http levels|
|Platform|nap_fqdn_perf|fqdn request rate and latency|
|Platform|orchestrator|---|
|Platform|clmc|---|
|Media Service|res_alloc|compute resources allocated to a VM|
|Media Service|network_io|vm port network io and error at L2|
|Media Service|service|vm service perf|
|Media Service|cpu_usage|vm desc|
|Media Service|disk_usage|vm desc|
|Media Service|disk_IO|vm desc|
|Media Service|kernel_stats|vm desc|
|Media Service|memory_usage|vm desc|
|Media Service|process_status|vm desc|
|Media Service|swap_memory_usage|vm desc|
|Media Service|system_load_uptime|vm desc|
#### Infrastructure Capacity Measurements
Capacity measurements measure the size of the infrastructure slice available to the platform that can be allocated on demand to tenants.
Common tags
* slice_id – an identifier for the infrastructure slice
* slice_id – an identifier for the tenant infrastructure slice within OpenStack
**host_resource**
......@@ -224,49 +239,50 @@ network_resource measures the overall capacity of the network available to the p
#### Platform Measurements
Platform measurements measure the usage and performance of platform components.
**topology_manager**
The following fields need further analysis as they seem to relate to core ICN and buffering. These do not seem that relevant
**nap**
* BUFFER_SIZES_M
* FILE_DESCRIPTORS_TYPE_M
* MATCHES_NAMESPACE_M
* PUBLISHERS_NAMESPACE_M
* SUBSCRIBERS_NAMESPACE_M
* PATH_CALCULATIONS_NAMESPACE_M
nap measurements are the platform's view of IP endpoints such as user equipment and services. A NAP is therefore the boundary of the platform. NAP also measures aspects of multicast performance.
**nap**
NAP multicast metrics that require further understanding
nap measurements are the platform's view of IP endpoints such as user equipment and services. A NAP is therefore the boundary of the platform. NAP also measures aspects of co-incidental multicast performance.
Fields
Questions
* CHANNEL_AQUISITION_TIME_M
* CMC_GROUP_SIZE_M
* What is the group id for CHANNEL_AQUISITION_TIME_M and how can this be related to FQDN of the content?
* what is the predefined time interval for CMC_GROUP_SIZE_M?
* How does NETWORK_LATENCY_FQDN_M relate to END_TO_END_LATENCY?
* How are multicast groups identified? i.e. "a request for FQDN within a time period", what's the content granularity here?
* HTTP_REQUESTS_FQDN_M says it is measured from an endpoint, yet the measurement does not have a node id. It could be just the total number of requests for a FQDN, in which case it is very much like the service request stats of a media service.
* RX_BYTES_IP_MULTICAST_M
* TX_BYTES_IP_MULTICAST_M
* RX_PACKETS_HTTP_M
* TX_PACKETS_HTTP_M
NAP data usage measurement
`nap_data_io,node_id="",ip_version="" <fields> timestamp`
Fields
* RX_BYTES_HTTP_M
* TX_BYTES_HTTP_M
* RX_PACKETS_IP_MULTICAST_M
* TX_PACKETS_IP_MULTICAST_M
* RX_PACKETS_HTTP_M
* TX_PACKETS_HTTP_M
* RX_BYTES_IP_M
* TX_BYTES_IP_M
* RX_BYTES_IP_MULTICAST_M
* TX_BYTES_IP_MULTICAST_M
* RX_PACKETS_IP_MULTICAST_M
* TX_PACKETS_IP_MULTICAST_M
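The `nap_data_io` template above follows the InfluxDB line protocol layout of measurement, tags, fields and timestamp. A minimal Python sketch of building such a line is shown below. The helper name `to_line_protocol` and the example tag and field values are hypothetical, and integer field encoding with the `i` suffix is an assumption about how byte and packet counters would be stored.

```python
def _escape(text, chars):
    """Backslash-escape the characters that are special in line protocol."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Encode a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value,... field=value,... timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys keep the output deterministic
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Illustrative values only; "nap1" and the counters are made up.
line = to_line_protocol(
    "nap_data_io",
    {"node_id": "nap1", "ip_version": "4"},
    {"RX_BYTES_HTTP_M": 1024, "TX_BYTES_HTTP_M": 2048},
    1500000000000000000,
)
```

Note that tag values are written unquoted in line protocol, so the `""` in the template should be read as a placeholder for the value rather than literal quotes.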
`nap_node,<global_tags>,nodeId="", CHANNEL_AQUISITION_TIME_M, timestamp`
NAP service request and response metrics
* Can we ignore the specific nodes and look at aggregate measurements associated with a multicast group?
* Here the assumption is that nodes are grouped around requests to access content identified by hashes of FQDN.
`nap_fqdn_perf,<common_tags>,cont_nav=FQDN <fields> timestamp`
`nap_multicast,<global_tags>,groupId="",fqdn="" CHANNEL_AQUISITION_TIME_M, CMC_GROUP_SIZE_M, NETWORK_FQDN_LATENCY timestamp`
Fields
* CHANNEL_AQUISITION_TIME_M: avg time for all nodes in this group over sample period
* CMC_GROUP_SIZE_M: avg multicast group size over sample period
* NETWORK_FQDN_LATENCY: avg network latency over sample period
* HTTP_REQUESTS_FQDN_M
* NETWORK_FQDN_LATENCY
**orch_media_service**
......@@ -281,125 +297,132 @@ Questions
Common tags
* sfc – an orchestration template
* sfc_instance – an instance of the orchestration template
* sf_package – a SF type
* sf_instance – an instance of the SF type
* vm_instance – an authoritative copy of the SF instance
* sfc_i – an instance of the orchestration template
* sf_pack – a SF package identifier indicating the type and version of SF
* sf_i – an instance of the SF type
* vm_i – an authoritative copy of the SF instance (node_id)
* server – a physical or virtual server for hosting VM instances
* location – the location of the server
##### Network Measurements
Network Measurements measure aspects of network performance in relation to VMs/containers.
**node_network_perf**
node_network_perf provides the network measurement view for network elements. Network elements can be in the role of gateway, forwarding node, network attachment point, rendezvous, service, topology manager or user equipment as defined by the FLIPS monitoring specification. The measurements are made by the Mona monitoring agent.
Questions
* Can a single value of jitter (e.g. avg jitter) be calculated from the set of measurements in PACKET_JITTER_CID_M message? What is the time period for the list of jitter measurements?
* What does CID actually mean?
`node_network_perf,<global_tags>,node_role="",node_name="" timestamp`
* PACKET_JITTER_CID_M
* RX_BYTES_CID_M
* TX_BYTES_CID_M
* RX_PACKETS_IP_M (ipversion)
* TX_PACKETS_IP_M (ipversion)
Specific Tags:
**network_allocation**
* node_role
* name
* state
Possible Fields
**node_port_perf**
* bandwidth
The netnode_port series provides network measurements on host ports as defined by the FLIPS monitoring specification.
**network_io**
`port_network_perf,<global_tags>,node_id="",port_id="",port_name="" timestamp`
`network_io,<common_tags>,port_state="",port_id="" <fields> timestamp`
Fields
* PACKET_DROP_RATE_M
* PACKET_ERROR_RATE_M
* RX_PACKETS_M
* TX_PACKETS_PORT_M
* RX_BYTES_PORT_M
* TX_BYTES_PORT_M
* TX_PACKETS_PORT_M
**link**
The link series provides measurements about network links. Currently the FLIPS monitoring specification defines only topological configuration information and does not provide any measurements related to links. All performance information is included as part of the nodes. Further investigation is needed to understand if derived measurements related to links are needed or whether this is just useful for monitoring the temporal evolution of the topology.
Fields
* ??
* ??
Tags
* link_name
* link_id
* source_node_id
* destination_node_id
* link_type
Note that RX_PACKETS_M is named inconsistently with the other port fields, which carry a _PORT_M suffix.
##### Host Measurements
SF Host Resource Measurements measure the host resources allocated to a service function deployed by the platform. All measurements have the following global tags to allow the data to be sliced and diced according to dimensions.
**node_host_resource**
**node_resource**
The resources allocated to a VM/Container
`node_host_resource,<global-tags> cpu,memory,storage timestamp`
`res_alloc,<global-tags> cpu,memory,storage timestamp`
**node_cpu_usage**
**cpu_usage**
[[inputs.cpu]]
**node_disk_usage**
**disk_usage**
[[inputs.disk]]
**node_disk_IO**
**disk_IO**
[[inputs.diskio]]
**node_kernel_stats**
**kernel_stats**
[[inputs.kernel]]
**node_memory_usage**
**memory_usage**
[[inputs.mem]]
**node_process_status**
**process_status**
[[inputs.processes]]
**node_swap_memory_usage**
**swap_memory_usage**
[[inputs.swap]]
**node_system_load_uptime**
**system_load_uptime**
[[inputs.system]]
##### Service Measurements
Each SF developed will offer service specific usage and performance measurements.
Each SF developed will provide service-specific usage and performance measurements. The following are provided as examples of common service metrics.
`service_request,<global_tags>,cont_nav="",cont_rep="",user_id="" <request-params> timestamp`
`service,<global_tags>,cont_nav="",cont_rep="",user="" <fields> timestamp`
Fields
`service_response,<global_tags>,cont_nav="",cont_rep="",user_id="" response_time timestamp`
* request_rate
* response_time
* peak_response_time
* error_rate
* throughput
Specific Tags
* cont_nav: the content requested
* cont_rep: the content representation requested
* user_id: the pseudonym of the user
* user: the pseudonym of an individual user or a user classification
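To make the service fields concrete, the following Python sketch derives some of them from a list of request records collected over one sample period. The record shape, the function name `summarise_requests` and the exact definitions (for example error_rate as the fraction of failed requests) are assumptions, since the document lists the field names without defining them. throughput is left out because its unit is not stated.

```python
def summarise_requests(records, period_s):
    """Summarise one sample period of requests.

    records: list of (response_time_s, is_error) tuples (hypothetical shape).
    period_s: length of the sample period in seconds.
    """
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    count = len(records)
    if count == 0:
        # No requests: rates are zero and response times are undefined.
        return {"request_rate": 0.0, "error_rate": 0.0}
    times = [response_time for response_time, _ in records]
    errors = sum(1 for _, is_error in records if is_error)
    return {
        "request_rate": count / period_s,        # requests per second
        "response_time": sum(times) / count,     # mean over the period
        "peak_response_time": max(times),
        "error_rate": errors / count,            # fraction of failed requests
    }
```

The resulting dictionary could then be written as the `<fields>` part of a `service` line, tagged with `cont_nav`, `cont_rep` and `user`.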
##### MISC Measurements and Questions
The following data points require further analysis
* CPU_UTILISATION_M: will be replaced by other metrics provided directly by Telegraf plugins
* END_TO_END_LATENCY_M (not clear what this measurement means)
* BUFFER_SIZES_M: needs clarification
* RX_PACKETS_IP_M: is this measured only at the NAP or at all nodes?
* TX_PACKETS_IP_M: is this measured only at the NAP or at all nodes?
The following fields need further analysis as they seem to relate to core ICN
* FILE_DESCRIPTORS_TYPE_M
* MATCHES_NAMESPACE_M
* PATH_CALCULATIONS_NAMESPACE_M
* PUBLISHERS_NAMESPACE_M
* SUBSCRIBERS_NAMESPACE_M
The following fields relate to CID, whose meaning is not yet understood. Jitter is an important metric, so this needs to be clarified.
* Can a single value of jitter (e.g. avg jitter) be calculated from the set of measurements in PACKET_JITTER_CID_M message? What is the time period for the list of jitter measurements?
* What does CID mean? Consecutive identical digits?
* PACKET_JITTER_CID_M
* RX_BYTES_CID_M
* TX_BYTES_CID_M
What do we do with the configuration states for nodes, links and ports? Do we put these as tags, or are they separate series?
Link Tags
* link_name
* link_id
* source_node_id
* destination_node_id
* link_type
* link_state
\ No newline at end of file