Commit 09f16d7a authored by Michael Boniface's avatar Michael Boniface

update to docs

parent 2f5626c3
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
<!--
// © University of Southampton IT Innovation Centre, 2017
//
// Copyright in this software belongs to University of Southampton
// IT Innovation Centre of Gamma House, Enterprise Road,
// Chilworth Science Park, Southampton, SO16 7NS, UK.
//
// This software may not be used, sold, licensed, transferred, copied
// or reproduced in whole or in part in any manner or form or in or
// on any media by any person other than in accordance with the terms
// of the Licence Agreement supplied with the software, or otherwise
// without the prior written consent of the copyright owners.
//
// This software is distributed WITHOUT ANY WARRANTY, without even the
// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE, except where stated in the Licence Agreement supplied with
// the software.
//
// Created By : Michael Boniface
// Created Date : 18-12-2017
// Created for Project : FLAME
-->
# **FLAME CLMC Information Model Specification**
This document describe the configuration and monitoring specification for cross-layer management and control within the FLAME platform. All information measured by the CLMC aims to improve management and control decisions made by the platform against defined performance criteria such as increasing QoE or cost reduction.
This document describes the configuration and monitoring specification for cross-layer management and control within the FLAME platform. All information measured by the CLMC aims to improve management and control decisions made by the platform against defined performance criteria such as increasing Quality of Experience (QoE) and reducing cost.
## **Authors**
|Authors|Organisation|
|-|-|
|[Michael Boniface](mailto:mjb@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
|[Simon Crowle](mailto:sgc@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
## **Service Management and Control Decisions**
......@@ -8,7 +38,7 @@ Service management decisions relate to processes for Service Request Management,
The management and control processes implemented by the FLAME platform define the decisions, variability and constraints. The detail for the implementation of orchestration, management and control is under discussion and the following is based on a best understanding of what was described in the FLAME architecture.
### Static configuration
### **Static Configuration**
The first scenario to consider is an entirely static configuration. In this scenario a media service provider (MSP) explicitly defines the infrastructure resources needed for a media service. The static configuration is what is proposed by the adoption of the current TOSCA specification for the Alpha release. Here an MSP declares the resources needed to deliver a desired performance level (implicitly known to the MSP). In an extreme case, the approach results in a static infrastructure configuration where the MSP defines the entire topology including servers, links and resource requirements. This would include server locations (central, metro and edge DCs) and when the servers are needed. The most basic case is to deploy everything now for the lifetime of the media service. This full declaration would statically configure surrogates through the explicit declaration of servers and the software deployed on those servers.
......@@ -44,7 +74,7 @@ Issues related to this decision include:
*D2 CLMC outcome: a set of measurements describing media service infrastructure requirements.*
### Variable Configuration
### **Variable Configuration**
Variable configuration identifies configuration state that can change in the lifetime of a media service. Variability in configuration state includes:
......@@ -133,71 +163,38 @@ The variability factors that influence E2E latency include:
* Server placement and server resource allocation/contention over time
* Network routing and network resource allocation/contention over time
Issues related to this decision include:
* How are NAP routing decisions coordinated?
## **CLMC Use Case Scenario**
The following scenario aims to verify two aspects
* CLMC monitoring specification & data acquisition
* Support for initial decision making processes for FLAME (re)orchestration
The FLAME platform acquires a slice of the infrastructure resources (compute, RAM & storage [C1, C2, C3] and networking). A media service provider offers an MPEG-DASH service to end-users (via their video clients connected to NAPs on the FLAME platform). The service provider deploys surrogates of the MPEG-DASH service on all compute nodes [C1-C3]. All services (including NAPs) are monitored by the CLMC.
Over time a growing number of video clients use a MPEG-DASH service to stream movies on demand. As clients connect and make requests, the platform makes decisions and takes actions in order to maintain quality of service for the increasing number of clients demanding an MPEG-DASH service.
What are the possible criteria (based on metrics and analytics provided by the CLMC) that could be used to help NAP makes these decisions?
In this scenario what are the possible actions a NAP could take?
![Scenario](/docs/image/scenario.jpg)
![Scenario Measurements](/docs/image/scenario-measurements.jpg)
* How are NAP routing decisions coordinated as requests are not sourced from a central point?
Platform actions
* Increase the resources available to MPEG-DASH surrogates
* This may not be possible if resources unavailable
* Vertical scaling may not solve the problem (i.e., I/O bottleneck)
* Re-route client requests to other MPEG-DASH services
* C1 – Closer to clients, but limited capability
* C3 – Greater capability but further away from clients
… note: NAP service end-point re-routing will need to take into account network factors AND compute resource availability related service KPIs; i.e., end-to-end performance
Service actions
## **Information Model**
* Lower overall service quality to clients… reduce overall resource usage
This section provides an overview of the FLAME CLMC information model in support of service management and control decisions. The information model is designed to support the exploration and understanding of state and factors contributing to changes in state over time as shown in the primitive below:
![Configuration Principle](/docs/image/configuration-principle.jpg)
## **Information Model**
The system (infrastructure, platform and media services) is composed of a set of configuration items that transition between different states during the lifecycle of the system. Configuration items of interest include significant components whose state changes influence the response of the system. In general, the information aims to support the process of:
This section provides an overview of the FLAME CLMC information model considering the following elements:
* Identification of significant configuration items within the system
* Assertion of state using configuration measurements
* Measurement of response (monitoring measurements)
* Support for taking action (configuration measurements)
* Media Service
* Configuration
* Monitoring
* Information Security
* Privacy
This process is implemented in accordance with information security and privacy constraints. The following sections provide an overview of key aspects of monitoring.
### Media Service (https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/issues/2)
The FLAME architecture defines a media services as "An Internet accessible service supporting processing, storage and retrieval of content resources hosted and managed by the FLAME platform". A media service consists of 1 or more media components (also known as Service Functions) that together are composed to create an overall Service Function Chain. SFs are realised through the instantiation of virtual machines (or containers) based on resource management policy. Multiple VMs may be instantiated for each SF to create surrogate SFs to balance load and deliver against performance targets.
The FLAME architecture defines a media service as "An Internet accessible service supporting processing, storage and retrieval of content resources hosted and managed by the FLAME platform". A media service consists of 1 or more media components (also known as Service Functions) that together are composed to create an overall Service Function Chain. SFs are realised through the instantiation of virtual machines (or containers) deployed on servers based on resource management policy. Multiple VMs may be instantiated for each SF to create surrogate SFs, for example, to balance load and deliver against performance targets. Media Services, SFCs, SFs, VMs, links and servers are all examples of configuration items.
Media services are described using a template structured according to the TOSCA specification (http://docs.oasis-open.org/tosca/TOSCA/v1.0/TOSCA-v1.0.html). A TOSCA template includes all of the information needed for the FLAME orchestrator to instantiate a media service. This includes all SF's, links between SFs server resource configuration information. The Alpha version of the FLAME platform is based on the current published TOSCA specification. Future developments will extend the TOSCA specification (known as TOSCA++) to meet FLAME requirements such higher level KPIs and location-based constraints.
Media services are described using a template structured according to the TOSCA specification (http://docs.oasis-open.org/tosca/TOSCA/v1.0/TOSCA-v1.0.html). A TOSCA template includes all of the information needed for the FLAME orchestrator to instantiate a media service. This includes all SFs, links between SFs and resource configuration information. The Alpha version of the FLAME platform is based on the current published TOSCA specification. Future developments will extend the TOSCA specification (known as TOSCA++) to meet FLAME requirements such as higher-level KPIs and location-based constraints.
The current TOSCA template provides the initial structure of the Media Service information model through specified service and resource configuration. Within this structure, system components are instantiated whose runtime characteristics are measured to inform management processes. Measurements relate to individual SF's as well as aggregated measurements structured according the context of deployment (e.g. media service, platform, etc). Measurements are made by monitoring processes deployed with system components.
The current TOSCA template provides the initial structure of the Media Service information model through specified service and resource configuration. Within this structure, system components are instantiated whose runtime characteristics are measured to inform management and control processes. Measurements relate to individual SFs as well as aggregated measurements structured according to the structure of configured items within the system. Measurements are made by monitoring processes deployed with system components. The configured items provide the context for monitoring.
The media information model in relation to the high-level media service lifecycle is shown in the diagram below. The lifecycle includes processes for packaging, orchestration, routing and SF management/control. Each stage in the process creates context for decisions and measurements within the next stage of the lifecycle. Packaging creates the context for orchestration, orchestration creates the context for surrogate instantiation, and network topology management. In the diagram, the green concepts identify the context that is used for filtering and queries whilst the yellow concepts are the measurement data providing runtime measurements.
The media information model in relation to the high-level media service lifecycle is shown in the diagram below. The lifecycle includes processes for packaging, orchestration, routing and SF management/control. Each stage in the process creates context for decisions and measurements within the next stage of the lifecycle. Packaging creates the context for orchestration, orchestration creates the context for surrogate instantiation, and network topology management. In the diagram, the green concepts identify the context which can be used for filtering and queries whilst the yellow concepts are the measurement data providing runtime measurements.
![FLAMEContext](/docs/image/flame-context.jpg)
The primary measurement point for a media service is a surrogate. A surrogate is an instantation of a service function within a VM or container on a server. A surrogate exists within two main contexts: media service and virutal infrastructure. The media service context relates to the use of the surrogate within a service function chain designed to deliver content. The virtual infrastructure context relates to the host and network environment into which the surrogate is deployed. Deploying monitoring agents in different contexts and sharing information between contexts is a key part of cross-layer management and control.
The primary measurement point for a media service is a surrogate. A surrogate is an instantiation of a service function within a VM or container on a server. A surrogate exists within two main contexts: media service and virtual infrastructure. The media service context relates to the use of the surrogate within a service function chain designed to deliver content. The virtual infrastructure context relates to the host and network environment into which the surrogate is deployed. Deploying monitoring agents in different contexts and sharing information between contexts is a key part of cross-layer management and control.
The diagram highlights the need to monitor three views on a surrogate: network, host, and service. The acquisition of these different views together is a key element of the cross-layer information required for management and control. The measurements are captured by different processes running on servers but are brought together by common context, allowing the information to be integrated, correlated and analysed. The surrogate can measure a service view related to the content being delivered (such as request rates and content types), a VM can measure a virtual infrastructure view of a single surrogate, and the server can measure an infrastructure view across multiple surrogates deployed on a server. These monitoring processes running on the server are managed by different stakeholders; for example, the platform operator would monitor servers, whereas the media service provider would monitor service-specific usage.
......@@ -207,20 +204,9 @@ Not all information acquired will be aggregated and stored within the CLMC. The
### Configuration (https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/issues/3)
Configuration information describes the structure of the system over time. Configuration information can include:
Configuration information describes the structure and state of the system over time. Each configuration item has a lifecycle that defines configuration states and events that cause a transition between states. The following table gives examples of configuration items and states.
|Component|Description|Examples|
|---|---|---|
|Capacity|total resource available to the platform|servers, networks, IPs|
|Topology|connections between resources|physical or virtual links between servers|
|Service|processes deployed on servers|database, webservice|
|Resource Allocation|usage constraint applied to service|cpus, memory, data IO, IP address range)|
|KPI|performance target for service, network or server|utilisation, response time, startup delay|
|State|any state of a service, network or server throughout it's lifecycle|port up, port down, sf placed, sf booted|
Each system component has a configuration lifecycle that defines configuration states and transistions between states. The following table gives examples of system components and states.
|Component|Configuration States|
|Configuration Item|Configuration States|
|---|---|
|Network|e.g. available, unavailable|
|Physical Link|up, down, unknown|
......@@ -228,10 +214,12 @@ Each system component has a configuration lifecycle that defines configuration s
|Port|up, down, unknown|
|Service function package|published, unpublished|
|Media service template|published, unpublished|
|Service function chain|starting, running, stopping, stopped, error|
|Service function chain|submitted, scheduled, starting, running, stopping, stopped, error|
|Service function|starting, running, stopping, stopped, error|
|Surrogate|placed, unplaced, booted, connected, error|
*The state of configuration items needs to be defined*
*Describe the failure taxonomy*
### Monitoring (https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/issues/8)
......@@ -257,19 +245,19 @@ The measurement model is based on a time-series model defined by TICK stack from
Each series has:
* a name "measurement"
* 0 or more tags for configuration context
* 0 or more tags for measurement context
* 1 or more fields for the measurement values
* a timestamp.
The model is used to report both configuration and monitoring data. In general, tags are used to provide configuration context for measurement values stored in fields. The tags are structured to provide queries by KPIs and dimensions defined in the FLAME architecture.
Tags are automatically indexed by InfluxDB. Global tags can be automatically inserted by contexualised agents collecting data from monitoring processes. The global tags used across different measurements are a key part of the database design. Although, InfluxDB is schemaless database allowing arbirtary measurement fields to be stored (e.g. allowing for a media component to have a set of specific metrics), using common global tags allows the aggregation of measurements across time with common context.
Tags are automatically indexed by InfluxDB. Global tags can be automatically inserted by contextualised agents collecting data from monitoring processes. The global tags used across different measurements are a key part of the database design. Although InfluxDB is a schemaless database allowing arbitrary measurement fields to be stored (e.g. allowing for a media component to have a set of specific metrics), using common global tags allows the aggregation of measurements across time with a known context.
Although similar to SQL influx is not a relational database and the primary key for all measuremetns is time. Further schema design recommendations can be found here: https://docs.influxdata.com/influxdb/v1.4/concepts/schema_and_data_layout/
Although its query language is similar to SQL, InfluxDB is not a relational database and the primary key for all measurements is **time**. Schema design recommendations can be found here: https://docs.influxdata.com/influxdb/v1.4/concepts/schema_and_data_layout/
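The series structure above (a measurement name, tags, fields and a timestamp) maps onto the InfluxDB line protocol. The following is a minimal sketch of that mapping and not part of the CLMC implementation; the helper names `escape_tag`, `format_field` and `to_line` are hypothetical, and the measurement, tag and field names in the usage note are illustrative only.

```python
def escape_tag(value: str) -> str:
    """Escape the characters that are special in tag keys and tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value) -> str:
    """Render a field value: booleans, integers (with an 'i' suffix), floats, quoted strings."""
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line-protocol point: name, 0 or more tags, 1 or more fields, timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape_tag(key)}={escape_tag(str(val))}" for key, val in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(key)}={format_field(val)}" for key, val in sorted(fields.items())
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"
```

For example, `to_line("service_response", {"service_id": "A", "location": "DATACENTRE_1"}, {"response_time": 0.25}, 1513555200000000000)` yields `service_response,location=DATACENTRE_1,service_id=A response_time=0.25 1513555200000000000`. Tags are sorted by key here simply to keep the output deterministic.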
### Temporal Measurements (TBC Simon Crowle)
Monitoring data must have time-stamp values that are consistent and sycnrhonised across the platform. This means that all VMs hosting SFs should have a synchronised system clock, or at least (and more likely) a means by which an millisecond offset from the local time can be retrieved so that a 'platform-correct' time value can be calculated.
Monitoring data must have time-stamp values that are consistent and synchronised across the platform. This means that all VMs hosting SFs should have a synchronised system clock, or at least (and more likely) a means by which a millisecond offset from the local time can be retrieved so that a 'platform-correct' time value can be calculated.
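As an illustration of the offset approach, the sketch below applies a millisecond offset to a local timestamp and converts the result to nanoseconds since the epoch. It is a sketch under stated assumptions: the function names are hypothetical, and the sign convention (platform time = local time + offset) is an assumption, since the specification does not yet define how the offset is obtained or expressed.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def platform_time(local_time: datetime, offset_ms: int) -> datetime:
    """Return the 'platform-correct' time, assuming platform = local + offset."""
    return local_time + timedelta(milliseconds=offset_ms)


def to_epoch_ns(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the epoch.

    Integer arithmetic is used so that no precision is lost to float rounding.
    """
    delta = moment - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000
```

A VM whose clock runs 250 ms ahead of platform time would, under this convention, report an offset of `-250` and stamp each measurement with `to_epoch_ns(platform_time(local_now, -250))`.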
*Describe approaches to integrate temporal measurements, time as a primary key, etc.*
......@@ -279,7 +267,7 @@ influx -precision rfc3339 : The -precision argument specifies the format/precisi
### Spatial Measurements (TBC Simon Crowle)
Location can be provided in two forms: labelled (tag) and numeric (longitude and latitude as digitial degrees). Note that the location label is likely to be a _global tag_.
Location can be represented in two forms: labelled (tag) and numeric (longitude and latitude as decimal degrees). Note that the location label is likely to be a _global tag_.
Tag location
......@@ -287,7 +275,7 @@ Tag location
| --- | --- | --- |
| DATACENTRE_1 | 0 | 0 |
A SF media transcoder is placed in a lamp-post. It has no means to obtain GPS coordinates but has a _location_label_ provided to it as a VM environment variable. It provides zeros in the longitude and latitude. In subsequent data analysis we can search for this SF by location label.
A surrogate placed on a server has no means to obtain GPS coordinates but has a _location_label_ provided to it as server context. It provides zeros for the longitude and latitude. In subsequent data analysis we can search for this SF by location label.
GPS coordination location
......@@ -299,7 +287,7 @@ A SF that is a proxy to a user attached to a NAP running in street lamp post LAM
Note that tags are always strings and cannot be floats; longitude and latitude will therefore always be stored as measurement fields.
Integrating and analysing location measurements
*Discuss integrating and analysing location measurements*
If tags are used then measurements of GPS coordinates will need to be translated into a tag-based approximation. For example, if a user device is tracking location information then, for that to be combined with a server location, the GPS coordinate needs to be translated.
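One possible translation, sketched below, maps a GPS coordinate to the nearest labelled location using great-circle distance. This is an illustration only and not a decided approach: the function names are hypothetical, and the table of labelled locations with coordinates is assumed to be available from platform configuration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in decimal degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_label(lat, lon, known_locations):
    """Return the label of the known location closest to the given coordinate."""
    return min(
        known_locations,
        key=lambda label: haversine_km((lat, lon), known_locations[label]),
    )
```

With a hypothetical table such as `{"DATACENTRE_1": (50.93, -1.40), "EDGE_A": (51.50, -0.12)}`, a device reporting `(51.4, -0.2)` would be tagged `EDGE_A`. The nearest label is only an approximation and says nothing about network proximity.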
......@@ -307,19 +295,19 @@ Matching on tags is limited to matching and potentially spatial hierarchies (e.g
## **Decision Context**
Monitoring data is collected to support service design, management and control decisions. The link between decisions and data is through queries and rules applied to contextual information stored with measurement values.
Monitoring data is collected to support service design, management and control decisions resulting in state changes in configuration items. The link between decisions and data is through queries and rules applied to contextual information stored with measurement values.
![MeasurementContext](/docs/image/measurement-context.jpg)
Every measurement has a measurement context. The context allows for time-based series to be created according to a set of query criteria which are then be processed to calculate statistical data over the desired time-period for the series. For example, in the following query the measurement is avg_response_time, the context is “service A” and the series are all of the data points from now minus 10 minutes.
Every measurement has a measurement context. The context allows for time-based series to be created according to a set of query criteria, which are then processed to calculate statistical data over the desired time period for the series. For example, in the following simple query the measurement is avg_response_time, the context is “service A” and the series are all of the data points from now minus 10 minutes.
`find avg response time for service A between over the last 10 minutes`
`find avg response time for service A over the last 10 minutes`
To support this query the following measurement would be created:
`service_response,service_id=(string) response_time=(float) timestamp`
`serviceA_monitoring,service_id=(string) response_time=(float) timestamp`
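To make the link between the query and the measurement concrete, the sketch below expresses the same question in two ways: as an InfluxQL query string, and as an equivalent calculation over in-memory points. It is an illustration under stated assumptions: the function names are hypothetical, and the points are assumed to be dictionaries carrying `service_id`, `response_time` and a timezone-aware `time`.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def build_query(measurement: str, service_id: str, minutes: int) -> str:
    """Build an InfluxQL query for the mean response time over the last N minutes."""
    return (
        f'SELECT MEAN("response_time") FROM "{measurement}" '
        f"WHERE \"service_id\" = '{service_id}' AND time > now() - {minutes}m"
    )


def avg_response_time(points, service_id, now, window):
    """Mean response time for one service within (now - window, now], or None if empty."""
    values = [
        point["response_time"]
        for point in points
        if point["service_id"] == service_id and now - window < point["time"] <= now
    ]
    return mean(values) if values else None
```

Here the tag `service_id` supplies the context, the field `response_time` supplies the values, and the time predicate selects the series.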
In the FLAME architeture we discuss at length the relationship between KPIs and dimensions, and implementations based on OLAP. In our current implementation, KPIs are calculated from measurement fields and dimensions are encoded within measurement tags. This is a lightweight implementation that will allow for a broad range of questions to be asked about the cross layer information acquired.
In the FLAME architecture we discuss at length the relationship between KPIs and dimensions, and implementations based on OLAP. In the current CLMC implementation, KPIs are calculated from measurement fields and dimensions are encoded within measurement tags. This is a lightweight implementation that will allow for a broad range of questions to be asked about the cross-layer information acquired.
Designing the context for measurements is an important step in the schema design. This is especially important when measurements from multiple monitoring sources need to be integrated and processed to provide data for queries and decisions. The key design principles adopted include:
......@@ -333,13 +321,13 @@ The following figure shows the general structure approach for two measurements A
![FLAMEMeasurements](/docs/image/flame-measurements.jpg)
The measurement model considers three monitoring views on the VM/Container instance with field values:
The measurement model considers three monitoring views on a surrogate with field values:
* service: specific metrics associated with the SF (either media component or platform component)
* network: data usage TX/RX, latency, jitter, etc.
* host: cpu, storage, memory, storage I/O, etc
All of the measurements on a specific VM/Container instance share a common context that includes tag values:
All of the measurements on a surrogate share a common context that includes tag values:
* sfc – an orchestration template
* sfc_i – an instance of the orchestration template
......@@ -349,9 +337,9 @@ All of the measurements on a specific VM/Container instance share a common conte
* server – a physical or virtual server for hosting VM or container instances
* location – the location of the server
By including this context with service, network and host measurements it is possible to support a wide range of temporal queries associated with SFC’s. By adopting the same convention for identifiers it is possible to combine measurements across service, network and host to create new series that allows exploration of different aspects of the VM instance, including cross-layer queries.
By including this context with service, network and host measurements it is possible to support a range of temporal queries associated with SFCs. By adopting the same convention for identifiers it is possible to combine measurements across service, network and host to create new series that allow exploration of different aspects of the VM instance, including cross-layer queries.
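The sketch below shows one way measurements from two views could be combined through their shared context. It is illustrative only: the point structure and function names are hypothetical, `CONTEXT_TAGS` lists just a subset of the common tags (`sfc`, `sfc_i`, `server`, `location`), and the field names used in any example data are assumptions.

```python
from collections import defaultdict

# Subset of the common context tags; the full tag list is defined by the specification.
CONTEXT_TAGS = ("sfc", "sfc_i", "server", "location")


def context_key(point, tags=CONTEXT_TAGS):
    """Identify a point by the values of its shared context tags."""
    return tuple(point["tags"].get(tag) for tag in tags)


def join_views(service_points, network_points):
    """Merge service and network fields that share the same context and timestamp."""
    merged = defaultdict(dict)
    for view, points in (("service", service_points), ("network", network_points)):
        for point in points:
            key = (context_key(point), point["time"])
            for name, value in point["fields"].items():
                # Prefix each field with its view so that names cannot collide.
                merged[key][f"{view}.{name}"] = value
    return dict(merged)
```

Each entry of the result is a cross-layer row, for instance request counts alongside received bytes for the same surrogate context at the same time. In practice timestamps from different agents rarely match exactly, so points would first need to be aligned to a common sample interval.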
Give a worked example across service and network measurements based on the mpeg-dash service
*Give a worked example across service and network measurements based on the mpeg-dash service*
* Decide on the service management decisions and time scales
* Decide on the measurements of interest that are needed to make the decisions
......@@ -368,11 +356,16 @@ Give a worked example across service and network measurements based on the mpeg-
## **Architecture**
### General
The monitoring model uses an agent based approach with hierarchical aggregation used as required for different time scales of decision making. The general architecture is shown in the diagram below.
![AgentArchitecture](/docs/image/agent-architecture.jpg)
To monitoring a SF an agent is deployed on each of the container/VM implementing a SF. The agent is deployed by the orchestrator when the SF is provisioned. The agent is configured with a set of input plugins that collect measurements from the three viewpoints of network, host and service. The agent is configured with a set of global tags that are inserted for all measurements made by the agent on the host.
To monitor a SF an agent is deployed on each of the surrogates implementing a SF. The agent is deployed by the orchestrator when the SF is provisioned. The agent is configured with
* a set of input plugins that collect measurements from the three viewpoints of network, host and service
* a set of global tags that are inserted for all measurements made by the agent on the host.
* 1 or more output plugins for publishing aggregated monitoring data.
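The agent configuration described above can be pictured with the following sketch. It does not model Telegraf or its API: the `Agent` class, the plugin callables and the tag precedence rule (a tag set by an input wins over a global tag of the same name) are all assumptions made for illustration.

```python
class Agent:
    """Hypothetical stand-in for a monitoring agent; this is not Telegraf's API."""

    def __init__(self, global_tags, inputs, outputs):
        self.global_tags = dict(global_tags)
        self.inputs = list(inputs)    # callables returning (measurement, tags, fields) tuples
        self.outputs = list(outputs)  # callables accepting one point

    def collect_once(self, timestamp):
        """Gather points from every input, add the global tags, publish to every output."""
        for read in self.inputs:
            for measurement, tags, fields in read():
                point = {
                    "measurement": measurement,
                    # Assumption of this sketch: input-specific tags override global tags.
                    "tags": {**self.global_tags, **tags},
                    "fields": dict(fields),
                    "time": timestamp,
                }
                for publish in self.outputs:
                    publish(point)


def host_cpu_input():
    """Illustrative host-view input that returns one fixed sample."""
    return [("cpu", {"cpu": "cpu0"}, {"usage_percent": 12.5})]
```

Because the global tags are attached by the agent rather than by each input, every measurement made on the same surrogate carries the same context without the monitoring processes needing to know it.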
Telegraf offers a wide range of integration with relevant monitoring processes.
......@@ -391,7 +384,7 @@ Telegraf offers a wide range of integration for 3rd party monitoring processes:
* Telegraf http listener: https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/http_listener
* Telegraf Bespoke Plugin: https://www.influxdata.com/blog/how-to-write-telegraf-plugin-beginners/
The architecture considers hierarchical monitoring and scalability, for example, AMQP can be used to buffer monitoring information whilst InfluxDB can be used to provide intermediate aggregation points when used with Telegraf input and output plugin.
The architecture considers hierarchical monitoring and scalability, for example, AMQP can be used to buffer monitoring information whilst InfluxDB can be used to provide intermediate aggregation points when used with Telegraf input and output plugin.
### Integration with FLIPS Monitoring
......@@ -399,8 +392,8 @@ FLIPS offers a scalable pub/sub system for distributing monitoring data. The arc
* MOOSE and CLMC provide similar functions in the architecture, the CLMC will not have access to MOOSE but will need to subscribe to data points provided by FLIPS
* The APIs for Moly and Blackadder are not provided therefore it's not possible to critically understand the correct implementation approach for agents and monitoring data distribution
* Individual datapoints need to be aggregated into measurements
* It's likely that we'll have to use the blackadder API for distribution of monitoring data, replacing messaging systems such as RabbitMQ with all buffering and pub/sub deployed on the nodes themselves rather than a central service.
* Individual datapoints need to be aggregated into measurements according to a sample rate
* We may need to use the blackadder API for distribution of monitoring data, replacing messaging systems such as AMQP with all buffering and pub/sub deployed on the nodes themselves rather than a central service.
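Aggregating individual data points into measurements according to a sample rate could look like the sketch below. It is an assumption-laden illustration rather than a FLIPS or CLMC interface: data points are assumed to arrive as `(timestamp_seconds, name, value)` tuples, the function name is hypothetical, and the mean is used as the aggregate although other statistics may be more appropriate for some metrics.

```python
from collections import defaultdict
from statistics import mean


def aggregate(datapoints, sample_period_s):
    """Group data points into fixed windows and average each named value per window.

    Returns a list of (window_start, {name: mean_value}) tuples ordered by window start.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for timestamp, name, value in datapoints:
        window_start = int(timestamp // sample_period_s) * sample_period_s
        buckets[window_start][name].append(value)
    return [
        (start, {name: mean(values) for name, values in sorted(fields.items())})
        for start, fields in sorted(buckets.items())
    ]
```

Each returned tuple corresponds to one time-series measurement whose fields are the aggregated data points for that window.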
There are a few architectural choices. The first below uses moly as an integration point for monitoring processes via a Telegraf output plugin with data inserted into influx using a blackadder API input plugin on another Telegraf agent running on the CLMC. In this case managing the subscriptions to nodes and data points is difficult. In addition, some data points will be individual from FLIPS monitoring whilst others will be in line protocol format from Telegraf. For the FLIPS data points a new input plugin would be required to aggregate individual data points into time-series measurements.
......@@ -426,7 +419,7 @@ The pub/sub protocol still needs some work as we don't want the CLMC to have to
|Media Service|vm_host_config|compute resources allocated to a VM|
|Media Service|net_port_config|networking constraints on port on a VM|
## Monitoring
### Monitoring
|Decision Context|Measurement|Description|
|---|---|---|
......@@ -442,7 +435,7 @@ The pub/sub protocol still needs some work as we don't want the CLMC to have to
|Media Service|process_status|vm metrics|
|Media Service|system_load_uptime|vm metrics|
|Media Service|net_port_io|vm port network io and error at L2|
|Media Service|service|vm service perf metrics|
|Media Service|surrogate|service usage and performance metrics|
## Capacity Measurements
......@@ -523,29 +516,29 @@ Media service measurements measure the configuration, usage and performance of m
**sfc_i_config**
tbd
`sfc_i_config,<common_tags>,state <fields> timestamp`
**sfc_i**
**sfc_i_monitoring**
Aggregate measurement derived from VM/container measurements, most likely calculated using a continuous query over a specific time interval
**sf_i_config**
tbd
`sf_i_config,<common_tags>,state <fields> timestamp`
**sf_i**
**sf_i_monitoring**
Aggregate measurement derived from VM/container measurements, most likely calculated using a continuous query over a specific time interval
Aggregate measurement derived from surrogate measurements, most likely calculated using a continuous query over a specific time interval
**nodes**
**surrogates**
Aggregate measurement derived from VM/container measurements, most likely calculated using a continuous query over a specific time interval
Aggregate measurement derived from surrogate measurements, most likely calculated using a continuous query over a specific time interval
`nodes,<common_tags>, placed, unplaced, booted, connected`
`surrogates,<common_tags>, placed, unplaced, booted, connected`
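One possible reading of the `placed`, `unplaced`, `booted` and `connected` fields is that each holds the number of surrogates currently in that state. The sketch below derives such counts from the latest known state of each surrogate. This interpretation, the function name `count_surrogate_states` and the input mapping are assumptions for illustration only; in practice the aggregation would most likely be done by a continuous query.

```python
from collections import Counter

# State names taken from the field list of the surrogates measurement.
STATES = ("placed", "unplaced", "booted", "connected")


def count_surrogate_states(latest_states):
    """Count surrogates per state.

    `latest_states` maps a surrogate identifier to its most recent state
    (hypothetical input shape).
    """
    counts = Counter(latest_states.values())
    # Always report every state so each field is present, even when zero.
    return {state: counts.get(state, 0) for state in STATES}


fields = count_surrogate_states({"s1": "placed", "s2": "booted", "s3": "booted"})
```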
### VM/Container Measurements
### Surrogate Measurements
VM/Container Measurements measure the configuration, usage and performance of VM/Container instances deployed by the platform within the context of a media service.
Surrogate measurements measure the configuration, usage and performance of VM/Container instances deployed by the platform within the context of a media service.
Common tags
......@@ -587,15 +580,15 @@ Specific tags
Note that RX_PACKETS_M seems to have an inconsistent naming convention.
#### VM Host Measurements
#### VM Measurements
SF Host Resource Measurements measure the host resources allocated to a service function deployed by the platform. All measurements have the following global tags to allow the data to be sliced and diced according to dimensions.
**vm_host_config**
**vm_config**
The resources allocated to a VM/Container
`node_res_alloc,<common_tags>,vm_state cpu,memory,storage timestamp`
`vm_res_alloc,<common_tags>,vm_state cpu,memory,storage timestamp`
Specific tags
* vm_state
......@@ -688,6 +681,42 @@ Specific Tags
# **Worked Usage Scenario - MPEG-DASH**
## **CLMC Use Case Scenario**
The following scenario aims to verify two aspects:
* CLMC monitoring specification & data acquisition
* Support for initial decision making processes for FLAME (re)orchestration
The FLAME platform acquires a slice of the infrastructure resources (compute, RAM & storage [C1, C2, C3] and networking). A media service provider offers an MPEG-DASH service to end-users (via their video clients connected to NAPs on the FLAME platform). The service provider deploys surrogates of the MPEG-DASH service on all compute nodes [C1-C3]. All services (including NAPs) are monitored by the CLMC.
Over time a growing number of video clients use an MPEG-DASH service to stream movies on demand. As clients connect and make requests, the platform makes decisions and takes actions in order to maintain quality of service for the increasing number of clients demanding an MPEG-DASH service.
What are the possible criteria (based on metrics and analytics provided by the CLMC) that could be used to help the NAP make these decisions?
In this scenario what are the possible actions a NAP could take?
![Scenario](/docs/image/scenario.jpg)
![Scenario Measurements](/docs/image/scenario-measurements.jpg)
Platform actions
* Increase the resources available to MPEG-DASH surrogates
* This may not be possible if resources are unavailable
* Vertical scaling may not solve the problem (e.g., an I/O bottleneck)
* Re-route client requests to other MPEG-DASH services
* C1 – Closer to clients, but limited capability
* C3 – Greater capability but further away from clients
… note: NAP service end-point re-routing will need to take into account network factors AND compute resource availability related service KPIs; i.e., end-to-end performance
Service actions
* Lower overall service quality to clients to reduce overall resource usage
Goal: Explore QoE under two different resource configurations
KPI targets over a 1 hr period
......@@ -698,9 +727,9 @@ KPI targets over a 1 hr period
Configuration Measurements
`node_res_alloc,<common_tags>,vm_state=placed cpu=1,memory=2048,storage=100G timestamp`
`node_res_alloc,<common_tags>,vm_state=booted cpu=1,memory=2048,storage=100G timestamp`
`node_res_alloc,<common_tags>,vm_state=connected cpu=1,memory=2048,storage=100G timestamp`
`vm_res_alloc,<common_tags>,vm_state=placed cpu=1,memory=2048,storage=100G timestamp`
`vm_res_alloc,<common_tags>,vm_state=booted cpu=1,memory=2048,storage=100G timestamp`
`vm_res_alloc,<common_tags>,vm_state=connected cpu=1,memory=2048,storage=100G timestamp`
`net_port_config,<common_tags>,port_id=enps03,port_state=up RX_USAGE_CONSTRAINT=500G,TX_USAGE_CONSTRAINT=500G timestamp`
`mpegdash_service_config,service_state=running connected_clients=10 timestamp`
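The configuration measurements above follow the shape `measurement,tags fields timestamp`. As a minimal sketch, assuming the points are written in InfluxDB line protocol, the Python helper below renders one point. The function name `to_line_protocol` and the example timestamp are hypothetical. In line protocol, integer fields carry an `i` suffix and string fields are double-quoted, so a value such as `100G` would need to be written as a quoted string or converted to a number. Escaping of special characters is omitted for brevity.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Render one point as 'measurement,tag=v field=v timestamp'.

    Simplified: no escaping of spaces, commas or quotes in names or values.
    """
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_parts = []
    for key, value in fields.items():
        if isinstance(value, bool):
            # bool is checked first because bool is a subclass of int
            field_parts.append(f"{key}={'true' if value else 'false'}")
        elif isinstance(value, int):
            field_parts.append(f"{key}={value}i")
        elif isinstance(value, float):
            field_parts.append(f"{key}={value}")
        else:
            field_parts.append(f'{key}="{value}"')
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {','.join(field_parts)} {timestamp_ns}"


line = to_line_protocol(
    "vm_res_alloc",
    {"vm_state": "placed"},
    {"cpu": 1, "memory": 2048, "storage": "100G"},
    1515583926868000000,  # example timestamp in nanoseconds
)
```

The common tags are left out of the example; they would be added to the `tags` dictionary alongside `vm_state`.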
......@@ -710,14 +739,10 @@ Monitoring Measurements
`cpu_usage,<common_tags>,cpu cpu_usage_user,cpu_usage_system timestamp`
`network_io,<common_tags>,port_id PACKET_DROP_RATE_M, PACKET_ERROR_RATE_M, RX_PACKETS_M, TX_PACKETS_PORT_M, RX_BYTES_PORT_M, TX_BYTES_PORT_M timestamp`
Start-up time delay:
Video stalls:
Service Measurements
# **MISC Measurements and Questions**
# **MISC Measurements and Further Questions**
The following data points require further analysis
......