#### TICK Stack Setup
To set up a test version of the TICK stack, run the following command from the root directory of the project:

`vagrant up`

This will create a VM with InfluxDB, Kapacitor, Telegraf and Chronograf installed, with the following ports forwarded to the host machine (a quick reachability check is shown after the list):

* InfluxDB: 8086
* Chronograf: 8888
* Kapacitor: 9092
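Once the VM is up, a quick way to confirm the stack is reachable from the host (assuming the default port forwarding above) is to hit InfluxDB's ping endpoint:

```shell
# Should return HTTP 204 No Content if InfluxDB is up and reachable
curl -i http://localhost:8086/ping
```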
#### Configuration and Monitoring Specification Test Framework
A Java/JUnit test framework has been developed to provide concrete examples of the CLMC monitoring specification. To build and run this test framework you will need:
1. The CLMC TICK stack installed and running (provided as a Vagrant solution in this project)
## FLAME Configuration and Monitoring Specification
This document describes the low-level configuration and monitoring specification for cross-layer management and control within the FLAME platform.
### Principles
#### Configuration Data
Briefly describe:
* the characteristics of configuration data as ways to describe the structure of the system over time.
* how configuration data provides context for measurements of system behaviour
* the lifecycle of configuration data within the platform and how it is used.
* the types of configuration data
Configuration includes the following aspects:
* Topology (nodes and links)
* Capacity (servers and networks)
* Allocation (Media Service Instance, Service Function Instance, Surrogate Instance)
#### Monitoring Data
Briefly describe:
* the characteristics of monitoring data as ways to measure the behaviour of the system over time, including usage and performance
* how measurements relate to resources within the configured system
* the lifecycle of monitoring data within the platform and how it is used
* the types of monitoring data
Monitoring includes the following aspects:
* network resource usage
* host resource usage
* service usage
#### Measurements Principles and Model
##### General
The measurement model is based on a time-series model using the TICK stack from InfluxData. The data model is based on the line protocol, which has the format shown below.
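For reference, the standard InfluxDB line protocol takes the following general form, followed here by a purely illustrative example (the measurement, tag and field names are not defined by this specification):

```
<measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field-key>=<field-value>...] [unix-nano-timestamp]
```

```
cpu_usage,host=server01,location=DC1 value=0.64 1516358800000000000
```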
The model will be used to report both configuration and monitoring data. In general, tags are used to provide configuration context for the measurement values stored in fields. Tags are structured to support queries by the dimensions defined in the FLAME architecture, allowing series data to be sliced and diced, and they are automatically indexed by InfluxDB. Global tags are automatically inserted by contextualised agents collecting data from monitoring processes, and the global tags used across different measurements are a key part of the database design. Although InfluxDB is a schemaless database, allowing arbitrary measurement fields to be stored (e.g. allowing a media component to have its own set of specific metrics), using common global tags allows measurements to be aggregated over time with a common context. Although its query language is similar to SQL, InfluxDB is not a relational database and the primary key for all measurements is time. Further schema design recommendations can be found here:
##### Temporal Measurements
Monitoring data must have time-stamp values that are consistent and synchronised across the platform. This means that all VMs hosting SFs should have a synchronised system clock, or at least (and more likely) a means by which a millisecond offset from the local time can be retrieved so that a 'platform-correct' time value can be calculated.
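As a minimal sketch (not part of the specification), assuming the platform-to-local clock offset in milliseconds has already been obtained from some platform time service, a 'platform-correct' timestamp suitable for the line protocol (which defaults to nanosecond precision) could be derived as follows:

```java
public class PlatformTime {

    /**
     * Illustrative only: converts the local clock plus a known offset into a
     * platform-correct timestamp in nanoseconds (the line protocol default precision).
     */
    public static long platformTimestampNanos(long offsetMillis) {
        long platformTimeMillis = System.currentTimeMillis() + offsetMillis;
        return platformTimeMillis * 1_000_000L;
    }

    public static void main(String[] args) {
        // The offset would come from a platform time service; 0 is used here for illustration.
        System.out.println(platformTimestampNanos(0L));
    }
}
```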
##### Spatial Measurements
Location should be provided in two forms: labelled (tag) and numeric (longitude and latitude as decimal degrees). Note that the location label is likely to be a _global tag_. This allows us to support the following scenarios:
##### SF with no knowledge of GPS coordinates
Tag-based location:
| loc_label | loc_long | loc_lat |
| --- | --- | --- |
...
...
A SF media transcoder is placed in a lamp-post. It has no means to obtain GPS coordinates but has a _loc_label_ provided to it as a VM environment variable. It provides zeros in the longitude and latitude. In subsequent data analysis we can search for this SF by location label.
##### SF with full location knowledge
GPS coordinate location:
| loc_label | loc_long | loc_lat |
| --- | --- | --- |
...
...
Consider a SF that is a proxy for a user attached to a NAP running in street lamp post LAMP_1. Here we have knowledge both of the logical location of the service and of the fine-grained, dynamic position of the service user. There are many interesting possibilities when both of these pieces of information are available.
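A purely illustrative data point for this scenario (none of the names or values below are defined by the specification) might record the location label as a tag and the coordinates as fields:

```
service_usage,loc_label=LAMP_1,sf_instance=proxy_sf_1 loc_long=-1.4049,loc_lat=50.9365,requests=42 1516358800000000000
```

Storing the coordinates as fields rather than tags is deliberate, as explained in the note that follows.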
Note that tags are always strings and cannot be floats; therefore longitude and latitude will always be stored as measurement fields.

**Integrating and analysing location measurements**

If tags are used then measurements of GPS coordinates will need to be translated into a tag-based approximation. For example, if a user device is tracking location information then, for that to be combined with a server location, the GPS coordinate needs to be translated.

Matching on tags is limited to exact matches and potentially spatial hierarchies (e.g. country.city.street). Using a coordinate system allows for mathematical functions to be developed (e.g. proximity functions).

### Logical Model
##### Measurement Context
Monitoring data is collected to support design, management and control decisions. The link between decisions and data is through queries applied to contextual information stored with measurement values.
Every measurement has a measurement context. The context allows time-based series to be created according to a set of query criteria, which can then be processed to calculate statistical data over the desired time period for the series. For example, in the following query the measurement is avg_response_time, the context is “service A” and the series is all of the data points from now minus 10 minutes.
`find avg response time for service A over the last 10 minutes`
To support this query the following measurement would be created:
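As an illustrative sketch only (the measurement, tag and field names are assumptions rather than part of the specification), the measurement reported in line protocol, and an InfluxQL query over it, might look like:

```
avg_response_time,sfc=media_sfc,sfc_instance=media_sfc_1,sf_instance=service_A value=0.12 1516358800000000000
```

```
SELECT mean("value") FROM "avg_response_time" WHERE "sf_instance" = 'service_A' AND time > now() - 10m
```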
Designing the context for measurements is an important step in the schema design. This is especially important when measurements from multiple monitoring sources need to be integrated and processed to provide data for queries and decisions. The key design principles adopted include:
* identify common context across different measurements
* use the same identifiers and naming conventions for context across different measurements
* organise the context into hierarchies that are automatically added to measurements during the collection process
The following figure shows the general structural approach for two measurements A and B. Data points in each series have a set of tags that share a common context, together with a specific context related to the measurement values.
Now let’s look at the context for measurements within the FLAME platform. In the diagram below, the core of the model is the VM/Container instance as the primary measurement point, as this is the realisation of computational processes running on the platform. A VM/Container is one or more processes running on a physical or virtual host, with ports connecting to other VM/Container instances over network links. The VM/Container has measurement processes running to capture different views on the VM/Container, including the network, host and service. The acquisition of these different views of the VM/Container together is a key element of the cross-layer information required for management and control. The measurements about a VM/Container are captured by different processes running on the VM or container but are brought together by common context, allowing the information to be integrated, correlated and analysed.
We consider three views on the VM/Container instance (shown in orange in the figure):
* service: specific metrics associated within the SF (either media component or platform component)
* network: data usage TX/RX, latency, jitter, etc.
* host: cpu, storage, memory, storage I/O, etc
All of the measurements on a specific VM/Container instance share a common context (shown in green in the figure) that includes:
* sfc – an orchestration template
* sfc_instance – an instance of the orchestration template
* sf_package – a SF type
* sf_instance – an instance of the SF type
* vm_instance – an authoritative copy of the SF instance
* server – a physical or virtual server for hosting VM instances
* location – the location of the server
Network and host measurements are general to all surrogate SFs running within the platform, while SF usage and performance measurements are specific to the SF implementation. The platform itself is realised using SFs, and therefore NAPs and the Topology Manager are also monitored using the same model. For media component SFs that form part of a Service Function Chain within a Media Service, the measurement fields are not defined and developers can decide what fields they want to use. However, global tags will be inserted for all measurements to allow SF-specific measurements to be integrated with network and host measurements.

By including this context with service, network and host measurements it is possible to support a wide range of queries associated with SFCs, whether they are Media Services or the platform components themselves. By adopting the same convention for identifiers it is possible to combine measurements across service, network and host to create new series that allow exploration of different aspects of the VM instance.
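As a purely illustrative sketch of this common context (the tag values and field names are assumptions, not the defined schema), a single service measurement in line protocol might look like:

```
service_usage,sfc=media_sfc,sfc_instance=media_sfc_1,sf_package=transcoder,sf_instance=transcoder_1,vm_instance=vm_101,server=server_3,location=DC1 requests=42,avg_response_time=0.12 1516358800000000000
```

A network or host measurement taken on the same VM instance would carry the same global tags, which is what allows the series to be joined and compared.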
A worked example across service and network measurements would involve the following steps (an illustrative query sketch follows this list):
* Decide on the KPI of interest and how it's calculated from a series of measurements
* Decide on time window for the series and sample rate
* Decide on interpolation approach for data points in the series
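A minimal InfluxQL sketch of these steps, assuming an illustrative `net_response` measurement with a `response_time` field and the global tags described earlier (none of these names are defined by the specification):

```
SELECT mean("response_time") FROM "net_response"
WHERE "sfc_instance" = 'media_sfc_1' AND time > now() - 10m
GROUP BY time(10s), "sf_instance" fill(linear)
```

Here the WHERE clause fixes the time window, GROUP BY time(10s) sets the sample rate, and fill(linear) selects a linear interpolation approach for intervals with no data points.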
The monitoring model uses an agent-based approach. The general architecture is described below.
An agent is deployed on each of the container/VM implementing a SF. The agent is deployed by the orchestrator when the SF is provisioned. The agent is configured with a set of input plugins that collect measurements from three aspects of the SF including network, host and SF usage/perf. The agent is configured with a set of global tags that are inserted for all measurements made by the agent on the host.
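As a hedged sketch of such an agent configuration using Telegraf (the database name, tag values and choice of input plugins are illustrative assumptions, not the defined deployment), the global tags and inputs might be declared as follows:

```toml
# Global tags inserted into every measurement collected by this agent (illustrative values)
[global_tags]
  sfc = "media_sfc"
  sfc_instance = "media_sfc_1"
  sf_package = "transcoder"
  sf_instance = "transcoder_1"
  vm_instance = "vm_101"
  server = "server_3"
  location = "DC1"

# Send all measurements to the platform InfluxDB (database name is illustrative)
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "CLMCMetrics"

# Host view
[[inputs.cpu]]
[[inputs.mem]]
[[inputs.disk]]

# Network view
[[inputs.net]]
```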
We will use the HTTP API rather than the Java client, as the POJO abstraction is not needed for the test cases and it hides the detail of the line protocol.
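For illustration, a write through the InfluxDB HTTP API keeps the line protocol visible (the database name and data point below are assumptions):

```shell
# Write a single line-protocol point to an illustrative 'CLMCMetrics' database
curl -i -XPOST "http://localhost:8086/write?db=CLMCMetrics" \
  --data-binary 'service_usage,sf_instance=transcoder_1 requests=42'
```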
**Adapting the Mona/MOOSE agent?**
MOOSE is the monitoring system provided by POINT and FLIPS. The monitoring specification has been analysed to refactor the measurements into series. The full monitoring specification is available here:
...
...
Capacity measurements measure the size of the virtual infrastructure slice available to the platform.
The *host_resource* measurement measures the wholesale host resources available to the platform that can be allocated to media services.
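As an illustrative sketch only (the field names and values are assumptions rather than the defined schema), a *host_resource* data point might be reported as:

```
host_resource,server=server_3,location=DC1 cpus=32i,memory=68719476736i,storage=1099511627776i 1516358800000000000
```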