Commit 1dd70ece authored by MJB

initial commit of docs and scripts

parent 77d26eb6
## FLAME Monitoring Specification
This document describes the low-level monitoring specification for cross-layer management and control within the FLAME platform.
### Principles
#### Measurements Model
The measurement model is a time-series model built on the TICK stack from InfluxData.
The data model is based on the line protocol, which has the format
`<measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]`
Each series has
* a name "measurement"
* 0 or more tags for metadata
* 1 or more fields for the measurement values
* a timestamp.
InfluxDB is schemaless, allowing arbitrary series to be stored. For example, the wide variety of media components can create arbitrary measurements without requiring changes to a database schema.
Tags can be structured to provide query by dimensions allowing series data to be diced and sliced. The tags are automatically indexed.
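The line protocol format above can be sketched as a small formatter. This is a minimal illustration rather than the platform's agent code: the function names (`to_line`, `_escape`, `_format_field_value`) and the sample values are assumptions made for the example. It escapes commas, spaces and equals signs as the line protocol requires, appends the `i` suffix to integer fields, and sorts tags by key.

```python
def _escape(text, chars):
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value):
    """Render a field value using line protocol typing rules."""
    # bool is a subclass of int, so it must be tested first
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # everything else is written as a double-quoted string
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line protocol record: name, 0+ tags, 1+ fields, optional timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    line = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys keep the output deterministic
        line += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    line += " " + ",".join(
        _escape(key, ",= ") + "=" + _format_field_value(value)
        for key, value in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# Hypothetical sample values, for illustration only
print(to_line("host_resource", {"server_id": "srv 1", "location": "lab"},
              {"cpus": 4}, 1513123200000000000))
```

Because tags are plain key/value pairs, a new dimension can be added to a point without any change on the database side.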
#### Temporal Measurements
#### Spatial Measurements
Discuss hierarchical tags vs GPS coordinate systems
### Logical Model
The high-level entities involved in the measurement model are defined in the figure below. The core of the model is the Surrogate SF, which is the primary measurement point because it is the physical realisation of services running on the platform. A Surrogate SF is a process running on a physical or virtual host with ports connecting to other Surrogate SFs within the network. The Surrogate SF has measurement processes running to capture different views on the SF, including the network, host resources, and SF usage/performance. The acquisition of these different views on the SF together is a key element of the cross-layer information required for management and control. The measurements about a Surrogate SF are captured by different processes running on the VM or container but are brought together by globally asserted monitoring metadata, allowing the information to be integrated, correlated and analysed.
Network and host measurements are general to all surrogate SFs running within the platform. SF usage and perf measurements are specific to the SF implementation. The Platform itself is realised using SFs and therefore NAPs and the Topology Manager are also monitored using the same model. For media component SFs that form part of a Service Function Chain within a Media Service, the measurement fields are not defined and developers can decide what fields they want to use. However, global tags will be inserted for all measurements to allow for integration of SF specific measurements with network and host measurements.
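How an agent might insert the global tags into a developer-defined measurement can be sketched as follows. The tag keys are the global tags listed in the Measurements section of this specification; the function name `with_global_tags`, the sample values, and the rule that global tags win on a key collision are assumptions made for this example, not decisions recorded in the specification.

```python
# Global tag keys as listed in the Measurements section of this specification
GLOBAL_TAG_KEYS = (
    "node_id", "sf_inst_id", "sf_id",
    "sfc_inst_id", "sfc_id", "server_id", "location",
)


def with_global_tags(global_tags, developer_tags):
    """Return the tag set for a point: developer tags plus the global tags.

    Assumption: on a key collision the global value wins, so that a media
    component cannot mask the platform's own context.
    """
    missing = [key for key in GLOBAL_TAG_KEYS if key not in global_tags]
    if missing:
        raise ValueError("missing global tags: " + ", ".join(missing))
    merged = dict(developer_tags)
    merged.update(global_tags)
    return merged


# Hypothetical context of one surrogate, for illustration only
context = {
    "node_id": "node-1", "sf_inst_id": "sf-inst-1", "sf_id": "sf-1",
    "sfc_inst_id": "sfc-inst-1", "sfc_id": "sfc-1",
    "server_id": "server-1", "location": "lab",
}
print(with_global_tags(context, {"cont_rep": "video"}))
```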
### Architecture
Agent-based monitoring
* Telegraf AMQP: https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/amqp_consumer
* Telegraf http json: https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/httpjson
* Telegraf http listener: https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/http_listener
* Telegraf Bespoke Plugin: https://www.influxdata.com/blog/how-to-write-telegraf-plugin-beginners/
* Telegraf Existing Plugins for common services, relevant plugins include
  * Network Response https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/net_response: could be used to perform basic network monitoring
  * nstat https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/nstat: could be used to monitor the network
  * webhooks https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/webhooks: could be used to monitor end devices
  * procstat https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/procstat: could be used to monitor containers
  * SNMP https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/snmp: could be used to monitor flows
  * sysstat https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/sysstat: could be used to monitor hosts
Direct InfluxDB ingest (for testing measurements and queries)
* Java Client : https://github.com/influxdata/influxdb-java
* HTTP API : /write?db=<database>&u=<user>&p=<pass>
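For testing direct ingest over HTTP, a request can be prepared with the Python standard library alone. This is a minimal sketch assuming an InfluxDB 1.x server, where line protocol is POSTed to `/write` with the database, precision and credentials passed as query parameters. The function name `build_write_request`, the host, the database name and the credentials are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, database, lines, user=None, password=None,
                        precision="ns"):
    """Prepare (but do not send) a POST carrying line protocol records."""
    params = {"db": database, "precision": precision}
    if user is not None:
        params["u"] = user
        params["p"] = password or ""
    url = f"{base_url.rstrip('/')}/write?{urlencode(params)}"
    body = "\n".join(lines).encode("utf-8")  # one record per line
    return Request(url, data=body, method="POST")


# Hypothetical endpoint, database and credentials, for illustration only
request = build_write_request(
    "http://localhost:8086", "flame",
    ["host_resource,server_id=server-1 cpus=4i"],
    user="tester", password="secret",
)
print(request.full_url)
# To send it against a running server:
#   from urllib.request import urlopen
#   urlopen(request)
```

Building the request separately from sending it keeps the example usable without a running database.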
Agents:
* deployed at monitoring points (e.g. surrogates and other network elements)
* insert contextual metadata as tags into measurements
* How does this relate to the Mona agents?
Hierarchical monitoring and scalability considerations
* AMQP can be used to buffer monitoring info
* InfluxDB can be used to provide aggregation points when used with Telegraf input and output plugin
* How does this relate to the pub/sub and MySQL aggregator in FLIPS?
ISSUES
* Adapting the MOOSE agent?
MOOSE is the monitoring system provided by POINT and FLIPS. The monitoring specification has been analysed to refactor the measurements into series. The full monitoring specification is available here:
https://drive.google.com/file/d/0B0ig-Rw0sniLMDN2bmhkaGIydzA/view
A couple of comments:
* CPU_UTILISATION_M: will be replaced by other metrics provided directly by Telegraf plugins
* END_TO_END_LATENCY_M (not clear who the endpoints are)
### Measurements
#### Capacity Measurements
Capacity measurements measure the size of the virtual infrastructure slice available to the platform that can be allocated on demand to tenants.
**host_resource**
The *host_resource* measurement measures the wholesale host resources available to the platform that can be allocated to media services.
Fields
* cpus(integer)
* memory(integer)
* storage(integer)
Tags:
* server_id
* location
**network_resource**
network_resource measures the overall capacity of the network available to the platform for allocation to tenants. There are currently no metrics defined for this in the FLIPS monitoring specification, although we can envisage usage metrics such as bandwidth being part of this measurement.
#### SF Network Measurements
SF Network Measurements measure aspects of network performance in relation to SFs deployed within the network. There are currently too many names for a node within the network; the following can be considered synonyms: SF, network element, node.
**node_network_perf**
node_network_perf provides the network measurement view for network elements. Network elements can be in the role of gateway, forwarding node, network attachment point, rendezvous, service, topology manager or user equipment as defined by the FLIPS monitoring specification. The measurements are made by the Mona monitoring agent.
Fields:
* BUFFER_SIZES_M
* FILE_DESCRIPTORS_TYPE_M
* HTTP_REQUESTS_FQDN_M
* MATCHES_NAMESPACE_M
* PATH_CALCULATIONS_NAMESPACE_M
* PACKET_JITTER_CID_M
* PUBLISHERS_NAMESPACE_M
* RX_BYTES_CID_M
* RX_BYTES_PORT_M
* RX_PACKETS_M
* RX_PACKETS_HTTP_M
* RX_PACKETS_IP_M
* RX_PACKETS_IP_MULTICAST_M
* SUBSCRIBERS_NAMESPACE_M
* TX_BYTES_PORT_M
* TX_BYTES_CID_M
* TX_BYTES_HTTP_M
* TX_BYTES_IP_M
* TX_BYTES_IP_MULTICAST_M
* TX_PACKETS_PORT_M
* TX_PACKETS_HTTP_M
* TX_PACKETS_IP_M
* TX_PACKETS_IP_MULTICAST_M
Global Tags
* node_id: the network element id allocated to this surrogate
* sf_inst_id : the service function instance that this node represents in the case of surrogates
* sf_id : the service function type
* sfc_inst_id : the service function chain instance that this node is part of
* sfc_id : the service function chain type that this node is part of
* server_id : the server where the node is provisioned
* location : the location of the server
Specific Tags:
* node_role
* name
* state
**node_port_perf**
The node_port_perf series provides network measurements on host ports as defined by the FLIPS monitoring specification. The measurements are made by the Mona monitoring agent.
Fields
* PACKET_DROP_RATE_M
* PACKET_ERROR_RATE_M
Tags
* node_id
* port_id
* port_name
**link**
The link series provides measurements about network links. Currently the FLIPS monitoring specification defines only topological configuration information and does not provide any measurements related to links. All performance information is included as part of the nodes. Further investigation is needed to understand if derived measurements related to links are needed or whether this is just useful for monitoring the temporal evolution of the topology.
Fields
* ??
* ??
Tags
* link_name
* link_id
* source_node_id
* destination_node_id
* link_type
#### SF Host Resource Measurements
SF Host Resource Measurements measure the host resources allocated to a service function deployed by the platform. All measurements have the following global tags to allow the data to be sliced and diced according to dimensions.
Global Tags
* node_id : the unique id of the network element
* sf_inst_id : the service function instance that this node represents in the case of surrogates
* sf_id : the service function type
* sfc_inst_id : the service function chain instance that this node is part of
* sfc_id : the service function chain type that this node is part of
* server_id : the server where the node is provisioned
* location : the location of the server
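Slicing by these dimensions amounts to filtering and grouping on tags at query time. The sketch below builds an InfluxQL query string that averages one field over a time window, filtered by some global tags and grouped by others. The function name `build_query` is hypothetical; `usage_idle` is a field emitted by Telegraf's cpu input, and storing it under the series name `node_cpu_usage` is an assumption based on the names used in this section.

```python
def _ident(name):
    """Quote an InfluxQL identifier (measurement, field or tag key)."""
    return '"' + name.replace('"', '\\"') + '"'


def _literal(value):
    """Quote an InfluxQL string literal (tag value)."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_query(measurement, field, where_tags, group_by=(), window="1h"):
    """Mean of `field` over the last `window`, filtered and grouped by tags."""
    conditions = [
        f"{_ident(key)} = {_literal(value)}"
        for key, value in sorted(where_tags.items())
    ]
    conditions.append(f"time > now() - {window}")
    query = (
        f"SELECT mean({_ident(field)}) FROM {_ident(measurement)} "
        "WHERE " + " AND ".join(conditions)
    )
    if group_by:
        query += " GROUP BY " + ", ".join(_ident(tag) for tag in group_by)
    return query


# Hypothetical tag values, for illustration only
print(build_query("node_cpu_usage", "usage_idle",
                  {"sfc_id": "sfc-1", "location": "lab"},
                  group_by=("node_id",)))
```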
**node_host_resource**
*node_host_resource* measures host resources allocated to a node.
Fields
* cpus (integer)
* memory(integer)
* storage(integer)
**node_cpu_usage**
[[inputs.cpu]]
**node_disk_usage**
[[inputs.disk]]
**node_disk_IO**
[[inputs.diskio]]
**node_kernel_stats**
[[inputs.kernel]]
**node_memory_usage**
[[inputs.mem]]
**node_process_status**
[[inputs.processes]]
**node_swap_memory_usage**
[[inputs.swap]]
**node_system_load_uptime**
[[inputs.system]]
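One way to obtain the series names above from the stock Telegraf inputs is to render an agent configuration that sets `[global_tags]` once and gives each input a `name_override`. The sketch below generates such a configuration fragment as text. Both Telegraf options exist, but using `name_override` to produce these series names is an assumption, as are the function name `render_telegraf_config` and the sample tag values.

```python
# Series names from this section, keyed by the Telegraf input that feeds them
HOST_INPUTS = {
    "cpu": "node_cpu_usage",
    "disk": "node_disk_usage",
    "diskio": "node_disk_IO",
    "kernel": "node_kernel_stats",
    "mem": "node_memory_usage",
    "processes": "node_process_status",
    "swap": "node_swap_memory_usage",
    "system": "node_system_load_uptime",
}


def _toml_string(value):
    """Render a value as a basic TOML string."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_telegraf_config(global_tags, inputs):
    """Return a Telegraf configuration fragment as text."""
    lines = ["[global_tags]"]
    for key, value in global_tags.items():
        lines.append(f"  {key} = {_toml_string(value)}")
    for plugin, series in inputs.items():
        lines.append("")
        lines.append(f"[[inputs.{plugin}]]")
        lines.append(f"  name_override = {_toml_string(series)}")
    return "\n".join(lines) + "\n"


# Hypothetical tag values, for illustration only
print(render_telegraf_config({"node_id": "node-1", "location": "lab"}, HOST_INPUTS))
```

Setting the tags in one place means every input on the node reports the same context without per-plugin configuration.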
#### SF Usage and Perf Measurements
**topology_manager**
Fields
* ???
Global Tags
* node_id: the network element id allocated to this surrogate
* sf_inst_id : the service function instance that this node represents in the case of surrogates
* sf_id : the service function type
* sfc_inst_id : the service function chain instance that this node is part of
* sfc_id : the service function chain type that this node is part of
* server_id : the server where the node is provisioned
* location : the location of the server
Tags
* node_id: the network element id allocated to the topology manager
**nap**
nap measurements are the platform's view on IP endpoints such as user equipment and services. A NAP is therefore the boundary of the platform. NAP measurements may need to be extended to provide more information on the relationship between clients and FQDN requests.
Fields
* CHANNEL_AQUISITION_TIME_M
* CMC_GROUP_SIZE_M
* NETWORK_LATENCY_FQDN_M
* RX_BYTES_HTTP_M
* RX_BYTES_IP_M
Global Tags
* node_id: the network element id allocated to this surrogate
* sf_inst_id : the service function instance that this node represents in the case of surrogates
* sf_id : the service function type
* sfc_inst_id : the service function chain instance that this node is part of
* sfc_id : the service function chain type that this node is part of
* server_id : the server where the node is provisioned
* location : the location of the server
Specific Tags
* coverage (tbc indicating the reach of the NAP)
**orchestrator**
Fields
* ???
Tags
* node_id: the network element id allocated to the orchestrator
**clmc**
Fields
* ???
Tags
* node_id: the network element id allocated to the clmc
**media_component**
Each SF developed by tenants will offer service specific usage and performance measurements. The fields in the measurements will be specific but the tags must include a predefined set of tags to allow series joins with SF Network and SF Host Resource measurements.
The actual measurements will be made by agents running on surrogate services which provide authoritative copies of SF instances deployed as part of an overall media service. Therefore the measurement series are named surrogate.
Fields
* [developer defined]
Global Tags
* node_id: the network element id allocated to this surrogate
* sf_inst_id : the service function instance that this node represents in the case of surrogates
* sf_id : the service function type
* sfc_inst_id : the service function chain instance that this node is part of
* sfc_id : the service function chain type that this node is part of
* server_id : the server where the node is provisioned
* location : the location of the server
Specific Tags
* cont_nav: the content interaction id
* cont_rep: the content representation type
* user_id: the pseudonym of the user
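The reason for mandating the global tags is that points from different measurements can then be brought together on identical tag values. The sketch below groups in-memory points by their global tag tuple, which is the effect a series join aims for. The function names (`join_key`, `correlate`), the developer-defined field `avg_response_time` and all sample values are hypothetical.

```python
from collections import defaultdict

# Global tag keys as listed in this specification
GLOBAL_TAG_KEYS = (
    "node_id", "sf_inst_id", "sf_id",
    "sfc_inst_id", "sfc_id", "server_id", "location",
)


def join_key(tags):
    """Tuple of global tag values; raises KeyError if a global tag is absent."""
    return tuple(tags[key] for key in GLOBAL_TAG_KEYS)


def correlate(points):
    """Group (measurement, tags, fields) points that share all global tags."""
    grouped = defaultdict(dict)
    for measurement, tags, fields in points:
        grouped[join_key(tags)][measurement] = fields
    return dict(grouped)


# Hypothetical points from one surrogate, for illustration only
context = {
    "node_id": "node-1", "sf_inst_id": "sf-inst-1", "sf_id": "sf-1",
    "sfc_inst_id": "sfc-inst-1", "sfc_id": "sfc-1",
    "server_id": "server-1", "location": "lab",
}
points = [
    ("media_component", {**context, "cont_rep": "video"}, {"avg_response_time": 0.12}),
    ("node_network_perf", {**context, "node_role": "service"}, {"RX_BYTES_PORT_M": 2048}),
]
for key, views in correlate(points).items():
    print(key, views)
```

Specific tags such as `cont_rep` or `node_role` do not take part in the key, so each measurement keeps its own dimensions while still lining up with the others.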
#### Measurements that still need some thinking
**sf_instance**
Fields
* ??
Tags
* ??
**sf**
Fields
* ??
Tags
* ??
**sfc_inst**
Fields
* ??
Tags
* ??
**template**
Fields
* ??
Tags
* template_id
* owner
#!/bin/bash
#/////////////////////////////////////////////////////////////////////////
#//
#// (c) University of Southampton IT Innovation Centre, 2017
#//
#// Copyright in this software belongs to University of Southampton
#// IT Innovation Centre of Gamma House, Enterprise Road,
#// Chilworth Science Park, Southampton, SO16 7NS, UK.
#//
#// This software may not be used, sold, licensed, transferred, copied
#// or reproduced in whole or in part in any manner or form or in or
#// on any media by any person other than in accordance with the terms
#// of the Licence Agreement supplied with the software, or otherwise
#// without the prior written consent of the copyright owners.
#//
#// This software is distributed WITHOUT ANY WARRANTY, without even the
#// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
#// PURPOSE, except where stated in the Licence Agreement supplied with
#// the software.
#//
#// Created By : Michael Boniface
#// Created Date : 13/12/2017
#// Created for Project : FLAME
#//
#/////////////////////////////////////////////////////////////////////////
# install docker
apt-get -y update
apt-get -y install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get -y update
apt-get -y install docker-ce
# to get a specific version look at the cache and run the install with that version
# apt-cache madison docker-ce
# apt-get install docker-ce=<VERSION>
# test docker
# docker run hello-world
# install docker compose
curl -L https://github.com/docker/compose/releases/download/1.17.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
# test compose
docker-compose --version
# expected output: docker-compose version 1.17.0, build 1719ceb
# install tick stack
git clone https://github.com/influxdata/TICK-docker.git /opt/TICK-docker
cd /opt/TICK-docker/1.3
docker-compose up -d
#!/bin/bash
#/////////////////////////////////////////////////////////////////////////
#//
#// (c) University of Southampton IT Innovation Centre, 2017
#//
#// Copyright in this software belongs to University of Southampton
#// IT Innovation Centre of Gamma House, Enterprise Road,
#// Chilworth Science Park, Southampton, SO16 7NS, UK.
#//
#// This software may not be used, sold, licensed, transferred, copied
#// or reproduced in whole or in part in any manner or form or in or
#// on any media by any person other than in accordance with the terms
#// of the Licence Agreement supplied with the software, or otherwise
#// without the prior written consent of the copyright owners.
#//
#// This software is distributed WITHOUT ANY WARRANTY, without even the
#// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
#// PURPOSE, except where stated in the Licence Agreement supplied with
#// the software.
#//
#// Created By : Michael Boniface
#// Created Date : 13/12/2017
#// Created for Project : FLAME
#//
#/////////////////////////////////////////////////////////////////////////
# install influx
wget https://dl.influxdata.com/influxdb/releases/influxdb_1.2.4_amd64.deb
dpkg -i influxdb_1.2.4_amd64.deb
systemctl start influxdb
# install kapacitor
wget https://dl.influxdata.com/kapacitor/releases/kapacitor_1.3.1_amd64.deb
dpkg -i kapacitor_1.3.1_amd64.deb
systemctl start kapacitor
# install Telegraf
wget https://dl.influxdata.com/telegraf/releases/telegraf_1.3.2-1_amd64.deb
dpkg -i telegraf_1.3.2-1_amd64.deb
systemctl start telegraf
# install Chronograf
wget https://dl.influxdata.com/chronograf/releases/chronograf_1.3.3.0_amd64.deb
dpkg -i chronograf_1.3.3.0_amd64.deb
systemctl start chronograf
# test influx
#curl "http://localhost:8086/query?q=show+databases"