diff --git a/docs/CLMC monitoring specification for a basic scenario.md b/docs/CLMC monitoring specification for a basic scenario.md
deleted file mode 100644
index 7bce8245fad97010150e3ebd00bc3c3814532022..0000000000000000000000000000000000000000
--- a/docs/CLMC monitoring specification for a basic scenario.md	
+++ /dev/null
@@ -1,117 +0,0 @@
-<!--
-// © University of Southampton IT Innovation Centre, 2018
-//
-// Copyright in this software belongs to University of Southampton
-// IT Innovation Centre of Gamma House, Enterprise Road, 
-// Chilworth Science Park, Southampton, SO16 7NS, UK.
-//
-// This software may not be used, sold, licensed, transferred, copied
-// or reproduced in whole or in part in any manner or form or in or
-// on any media by any person other than in accordance with the terms
-// of the Licence Agreement supplied with the software, or otherwise
-// without the prior written consent of the copyright owners.
-//
-// This software is distributed WITHOUT ANY WARRANTY, without even the
-// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
-// PURPOSE, except where stated in the Licence Agreement supplied with
-// the software.
-//
-//      Created By :            Simon Crowle
-//      Created Date :          10-01-2018
-//      Created for Project :   FLAME
--->
-
-# Adaptive Streaming Use Case Scenario
-
-## Infrastructure Slice
-
-### *compute_node_config*
-
-| compute_node_config | slice | location | comp_node | cpu | memory | storage | timestamp |
-| --- | --- | --- | --- | --- | --- |--- | --- |
-| compute_node_config | SLICE1 | locA | dc1 | 4 | 8 | 16 | 1515583926868000000 |
-| compute_node_config | SLICE1 | locB | dc2 | 8 | 16 | 64 | 1515583926868000000 |
-| compute_node_config | SLICE1 | locC | dc3 | 48 | 128 | 4000 | 1515583926868000000 |
-
-### *network_config*
-
-| network_config | slice | network | bandwidth | timestamp |
-| --- | --- | --- | --- | --- | --- |--- | 
-| network_config | SLICE1 | data1 | 100 | 1515583926868000000 |
-
-__How do we describe network configuration ?__
-__What is a format of an infrastructure slices ?__
-__What is the relevant information ?__
-
-### *network_interface_config*
-
-| network_interface_config | slice | comp_node | port | network | rx_constraint | tx_constraint | timestamp |
-| --- | --- | --- | --- | --- | --- |--- |--- | 
-| network_config | SLICE1 | dc1 | enps03 | data1 | 1000 | 1000 | 1515583926868000000 |
-| network_config | SLICE1 | dc2 | enps03 | data1 | 1000 | 1000 |  1515583926868000000 |
-| network_config | SLICE1 | dc3 | enps03 | data1 | 1000 | 1000 |  1515583926868000000 |
-
-## NAP
-
-### ipendpoint_route
-
-| ipendpoint_route | location | ipendpoint_id | cont_nav | avg_http_requests_fqdn_rate | avg_network_fqdn_latency | time |
-| --- | --- | --- | --- | --- | --- | --- |
-| ipendpoint_route | \<common tags> | DC1 | ipendpoint1 | http://netflix.com/scream | 386, | 50 | 1515583926868000000 |
-
-## Media Service 
-
-There are various aggregated metrics we can calculate but in the use case scenario we postpone that till later.
-
-### sfc_instance_config
-
-`sfc_i_config,<common_tags>,state <fields> timestamp`
-
-### sf_i_config
-
-`sf_i_config,<common_tags>,state <fields> timestamp`
-
-## IPEndpoint
-
-All IPEndpoint measurements have the following global tags injected by a configured Telegraf agent
-
-* location
-* compute_node
-* sfc
-* sfc_i
-* sf
-* sfc_i
-* ipendpoint
-
-Also NOTE: the metrics provided in the measurements below are effectively a 'snapshot' of usage over a relatively small period of time. The length of this snapshot may vary, depending on the underlying implementation of the instrumentation, so we might have to assume this snapshot is essentially an average of a period of 1 second. Measuring 'usage' is dependent on the units, for example as a proportion of a resource or as a proportion of time.
-
-### ipendpoint_config
-
-| ipendpoint_config | location | sfc | sfc_i | sf | sf_i | ipendpoint | state | cpu| memory | storage |timestamp |
-| --- | --- | --- | --- | --- | --- |--- | --- | --- |  --- |  --- |  --- | 
-| ipendpoint_config | dc1 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint1 | placed | 2 | 4 | 16 | 1515583926868000000 |
-| ipendpoint_config | dc2 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint2 | placed | 8 | 16 | 64 | 1515583926868000000 |
-| ipendpoint_config | dc3 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint3 | placed | 48 | 128 | 4000 | 1515583926868000000 |
-| ipendpoint_config | dc1 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint1 | booted | 2 | 4 | 16 | 1515583926868000000 |
-| ipendpoint_config | dc2 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint2 | booted | 8 | 16 | 64 | 1515583926868000000 |
-| ipendpoint_config | dc3 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint3 | booted | 48 | 128 | 4000 | 1515583926868000000 |
-
-### cpu_usage
-
-| cpu_usage | \<common tags> | cpu | avg_cpu_time_user | avg_cpu_time_idle | timestamp |
-| --- | --- | --- | --- | --- |--- |
-| cpu | \<common tags> | 1 | 40 | 5 | 1515583926868000000 |
-
-### net_port_io
-
-| net_port_io | \<common tags> | avg_packet_drop_rate | avg_packet_error_rate | rx_bytes_port_m | rx_packets_m | tx_bytes_port_m | tx_packets_port_m | timestamp |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| net_port_io | \<common tags> | 0.3 | 0.1 | 13567 | 768 | 8102 | 356 | 1515583926868000000 |
-
-### mpegdash_service
-
-| mpegdash_service_mon | \<common tags> | cont_nav | cont_rep | user_profile |avg_req_rate | avg_resp_time | peak_resp_time | avg_error_rate | avg_throughput | avg_quality_delivered | avg_startup_delay | avg_dropped_segments |  timestamp |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |--- |
-| mpegdash_service_mon | \<common tags> | http://netflix.com/scream | h264 | profileA | 10 | 40 | 230 | 0.2 | 200 | | 5 | 1200 | 2 | 1515583926868000000 |
-
-
diff --git a/docs/TestScenarios.md b/docs/TestScenarios.md
deleted file mode 100644
index d40d24a04d3aa62bb01c7a388a585cee0a4dd3b5..0000000000000000000000000000000000000000
--- a/docs/TestScenarios.md
+++ /dev/null
@@ -1,77 +0,0 @@
-<!--
-// © University of Southampton IT Innovation Centre, 2018
-//
-// Copyright in this software belongs to University of Southampton
-// IT Innovation Centre of Gamma House, Enterprise Road, 
-// Chilworth Science Park, Southampton, SO16 7NS, UK.
-//
-// This software may not be used, sold, licensed, transferred, copied
-// or reproduced in whole or in part in any manner or form or in or
-// on any media by any person other than in accordance with the terms
-// of the Licence Agreement supplied with the software, or otherwise
-// without the prior written consent of the copyright owners.
-//
-// This software is distributed WITHOUT ANY WARRANTY, without even the
-// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
-// PURPOSE, except where stated in the Licence Agreement supplied with
-// the software.
-//
-//      Created By :            Rowan Powell
-//      Created Date :          05-01-2018
-//      Created for Project :   FLAME
--->
-
-# Test Scenarios
-
-|author|
-|------|
-|Rowan Powell|
-
-
-### Useful InfluxDB commands 
-
-| Action | Command example |
-| ------ | --------------- |
-| get top 3 entries from a database testDB | ```influx -database='testDB' -execute='SELECT * FROM response LIMIT 3'``` |
-| show all metrics for a database | ```influx -execute 'SHOW MEASUREMENTS ON testDB'``` |
-| show all databases | ```inflix -execute 'SHOW DATABASES'``` |
-
-### Using Chronograf
-
-open ```http://localhost:8888/sources/1/chronograf/data-explorer```
-user: telegraf
-password: metricsmetricsmetrics
-
-### Scenario 1 - Linear user load increase
-
-
-Simulating data from during normal usage (Users already present) and linearly increasing users
-
-* Starting at 20 users and scaling up to 40 over the time period
-* X% users using HD v Y% using SD
-* A% using resource scream, B% using LegoStarWars
-
-| Data | Database |
-| ---- | -------- |
-| Client requests | request |
-| Server responses | response |
-| server VM network performance | network |
-| mpeg_dash reports | mpegdash_service |
-| server configuration events | host_resources |
-| VM state events | vm_res_alloc |
-
-
-This table is written in shorthand
-* ~: proportional to
-* #: number of
-
-| Measurement | Field | Relationships |
-| ----------- | ----- | ------------- |
-| response | cpuUsage | ~#clients |
-| SF | avg_response_time | ~#requests and ~quality |
-| SF | peak_repsonse_time | ~#requests and ~quality |
-
-
-### Scenario 2  - Two Dash Servers
-
-Simon has created a spec for this here: https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/CLMC%20monitoring%20specification%20for%20a%20basic%20scenario.md
\ No newline at end of file
diff --git a/docs/adaptive-streaming-usecase-scenario.md b/docs/adaptive-streaming-usecase-scenario.md
deleted file mode 100644
index 7fc524ab9f768d0f401bdca725a8e1fba0173f7e..0000000000000000000000000000000000000000
--- a/docs/adaptive-streaming-usecase-scenario.md
+++ /dev/null
@@ -1,122 +0,0 @@
-<!--
-// © University of Southampton IT Innovation Centre, 2018
-//
-// Copyright in this software belongs to University of Southampton
-// IT Innovation Centre of Gamma House, Enterprise Road, 
-// Chilworth Science Park, Southampton, SO16 7NS, UK.
-//
-// This software may not be used, sold, licensed, transferred, copied
-// or reproduced in whole or in part in any manner or form or in or
-// on any media by any person other than in accordance with the terms
-// of the Licence Agreement supplied with the software, or otherwise
-// without the prior written consent of the copyright owners.
-//
-// This software is distributed WITHOUT ANY WARRANTY, without even the
-// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
-// PURPOSE, except where stated in the Licence Agreement supplied with
-// the software.
-//
-//      Created By :            Michael Boniface
-//      Created Date :          15-01-2018
-//      Created for Project :   FLAME
--->
-
-# Adaptive Streaming Use Case Scenario
-
-![Scenario Topoligy](/docs/image/scenario-topology.jpg)
-
-![Scenario Deployment](/docs/image/scenario-deployment.jpg)
-
-
-## Infrastructure Slice
-
-### *compute_node_config*
-
-| compute_node_config | slice | location | comp_node | cpu | memory | storage | timestamp |
-| --- | --- | --- | --- | --- | --- |--- | --- |
-| compute_node_config | SLICE1 | locA | dc1 | 4 | 8 | 16 | 1515583926868000000 |
-| compute_node_config | SLICE1 | locB | dc2 | 8 | 16 | 64 | 1515583926868000000 |
-| compute_node_config | SLICE1 | locC | dc3 | 48 | 128 | 4000 | 1515583926868000000 |
-
-### *network_config*
-
-| network_config | slice | network | bandwidth | timestamp |
-| --- | --- | --- | --- | --- | --- |--- | 
-| network_config | SLICE1 | data1 | 100 | 1515583926868000000 |
-
-__How do we describe network configuration ?__
-__What is a format of an infrastructure slices ?__
-__What is the relevant information ?__
-
-### *network_interface_config*
-
-| network_interface_config | slice | comp_node | port | network | rx_constraint | tx_constraint | timestamp |
-| --- | --- | --- | --- | --- | --- |--- |--- | 
-| network_config | SLICE1 | dc1 | enps03 | data1 | 1000 | 1000 | 1515583926868000000 |
-| network_config | SLICE1 | dc2 | enps03 | data1 | 1000 | 1000 |  1515583926868000000 |
-| network_config | SLICE1 | dc3 | enps03 | data1 | 1000 | 1000 |  1515583926868000000 |
-
-## NAP
-
-### ipendpoint_route
-
-| ipendpoint_route | location | ipendpoint_id | cont_nav | avg_http_requests_fqdn_rate | avg_network_fqdn_latency | time |
-| --- | --- | --- | --- | --- | --- | --- |
-| ipendpoint_route | \<common tags> | DC1 | ipendpoint1 | http://netflix.com/scream | 386, | 50 | 1515583926868000000 |
-
-## Media Service 
-
-There are various aggregated metrics we can calculate but in the use case scenario we postpone that till later.
-
-### sfc_instance_config
-
-`sfc_i_config,<common_tags>,state <fields> timestamp`
-
-### sf_i_config
-
-`sf_i_config,<common_tags>,state <fields> timestamp`
-
-## IPEndpoint
-
-All IPEndpoint measurements have the following global tags injected by a configured Telegraf agent
-
-* location
-* compute_node
-* sfc
-* sfc_i
-* sf
-* sfc_i
-* ipendpoint
-
-Also NOTE: the metrics provided in the measurements below are effectively a 'snapshot' of usage over a relatively small period of time. The length of this snapshot may vary, depending on the underlying implementation of the instrumentation, so we might have to assume this snapshot is essentially an average of a period of 1 second. Measuring 'usage' is dependent on the units, for example as a proportion of a resource or as a proportion of time.
-
-### ipendpoint_config
-
-| ipendpoint_config | location | sfc | sfc_i | sf | sf_i | ipendpoint | state | cpu| memory | storage |timestamp |
-| --- | --- | --- | --- | --- | --- |--- | --- | --- |  --- |  --- |  --- | 
-| ipendpoint_config | dc1 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint1 | placed | 2 | 4 | 16 | 1515583926868000000 |
-| ipendpoint_config | dc2 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint2 | placed | 8 | 16 | 64 | 1515583926868000000 |
-| ipendpoint_config | dc3 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint3 | placed | 48 | 128 | 4000 | 1515583926868000000 |
-| ipendpoint_config | dc1 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint1 | booted | 2 | 4 | 16 | 1515583926868000000 |
-| ipendpoint_config | dc2 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint2 | booted | 8 | 16 | 64 | 1515583926868000000 |
-| ipendpoint_config | dc3 | MediaServiceTemplate | MediaServiceA | AdaptiveStreamingComp | AdaptiveStreamingComp1 | ipendpoint3 | booted | 48 | 128 | 4000 | 1515583926868000000 |
-
-### cpu_usage
-
-| cpu_usage | \<common tags> | cpu | avg_cpu_time_user | avg_cpu_time_idle | timestamp |
-| --- | --- | --- | --- | --- |--- |
-| cpu | \<common tags> | 1 | 40 | 5 | 1515583926868000000 |
-
-### net_port_io
-
-| net_port_io | \<common tags> | avg_packet_drop_rate | avg_packet_error_rate | rx_bytes_port_m | rx_packets_m | tx_bytes_port_m | tx_packets_port_m | timestamp |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| net_port_io | \<common tags> | 0.3 | 0.1 | 13567 | 768 | 8102 | 356 | 1515583926868000000 |
-
-### mpegdash_service
-
-| mpegdash_service_mon | \<common tags> | cont_nav | cont_rep | user_profile |avg_req_rate | avg_resp_time | peak_resp_time | avg_error_rate | avg_throughput | avg_quality_delivered | avg_startup_delay | avg_dropped_segments |  timestamp |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |--- |
-| mpegdash_service_mon | \<common tags> | http://netflix.com/scream | h264 | profileA | 10 | 40 | 230 | 0.2 | 200 | | 5 | 1200 | 2 | 1515583926868000000 |
-
-
diff --git a/docs/clmc-information-model.md b/docs/clmc-information-model.md
new file mode 100644
index 0000000000000000000000000000000000000000..0e8d61a7c7272c6ed79f3ddc053d1e7da53e7550
--- /dev/null
+++ b/docs/clmc-information-model.md
@@ -0,0 +1,552 @@
+<!--
+// © University of Southampton IT Innovation Centre, 2017
+//
+// Copyright in this software belongs to University of Southampton
+// IT Innovation Centre of Gamma House, Enterprise Road, 
+// Chilworth Science Park, Southampton, SO16 7NS, UK.
+//
+// This software may not be used, sold, licensed, transferred, copied
+// or reproduced in whole or in part in any manner or form or in or
+// on any media by any person other than in accordance with the terms
+// of the Licence Agreement supplied with the software, or otherwise
+// without the prior written consent of the copyright owners.
+//
+// This software is distributed WITHOUT ANY WARRANTY, without even the
+// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
+// PURPOSE, except where stated in the Licence Agreement supplied with
+// the software.
+//
+//      Created By :            Michael Boniface
+//      Created Date :          18-12-2017
+//      Created for Project :   FLAME
+-->
+
+
+## **Cross-Layer Management and Control Information Model**
+
+This document provides an overview of the FLAME CLMC information model in support of service management and control decisions. The information model is designed to support the exploration and understanding of state, and of the factors contributing to changes in state over time, as shown in the diagram below.
+
+The system (infrastructure, platform and media services) is composed of a set of configuration items that transition between different states during the lifecycle of the system. Configuration items of interest include significant components whose state changes influence the response of the system. In general, the information aims to support the process of:
+
+* Identification of significant configuration items within the system
+* Assertion of state using configuration measurements
+* Measurement of response (monitoring measurements)
+* Support for taking action (configuration measurements)
+
+![Configuration Principle](/docs/image/configuration-principle.jpg)
+
+This process is implemented in accordance with information security and privacy constraints. The following sections provide an overview of key aspects of monitoring.
+
+### Media Service 
+
+The FLAME architecture defines a media service as "An Internet accessible service supporting processing, storage and retrieval of content resources hosted and managed by the FLAME platform". A media service consists of one or more media components (also known as Service Functions) that together are composed to create an overall Service Function Chain. SFs are realised through the instantiation of virtual machines (or containers) deployed on servers based on resource management policy. Multiple VMs may be instantiated for each SF to create surrogate SFs, for example, to balance load and deliver against performance targets. Media Services, SFCs, SFs, VMs, links and servers are all examples of configuration items.
+
+Media services are described using a template structured according to the TOSCA specification (http://docs.oasis-open.org/tosca/TOSCA/v1.0/TOSCA-v1.0.html). A TOSCA template includes all of the information needed for the FLAME orchestrator to instantiate a media service. This includes all SFs, links between SFs and resource configuration information. The Alpha version of the FLAME platform is based on the current published TOSCA specification. Future developments will extend the TOSCA specification (known as TOSCA++) to meet FLAME requirements such as higher level KPIs and location-based constraints.
+
+The current TOSCA template provides the initial structure of the Media Service information model through specified service and resource configuration. Within this structure, system components are instantiated whose runtime characteristics are measured to inform management and control processes. Measurements relate to individual SFs as well as to aggregated measurements organised according to the structure of configuration items within the system. Measurements are made by monitoring processes deployed with system components. The configured items provide the context for monitoring.
+
+The media information model in relation to the high-level media service lifecycle is shown in the diagram below. The lifecycle includes processes for packaging, orchestration, routing and SF management/control. Each stage in the process creates context for decisions and measurements within the next stage of the lifecycle: packaging creates the context for orchestration, and orchestration creates the context for endpoint instantiation and network topology management. In the diagram, the green concepts identify the context that can be used for filtering and queries, whilst the yellow concepts represent the runtime measurement data.
+
+![FLAMEContext](/docs/image/flame-context.jpg)
+
+The primary measurement point for a media service is an endpoint. An endpoint is an instantiation of a service function within a VM or container on a server. An endpoint exists within two main contexts: media service and virtual infrastructure. The media service context relates to the use of the endpoint within a service function chain designed to deliver content. The virtual infrastructure context relates to the host and network environment into which the endpoint is deployed. Deploying monitoring agents in different contexts and sharing information between contexts is a key part of cross-layer management and control.
+
+The diagram highlights the need to monitor three views on an endpoint: network, host, and service. The acquisition of these different views together is a key element of the cross-layer information required for management and control. The measurements are captured by different processes running on servers but are brought together by common context, allowing the information to be integrated, correlated and analysed. The endpoint can measure a service view related to the content being delivered, such as request rates and content types; a VM can measure a virtual infrastructure view of a single endpoint; and the server can measure an infrastructure view across multiple endpoints deployed on it. These monitoring processes running on the server are managed by different stakeholders: for example, the platform operator would monitor servers, whereas the media service provider would monitor service-specific usage.
+
+Not all information acquired will be aggregated and stored within the CLMC. The CLMC is not responsible for capturing every measurement point related to transferring bytes over the network. It is also not responsible for capturing every interaction between a user and a service. The key design principle is to acquire information from one context that can be used in another context. For example, instead of recording every service interaction, an aggregate service usage metric (e.g. request rate/s) would be acquired and stored, and similar aggregation would be needed for infrastructure monitoring.
+
+![Agents](/docs/image/agents.jpg)
+
+### Configuration 
+
+Configuration information describes the structure and state of the system over time. Each configuration item has a lifecycle that defines configuration states and the events that cause a transition between states. Examples of configuration items and states are given in the Configuration Measurements section below.
+
+### Monitoring 
+
+Monitoring measures the behaviour of the system and system components over time, including metrics associated with usage and performance. Measurements are made within the context of a known configuration state. Usage monitoring information can include measurements such as network resource usage, host resource usage and service usage. Performance monitoring information can include measurements such as cpu/s, throughput/s, average response time and error rate.
+
+### Information Security 
+
+*to be completed*
+
+### Data Subject
+
+*to be completed*
+
+## **Measurement Model**
+
+### General 
+
+The measurement model is based on the time-series line protocol defined by InfluxData's TICK stack. The protocol defines a format for measurement samples which together can be combined to create series.
+
+`<measurement>[,<tag-key>=<tag-value>...] <field-key>=<field-value>[,<field2-key>=<field2-value>...] [unix-nano-timestamp]`
+
+Each series has:
+
+* a name "measurement"
+* 0 or more tags for measurement context
+* 1 or more fields for the measurement values
+* a timestamp.
+
+The model is used to report both configuration and monitoring data. In general, tags are used to provide configuration context for measurement values stored in fields. The tags are structured to support queries by the KPIs and dimensions defined in the FLAME architecture.
+
+Tags are automatically indexed by InfluxDB. Global tags can be automatically inserted by contextualised agents collecting data from monitoring processes. The global tags used across different measurements are a key part of the database design. Although InfluxDB is a schemaless database allowing arbitrary measurement fields to be stored (e.g. allowing a media component to have a set of specific metrics), using common global tags allows the aggregation of measurements across time with a known context.
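+
+For illustration, a single point for the cpu_usage measurement described later in this document could be written through the influx CLI (after selecting a database with `USE`); the tag and field values below are examples only, not taken from a real deployment:
+
+```
+INSERT cpu_usage,location=locA,sfc=MediaServiceTemplate,sfc_i=MediaServiceA,sf=AdaptiveStreamingComp,sf_i=AdaptiveStreamingComp1,ipendpoint=ipendpoint1,cpu=1 cpu_usage_user=40,cpu_usage_idle=5 1515583926868000000
+```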
+
+Although similar to SQL, InfluxDB is not a relational database and the primary key for all measurements is **time**. Schema design recommendations can be found here: https://docs.influxdata.com/influxdb/v1.4/concepts/schema_and_data_layout/
+
+### Temporal Measurements
+
+Monitoring data must have time-stamp values that are consistent and synchronised across the platform. This means that all VMs hosting SFs should have a synchronised system clock, or at least (and more likely) a means by which a millisecond offset from the local time can be retrieved so that a 'platform-correct' time value can be calculated.
+
+*Describe approaches to integrate temporal measurements, time as a primary key, etc.*
+
+*Discuss precision*
+
+`influx -precision rfc3339`: the -precision argument specifies the format/precision of any returned timestamps; here, rfc3339 tells InfluxDB to return timestamps in RFC3339 format (YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ).
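+
+For example, a query issued through the influx CLI with this precision might look as follows; the database name 'CLMCMetrics' is an assumption for illustration only:
+
+```
+influx -precision rfc3339 -database 'CLMCMetrics' -execute 'select * from "cpu_usage" limit 3'
+```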
+
+### Spatial Measurements 
+
+Location can be represented in two forms: labelled (a tag) and numeric (longitude and latitude as decimal degrees). Note that the location label is likely to be a _global tag_.
+
+Tag location
+
+| location | loc_long | loc_lat |
+| --- | --- | --- |
+| DATACENTRE_1 | 0 | 0 |
+
+An endpoint placed on a server has no means to obtain GPS coordinates but has a _location_label_ provided to it as server context. It reports zeros for longitude and latitude. In subsequent data analysis we can search for this SF by location label.
+
+GPS coordinate location
+
+| location_label | location_long | location_lat |
+| --- | --- | --- |
+| LAMP_1 | 50.842715 | -0.778276 |
+
+Consider a SF acting as a proxy for a user attached to a NAP running in street lamp post LAMP_1. Here we have knowledge of both the logical location of the service and the fine-grained, dynamic position of the service user.
+
+Note that tags are always strings and cannot be floats; therefore longitude and latitude will always be stored as measurement fields.
+
+*Discuss integrating and analysing location measurements*
+
+If tags are used then measurements of GPS coordinates will need to be translated into a tag-based approximation. For example, if a user device is tracking location information, then for that to be combined with a server location the GPS coordinates need to be translated.
+
+Matching on tags is limited to exact matching and potentially spatial hierarchies (e.g. country.city.street). Using a coordinate system allows mathematical functions to be developed (e.g. proximity functions).
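+
+As a sketch, a label-based spatial filter could then be expressed with a query such as the following; the measurement, field and tag values are illustrative only:
+
+```
+select mean(response_time) from "mpegdash_service" where "location" = 'DATACENTRE_1' and time > now() - 10m
+```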
+
+### Configuration Measurements
+
+FLAME _endpoints_ (VMs created and managed by the SFEMC) and media service _media components_ (processes that realise the execution of the media service) both undergo changes in configuration state during the lifetime of a media service's deployment. Observations of these state changes are recorded in the CLMC under named measurement sets, for example 'endpoint_config' and '\<media component name\>_config' for endpoint and media component labels respectively. In each case, all recordable states of the endpoint/media component are enumerated as columns within the measurement set (see respective state models below for details). Example states include:
+
+|Configuration Item|Configuration States|
+|---|---|
+|Network|e.g. available, unavailable|
+|Physical Link|up, down, unknown|
+|Server|e.g. available, unavailable|
+|Port|up, down, unknown|
+|Service function package|published, unpublished|
+|Media service template|published, unpublished|
+|Service function chain|submitted, scheduled, starting, running, stopping, stopped, error|
+|Service function|starting, running, stopping, stopped, error| 
+|Endpoint|placed, unplaced, booted, connected, error|
+
+>
+> __Side note: a few definitions__
+>
+> 'EP' - Endpoint: a VM created and managed by the SFEMC
+>
+> 'MC' - Media component: a process that realizes a part or the whole of a media service
+>
+> 'Sampling period' - the time elapsed before Telegraf reports (plugin generated) metrics to the CLMC
+>
+> 'Completed state' - a state that has been entered into and then exited
+>
+> 'Current state' - a state that has been entered into but not yet exited
+>
+> 'MST' - Mean state time: the sum of the times spent in each completed occurrence of state 'X' divided by the number of completed occurrences of state 'X'; i.e.:
+>
+>```math
+> meanStateTime = \frac{\sum(endTimeOfState - startTimeOfState)}{numberOfTimesInState}
+>```
+>
+
+Observation of EP or MC states will be performed by a Telegraf plugin. For example, a Telegraf plugin could periodically __report__ on the state of an NGINX process to the CLMC at a _fixed_ time interval (say 10 seconds). In between these times (the _sampling period_) the Telegraf plugin will sample (or 'poll') the state of the EP or MC several times (say 10 times each second). Note that during any sampling period the EP or MC _may_ transition from one state to another, as in the simple example below:
+
+![exampleStateFlow](./image/configStateFlow.png)
+
+_Above: example observations within four sampling periods for an MC configuration state_
+
+In the example provided above, an MC moves through several states. During each sampling period, the total time in the observed states is measured, and for those that are _completed states_ the sum of all the time and the mean time for that state are recorded. For any state that has not been observed during the sample period, the sum and average values will be recorded as zero. For a state that has not yet completed, this state will be considered as the 'current state'; the length of time in this state increases continuously, over multiple sample periods if necessary, until it exits. Finally, if a state completes directly after sample period '1' ends and a new state begins before the start of the next sample period '2', then the previous current state (from period '1') should be recorded as _completed_ as part of period '2's report.
+
+In the following examples we illustrate how to calculate _mean time between failures_ (MTBF); _mean time to repair_ (MTTR) and _mean down time_ (MDT) for a media component (in our case, the _mpegdash_ MC) according to definitions found [here](https://en.wikipedia.org/wiki/Mean_time_between_failures).
+
+_Q. What is the Mean Time Between Failures (MTBF) of media component 'mpegdash'?_
+
+```
+select mean(running_mst) as "mpegdash_MTBF(s)" from "mpegdash_mc_config" where running_mst <> 0
+```
+
+```
+time mpegdash_MTBF(s)
+---- ----------------
+0    3602.1000000000004
+```
+ 
+
+_Q. What is the Mean Time to Repair (MTTR) of media component 'mpegdash'?_
+
+```
+select mean(starting_mst) as "mpegdash_MTTR(s)" from "mpegdash_mc_config" where starting_mst <> 0
+```
+
+```
+name: mpegdash_mc_config
+time mpegdash_MTTR(s)
+---- ----------------
+0    5.5
+```
+
+_Q. What is the Mean Down Time (MDT) of media component 'mpegdash'?_
+
+```
+select mean(starting_mst) as "starting_mdt" into "mpegdash_mc_config_mdt" from "mpegdash_mc_config" where starting_mst <> 0
+select mean(stopping_mst) as "stopping_mdt" into "mpegdash_mc_config_mdt" from "mpegdash_mc_config" where stopping_mst <> 0
+select mean(stopped_mst) as "stopped_mdt" into "mpegdash_mc_config_mdt" from "mpegdash_mc_config" where stopped_mst <> 0
+select (starting_mdt + stopping_mdt + stopped_mdt) as "MDT(s)" from "mpegdash_mc_config_mdt"
+```
+
+```
+name: mpegdash_mc_config_mdt
+time MDT(s)
+---- ------
+0    6.8
+```
+
+## **Decision Context**
+
+Monitoring data is collected to support service design, management and control decisions resulting in state changes in configuration items. The link between decisions and data is through queries and rules applied to contextual information stored with measurement values.  
+
+![MeasurementContext](/docs/image/measurement-context.jpg)
+
+Every measurement has a measurement context. The context allows time-based series to be created according to a set of query criteria which are then processed to calculate statistical data over the desired time period for the series. For example, in the following simple query the measurement is avg_response_time, the context is “service A” and the series are all of the data points from now minus 10 minutes.
+
+`find avg response time for service A  over the last 10 minutes`
+
+To support this query the following measurement would be created:
+
+`serviceA_monitoring,service_id=(string) response_time=(float) timestamp`
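+
+Assuming such a measurement exists, the query itself could then be expressed in InfluxQL along these lines (a sketch only; the service_id value 'A' is illustrative):
+
+```
+select mean(response_time) from "serviceA_monitoring" where "service_id" = 'A' and time > now() - 10m
+```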
+
+In the FLAME architecture we discuss at length the relationship between KPIs and dimensions, and implementations based on OLAP. In the current CLMC implementation, KPIs are calculated from measurement fields and dimensions are encoded within measurement tags. This is a lightweight implementation that will allow a broad range of questions to be asked about the cross-layer information acquired.
+
+Designing the context for measurements is an important step in the schema design. This is especially important when measurements from multiple monitoring sources need to be integrated and processed to provide data for queries and decisions. The key design principles adopted include:
+
+* identify common context across different measurements
+* where possible use the same identifiers and naming conventions for context across different measurements
+* organise the context into hierarchies that are automatically added to measurements during the collection process
+
+![CommonContext](/docs/image/common-measurement-context.jpg)
+
+The following figure shows the general structural approach for two measurements A and B. Data points in each series have a set of tags that share a common context and a specific context related to the measurement values.
+
+![FLAMEMeasurements](/docs/image/flame-measurements.jpg)
+
+The measurement model considers three monitoring views on an endpoint with field values:
+
+* service: specific metrics associated within the SF (either media component or platform component) 
+* network: data usage TX/RX, latency, jitter, etc.
+* host: cpu, memory, storage, storage I/O, etc.
+
+All of the measurements on an endpoint share a common context that includes tag values:
+
+* sfc – an orchestration template
+* sfc_i – an instance of the orchestration template
+* sf – a SF type
+* sf_i – an instance of the SF type
+* ipendpoint – an authoritative copy of the SF instance, either a VM or a container
+* server – a physical or virtual server for hosting VM or container instances
+* location – the location of the server
+
+By including this context with service, network and host measurements it is possible to support a range of temporal queries associated with SFCs. By adopting the same convention for identifiers it is possible to combine measurements across service, network and host to create new series that allow exploration of different aspects of the VM instance, including cross-layer queries.
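+
+As an illustration, because host and service measurements share tags such as sfc_i and ipendpoint, a cross-layer query could relate host load to a particular SFC instance; the measurement, field and tag values below are examples only:
+
+```
+select mean(cpu_usage_user) from "cpu_usage" where "sfc_i" = 'MediaServiceA' and time > now() - 10m group by "ipendpoint"
+```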
+
+Designing such series involves a number of decisions:
+
+* Decide on the service management decisions and time scales
+* Decide on the measurements of interest that are needed to make the decisions
+* Decide how measurements are calculated from a series of one or more other measurements 
+* Decide on time window for the series and sample rate
+* Decide on interpolation approach for data points in the series
+
+Each input measurement plugin can provide additional specific tags to filter measurement data.
+
+## **Data Retention Policy**
+
+*Discuss what data needs to be kept and for how long in relation to decision making*
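+
+As a starting point for this discussion, retention could be expressed with InfluxQL retention policies and downsampling continuous queries; the database name, durations and aggregation interval below are assumptions for illustration only:
+
+```
+CREATE RETENTION POLICY "two_weeks" ON "CLMCMetrics" DURATION 2w REPLICATION 1 DEFAULT
+CREATE RETENTION POLICY "one_year" ON "CLMCMetrics" DURATION 52w REPLICATION 1
+CREATE CONTINUOUS QUERY "cq_cpu_usage_hourly" ON "CLMCMetrics" BEGIN
+  SELECT mean(cpu_usage_user) AS cpu_usage_user INTO "CLMCMetrics"."one_year"."cpu_usage" FROM "cpu_usage" GROUP BY time(1h), *
+END
+```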
+
+## **Architecture**
+
+### General
+
+The monitoring model uses an agent based approach with hierarchical aggregation used as required for different time scales of decision making. The general architecture is shown in the diagram below.
+
+![AgentArchitecture](/docs/image/agents.jpg)
+
+To monitor a SF, an agent is deployed on each of the endpoints implementing the SF. The agent is deployed by the orchestrator when the SF is provisioned. The agent is configured with:
+
+* a set of input plugins that collect measurements from the three viewpoints of network, host and service
+* a set of global tags that are inserted for all measurements made by the agent on the host.
+* one or more output plugins for publishing aggregated monitoring data.
+
+Telegraf offers a wide range of integrations with relevant 3rd party monitoring processes:
+
+* Telegraf AMQP: https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/amqp_consumer
+* Telegraf HTTP JSON: https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/httpjson
+* Telegraf http listener: https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/http_listener 
+* Telegraf Bespoke Plugin: https://www.influxdata.com/blog/how-to-write-telegraf-plugin-beginners/
+
+The architecture considers hierarchical monitoring and scalability; for example, AMQP can be used to buffer monitoring information whilst InfluxDB can be used to provide intermediate aggregation points when used with Telegraf input and output plugins.
+
+## **Measurements Summary**
+
+## **Infrastructure Capacity Measurements** 
+
+Capacity measurements measure the size of the infrastructure slice available to the platform that can be allocated on demand to tenants. 
+
+*What is the format of the infrastructure slice and what data is available?*
+
+Common tags
+
+* slice_id – an identification id for the tenant infrastructure slice within OpenStack
+
+**compute_node_config**
+
+The *compute_node_config* measurement measures the wholesale host resources available to the platform that can be allocated to media services.
+
+`compute_node_config,slice_id,server_id,location cpu,memory,storage timestamp`
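+
+A hypothetical sample written through the influx CLI, using the example slice values found elsewhere in this repository, might look like this (illustrative only):
+
+```
+INSERT compute_node_config,slice_id=SLICE1,server_id=dc1,location=locA cpu=4,memory=8,storage=16 1515583926868000000
+```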
+
+**network_config**
+
+network_config measures the overall capacity of the network available to the platform for allocation to tenants. There are currently no metrics defined for this in the FLIPS monitoring specification, although we can envisage usage metrics such as bandwidth being part of this measurement.
+
+`network_config,slice_id,network_id bandwidth,X,Y,Z timestamp`
+
+**network_interface_config**
+
+network_interface_config measures the connection between a compute node and a network along with any constraints on that connection.
+
+`network_interface_config,comp_node_id,port_id,network_id rx_constraint,tx_constraint timestamp`
+
+## **Platform Measurements** 
+
+Platform measurements measure the configuration, usage and performance of platform components.
+
+**topology_manager**
+
+tbd
+
+**nap**
+
+nap measurements are the platform's view of IP endpoints such as user equipment and services. A NAP is therefore the boundary of the platform. A NAP also measures aspects of multicast performance. The NAP multicast metrics require further understanding, although the NAP's contribution towards understanding the source of requests is important in decisions regarding the placement of endpoints. The following fields require some clarification:
+
+* CHANNEL_AQUISITION_TIME_M 
+* CMC_GROUP_SIZE_M
+
+* What is the group id for CHANNEL_AQUISITION_TIME_M and how can this be related to the FQDN of the content?
+* What is the predefined time interval for CMC_GROUP_SIZE_M?
+* How are multicast groups identified? i.e. "a request for FQDN within a time period", what's the content granularity here?
+
+NAP data usage measurement
+
+`nap_data_io,node_id,ip_version RX_BYTES_HTTP_M,TX_BYTES_HTTP_M,RX_PACKETS_HTTP_M,TX_PACKETS_HTTP_M,RX_BYTES_IP_M,TX_BYTES_IP_M,RX_BYTES_IP_MULTICAST_M,TX_BYTES_IP_MULTICAST_M,RX_PACKETS_IP_MULTICAST_M,TX_PACKETS_IP_MULTICAST_M timestamp`
+
+NAP service request and response metrics
+
+`ipendpoint_route,ipendpoint_id,cont_nav=FQDN HTTP_REQUESTS_FQDN_M,NETWORK_FQDN_LATENCY timestamp`
+
+**clmc**
+
+tbd
+
+### **IPEndpoint Measurements**
+
+ipendpoint measurements measure the configuration, usage and performance of VM/Container instances deployed by the platform within the context of a media service.
+
+Common tags
+
+* location – the location of the server
+* server – a physical or virtual server for hosting node instances
+* sfc – an orchestration template
+* sfc_i – an instance of the orchestration template
+* sf – a SF package identifier indicating the type and version of SF
+* sf_i – an instance of the SF type
+* ipendpoint – an authoritative copy of the SF instance, either a container or VM
+
+**endpoint_config**
+
+An endpoint configuration state model consists of the following states:
+
+* unplaced
+* placing [transitional]
+* placed
+* booting [transitional]
+* booted
+* connecting [transitional]
+* connected
+
+A simple example with some measurement rows for endpoint configuration states is given in the table below. The scenario is as follows:
+
+Each sample period is 1 second.
+
+The first sample period reports the VM being in state __unplaced__ for 0.7s, then changing state to __placing__ for 0.3s. __placing__ is not
+reported since it is not a **completed state**. The mean state time value for __unplaced__ is the same as the sum value because the VM has only been in this state once.
+
+The VM is then reported to be in current state __placing__ for the whole sample period (1s) for 9 consecutive reports. Only the 'current_state' tag value and the 'current_state_time'
+field value are filled in the measurement rows, since the VM has not yet exited this state.
+
+The last sample period reports the VM exiting state __placing__, then changing state to __placed__. Hence, the __current_state__ tag is set to __placed__.
+Of the whole sample period (1s), the VM has been 0.9s in state __placed__. Hence, the __current_state_time__ field is set to 0.9. For the other 0.1s of the sample period,
+the VM has been reported to be in state __placing__. Since it has exited state __placing__, the total time spent in this state (9.3s + 0.1s = 9.4s) is reported.
+This includes the state time from previous reports. The mean state time value for __placing__ is the same as the sum value because the VM has only been in this state once.
+
+| global tags | current_state (tag) | current_state_time | unplaced_sum | unplaced_mst | placing_sum | placing_mst | placed_sum | placed_mst | booting_sum | booting_mst | booted_sum | booted_mst | connecting_sum | connecting_mst | connected_sum | connected_mst | time |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| ... | placing | 0.3 | 0.7 | 0.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 1.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 2.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 3.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 4.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 5.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 6.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 7.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 8.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placing | 9.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+| ... | placed | 0.9 | 0 | 0 | 9.4 | 9.4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
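+
+By analogy with the MTBF/MTTR queries above, the mean time endpoints spend in the __placing__ state could be derived from this measurement with a query such as the following (the measurement name 'endpoint_config' is assumed from the naming convention described earlier):
+
+```
+select mean(placing_mst) as "mean_placing_time(s)" from "endpoint_config" where placing_mst <> 0
+```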
+
+**net**
+
+https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/NET_README.md
+
+[[inputs.net]]
+
+`net,interface=eth0,host=HOST bytes_sent=451838509i,bytes_recv=3284081640i,packets_sent=2663590i,packets_recv=3585442i,err_in=0i,err_out=0i,drop_in=4i,drop_out=0i 1492834180000000000`
+
+**cpu_usage**
+
+https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/CPU_README.md
+
+[[inputs.cpu]]
+
+`cpu_usage,<common_tags>,cpu cpu_usage_user,cpu_usage_system,cpu_usage_idle,cpu_usage_active,cpu_usage_nice,cpu_usage_iowait,cpu_usage_irq,cpu_usage_softirq,cpu_usage_steal,cpu_usage_guest,cpu_usage_guest_nice timestamp`
+
+**disk_usage**
+
+https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/DISK_README.md
+
+[[inputs.disk]]
+
+`disk,<common_tags>,fstype,mode,path free,inodes_free,inodes_total,inodes_used,total,used,used_percent timestamp`
+
+**disk_IO**
+
+https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/DISK_README.md
+
+[[inputs.diskio]]
+
+`diskio,<common_tags>,name weighted_io_time,read_time,write_time,io_time,write_bytes,iops_in_progress,reads,writes,read_bytes timestamp`
+
+**kernel_stats**
+
+https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/KERNEL_README.md
+
+[[inputs.kernel]]
+
+`kernel,<common_tags> boot_time,context_switches,disk_pages_in,disk_pages_out,interrupts,processes_forked timestamp`
+
+**memory_usage**
+
+[[inputs.mem]]
+
+`mem,<common_tags> cached,inactive,total,available,buffered,active,slab,used_percent,available_percent,used,free timestamp`
+
+**process_status**
+
+https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/PROCESSES_README.md
+
+[[inputs.processes]]
+
+`processes,<common_tags> blocked,running,sleeping,stopped,total,zombie,dead,paging,total_threads timestamp`
+
+**system_load_uptime**
+
+https://github.com/influxdata/telegraf/blob/master/plugins/inputs/system/SYSTEM_README.md
+
+[[inputs.system]]
+
+`system,<common_tags>,host load1,load5,load15,n_users,n_cpus timestamp`
+
+## **Media Service Measurements** 
+
+Media service measurements measure the configuration, usage and performance of media service instances deployed by the platform.
+
+**media_service_config**
+
+A media component configuration state model consists of the following states:
+
+* stopped
+* starting [transitional]
+* running
+* stopping [transitional]
+
+An example (based on the figure above) of some measurement rows for media component configuration states is given below:
+
+| global tags | current_state (tag) | current_state_time | stopped_sum | stopped_mst | starting_sum | starting_mst | running_sum | running_mst | stopping_sum | stopping_mst | time |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| ... | starting | 3  | 5 | 2.5 | 2 | 2 | 0 | 0 | 0 | 0 | ... |
+| ... | running  | 8  | 0 | 0   | 5 | 5 | 0 | 0 | 0 | 0 | ... |
+| ... | stopped  | 5  | 0 | 0   | 0 | 0 | 9 | 9 | 4 | 4 | ... |
+| ... | starting | 10 | 5 | 5   | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+
+**\<prefix\>_service**
+
+Each SF developed will report service-specific usage and performance measurements. Telegraf offers plugins for many middleware services; however, it is likely that specific monitoring plugins will need to be developed where existing plugins are not available or the data is not sampled as required. The following is a theoretical example for media service monitoring.
+
+`<prefix>_service,<common_tags>,cont_nav,cont_rep,user <fields> timestamp`
+
+Fields (only examples as these are specific to each service)
+
+* request_rate
+* response_time
+* peak_response_time
+* error_rate
+* throughput
+
+Specific Tags
+
+* cont_nav: the content requested
+* cont_rep: the content representation requested
+* user: a user profile classification 
+
+### Service Function Chain Measurements
+
+**sfc_i_config**
+
+SFC instance configuration state monitoring, implemented in accordance with the CLMC state model
+
+`sfc_i_config,<common_tags>,current_state <fields> timestamp`
+
+**sfc_i_monitoring**
+
+Aggregate measurement derived from VM/container measurements, most likely calculated using a continuous query over a specific time interval
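+
+A sketch of such a continuous query, aggregating host measurements per SFC instance, is given below; the database name, source measurement, field and interval are assumptions for illustration only:
+
+```
+CREATE CONTINUOUS QUERY "cq_sfc_i_monitoring" ON "CLMCMetrics" BEGIN
+  SELECT mean(cpu_usage_user) AS avg_cpu_usage_user INTO "sfc_i_monitoring" FROM "cpu_usage" GROUP BY time(1m), "sfc_i"
+END
+```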
+
+**sf_i_config**
+
+SF instance configuration state monitoring, implemented in accordance with the CLMC state model
+
+`sf_i_config,<common_tags>,current_state <fields> timestamp`
+
+**sf_i_monitoring**
+
+Aggregate measurement derived from ipendpoint measurements, most likely calculated using a continuous query over a specific time interval
+
+**ipendpoints**
+
+Aggregate measurement derived from ipendpoint measurements, most likely calculated using a continuous query over a specific time interval
+
+`ipendpoints,<common_tags> placed,unplaced,booted,connected timestamp`
\ No newline at end of file
diff --git a/docs/flips-integration.md b/docs/flips-integration.md
new file mode 100644
index 0000000000000000000000000000000000000000..23f455a7a6c38cf38aa5d8b4aadfd5486ae11186
--- /dev/null
+++ b/docs/flips-integration.md
@@ -0,0 +1,164 @@
+# **FLIPS Monitoring Integration**
+
+© University of Southampton IT Innovation Centre, 2018
+
+This document describes possible approaches for integrating FLIPS monitoring with the CLMC.
+
+## **Authors**
+
+|Authors|Organisation|                    
+|-|-|
+|[Michael Boniface](mailto:mjb@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
+|[Simon Crowle](mailto:sgc@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
+
+### Integration with FLIPS Monitoring
+
+FLIPS offers a scalable pub/sub system for distributing monitoring data. The architecture is described in the POINT monitoring specification https://drive.google.com/file/d/0B0ig-Rw0sniLMDN2bmhkaGIydzA/view. Some observations can be made:
+
+* MOOSE and the CLMC provide similar functions in the architecture; the CLMC will not have access to MOOSE but will need to subscribe to data points provided by FLIPS
+* The APIs for Moly and Blackadder are not provided, therefore it is not possible to critically understand the correct implementation approach for agents and monitoring data distribution
+* Individual datapoints need to be aggregated into measurements according to a sample rate
+* We may need to use the blackadder API for distribution of monitoring data, replacing messaging systems such as AMQP with all buffering and pub/sub deployed on the nodes themselves rather than a central service. 
+
+There are a few architectural choices. The first, below, uses Moly as an integration point for monitoring processes via a Telegraf output plugin, with data inserted into InfluxDB using a blackadder API input plugin on another Telegraf agent running on the CLMC. In this case managing the subscriptions to nodes and data points is difficult. In addition, some data points will arrive individually from FLIPS monitoring whilst others will be in line protocol format from Telegraf. For the FLIPS data points a new input plugin would be required to aggregate individual data points into time-series measurements.
+
+![FLIPSAgentArchitecture](/docs/image/flips-monitoring-architecture.jpg)
+
+The second (currently preferred) choice only sends line protocol format over the wire. Here we develop Telegraf input and output plugins for blackadder, benefiting from the scalable nature of the pub/sub system rather than introducing RabbitMQ as a central server. In this case the agent on each node would be configured with input plugins for service, host and network. We would deploy a new Telegraf input plugin for FLIPS data points on the node's agent, subscribing to blackadder locally, and then publish the aggregated measurements using the line protocol back over blackadder to the CLMC. FLIPS can still publish data to MOOSE as required.
+
+![FLIPSAgentArchitecture](/docs/image/flips-monitoring-architecture2.jpg)
+
+The pub/sub protocol still needs some work as we don't want the CLMC to have to subscribe to nodes as they start and stop. We want the nodes to register with a known CLMC and then start publishing data to the CLMC according to a monitoring configuration (e.g. sample rate, etc.). So we want a "monitoring topic" that nodes publish to and that the CLMC can pull data from. This topic is on the CLMC itself and not the nodes. Reading the FLIPS specification, it seems that this is not how the nodes currently distribute data, although we could be wrong.
+
+#### Network Measurements
+
+**net_port_config**
+
+net_port_config is concerned with any network I/O allocation/constraints for network rx/tx. Possible fields are listed below (these are not available from the FLIPS monitoring specification):
+
+`net_port_config,<common_tags>,port_id,port_state RX_USAGE_CONSTRAINT,TX_USAGE_CONSTRAINT,RX_THROUGHPUT_CONSTRAINT,TX_THROUGHPUT_CONSTRAINT timestamp`
+
+Specific tags
+* port_state
+* port_id
+
+**net_port_io**
+
+All net_port_io measurements are monitored by FLIPS. Note that RX_PACKETS_M seems to follow an inconsistent naming convention, unless we are mistaken.
+
+`net_port_io,<common_tags>,port_id PACKET_DROP_RATE_M,PACKET_ERROR_RATE_M,RX_PACKETS_M,TX_PACKETS_PORT_M,RX_BYTES_PORT_M,TX_BYTES_PORT_M timestamp`
+
+Specific tags
+* port_id 
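+
+Assuming the FLIPS byte counters are cumulative, a per-port receive rate could be approximated with a query such as the following (a sketch only; the time window and grouping interval are assumptions):
+
+```
+select derivative(mean(RX_BYTES_PORT_M), 1s) as rx_bytes_per_second from "net_port_io" where time > now() - 5m group by time(10s), "port_id"
+```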
+
+**Worked Usage Scenario - MPEG-DASH**
+
+The scenario aims to verify two aspects:
+
+* CLMC monitoring specification & data acquisition
+* Support for initial decision making processes for FLAME (re)orchestration
+
+The scenario is being developed further in the document here:
+
+https://gitlab.it-innovation.soton.ac.uk/mjb/flame-clmc/blob/integration/docs/CLMC%20monitoring%20specification%20for%20a%20basic%20scenario.md
+
+The FLAME platform acquires a slice of the infrastructure resources (compute, RAM & storage [C1, C2, C3] and networking). A media service provider offers an MPEG-DASH service to end-users (via their video clients connected to NAPs on the FLAME platform). The service provider deploys surrogates of the MPEG-DASH service on all compute nodes [C1-C3]. All services (including NAPs) are monitored by the CLMC.
+
+Over time a growing number of video clients use an MPEG-DASH service to stream movies on demand. As clients connect and make requests, the platform makes decisions and takes actions in order to maintain quality of service for the increasing number of clients demanding an MPEG-DASH service.
+
+What are the possible criteria (based on metrics and analytics provided by the CLMC) that could be used to help a NAP make these decisions?
+
+In this scenario what are the possible actions a NAP could take?
+
+![Scenario](/docs/image/scenario.jpg)
+
+![Scenario Measurements](/docs/image/scenario-measurements.jpg)
+
+Platform actions
+
+* Increase the resources available to MPEG-DASH surrogates
+
+  * This may not be possible if resources are unavailable
+  * Vertical scaling may not solve the problem (e.g. an I/O bottleneck)
+
+* Re-route client requests to other MPEG-DASH services
+
+  * C1 – closer to clients, but limited capability
+  * C3 – greater capability but further away from clients
+
+Note: NAP service end-point re-routing will need to take into account both network factors and compute resource availability related service KPIs, i.e. end-to-end performance.
+
+Service actions
+
+* Lower overall service quality to clients in order to reduce overall resource usage
+
+Possible service-level criteria include (an example query sketch follows this list):
+
+* Avg quality met: the ratio of average delivered quality to requested quality
+* Avg start-up time: the average time taken before a video stream starts playing, kept below a threshold
+* Avg video stalls: the percentage of stalls (dropped video segments that require re-sending), kept below a threshold
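+
+For example, the average start-up time criterion could be checked with a query such as the following; the measurement and field names are illustrative, based on the MPEG-DASH examples elsewhere in this repository:
+
+```
+select mean(avg_startup_delay) as avg_startup_time from "mpegdash_service" where time > now() - 10m
+```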
+
+# **MISC Measurements and Further Questions**
+
+The following data points require further analysis
+
+* CPU_UTILISATION_M: likely to be replaced by other metrics provided directly by Telegraf plugins
+* END_TO_END_LATENCY_M: not clear what this measurement means, so needs clarification
+* BUFFER_SIZES_M: needs clarification 
+* RX_PACKETS_IP_M: is this just NAP or all Nodes
+* TX_PACKETS_IP_M: is this just NAP or all Nodes
+
+The following fields need further analysis as they seem to relate to core ICN, most likely fields/measurements related to platform components
+
+* FILE_DESCRIPTORS_TYPE_M 
+* MATCHES_NAMESPACE_M
+* PATH_CALCULATIONS_NAMESPACE_M 
+* PUBLISHERS_NAMESPACE_M 
+* SUBSCRIBERS_NAMESPACE_M 
+
+The following fields relate to CID, which we do not yet understand; jitter is an important metric so we need to find out.
+
+* PACKET_JITTER_CID_M
+* RX_BYTES_CID_M 
+* TX_BYTES_CID_M 
+
+Some questions
+
+* Can a single value of jitter (e.g. average jitter) be calculated from the set of measurements in a PACKET_JITTER_CID_M message? What is the time period for the list of jitter measurements? (A query sketch for this aggregation is given below.)
+* What does CID stand for? (Possibly 'consecutive identical digits', but this needs confirmation.)
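+
+If the individual jitter values carried in PACKET_JITTER_CID_M were stored as separate points in an InfluxDB measurement (an assumption for illustration only; the measurement and field names below are hypothetical and not defined anywhere yet), a single average per CID could be derived with a query such as:
+
+```
+SELECT mean("jitter") FROM "net_jitter_cid" WHERE time > now() - 1m GROUP BY "cid"
+```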
+
+
+__Can we measure network usage for a specific VM from FLIPS monitoring?__
+__Some metrics from FLIPS contain a 'port' label while others do not; is this intended?__
+
+
+QUESTIONS
+1. Is the content navigation tag a fully qualified domain name (SDN based)? [Most likely: yes] Though it may be only part of the URL.
+
+#### Link Measurements
+
+Links are established between VM/container instances; we need to discuss what measurements make sense. The context for links could also be between media services, therefore a link measurement should sit within the platform context and NOT the media service context. A couple of scenarios are needed to work this out.
+
+**link_config**
+
+Link Tags (an illustrative point is sketched below):
+
+* link_name
+* link_id
+* source_node_id
+* destination_node_id
+* link_type
+* link_state
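+
+As an illustrative sketch only (the tag values are invented and the field set for link_config is still to be agreed):
+
+`link_config,link_name=nodeA-nodeB,link_id=link1,source_node_id=nodeA,destination_node_id=nodeB,link_type=data,link_state=up <fields> timestamp`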
+
+**link_perf**
+
+Link performance is measured at the nodes and is related to end_to_end_latency. This needs further work.
+
+# Other Issues
+
+**Trust in measurements**
+
+If the agent is deployed in a VM/container to which a tenant has root access, the tenant could change the configuration to fake measurements associated with the network and host in an attempt to gain benefit. This is a security risk. Some ideas include:
+
+* Deploy additional agents on the hosts, rather than in the VMs/containers, to measure network and VM performance. It could be hard to differentiate between the different SFs deployed on a host.
+* Generate a hash of the agent configuration file that is checked within the monitoring message. Probably too costly and not part of the Telegraf protocol.
+* Use Unix permissions (e.g. surrogates are deployed without root access being granted to them).
\ No newline at end of file
diff --git a/docs/image/flame-context.jpg b/docs/image/flame-context.jpg
index 4c082bbec1ac40bd45a2315d5c39d2534a0e7885..2e42a0b724e32cd71280ccf779f9c0e5a02d65b7 100644
Binary files a/docs/image/flame-context.jpg and b/docs/image/flame-context.jpg differ
diff --git a/docs/monitoring.md b/docs/monitoring.md
index c69733e55b2d0b3b817eed7f6237970a36610d2e..ee5cf15af1fcb2377375919bb1b460f6581d817c 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -21,158 +21,10 @@
 //      Created for Project :   FLAME
 -->
 
-# **FLAME CLMC Information Model Specification**
 
-© University of Southampton IT Innovation Centre, 2017
+## **Cross-Layer Management and Control Information Model**
 
-This document describe the configuration and monitoring specification for cross-layer management and control within the FLAME platform. All information measured by the CLMC aims to improve management and control decisions made by the platform and/or media service providers against defined performance criteria such as increasing Quality of Experience and cost reduction. 
-
-## **Authors**
-
-|Authors|Organisation|                    
-|-|-|
-|[Michael Boniface](mailto:mjb@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
-|[Simon Crowle](mailto:sgc@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
-
-## **Service Management and Control Decisions**
-
-Service management decisions relate to processes for Service Request Management, Fault Management and Configuration Management. There are many possible management and control decisions and it is the purpose of the CLMC to provide decision makers with empirical knowledge to design and implement better policies. The FLAME architecture describes how the CLMC uses KPIs to measure performance and highlights examples of control policies such as shortest path routing to a SF and horizontal scaling of SFs in response to changes in workload. A Platform Provider and Media Service Provider will have KPI targets that are different and also not independent of each other. For example, allocating all of the resources needed for an expected peak workload of a media service when it is submitted for orchestration would guarantee a performance level . However, the outcome would typically produce low utilisation and increased costs due to peak workload only being of a fraction of the overall service operation time. The solution is to provide greater flexibility by exploiting points of variabilty within the system in relation to constraints. Constraints are imposed by policy (e.g. a limit on resource allocation) and technology limitations (e.g. VM boot time, horizontal/vertical scaling, routing).  
-
-The management and control processes implemented by the FLAME platform define the decisions, variability and constraints. The detail for the implementation of orchestration, management and control is under discussion and the following is based on a best understanding of what was described in the FLAME architecture. 
-
-### **An Elementary Starting Point: The Static Configuration Scenario**
-
-The 1st scenario to consider is an entirely static configuration. In this scenario a media service provider defines explicitly the infrastructure resources needed for a media service. The static configuration is what is proposed by the adopted of the current TOSCA specification for the Alpha release. Here an MSP declares the resources needed to deliver a desired performance level (implicitly known to the MSP).  In an extreme case, the approach  results in a static infrastructure configuration where the MSP defines the entire topology including servers, links and resource requirements. This would include server locations (central, metro and edge DCs) and when the servers are needed. The most basic case is deploy everything now for the lifetime of the media service. This full declaration would statically configure surrogates through the explicit declaration of servers and software deployed on those services. 
-
-In this case, the Platform Provider is responsible for allocating the requested resources to the media service provider for the lifetime of the media service. The performance of the service is entirely related to the knowledge of the MSP and the workload over time. 
-
-Even this simple example leads important decisions
-
-**D1: “How much infrastructure resource does a PP need from an IP?”**
-
-The infrastructure resource (slice) defines a topology of compute, storage and network resources allocated by an IP to a PP. The slice is used by the PP to resource media services. The PP will allocate proportions of the slice to media services within the lifecycle of such services. In most cases, a PP will need to define resource management policies that define rules for allocation of resources considering that multiple media services are contending for such resources.
-
-The capacity of the slice and the distribution of the resources within the infrastructure is a planning decision made by the PP based on a prediction of media service demand. The allocation of a slice has cost implications as from an IP’s perspective resources are dedicated to a PP. Depending on the business model and cost structures, the slice allocation would typically become a fixed cost to the PP and revenue for the IP. The PP must now allocate the slice to MSPs in the context of KPIs designed to maximise revenue from the slice. 
-
-Issues related to this decision include:
-
-* What are the temporal constraints on a slice? Is there a defined end time, recurring subscription or is a slice perpetual? 
-* How fine grained are temporal constraints considering the fact that an IP has resource scarcity at edge DCs in comparison to metro and central DCs?
-* What are the states of a slice? What causes the state transition?
-* Can a slice be modified and if so how can the slice change?
-
-*D1 CLMC outcome: a set of measurements describing an infrastructure slice.*
-
-**D2: “How much infrastructure resource does a MSP need from a PP?”**
-
-Once the PP has a slice then media services can be orchestrated, managed and controlled within the slice. Here the PP must consider the MSP infrastructure resource requirements. In the Alpha release FLAME adopts the current TOSCA specification where the MSPs define declaratively server resources required for each SF. The PP has no understanding of how a media service will behave in response to the resource allocation as that knowledge is within the MSP. In TOSCA++ FLAME is exploring KPI-based media service specifications where resource management knowledge forms part of the platform’s responsibility. 
-
-Issues related to this decision include:
-
-* What are the temporal constraints on resource requirements within a TOSCA specification?
-* How fine grained are the temporal constraints considering that a media service includes a set of media components with temporal resourcing requirements? E.g. media component A needs resource on Monday and media component B resource on Tuesday. 
-* What are the spatial constraints associated with the resource requirements? Does an MSP specify the precise DC (or set of DCs) where the SF needs or can be deployed? In effect, if the MSP says where the SF needs to be deployed this encodes the surrogate policy directly within the media service definition. 
-* How much variability are there is routing rules? How much of this is implicit within the platform implementation (e.g. coincidental multicast features)
-
-*D2 CLMC outcome: a set of measurements describing media service infrastructure requirements.*
-
-### **Where Next: Fast Variable Configuration Scenarios**
-
-Variable configuration identifies configuration state that can change in the lifetime of a media service. Variability in configuration state includes:
-
-* Vertically up and down scaling SF resources (i.e. compute, storage, network IO)
-* Horizontally up and down scaling SF resources (i.e. replication)
-* Distributing SFs by location (i.e. placement of a VM on an edge DC)
-* Routing traffic between SFs (i.e. load balancing algorithms)
-* Adapting content (i.e. reducing the resolution of a video stream)
-
-Each transition in state is a decision that has a time in the lifecycle (when is it implemeted), a duration (how long does it take to implement), actor (who is responsible) and an expected outcome.
-
-General issues reated to variable configuration include:
-
-* What are the points of variability within the platform?
-* How is variability configured, either through default platform policy or TOSCA templates?
-* Where are we contributing innovation in variability? e.g. network + compute + service factors considered together
-
-We now discuss key decisions associated variable configuration 
-
-**D3: “When should resources be allocated to a media service”?**
-
-When a PP receives a request to orchestrate a media service the PP must decide on when to allocate infrastructure resources. Allocation has a temporal dimension defining a start time and end time. An allocation in the future can be seen as a commitment. Allocation is important for accounting purposes but even more important in situation of resource scarcity. In most public clouds, resources from a MSP perspective are assumed to be infinite and there’s little need to consider temporal constraints associated with resource allocations. As long as the MSP has budget to pay for the resources, public cloud providers will scale those resources as requested.
-
-In FLAME we have resource scarcity and contention in edge DCs and therefore MSPs and the PP must find workable ways to negotiate allocation of resources over time.  Different resource management policies can be considered.
-
-* Allocate on request: PP allocates when the orchestration request is made. The PP would determine if sufficient infrastructure capacity exists considering the current commitments to other media services and if capacity is available then the resources would be allocated. This is a reservation for an MSP and is likely to result in underutilisation of the resources and increased costs for an MSP but may be needed to guarantee performance. 
-* Allocate on placement: PP allocates when the SFs are placed. The impact depends on the placement strategy as if SFs are placed when the MS is requested it will have the same effect to allocate all on request. If placement is selective based on factors such as utilisation/demand then some cost reduction may be achieved at the risk the resources might not be available. Note that placement does incur resource allocation to the MSP (e.g. storage and network I/O for ingest) but this is traded off with the potential to boot and respond to demand quickly.
-* Allocate on boot: PP allocates when then SFs are booted if they are available. Here the VMs are placed with a defined resource that’s allocated when the machine boots. The PP needs to decide if the machine can be booted according to the utilisation by other VMs deployed on the server.
-* Best effort with contention ratio: PP does not make any attempt to allocate resources but does place based on a defined contention ratio. Here there’s a high risk that performance is degraded by others competing for the resources
-
-Some resource management constraints relate to peak usage rate, for example, 50M/s peak and 100G a month usage.
-
-Issues related to this decision include:
-
-* What is the resource management policy for Alpha?
-* Do different policies apply for different types of infrastructure resources?
-* How long does it take to allocate different types of infrastructure resources?
-
-*D3 CLMC outcome: a set of measurements describing an allocation of infrastructure to a media service or SF over a time period*
-
-**D4: “Where can a SF be placed over a time period”?**
-
-When a media service tempalte is submitted for orchestration the PP must determine where SFs can be placed. Placement of a SF results in a VM being deployed on a server ready to be booted. Placement uses storage resources associated with a VM and network resources for ingest of the VM/content but does not utilise resources such as cpu, memory and data i/o incurred when the VM is used.
-
-In alpha where no KPIs are provided, placement is a spatial/temporal decision based on a function of the following measurements
-
-* infrastructure slice 
-* media service allocations
-* SF server requirements 
-
-*The outcome is a set of server options where an SF could be placed within a time period. This outcome is not related to the CLMC monitoring beyond CLMC measurements providing input to placement functions*
-
-**D5: “Where is a SF best placed over a time period”?**
-
-The question of where is it best to place an SF is challenging and depends on responsibility for delivering KPIs. A PP define a KPI to achieve an utilisation target of 80% for servers and place VMs on servers according to an utilisation measurement. A MSP may have a KPI to achieve a response time of 200mS for 80% of requests and place VMs according to a request rate and location measurement. 
-
-*The outcome is a decision on where to place a SF. There’s no change to system state at this point just a decision to take an action now or in the future. *
-
-**D6: “When is a SF placed”?**
-
-The placement strategy is driven by KPIs such as response time. Placement takes time to transfer the VMs and content to a service. Placed VMs boot faster but they consume storage resources as a consequence.
-
-A default PP strategy may be needed for the alpha release. For example, a strategy could be to place and boot in a metro or central DC where there’s less scarcity, and then selectively place/boot VMs in edge DCs on demand. However it’s not clear how such behaviour can be expressed in the TOSCA specification and how this relates to allocations. 
-A default policy could be that the PP can place a SF on any compute node in the network where there’s sufficient resources with a guarantee that there will be at least one instance of a SF, it’s then the PPs decision to create surrogates rather than have an explicit definition as per the static configuration scenario above. This policy is sensible as it moves towards the situation where the PP manages services based on KPIs, however it does require the PP to manage allocations over time in response to demand.
-
-*D6 CLMC outcome: VM configuration measurement updating state to “placed”*
-
-**D7: “When is a SF be booted?”**
-
-The booting strategy is driven by KPIs such as response time. VMs take time to boot. Booted VMs are available to serve requests routed to them immediately. When SF’s are booted the VM consumes resources in accordance within the context of the applicable resource management policy (e.g. guaranteed allocation or with contention ratio)
-
-*D5 CLMC outcome: VM configuration measurement updating state to “booted”*
-
-**D8: “Which surrogate are requests routed to?”**
-
-An SFC may have multiple surrogate services booted serving requests. A decision needs to be made on where to route requests. In a typical load balancing situation requests are routed using algorithms such as round robin and source-based. Routing to the closest surrogate may not deliver improved performance especially if the surrogate is deployed on a resource constrained server and the NAP is experiencing a high level of demand. In many typical load balancing scenarios, the servers are homogenous, network delay is not considered and requests are processed from a central point of access. In our scenario the server resources are heterogeneous, network delay is critical and requests enter from multiple points of access as defined by NAPs.
-
-At this point it’s worth highlighting that we are considering E2E performance and that each step in an end to end process contributions to an overall performance. If we take latency (as a key benefit of the platform), the E2E latency is the sum of delays in network and servers contributing to a content delivery process as shown in the diagram below:
-
-![E2E latency](/docs/image/e2e-latency.jpg)
-
-If we the average delay for parts of a process over a time period we have some indication of best routing policy.
-
-The variability factors that influence E2E latency include:
-
-* Spatial/temporal demand
-* Server placement and server resource allocation/contention over time
-* Network routing and network resource allocation/contention over time
-
-Issues related to this decision include:
-
-* How are NAP routing decisions coordinated as requests are not sourced from a central point?
-
-## **Information Model**
-
-This section provides an overview of the FLAME CLMC information model in support of service management and control decisions. The information model is designed to support the exploration and understanding of state and factors contributing to changes in state over time as shown in the primitive below:
+This document provides an overview of the FLAME CLMC information model in support of service management and control decisions. The information model is designed to support the exploration and understanding of state and factors contributing to changes in state over time as shown in the primitive below:
 
 ![Configuration Principle](/docs/image/configuration-principle.jpg)
 
@@ -258,7 +110,7 @@ Tags are automatically indexed by InfluxDB. Global tags can be automatically ins
 
 Although similar to SQL, InfluxDB is not a relational database and the primary key for all measuremetns is **time**. Schema design recommendations can be found here: https://docs.influxdata.com/influxdb/v1.4/concepts/schema_and_data_layout/
 
-### Temporal Measurements (TBC Simon Crowle)
+### Temporal Measurements
 
 Monitoring data must have time-stamp values that are consistent and sychronised across the platform. This means that all VMs hosting SFs should have a synchronised system clock, or at least (and more likely) a means by which an millisecond offset from the local time can be retrieved so that a 'platform-correct' time value can be calculated.
 
@@ -268,7 +120,7 @@ Monitoring data must have time-stamp values that are consistent and sychronised
 
 influx -precision rfc3339 : The -precision argument specifies the format/precision of any returned timestamps. In the example above, rfc3339 tells InfluxDB to return timestamps in RFC3339 format (YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ).
 
-### Spatial Measurements (TBC Simon Crowle)
+### Spatial Measurements 
 
 Location can be represented in forms: labelled (tag) and numeric (longitude and latitude as digitial degrees). Note that the location label is likely to be a _global tag_. 
 
@@ -372,14 +224,6 @@ To monitor a SF an agent is deployed on each of the surrogates implementing a SF
 
 Telegraf offers a wide range of integration with relevant monitoring processes.
 
-* Telegraf Existing Plugins for common services, relevant plugins include
- * Network Response https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/net_response: could be used to performance basic network monitoring
- * nstat https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/nstat : could be used to monitor the network
- * webhooks https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/webhooks: could be used to monitor end devices
- * prostat https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/procstat: could be used to monitor containers
- * SNMP https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/snmp: could be used to monitor flows
- * systat https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/sysstat: could be used to monitor hosts
-
 Telegraf offers a wide range of integration for 3rd party monitoring processes:
 
 * Telegraf AMQP: https://github.com/influxdata/telegraf/tree/release-1.5/plugins/inputs/amqp_consumer
@@ -389,25 +233,6 @@ Telegraf offers a wide range of integration for 3rd party monitoring processes:
 
 The architecture considers hierarchical monitoring and scalability, for example, AMQP can be used to buffer monitoring information whilst InfluxDB can be used to provide intermediate aggregation points when used with Telegraf input and output plugin.  
 
-### Integration with FLIPS Monitoring
-
-FLIPS offers a scalable pub/sub system for distributing monitoring data. The architecture is described in the POINT monitoring specification https://drive.google.com/file/d/0B0ig-Rw0sniLMDN2bmhkaGIydzA/view. Some observations can be made
-
-* MOOSE and CLMC provide similar functions in the architecture, the CLMC will not have access to MOOSE but will need to subscribe to data points provided by FLIPS
-* The APIs for Moly and Blackadder are not provided therefore it's not possible to critically understand the correct implementation approach for agents and monitoring data distribution
-* Individual datapoints need to be aggregated into measurements according to a sample rate
-* We may need to use the blackadder API for distribution of monitoring data, replacing messaging systems such as AMQP with all buffering and pub/sub deployed on the nodes themselves rather than a central service. 
-
-There are a few architectural choices. The first below uses moly as an integration point for monitoring processes via a Telegraf output plugin with data inserted into influx using a blackadder API input plugin on another Telegraf agent running on the CLMC. In this case managing the subscriptions to nodes and data points is difficult. In addition, some data points will be individual from FLIPS monitoring whilst others will be in line protocol format from Telegraf. For the FLIPS data points a new input plugin would be required to aggregate individual data points into time-series measurements. 
-
-![FLIPSAgentArchitecture](/docs/image/flips-monitoring-architecture.jpg)
-
-The second (currently preferred) choice only sends line protocol format over the wire. Here we develop telegraf input and output plugins for blackadder benefiting from the scalable nature of the pub/sub system rather than introducing RabbitMQ as a central server. In this case the agent on each node would be configured with input plugins for service, host and network . We'd deploy a new Telegraf input plugin for FLIPS data points on the node's agent by subscribing to blackadder locally and then publish the aggregated measurement using the line protocol back over blackadder to the CLMC. FLIPS can still publish data to MOOSE as required. 
-
-![FLIPSAgentArchitecture](/docs/image/flips-monitoring-architecture2.jpg)
-
-The pub/sub protocol still needs some work as we don't want the CLMC to have to subscribe to nodes as they start and stop. We want the nodes to register with a known CLMC and then start publishing data to the CLMC according to a monitoring configuration (e.g. sample rate, etc). So we want a "monitoring topic" that nodes publish to and that the CLMC can pull data from. This topic is on the CLMC itself and note the nodes. Reading the FLIPS specification it seems that this is not how the nodes current distribute data, although could be wrong
-
 ## **Measurements Summary**
 
 ### Configuration
@@ -497,7 +322,7 @@ NAP service request and response metrics
 
 tbd
 
-## Configuration status modelling and monitoring
+## Configuration state modelling and monitoring
 
 FLAME _endpoints_ (VMs created and managed by the SFEMC) and media service _media components_ (processes that realise the execution of the media service) both undergo changes in configuration state during the lifetime of a media service's deployment. Observations of these state changes are recorded in the CLMC under named measurement sets, for example 'endpoint_config' and '\<media component name\>_config' for endpoint and media component labels respectively. In each case, all recordable states of the endpoint/media component are enumerated as columns within the measurement set (see respective state models below for details).
 
diff --git a/docs/service-management-decisions.md b/docs/service-management-decisions.md
new file mode 100644
index 0000000000000000000000000000000000000000..53b037a2328d7161bba19d47048279588e8d3eac
--- /dev/null
+++ b/docs/service-management-decisions.md
@@ -0,0 +1,148 @@
+# **Service Management and Control Decisions**
+
+© University of Southampton IT Innovation Centre, 2017
+
+This document describes the configuration and monitoring specification for cross-layer management and control within the FLAME platform. All information measured by the CLMC aims to improve management and control decisions made by the platform and/or media service providers against defined performance criteria, such as increasing Quality of Experience and reducing cost.
+
+## **Authors**
+
+|Authors|Organisation|                    
+|-|-|
+|[Michael Boniface](mailto:mjb@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
+|[Simon Crowle](mailto:sgc@it-innovation.soton.ac.uk)|[University of Southampton, IT Innovation Centre](http://www.it-innovation.soton.ac.uk)|
+
+## **Service Management and Control Decisions**
+
+Service management decisions relate to processes for Service Request Management, Fault Management and Configuration Management. There are many possible management and control decisions, and it is the purpose of the CLMC to provide decision makers with empirical knowledge to design and implement better policies. The FLAME architecture describes how the CLMC uses KPIs to measure performance and highlights examples of control policies such as shortest path routing to a SF and horizontal scaling of SFs in response to changes in workload. A Platform Provider and a Media Service Provider will have KPI targets that are different and not independent of each other. For example, allocating all of the resources needed for the expected peak workload of a media service when it is submitted for orchestration would guarantee a performance level. However, the outcome would typically be low utilisation and increased costs, because the peak workload occurs for only a fraction of the overall service operation time. The solution is to provide greater flexibility by exploiting points of variability within the system in relation to constraints. Constraints are imposed by policy (e.g. a limit on resource allocation) and technology limitations (e.g. VM boot time, horizontal/vertical scaling, routing).
+
+The management and control processes implemented by the FLAME platform define the decisions, variability and constraints. The detail for the implementation of orchestration, management and control is under discussion and the following is based on a best understanding of what was described in the FLAME architecture. 
+
+### **An Elementary Starting Point: The Static Configuration Scenario**
+
+The first scenario to consider is an entirely static configuration. In this scenario a media service provider defines explicitly the infrastructure resources needed for a media service. The static configuration is what is proposed by the adoption of the current TOSCA specification for the Alpha release. Here an MSP declares the resources needed to deliver a desired performance level (implicitly known to the MSP). In an extreme case, the approach results in a static infrastructure configuration where the MSP defines the entire topology including servers, links and resource requirements. This would include server locations (central, metro and edge DCs) and when the servers are needed. The most basic case is to deploy everything now for the lifetime of the media service. This full declaration would statically configure surrogates through the explicit declaration of servers and the software deployed on those servers.
+
+In this case, the Platform Provider is responsible for allocating the requested resources to the media service provider for the lifetime of the media service. The performance of the service is entirely related to the knowledge of the MSP and the workload over time. 
+
+Even this simple example leads to important decisions.
+
+**D1: “How much infrastructure resource does a PP need from an IP?”**
+
+The infrastructure resource (slice) defines a topology of compute, storage and network resources allocated by an IP to a PP. The slice is used by the PP to resource media services. The PP will allocate proportions of the slice to media services within the lifecycle of such services. In most cases, a PP will need to define resource management policies that define rules for allocation of resources considering that multiple media services are contending for such resources.
+
+The capacity of the slice and the distribution of the resources within the infrastructure is a planning decision made by the PP based on a prediction of media service demand. The allocation of a slice has cost implications as from an IP’s perspective resources are dedicated to a PP. Depending on the business model and cost structures, the slice allocation would typically become a fixed cost to the PP and revenue for the IP. The PP must now allocate the slice to MSPs in the context of KPIs designed to maximise revenue from the slice. 
+
+Issues related to this decision include:
+
+* What are the temporal constraints on a slice? Is there a defined end time, recurring subscription or is a slice perpetual? 
+* How fine grained are temporal constraints considering the fact that an IP has resource scarcity at edge DCs in comparison to metro and central DCs?
+* What are the states of a slice? What causes the state transition?
+* Can a slice be modified and if so how can the slice change?
+
+*D1 CLMC outcome: a set of measurements describing an infrastructure slice.*
+
+**D2: “How much infrastructure resource does a MSP need from a PP?”**
+
+Once the PP has a slice then media services can be orchestrated, managed and controlled within the slice. Here the PP must consider the MSP infrastructure resource requirements. In the Alpha release FLAME adopts the current TOSCA specification where the MSPs define declaratively server resources required for each SF. The PP has no understanding of how a media service will behave in response to the resource allocation as that knowledge is within the MSP. In TOSCA++ FLAME is exploring KPI-based media service specifications where resource management knowledge forms part of the platform’s responsibility. 
+
+Issues related to this decision include:
+
+* What are the temporal constraints on resource requirements within a TOSCA specification?
+* How fine grained are the temporal constraints considering that a media service includes a set of media components with temporal resourcing requirements? E.g. media component A needs resource on Monday and media component B resource on Tuesday. 
+* What are the spatial constraints associated with the resource requirements? Does an MSP specify the precise DC (or set of DCs) where the SF needs or can be deployed? In effect, if the MSP says where the SF needs to be deployed this encodes the surrogate policy directly within the media service definition. 
+* How much variability is there in routing rules? How much of this is implicit within the platform implementation (e.g. coincidental multicast features)?
+
+*D2 CLMC outcome: a set of measurements describing media service infrastructure requirements.*
+
+### **Where Next: Fast Variable Configuration Scenarios**
+
+Variable configuration identifies configuration state that can change in the lifetime of a media service. Variability in configuration state includes:
+
+* Vertically up and down scaling SF resources (i.e. compute, storage, network IO)
+* Horizontally up and down scaling SF resources (i.e. replication)
+* Distributing SFs by location (i.e. placement of a VM on an edge DC)
+* Routing traffic between SFs (i.e. load balancing algorithms)
+* Adapting content (i.e. reducing the resolution of a video stream)
+
+Each transition in state is a decision that has a time in the lifecycle (when it is implemented), a duration (how long it takes to implement), an actor (who is responsible) and an expected outcome.
+
+General issues related to variable configuration include:
+
+* What are the points of variability within the platform?
+* How is variability configured, either through default platform policy or TOSCA templates?
+* Where are we contributing innovation in variability? e.g. network + compute + service factors considered together
+
+We now discuss key decisions associated with variable configuration.
+
+**D3: “When should resources be allocated to a media service”?**
+
+When a PP receives a request to orchestrate a media service, the PP must decide when to allocate infrastructure resources. Allocation has a temporal dimension defining a start time and an end time. An allocation in the future can be seen as a commitment. Allocation is important for accounting purposes, and even more important in situations of resource scarcity. In most public clouds, resources from an MSP perspective are assumed to be infinite and there is little need to consider temporal constraints associated with resource allocations. As long as the MSP has the budget to pay for the resources, public cloud providers will scale those resources as requested.
+
+In FLAME we have resource scarcity and contention in edge DCs and therefore MSPs and the PP must find workable ways to negotiate allocation of resources over time.  Different resource management policies can be considered.
+
+* Allocate on request: PP allocates when the orchestration request is made. The PP would determine if sufficient infrastructure capacity exists considering the current commitments to other media services and if capacity is available then the resources would be allocated. This is a reservation for an MSP and is likely to result in underutilisation of the resources and increased costs for an MSP but may be needed to guarantee performance. 
+* Allocate on placement: PP allocates when the SFs are placed. The impact depends on the placement strategy: if SFs are placed when the MS is requested, this has the same effect as allocating all on request. If placement is selective, based on factors such as utilisation/demand, then some cost reduction may be achieved at the risk that the resources might not be available. Note that placement does incur resource allocation to the MSP (e.g. storage and network I/O for ingest), but this is traded off against the ability to boot and respond to demand quickly.
+* Allocate on boot: PP allocates when the SFs are booted, if resources are available. Here the VMs are placed with a defined resource that is allocated when the machine boots. The PP needs to decide if the machine can be booted according to the utilisation by other VMs deployed on the server.
+* Best effort with contention ratio: PP does not make any attempt to allocate resources but does place based on a defined contention ratio. Here there is a high risk that performance is degraded by others competing for the resources.
+
+Some resource management constraints relate to peak usage rate, for example, 50M/s peak and 100G a month usage.
+
+Issues related to this decision include:
+
+* What is the resource management policy for Alpha?
+* Do different policies apply for different types of infrastructure resources?
+* How long does it take to allocate different types of infrastructure resources?
+
+*D3 CLMC outcome: a set of measurements describing an allocation of infrastructure to a media service or SF over a time period*
+
+**D4: “Where can a SF be placed over a time period”?**
+
+When a media service template is submitted for orchestration, the PP must determine where SFs can be placed. Placement of a SF results in a VM being deployed on a server ready to be booted. Placement uses storage resources associated with a VM and network resources for ingest of the VM/content, but does not utilise resources such as CPU, memory and data I/O that are incurred when the VM is used.
+
+In Alpha, where no KPIs are provided, placement is a spatial/temporal decision based on a function of the following measurements:
+
+* infrastructure slice 
+* media service allocations
+* SF server requirements 
+
+*The outcome is a set of server options where an SF could be placed within a time period. This outcome is not related to the CLMC monitoring beyond CLMC measurements providing input to placement functions*
+
+**D5: “Where is a SF best placed over a time period”?**
+
+The question of where an SF is best placed is challenging and depends on responsibility for delivering KPIs. A PP may define a KPI to achieve a utilisation target of 80% for servers and place VMs on servers according to a utilisation measurement. An MSP may have a KPI to achieve a response time of 200 ms for 80% of requests and place VMs according to request rate and location measurements (a query sketch for this KPI is given below).
+
+*The outcome is a decision on where to place a SF. There is no change to system state at this point, just a decision to take an action now or in the future.*
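+
+A sketch of how the MSP-side KPI above might be checked against CLMC data, assuming (for illustration only) a hypothetical `service_response` measurement with a `response_time` field in milliseconds, tagged by `sf`:
+
+```
+SELECT percentile("response_time", 80) FROM "service_response" WHERE time > now() - 1h GROUP BY "sf"
+```
+
+An 80th-percentile value at or below 200 ms for a given SF would indicate that the KPI is being met there.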
+
+**D6: “When is a SF placed”?**
+
+The placement strategy is driven by KPIs such as response time. Placement takes time to transfer the VMs and content to a service. Placed VMs boot faster but they consume storage resources as a consequence.
+
+A default PP strategy may be needed for the alpha release. For example, a strategy could be to place and boot in a metro or central DC where there’s less scarcity, and then selectively place/boot VMs in edge DCs on demand. However it’s not clear how such behaviour can be expressed in the TOSCA specification and how this relates to allocations. 
+A default policy could be that the PP can place a SF on any compute node in the network where there are sufficient resources, with a guarantee that there will be at least one instance of a SF; it is then the PP's decision to create surrogates rather than have an explicit definition as per the static configuration scenario above. This policy is sensible as it moves towards the situation where the PP manages services based on KPIs; however, it does require the PP to manage allocations over time in response to demand.
+
+*D6 CLMC outcome: VM configuration measurement updating state to “placed”*
+
+**D7: “When is a SF booted?”**
+
+The booting strategy is driven by KPIs such as response time. VMs take time to boot. Booted VMs are available to serve requests routed to them immediately. When SFs are booted, the VM consumes resources within the context of the applicable resource management policy (e.g. guaranteed allocation or a contention ratio).
+
+*D7 CLMC outcome: VM configuration measurement updating state to “booted”*
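+
+As a sketch only (the exact field semantics for configuration state measurements are defined in the configuration state modelling section of monitoring.md, and the flag values here are purely illustrative), such a state transition might be recorded as a point in which the relevant state column is updated:
+
+`endpoint_config,<common_tags> placed=0,booted=1 timestamp`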
+
+**D8: “Which surrogate are requests routed to?”**
+
+An SFC may have multiple surrogate services booted and serving requests. A decision needs to be made on where to route requests. In a typical load balancing situation requests are routed using algorithms such as round robin and source-based. Routing to the closest surrogate may not deliver improved performance, especially if the surrogate is deployed on a resource-constrained server and the NAP is experiencing a high level of demand. In many typical load balancing scenarios the servers are homogeneous, network delay is not considered and requests are processed from a central point of access. In our scenario the server resources are heterogeneous, network delay is critical and requests enter from multiple points of access as defined by NAPs.
+
+At this point it is worth highlighting that we are considering E2E performance and that each step in an end-to-end process contributes to the overall performance. If we take latency (as a key benefit of the platform), the E2E latency is the sum of the delays in the network and servers contributing to a content delivery process, as shown in the diagram below:
+
+![E2E latency](/docs/image/e2e-latency.jpg)
+
+If we know the average delay for each part of the process over a time period, we have some indication of the best routing policy (a minimal formulation is sketched below).
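+
+A minimal formulation of this sum (a simplification that ignores queuing and retransmission effects), where the notation is introduced here for illustration: d_net is the average network delay on a hop of the path and d_srv the average processing delay at a service point (NAP, SF surrogate, etc.):
+
+```
+\mathrm{latency}_{E2E} \;\approx\; \sum_{i \,\in\, \text{network hops}} d^{\mathrm{net}}_{i} \;+\; \sum_{j \,\in\, \text{service points}} d^{\mathrm{srv}}_{j}
+```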
+
+The variability factors that influence E2E latency include:
+
+* Spatial/temporal demand
+* Server placement and server resource allocation/contention over time
+* Network routing and network resource allocation/contention over time
+
+Issues related to this decision include:
+
+* How are NAP routing decisions coordinated as requests are not sourced from a central point?