diff --git a/README.md b/README.md
index 8948957bacb471cdc8bea568ef9aebe5e026114f..0d9dbc763b314984cbd9a9e975e27427f6293b0c 100644
--- a/README.md
+++ b/README.md
@@ -119,50 +119,4 @@ If pytest is not installed, an easy solution is to use the Python Package Index
 
 `sudo apt-get install python3-pip`
 
-`pip3 install pytest`
-
-#### Configuration status modelling and monitoring
-
-FLAME _endpoints_ (VMs created and managed by the SFEMC) and media service _media components_ (processes that realise the execution of the media service) both undergo changes in configuration state during the lifetime of a media service's deployment. Observations of these state changes are recorded in the CLMC under named measurement sets, for example 'endpoint_config' and '\<media component name\>_config' for endpoint and media component labels respectively. In each case, all recordable states of the endpoint/media component are enumerated as columns within the measurement set (see respective state models below for details).
-
-Observation of these states will be performed by a third party - for example, a Telegraf plugin will continuously __report__ on the state of an NGINX service to the CLMC using a _fixed_ interval (say 10 seconds). During this _reporting_ period, the actual state of the NGINX service will be sampled (polled) by the plugin several times (say 10 each second).
During any reporting period, the NGINX service _may_ transition from one state to another:
-
-| State observation # | State |
-| --- | --- |
-| 1 | stopped |
-| 2 | stopped |
-| 3 | stopped |
-| 4 | stopped |
-| 5 | starting |
-| 6 | starting |
-| 7 | starting |
-| 8 | starting |
-| 9 | starting |
-| 10 | starting |
-
-_Above: example observations within a single reporting period of a media component configuration state_
-
-Therefore each report will include for each state:
-
-* The total time in the state for the reporting period
-* The avarage time in the state for the reporting period
-
-##### Endpoint configuration state model
-
-
-##### Media component configuration state model
-
-A media component configuration state model consists of the following states:
-
-* stopped
-* starting [transitional]
-* running
-* stopping [transitional]
-
-An example measurement row for a media component configuration states is below:
-
-| tags | stopped | avg_stopped | starting | avg_starting | running | avg_running | stopping | avg_stopping | time |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| \<global tags...\> | 0 | 0 | 4 | 0.4 | 6 | 0.6 | 0.0 | 0.0 | 0 |
-
-In this example, the _reporting period_ is 10 seconds and with an observation rate of 1/second; the observed states were 'stopped' (4 observations) and 'starting' (6 observations).
\ No newline at end of file
+`pip3 install pytest`
\ No newline at end of file
diff --git a/docs/image/configStateFlow.png b/docs/image/configStateFlow.png
new file mode 100644
index 0000000000000000000000000000000000000000..1916e8cac5e9eca7156c1f6fafdbb7fe0805aab7
Binary files /dev/null and b/docs/image/configStateFlow.png differ
diff --git a/docs/monitoring.md b/docs/monitoring.md
index e116d73a66b5386f1ebdedbf8f68a27299cb43a6..dd253cc7da3cc728572cbb3c97ff3d93e71614c9 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -497,6 +497,56 @@ NAP service request and response metrics
 
 tbd
 
+## Configuration status modelling and monitoring
+
+FLAME _endpoints_ (VMs created and managed by the SFEMC) and media service _media components_ (processes that realise the execution of the media service) both undergo changes in configuration state during the lifetime of a media service's deployment. Observations of these state changes are recorded in the CLMC under named measurement sets, for example 'endpoint_config' and '\<media component name\>_config' for endpoint and media component labels respectively. In each case, all recordable states of the endpoint/media component are enumerated as columns within the measurement set (see respective state models below for details).
+
+>
+> Side note: a few definitions
+>
+> 'EP' - Endpoint: a VM created and managed by the SFEMC
+> 'MC' - Media component: a process that realises a part or the whole of a media service
+> 'Sampling period' - the time elapsed between successive Telegraf reports of (plugin generated) metrics to the CLMC
+> 'Completed state' - a state that has been entered into and then exited
+> 'Current state' - a state that has been entered into but not yet exited
+> 'MST' - Mean state time: the sum of the time spent in each completed state of type 'X' divided by the number of completed states of type 'X', i.e.:
+>
+>```math
+> meanStateTime = \frac{\sum(endTimeOfState - startTimeOfState)}{numberOfTimesInState}
+>```
+>
+
+Observation of EP or MC states will be performed by a Telegraf plugin. For example, a Telegraf plugin could periodically __report__ on the state of an NGINX process to the CLMC at a _fixed_ time interval (say 10 seconds). In between these times (the _sampling period_) the Telegraf plugin will sample (or 'poll') the state of the EP or MC several times (say 10 each second). Note that during any sampling period, the EP or MC _may_ transition from one state to another. A simple example:
+
+
+_Above: example observations within two sampling periods for an MC configuration state_
+
+In the example provided above an MC moves through several states, finishing in a stopped state. During each sampling period, the total time spent in each observed state is measured; for _completed states_, both the sum of the time spent in that state and the mean time per occurrence are recorded. For any state that has not been observed during the sampling period, the sum and mean values are recorded as zero. A state that has not yet completed is treated as the 'current state'; the time spent in it continues to accumulate, over multiple sampling periods if necessary, until it is exited.
Finally, if a state completes directly after sampling period [1] ends and a new state begins before the start of the next sampling period [2], then the previous current state from period [1] should be recorded as _completed_ as part of period [2]'s report.
+
+
+##### Endpoint configuration state model
+
+
+##### Media component configuration state model
+
+A media component configuration state model consists of the following states:
+
+* stopped
+* starting [transitional]
+* running
+* stopping [transitional]
+
+An example (based on the figure above) of some measurement rows for a media component's configuration states is below:
+
+| global tags | current_state (tag) | current_state_time | stopped_sum | stopped_mst | starting_sum | starting_mst | running_sum | running_mst | stopping_sum | stopping_mst | time |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| ... | starting | 3 | 5 | 2.5 | 2 | 2 | 0 | 0 | 0 | 0 | ... |
+| ... | running | 8 | 0 | 0 | 5 | 5 | 0 | 0 | 0 | 0 | ... |
+| ... | stopped | 5 | 0 | 0 | 0 | 0 | 9 | 9 | 4 | 4 | ... |
+| ... | starting | 10 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
+
+
 ## Media Service Measurements
 
 Media service measurements measure the configuration, usage and performance of media service instances deployed by the platform.
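
The completed-state and current-state bookkeeping described in the `docs/monitoring.md` hunk above can be sketched in Python as follows. This is a minimal illustration only, not the Telegraf plugin's implementation: the function name `summarise_period`, its parameters, the one-second polling interval and the sample sequences are all assumptions made for the example. The sequences were chosen so that the output matches the first two rows of the example table.

```python
from itertools import groupby

# State names taken from the media component state model in the patch.
STATES = ("stopped", "starting", "running", "stopping")


def summarise_period(samples, sample_interval=1.0,
                     carried_state=None, carried_time=0.0):
    """Build the report fields for one sampling period (hypothetical helper).

    samples         -- states polled during the period, in order
    sample_interval -- assumed seconds between polls
    carried_state   -- current (unfinished) state from the previous report
    carried_time    -- time already accumulated in carried_state
    """
    if not samples:
        raise ValueError("at least one sample is required")

    # Collapse consecutive identical samples into [state, duration] runs.
    runs = [[state, len(list(group)) * sample_interval]
            for state, group in groupby(samples)]

    if carried_state is not None:
        if runs[0][0] == carried_state:
            # The previous current state is still in progress: keep accumulating.
            runs[0][1] += carried_time
        else:
            # It ended between reports, so it is completed in this report.
            runs.insert(0, [carried_state, carried_time])

    # Every run except the last has been exited; the last is the current state.
    completed, current = runs[:-1], runs[-1]

    fields = {"current_state": current[0], "current_state_time": current[1]}
    for state in STATES:
        durations = [d for s, d in completed if s == state]
        total = sum(durations)
        fields[state + "_sum"] = total
        # Mean state time; zero when the state did not complete in this period.
        fields[state + "_mst"] = total / len(durations) if durations else 0
    return fields


# Made-up sample sequences for two consecutive 10-second periods.
row1 = summarise_period(["stopped"] * 3 + ["starting"] * 2
                        + ["stopped"] * 2 + ["starting"] * 3)
row2 = summarise_period(["starting"] * 2 + ["running"] * 8,
                        carried_state=row1["current_state"],
                        carried_time=row1["current_state_time"])
print(row1)
print(row2)
```

Carrying `current_state` and `current_state_time` from one call into the next is what lets a state that spans several sampling periods be reported once, with its full duration, in the period in which it exits.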