Commit 5286337d authored by Simon Crowle

Updates configuration state modelling methodology and write-up

Removes initial write up from README and puts into monitoring.md
parent 2460412b
......@@ -119,50 +119,4 @@ If pytest is not installed, an easy solution is to use the Python Package Index
`sudo apt-get install python3-pip`
`pip3 install pytest`
#### Configuration status modelling and monitoring
FLAME _endpoints_ (VMs created and managed by the SFEMC) and media service _media components_ (processes that realise the execution of the media service) both undergo changes in configuration state during the lifetime of a media service's deployment. Observations of these state changes are recorded in the CLMC under named measurement sets, for example 'endpoint_config' and '\<media component name\>_config' for endpoint and media component labels respectively. In each case, all recordable states of the endpoint/media component are enumerated as columns within the measurement set (see respective state models below for details).
Observation of these states will be performed by a third party. For example, a Telegraf plugin will continuously __report__ on the state of an NGINX service to the CLMC using a _fixed_ interval (say 10 seconds). During this _reporting_ period, the actual state of the NGINX service will be sampled (polled) by the plugin several times (say once each second). During any reporting period, the NGINX service _may_ transition from one state to another:
| State observation # | State |
| --- | --- |
| 1 | stopped |
| 2 | stopped |
| 3 | stopped |
| 4 | stopped |
| 5 | starting |
| 6 | starting |
| 7 | starting |
| 8 | starting |
| 9 | starting |
| 10 | starting |
_Above: example observations within a single reporting period of a media component configuration state_
Therefore each report will include for each state:
* The total time in the state for the reporting period
* The average time in the state for the reporting period
##### Endpoint configuration state model
##### Media component configuration state model
A media component configuration state model consists of the following states:
* stopped
* starting [transitional]
* running
* stopping [transitional]
An example measurement row for a media component's configuration states is below:
| tags | stopped | avg_stopped | starting | avg_starting | running | avg_running | stopping | avg_stopping | time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \<global tags...\> | 4 | 0.4 | 6 | 0.6 | 0 | 0 | 0 | 0 | 0 |
In this example, the _reporting period_ is 10 seconds with an observation rate of 1/second; the observed states were 'stopped' (4 observations) and 'starting' (6 observations).
\ No newline at end of file
`pip3 install pytest`
\ No newline at end of file
File suppressed by a .gitattributes entry or the file's encoding is unsupported.
......@@ -497,6 +497,56 @@ NAP service request and response metrics
tbd
## Configuration status modelling and monitoring
FLAME _endpoints_ (VMs created and managed by the SFEMC) and media service _media components_ (processes that realise the execution of the media service) both undergo changes in configuration state during the lifetime of a media service's deployment. Observations of these state changes are recorded in the CLMC under named measurement sets, for example 'endpoint_config' and '\<media component name\>_config' for endpoint and media component labels respectively. In each case, all recordable states of the endpoint/media component are enumerated as columns within the measurement set (see respective state models below for details).
>
> Side note: a few definitions
>
> 'EP' - Endpoint: a VM created and managed by the SFEMC
> 'MC' - Media component: a process that realizes a part or the whole of a media service
> 'Sampling period' - the time elapsed before Telegraf reports (plugin generated) metrics to the CLMC
> 'Completed state' - a state that has been entered into and then exited
> 'Current state' - a state that has been entered into but not yet exited
> 'MST' - Mean state time: the sum of the time spent in each completed state of type 'X' divided by the number of completed state 'X's; i.e.:
>
>```math
> meanStateTime = \frac{\sum(endTimeOfState - startTimeOfState)}{numberOfTimesInState}
>```
>
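
As a rough illustration of the MST definition, the following sketch computes the mean state time from a list of completed state intervals. The function name `mean_state_time` and the `(start_time, end_time)` tuple layout are assumptions made for this example only; they are not part of the CLMC or Telegraf code.

```python
def mean_state_time(completed_intervals):
    """Return the mean duration of the completed states of one type.

    completed_intervals: iterable of (start_time, end_time) pairs, in seconds
    (a hypothetical input format chosen for this sketch).
    """
    # Duration of each completed state is end time minus start time
    durations = [end - start for start, end in completed_intervals]
    if not durations:
        # No completed state of this type in the period: record zero
        return 0.0
    return sum(durations) / len(durations)
```

For example, two completed 'stopped' states lasting 2 and 3 seconds give `mean_state_time([(0, 2), (5, 8)])`, which evaluates to 2.5.
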
Observation of EP or MC states will be performed by a Telegraf plugin. For example, a Telegraf plugin could periodically __report__ on the state of an NGINX process to the CLMC at a _fixed_ time interval (say 10 seconds). In between these times (the _sampling period_) the Telegraf plugin will sample (or 'poll') the state of the EP or MC several times (say 10 each second). Note that during any sampling period, the EP or MC _may_ transition from one state to another. A simple example:
![exampleStateFlow](./image/configStateFlow.png)
_Above: example observations within two sampling periods for an MC configuration state_
In the example above, an MC moves through several states, finishing in a stopped state. During each sampling period, the total time in each observed state is measured; for _completed states_, the sum of the time spent in that state and its mean state time are recorded. For any state that has not been observed during the sampling period, the sum and mean values are recorded as zero. A state that has not yet completed is treated as the 'current state'; its time in state keeps increasing, over multiple sampling periods if necessary, until it exits. Finally, if a state completes directly after sampling period [1] ends and a new state begins before the start of the next sampling period [2], then the previous current state from period [1] should be recorded as _completed_ as part of period [2]'s report.
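
The bookkeeping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the plugin's actual implementation: the function name `summarise_period`, its parameters, and the idea of passing the still-open state from one period into the next are all hypothetical.

```python
from collections import defaultdict


def summarise_period(samples, sample_interval, carried_state=None, carried_time=0):
    """Summarise one sampling period (hypothetical helper for illustration).

    samples: ordered list of state names polled during the period.
    sample_interval: seconds between polls.
    carried_state, carried_time: the current state left open at the end of
        the previous period, and the time already spent in it.

    Returns (completed, current_state, current_state_time), where completed
    maps each state name to a list of completed durations in seconds.
    """
    completed = defaultdict(list)
    state, elapsed = carried_state, carried_time
    for observed in samples:
        if observed == state:
            # Still in the same state: keep accumulating time
            elapsed += sample_interval
        else:
            # A transition was observed: the previous state is now completed
            if state is not None:
                completed[state].append(elapsed)
            state, elapsed = observed, sample_interval
    # The state still open at the end of the period is the current state
    return dict(completed), state, elapsed
```

A state carried over from an earlier period is only recorded as completed in the period where its exit is observed, which matches the rule in the last sentence above. For instance, calling `summarise_period(["starting"] * 10, 1, carried_state="stopped", carried_time=5)` reports one completed 'stopped' state of 5 seconds and a current 'starting' state of 10 seconds.
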
##### Endpoint configuration state model
##### Media component configuration state model
A media component configuration state model consists of the following states:
* stopped
* starting [transitional]
* running
* stopping [transitional]
An example (based on the figure above) of some measurement rows for a media component's configuration states is below:
| global tags | current_state (tag) | current_state_time | stopped_sum | stopped_mst | starting_sum | starting_mst | running_sum | running_mst | stopping_sum | stopping_mst | time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ... | starting | 3 | 5 | 2.5 | 2 | 2 | 0 | 0 | 0 | 0 | ... |
| ... | running | 8 | 0 | 0 | 5 | 5 | 0 | 0 | 0 | 0 | ... |
| ... | stopped | 5 | 0 | 0 | 0 | 0 | 9 | 9 | 4 | 4 | ... |
| ... | starting | 10 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
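
To show how the columns in the table relate to the completed states of a sampling period, here is a small Python sketch that derives the `<state>_sum` and `<state>_mst` values plus the current state entries. The helper `build_report_row` and its input format (a mapping from state name to a list of completed durations) are assumptions for this example; only the column names are taken from the table above.

```python
# States of the media component configuration state model
STATES = ("stopped", "starting", "running", "stopping")


def build_report_row(completed, current_state, current_state_time):
    """Build the tags and fields of one measurement row (illustrative only).

    completed: dict mapping state name -> list of completed durations (seconds).
    current_state: name of the state entered but not yet exited.
    current_state_time: seconds spent so far in the current state.
    """
    # current_state is recorded as a tag, the remaining columns as fields
    tags = {"current_state": current_state}
    fields = {"current_state_time": current_state_time}
    for state in STATES:
        durations = completed.get(state, [])
        total = sum(durations)
        fields[state + "_sum"] = total
        # Mean state time is zero when no state of this type completed
        fields[state + "_mst"] = total / len(durations) if durations else 0
    return tags, fields
```

With the inputs `{"stopped": [2, 3], "starting": [2]}`, `"starting"` and `3`, the returned fields carry the same values as the first row of the table: `stopped_sum` 5, `stopped_mst` 2.5, `starting_sum` 2 and `starting_mst` 2, with all other sums and means at zero.
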
## Media Service Measurements
Media service measurements measure the configuration, usage and performance of media service instances deployed by the platform.
......