diff --git a/docs/CLMC monitoring specification for a basic scenario.md b/docs/CLMC monitoring specification for a basic scenario.md
index a8afea64076b471d3d3a7927401388a2b4bf4f2e..e56dc7da3671f94ac6bad1c4db99ba0fc8bc81ec 100644
--- a/docs/CLMC monitoring specification for a basic scenario.md
+++ b/docs/CLMC monitoring specification for a basic scenario.md
@@ -9,132 +9,141 @@
 | compute_node_config, | slice_id="SLICE1" |
 #### Specific context
-| tag | tag | field | field | field | timestamp |
-| --- | --- | --- | --- | --- | --- |
-| location='DC1', | comp_node_id='c1' | cpus=4, | memory=8, | storage=16 | 1515583926868000000 |
+| tag | tag |
+| --- | --- |
+| location='DC1', | comp_node_id='c1' |
+
+#### Configurations
+| field | field | field | timestamp |
+| --- | --- | --- | --- |
+| cpus=4, | memory=8, | storage=16 | 1515583926868000000 |
 ### Network configuration
-__MORE NEEDED FROM IDE__
+__How do we describe network configuration ?__
+__What is a format of an infrastructure slices ?__
+__What is the relevant information ?__
-#### Common context: VLANs
+#### Common context: network
 | measurement | tag |
 | --- | --- | --- |
-| network_resource_config,| slice_id='SLICE1' |
+| network_config,| slice_id='SLICE1' |
+
+#### Specific context: network
+| tag |
+| --- |
+| network_id="NET1" |
+
+#### Configurations: network
+| field | timestamp |
+| --- | --- | --- | --- | --- |
+| bandwidth=400 | 1515583926868000000 |
-#### Specific context: VLANs
-| tag | field | timestamp |
-| --- | --- | --- | --- | --- | --- |
-| network_id="VNET1" | bandwidth=400 | 1515583926868000000 |
 #### Common context: Network interfaces
 | measurement | tag |
 | --- | --- | --- |
 | network_interface_config,| slice_id='SLICE1' |
-| tag | tag | field | field | timestamp |
-| --- | --- | --- | --- | --- | --- |
-| comp_node_id='c1', | port_id='enps03', | rx_constraint=1000, | tx_constraint=1000 | 1515583926868000000 |
+#### Specific context: Network interfaces
+| tag | tag |
+| --- | --- |
+| comp_node_id='c1', | port_id='enps03', |
+#### Configurations: Network interfaces
+| field | field | timestamp |
+| --- | --- | --- | --- |
+| rx_constraint=1000, | tx_constraint=1000 | 1515583926868000000 |
-## CONFIGURATION: SF template (TOSCA)
+## CONFIGURATION: SFC template (TOSCA)
 ### Media Service SFC states
-__MORE NEEDED FROM IDE__
-
+__What are the SFC states ?__
-### CONFIGURATION: Media Service SF Instance states
-__MORE NEEDED FROM IDE__
+### CONFIGURATION: Media Service SF states
+__What are the SF states ?__
-### CONFIGURATION: Media Service SF Instance Surrogate states
+### CONFIGURATION: Media Service SF Instance states
 #### Common context
-|measurement|tag|tag|tag|tag|tag|
+| measurement | tag | tag | tag | tag | tag |
 | --- | --- | --- | --- | --- | --- |
-| sf_instance_surrogate_config, | location="DC1", | sfc="Scenario1_Template", | sfc_i="Scenario1_Template_I1", | sf_package="MS_STREAMING", | sf_i="MS_STREAMING_1" |
+| sf_instance_surrogate_config, | location='DC1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1' |
 #### Specific context
-|field|field|field|timestamp|
-| --- | --- | --- | --- | --- |
-| cpus=2, | memory=4, | storage=8 | 1515583926868000000 |
-
+| tag |
+| --- |
+| surrogate_id='MS_STREAM_1_SURROGATE_1' |
-## Monitor Service Function Instance Surrogate (VMs)
-#### Common context: Usage and Performance
-All of the specific context measurements below carry the following common context (this has not be replicated for brevity) for both usage and performance measurements. In this example, we illustrate using 2 surrogate VMs.
+#### Configurations
+| field | field | field | field | timestamp |
+| --- | --- | --- | --- | --- |
+| state='placed', | cpus=2, | memory=4, | storage=8 | 1515583926868000000 |
+### CONFIGURATION: Media Service Function Instance Surrogates
+#### Common context
 | measurement | tag | tag | tag | tag | tag |
 | --- | --- | --- | --- | --- | --- | --- |
-| \<measurement>, | location="DC1", | sfc="Scenario1_Template", | sfc_i="Scenario1_Template_I1", | sf_package="MS_STREAMING", | sf_i="MS_STREAMING_1", | vm_instance ="MPEG_DASH_Server1" |
-| \<measurement>, | location="DC2", | sfc="Scenario1_Template", | sfc_i="Scenario1_Template_I1", | sf_package="MS_STREAMING", | sf_i="MS_STREAMING_2", | vm_instance ="MPEG_DASH_Server2" |
+| \<measurement label> | location='DC1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', |
-Also NOTE: the metrics provided in the measurements below are effectively a 'snapshot' of usage over a relatively small period of time. The length of this snapshot may vary, depending on the underlying implementation of the instrumentation, so we might have to assume this snapshot is close to 'instant'.
+#### Specific context
+| tag |
+| --- |
+| surrogate_id='MS_STREAM_1_SURROGATE_1' |
-### USAGE: Monitor Service Function Instance Surrogate (VMs)
-#### Specific context: CPU usage
-| measurement | \<common tags> | field | field | timestamp |
-| --- | --- | --- | --- | --- |
-| vm_host_cpu_usage | \<common tags> | cpu_usage=40, | cpu_usage_system=5 | 1515583926868000000 |
+#### Configurations
-##### Specific context: RAM usage
-| measurement | \<common tags> | field | field | timestamp |
-| --- | --- | --- | --- | --- | --- |
-| vm_host_ram_usage | \<common tags> | ram_usage=1400, | ram_usage_system=512, | 1515583926868000000 |
+__QUESTION__: Do we only allow a 1-to-1 mapping between Media Service SF Instances and Surrogates w.r.t. configurations (i.e: I asked for 2 CPUs, I got 2 CPUs). If yes, we could cut some of the fields below.
-##### Specific context: storage usage
-| measurement | \<common tags> | field | field | timestamp |
-| --- | --- | --- | --- | --- | --- | --- |
-| vm_host_storage_usage | \<common tags> | disk_free=8144, | swap_size=1576 | 1515583926868000000 |
-
-##### Specific context: network usage
+| field | field | field | field | timestamp |
+| --- | --- | --- | --- | --- |
+| state='booted', | cpus=2, | memory=4, | storage=8 | 1515583926868000000 |
-> Not sure about this one. For example: Bytes over port_M - is this a total byte count that increments over time
-> or is it a snapshot shot of throughput on port M over the course of, say, 1 second?
-| measurement | \<common tags> | field | field | timestamp |
-| --- | --- | --- | --- | --- | --- |
-| vm_host_network_usage | \<common tags> | rx_bytes_port_M=5242880, | tx_bytes_port_M=15728640 | 1515583926868000000 |
+## MONITORING
+#### Common context: Usage and Performance
+All of the specific context measurements below carry the following common context (this has not be replicated for brevity) for both usage and performance measurements. In this example, we illustrate using two surrogate VMs.
+#TODO: Label as averages
-### PERFORMANCE: Monitor Service Function Instance Surrogate (VMs)
-Note all metrics described below are assumed to be *averages* over 1 second.
+| measurement | tag | tag | tag | tag | tag | tag | tag |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| \<measurement>, | location='DC1', | comp_node_id='c1', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', | surrogate_id ='MS_STREAM_1_SURROGATE_1' |
+| \<measurement>, | location='DC2', | comp_node_id='c2', | sfc='Scenario1_Template', | sfc_i='Scenario1_Instance_I1', | sf='MS_STREAMING', | sf_i='MS_STREAMING_1', | surrogate_id ='MS_STREAM_1_SURROGATE_2' |
-#### Specific context: CPU performance
-| measurement | \<common tags> | field | timestamp |
-| --- | --- | --- | --- | --- |
-| vm_host_cpu_perf | \<common tags> | idle_seconds=0.5 | 1515583926868000000 |
+Also NOTE: the metrics provided in the measurements below are effectively a 'snapshot' of usage over a relatively small period of time. The length of this snapshot may vary, depending on the underlying implementation of the instrumentation, so we might have to assume this snapshot is essentially an average of a period of 1 second. Measuring 'usage' is dependent on the units, for example as a proportion of a resource or as a proportion of time.
-#### Specific context: RAM performance
-| measurement | \<common tags> | field | timestamp |
+### Monitoring values: Monitor Service Function Instance Surrogate (VMs)
+#### Monitoring values: CPU (Telegraf system metrics)
+| measurement | \<common tags> | field | field | timestamp |
 | --- | --- | --- | --- | --- |
-| vm_host_ram_perf | \<common tags> | swap_memory_used | 1515583926868000000 |
+| cpu | \<common tags> | cpu_time_user=40, | cpu_time_idle=5 | 1515583926868000000 |
-#### Specific context: storage performance
+#### Monitoring values: RAM (Telegraf system metrics)
 | measurement | \<common tags> | field | field | timestamp |
-| --- | --- | --- | --- | --- |
-| vm_host_storage_perf | \<common tags> | read_bytes=4194304, | write_bytes=2097152 | 1515583926868000000 |
+| --- | --- | --- | --- | --- | --- |
+| mem | \<common tags> | free=880, | total=2048, | 1515583926868000000 |
-#### Specific context: network performance
-| measurement | \<common tags> | field | field | field | field | timestamp |
+#### Monitoring values: Storage (Telegraf system metrics)
+| measurement | \<common tags> | field | field | timestamp |
 | --- | --- | --- | --- | --- | --- | --- |
-| vm_host_network_perf | \<common tags> | rx_bytes_port_M=4194, | tx_bytes_port_M=2097, | packets_dropped=198, | packets_error=5 | 1515583926868000000 |
+| vm_host_storage_usage | \<common tags> | disk_free=8144, | swap_size=1576, read_bytes=4194304, | write_bytes=2097152 | 1515583926868000000 |
+#### Monitoring values: Network (FLIPS network metrics)
+__ Can we measure network usage for a specific VM from FLIPS monitoring? __
-### USAGE: Service Function Instance Surrogate Service usage
+| measurement | \<common tags> | field | field | timestamp |
+| --- | --- | --- | --- | --- | --- |
+| vm_host_network_usage | \<common tags> | rx_bytes_port_M=5242880, | tx_bytes_port_M=15728640 | packets_dropped=198, | packets_error=5 | 1515583926868000000 |
-#### Common context
-| measurement | tag | tag | tag | tag | tag | tag |
-| --- | --- | --- | --- | --- | --- | --- |
-| \<measurement label> | location="DC1", | sfc="Scenario1_Template", | sfc_i="Scenario1_Template_I1", | sf_package="MS_STREAMING", | sf_i="MS_STREAMING_1", | vm_instance ="MPEG_DASH_Server1" |
+#### Monitoring values: Surrogates
+
+QUESTIONS
+1. Is the content navigation tag and fully qualified domain name (SDN based)? [Most likely: yes]
-#### Request (WIP)
 | measurement | \<common tags> | tag | timestamp |
 | --- | --- | --- | --- | --- | --- | --- |
-| mpegdash_service_request | \<common tags> | cont_nav='http://netflix.com/scream' | 1515583926868000000 |
-
-#### Response (WIP)
-
-
+| mpegdash_service_mon | \<common tags> | cont_nav='http://netflix.com/scream' | cont_rep='h264' | req_rate=10, | 'avg_resp_time=40, | peak_resp_time=230, | avg_error_rate=0.2, | avg_throughput=200, | \<userProfileField>=\<value> | quality_delivered='5', startup_delay='1200', dropped_segments='2', 1515583926868000000 |
-### PERFORMANCE: Service Function Instance Surrogate VM Service performance (KPIs)
+| surrogate_route_mon | ... | location='DC1', cont_nav='http://netflix.com/scream', http_requests_fqdn_rate=386, avg_network_fqdn_latency=50, |