Loading...

XML

Word

Printable

Type: Epic
Resolution: Won't Do
Priority: Medium
Fix Version/s: Frankfurt Release
Affects Version/s: None
Labels:

Epic Name:
Platform Maturity: Performance, Stability, Resiliency, Scalability

https://wiki.onap.org/display/DW/S3P
+) Performance optimization of the ELK stack in OOM
https://lists.onap.org/pipermail/onap-discuss/2018-May/009822.html
See ~~LOG-181~~ (DaemonSets), ~~LOG-376~~ (temporary ReplicaSet:3)

An idle ONAP deployment pushes 30+ logs/sec (1+ Gb/day) into the ELK stack – we are going through optimizing the stack.

Logstash was saturating a single VM of a cluster before we put in a temp fix to up the replicaSet from 1 to 3 – there is a change under test/review for a DaemonSet (1 container/vm) – we will discuss this
Logstash issue - https://jira.onap.org/browse/LOG-376
S3P ELK epic - https://jira.onap.org/browse/LOG-258
A root-cause-analysis will be attempted both for the southbound filebeat push of logs into logstash and the northbound pull from elasticsearch during indexing
The 30 logs/sec causes periodic vCore peaks of 7
The full resource optimization of all of ONAP is being started by Mike E. under https://jira.onap.org/browse/OOM-927 to start - we will work with these changes
The resource requirements of most of the components in OOM are defaulted/commented – we will look at the ELK containers in this session
Fine tuning the CPU/RAM only for ELK may be problematic unless we tune all ONAP components together (prioritize) – in this session we will just do the 3 log containers
If we have time we will cover off
Use of a load balancer serviceType – we are taking on faith that the current service distributes load properly on the DaemonSet
GC heap usage – if it is an issue
Elasticsearch shard settings (beyond defaults)
Oscillation behavior under a forced 2g or 2core limit stop/starting container
Determine the sweet spot for horizontal clustering of es and ls
Determine the effect of cpu/ram resource limits on other pods in particular vms
Elasticsearch messagebroker usage – if es is overloading ls
Future: ElasticSearch as a service for clamp/aai/log
Check backpressure setting against filebeat

Shane: hold off on processing in ES, incrementally process to identify the 7 core bottleneck
for RCA - back off on heartbeat error logs coming from cluster logs in components - prioritize real onap logs - not infrastructure

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

20180913_s3p_req_0.png
455 kB
13/Sep/18 3:28 PM
20180913_s3p_req_1.png
563 kB
13/Sep/18 3:29 PM

blocks

LOG-376 Logstash full saturation of 8 cores with AAI deployed on one of the quad 8 vCore vms for 30 logs/sec - up replicaSet 1 to 3 or use DaemonSet

Closed

LOG-507 POMBA as RI/Demo of Casablanca Logging spec/library compliance - deployment/helm and runtime

Closed

is blocked by

LOG-154 Platform Maturity: Beijing required security badging procedure

Closed

LOG-992 RKE 0.16 / Docker 18.06 for ONAP installation - migrate from Rancher for Dublin - script support

Closed

LOG-258 S3P ELK stack performance and clustering

Closed

AAF-273 Cassandra pod running over 8G heap - or 10% of ONAP ram (for 135 other pods on 256G 4 node cluster)

Closed

LOG-651 LOG: address failing jenkins jobs

Closed

is duplicated by

LOG-258 S3P ELK stack performance and clustering

Closed

LOG-146 ELK stack performance testing under logging library

Closed

relates to

LOG-256 log-kibana deployment is timing out periodically

Closed

LOG-157 S3P Events, Metrics, Monitoring

Closed

LOG-366 OOM Volumetrics - docker image sizes

Closed

LOG-436 Log file rotation - to avoid saturating a particular VM

Closed

LOG-181 Use DaemonSets for logging - Logstash

Closed

LOG-494 Use Search Guard Community Edition for TLS REST encryption

Closed

LOG-876 S3P: Logging for Core Service/VNF state and transition model - Deutsche Telekom and Vodafone

Closed

INT-597 Review container images best practices - Logging team

Closed

OOM-346 Platform Resiliency (Recoverability, High-Availability, Geo-Diversity)

Closed

mentioned in: Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...

(2 is blocked by, 2 is duplicated by, 9 relates to, 5 mentioned in)

1.	Kubernetes log rotation strategy for /var/lib/docker/container/	Closed	pau2882
2.	Baseline standard log load	Closed	pau2882
3.	Kibana must wait for elasticsearch to pass readiness/liveness	Closed	pau2882

Assignee:: pau2882

Reporter:: michaelobrien

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Created:: 24/Apr/18 6:37 PM

Updated:: 12/Aug/23 3:50 AM

Resolved:: 09/Dec/21 4:22 PM

Details

Description

Attachments

Attachments

Issue Links

Sub-Tasks

Activity

People

Dates