Logging analytics / LOG-380

Platform Maturity: Performance, Stability, Resiliency, Scalability



      Performance optimization of the ELK stack in OOM
      See LOG-181 (DaemonSets) and LOG-376 (temporary ReplicaSet of 3 replicas).

      An idle ONAP deployment pushes 30+ logs/sec (1+ Gb/day) into the ELK stack, so we are working through optimizing the stack.
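      As a rough sanity check on these figures, the sketch below converts the quoted event rate into a daily event count and the average event size it implies. It assumes the daily volume is about 1 GB (10^9 bytes) and that the rate is steady over 24 hours; since both quoted numbers are lower bounds ("30+", "1+"), the result is only an order-of-magnitude estimate, and the function names are illustrative.

```python
# Back-of-envelope check of the ELK ingest volume quoted above.
# Assumption: the daily volume is read as ~1 GB (10**9 bytes) and the
# 30 logs/sec rate is treated as constant over a full day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_log_count(logs_per_sec: float) -> float:
    """Number of log events produced in one day at a steady rate."""
    return logs_per_sec * SECONDS_PER_DAY


def implied_avg_event_bytes(bytes_per_day: float, logs_per_sec: float) -> float:
    """Average event size implied by a daily byte volume and an event rate."""
    return bytes_per_day / daily_log_count(logs_per_sec)


if __name__ == "__main__":
    events = daily_log_count(30)
    avg = implied_avg_event_bytes(1e9, 30)
    print(f"{events:,.0f} events/day, ~{avg:.0f} bytes/event")
```

      At 30 events/sec this gives roughly 2.6 million events per day, or a few hundred bytes per event at 1 GB/day, which is a plausible size for a structured log line.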

      • Logstash was saturating a single VM of the cluster before we put in a temporary fix that raised the ReplicaSet from 1 to 3 replicas; a change that replaces this with a DaemonSet (one container per VM) is under test/review, and we will discuss it
      • Logstash issue - https://jira.onap.org/browse/LOG-376
      • S3P ELK epic - https://jira.onap.org/browse/LOG-258
      • A root cause analysis (RCA) will be attempted for both the southbound Filebeat push of logs into Logstash and the northbound pull from Elasticsearch during indexing
      • The 30 logs/sec load causes periodic peaks of 7 vCores
      • Mike E. is starting the resource optimization of all of ONAP under https://jira.onap.org/browse/OOM-927; we will work with those changes
      • Resource requirements for most OOM components are left at their defaults or commented out; in this session we will look at the ELK containers
      • Fine-tuning CPU/RAM for ELK alone may be problematic unless all ONAP components are tuned together (prioritized); in this session we will tune only the 3 log containers
      • If time permits, we will also cover:
          • Use of a LoadBalancer serviceType: for now we are taking on faith that the current service distributes load evenly across the DaemonSet pods
          • GC heap usage, if it turns out to be an issue
          • Elasticsearch shard settings (beyond the defaults)
          • Oscillation behavior when a forced 2 GB memory or 2-core CPU limit causes the container to stop and restart
          • Determine the sweet spot for horizontal clustering of Elasticsearch (ES) and Logstash (LS)
          • Determine the effect of CPU/RAM resource limits on other pods running on the same VMs
          • Use of a message broker with Elasticsearch, if ES is overloading LS
          • Future: Elasticsearch as a service for CLAMP/AAI/LOG
          • Check the backpressure settings against Filebeat
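      To make the DaemonSet and resource-limit items concrete, the sketch below builds a Kubernetes DaemonSet manifest for Logstash as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. Everything specific here is an assumption, not the OOM chart: the `onap` namespace, the `app: logstash` label, port 5044, the image reference, and the request values are hypothetical. The 2-core / ~2 GB limits only echo the forced-limit experiment listed above. The manifest assumes a cluster that serves `apps/v1` DaemonSets.

```python
import json


def logstash_daemonset(image: str,
                       namespace: str = "onap",          # assumed namespace
                       cpu_request: str = "500m",        # hypothetical request
                       mem_request: str = "1Gi",         # hypothetical request
                       cpu_limit: str = "2",             # 2 cores, per the limit experiment
                       mem_limit: str = "2Gi") -> dict:  # ~2 GB, per the limit experiment
    """Build a Kubernetes DaemonSet manifest that runs one Logstash pod per node."""
    labels = {"app": "logstash"}  # hypothetical label; must match the selector
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "logstash", "namespace": namespace, "labels": labels},
        "spec": {
            # No "replicas" field: a DaemonSet schedules one pod per eligible node.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "logstash",
                        "image": image,
                        # 5044 is the conventional Beats input port (assumed here).
                        "ports": [{"containerPort": 5044}],
                        "resources": {
                            "requests": {"cpu": cpu_request, "memory": mem_request},
                            "limits": {"cpu": cpu_limit, "memory": mem_limit},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; substitute the image OOM actually deploys.
    manifest = logstash_daemonset("docker.elastic.co/logstash/logstash:<version>")
    print(json.dumps(manifest, indent=2))
```

      Redirecting the output to a file (for example `logstash-ds.json`) and running `kubectl apply -f logstash-ds.json` would create the DaemonSet. Setting requests below limits lets the scheduler place pods on busy VMs while capping their peak usage. Note that exceeding a CPU limit only throttles the container, whereas exceeding a memory limit gets it OOM-killed and restarted, which is one way the stop/start oscillation above could arise.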

      Shane: hold off on processing in ES, and add processing back incrementally to identify the 7-core bottleneck.
      For the RCA, back off on heartbeat error logs coming from the cluster logs in components; prioritize real ONAP logs over infrastructure logs.
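      The triage policy in this comment (drop infrastructure heartbeat noise, keep real ONAP logs first) can be sketched as a small filter. This is only an illustration of the policy, not the pipeline's code: the `source` tag with values `"onap"`/`"infra"` and the keyword match on "heartbeat" are hypothetical. In the actual stack the equivalent change would more likely live in the Filebeat or Logstash configuration (for example Filebeat's `exclude_lines` or a Logstash `drop` filter), which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LogEvent:
    source: str   # hypothetical tag: "onap" (component log) or "infra" (cluster log)
    message: str


def is_infra_heartbeat(event: LogEvent) -> bool:
    """True for heartbeat noise coming from infrastructure/cluster logs."""
    return event.source == "infra" and "heartbeat" in event.message.lower()


def prioritize(events: Iterable[LogEvent]) -> List[LogEvent]:
    """Drop infra heartbeat noise and move ONAP component logs to the front.

    sorted() is stable, so events keep their original order within each group.
    """
    kept = [e for e in events if not is_infra_heartbeat(e)]
    return sorted(kept, key=lambda e: e.source != "onap")


if __name__ == "__main__":
    sample = [
        LogEvent("infra", "Heartbeat error from peer"),
        LogEvent("onap", "AAI request handled"),
        LogEvent("infra", "disk usage 70%"),
        LogEvent("onap", "SO workflow started"),
    ]
    for e in prioritize(sample):
        print(e.source, e.message)
```

      Only heartbeat messages from infrastructure sources are dropped; other infrastructure logs are kept but moved behind the ONAP component logs.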

      People: pau2882, michaelobrien