Logging analytics / LOG-380

Platform Maturity: Performance, Stability, Resiliency, Scalability

    Description

      https://wiki.onap.org/display/DW/S3P
      Performance optimization of the ELK stack in OOM
      https://lists.onap.org/pipermail/onap-discuss/2018-May/009822.html
      See LOG-181 (DaemonSets), LOG-376 (temporary ReplicaSet:3)

      An idle ONAP deployment pushes 30+ logs/sec (1+ Gb/day) into the ELK stack; we are working through optimizing the stack.
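As a back-of-envelope check on the idle ingest figures, the sketch below derives events per day and the implied average event size. It assumes "1+ Gb/day" means roughly one decimal gigabyte (10^9 bytes); the ticket does not state the unit precisely, so the size figure is only indicative.

```python
"""Back-of-envelope check of the idle ingest figures (illustrative only)."""

LOGS_PER_SEC = 30.0          # from the ticket: "30+ logs/sec"
DAILY_VOLUME_BYTES = 1e9     # ASSUMPTION: "1+ Gb/day" read as 1 decimal gigabyte
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(logs_per_sec: float) -> float:
    """Number of log events ingested per day at a constant rate."""
    return logs_per_sec * SECONDS_PER_DAY


def avg_event_size_bytes(daily_bytes: float, logs_per_sec: float) -> float:
    """Average size of one log event implied by the daily volume."""
    return daily_bytes / events_per_day(logs_per_sec)


if __name__ == "__main__":
    print(f"events/day: {events_per_day(LOGS_PER_SEC):,.0f}")
    print(f"avg event size: {avg_event_size_bytes(DAILY_VOLUME_BYTES, LOGS_PER_SEC):.0f} bytes")
```

Since both source figures are lower bounds ("30+", "1+"), the result is a rough order of magnitude rather than a measurement.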

      • Logstash was saturating a single VM in the cluster until a temporary fix raised the ReplicaSet count from 1 to 3. A change to a DaemonSet (1 container per VM) is under test/review; we will discuss it
      • Logstash issue - https://jira.onap.org/browse/LOG-376
      • S3P ELK epic - https://jira.onap.org/browse/LOG-258
      • A root cause analysis (RCA) will be attempted for both the southbound filebeat push of logs into logstash and the northbound pull from elasticsearch during indexing
      • The 30 logs/sec load causes periodic peaks of 7 vCores
      • Full resource optimization across all of ONAP is being started by Mike E. under https://jira.onap.org/browse/OOM-927; we will work with those changes
      • Resource requirements for most components in OOM are left at defaults or commented out; in this session we will look at the ELK containers
      • Tuning CPU/RAM for ELK alone may be problematic unless all ONAP components are tuned together (and prioritized); in this session we will cover only the 3 log containers
      • If we have time we will also cover:
        • Use of a load balancer serviceType; we are taking it on faith that the current service distributes load properly across the DaemonSet
        • GC heap usage, if it is an issue
        • Elasticsearch shard settings (beyond defaults)
        • Oscillation behavior (container stopping/starting) under a forced 2g or 2-core limit
        • Determine the sweet spot for horizontal clustering of es and ls
        • Determine the effect of cpu/ram resource limits on other pods in particular vms
        • Elasticsearch message broker usage, if es is overloading ls
        • Future: Elasticsearch as a service for clamp/aai/log
        • Check backpressure setting against filebeat
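To make the horizontal-scaling question in the list above concrete, here is a minimal sketch comparing per-instance load for the layouts mentioned (single replica, ReplicaSet:3, DaemonSet). It rests on several assumptions that the ticket does not confirm: load is spread evenly across instances (the ticket itself notes this is taken on faith), the 7 vCore peak scales linearly with instance count, and `NODE_COUNT` is a hypothetical cluster size.

```python
"""Rough per-instance load under different Logstash layouts (illustrative only)."""

TOTAL_LOGS_PER_SEC = 30.0   # from the ticket: idle deployment rate
PEAK_VCORES = 7.0           # from the ticket: periodic peak
NODE_COUNT = 4              # HYPOTHETICAL: number of VMs in the cluster


def per_instance(total: float, instances: int) -> float:
    """Split a total evenly across instances (ASSUMES perfect load balancing)."""
    if instances < 1:
        raise ValueError("instances must be >= 1")
    return total / instances


def compare_layouts(logs_per_sec: float, peak_vcores: float, node_count: int) -> dict:
    """Return the per-instance rate and peak CPU for each layout."""
    layouts = {
        "ReplicaSet:1": 1,          # original deployment
        "ReplicaSet:3": 3,          # temporary fix (LOG-376)
        "DaemonSet": node_count,    # one container per VM (LOG-181)
    }
    return {
        name: {
            "logs_per_sec": per_instance(logs_per_sec, n),
            "peak_vcores": per_instance(peak_vcores, n),
        }
        for name, n in layouts.items()
    }


if __name__ == "__main__":
    results = compare_layouts(TOTAL_LOGS_PER_SEC, PEAK_VCORES, NODE_COUNT)
    for name, load in results.items():
        print(f"{name:14s} {load['logs_per_sec']:6.1f} logs/sec  "
              f"{load['peak_vcores']:4.2f} vCores peak")
```

If the service does not balance evenly, the busiest instance will sit above these averages, which is one reason to measure per-pod load before settling on limits.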

      Shane: hold off on processing in ES; add processing incrementally to identify the 7-core bottleneck.
      For the RCA: back off on heartbeat error logs coming from cluster logs in components; prioritize real ONAP logs, not infrastructure logs.
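The idea of separating infrastructure heartbeat noise from real ONAP logs during the RCA could be prototyped offline along these lines. This is only a sketch: the `HEARTBEAT_PATTERN` regex and the sample lines are hypothetical, and the real heartbeat messages would need to be identified from the actual component logs.

```python
"""Split log lines into application vs. heartbeat noise (illustrative only)."""
import re
from typing import Iterable, List, Tuple

# HYPOTHETICAL pattern: the real heartbeat error format is not given in the ticket.
HEARTBEAT_PATTERN = re.compile(r"heartbeat", re.IGNORECASE)


def split_logs(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (application_lines, heartbeat_lines), preserving input order."""
    application: List[str] = []
    heartbeat: List[str] = []
    for line in lines:
        if HEARTBEAT_PATTERN.search(line):
            heartbeat.append(line)
        else:
            application.append(line)
    return application, heartbeat


if __name__ == "__main__":
    sample = [  # made-up lines for demonstration
        "ERROR cluster heartbeat timeout on node-2",
        "INFO request handled in 12 ms",
        "WARN Heartbeat failed, retrying",
    ]
    app, noise = split_logs(sample)
    print(f"application: {len(app)}, heartbeat noise: {len(noise)}")
```

Counting each group over a captured sample would show how much of the 30 logs/sec is infrastructure chatter before any pipeline filter is changed.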

            People

              pau2882 Prudence Au
              michaelobrien Michael O'Brien