  1. Data Collection, Analytics, and Events
  2. DCAEGEN2-2863

PRH holds too many connections to CBS and causes a system crash


    • Type: Bug
    • Resolution: Done
    • Priority: High
    • Istanbul Release
    • Guilin Release, Honolulu Release
    • None

      The PRH (PNF Registration Handler) microservice periodically fetches its configuration from CBS (Config Binding Service).
      However, it does not close its connections to CBS, so old sessions accumulate without bound.
      Eventually the PRH/CBS pods exhaust all available file descriptors up to the system limit, and other pods on the same node crash with 'connection refused', 'connection timed out', or 'too many open files' errors.
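The failure mode described above is a periodic fetch that never releases its connection. A minimal sketch of the safe pattern follows, in Python for illustration only; it is not the PRH code (PRH is a Java service), and the function name fetch_config and the connect factory are hypothetical. The point is that the connection is closed in a finally block on every poll, whether the fetch succeeds or fails.

```python
import http.client
import json
from typing import Callable


def fetch_config(connect: Callable[[], http.client.HTTPConnection], path: str) -> dict:
    """Fetch a JSON configuration once and always release the connection.

    `connect` is a hypothetical factory that opens a fresh connection;
    it is injected so the polling loop owns the connection's lifetime.
    """
    conn = connect()
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"unexpected HTTP status {response.status}")
        return json.loads(body)
    finally:
        # Without this close, every poll would leave one socket
        # (one file descriptor on each side) open indefinitely.
        conn.close()
```

A caller would invoke this once per polling interval, for example with a factory such as lambda: http.client.HTTPConnection(host, port, timeout=5), where host, port, and the request path are placeholders for the deployment's actual CBS address.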

      This issue occurs when ONAP has been running for a long time.
      How long it takes depends on the settings: by default the polling interval to CBS is 5 minutes and Ubuntu's file descriptor limit is 65535. If PRH and CBS are running on the same node, the issue can appear in 2 to 3 months.
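The 2 to 3 month estimate can be cross-checked against the reproduction figures reported in this issue (instability after about 35 hours at a 5 second interval). The sketch below assumes that each poll leaks exactly one connection, which is an assumption made for the arithmetic and not something the issue states; the function names are illustrative.

```python
from datetime import timedelta


def polls_during(duration: timedelta, interval: timedelta) -> int:
    """Number of polls in a period; assumed equal to leaked connections."""
    return duration // interval


def time_to_leak(connections: int, interval: timedelta) -> timedelta:
    """Time needed to leak the given number of connections at one per poll."""
    return connections * interval


# Reproduction setup: 5 s interval, node unstable after about 35 hours.
leaked = polls_during(timedelta(hours=35), timedelta(seconds=5))
# Same number of leaked connections at the default 5 min interval.
default_eta = time_to_leak(leaked, timedelta(minutes=5))

print(leaked)                           # 25200
print(default_eta / timedelta(days=1))  # 87.5
```

Under that assumption, 35 hours at 5 seconds corresponds to 25,200 leaked connections, which is in line with the roughly 25k sessions in the attached netstat output, and the same count at the default 5 minute interval takes 87.5 days, or just under 3 months.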

        cbs-netstat.txt.zip: active connections as listed by the netstat command. About 25k sessions were held open by PRH/CBS.
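A count like the one above can be taken from saved netstat output with a short script. The following is a sketch that assumes the usual Linux netstat -an column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the sample lines and the port number 10000 are made up for illustration and are not taken from the attachment.

```python
from collections import Counter


def count_by_state(netstat_output: str, port: int) -> Counter:
    """Count TCP connections per state where either endpoint uses `port`."""
    counts: Counter = Counter()
    wanted = str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP sockets.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        # The port is whatever follows the last colon (works for IPv4 and IPv6).
        if wanted in (local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]):
            counts[state] += 1
    return counts


# Illustrative sample only; addresses and ports are invented.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.42.0.15:10000        10.42.0.21:41234        ESTABLISHED
tcp        0      0 10.42.0.15:10000        10.42.0.21:41236        ESTABLISHED
tcp        0      0 10.42.0.15:10000        10.42.0.21:41240        CLOSE_WAIT
tcp        0      0 10.42.0.15:8443         10.42.0.30:51000        ESTABLISHED
tcp        0      0 0.0.0.0:10000           0.0.0.0:*               LISTEN
"""

for state, n in sorted(count_by_state(SAMPLE, 10000).items()):
    print(state, n)
```

Running the same function periodically on fresh netstat output and watching the ESTABLISHED count grow by one per polling interval would be one way to confirm the leak well before the file descriptor limit is reached.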

      Steps to reproduce

      1. To reproduce the issue faster, change the property updates-interval from 5m to 5s in bootstrap.yaml in prh-app-server-1.5.4.jar.
      2. Start the CBS and PRH pods on the same node (for example, by using nodeSelector).
      3. After about 35 hours, the node becomes unstable because system resources are exhausted, and some pods on the same node crash with errors and stop working.

            deen1985
            s4fujii