Uploaded image for project: 'Data Collection, Analytics, and Events'
  1. Data Collection, Analytics, and Events
  2. DCAEGEN2-3277

DCAE-Hearthbeat pod only healthy, if MR has not been started

XMLWordPrintable

      The HB service starts the worker process (hb_worker_process), if the process "htbtworker" has finished.
      Only, then the "hb_worker_process" is up, the HB pod supervision is showing a successful result and the deployment shows "healthy" (readiness check"

      The "htbtworker" process tries periodically to connect to the Kafka Topic "unauthenticated.SEC_HEARTBEAT_OUTPUT", which usually not exists.
      Therefor the "htbtworker" will not finish and the "hb_worker_process" is blocked and the pod readiness check fails.
      In usual ONAP deployment scenarios the message-router is not running, when the HB service is started, which leads to an exception in the "htbtworker" and the "hb_worker_process" is launched:

      2022-09-30 05:13:29,931 | __main__ | db_monitoring | <module> | 235 |  INFO | DBM: DBM Process started
      2022-09-30 05:13:29,932 | __main__ | db_monitoring | <module> | 240 |  INFO | ('DBM:Parent process ID and json file name', '6', '../etc/config.json')
      2022-09-30 05:13:48,984 | __main__ | htbtworker | process_msg | 79 |  INFO | ('\n\nHBT:eventnameList values ', ['Heartbeat_vDNS', 'Heartbeat_vFW', 'Heartbeat_xx'])
      2022-09-30 05:13:48,985 | __main__ | htbtworker | process_msg | 85 |  INFO | HBT:Getting :http://message-router:3904/events/unauthenticated.SEC_HEARTBEAT_OUTPUT/hbgrpID/1?timeout=15000
      2022-09-30 05:13:49,015 | urllib3.connectionpool | connectionpool | _new_conn | 226 |  DEBUG | Starting new HTTP connection (1): message-router:3904
      2022-09-30 05:13:49,985 | __main__ | db_monitoring | db_monitoring | 161 |  INFO | DBM: Active DB Monitoring Instance
      Traceback (most recent call last):
        File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 159, in _new_conn
          conn = connection.create_connection(
        File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 84, in create_connection
          raise err
        File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 74, in create_connection
          sock.connect(sa)
      ConnectionRefusedError: [Errno 111] Connection refusedDuring handling of the above exception, another exception occurred:
      ...
      requests.exceptions.ConnectionError: HTTPConnectionPool(host='message-router', port=3904): Max retries exceeded with url: /events/unauthenticated.SEC_HEARTBEAT_OUTPUT/hbgrpID/1?timeout=15000 (Caused by New
      ConnectionError('<urllib3.connection.HTTPConnection object at 0x7f737a282460>: Failed to establish a new connection: [Errno 111] Connection refused'))
      2022-09-30 05:13:50,123 | __main__ | misshtbtd | hb_worker_process | 202 |  INFO | MSHBT:Creaated Heartbeat worker process
      2022-09-30 05:13:54,625 | __main__ | misshtbtd | main | 359 |  INFO | ('MSHBT: hb_common values ', 6, 'RUNNING', 'dcae-heartbeat-', 1664514809)
      2022-09-30 05:13:54,626 | __main__ | misshtbtd | main | 363 |  INFO | ('MSHBD:pid,srcName,state,time,ctime,timeDiff is', 6, 'dcae-heartbeat-', 'RUNNING', 1664514809, 1664514835, 26)
      2022-09-30 05:13:54,627 | __main__ | misshtbtd | main | 379 |  INFO | ('MSHBD:config status is', 'RUNNING')
       

      So only due to this exception finally the hb_worker_process is started:

      2022-09-30 05:13:50,123 | __main__ | misshtbtd | hb_worker_process | 202 |  INFO | MSHBT:Creaated Heartbeat worker process  

      In the "normal" case the "htbtworker" would run endlessly until the topic is created (e.g. in the ServiceMesh environment) and the "hb_worker_process" will never be started.

      2022-09-28 10:49:08,399 | __main__ | htbtworker | <module> | 258 |  INFO | HBT:HeartBeat thread Created
      2022-09-28 10:49:08,399 | __main__ | htbtworker | <module> | 260 |  INFO | ('HBT:The config file name passed is -%s', '../etc/config.json')
      2022-09-28 10:49:09,011 | __main__ | misshtbtd | create_process | 327 |  INFO | ('MSHBD:jobs list is', [<Process name='Process-2' pid=13 parent=6 started>, <Process name='Process-3' pid=15 parent=6 started
      >])
      2022-09-28 10:49:09,129 | __main__ | misshtbtd | create_update_hb_common | 121 |  INFO | MSHBT:Updated  hb_common DB with new values
      2022-09-28 10:49:09,510 | __main__ | db_monitoring | <module> | 235 |  INFO | DBM: DBM Process started
      2022-09-28 10:49:09,511 | __main__ | db_monitoring | <module> | 240 |  INFO | ('DBM:Parent process ID and json file name', '6', '../etc/config.json')
      2022-09-28 10:49:28,461 | __main__ | htbtworker | process_msg | 79 |  INFO | ('\n\nHBT:eventnameList values ', ['Heartbeat_vDNS', 'Heartbeat_vFW', 'Heartbeat_xx'])
      2022-09-28 10:49:28,462 | __main__ | htbtworker | process_msg | 85 |  INFO | HBT:Getting :http://message-router:3904/events/unauthenticated.SEC_HEARTBEAT_OUTPUT/hbgrpID/1?timeout=15000
      2022-09-28 10:49:28,485 | urllib3.connectionpool | connectionpool | _new_conn | 226 |  DEBUG | Starting new HTTP connection (1): message-router:3904
      2022-09-28 10:49:28,562 | urllib3.connectionpool | connectionpool | _make_request | 433 |  DEBUG | http://message-router:3904 "GET /events/unauthenticated.SEC_HEARTBEAT_OUTPUT/hbgrpID/1?timeout=15000 HTTP/
      1.1" 404 142
      2022-09-28 10:49:28,563 | __main__ | htbtworker | process_msg | 100 |  INFO | ('HBT:', '{"mrstatus":3001,"helpURL":"http://onap.readthedocs.io","message":"No such topic exists.-[unauthenticated.SEC_HEARTBE
      AT_OUTPUT]","status":404}')
      2022-09-28 10:49:29,583 | __main__ | db_monitoring | db_monitoring | 161 |  INFO | DBM: Active DB Monitoring Instance
      2022-09-28 10:49:34,210 | __main__ | misshtbtd | main | 359 |  INFO | ('MSHBT: hb_common values ', 6, 'RUNNING', 'dcae-heartbeat-', 1664362149)
      2022-09-28 10:49:34,211 | __main__ | misshtbtd | main | 363 |  INFO | ('MSHBD:pid,srcName,state,time,ctime,timeDiff is', 6, 'dcae-heartbeat-', 'RUNNING', 1664362149, 1664362174, 25)
      2022-09-28 10:49:34,215 | __main__ | misshtbtd | main | 379 |  INFO | ('MSHBD:config status is', 'RUNNING')
      2022-09-28 10:49:34,339 | __main__ | misshtbtd | create_update_hb_common | 121 |  INFO | MSHBT:Updated  hb_common DB with new values
      2022-09-28 10:49:48,624 | __main__ | htbtworker | process_msg | 79 |  INFO | ('\n\nHBT:eventnameList values ', ['Heartbeat_vDNS', 'Heartbeat_vFW', 'Heartbeat_xx'])
      2022-09-28 10:49:48,624 | __main__ | htbtworker | process_msg | 85 |  INFO | HBT:Getting :http://message-router:3904/events/unauthenticated.SEC_HEARTBEAT_OUTPUT/hbgrpID/1?timeout=15000
      2022-09-28 10:49:48,637 | urllib3.connectionpool | connectionpool | _new_conn | 226 |  DEBUG | Starting new HTTP connection (1): message-router:3904
       

       

            vv770d vv770d
            andreasgeissler Andreas Geissler
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: