  1. Data Collection, Analytics, and Events
  2. DCAEGEN2-928

Cloudify Manager image created with dirty pgsql


    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • Casablanca Release
    • Casablanca Release
    • None

      We have been encountering a problem when deploying ONAP using OOM on an OpenStack cloud. The dev-dcae-bootstrap job attempts to bring up the dev-dcae-cloudify-manager pod.

      The pod never passes its TCP socket liveness probe, so it restarts continuously and the job never completes.
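For readers unfamiliar with this probe type, the check can be approximated as follows. This is only a sketch of the idea (succeed if a TCP connection can be opened within a timeout), not the kubelet's actual implementation. The function name `tcp_probe` and the host and port in the usage lines are hypothetical, since the issue does not say which port the probe targets.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts and refusals) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical target: replace with the address and port the probe actually checks.
    print(tcp_probe("127.0.0.1", 80))
```

While nothing is listening on the probed port, a check like this fails on every attempt, which is the behaviour that leads to the restarts described above.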

      Increasing the liveness probe time to a much larger value keeps the pod alive long enough to inspect the Cloudify Manager logs. The PostgreSQL log shows that the database was not shut down cleanly before the image was created, so it runs automatic recovery on every start. In the log below, about 39 seconds pass between the first entry and "ready to accept connections". With the normal liveness probe time, the pod is restarted before that recovery can complete.

       

      $ kubectl -n onap exec -it dev-dcae-cloudify-manager-59898df56c-28z59 -- bash
      # more /var/lib/pgsql/9.5/data/pg_log/postgresql-Wed.log
      < 2018-10-31 07:11:24.624 UTC >LOG:  database system was interrupted; last known up at 2018-03-11 11:37:27 UTC
      < 2018-10-31 07:11:25.511 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:26.514 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:27.516 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:28.518 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:29.519 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:30.525 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:31.527 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:32.529 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:33.531 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:34.533 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:35.534 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:36.537 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:37.539 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:38.540 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:39.543 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:40.545 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:41.547 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:42.549 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:43.551 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:44.554 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:45.555 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:46.557 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:47.559 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:48.560 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:49.563 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:50.565 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:51.567 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:52.569 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:53.570 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:54.575 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:55.577 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:56.579 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:57.581 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:58.583 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:11:59.585 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:12:00.070 UTC >LOG:  database system was not properly shut down; automatic recovery in progress
      < 2018-10-31 07:12:00.084 UTC >LOG:  redo starts at 0/1715F50
      < 2018-10-31 07:12:00.181 UTC >LOG:  invalid record length at 0/1960AF8
      < 2018-10-31 07:12:00.181 UTC >LOG:  redo done at 0/1960AD0
      < 2018-10-31 07:12:00.181 UTC >LOG:  last completed transaction was at log time 2018-03-11 11:39:45.243944+00
      < 2018-10-31 07:12:00.587 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:12:01.589 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:12:02.591 UTC >FATAL:  the database system is starting up
      < 2018-10-31 07:12:03.428 UTC >LOG:  MultiXact member wraparound protections are now enabled
      < 2018-10-31 07:12:03.432 UTC >LOG:  database system is ready to accept connections
      < 2018-10-31 07:12:03.433 UTC >LOG:  autovacuum launcher started
      
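To put a number on the startup delay in a log like the one above, the timestamps can be compared programmatically. The following is a minimal sketch that assumes the `< timestamp UTC >LEVEL:  message` line format shown here. The names `LINE_RE` and `startup_seconds` are illustrative, and `sample` is an abbreviated copy of the log lines above.

```python
import re
from datetime import datetime
from typing import Optional

# Matches lines such as:
# < 2018-10-31 07:11:24.624 UTC >LOG:  database system was interrupted; ...
LINE_RE = re.compile(
    r"<\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC\s*>(\w+):\s+(.*)"
)


def startup_seconds(log_text: str) -> Optional[float]:
    """Seconds from the first log entry to 'ready to accept connections'."""
    start = ready = None
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip blank or non-matching lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if start is None:
            start = stamp  # first timestamped entry marks the start
        if "ready to accept connections" in match.group(3):
            ready = stamp
            break
    if start is None or ready is None:
        return None  # the log never reached the ready state
    return (ready - start).total_seconds()


sample = """\
< 2018-10-31 07:11:24.624 UTC >LOG:  database system was interrupted; last known up at 2018-03-11 11:37:27 UTC
< 2018-10-31 07:11:25.511 UTC >FATAL:  the database system is starting up
< 2018-10-31 07:12:00.070 UTC >LOG:  database system was not properly shut down; automatic recovery in progress
< 2018-10-31 07:12:03.432 UTC >LOG:  database system is ready to accept connections
"""

print(startup_seconds(sample))  # 38.808
```

Any liveness probe configuration that gives up before this interval has elapsed will restart the container before PostgreSQL is ready.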

      Can the image be recreated with the database fault corrected?

       

            jackl Jack Lucas
            camoo camoo