Active and Available Inventory / AAI-3390

Enhanced AAI test failing while all pods are Running after storage backend incident


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • Montreal Release
    • Honolulu Release

      In a Honolulu-based ONAP deployment, we had a major outage of the storage backend (Ceph) that lasted a couple of hours.

      Since the outage was resolved, A&AI appears to have some issues.

      All the pods look OK:

        onap-aai-5f5b79578f-9dl9w 1/1 Running 0 14d
        onap-aai-babel-6b5b9d68db-8gg7x 2/2 Running 0 14d
        onap-aai-graphadmin-76f54749fd-qm444 2/2 Running 0 42m
        onap-aai-graphadmin-create-db-schema-w9f8g 0/1 Completed 0 14d
        onap-aai-modelloader-6cd5f9c76-zhdwb 2/2 Running 0 14d
        onap-aai-resources-5bcbb56745-fth5k 2/2 Running 0 21h
        onap-aai-schema-service-c8c9bfb59-dgbdh 2/2 Running 0 14d
        onap-aai-sparky-be-66767c86ff-bck8k 2/2 Running 0 14d
        onap-aai-traversal-85b8bd9994-xs22k 2/2 Running 0 14d
        onap-aai-traversal-update-query-data-c6kk6 0/1 Completed 0 14d
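
As a cross-check of the listing above, here is a minimal Python sketch that parses the pod lines and reports which pods are not fully ready and which were recreated recently. The helper names (`parse_pods`, `not_ready`, `recent`) and the one-day threshold are illustrative assumptions, not part of any ONAP or Kubernetes tooling; the column layout (name, ready, status, restarts, age) is assumed from the listing itself.

```python
import re

# Pod listing copied from this report (columns: NAME READY STATUS RESTARTS AGE).
POD_LISTING = """\
onap-aai-5f5b79578f-9dl9w 1/1 Running 0 14d
onap-aai-babel-6b5b9d68db-8gg7x 2/2 Running 0 14d
onap-aai-graphadmin-76f54749fd-qm444 2/2 Running 0 42m
onap-aai-graphadmin-create-db-schema-w9f8g 0/1 Completed 0 14d
onap-aai-modelloader-6cd5f9c76-zhdwb 2/2 Running 0 14d
onap-aai-resources-5bcbb56745-fth5k 2/2 Running 0 21h
onap-aai-schema-service-c8c9bfb59-dgbdh 2/2 Running 0 14d
onap-aai-sparky-be-66767c86ff-bck8k 2/2 Running 0 14d
onap-aai-traversal-85b8bd9994-xs22k 2/2 Running 0 14d
onap-aai-traversal-update-query-data-c6kk6 0/1 Completed 0 14d
"""

_AGE_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def age_to_seconds(age):
    """Convert an age such as '42m', '21h' or '2d3h' to seconds."""
    return sum(int(n) * _AGE_UNITS[u] for n, u in re.findall(r"(\d+)([smhd])", age))


def parse_pods(listing):
    """Parse whitespace-separated pod lines into dictionaries."""
    pods = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        name, ready, status, restarts, age = fields
        ready_now, ready_total = (int(x) for x in ready.split("/"))
        pods.append({
            "name": name,
            "ready": ready_now,
            "total": ready_total,
            "status": status,
            "restarts": int(restarts),
            "age_s": age_to_seconds(age),
        })
    return pods


def not_ready(pods):
    """Names of pods that are neither fully ready nor completed jobs."""
    return [p["name"] for p in pods
            if p["status"] != "Completed" and p["ready"] != p["total"]]


def recent(pods, max_age="1d"):
    """Names of pods created within max_age (hypothetical threshold)."""
    limit = age_to_seconds(max_age)
    return [p["name"] for p in pods if p["age_s"] <= limit]


if __name__ == "__main__":
    pods = parse_pods(POD_LISTING)
    print("not ready:", not_ready(pods))
    print("created in the last day:", recent(pods))
```

On this listing nothing is reported as not ready, which matches the observation that the pod status alone does not reveal the problem; only onap-aai-graphadmin and onap-aai-resources were recreated within the last day.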

       

      However, the Enhanced A&AI Health Check now fails systematically:

      Basic A&AI Health Check | PASS |
      ------------------------------------------------------------------------------
      Enhanced A&AI Health Check | FAIL |
      500 != 201
      ------------------------------------------------------------------------------

      I attached the test and AAI logs to this Jira issue.

       

      I tried the following, without success:

      • restarting onap-aai-graphadmin
      • restarting Cassandra

       

      In the onap-aai-graphadmin pod, I can see the following exception:

       

      2021-10-05 13:04:28.912 WARN 1 --- [pool-9-thread-1] o.j.diskstorage.log.kcvs.KCVSLog : Could not read messages for timestamp [2021-10-05T12:31:00.074Z] (this read will be retried)
      org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
      at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
      at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:159)
      at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:732)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT4S
      at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
      at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
      ... 9 common frames omitted
      Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
      at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$null$2(CQLKeyColumnValueStore.java:126)
      at io.vavr.API$Match$Case0.apply(API.java:3174)
      at io.vavr.API$Match.of(API.java:3137)
      at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$3(CQLKeyColumnValueStore.java:123)
      at io.vavr.control.Try.getOrElseThrow(Try.java:671)
      at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:290)
      at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:798)
      at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:795)
      at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:148)
      at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:162)
      at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
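
To make the trace above easier to read, here is a minimal Python sketch that pulls out the exception chain (the top-level exception plus each "Caused by:" line) from a Java stack trace. The `exception_chain` helper and its regular expression are illustrative assumptions; the sample text is an abbreviated copy of the trace in this report, with most frames omitted.

```python
import re

# Abbreviated copy of the trace from this report (most "at ..." frames omitted).
TRACE = """\
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT4S
at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$null$2(CQLKeyColumnValueStore.java:126)
at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:290)
"""

# Matches "<fully.qualified.SomeException>: message", optionally prefixed by "Caused by: ".
EXC_LINE = re.compile(r"^(?:Caused by: )?([\w.$]+(?:Exception|Error)): (.*)$")


def exception_chain(trace):
    """Return [(exception class, message), ...] from outermost to innermost."""
    chain = []
    for line in trace.splitlines():
        match = EXC_LINE.match(line.strip())
        if match:
            chain.append((match.group(1), match.group(2)))
    return chain


if __name__ == "__main__":
    for depth, (cls, msg) in enumerate(exception_chain(TRACE)):
        print("  " * depth + cls.rsplit(".", 1)[-1] + ": " + msg)
```

The innermost cause visible here is "Temporary failure in storage backend", raised from CQLKeyColumnValueStore.getSlice, that is, from a read through JanusGraph's CQL storage layer. The trace in this report is cut off after that point, so a deeper driver-level cause may exist in the full logs.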
      

       

       

      I looked at the architecture documentation at https://docs.onap.org/projects/onap-aai-aai-common/en/honolulu/platform/architecture.html

      but did not find a solution there.

      How can I troubleshoot this issue?

       

            Assignee: Unassigned
            Reporter: mrichomme