Bug
Resolution: Unresolved
Medium
Honolulu Release
Internal Honolulu ONAP
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.14+rke2r1", GitCommit:"0fd2b5afdfe3134d6e1531365fdb37dd11f54d1c", GitTreeState:"clean", BuildDate:"2021-08-20T21:41:21Z", GoVersion:"go1.15.14b5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.14+rke2r1", GitCommit:"0fd2b5afdfe3134d6e1531365fdb37dd11f54d1c", GitTreeState:"clean", BuildDate:"2021-08-20T21:41:21Z", GoVersion:"go1.15.14b5", Compiler:"gc", Platform:"linux/amd64"}
version.BuildInfo{Version:"v3.5.3", GitCommit:"041ce5a2c17a58be0fcd5f5e16fb3e7e95fea622", GitTreeState:"dirty", GoVersion:"go1.15.8"}
aai.tar.gz
In a Honolulu-based ONAP deployment, we had a major outage of the storage backend (Ceph) that lasted a couple of hours.
Since the outage was resolved, A&AI appears to have issues.
All the pods look OK:
onap-aai-5f5b79578f-9dl9w                   1/1  Running    0  14d
onap-aai-babel-6b5b9d68db-8gg7x             2/2  Running    0  14d
onap-aai-graphadmin-76f54749fd-qm444        2/2  Running    0  42m
onap-aai-graphadmin-create-db-schema-w9f8g  0/1  Completed  0  14d
onap-aai-modelloader-6cd5f9c76-zhdwb        2/2  Running    0  14d
onap-aai-resources-5bcbb56745-fth5k         2/2  Running    0  21h
onap-aai-schema-service-c8c9bfb59-dgbdh     2/2  Running    0  14d
onap-aai-sparky-be-66767c86ff-bck8k         2/2  Running    0  14d
onap-aai-traversal-85b8bd9994-xs22k         2/2  Running    0  14d
onap-aai-traversal-update-query-data-c6kk6  0/1  Completed  0  14d
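For anyone who wants to repeat this check without reading the listing by eye, here is a minimal Python sketch. It assumes the listing is `kubectl get pods` output whose first three columns are the pod name, the ready count, and the status (the header row is not shown in this report, so that column order is an assumption). The function name `unhealthy_pods` is hypothetical.

```python
def unhealthy_pods(listing):
    """Return the names of pods that are neither fully ready nor completed."""
    bad = []
    for line in listing.splitlines():
        # Tolerate a leading "- " bullet left over from copy/paste.
        fields = line.lstrip("- ").split()
        if len(fields) < 3 or "/" not in fields[1]:
            continue  # skip blank lines, headers, and malformed rows
        name, ready, status = fields[0], fields[1], fields[2]
        if status == "Completed":
            continue  # finished jobs legitimately report 0/1
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            bad.append(name)
    return bad


sample = """\
onap-aai-graphadmin-76f54749fd-qm444 2/2 Running 0 42m
onap-aai-graphadmin-create-db-schema-w9f8g 0/1 Completed 0 14d
"""
print(unhealthy_pods(sample))
```

An empty result only means the readiness probes pass. As this report shows, it does not rule out application-level failures.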
The Enhanced health check test now fails systematically:
Basic A&AI Health Check                                               | PASS |
------------------------------------------------------------------------------
Enhanced A&AI Health Check                                            | FAIL |
500 != 201
------------------------------------------------------------------------------
I attached the test and the AAI logs to this Jira ticket.
I tried the following, without success:
- restart of onap-aai-graphadmin
- restart of cassandra
In the onap-aai-graphadmin pod, I can see the following exception:
2021-10-05 13:04:28.912 WARN 1 --- [pool-9-thread-1] o.j.diskstorage.log.kcvs.KCVSLog : Could not read messages for timestamp [2021-10-05T12:31:00.074Z] (this read will be retried)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:159)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:732)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully complete backend operation due to repeated temporary exceptions after PT4S
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:55)
    ... 9 common frames omitted
Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$null$2(CQLKeyColumnValueStore.java:126)
    at io.vavr.API$Match$Case0.apply(API.java:3174)
    at io.vavr.API$Match.of(API.java:3137)
    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.lambda$static$3(CQLKeyColumnValueStore.java:123)
    at io.vavr.control.Try.getOrElseThrow(Try.java:671)
    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:290)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:798)
    at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:795)
    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:148)
    at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:162)
    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:69)
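When a trace nests several exceptions like this one, it can help to list only the "Caused by" entries and read the last one first. The Python sketch below is one rough way to do that; the helper name `cause_chain` is hypothetical, and the regular expression assumes the usual Java "Caused by: <class>: <message>" layout, with the message ending at the first stack frame or at the end of the line.

```python
import re

# "Caused by: <exception class>: <message>", where the message stops at the
# first " at <frame>(" or at the end of the line, whichever comes first.
CAUSED_BY = re.compile(
    r"Caused by: ([\w.$]+): (.*?)(?=\s+at\s+[\w.$]+\(|\s*$)",
    re.MULTILINE,
)


def cause_chain(trace):
    """Return (exception class, message) pairs, outermost cause first."""
    return [(m.group(1), m.group(2)) for m in CAUSED_BY.finditer(trace)]


# Abbreviated copy of the trace from this report.
trace = (
    "org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception\n"
    "    at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)\n"
    "Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Could not successfully "
    "complete backend operation due to repeated temporary exceptions after PT4S\n"
    "    at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:101)\n"
    "Caused by: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend\n"
    "    at org.janusgraph.diskstorage.cql.CQLKeyColumnValueStore.getSlice(CQLKeyColumnValueStore.java:290)\n"
)

for exc_class, message in cause_chain(trace):
    print(exc_class, "->", message)
```

For the trace in this report, the last entry is the TemporaryBackendException raised from CQLKeyColumnValueStore. The trace as pasted may be cut off, so a deeper cause could exist in the full graphadmin log.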
I had a look at https://docs.onap.org/projects/onap-aai-aai-common/en/honolulu/platform/architecture.html
but did not find a solution there.
How can I troubleshoot this issue?