Uploaded image for project: 'Application Authorization Framework'
  1. Application Authorization Framework
  2. AAF-722

aaf continues to be available to aai-resources even though aaf database appears to be down

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Medium Medium
    • Dublin Release
    • Casablanca Release
    • None
    • AAF Dublin M1 01/11-01/31, AAF Dublin M2 02/01-02/21

      While debugging and re-testing AAI-2082, found that the aai-resources REST calls started to return "Unauthorized" error on GET calls, even though the GET calls were just working a second ago:

      < HTTP/1.1 401 Unauthorized
      < Connection: close
      < Date: Tue, 22 Jan 2019 03:24:05 GMT
      * Authentication problem. Ignoring this.
      < WWW-Authenticate: Basic realm="people.osaaf.org"
      < Strict-Transport-Security: max-age=16000000; includeSubDomains; preload;
      <
       

      The aai-resources container logs showed the transition from "authenticated" to "invalid":

      docker logs `docker ps --filter name=k8s_aai-resources --quiet` > log-dockerlog-resources.txt
      
      2019-01-21T17:30:30.393+0000 AUDIT [cadi] ID AAI converted to aai@aai.onap.org
      2019-01-21T17:30:30.393+0000 INFO [cadi] Authenticated: aai@aai.onap.org authenticated by cached AAF password from 172.16.0.140:42812
      2019-01-21T17:30:30.408+0000 WARN [cadi] Trans: user=aai@aai.onap.org[BAth],ip=172.16.0.140,ms=14.684558,validate=0.151018,code=14.532035
      2019-01-21T17:30:30.465+0000 AUDIT [cadi] ID AAI converted to aai@aai.onap.org
      2019-01-21T17:30:30.492+0000 INFO [cadi] Authenticating via redirection:  user/pass combo invalid via AAF from 172.16.0.140 for aai@aai.onap.org from 172.16.0.140:42812
      2019-01-21T17:30:30.493+0000 WARN [cadi] Trans: user=n/a[],ip=172.16.0.140,ms=27.228724,validate=27.197535,code=0.000000
      2019-01-21T17:30:30.578+0000 AUDIT [cadi] ID AAI converted to aai@aai.onap.org
      2019-01-21T17:30:30.582+0000 INFO [cadi] Authenticating via redirection:  user/pass combo invalid via AAF from 172.16.0.140 for aai@aai.onap.org from 172.16.0.140:42812
      2019-01-21T17:30:30.582+0000 WARN [cadi] Trans: user=n/a[],ip=172.16.0.140,ms=4.122453,validate=4.113385,code=0.000000
      2019-01-21T17:30:30.773+0000 AUDIT [cadi] ID AAI converted to aai@aai.onap.org
      2019-01-21T17:30:30.773+0000 INFO [cadi] Authenticating via redirection:  User/Pass Retry limit exceeded for aai@aai.onap.org from 172.16.0.140:42812
      2019-01-21T17:30:30.773+0000 WARN [cadi] Trans: user=n/a[],ip=172.16.0.140,ms=0.297568,validate=0.291654,code=0.000000

      Thereafter, every GET call to aai-resources returned the "Unauthorized" error.

      Checking the pods with kubectl showed no particular recent changes:

      root@kubectl:~# kubectl -n onap get pods | grep aa
      dev-aaf-cm-bbfddb884-2c4h9                              1/1       Running            0          31d
      dev-aaf-cs-5fb47b9747-bjfbv                             0/1       CrashLoopBackOff   162        31d
      dev-aaf-fs-845884fcf9-kfshf                             1/1       Running            0          31d
      dev-aaf-gui-5dbd7479b9-mr2jl                            1/1       Running            0          31d
      dev-aaf-hello-64db6d658-24bbq                           1/1       Running            0          31d
      dev-aaf-locate-586b8b8bbc-r98xg                         1/1       Running            0          31d
      dev-aaf-oauth-5c8b4c5f87-grhx5                          1/1       Running            0          31d
      dev-aaf-service-7db77c4d47-p9lrg                        1/1       Running            0          31d
      dev-aaf-sms-5d8765bcb-8c6nm                             1/1       Running            0          31d
      dev-aaf-sms-preload-bcgqs                               0/1       Completed          0          26d
      dev-aaf-sms-quorumclient-0                              1/1       Running            0          31d
      dev-aaf-sms-quorumclient-1                              1/1       Running            0          31d
      dev-aaf-sms-quorumclient-2                              1/1       Running            0          31d
      dev-aaf-sms-vault-0                                     2/2       Running            1          29d
      dev-aaf-sshsm-distcenter-phhqz                          0/1       Completed          0          31d
      dev-aaf-sshsm-testca-vsjls                              0/1       Completed          0          31d
      dev-aai-5f58b6bb4f-sj4pj                                1/1       Running            0          31d
      dev-aai-babel-5767fc855c-zxkmr                          2/2       Running            0          31d
      dev-aai-cassandra-0                                     1/1       Running            0          5d
      dev-aai-cassandra-1                                     1/1       Running            14         5d
      dev-aai-cassandra-2                                     1/1       Running            0          5d
      dev-aai-champ-f48fbb795-8bt9c                           2/2       Running            0          31d
      dev-aai-data-router-7c549f657-tt5tv                     2/2       Running            0          31d
      dev-aai-elasticsearch-67cfb8dfc5-p77z9                  1/1       Running            0          31d
      dev-aai-gizmo-6c9465c596-97f4x                          2/2       Running            0          31d
      dev-aai-graphadmin-7747f78755-458gs                     2/2       Running            0          31d
      dev-aai-graphadmin-create-db-schema-4mhj9               0/1       Completed          0          31d
      dev-aai-graphadmin-create-db-schema-b9496               0/1       Init:Error         0          31d
      dev-aai-modelloader-9fdb4df5c-p9wfb                     2/2       Running            0          31d
      dev-aai-resources-d5465675d-msjlf                       2/2       Running            0          31d
      dev-aai-search-data-bcfb67bdc-74pn8                     2/2       Running            0          31d
      dev-aai-sparky-be-6b7d5b5dc4-dl44d                      2/2       Running            0          31d
      dev-aai-spike-56f8768d9c-mwm62                          2/2       Running            0          31d
      dev-aai-traversal-bdf87969c-pkzn9                       2/2       Running            0          31d
      dev-aai-traversal-update-query-data-zsxnm               0/1       Completed          0          31d
      dev-dmaap-bus-controller-6777548d7c-m7j6z               1/1       Running            0          31d
      dev-dmaap-dr-db-ff8cfdc9c-564qh                         1/1       Running            0          27d
      dev-dmaap-dr-node-57f8f8bf74-7q7tb                      1/1       Running            0          31d
      dev-dmaap-dr-prov-5658f66b4c-wvk87                      1/1       Running            0          31d
      dev-pomba-aaictxbuilder-7c5b94fd9d-cq9w6                2/2       Running            0          31d
      dev-sdnc-dmaap-listener-d9448dbf8-t9xv4                 1/1       Running            0          31d
       

      Checking the logs of aaf-service showed:

      root@kubectl:~# kubectl -n onap logs dev-aaf-service-7db77c4d47-p9lrg > logs-dev-aaf-service.txt
      
      2019-01-21 17:30:30,473 WARN [service] 2019-01-21T17:30:30.473+0000 WARN [service] aai@aai.onap.org failed password validation, ip=n/a: Error - com.datastax.driver.core.exceptions.NoHostAvailableException-All host(s) tried for query failed (no host was tried) executing SELECT ID,TYPE,EXPIRES,OTHER,NS,NOTES,CRED FROM CRED WHERE ID = ?
      2019-01-21 17:30:30,473 WARN [service] 2019-01-21T17:30:30.473+0000 WARN [service] user=aai@aai.onap.org[x509],ip=172.16.0.49,port=56090,ms=0.660414,meth=GET,path=/authn/basicAuth
      2019-01-21 17:30:30,581 ERROR [service] 2019-01-21T17:30:30.581+0000 ERROR [service]
      2019-01-21 17:30:30,581 WARN [service] 2019-01-21T17:30:30.581+0000 WARN [service] aai@aai.onap.org failed password validation, ip=n/a: Error - com.datastax.driver.core.exceptions.NoHostAvailableException-All host(s) tried for query failed (no host was tried) executing SELECT ID,TYPE,EXPIRES,OTHER,NS,NOTES,CRED FROM CRED WHERE ID = ?
      2019-01-21 17:30:30,581 WARN [service] 2019-01-21T17:30:30.581+0000 WARN [service] user=aai@aai.onap.org[x509],ip=172.16.0.49,port=56100,ms=0.527703,meth=GET,path=/authn/basicAuth
       

      So apparently the AAF database is down, but the authentication API continued to respond to calls. Apparently this condition was not picked up by healthchecks.

      We are trying to do a workaround by restarting AAF pods, but presumably this kind of corrective action should be automated by kubernetes, assuming it detects the service is down.

      After a restart of AAF pods, found this error in the logs:

      root@kubectl:~# kubectl -n onap logs dev-aaf-cs-5fb47b9747-bjfbv > logs-aaf-cs.txt
      
      INFO  [main] 2019-01-22 05:28:42,415 CommitLogReader.java:105 - Skipping playback of empty log: CommitLog-6-1548134219713.log
      INFO  [main] 2019-01-22 05:28:42,417 CommitLogReader.java:105 - Skipping playback of empty log: CommitLog-6-1548134562587.log
      Waiting for Cassandra to start... Sleep 10
      ERROR [main] 2019-01-22 05:28:53,542 JVMStabilityInspector.java:102 - Exiting due to error while processing commit log during initialization.
      org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Mutation checksum failure at 18796429 in Next section at 18796429 in CommitLog-6-1547613282570.log
              at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:394) [apache-cassandra-3.11.3.jar:3.11.3]
              at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:251) [apache-cassandra-3.11.3.jar:3.11.3]
              at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:132) [apache-cassandra-3.11.3.jar:3.11.3]
              at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:137) [apache-cassandra-3.11.3.jar:3.11.3]
              at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:177) [apache-cassandra-3.11.3.jar:3.11.3]
              at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:158) [apache-cassandra-3.11.3.jar:3.11.3]
              at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:324) [apache-cassandra-3.11.3.jar:3.11.3]
              at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:602) [apache-cassandra-3.11.3.jar:3.11.3]
              at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) [apache-cassandra-3.11.3.jar:3.11.3]
      Waiting for Cassandra to start... Sleep 10
      

       

            ak1761 ak1761
            keong keong
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved:

                Estimated:
                Original Estimate - 0 minutes
                0m
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 1 day
                1d