  Active and Available Inventory / AAI-2474

AAF healthcheck passes but AAI receives 403 error during authentication and certificate_unknown


    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • Affects Version/s: Casablanca Release, Dublin Release
    • Fix Version/s: El Alto Preliminary

      In this environment, the symptom of the problem was that a GET to an AAI-Resources URL started to return a 403 Forbidden "Access Denied" error where it had previously worked as expected. In fact, none of the AAI calls were working at all.

      curl --verbose --silent --insecure --user AAI:AAI --header 'Accept: application/json' --header 'X-TransactionId: testaai' --header 'Content-Type: application/json' --header 'X-FromAppId: AAI' https://${AAI_IP}:30233/aai/v14/network/collections
      
      < HTTP/1.1 403 Forbidden^M
      < Connection: close^M
      < Date: Tue, 04 Jun 2019 07:17:22 GMT^M
      < Content-Type: application/json;charset=utf-8^M
      < Strict-Transport-Security: max-age=16000000; includeSubDomains; preload;^M
      < ^M
      { [data not shown]
      * Closing connection 0
      {
        "timestamp": 1559632642839,
        "status": 403,
        "error": "Forbidden",
        "message": "Access Denied",
        "path": "/aai/v14/network/collections"
      }
      
      

      Further investigation showed that the healthchecks were passing for both AAF and AAI, but there were errors in the logs for pods:

      • dev-aaf-aaf-sms
      • dev-aaf-aaf-sms-quorumclient-0
      • dev-aaf-aaf-sms-quorumclient-1
      • dev-aaf-aaf-sms-quorumclient-2

      The logs show errors from an earlier date, but no logs at all for recent dates (it is not clear whether this means it is currently working OK or not).
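
      One way to confirm whether anything has been logged recently is kubectl's --since flag (a sketch; empty output would mean nothing was logged in that window):

      # Show only log entries from the last 24 hours for the quorum client pod listed above
      kubectl -n onap logs dev-aaf-aaf-sms-quorumclient-0 --since=24h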

      onap-hk@kubectl:~$ kubectl -n onap logs dev-aaf-aaf-sms-quorumclient-0 | less
      INFO: 2019/02/25 11:15:00 quorumclient.go:77: Starting Log for Quorum Client
      ERROR: 2019/02/25 11:15:00 auth.go:226: Read from file: open auth/dev-aaf-aaf-sms-quorumclient-0/id: no such file or directory
      ERROR: 2019/02/25 11:15:00 quorumclient.go:86: Read ID: open auth/dev-aaf-aaf-sms-quorumclient-0/id: no such file or directory
      WARNING: 2019/02/25 11:15:00 quorumclient.go:87: Unable to find an ID for this client. Generating...
      ERROR: 2019/02/25 11:15:00 auth.go:226: Read from file: open auth/dev-aaf-aaf-sms-quorumclient-0/shard: no such file or directory
      ERROR: 2019/02/25 11:15:00 quorumclient.go:99: Read Shard: open auth/dev-aaf-aaf-sms-quorumclient-0/shard: no such file or directory
      WARNING: 2019/02/25 11:15:00 quorumclient.go:100: Unable to find a shard file. Registering with SMS...
      ERROR: 2019/02/25 11:15:00 auth.go:226: Read from file: open auth/dev-aaf-aaf-sms-quorumclient-0/prkey: no such file or directory
      ERROR: 2019/02/25 11:15:00 quorumclient.go:40: LoadPGP Private Key: open auth/dev-aaf-aaf-sms-quorumclient-0/prkey: no such file or directory
      INFO: 2019/02/25 11:15:00 quorumclient.go:41: No Private Key found. Generating...
      ERROR: 2019/02/25 11:17:18 quorumclient.go:176: Connect to SMS: Get https://aaf-sms.onap:10443/v1/sms/quorum/status: dial tcp 10.247.37.16:10443: connect: connection timed out
      ERROR: 2019/02/25 11:19:25 quorumclient.go:176: Connect to SMS: Get https://aaf-sms.onap:10443/v1/sms/quorum/status: dial tcp 10.247.37.16:10443: connect: connection timed out
      ERROR: 2019/02/25 11:21:32 quorumclient.go:176: Connect to SMS: Get https://aaf-sms.onap:10443/v1/sms/quorum/status: dial tcp 10.247.37.16:10443: connect: connection timed out
      ERROR: 2019/02/25 11:23:39 quorumclient.go:176: Connect to SMS: Get https://aaf-sms.onap:10443/v1/sms/quorum/status: dial tcp 10.247.37.16:10443: connect: connection timed out
      ERROR: 2019/02/25 11:25:47 quorumclient.go:176: Connect to SMS: Get https://aaf-sms.onap:10443/v1/sms/quorum/status: dial tcp 10.247.37.16:10443: connect: connection timed out
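
      The repeated timeouts suggest the quorum client could not reach the SMS service at all at that time. A quick connectivity check against the same endpoint from inside the cluster (a sketch; the throwaway pod name and curl image are illustrative, not part of this deployment):

      # One-off pod in the onap namespace hitting the SMS quorum status URL from the log above
      kubectl -n onap run sms-check --rm -it --restart=Never --image=curlimages/curl -- \
        curl -k -sS https://aaf-sms.onap:10443/v1/sms/quorum/status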
      

      The logs for AAI-Resources seemed to show a problem with aaf_locate.

      onap-hk@kubectl:~$ kubectl -n onap logs dev-aai-aai-resources-7c8b66fbf4-pkplj aai-resources | less
      2019-06-04T07:01:54.453+0000: Error connecting https://aaf-locate.onap:8095/locate/AAF_NS.service:2.0 for location.
      org.onap.aaf.misc.env.APIException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
              at org.onap.aaf.cadi.http.HClient.send(HClient.java:153)
              at org.onap.aaf.cadi.aaf.v2_0.AAFLocator.refresh(AAFLocator.java:109)
      
      2019-06-04T07:01:54.454+0000 WARN [cadi] Authenticating Service unavailable from 172.16.0.72:35486 ( No Entries found for 'https://aaf-locate.onap:8095/locate/AAF_NS.service:2.0' )
      2019-06-04T07:01:54.454+0000 AUDIT [cadi] No TAF will authorize for request from 172.16.0.72:35486
      

      There was a message on onap-discuss about certificates expiring on 31st May, but it is not clear whether that is also the case here:
      https://lists.onap.org/g/onap-discuss/topic/31911771
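
      If certificate expiry is suspected, the dates on the certificate presented by aaf-locate can be checked directly (a sketch; run from somewhere that can resolve aaf-locate.onap, e.g. from a pod inside the cluster):

      # Print the validity dates of the certificate served on the aaf-locate port seen in the logs above
      echo | openssl s_client -connect aaf-locate.onap:8095 -servername aaf-locate.onap 2>/dev/null \
        | openssl x509 -noout -dates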

      The pod and container for aaf_locate were still running.

      onap-hk@kubectl:~$ kubectl -n onap get pods |  grep aaf-locate
      dev-aaf-aaf-locate-55cf75bbc8-4m7gd                          1/1       Running            0          99d
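
      "Running" by itself does not prove the service is usable, so it is also worth confirming that the aaf-locate Service still has this pod as a ready endpoint (a sketch, assuming the Service is named aaf-locate, as the aaf-locate.onap DNS name suggests):

      # Check that the Service exists and that it has at least one ready endpoint behind it
      kubectl -n onap get svc aaf-locate
      kubectl -n onap get endpoints aaf-locate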
      

      The aaf-locate logs only show the same registration endpoint call repeated every few minutes:

      onap-hk@kubectl:~$ kubectl -n onap logs dev-aaf-aaf-locate-55cf75bbc8-4m7gd | less
      
      2019-06-04 07:05:10,385 WARN [service] 2019-06-04T07:05:10.385+0000 WARN [service] user=aaf@aaf.osaaf.org[x509],ip=172.16.0.49,port=59432,ms=0.311474,meth=PUT,path=/registration,msg="Put Mgmt Endpoints"
      

      Describing the aai-haproxy pod shows that the kubelet has recorded several unhealthy events (readiness probe failures):

      $ kubectl -n onap describe pod dev-aai-aai-5f48cc456-pwscq
      Name:               dev-aai-aai-5f48cc456-pwscq
      Namespace:          onap
      Priority:           0
      PriorityClassName:  <none>
      Node:               192.168.1.200/192.168.1.200
      Start Time:         Mon, 25 Feb 2019 11:13:42 +0800
      Labels:             app=aai
                          pod-template-hash=190477012
                          release=dev-aai
      Annotations:        checksum/config=edc35b32bd1b7973d32b6ed47481fe22c7596a491b213ea093f0bf9021c7daf6
                          kubernetes.io/availablezone=ap-southeast-1b
      Status:             Running
      IP:                 172.16.0.72
      Controlled By:      ReplicaSet/dev-aai-aai-5f48cc456
      Init Containers:
        aai-readiness:
          Container ID:  docker://ba7d49090e2db4172d07c3348d5ddc4e82c5060a29bcce590db2e812dcf2e041
          Image:         oomk8s/readiness-check:2.0.0
          Image ID:      docker-pullable://oomk8s/readiness-check@sha256:7daa08b81954360a1111d03364febcb3dcfeb723bcc12ce3eb3ed3e53f2323ed
          Port:          <none>
          Host Port:     <none>
          Command:
            /root/ready.py
          Args:
            --container-name
            aai-resources
            --container-name
            aai-traversal
            --container-name
            aai-graphadmin
          State:          Terminated
            Reason:       Completed
            Exit Code:    0
            Started:      Mon, 25 Feb 2019 11:47:48 +0800
            Finished:     Mon, 25 Feb 2019 11:53:15 +0800
          Ready:          True
          Restart Count:  3
          Environment:
            NAMESPACE:  onap (v1:metadata.namespace)
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from default-token-zgjkm (ro)
      Containers:
        aai:
          Container ID:   docker://2d5a500a10c133388e842755782d2001adf3096bf3b22c491fb566f8edf5c92c
          Image:          docker.io/aaionap/haproxy:1.2.4
          Image ID:       docker-pullable://aaionap/haproxy@sha256:002d2cc8519fdeb082943f9bd5cf5c4bd4f489199e6c3e9697fda46259522d3e
          Ports:          8080/TCP, 8443/TCP
          Host Ports:     0/TCP, 0/TCP
          State:          Running
            Started:      Mon, 25 Feb 2019 11:54:01 +0800
          Ready:          True
          Restart Count:  0
          Liveness:       tcp-socket :8443 delay=10s timeout=1s period=10s #success=1 #failure=3
          Readiness:      http-get https://:8443/aai/util/echo delay=10s timeout=1s period=10s #success=1 #failure=3
          Environment:    <none>
          Mounts:
            /dev/log from aai-service-log (rw)
            /etc/localtime from localtime (ro)
            /usr/local/etc/haproxy/haproxy.cfg from haproxy-cfg (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from default-token-zgjkm (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             True
        ContainersReady   True
        PodScheduled      True
      Volumes:
        localtime:
          Type:          HostPath (bare host directory volume)
          Path:          /etc/localtime
          HostPathType:
        aai-service-log:
          Type:          HostPath (bare host directory volume)
          Path:          /dev/log
          HostPathType:
        haproxy-cfg:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      aai-deployment-configmap
          Optional:  false
        default-token-zgjkm:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  default-token-zgjkm
          Optional:    false
      QoS Class:       BestEffort
      Node-Selectors:  <none>
      Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                       node.kubernetes.io/unreachable:NoExecute for 300s
      Events:
        Type     Reason     Age                 From                    Message
        ----     ------     ----                ----                    -------
        Warning  Unhealthy  2m (x5 over 100d)   kubelet, 192.168.1.200  Readiness probe failed: Get https://172.16.0.72:8443/aai/util/echo: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
        Warning  Unhealthy  21s (x19 over 99d)  kubelet, 192.168.1.200  Readiness probe failed: HTTP probe failed with statuscode: 503
        Normal   Healthy    11s (x8 over 100d)  kubelet, 192.168.1.200  container docker://2d5a500a10c133388e842755782d2001adf3096bf3b22c491fb566f8edf5c92c in health status
      

      The system is definitely not healthy, but Kubernetes is not doing anything to recover it, i.e. no pods are being restarted, and the externally visible status of the pods/containers falsely claims that everything is running fine.
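
      The readiness endpoint that haproxy is probed on can also be exercised manually to confirm the mismatch (a sketch; the IP and path are taken from the pod description above, and the request must be made from somewhere that can reach the pod network):

      # Hit the same URL the readiness probe uses and print only the HTTP status code
      curl --insecure --silent --output /dev/null --write-out '%{http_code}\n' https://172.16.0.72:8443/aai/util/echo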

            Assignee: keong keong
            Reporter: keong keong
            Votes: 0
            Watchers: 5
