  1. ONAP Operations Manager
  2. OOM-1547

ONAP consistent deployment of pods in dependency order - to avoid failed random deployments


Details

    Description

      20190108 work continues on the cd.sh patch in LOG-898 under https://gerrit.onap.org/r/#/c/75422

      ONAP 3.0.0-ONAP pods come up in random order (the Dublin order is being refactored/fixed). This causes contention for vCPU, HDD and network resources during pod startup, so some k8s jobs (sdnc, appc, so) time out before their dependencies or access to resources are fulfilled. For example, the SDNC ansible container takes up to 20 min to pull its docker image when running alongside other starting pods.

          Sequenced/tagged serial deployment of the 29 pods for consistency. Alain and I came up with this, and I like his tagged id approach. We used to deploy a bit like this in the A release, but without a HC or pod state check in between and without tag intelligence. Better is to use a dependency order; the first case is just to deploy alphabetically as we do now, but wait for each of the 29 pods to be up before starting the next.

       Temporary proposal from Yumtaax3425 and michaelobrien:

      • Bring up each pod in dependency order, for example AAI and AAF first, so that each pod can complete and pass its atomic healthcheck before the next pod is deployed (for phase 1, even bringing up the pods one by one in alphabetical order is OK).
      • If we have issues with a pod, fix the undeploy so that pv/pvc's are also purged; see OOM-1543.
      • In this way we should get a consistent deployment where 51 of 51 HC's pass on every deployment, rather than by chance as currently.
      • The timing workarounds, for example in https://git.onap.org/oom/tree/kubernetes/onap/resources/environments/public-cloud.yaml, are actually workarounds for the underlying issue of the random start order of the pods.
      • In some ways deployment may actually be faster (though not as fast as a docker preload before deploy) than allowing resource contention during the first 30 min of the deploy.
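
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the cd.sh patch itself: `deploy_component`, `component_ready`, `WAIT_TIMEOUT` and `WAIT_INTERVAL` are hypothetical names introduced here, the deploy hook is a placeholder for whatever command actually starts one component, and pod names are assumed to look like `<release>-<component>-...` (as in `onap-robot-robot-ddd948476-zgp8n`).

```shell
#!/usr/bin/env bash
# Sketch only: deploy components one at a time, waiting for each to be ready.

NAMESPACE="${NAMESPACE:-onap}"

# Count pods whose READY column (ready/total) is not fully ready.
# Reads single-namespace `kubectl get pods` output on stdin. The header line is
# skipped by the regex; Completed pods (finished jobs show 0/1) are ignored.
count_not_ready() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ && $3 != "Completed" {
         split($2, r, "/")
         if (r[1] != r[2]) n++
       }
       END { print n + 0 }'
}

# Hypothetical hook: start one component. Replace the body with the real
# deployment command for your environment.
deploy_component() {
  echo "deploy $1 (placeholder, nothing is deployed)"
}

# Succeeds when at least one pod of the component exists and all are ready.
# Assumes pod names contain "-<component>-".
component_ready() {
  local pods
  pods=$(kubectl get pods --namespace "$NAMESPACE" --no-headers | grep -- "-$1-")
  [ -n "$pods" ] && [ "$(printf '%s\n' "$pods" | count_not_ready)" -eq 0 ]
}

# Poll component_ready until it succeeds or WAIT_TIMEOUT seconds have passed.
wait_for_component() {
  local timeout="${WAIT_TIMEOUT:-1800}" interval="${WAIT_INTERVAL:-15}" waited=0
  until component_ready "$1"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Deploy the given components strictly in the order listed.
deploy_in_order() {
  local c
  for c in "$@"; do
    deploy_component "$c"
    wait_for_component "$c" || { echo "timed out waiting for $c" >&2; return 1; }
  done
}
```

With the hook filled in, a phase 1 run would be something like `deploy_in_order aaf aai ...` with the remaining components listed alphabetically or in dependency order. Comparing ready/total per pod avoids the false matches a plain `grep -E '0/|1/2'` can produce (for example `10/10`).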

       20181212 casablanca branch state for 3.0.0-ONAP (Azure 432g vm)

       

      ubuntu@a-cd-cas2:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' | wc -l
      43
      ubuntu@a-cd-cas2:~$ kubectl get pods --all-namespaces | grep -E '1/1|2/2' | wc -l
      217
      ubuntu@a-cd-cas2:~$ free
                    total        used        free      shared  buff/cache   available
      Mem:      445804668   150300876   180957764      261628   114546028   285549272
      Swap:             0           0           0
      ubuntu@a-cd-cas2:~$ df
      Filesystem     1K-blocks      Used Available Use% Mounted on
      udev           222891708         0 222891708   0% /dev
      tmpfs           44580468     70620  44509848   1% /run
      /dev/sda1      129029904 103296108  25717412  81% /
      tmpfs          222902332     40776 222861556   1% /dev/shm
      tmpfs               5120         0      5120   0% /run/lock
      tmpfs          222902332         0 222902332   0% /sys/fs/cgroup
      /dev/sdb1      891622252     73748 846233692   1% /mnt
      tmpfs           44580468         0  44580468   0% /run/user/1000
      ubuntu@a-cd-cas2:~/oom/kubernetes/robot$ ./ete-k8s.sh onap health
      ++ export NAMESPACE=onap
      ++ NAMESPACE=onap
      +++ kubectl --namespace onap get pods
      +++ sed 's/ .*//'
      +++ grep robot
      ++ POD=onap-robot-robot-ddd948476-zgp8n
      ++ TAGS='-i health'
      ++ ETEHOME=/var/opt/OpenECOMP_ETE
      +++ kubectl --namespace onap exec onap-robot-robot-ddd948476-zgp8n -- bash -c 'ls -1q /share/logs/ | wc -l'
      ++ export GLOBAL_BUILD_NUMBER=1
      ++ GLOBAL_BUILD_NUMBER=1
      +++ printf %04d 1
      ++ OUTPUT_FOLDER=0001_ete_health
      ++ DISPLAY_NUM=91
      ++ VARIABLEFILES='-V /share/config/vm_properties.py -V /share/config/integration_robot_properties.py -V /share/config/integration_preload_parameters.py'
      ++ VARIABLES='-v GLOBAL_BUILD_NUMBER:3332'
      ++ kubectl --namespace onap exec onap-robot-robot-ddd948476-zgp8n -- /var/opt/OpenECOMP_ETE/runTags.sh -V /share/config/vm_properties.py -V /share/config/integration_robot_properties.py -V /share/config/integration_preload_parameters.py -v GLOBAL_BUILD_NUMBER:3332 -d /share/logs/0001_ete_health -i health --display 91
      Starting Xvfb on display :91 with res 1280x1024x24
      Executing robot tests at log level TRACE
      ==============================================================================
      Testsuites                                                                    
      ==============================================================================
      Testsuites.Health-Check :: Testing ecomp components are available via calls. 
      ==============================================================================
      Basic A&AI Health Check                                               | PASS |
      ------------------------------------------------------------------------------
      Basic AAF Health Check                                                | PASS |
      ------------------------------------------------------------------------------
      Basic AAF SMS Health Check                                            | PASS |
      ------------------------------------------------------------------------------
      Basic APPC Health Check                                               | PASS |
      ------------------------------------------------------------------------------
      Basic CLI Health Check                                                | PASS |
      ------------------------------------------------------------------------------
      Basic CLAMP Health Check                                              | PASS |
      ------------------------------------------------------------------------------
      Basic DCAE Health Check                                               | PASS |
      ------------------------------------------------------------------------------
      Basic DMAAP Data Router Health Check                                  [ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7feb9dc24fd0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /internal/fetchProv
      [ WARN ] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7feb9dc72b50>: Failed to establish a new connection: [Errno 111] Connection refused',)': /internal/fetchProv
      [ WARN ] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7feb9dc2eb90>: Failed to establish a new connection: [Errno 111] Connection refused',)': /internal/fetchProv
      | FAIL |
      ConnectionError: HTTPConnectionPool(host='dmaap-dr-node.onap', port=8080): Max retries exceeded with url: /internal/fetchProv (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7feb9dc2e490>: Failed to establish a new connection: [Errno 111] Connection refused',))
      ------------------------------------------------------------------------------
      Basic DMAAP Message Router Health Check                               | PASS |
      ------------------------------------------------------------------------------
      Basic External API NBI Health Check                                   | PASS |
      ------------------------------------------------------------------------------
      Basic Log Elasticsearch Health Check                                  | PASS |
      ------------------------------------------------------------------------------
      Basic Log Kibana Health Check                                         | PASS |
      ------------------------------------------------------------------------------
      Basic Log Logstash Health Check                                       | PASS |
      ------------------------------------------------------------------------------
      Basic Microservice Bus Health Check                                   | PASS |
      ------------------------------------------------------------------------------
      Basic Multicloud API Health Check                                     | PASS |
      ------------------------------------------------------------------------------
      Basic Multicloud-ocata API Health Check                               | PASS |
      ------------------------------------------------------------------------------
      Basic Multicloud-pike API Health Check                                | PASS |
      ------------------------------------------------------------------------------
      Basic Multicloud-titanium_cloud API Health Check                      | PASS |
      ------------------------------------------------------------------------------
      Basic Multicloud-vio API Health Check                                 | PASS |
      ------------------------------------------------------------------------------
      Basic OOF-Homing Health Check                                         | PASS |
      ------------------------------------------------------------------------------
      Basic OOF-SNIRO Health Check                                          | PASS |
      ------------------------------------------------------------------------------
      Basic OOF-CMSO Health Check                                           | PASS |
      ------------------------------------------------------------------------------
      Basic Policy Health Check                                             | PASS |
      ------------------------------------------------------------------------------
      Basic Pomba AAI-context-builder Health Check                          | PASS |
      ------------------------------------------------------------------------------
      Basic Pomba SDC-context-builder Health Check                          | PASS |
      ------------------------------------------------------------------------------
      Basic Pomba Network-discovery-context-builder Health Check            | PASS |
      ------------------------------------------------------------------------------
      Basic Portal Health Check                                             | PASS |
      ------------------------------------------------------------------------------
      Basic SDC Health Check                                                (DMaaP:UP)| PASS |
      ------------------------------------------------------------------------------
      Basic SDNC Health Check                                               | PASS |
      ------------------------------------------------------------------------------
      Basic SO Health Check                                                 | PASS |
      ------------------------------------------------------------------------------
      Basic UseCaseUI API Health Check                                      | PASS |
      ------------------------------------------------------------------------------
      Basic VFC catalog API Health Check                                    | PASS |
      ------------------------------------------------------------------------------
      Basic VFC emsdriver API Health Check                                  | PASS |
      ------------------------------------------------------------------------------
      Basic VFC gvnfmdriver API Health Check                                | PASS |
      ------------------------------------------------------------------------------
      Basic VFC huaweivnfmdriver API Health Check                           | PASS |
      ------------------------------------------------------------------------------
      Basic VFC jujuvnfmdriver API Health Check                             | PASS |
      ------------------------------------------------------------------------------
      Basic VFC multivimproxy API Health Check                              | PASS |
      ------------------------------------------------------------------------------
      Basic VFC nokiavnfmdriver API Health Check                            | PASS |
      ------------------------------------------------------------------------------
      Basic VFC nokiav2driver API Health Check                              | PASS |
      ------------------------------------------------------------------------------
      Basic VFC nslcm API Health Check                                      | PASS |
      ------------------------------------------------------------------------------
      Basic VFC resmgr API Health Check                                     | PASS |
      ------------------------------------------------------------------------------
      Basic VFC vnflcm API Health Check                                     | PASS |
      ------------------------------------------------------------------------------
      Basic VFC vnfmgr API Health Check                                     | PASS |
      ------------------------------------------------------------------------------
      Basic VFC vnfres API Health Check                                     | PASS |
      ------------------------------------------------------------------------------
      Basic VFC workflow API Health Check                                   | PASS |
      ------------------------------------------------------------------------------
      Basic VFC ztesdncdriver API Health Check                              | PASS |
      ------------------------------------------------------------------------------
      Basic VFC ztevnfmdriver API Health Check                              | PASS |
      ------------------------------------------------------------------------------
      Basic VID Health Check                                                | PASS |
      ------------------------------------------------------------------------------
      Basic VNFSDK Health Check                                             | PASS |
      ------------------------------------------------------------------------------
      Basic Holmes Rule Management API Health Check                         | PASS |
      ------------------------------------------------------------------------------
      Basic Holmes Engine Management API Health Check                       | PASS |
      ------------------------------------------------------------------------------
      Testsuites.Health-Check :: Testing ecomp components are available ... | FAIL |
      51 critical tests, 50 passed, 1 failed
      51 tests total, 50 passed, 1 failed
      ==============================================================================
      Testsuites                                                            | FAIL |
      51 critical tests, 50 passed, 1 failed
      51 tests total, 50 passed, 1 failed
      ==============================================================================
      Output:  /share/logs/0001_ete_health/output.xml
      Log:     /share/logs/0001_ete_health/log.html
      Report:  /share/logs/0001_ete_health/report.html
      command terminated with exit code 1
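
To gate a deployment on the "51 of 51" target, the Robot summary line in the output above can be checked mechanically. The following is a minimal sketch, assuming the summary keeps the `N tests total, P passed, F failed` format shown in the log; `health_summary`, `all_health_checks_passed` and the `health.log` file name are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: decide pass/fail from the Robot health check summary.

# Print "total passed failed" taken from the last "... tests total, ..." line
# on stdin. Exits non-zero if no such line is found.
health_summary() {
  awk '/tests total,/ { total = $1; passed = $4; failed = $6 }
       END { if (total == "") exit 1; print total, passed, failed }'
}

# Succeeds only when a summary line exists, nothing failed, and every test passed.
all_health_checks_passed() {
  local summary total passed failed
  summary=$(health_summary) || return 1
  read -r total passed failed <<< "$summary"
  [ "$failed" -eq 0 ] && [ "$total" -eq "$passed" ]
}
```

A possible use is `./ete-k8s.sh onap health 2>&1 | tee health.log | all_health_checks_passed`, which would return non-zero for the run above because of the DMAAP Data Router failure.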

       

            People

              sdesbure Sylvain Desbureaux
              michaelobrien Michael O'Brien