Type: Story
Resolution: Won't Do
Priority: Highest
20190108 work continues on the cd.sh patch in LOG-898 under https://gerrit.onap.org/r/#/c/75422
ONAP 3.0.0-ONAP pods come up in random order (the Dublin deploy order is being refactored/fixed). This causes contention for vCPU, HDD, and network resources during pod startup, so some k8s jobs (sdnc, appc, so) time out before their dependencies are up or their resources are available. For example, the SDNC ansible container takes up to 20 min to pull its docker image when running alongside other starting pods.
Work item: a sequenced/tagged serial deployment of the 29 pods, for consistency. Alain and I came up with this, and I like his tagged id approach. We used to deploy a bit like this in the A release, but without a HC or pod state check in between and without tag intelligence. Deploying in dependency order is better; the first case is simply to deploy alphabetically as we do now, but wait for each of the 29 pods to be up before deploying the next.
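A minimal sketch of how a dependency order could be derived with the standard tsort utility. The edge list below is purely illustrative and is NOT the real ONAP dependency tree; the function names example_edges and dependency_order are hypothetical.

```shell
# Hypothetical edge list: one "prerequisite dependent" pair per line.
# These pairs are made up for illustration, not taken from the OOM charts.
example_edges() {
  cat <<'EOF'
aaf aai
aai so
aai sdnc
dmaap sdc
sdc so
EOF
}

# Read "prerequisite dependent" pairs on stdin and print one component per
# line, ordered so that every prerequisite appears before its dependents.
# tsort exits non-zero if the input contains a cycle.
dependency_order() {
  tsort
}
```

Running `example_edges | dependency_order` prints the component names one per line; the exact order among unrelated components is up to tsort, only the prerequisite-before-dependent property is guaranteed.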
Proposal (temporary), from Yumtaax3425 and michaelobrien:
- Bring up each pod in dependency order, for example AAI and AAF first, so that each pod can complete and pass an atomic healthcheck before the next pod is deployed. For phase 1, even bringing up the pods one by one in alphabetical order is OK.
- If we have issues with a pod, fix the undeploy so that pv/pvc's are also purged (see OOM-1543).
- In this way we should be able to have a consistent deployment where 51 of 51 HC's pass on every deployment, rather than by chance as is currently the case.
- The timing workarounds, for example in https://git.onap.org/oom/tree/kubernetes/onap/resources/environments/public-cloud.yaml, are actually a workaround for the underlying issue: the random start order of the pods.
- In some ways deployment may actually be faster (though not as fast as a docker preload before deploy) than allowing resource contention during the first 30 min of the deploy.
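The serial bring-up proposed above could be sketched roughly as follows. This is not the cd.sh patch itself: deploy_one and is_ready are hypothetical placeholders for the real per-component deploy step and the pod-state/healthcheck gate, and the timeout handling is an assumption.

```shell
# Placeholder: stands in for the real per-component deploy command.
deploy_one() { echo "deploy $1"; }

# Placeholder gate: should return 0 only once the component's pods are up
# and its healthcheck passes. Here it always succeeds.
is_ready() { return 0; }

# Usage: deploy_sequenced TIMEOUT_SECONDS POLL_SECONDS component...
# Deploys components one at a time, in the order given, and waits for each
# to pass is_ready before starting the next. Stops at the first failure.
deploy_sequenced() {
  local timeout_s="$1"
  local poll_s="$2"
  shift 2
  local component
  local waited
  for component in "$@"; do
    deploy_one "$component" || return 1
    waited=0
    until is_ready "$component"; do
      if [ "$waited" -ge "$timeout_s" ]; then
        echo "timeout waiting for $component" >&2
        return 1
      fi
      sleep "$poll_s"
      waited=$((waited + poll_s))
    done
    echo "ready $component"
  done
}
```

For phase 1 the component list could simply be passed in alphabetical order, for example `deploy_sequenced 1800 15 aaf aai appc`; the 1800 s timeout and 15 s poll interval are arbitrary illustration values.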
20181212 casablanca branch state for 3.0.0-ONAP (Azure 432g vm)
ubuntu@a-cd-cas2:~$ kubectl get pods --all-namespaces | grep -E '0/|1/2' | wc -l
43
ubuntu@a-cd-cas2:~$ kubectl get pods --all-namespaces | grep -E '1/1|2/2' | wc -l
217
ubuntu@a-cd-cas2:~$ free
              total        used        free      shared  buff/cache   available
Mem:      445804668   150300876   180957764      261628   114546028   285549272
Swap:             0           0           0
ubuntu@a-cd-cas2:~$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
udev           222891708         0 222891708   0% /dev
tmpfs           44580468     70620  44509848   1% /run
/dev/sda1      129029904 103296108  25717412  81% /
tmpfs          222902332     40776 222861556   1% /dev/shm
tmpfs               5120         0      5120   0% /run/lock
tmpfs          222902332         0 222902332   0% /sys/fs/cgroup
/dev/sdb1      891622252     73748 846233692   1% /mnt
tmpfs           44580468         0  44580468   0% /run/user/1000
ubuntu@a-cd-cas2:~/oom/kubernetes/robot$ ./ete-k8s.sh onap health
++ export NAMESPACE=onap
++ NAMESPACE=onap
+++ kubectl --namespace onap get pods
+++ sed 's/ .*//'
+++ grep robot
++ POD=onap-robot-robot-ddd948476-zgp8n
++ TAGS='-i health'
++ ETEHOME=/var/opt/OpenECOMP_ETE
+++ kubectl --namespace onap exec onap-robot-robot-ddd948476-zgp8n -- bash -c 'ls -1q /share/logs/ | wc -l'
++ export GLOBAL_BUILD_NUMBER=1
++ GLOBAL_BUILD_NUMBER=1
+++ printf %04d 1
++ OUTPUT_FOLDER=0001_ete_health
++ DISPLAY_NUM=91
++ VARIABLEFILES='-V /share/config/vm_properties.py -V /share/config/integration_robot_properties.py -V /share/config/integration_preload_parameters.py'
++ VARIABLES='-v GLOBAL_BUILD_NUMBER:3332'
++ kubectl --namespace onap exec onap-robot-robot-ddd948476-zgp8n -- /var/opt/OpenECOMP_ETE/runTags.sh -V /share/config/vm_properties.py -V /share/config/integration_robot_properties.py -V /share/config/integration_preload_parameters.py -v GLOBAL_BUILD_NUMBER:3332 -d /share/logs/0001_ete_health -i health --display 91
Starting Xvfb on display :91 with res 1280x1024x24
Executing robot tests at log level TRACE
==============================================================================
Testsuites
==============================================================================
Testsuites.Health-Check :: Testing ecomp components are available via calls.
==============================================================================
Basic A&AI Health Check | PASS |
------------------------------------------------------------------------------
Basic AAF Health Check | PASS |
------------------------------------------------------------------------------
Basic AAF SMS Health Check | PASS |
------------------------------------------------------------------------------
Basic APPC Health Check | PASS |
------------------------------------------------------------------------------
Basic CLI Health Check | PASS |
------------------------------------------------------------------------------
Basic CLAMP Health Check | PASS |
------------------------------------------------------------------------------
Basic DCAE Health Check | PASS |
------------------------------------------------------------------------------
Basic DMAAP Data Router Health Check
[ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7feb9dc24fd0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /internal/fetchProv
[ WARN ] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7feb9dc72b50>: Failed to establish a new connection: [Errno 111] Connection refused',)': /internal/fetchProv
[ WARN ] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7feb9dc2eb90>: Failed to establish a new connection: [Errno 111] Connection refused',)': /internal/fetchProv
| FAIL |
ConnectionError: HTTPConnectionPool(host='dmaap-dr-node.onap', port=8080): Max retries exceeded with url: /internal/fetchProv (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7feb9dc2e490>: Failed to establish a new connection: [Errno 111] Connection refused',))
------------------------------------------------------------------------------
Basic DMAAP Message Router Health Check | PASS |
------------------------------------------------------------------------------
Basic External API NBI Health Check | PASS |
------------------------------------------------------------------------------
Basic Log Elasticsearch Health Check | PASS |
------------------------------------------------------------------------------
Basic Log Kibana Health Check | PASS |
------------------------------------------------------------------------------
Basic Log Logstash Health Check | PASS |
------------------------------------------------------------------------------
Basic Microservice Bus Health Check | PASS |
------------------------------------------------------------------------------
Basic Multicloud API Health Check | PASS |
------------------------------------------------------------------------------
Basic Multicloud-ocata API Health Check | PASS |
------------------------------------------------------------------------------
Basic Multicloud-pike API Health Check | PASS |
------------------------------------------------------------------------------
Basic Multicloud-titanium_cloud API Health Check | PASS |
------------------------------------------------------------------------------
Basic Multicloud-vio API Health Check | PASS |
------------------------------------------------------------------------------
Basic OOF-Homing Health Check | PASS |
------------------------------------------------------------------------------
Basic OOF-SNIRO Health Check | PASS |
------------------------------------------------------------------------------
Basic OOF-CMSO Health Check | PASS |
------------------------------------------------------------------------------
Basic Policy Health Check | PASS |
------------------------------------------------------------------------------
Basic Pomba AAI-context-builder Health Check | PASS |
------------------------------------------------------------------------------
Basic Pomba SDC-context-builder Health Check | PASS |
------------------------------------------------------------------------------
Basic Pomba Network-discovery-context-builder Health Check | PASS |
------------------------------------------------------------------------------
Basic Portal Health Check | PASS |
------------------------------------------------------------------------------
Basic SDC Health Check (DMaaP:UP)| PASS |
------------------------------------------------------------------------------
Basic SDNC Health Check | PASS |
------------------------------------------------------------------------------
Basic SO Health Check | PASS |
------------------------------------------------------------------------------
Basic UseCaseUI API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC catalog API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC emsdriver API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC gvnfmdriver API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC huaweivnfmdriver API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC jujuvnfmdriver API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC multivimproxy API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC nokiavnfmdriver API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC nokiav2driver API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC nslcm API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC resmgr API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC vnflcm API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC vnfmgr API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC vnfres API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC workflow API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC ztesdncdriver API Health Check | PASS |
------------------------------------------------------------------------------
Basic VFC ztevnfmdriver API Health Check | PASS |
------------------------------------------------------------------------------
Basic VID Health Check | PASS |
------------------------------------------------------------------------------
Basic VNFSDK Health Check | PASS |
------------------------------------------------------------------------------
Basic Holmes Rule Management API Health Check | PASS |
------------------------------------------------------------------------------
Basic Holmes Engine Management API Health Check | PASS |
------------------------------------------------------------------------------
Testsuites.Health-Check :: Testing ecomp components are available ... | FAIL |
51 critical tests, 50 passed, 1 failed
51 tests total, 50 passed, 1 failed
==============================================================================
Testsuites | FAIL |
51 critical tests, 50 passed, 1 failed
51 tests total, 50 passed, 1 failed
==============================================================================
Output: /share/logs/0001_ete_health/output.xml
Log: /share/logs/0001_ete_health/log.html
Report: /share/logs/0001_ete_health/report.html
command terminated with exit code 1
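The two grep counts in the transcript above only match specific READY values (0/N, 1/2, 1/1, 2/2), so a pod at 2/3, for instance, is counted by neither. A rough sketch of more general gate checks that work on captured output follows. The function names count_not_ready and hc_failed_count are hypothetical, and skipping pods in the Completed state is an assumption (the original grep counts them).

```shell
# Reads `kubectl get pods --all-namespaces` output on stdin
# (columns: NAMESPACE NAME READY STATUS ...) and prints the number of pods
# whose READY column shows fewer ready containers than expected.
# Assumption: pods with STATUS Completed (finished jobs) are not counted.
count_not_ready() {
  awk 'NR > 1 && $4 != "Completed" {
         split($3, r, "/")
         if (r[1] + 0 < r[2] + 0) n++
       }
       END { print n + 0 }'
}

# Reads robot console output on stdin and prints the failed count from the
# last "N tests total, N passed, N failed" summary line.
# Returns non-zero if no such summary line is found.
hc_failed_count() {
  awk '/tests total,/ {
         found = 1
         for (i = 2; i <= NF; i++) if ($i == "failed") f = $(i - 1)
       }
       END { if (!found) exit 1; print f + 0 }'
}
```

For example, `kubectl get pods --all-namespaces | count_not_ready` and `./ete-k8s.sh onap health | hc_failed_count` could both be required to print 0 before a deployment is considered good.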
Issue links
blocks:
- OOM-1543 Helm deploy/undeploy adjustments for triage of failed/unlisted deployments (Closed)
duplicates:
- LOG-898 ONAP Deployment Resiliency changes to deploy without POD failures (Closed)
is duplicated by:
- OOM-1493 Deploy order for deploy plugin should follow dependency tree (Closed)
relates to:
- LOG-325 CD: Kubernetes Automated Provisioning Rancher Script - for Ubuntu 16 VMs (Closed)
- LOG-326 CD: OOM automated deployment script (Closed)
- LOG-898 ONAP Deployment Resiliency changes to deploy without POD failures (Closed)
- LOG-899 ONAP vFW vFirewall Use Case validation - so we can use it for transaction tracing (Closed)