Details
Type: Bug
Status: Closed
Priority: Medium
Resolution: Done
Dublin Release
Description
20190107: Update after the PTL meeting. The manifest is the source of truth for now. Ideally, update both oom and the manifest to keep them in lockstep, but we will add a doc task to describe the script that generates your integration-level override from the manifest.
Issue
Source of truth for docker images during the oom kubernetes deployment
(Note: this pertains only to the docker manifest that is used to deploy the system, not the java manifest needed to "build" it.)
1) integration repo manifest csv?
or
2) oom repo values.yaml(s) ?
Currently everyone except the integration team uses 2), the oom repo values.yaml.
Add to the discussion and update the docs for either case.
https://lists.onap.org/g/onap-discuss/topic/oom_onap_deployment/28883609?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,28883609
Brian is right: anything that has been stuck for over 2 hours will not restart. Bounce your particular pod with --set ?.enabled=false and then --set ?.enabled=true, with a dockerdata-nfs clean in the middle. For rogue pods that have pvs outside the loop, also delete the pv/pvc, but only if you get a "pv exists" exception on restart.
https://wiki.onap.org/display/DW/ONAP+Development#ONAPDevelopment-Bounce/Fixafailedcontainer
Change --all to a particular pod, or use a --force delete like:
kubectl delete pod $ENVIRON-aaf-sms-vault-0 -n $ENVIRON --grace-period=0 --force
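As a rough sketch of the bounce sequence described above (disable the component, clean its dockerdata-nfs data, re-enable it), the helper below only prints the commands so they can be reviewed before anything is run. The function name, the release name onap, the chart local/onap, the use of plain "helm upgrade -i" rather than the "helm deploy" plugin, and the data directory argument are all assumptions for illustration and are not taken from this thread.

```shell
#!/usr/bin/env bash
# bounce_plan COMPONENT DATA_DIR [RELEASE] [NAMESPACE]
# Prints (does NOT execute) a three-step bounce for one component.
# Assumptions: RELEASE and NAMESPACE default to "onap", the chart is
# "local/onap", and DATA_DIR is whatever directory under dockerdata-nfs
# holds that component's data on your cluster (supplied by the caller).
bounce_plan() {
  local component="$1" data_dir="$2" release="${3:-onap}" namespace="${4:-onap}"
  if [ -z "$component" ] || [ -z "$data_dir" ]; then
    echo "usage: bounce_plan COMPONENT DATA_DIR [RELEASE] [NAMESPACE]" >&2
    return 1
  fi
  # Step 1: disable the component so its pods are removed.
  echo "helm upgrade -i ${release} local/onap --namespace ${namespace} --set ${component}.enabled=false"
  # Step 2: clean the component's persisted data while it is down.
  echo "rm -rf -- \"${data_dir}\""
  # Step 3: re-enable the component so it is deployed fresh.
  echo "helm upgrade -i ${release} local/onap --namespace ${namespace} --set ${component}.enabled=true"
}
```

Calling the function with a component name and that component's data directory prints three lines to inspect and then run by hand, waiting for the pods to terminate between steps 1 and 3.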
https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-RemoveaDeployment
https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n79
The above script, with a 3 min timeout between the 30 pod deploys, brings up the whole system except for some minor issues with a couple of pods.
For your situation, make sure healthcheck is 50/51 before attempting to use the cluster.
3.2.2 ete-k8s.sh health test: 51 critical tests, 7 passed, 44 failed
Should be 50 passed (verified 20181229 for 3.0.0-ONAP, Casablanca manifest).
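The 50/51 gate above could be automated along these lines. This is a minimal sketch that assumes the robot summary line has the same shape as the one quoted in this thread ("N critical tests, P passed, F failed"); the function name and the default threshold of 50 are illustrative.

```shell
#!/usr/bin/env bash
# health_gate SUMMARY_LINE [MIN_PASSED]
# Succeeds (exit 0) only if the "passed" count in the summary line is at
# least MIN_PASSED (default 50). Fails if the line cannot be parsed.
health_gate() {
  local summary="$1" min_pass="${2:-50}" passed
  # Extract the number that precedes the word "passed".
  passed="$(printf '%s\n' "$summary" | sed -n 's/.*, \([0-9][0-9]*\) passed.*/\1/p')"
  [ -n "$passed" ] && [ "$passed" -ge "$min_pass" ]
}
```

For example, health_gate "51 critical tests, 7 passed, 44 failed" returns a non-zero status, so a CD script can stop before anyone tries to use the cluster.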
Guys,
A secondary question: are the values.yaml files in OOM the truth, or is the manifest file the truth?
I also need a bit of clarification on the source of truth for the image tags in the values.yamls for the 30 components in OOM.
• “Did you update the image versions in the OOM clone using the script in the integration project? No, and according to Michael O'Brien, he recommends not to change it...”
My understanding is that the oom repo is the source of truth, and that the manifest file in the integration repo is kept 99-100% up to date with what is deployed, but the manifest is not the truth. If it is the reverse, then we would need to either make sure every deployment (all CD systems, including mine) and all developers use a generated values.yaml override, or at least force a change to a values.yaml in OOM every time the manifest is updated. From reviewing patches between oom and integration, it looks like the image names in oom are the source.
I would like to nail this down because no one I know of adjusts any of the merged docker image tag names that OOM uses to deploy. As far as I know, only the prepull script makes use of the manifest. If not, we need to adjust the documentation so that we are running the same way integration does.
The following assumes the manifest is the same as the values.yamls
https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-Nexus3proxyusageperclusternode
Currently we use the manifest as the docker_prepull.sh target (it is easier to parse and pull from there than to mine OOM as the previous iteration did). However, when we deploy, oom will still use what is hardcoded into each values.yaml.
sudo nohup ./docker_prepull.sh -b casablanca -s nexus4.onap.cloud:5000 &
pulls from
https://git.onap.org/logging-analytics/tree/deploy/docker_prepull.sh#n35
https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=$BRANCH
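To illustrate the prepull idea, here is a simplified sketch and not the actual docker_prepull.sh. It assumes the manifest is a CSV with an "image,tag" header row followed by one "image,tag" row per docker image, and it only prints the pull commands rather than running them. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# pull_commands REGISTRY < manifest.csv
# Reads "image,tag" CSV rows on stdin and prints one "docker pull" command
# per row, prefixed with the given registry/proxy (host:port).
pull_commands() {
  local registry="$1" image tag
  # The "|| [ -n ... ]" keeps a final row that lacks a trailing newline.
  while IFS=, read -r image tag || [ -n "$image" ]; do
    tag="${tag%$'\r'}"                      # tolerate CRLF line endings
    [ -z "$image" ] && continue             # skip blank lines
    [ "$image" = "image" ] && continue      # skip the header row
    echo "docker pull ${registry}/${image}:${tag}"
  done
}
```

Piping the manifest fetched from the URL above through pull_commands nexus4.onap.cloud:5000 would give a list that can be reviewed and then run on each cluster node.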
If there is a script that we run somewhere that takes this manifest and overrides all the image tags as a values.yaml overlay before deployment, let us know. The current wiki and readthedocs do not mention this.
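Until the override script is documented, one way to see how far the two sources have drifted is to compare the tags directly. The sketch below is an illustration, not the integration team's script. It assumes the manifest is an "image,tag" CSV and that the image references used by the charts have already been collected as one "image:tag" string per line, for example by grepping the image: lines out of the values.yaml files. The function name and output format are made up.

```shell
#!/usr/bin/env bash
# manifest_drift MANIFEST_CSV < image_refs
# image_refs: one "image:tag" per line, as found in the OOM values.yaml files.
# Prints a line for every image whose chart tag differs from the manifest tag.
# Images that are absent from the manifest are silently ignored.
manifest_drift() {
  local manifest="$1" ref image tag want
  while read -r ref || [ -n "$ref" ]; do
    [ -z "$ref" ] && continue
    image="${ref%:*}"     # everything before the last colon
    tag="${ref##*:}"      # everything after the last colon
    # Look up the tag the manifest lists for this image (first match wins).
    want="$(awk -F, -v img="$image" \
      '$1 == img { sub(/\r$/, "", $2); print $2; exit }' "$manifest")"
    if [ -n "$want" ] && [ "$want" != "$tag" ]; then
      echo "${image}: values.yaml=${tag} manifest=${want}"
    fi
  done
}
```

An empty result means the charts and the manifest agree for every image that appears in both; any output lists exactly the tags a generated override would have to change.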
Current Process for deployment
https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html
https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-Scriptedundercloud(Helm/Kubernetes/Docker)andONAPinstall-clustered
thanks guys
/michael
From: Dominique Deschênes <dominique.deschenes@gcgenicom.com>
Sent: Thursday, January 3, 2019 2:37 PM
To: Borislav Glozman <Borislav.Glozman@amdocs.com>; bf1936@att.com; onap-discuss@lists.onap.org
Cc: Jacques Faucher <jacques.faucher@gcgenicom.com>; Jasmin Audet <jasmin.audet@gcgenicom.com>; Michael O'Brien <Frank.Obrien@amdocs.com>
Subject: Re[2]: [onap-discuss] OOM ONAP Deployment
Hi,
• Are you using the integration-override.yaml file? No.
• Did you update the image versions in the OOM clone using the script in the integration project? No. Michael O'Brien recommends not changing them.
thanks
Dominique Deschênes
Project Engineer, IT Manager
816, boulevard Guimond, Longueuil J4G 1T5
450 670-8383 x105 450 670-2259
----- Message received -----
________________________________________
From: Brian (bf1936@att.com)
Date: 02/01/19 10:19
To: onap-discuss@lists.onap.org, dominique.deschenes@gcgenicom.com, Borislav Glozman (borislav.glozman@amdocs.com)
Cc: Jasmin Audet (jasmin.audet@gcgenicom.com), Jacques Faucher (jacques.faucher@gcgenicom.com)
Subject: Re: [onap-discuss] OOM ONAP Deployment
If it's not up within a few hours, then something in your configuration is wrong: either the images or access to the nexus repository.
1. Are you using the integration-override.yaml file ?
2. Did you update the image versions in the OOM clone using the script in the integration project ?
See the reference to update-oom-image-versions.sh and the use of integration-override.yaml and public-cloud.yaml:
helm deploy dev local/onap -f /root/oom/kubernetes/onap/resources/environments/public-cloud.yaml -f /root/integration-override.yaml --namespace onap
https://wiki.onap.org/pages/viewpage.action?pageId=29787124 also has some pointers.
Brian
From: onap-discuss@lists.onap.org <onap-discuss@lists.onap.org> On Behalf Of Dominique Deschênes
Sent: Wednesday, January 02, 2019 9:33 AM
To: onap-discuss@lists.onap.org; Borislav Glozman <borislav.glozman@amdocs.com>
Cc: Jasmin Audet <jasmin.audet@gcgenicom.com>; Jacques Faucher <jacques.faucher@gcgenicom.com>
Subject: Re: [onap-discuss] OOM ONAP Deployment
hi,
13 days and we have the same result
Thanks,
Dominique Deschênes
Project Engineer, IT Manager
816, boulevard Guimond, Longueuil J4G 1T5
450 670-8383 x105 450 670-2259
----- Message received -----
________________________________________
From: Borislav Glozman (Borislav.Glozman@amdocs.com)
Date: 30/12/18 02:59
To: onap-discuss@lists.onap.org, dominique.deschenes@gcgenicom.com
Cc: Jasmin Audet (jasmin.audet@gcgenicom.com)
Subject: RE: [onap-discuss] OOM ONAP Deployment
Hi,
How long did you wait before doing the tests?
PodInitializing state usually means docker is pulling images. So maybe you need to wait longer (hopefully by now all is downloaded…)
Please try to get your environment to a point where all pods are in Ready state, and then try the robot tests.
Thanks,
Borislav Glozman
O:+972.9.776.1988
M:+972.52.2835726
Amdocs, a Platinum member of ONAP
From: onap-discuss@lists.onap.org <onap-discuss@lists.onap.org> On Behalf Of Dominique Deschênes
Sent: Thursday, December 27, 2018 9:58 PM
To: onap-discuss@lists.onap.org
Cc: 'Jasmin Audet' <jasmin.audet@gcgenicom.com>
Subject: [onap-discuss] OOM ONAP Deployment
Hello,
We have some issues in the deployment of ONAP Casablanca.
1. The setup we have:
1.1 OpenStack Pike (OS)
- OpenStack Ansible
- 1 Controller
- 4 Compute nodes
- 3 Ceph storage nodes
All nodes are Dell R620 servers with 2 Xeon E5-2650v2 and 256GB of RAM.
1.2 Kubernetes, Rancher
- Rancher v1.6.22, 1 node on OS, 4 vCPU 4GB RAM.
- Kubernetes v1.8.3, Docker 17.03.2-ce, 4 nodes on OS, 16 vCPU 64GB RAM each.
1.3 ONAP Casablanca
2. The procedure we followed:
2.1 The k8s environment was setup by following this guide on ReadTheDocs:
https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html
2.2 ONAP was setup by following this guide on ReadTheDocs:
https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html#quick-start-label
2.3 I've attached our version of onap/values.yaml
3. The results we get:
3.1 We can access the portal, but most pages return "refused to connect".
3.1.1 A&AI: aai.api.sparky.simpledemo.onap.org took too long to respond.
3.1.2 CLI: cli.api.simpledemo.onap.org refused to connect.
3.1.3 Policy: policy.api.simpledemo.onap.org refused to connect.
3.1.4 SDC returns the proper User Management page.
3.1.5 VID: The web page at https://vid.api.simpledemo.onap.org:30200/vid/welcome.htm?cc=1545938381291 might be temporarily down or it may have moved permanently to a new web address.
3.2 We can run the robot tests, but we get failures.
3.2.1 ete-k8s.sh portal test is passing.
3.2.2 ete-k8s.sh health test: 51 critical tests, 7 passed, 44 failed
3.3 Some containers are stuck in CrashLoopBackOff or in PodInitializing state.
3.3.1 onap-aai-aai-traversal-b5dc9895d-7gc5f: Readiness probe failed: dial tcp 10.42.68.214:8446: connect: connection refused
3.3.2 onap-aai-aai-sparky-be-75658695f5-m6kql: Waiting: PodInitializing
3.3.3 onap-aai-aai-traversal-update-query-data-zfvwb: Waiting: PodInitializing
3.3.4 onap-aai-aai-746f4ff754-c9b67: Waiting: PodInitializing
4. Questions:
4.1 Should we change anything in our onap/values.yaml file?
4.2 Is there another deployment procedure that can be followed?
4.3 Are there any known issues that result in ONAP being in a broken state like the one we are getting?
4.4 Is there any way for us as users to fix these issues in our deployment?
4.5 What's the most reliable way to get a basic deployment of ONAP up and running?
Tks.
Dominique Deschenes
Project Engineer, IT Manager
816, boulevard Guimond, Longueuil J4G 1T5
450 670-8383 x105 450 670-2259
This email and the information contained herein is proprietary and confidential and subject to the Amdocs Email Terms of Service, which you may review at https://www.amdocs.com/about/email-terms-of-service
Issue Links
- blocks
  - TSC-58 Dublin Toolchain Improvement (Closed)
  - TSC-25 Task Force to implement CD (Continuous Deployment) (Closed)
  - TSC-79 LF Nexus3 routing slowdown starting 20181217 - 80-100x slower download times totalling 120+ hours - using nexus3/4.onap.cloud proxy for now (Closed)
  - LOG-707 Logging El-Alto (moved from) Dublin Scope (Closed)
  - LOG-905 docker_prepull.sh script for casablanca (Closed)
- is blocked by
  - LOG-1049 POMBA: data-router pod fails to start (Closed)
  - INT-1042 Update OOM image version with docker-manifest-staging.csv before retiring docker-manifest-staging.csv file (Closed)
- relates to
  - LOG-966 dublin pomba-validation-service crashes on start - with staging integration manifest override (Closed)
  - INT-943 Bring Pomba docker image versions in docker-manifest-staging.csv in sync with OOM charts (Closed)
  - OOM-1525 Update OOM Developer guide to document how to override repository/image per chart (Closed)