  ONAP TSC / TSC-86

Lock down docker image tag name source of truth - oom values.yaml or integration repo manifest - A: both but manifest is the source


      20190107: update post PTL meeting - the manifest is the source for now - ideally update both oom and the manifest to keep them in lockstep - but we will add a doc task describing the script to generate your integration-level override from the manifest.

      Issue
      Source of truth for docker images during the oom kubernetes deployment
      (Note: this pertains only to the docker manifest that is used to deploy the system - not the java manifest needed to "build" it.)

      1) integration repo manifest csv?
      or
      2) oom repo values.yaml(s) ?

      Currently everyone except the integration team uses 2) the oom repo values.yaml

      add to the discussion and update the docs for either case
      https://lists.onap.org/g/onap-discuss/topic/oom_onap_deployment/28883609?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,28883609

      Brian is right – anything over 2 hours will not restart. Bounce your particular pod (--set <component>.enabled=false/true) with a dockerdata-nfs clean in the middle – and a pv/pvc deletion for some rogue pods that have pvs outside the loop, only if you get a "pv exists" exception on restart. A sketch of that sequence follows the links below.
      https://wiki.onap.org/display/DW/ONAP+Development#ONAPDevelopment-Bounce/Fixafailedcontainer
      change --all to a particular pod – or use a --force delete like:
      kubectl delete pod $ENVIRON-aaf-sms-vault-0 -n $ENVIRON --grace-period=0 --force
      https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-RemoveaDeployment
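      A minimal sketch of the bounce sequence, assuming the OOM helm deploy plugin, a release named dev, and aaf as an example component (substitute your own chart key, release name and namespace):

      # disable the sub-chart so its pods are removed
      helm deploy dev local/onap --set aaf.enabled=false --namespace $ENVIRON
      # clean the component's data under the shared NFS mount (path is the usual <release>-<component> convention)
      sudo rm -rf /dockerdata-nfs/dev-aaf/*
      # only if you hit a "pv already exists" exception on restart, find and delete the stale pv/pvc
      kubectl get pv,pvc -n $ENVIRON | grep aaf
      # re-enable the sub-chart
      helm deploy dev local/onap --set aaf.enabled=true --namespace $ENVIRON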
      https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n79
      The above script, with a 3 minute timeout between the 30 pod deploys, brings up the whole system except for some minor issues with a couple of pods.
      For your situation – make sure healthcheck is 50/51 before attempting to use the cluster
      3.2.2 ete-k8s.sh health test: 51 critical tests, 7 passed, 44 failed
      Should be 50 passed (verified 20181229 for 3.0.0-ONAP, Casablanca manifest).
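      For reference, a quick way to verify the pass count before using the cluster, assuming the standard robot wrapper in the oom clone (paths are illustrative):

      cd oom/kubernetes/robot
      ./ete-k8s.sh onap health | tee health.log
      # check the critical-test summary line; expect something close to: 51 critical tests, 50 passed, 1 failed
      grep "critical tests" health.log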

      Guys,
      A secondary question – are the values.yaml files in OOM the truth, or is the manifest file the truth?
      I also need a bit of clarification on the source of truth for the image tags in the values.yaml files for the 30 components in OOM.

      • “Did you update the image versions in the OOM clone using the script in the integration project? No, and according to Michael O'Brien, he recommends not to change it...”
      My understanding is that the oom repo is the source of truth, and that the manifest file in the integration repo is kept 99-100% up to date with what is deployed – but the manifest is not the truth. If it is the reverse, then we would need to either make sure every deployment (all CD systems including mine) and all developers use a generated values.yaml override, or at least force a change to a values.yaml in OOM every time the manifest is updated. From reviewing patches between oom and integration, it looks like the image names in oom are the source.

      I would like to nail this down, because no one I know of adjusts any of the merged docker image tag names that OOM uses to deploy – only the prepull script makes use of the manifest, as far as I know. If that is not the case, we need to adjust the documentation so that we are running the same way integration does.
      The following assumes the manifest is the same as the values.yamls
      https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-Nexus3proxyusageperclusternode

      Currently we use the manifest as the docker_prepull.sh target (it is easier to parse and pull from there than mining OOM like the previous iteration did) – however, when we deploy, oom will still use what is hardcoded into each values.yaml. A simplified sketch of the prepull loop follows the links below.
      sudo nohup ./docker_prepull.sh -b casablanca -s nexus4.onap.cloud:5000 &
      pulls from
      https://git.onap.org/logging-analytics/tree/deploy/docker_prepull.sh#n35
      https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=$BRANCH
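      In essence the prepull walks the manifest CSV (one image,version pair per line) and pulls each image through the proxy; a simplified sketch of that loop, with the branch and proxy host shown as assumptions (the real script takes them as -b/-s flags):

      BRANCH=casablanca
      SERVER=nexus4.onap.cloud:5000
      MANIFEST="https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=$BRANCH"
      # skip the CSV header, then pull each image:tag via the proxy
      curl -s "$MANIFEST" | tail -n +2 | while IFS=, read -r image tag; do
        sudo docker pull "$SERVER/$image:$tag"
      done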

      If there is a script that we run somewhere that takes this manifest and overrides all the image tags as a values.yaml overlay before deployment – let us know. The current wiki and readthedocs do not mention this.
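      For what it's worth, a hypothetical sketch of what such a step could look like – rewriting the image tags in an OOM clone to match the manifest. This is not the integration repo's actual update-oom-image-versions.sh, and the image key layout differs per chart, so treat it as illustrative only:

      BRANCH=casablanca
      OOM_DIR=~/oom/kubernetes
      MANIFEST="https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=$BRANCH"
      # for every image in the manifest, rewrite its tag wherever it appears in an OOM values.yaml
      curl -s "$MANIFEST" | tail -n +2 | while IFS=, read -r image tag; do
        grep -rl "image: $image:" "$OOM_DIR" --include=values.yaml | \
          xargs -r sed -i "s|image: $image:.*|image: $image:$tag|"
      done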

      Current Process for deployment
      https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html
      https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-Scriptedundercloud(Helm/Kubernetes/Docker)andONAPinstall-clustered

      thanks guys
      /michael

      From: Dominique Deschênes <dominique.deschenes@gcgenicom.com>
      Sent: Thursday, January 3, 2019 2:37 PM
      To: Borislav Glozman <Borislav.Glozman@amdocs.com>; bf1936@att.com; onap-discuss@lists.onap.org
      Cc: Jacques Faucher <jacques.faucher@gcgenicom.com>; Jasmin Audet <jasmin.audet@gcgenicom.com>; Michael O'Brien <Frank.Obrien@amdocs.com>
      Subject: Re[2]: [onap-discuss] OOM ONAP Deployment

      Hi,

      • Are you using the integration-override.yaml file? No
      • Did you update the image versions in the OOM clone using the script in the integration project? No, and according to Michael O'Brien, he recommends not to change it...

      thanks

      Dominique Deschênes
      Ingénieur chargé de projets, Responsable TI
      816, boulevard Guimond, Longueuil J4G 1T5
      450 670-8383 x105 450 670-2259

      ----- Message received -----
      ________________________________________
      From: Brian (bf1936@att.com)
      Date: 02/01/19 10:19
      To: onap-discuss@lists.onap.org, dominique.deschenes@gcgenicom.com, Borislav Glozman (borislav.glozman@amdocs.com)
      Cc: Jasmin Audet (jasmin.audet@gcgenicom.com), Jacques Faucher (jacques.faucher@gcgenicom.com)
      Subject: Re: [onap-discuss] OOM ONAP Deployment

      If it's not up within a few hours, then something in your configuration is wrong – either images or access to the nexus repository.

      1. Are you using the integration-override.yaml file?
      2. Did you update the image versions in the OOM clone using the script in the integration project?

      https://wiki.onap.org/display/DW/OOM+Helm+%28un%29Deploy+plugins?focusedCommentId=48529890#comment-48529890

      See the reference to the (update-oom-image-versions.sh) and the use of integration-override.yaml and public-cloud.yaml

      helm deploy dev local/onap -f /root/oom/kubernetes/onap/resources/environments/public-cloud.yaml -f /root/integration-override.yaml --namespace onap

      https://wiki.onap.org/pages/viewpage.action?pageId=29787124 also has some pointers.

      Brian

      From: onap-discuss@lists.onap.org <onap-discuss@lists.onap.org> On Behalf Of Dominique Deschênes
      Sent: Wednesday, January 02, 2019 9:33 AM
      To: onap-discuss@lists.onap.org; Borislav Glozman <borislav.glozman@amdocs.com>
      Cc: Jasmin Audet <jasmin.audet@gcgenicom.com>; Jacques Faucher <jacques.faucher@gcgenicom.com>
      Subject: Re: [onap-discuss] OOM ONAP Deployment

      hi,

      13 days and we have the same result

      Thanks,

      Dominique Deschênes
      Ingénieur chargé de projets, Responsable TI
      816, boulevard Guimond, Longueuil J4G 1T5
      450 670-8383 x105 450 670-2259

      ----- Message received -----
      ________________________________________
      From: Borislav Glozman (Borislav.Glozman@amdocs.com)
      Date: 30/12/18 02:59
      To: onap-discuss@lists.onap.org, dominique.deschenes@gcgenicom.com
      Cc: Jasmin Audet (jasmin.audet@gcgenicom.com)
      Subject: RE: [onap-discuss] OOM ONAP Deployment

      Hi,

      How long did you wait before doing the tests?
      PodInitializing state usually means docker is pulling images. So maybe you need to wait longer (hopefully by now all is downloaded…)
      Please try to make your environment with all pods in Ready state and then try robot tests.
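      A quick way to see which pods are still not Ready, assuming the default onap namespace used in the quickstart guide:

      kubectl get pods -n onap | grep -Ev 'Running|Completed'
      # then inspect one of the stuck pods from the report below, e.g.
      kubectl describe pod onap-aai-aai-sparky-be-75658695f5-m6kql -n onap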

      Thanks,
      Borislav Glozman
      O:+972.9.776.1988
      M:+972.52.2835726

      Amdocs, a Platinum member of ONAP

      From: onap-discuss@lists.onap.org <onap-discuss@lists.onap.org> On Behalf Of Dominique Deschênes
      Sent: Thursday, December 27, 2018 9:58 PM
      To: onap-discuss@lists.onap.org
      Cc: 'Jasmin Audet' <jasmin.audet@gcgenicom.com>
      Subject: [onap-discuss] OOM ONAP Deployment

      Hello,

      We have some issues in the deployment of ONAP Casablanca.

      1. The setup we have:

      1.1 OpenStack Pike (OS)

      • OpenStack Ansible
      • 1 Controller
      • 4 Compute nodes
      • 3 Ceph storage nodes

      All nodes are Dell R620 servers with 2 Xeon E5-2650v2 and 256GB of RAM.

      1.2 Kubernetes, Rancher

      • Rancher v1.6.22, 1 node on OS, 4 vCPU 4GB RAM.
      • Kubernetes v1.8.3, Docker 17.03.2-ce, 4 nodes on OS, 16 vCPU 64GB RAM each.

      1.3 ONAP Casablanca

      2. The procedure we followed:

      2.1 The k8s environment was setup by following this guide on ReadTheDocs:
      https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html

      2.2 ONAP was setup by following this guide on ReadTheDocs:
      https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html#quick-start-label

      2.3 I've attached our version of onap/values.yaml.

      3. The results we get:

      3.1 We can access the portal, but most pages return "refused to connect".

      3.1.1 A&AI: aai.api.sparky.simpledemo.onap.org took too long to respond.
      3.1.2 CLI: cli.api.simpledemo.onap.org refused to connect.
      3.1.3 Policy: policy.api.simpledemo.onap.org refused to connect.
      3.1.4 SDC returns the proper User Management page.
      3.1.5 VID: The web page at https://vid.api.simpledemo.onap.org:30200/vid/welcome.htm?cc=1545938381291 might be temporarily down or it may have moved permanently to a new web address.

      3.2 We can run the robot tests, but we get failures.

      3.2.1 ete-k8s.sh portal test is passing.
      3.2.2 ete-k8s.sh health test: 51 critical tests, 7 passed, 44 failed

      3.3 Some containers are stuck in CrashLoopBackOff or in PodInitializing state.

      3.3.1 onap-aai-aai-traversal-b5dc9895d-7gc5f: Readiness probe failed: dial tcp 10.42.68.214:8446: connect: connection refused
      3.3.2 onap-aai-aai-sparky-be-75658695f5-m6kql: Waiting: PodInitializing
      3.3.3 onap-aai-aai-traversal-update-query-data-zfvwb: Waiting: PodInitializing
      3.3.4 onap-aai-aai-746f4ff754-c9b67: Waiting: PodInitializing

      4. Questions:

      4.1 Should we change anything in our onap/values.yaml file?

      4.2 Is there another deployment procedure that can be followed?

      4.3 Are there any known issues that result in ONAP being in a broken state like the one we are getting?

      4.4 Is there any way for us as users to fix these issues in our deployment?

      4.5 What's the most reliable way to get a basic deployment of ONAP up and running?

      Tks.

      Dominique Deschenes
      Ingénieur chargé de projet, Responsable TI
      816, boulevard Guimond, Longueuil J4G 1T5
      450 670-8383 x105 450 670-2259


      adjust
      https://wiki.onap.org/display/DW/OOM+Helm+%28un%29Deploy+plugins?focusedCommentId=48529890#comment-48529890

            michaelobrien
            Votes: 0
            Watchers: 4
