ONAP TSC / TSC-86

Lock down the source of truth for docker image tag names: oom values.yaml or integration repo manifest? Answer: both, but the manifest is the source

    Details

      Description

      20190107: update after the PTL meeting: the manifest is the source for now. Ideally both oom and the manifest are updated to keep them in lockstep, but we will add a doc task to describe the script that generates your integration-level override from the manifest.

      Issue
      Source of truth for docker images during the oom kubernetes deployment.
      (Note: this pertains only to the docker manifest that is used to deploy the system, not the java manifest needed to "build" it.)

      1) integration repo manifest csv?
      or
      2) oom repo values.yaml(s) ?

      Currently everyone except the integration team uses 2) the oom repo values.yaml

      add to discussion and update the docs for either case
      https://lists.onap.org/g/onap-discuss/topic/oom_onap_deployment/28883609?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,28883609

      Brian is right: anything that has been down for over 2 hours will not restart. Bounce your particular pod with --set ?.enabled=false/true, with a dockerdata-nfs clean in the middle. Some rogue pods have pvs outside the loop and also need a pv/pvc deletion, but only if you get a "pv exists" exception on restart.
      https://wiki.onap.org/display/DW/ONAP+Development#ONAPDevelopment-Bounce/Fixafailedcontainer
      Change --all to a particular pod, or use a --force delete like
      kubectl delete pod $ENVIRON-aaf-sms-vault-0 -n $ENVIRON --grace-period=0 --force
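
The bounce sequence described above (disable, clean dockerdata-nfs, re-enable) can be sketched as a dry run that only prints the steps. This is a minimal sketch, not the wiki's procedure: the function name bounce_plan, the release name "onap", the chart name local/onap and the default namespace are assumptions, and the dockerdata-nfs clean is printed as a reminder because the path layout depends on the install.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the steps for bouncing one OOM component.
# Assumptions (not confirmed by this thread): release "onap", chart local/onap,
# namespace "onap". Nothing is executed; the steps are only echoed.
bounce_plan() {
  local component="$1"
  local namespace="${2:-onap}"
  local release="${3:-onap}"
  # 1. Disable the component so its pods are removed.
  echo "helm upgrade -i ${release} local/onap --namespace ${namespace} --set ${component}.enabled=false"
  # 2. Reminder only: the dockerdata-nfs path layout varies by install.
  echo "# clean ${component} data under /dockerdata-nfs before re-enabling"
  # 3. Re-enable the component.
  echo "helm upgrade -i ${release} local/onap --namespace ${namespace} --set ${component}.enabled=true"
}

bounce_plan aaf
```

Review the printed commands before running them by hand; the pv/pvc deletion is left out on purpose because it is only needed when a "pv exists" exception appears.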
      https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-RemoveaDeployment
      https://git.onap.org/logging-analytics/tree/deploy/cd.sh#n79
      The above script, with a 3 min timeout between the 30 pod deploys, brings up the whole system except for some minor issues with a couple of pods.
      For your situation – make sure healthcheck is 50/51 before attempting to use the cluster
      3.2.2 ete-k8s.sh health test: 51 critical tests, 7 passed, 44 failed
      Should be 50 passed (verified 20181229 for 3.0.0-ONAP, Casablanca manifest)
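
The "make sure healthcheck is 50/51" rule can be turned into a simple gate on the robot summary line. This is a minimal sketch, assuming the summary keeps the "N critical tests, P passed, F failed" wording quoted above; the function name health_gate and the default threshold of 50 are illustrative choices, not part of ete-k8s.sh.

```shell
#!/usr/bin/env bash
# Succeed only if the robot summary reports at least MIN_PASSED passed tests.
# Assumes a summary of the form "51 critical tests, 7 passed, 44 failed".
health_gate() {
  local summary="$1"
  local min_passed="${2:-50}"
  local passed
  # Extract the number that precedes " passed".
  passed="$(printf '%s\n' "$summary" | sed -n 's/.*, \([0-9][0-9]*\) passed.*/\1/p')"
  [ -n "$passed" ] && [ "$passed" -ge "$min_passed" ]
}

if health_gate "51 critical tests, 7 passed, 44 failed"; then
  echo "cluster usable"
else
  echo "not ready"   # this branch runs: 7 passed is below 50
fi
```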

      Guys,
      A secondary question: are the values.yaml files in OOM the truth, or is the manifest file the truth?
      I also need a bit of clarification on the source of truth for the image tags in the values.yamls for the 30 components in OOM

      • “Did you update the image versions in the OOM clone using the script in the integration project ? No and According to Michael O'Brien, he recommends not to change it...”
      My understanding is that the oom repo is the source of truth, and that the manifest file in the integration repo is kept 99-100% up to date with what is deployed, but the manifest is not the truth. If it is the reverse, then we would need to either make sure every deployment (all CD systems, including mine) and all developers use a generated values.yaml override, or at least force a change to a values.yaml in OOM every time the manifest is updated. From reviewing patches between oom and integration, it looks like the image names in oom are the source.

      I would like to nail this down because no one I know of adjusts any of the merged docker image tag names that OOM uses to deploy. As far as I know, only the prepull script makes use of the manifest. If that is not the case, we need to adjust the documentation so that we are running the same way integration does.
      The following assumes the manifest is the same as the values.yamls
      https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-Nexus3proxyusageperclusternode

      Currently we use the manifest as the docker_prepull.sh target (it is easier to parse and pull from there than to mine OOM like the previous iteration did). However, when we deploy, oom will still use what is hardcoded into each values.yaml.
      sudo nohup ./docker_prepull.sh -b casablanca -s nexus4.onap.cloud:5000 &
      pulls from
      https://git.onap.org/logging-analytics/tree/deploy/docker_prepull.sh#n35
      https://git.onap.org/integration/plain/version-manifest/src/main/resources/docker-manifest.csv?h=$BRANCH
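
For illustration, turning the manifest into a list of pull targets can be sketched as below. This is not the content of docker_prepull.sh: it assumes the CSV has a header row followed by image,tag rows, and the function name manifest_to_pulls and the sample row are hypothetical.

```shell
#!/usr/bin/env bash
# Read a manifest CSV on stdin and print one "<registry>/<image>:<tag>" per row.
# Assumes a header row followed by "image,tag" rows; blank lines are skipped.
manifest_to_pulls() {
  local registry="$1"
  awk -F',' -v reg="$registry" 'NR > 1 && NF >= 2 { print reg "/" $1 ":" $2 }'
}

# Hypothetical sample data, only to show the shape of the output.
printf 'image,tag\nonap/aaf/aaf_cm,2.1.8\n' | manifest_to_pulls nexus4.onap.cloud:5000
# prints: nexus4.onap.cloud:5000/onap/aaf/aaf_cm:2.1.8

# Against the real manifest the output could be piped to docker pull, e.g.:
#   curl -s "<manifest url>?h=$BRANCH" | manifest_to_pulls nexus4.onap.cloud:5000 | xargs -n1 docker pull
```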

      If there is a script that we run somewhere that takes this manifest and overrides all the image tags as a values.yaml overlay before deployment, let us know. The current wiki and readthedocs do not mention this.
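
Until that script is documented, one way to see whether the manifest and a values.yaml have drifted apart is a read-only comparison. This is a minimal sketch and not the integration team's override script: it assumes the manifest is a CSV with a header row and image,tag rows, and that values.yaml carries unquoted lines of the form "image: <name>:<tag>"; the function name check_drift is hypothetical.

```shell
#!/usr/bin/env bash
# Report images whose tag in a values.yaml differs from the manifest CSV.
# Assumes "image,tag" rows in the manifest (after a header row) and unquoted
# "image: <name>:<tag>" lines in values.yaml. Images absent from the manifest
# are ignored. Read-only: nothing is modified.
check_drift() {
  local manifest="$1"
  local values="$2"
  awk -F',' '
    # First file (manifest): remember the tag for each image, skipping the header.
    NR == FNR { if (FNR > 1 && NF >= 2) tag[$1] = $2; next }
    # Second file (values.yaml): compare each image line against the manifest.
    /^[ \t]*image:[ \t]*/ {
      line = $0
      sub(/^[ \t]*image:[ \t]*/, "", line)
      n = split(line, parts, ":")
      if (n == 2 && (parts[1] in tag) && tag[parts[1]] != parts[2])
        print parts[1] ": values.yaml=" parts[2] " manifest=" tag[parts[1]]
    }
  ' "$manifest" "$values"
}

# Usage (paths are placeholders):
#   check_drift docker-manifest.csv kubernetes/aaf/values.yaml
```

An empty result means every image found in that values.yaml matches the manifest, which is the condition the prepull step relies on.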

      Current Process for deployment
      https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html
      https://wiki.onap.org/display/DW/Cloud+Native+Deployment#CloudNativeDeployment-Scriptedundercloud(Helm/Kubernetes/Docker)andONAPinstall-clustered

      thanks guys
      /michael

      From: Dominique Deschênes <dominique.deschenes@gcgenicom.com>
      Sent: Thursday, January 3, 2019 2:37 PM
      To: Borislav Glozman <Borislav.Glozman@amdocs.com>; bf1936@att.com; onap-discuss@lists.onap.org
      Cc: Jacques Faucher <jacques.faucher@gcgenicom.com>; Jasmin Audet <jasmin.audet@gcgenicom.com>; Michael O'Brien <Frank.Obrien@amdocs.com>
      Subject: Re[2]: [onap-discuss] OOM ONAP Deployment

      Hi,

      • Are you using the integration-override.yaml file ? No
      • Did you update the image versions in the OOM clone using the script in the integration project ? No and According to Michael O'Brien, he recommends not to change it...

      thanks

      Dominique Deschênes
      Project Engineer, IT Manager
      816, boulevard Guimond, Longueuil J4G 1T5
      450 670-8383 x105 450 670-2259

      ----- Message received -----
      ________________________________________
      From: Brian (bf1936@att.com)
      Date: 02/01/19 10:19
      To: onap-discuss@lists.onap.org, dominique.deschenes@gcgenicom.com, Borislav Glozman (borislav.glozman@amdocs.com)
      Cc: Jasmin Audet (jasmin.audet@gcgenicom.com), Jacques Faucher (jacques.faucher@gcgenicom.com)
      Subject: Re: [onap-discuss] OOM ONAP Deployment

      If it's not up within a few hours, then something in your configuration is wrong: either the images or access to the nexus repository.

      1. Are you using the integration-override.yaml file ?
      2. Did you update the image versions in the OOM clone using the script in the integration project ?

      https://wiki.onap.org/display/DW/OOM+Helm+%28un%29Deploy+plugins?focusedCommentId=48529890#comment-48529890

      See the reference to update-oom-image-versions.sh and the use of integration-override.yaml and public-cloud.yaml

      helm deploy dev local/onap -f /root/oom/kubernetes/onap/resources/environments/public-cloud.yaml -f /root/integration-override.yaml --namespace onap

      https://wiki.onap.org/pages/viewpage.action?pageId=29787124 also has some pointers.

      Brian

      From: onap-discuss@lists.onap.org <onap-discuss@lists.onap.org> On Behalf Of Dominique Deschênes
      Sent: Wednesday, January 02, 2019 9:33 AM
      To: onap-discuss@lists.onap.org; Borislav Glozman <borislav.glozman@amdocs.com>
      Cc: Jasmin Audet <jasmin.audet@gcgenicom.com>; Jacques Faucher <jacques.faucher@gcgenicom.com>
      Subject: Re: [onap-discuss] OOM ONAP Deployment

      hi,

      13 days and we have the same result

      Thanks,

      Dominique Deschênes
      Project Engineer, IT Manager
      816, boulevard Guimond, Longueuil J4G 1T5
      450 670-8383 x105 450 670-2259

      ----- Message received -----
      ________________________________________
      From: Borislav Glozman (Borislav.Glozman@amdocs.com)
      Date: 30/12/18 02:59
      To: onap-discuss@lists.onap.org, dominique.deschenes@gcgenicom.com
      Cc: Jasmin Audet (jasmin.audet@gcgenicom.com)
      Subject: RE: [onap-discuss] OOM ONAP Deployment

      Hi,

      How long did you wait before doing the tests?
      PodInitializing state usually means docker is pulling images, so you may need to wait longer (hopefully by now everything is downloaded…).
      Please get your environment to a state where all pods are Ready, and then try the robot tests.

      Thanks,
      Borislav Glozman
      O:+972.9.776.1988
      M:+972.52.2835726

      Amdocs, a Platinum member of ONAP

      From: onap-discuss@lists.onap.org <onap-discuss@lists.onap.org> On Behalf Of Dominique Deschênes
      Sent: Thursday, December 27, 2018 9:58 PM
      To: onap-discuss@lists.onap.org
      Cc: 'Jasmin Audet' <jasmin.audet@gcgenicom.com>
      Subject: [onap-discuss] OOM ONAP Deployment

      Hello,

      We have some issues in the deployment of ONAP Casablanca.

      1. The setup we have:

      1.1 OpenStack Pike (OS)

      • OpenStack Ansible
      • 1 Controller
      • 4 Compute nodes
      • 3 Ceph storage nodes

      All nodes are Dell R620 servers with 2 Xeon E5-2650v2 and 256GB of RAM.

      1.2 Kubernetes, Rancher

      • Rancher v1.6.22, 1 node on OS, 4 vCPU 4GB RAM.
      • Kubernetes v1.8.3, Docker 17.03.2-ce, 4 nodes on OS, 16 vCPU 64GB RAM each.

      1.3 ONAP Casablanca

      2. The procedure we followed:

      2.1 The k8s environment was setup by following this guide on ReadTheDocs:
      https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_setup_kubernetes_rancher.html

      2.2 ONAP was setup by following this guide on ReadTheDocs:
      https://onap.readthedocs.io/en/latest/submodules/oom.git/docs/oom_quickstart_guide.html#quick-start-label

      2.3 I've attached our version of onap/values.yaml

      3. The results we get:

      3.1 We can access the portal, but most pages return "refused to connect".

      3.1.1 A&AI: aai.api.sparky.simpledemo.onap.org took too long to respond.
      3.1.2 CLI: cli.api.simpledemo.onap.org refused to connect.
      3.1.3 Policy: policy.api.simpledemo.onap.org refused to connect.
      3.1.4 SDC returns the proper User Management page.
      3.1.5 VID: The web page at https://vid.api.simpledemo.onap.org:30200/vid/welcome.htm?cc=1545938381291 might be temporarily down or it may have moved permanently to a new web address.

      3.2 We can run the robot tests, but we get failures.

      3.2.1 ete-k8s.sh portal test is passing.
      3.2.2 ete-k8s.sh health test: 51 critical tests, 7 passed, 44 failed

      3.3 Some containers are stuck in CrashLoopBackOff or in PodInitializing state.

      3.3.1 onap-aai-aai-traversal-b5dc9895d-7gc5f: Readiness probe failed: dial tcp 10.42.68.214:8446: connect: connection refused
      3.3.2 onap-aai-aai-sparky-be-75658695f5-m6kql: Waiting: PodInitializing
      3.3.3 onap-aai-aai-traversal-update-query-data-zfvwb: Waiting: PodInitializing
      3.3.4 onap-aai-aai-746f4ff754-c9b67: Waiting: PodInitializing

      4. Questions:

      4.1 Should we change anything in our onap/values.yaml file?

      4.2 Is there another deployment procedure that can be followed?

      4.3 Are there any known issues that result in ONAP being in a broken state like the one we are getting?

      4.4 Is there any way for us as users to fix these issues in our deployment?

      4.5 What's the most reliable way to get a basic deployment of ONAP up and running?

      Tks.

      Dominique Deschenes
      Project Engineer, IT Manager
      816, boulevard Guimond, Longueuil J4G 1T5
      450 670-8383 x105 450 670-2259

      This email and the information contained herein is proprietary and confidential and subject to the Amdocs Email Terms of Service, which you may review at https://www.amdocs.com/about/email-terms-of-service

      adjust
      https://wiki.onap.org/display/DW/OOM+Helm+%28un%29Deploy+plugins?focusedCommentId=48529890#comment-48529890

    People

      • Assignee: Michael O'Brien
      • Reporter: Michael O'Brien