Logging analytics / LOG-333

OOM vnc-portal container down since 20171114:2300UTC after helm upgrade from 2.3 to 2.6 and Rancher 1.6.11 on same day


      20180224 update: https://gerrit.onap.org/r/#/c/28291/6

       fix: https://gerrit.onap.org/r/#/c/24225/1 and use Rancher 1.6.10 with helm 2.3.0 (not Rancher 1.6.11 with helm 2.6.1)

      docker run -d --restart=unless-stopped -p 8880:8080 rancher/server:stable

       

       Issue:

      The vnc-portal pod started crashing at 2300UTC on 14 Nov

      http://kibana.onap.info:5601/app/kibana#/discover?_g=(refreshInterval:(display:'5%20seconds',pause:!f,section:1,value:5000),time:(from:now-7d,mode:quick,to:now))&_a=(columns:!(_source),index:AV-vxHzpUcDKD8zJ1zbn,interval:auto,query:(query_string:(query:'message:%20%20(%22vnc-portal%22%20AND%20%22CrashLoopBackOff%22%20AND%20%2210m%22)')),sort:!('@timestamp',asc))

       The rest of the system is OK, except for the ELK onap-log elasticsearch regression. The portal healthcheck passes, but it does not cover vnc-portal (we should include it)

      Went through the commits; the only one that went in at that time is https://gerrit.onap.org/r/#/c/23247/ under OOM-420, which forced a helm 2.6 upgrade as part of Rancher 1.6.11

      Gerrit Code Review
      Nov 14 12:32 PM
      Change has been successfully merged by Yury Novitsky

       
       
      But nothing in that change should affect vnc-portal, since it is self-contained

      Turns out that the helm 2.3 upgrade to 2.6 is breaking this.

      The helm server version in Rancher 1.6.11 is 2.6, but in 1.6.10 it is 2.3
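
      One way to confirm which helm versions are in play is to parse the output of `helm version`. The sketch below assumes the Helm 2 output format (lines starting with `Client:` and `Server:` that contain a `SemVer:"..."` field); `helm_semver` is a hypothetical helper name, not part of helm.

```shell
# Hypothetical helper: extract the SemVer string for "Client" or "Server"
# from `helm version` output read on stdin.
# Assumes Helm 2 style lines such as:
#   Server: &version.Version{SemVer:"v2.3.0", GitCommit:"...", GitTreeState:"clean"}
helm_semver() {
  grep "^$1:" | sed -n 's/.*SemVer:"\([^"]*\)".*/\1/p'
}

# Usage (on a host with helm installed):
#   helm version | helm_semver Client
#   helm version | helm_semver Server
```

      If the two values differ (for example a 2.6 client against a 2.3 server), that points at the kind of version mix described above.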

      https://hub.docker.com/r/rancher/server/tags/ 
      v1.6.11 351 MB 3 days ago
      Check versions

      AWS current
      ubuntu@ip-172-31-23-250:~$ docker images | grep vnc
      dorowu/ubuntu-desktop-lxde-vnc                                latest               65906f49c700        2 weeks ago         1.311 GB
      nexus3.onap.org:10001/dorowu/ubuntu-desktop-lxde-vnc          latest               65906f49c700        2 weeks ago         1.311 GB
      
      2 week old internal
      oom/kubernetes/oneclick# docker images | grep vnc
      dorowu/ubuntu-desktop-lxde-vnc latest 65906f49c700 2 weeks ago 1.311 GB
      nexus3.onap.org:10001/dorowu/ubuntu-desktop-lxde-vnc latest 65906f49c700 2 weeks ago 1.311 GB
      dorowu/ubuntu-desktop-lxde-vnc <none> 5f7af1a8a2a4 3 months ago 1.251 GB
      nexus3.onap.org:10001/dorowu/ubuntu-desktop-lxde-vnc <none> 5f7af1a8a2a4 3 months ago 1.251 GB
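
      The comparison above comes down to which image IDs each host holds for the vnc image. A minimal sketch for pulling those IDs out of `docker images` output follows; `image_ids` is a hypothetical helper name, and it assumes the default column layout (repository, tag, image ID).

```shell
# Hypothetical helper: print the unique image IDs (third column) of every
# `docker images` row whose repository name contains the given substring.
image_ids() {
  awk -v pat="$1" 'index($1, pat) { print $3 }' | sort -u
}

# Usage (on each host to compare):
#   docker images | image_ids vnc
```

      Both hosts listed above share 65906f49c700 for the latest tag, so the image itself is the same on the working and failing systems.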

      04:46:22 Basic Portal Health Check | PASS |
      see

      http://jenkins.onap.info/job/oom-cd/102/console 
      04:45:35 3 pending > 2 at the 27th 15 sec interval

      04:45:51 onap-log elasticsearch-6df4f65775-xxszs 0/1 CrashLoopBackOff 5 8m

      04:45:51 onap-portal vnc-portal-5b45665475-7tn9p 0/1 CrashLoopBackOff 6 10m

      04:45:51 3 pending > 2 at the 28th 15 sec interval

      04:45:52 report on non-running containers

      04:45:52 onap-log elasticsearch-6df4f65775-xxszs 0/1 CrashLoopBackOff 5 8m

      04:45:52 onap-portal vnc-portal-5b45665475-7tn9p 0/1 CrashLoopBackOff 6 10m
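
      A report like the one in this log can be approximated by filtering a pod listing. This is a sketch, not the actual CD job script: `non_running_pods` is a hypothetical helper, and it assumes the column order of `kubectl get pods --all-namespaces` (namespace, name, ready, status, restarts, age).

```shell
# Hypothetical helper: read a pod listing on stdin and print every pod that
# is not fully ready (READY x/y with x != y) or whose STATUS is not Running.
# Completed pods and the header row are skipped.
non_running_pods() {
  awk '$1 != "NAMESPACE" && $4 != "Completed" {
         split($3, ready, "/")
         if (ready[1] != ready[2] || $4 != "Running")
           print $1, $2, $3, $4, $5, $6
       }'
}

# Usage (on a host with kubectl configured):
#   kubectl get pods --all-namespaces | non_running_pods
```

      A pod that is Running but 0/1 ready is also reported, which matches how the CD log counts such pods as pending.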

      The last successful run of the pod was on the 14th at 2300UTC (the CD system was down for 4 hours after that, for migration)

      Build #98 (14-Nov-2017 11:00:00 PM)

      http://jenkins.onap.info/job/oom-cd/98/ 
      23:13:23 1 pending > 0 at the 24th 15 sec interval

      23:13:39 onap-aai aai-service-3869033750-jd5wn 0/1 Running 0 7m

      23:13:39 1 pending > 0 at the 25th 15 sec interval

      23:13:56 1 pending > 0 at the 26th 15 sec interval

      23:13:57

      Tried running an older version of the vnc-portal image, from more than 24 days ago

      https://hub.docker.com/r/dorowu/ubuntu-desktop-lxde-vnc/tags/

       

       

            michaelobrien
            Time Spent: 4 hours