  1. Common Controller SDK
  2. CCSDK-1885

CDS Rolling Upgrade Support


      Currently CDS does not fully support rolling upgrades, so upgrading it causes downtime. CDS is currently used by automated systems at Bell, which means it processes requests very often (based on alarms/telemetry).

      We need to be able to upgrade CDS without impacting those external systems.

      Here are the steps required to have this functionality.

       

      1 - OOM will need to be changed to update the value of rollingUpdate.maxUnavailable from 1 to 0. This forces Kubernetes to create a new pod and wait for it to be ready before the current pod is terminated (in the case where replicas is set to 1, which is the default). Note that Kubernetes only accepts maxUnavailable 0 when rollingUpdate.maxSurge is at least 1.

       

      2 - When Kubernetes terminates a pod, it sends a SIGTERM signal to the process with PID 1 in each container. We need to change the Dockerfile and the startService.sh script so that the java process becomes PID 1 in the container (currently startService.sh is PID 1, so the java process does not receive SIGTERM). By default, if the pod has not stopped within 30 seconds (configurable with terminationGracePeriodSeconds in the pod spec; stop_grace_period is the corresponding Docker Compose setting), it is killed with a SIGKILL signal. We probably need to increase this grace period, since requests often take more than 30s to execute.
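      One way to verify this step after the change is a small startup check inside the java process. The sketch below is only an illustration: the class Pid1Check and its methods are hypothetical names, not existing CDS code, and it assumes Java 9 or later for ProcessHandle. The usual way to make java PID 1 is for startService.sh to launch it with the shell's exec builtin, so that java replaces the shell instead of running as its child.

```java
// Hypothetical startup check, not part of CDS. Requires Java 9+ (ProcessHandle).
final class Pid1Check {

    /** PID of this JVM as seen from inside its own PID namespace (the container). */
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    /** True when the JVM itself is PID 1, i.e. the process Kubernetes signals on termination. */
    static boolean isPid1() {
        return currentPid() == 1L;
    }

    /** Human-readable line that could be logged once at startup. */
    static String describe() {
        if (isPid1()) {
            return "java is PID 1 and will receive SIGTERM directly";
        }
        return "java is PID " + currentPid()
                + "; another process is PID 1 and may not forward SIGTERM";
    }
}
```

      Logging Pid1Check.describe() at startup makes it visible in the pod logs whether the Dockerfile and startService.sh change took effect.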

       

      3 - BP needs to handle SIGTERM and wait for in-flight requests to finish before stopping. This part needs further discussion.
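      As input for that discussion, here is a minimal sketch of one possible approach: count in-flight requests, stop admitting new ones when shutdown starts, and let a JVM shutdown hook wait for the count to reach zero. The class InFlightTracker and all of its methods are hypothetical and are not existing BP code; how requests would be wrapped with tryBegin/end depends on the BP request path and is left open. The JVM runs shutdown hooks when it receives SIGTERM, which is why step 2 is a prerequisite.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of CDS/BP: tracks in-flight requests so shutdown can wait for them.
final class InFlightTracker {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicBoolean draining = new AtomicBoolean(false);

    /** Call before processing a request. Returns false once draining has started. */
    boolean tryBegin() {
        // Increment first, then check the flag, so drain() cannot miss a request
        // that was admitted concurrently.
        inFlight.incrementAndGet();
        if (draining.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Call when a request admitted by tryBegin() has finished, successfully or not. */
    void end() {
        inFlight.decrementAndGet();
    }

    int inFlight() {
        return inFlight.get();
    }

    /**
     * Stops admitting requests, then polls until all in-flight requests have
     * finished or the timeout elapses. Returns true if everything drained.
     */
    boolean drain(long timeoutMillis) {
        draining.set(true);
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    /** Registers a shutdown hook that drains before the JVM exits (runs on SIGTERM). */
    void installShutdownHook(long timeoutMillis) {
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> drain(timeoutMillis), "drain-in-flight"));
    }
}
```

      The drain timeout would need to stay below the pod's termination grace period, otherwise the pod is killed with SIGKILL before the wait completes.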

       

            spremont spremont
            spremont spremont