Over the past few days, the clamp container has been observed restarting several times across the different environments. Additional information can be found on the following dashboard:
Additional information from Gary on Aug 7th
Generally speaking, a container restart means that:
- The liveness probe initial delay is too low, and
- The liveness probe, when triggered, is returning the wrong status while the container is still initializing (i.e. not dead).
The default liveness probe in OOM is a TCP probe with a short (10 seconds?) initial delay. This means that if the specified TCP port is not responding 10 seconds after the container starts, the container will be killed and restarted.
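To make the behaviour of a TCP probe concrete, here is a minimal Python sketch of the check it amounts to: try to open a TCP connection and report success or failure. The function name `tcp_probe` and the 1-second timeout are assumptions made for illustration; they are not taken from the OOM charts or from the clamp deployment.

```python
import socket


def tcp_probe(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout_s.

    Hypothetical helper: it mimics what a TCP liveness probe checks,
    namely that something is accepting connections on the port.
    """
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing is accepting connections on the port yet.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

A check of this kind cannot tell a container that is still initializing from one that is dead: in both cases the port is simply closed.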
There are two ways to remedy this:
- Make the initial delay long enough to guarantee that by then the TCP port will be up, keeping in mind that ONAP might be running on arbitrarily slow hardware.
- Change the liveness probe to something else, perhaps to a shell script probe inside the container that can do more sophisticated checks like process status, etc.
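The second option can be sketched as follows. This is only an illustration of the idea, written in Python rather than shell, and it assumes the container ships a Python interpreter and that the PID of the main process is known; the names `process_alive`, `port_open`, `check_container` and `exit_code` are hypothetical. The check reports "dead" only when the main process is gone, and treats a running process with a closed port as "starting".

```python
import os
import socket


def process_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_open(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def check_container(pid: int, host: str, port: int) -> str:
    """Classify the container as 'dead', 'starting' or 'ready'."""
    if not process_alive(pid):
        return "dead"
    if not port_open(host, port):
        return "starting"  # process is running, port not up yet
    return "ready"


def exit_code(status: str) -> int:
    """Map a status to a probe exit code: non-zero only when 'dead'."""
    return 1 if status == "dead" else 0
```

A wrapper script used as a command-based probe could call `sys.exit(exit_code(check_container(pid, host, port)))`, since such a probe is treated as failed when the command exits with a non-zero status. With this mapping, a slow start alone does not trigger a restart.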
The deployments are done daily at midnight Pacific, so anyone actively working on a restart issue should have plenty of time to gather information and try out liveness probe changes.
Please contact Gary Wu if you have any questions.