Type: Task
Resolution: Done
Priority: High
Fix Version/s: Casablanca Release
Affects Version/s: Beijing Release
Labels:
- OOM

Sprint:
CLAMP-R3 Sprint 1

Over the past few days, the CLAMP container has been observed restarting several times across the different environments. More details are available on the following dashboard:

http://onapci.org/grafana/d/kRvfoqKmz/oom-container-restarts?orgId=1
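To see which containers are restarting and why, here is a minimal sketch in Python (an assumption; the issue itself contains no code). It summarizes the JSON that `kubectl get pods -o json` prints by reading the standard Kubernetes `status.containerStatuses[].restartCount` and `lastState.terminated.reason` fields. The function name `restarted_containers` is hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def restarted_containers(pod_list_json: str) -> list[tuple[str, str, int, str]]:
    """Summarize containers that have restarted at least once.

    `pod_list_json` is the text printed by `kubectl get pods -o json`
    (a Kubernetes List object whose `items` array holds Pods).
    Returns (pod, container, restartCount, last termination reason),
    most-restarted first.
    """
    data: dict[str, Any] = json.loads(pod_list_json)
    rows: list[tuple[str, str, int, str]] = []
    for pod in data.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            count = status.get("restartCount", 0)
            if count == 0:
                continue
            # lastState.terminated is only present if a previous run ended.
            reason = (
                status.get("lastState", {}).get("terminated", {}).get("reason", "unknown")
            )
            rows.append((pod_name, status["name"], count, reason))
    # Stable sort: ties keep their original order.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

For example, feeding it the output of `kubectl get pods -n onap -o json` (the `onap` namespace is an assumption about the deployment) would list each restarting container, most-restarted first, alongside the reason Kubernetes recorded for its last termination.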

Additional information from Gary on Aug 7th

Generally speaking, a container restart means that:

- the liveness probe's initial delay is too short, and
- the liveness probe, when triggered, reports failure while the container is still initializing (i.e. it is not actually dead).

The default liveness probe in OOM is a TCP probe with a short (10 seconds?) initial delay. This means that if the specified TCP port is not responding 10s after the container starts, the container will be killed and restarted.
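The timing argument can be checked with a small simulation. This is a minimal Python sketch, not kubelet code: it models the standard Kubernetes probe fields `initialDelaySeconds`, `periodSeconds`, and `failureThreshold` (the Kubernetes defaults for the last two are 10 s and 3), assumes the 10 s initial delay mentioned above, and assumes the port stays open once it comes up. The names `TcpLivenessProbe` and `restart_time` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TcpLivenessProbe:
    initial_delay_s: int = 10   # assumed OOM default ("10 seconds?" above)
    period_s: int = 10          # Kubernetes default periodSeconds
    failure_threshold: int = 3  # Kubernetes default failureThreshold


def restart_time(probe: TcpLivenessProbe, port_up_after_s: int) -> Optional[int]:
    """Seconds after container start at which the container would be killed,
    or None if the port opens before failure_threshold consecutive failures.

    Simplification: once the port is open it is assumed to stay open, so the
    first successful probe means the container is never killed.
    """
    failures = 0
    t = probe.initial_delay_s
    while True:
        if t >= port_up_after_s:
            return None  # probe succeeds; consecutive-failure count resets
        failures += 1
        if failures >= probe.failure_threshold:
            return t  # threshold reached: container is killed and restarted
        t += probe.period_s
```

Under these assumptions, `restart_time(TcpLivenessProbe(), 120)` returns `30`: a container whose port needs 120 s to open fails the probes at 10, 20 and 30 s and is killed, and because each restart starts the clock over it would be killed again on every attempt, which is consistent with the CRASHLOOPBACKOFF state in the duplicate issue. Raising `initial_delay_s` to 180 makes it return `None`, and with `failure_threshold=1` the kill happens at 10 s, matching the description above.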

There are two ways to remedy this:

- Make the initial delay long enough to guarantee that the TCP port will be up by then, keeping in mind that ONAP might be running on arbitrarily slow hardware.
- Replace the TCP probe with a different kind of probe, e.g. a shell script probe inside the container that can perform more sophisticated checks such as process status.
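The second remedy could look roughly like the following. It is a minimal sketch written in Python rather than shell, assuming the probe is wired up as a Kubernetes exec probe (which treats exit code 0 as healthy) and that the container image ships a Python interpreter. The PID, host, port, marker-file approach, and all function names are hypothetical; the point is only to show a probe that tells "still initializing" apart from "dead".

```python
from __future__ import annotations

import os
import socket
from pathlib import Path


def port_accepts(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # connection refused, unreachable, or timed out
        return False


def process_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 checks without sending)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def liveness_check(pid: int, host: str, port: int, marker: str) -> int:
    """Exit code for a hypothetical exec probe: 0 = alive, 1 = restart.

    - Port open: record that startup finished (marker file), report alive.
    - Port closed, never seen open, process running: still initializing.
    - Otherwise (port gone after startup, or process gone): dead.
    """
    if port_accepts(host, port):
        Path(marker).touch()
        return 0
    if not Path(marker).exists() and process_alive(pid):
        return 0
    return 1
```

An entry-point script would call something like `raise SystemExit(liveness_check(pid, "127.0.0.1", port, "/tmp/started"))`, assuming `/tmp` starts empty on every container restart. The trade-off is that a process that hangs during startup without ever opening its port is never restarted by this probe, so the approach suits slow but eventually successful startups, not stuck ones.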

Deployments run daily at midnight Pacific, so anyone actively working on a restart issue should have plenty of time to gather information and try out liveness probe changes.

Please contact Gary Wu with any questions.

Is duplicated by:

CLAMP-207 POD is going in CRASHLOOPBACKOFF state (Closed)


Assignee: sebdet

Reporter: katel34

Votes: 0

Watchers: 3

Created: 07/Aug/18 11:56 AM

Updated: 12/Aug/23 4:14 AM

Resolved: 08/Aug/18 2:54 PM