Bug
Resolution: Done
Medium
New Delhi Release
Description:
We hit a Hazelcast OOM failure on the stability environment. Afterwards, NCMP and Hazelcast apparently did not recover properly, which caused further failures and constantly failing test runs on that environment.
Logs:
Pod that had the first OOM:
- eric-oss-ncmp-6c9f45c4d5-pwjf6_fen28.txt
- eric-oss-ncmp-6c9f45c4d5-pwjf6_feb29.txt
- 02_28_ncmp-04_pwjf6.log
2024-02-28T05:21:51.206Z@eric-oss-ncmp-04@ncmp@Stopping container due to an Error, logger: org.springframework.kafka.listener.KafkaMessageListenerContainer, thread_name: org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1, stack_trace: java.lang.OutOfMemoryError: Java heap space@
Other pod:
- eric-oss-ncmp-6c9f45c4d5-gpxlx_fen28.txt
- eric-oss-ncmp-6c9f45c4d5-gpxlx_feb29.txt
- 02_28_ncmp-05_gpxlx.log
- 02_29_ncmp-05_gpxlx_1.log
- 02_29_ncmp-05_gpxlx_2.log
2024-02-29T09:25:25.466Z@eric-oss-ncmp-05@ncmp@java.lang.OutOfMemoryError: Java heap space
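To locate the first OOM per pod in the attached logs, a small scan like the one below may help. It is only a sketch: it assumes each log record is one line of the form timestamp@pod@service@message, as in the two excerpts above, and it skips anything that does not match (for example stack trace continuation lines). The names parse_line and first_oom_per_pod are made up for this example.

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Optional

OOM_MARKER = "java.lang.OutOfMemoryError"


class LogEntry(NamedTuple):
    timestamp: datetime
    pod: str
    service: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one 'timestamp@pod@service@message' line (assumed format)."""
    parts = line.rstrip("\n").split("@", 3)
    if len(parts) != 4:
        return None
    raw_ts, pod, service, message = parts
    try:
        # Python 3.7+ accepts a trailing 'Z' for %z.
        timestamp = datetime.strptime(raw_ts, "%Y-%m-%dT%H:%M:%S.%f%z")
    except ValueError:
        return None
    return LogEntry(timestamp, pod, service, message)


def first_oom_per_pod(lines: Iterable[str]) -> Dict[str, LogEntry]:
    """Return the earliest OOM entry seen for each pod name."""
    first: Dict[str, LogEntry] = {}
    for line in lines:
        entry = parse_line(line)
        if entry is None or OOM_MARKER not in entry.message:
            continue
        known = first.get(entry.pod)
        if known is None or entry.timestamp < known.timestamp:
            first[entry.pod] = entry
    return first
```

Usage: pass the lines of one or more log files, e.g. first_oom_per_pod(open("02_28_ncmp-04_pwjf6.log")), and compare the timestamps across pods.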
Faulty version:
CPS 3.4.3
Reproduction:
Stability env:
- Since the first failure, NCMP-related TCs have been failing constantly on the stability environment.
- That environment has two big ENMs (30k + 50k) already discovered; PE functional tests (discovery of a small ENM) and a CH load test are run against it.
Manual reproduction:
- EIC deployment with big ENMs discovered (80k, or 30k + 50k)
- Send multiple requests to the searches or id-searches endpoints with model filters
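The manual reproduction could be scripted roughly as follows. This is a sketch under assumptions, not the exact load used on the stability environment: it assumes "model filters" means a module-name condition in the request body (conditionName "hasAllModules"), and the URL, module names, request count and concurrency are placeholders to be adjusted for the deployment under test.

```python
import json
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def build_module_filter_body(module_names: List[str]) -> Dict:
    """Build a search body filtering on module names (assumed body shape)."""
    return {
        "cmHandleQueryParameters": [
            {
                "conditionName": "hasAllModules",
                "conditionParameters": [{"moduleName": n} for n in module_names],
            }
        ]
    }


def post_json(url: str, body: Dict, timeout: float = 60.0) -> int:
    """POST the body as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # drain the (possibly large) result
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx still count as a completed request


def run_load(send: Callable[[str, Dict], int], url: str, body: Dict,
             total_requests: int = 50, concurrency: int = 10) -> List[int]:
    """Fire total_requests calls with at most `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, url, body) for _ in range(total_requests)]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Placeholder host, path and module name; replace with real values.
    target = "http://ncmp.example.invalid/ncmp/v1/ch/searches"
    payload = build_module_filter_body(["example-module"])
    print(run_load(post_json, target, payload))
```

Passing the sender as a parameter keeps run_load usable with a stub, so the concurrency logic can be checked without a live NCMP instance. While the script runs, watch the NCMP pods' heap usage and logs for OutOfMemoryError.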
Expected Behavior:
No OOM errors. If one does occur accidentally, it should be handled properly and the service should recover to a healthy state.