Details
Type: Bug
Status: Closed
Priority: Low
Resolution: Done
Beijing Release, Casablanca Release
None
SB02
Policy Casablanca - 2
Description
platania reported a situation in SB-02 (OOM deployments) that causes policy controllers to stop processing events: ONSET events were received, but none were processed by the PDP-D. Going through the logs and inspecting the source code, the culprit appears to be feature-pooling, which is new in Beijing. It seems that feature-pooling calls controller.stop() when there is an issue with the channel. This leaves the amsterdam controller stopped, and it never comes back to life, which explains what Marco was seeing: ONSETs went to drools-pdp-0 and were discarded, and drools-pdp-1 did not take over. His lab was stuck for many hours until policy was restarted. The implementation has to be reviewed to avoid these drastic side effects, since this is likely to be a fairly common situation in a k8s environment.
Code that stops the controller:
    @Override
    public CountDownLatch internalTopicFailed() {
        logger.error("communication failed for topic {}", topic);
        ..
        // The controller is stopped on a separate thread; per this report,
        // nothing starts it again afterwards.
        new Thread(() -> {
            controller.stop();
            latch.countDown();
        }).start();
        ..
    }
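One possible direction for the review, shown only as a minimal sketch and not as the fix that was applied: still stop the controller when the channel fails, but keep probing the channel and start the controller again once it recovers. The `Controller` class, the `channelHealthy` probe, and the `maxAttempts` and `retryDelayMs` parameters are all hypothetical stand-ins and are not part of feature-pooling. Unlike the excerpt above, the latch here is released only after the recovery loop ends.

```java
import java.util.concurrent.CountDownLatch;
import java.util.function.BooleanSupplier;

public class PoolingFailureSketch {

    /** Hypothetical stand-in for a policy controller; only start/stop/isAlive are modeled. */
    public static class Controller {
        private volatile boolean alive = true;

        public void start() { alive = true; }
        public void stop() { alive = false; }
        public boolean isAlive() { return alive; }
    }

    /**
     * Stops the controller, then probes the channel up to maxAttempts times and
     * restarts the controller as soon as a probe succeeds. The returned latch is
     * released when the recovery loop ends, whether or not the restart happened.
     */
    public static CountDownLatch onTopicFailed(Controller controller,
                                               BooleanSupplier channelHealthy,
                                               int maxAttempts,
                                               long retryDelayMs) {
        CountDownLatch latch = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            controller.stop();
            try {
                for (int attempt = 0; attempt < maxAttempts; attempt++) {
                    if (channelHealthy.getAsBoolean()) {
                        controller.start(); // channel is back: resume event processing
                        break;
                    }
                    Thread.sleep(retryDelayMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // give up quietly, controller stays stopped
            } finally {
                latch.countDown();
            }
        });
        worker.setDaemon(true);
        worker.start();
        return latch;
    }
}
```

If the channel never recovers within `maxAttempts`, the controller stays stopped, which is the same end state as today; the difference is only that a transient failure no longer requires a manual restart.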
State of the amsterdam controller in drools-pdp-0 (note that "alive" is false):
HTTP/1.1 200 OK
Content-Length: 2247
Content-Type: application/json
Date: Thu, 31 May 2018 20:22:51 GMT
Server: Jetty(9.3.20.v20170531)
{
"alive": false,
"drools": {
"alive": false,
"artifactId": "policy-amsterdam-rules",
"brained": true,
"groupId": "org.onap.policy-engine.drools.amsterdam",
"locked": false,
"modelClassLoaderHash": 651347276,
"recentSinkEvents": [],
"recentSourceEvents": [],
"sessionCoordinates": [
"org.onap.policy-engine.drools.amsterdam:policy-amsterdam-rules:0.4.0:closedloop-amsterdam"
],
"sessions": [
"closedloop-amsterdam"
],
"version": "0.4.0"
},
"locked": false,
"name": "amsterdam",
"topicSinks": [
{
"alive": false,
"allowSelfSignedCerts": false,
"apiKey": "",
"apiSecret": "",
"locked": false,
"partitionKey": "ad654ed3-78f1-4fc1-bb41-5a554cbda1a8",
"recentEvents": [],
"servers": [
"message-router"
],
"topic": "APPC-CL",
"topicCommInfrastructure": "UEB",
"useHttps": false
},
{
"alive": false,
"allowSelfSignedCerts": false,
"apiKey": "",
"apiSecret": "",
"locked": false,
"partitionKey": "375fe28b-d575-40ba-9812-e1e8591a6079",
"recentEvents": [],
"servers": [
"message-router"
],
"topic": "APPC-LCM-READ",
"topicCommInfrastructure": "UEB",
"useHttps": false
},
{
"alive": false,
"allowSelfSignedCerts": false,
"apiKey": "",
"apiSecret": "",
"locked": false,
"partitionKey": "be167d60-09e1-40d3-a8e1-894955e293fa",
"recentEvents": [],
"servers": [
"message-router"
],
"topic": "POLICY-CL-MGT",
"topicCommInfrastructure": "UEB",
"useHttps": false
}
],
"topicSources": [
{
"alive": false,
"allowSelfSignedCerts": false,
"apiKey": "",
"apiSecret": "",
"consumerGroup": "dcae.policy.shared",
"consumerInstance": "dev-drools-1",
"fetchLimit": 100,
"fetchTimeout": 15000,
"locked": false,
"recentEvents": [],
"servers": [
"message-router"
],
"topic": "unauthenticated.DCAE_CL_OUTPUT",
"topicCommInfrastructure": "UEB",
"useHttps": false
},
{
"alive": false,
"allowSelfSignedCerts": false,
"apiKey": "",
"apiSecret": "",
"consumerGroup": "c602f0c9-f6dd-48e0-8524-2d560dae8998",
"consumerInstance": "dev-drools-1",
"fetchLimit": 100,
"fetchTimeout": 15000,
"locked": false,
"recentEvents": [],
"servers": [
"message-router"
],
"topic": "APPC-CL",
"topicCommInfrastructure": "UEB",
"useHttps": false
},
{
"alive": false,
"allowSelfSignedCerts": false,
"apiKey": "",
"apiSecret": "",
"consumerGroup": "acadf51f-1519-49cc-b923-551548bc8619",
"consumerInstance": "dev-drools-1",
"fetchLimit": 100,
"fetchTimeout": 15000,
"locked": false,
"recentEvents": [],
"servers": [
"message-router"
],
"topic": "APPC-LCM-WRITE",
"topicCommInfrastructure": "UEB",
"useHttps": false
}
]
}
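A stuck controller like this one could in principle be caught by polling this status response and checking the top-level "alive" flag. The following is a rough sketch under the assumption that the body looks like the one above, with the controller's own "alive" as the first key; the class and method names are made up for illustration, and a real check should use a proper JSON parser rather than a regular expression.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ControllerStatusCheck {

    // Matches the first "alive": true|false pair in the body. In the response
    // shown in this ticket, the first occurrence is the controller's own flag.
    private static final Pattern FIRST_ALIVE =
            Pattern.compile("\"alive\"\\s*:\\s*(true|false)");

    /** Returns the first "alive" value found in the body, or empty if there is none. */
    public static Optional<Boolean> topLevelAlive(String body) {
        Matcher m = FIRST_ALIVE.matcher(body);
        return m.find()
                ? Optional.of(Boolean.parseBoolean(m.group(1)))
                : Optional.empty();
    }
}
```

A caller would treat `Optional.of(false)` as "controller stopped" and an empty result as "status could not be determined".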
Logs (see debug.2018-05-30.0.log at 10.12.5.123):
9991 [2018-05-30T16:17:18.523+00:00|INFO|CambriaConsumerImpl|UEB-source-POOLING] UEB GET /events/POOLING/a94d13ce-8e1f-4056-a7d2-b0c49195abfd/dev-drools-0?timeout=15000&limit=100&filter=%7B%22class%22%3A%22Or%22%2C%22filters%22%3A%5B%7B%22class%22%3A%22Equals%22%2C%22field%22%3A%22channel%22%2C%22value%22%3A%22_admin%22%7D%2C%7B%22class%22%3A%22And%22%2C%22filters%22%3A%5B%7B%22class%22%3A%22Equals%22%2C%22field%22%3A%22channel%22%2C%22value%22%3A%220d67cebb-0b80-41bf-b378-b7f4466c997e%22%7D%2C%7B%22class%22%3A%22Equals%22%2C%22field%22%3A%22timestampMs%22%2C%22value%22%3A%221527696929094%22%7D%5D%7D%5D%7D
9992 [2018-05-30T16:17:18.523+00:00|INFO|HttpClient|UEB-source-POOLING] GET http://message-router:3904/events/POOLING/a94d13ce-8e1f-4056-a7d2-b0c49195abfd/dev-drools-0?timeout=15000&limit=100&filter=%7B%22class%22%3A%22Or%22%2C%22filters%22%3A%5B%7B%22class%22%3A%22Equals%22%2C%22field%22%3A%22channel%22%2C%22value%22%3A%22_admin%22%7D%2C%7B%22class%22%3A%22And%22%2C%22filters%22%3A%5B%7B%22class%22%3A%22Equals%22%2C%22field%22%3A%22channel%22%2C%22value%22%3A%220d67cebb-0b80-41bf-b378-b7f4466c997e%22%7D%2C%7B%22class%22%3A%22Equals%22%2C%22field%22%3A%22timestampMs%22%2C%22value%22%3A%221527696929094%22%7D%5D%7D%5D%7D (anonymous) ...
10216 [2018-05-30T16:17:33.290+00:00|ERROR|PoolingManagerImpl|pool-9-thread-1] communication failed for topic POOLING
10217 [2018-05-30T16:17:33.290+00:00|INFO|AggregatedPolicyController|Thread-51] AggregatedPolicyController [name=amsterdam, alive=true, locked=false, droolsController=NullDroolsController []]: stop
10218 [2018-05-30T16:17:33.291+00:00|INFO|BusConsumer$CambriaConsumerWrapper|pool-9-thread-1] CambriaConsumerWrapper [fetchTimeout=15000]: setting DMAAP server-side filter: {"class":"Or","filters":[{"class":"Equals","field":"channel","value":"_admin"},{"class":"Equals","field":"channel","value":"0d67cebb-0b80-41bf-b378-b7f4466c997e"}]}
10219 [2018-05-30T16:17:33.293+00:00|INFO|State|pool-9-thread-1] entered InactiveState for topic POOLING
10220 [2018-05-30T16:17:33.293+00:00|INFO|BusConsumer$CambriaConsumerWrapper|Thread-51] CambriaConsumerWrapper [fetchTimeout=15000]: setting DMAAP server-side filter: {"class":"Or","filters":[{"class":"Equals","field":"channel","value":"_admin"},{"class":"Equals","field":"channel","value":"0d67cebb-0b80-41bf-b378-b7f4466c997e"}]}
10221 [2018-05-30T16:17:33.296+00:00|INFO|State|Thread-51] entered IdleState for topic POOLING
10222 [2018-05-30T16:17:33.296+00:00|INFO|DmaapManager|Thread-51] stop consuming from topic POOLING
10223 [2018-05-30T16:17:33.296+00:00|INFO|TopicBase|Thread-51] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=a94d13ce-8e1f-4056-a7d2-b0c49195abfd, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=CambriaConsumerWrapper [fetchTimeout=15000], alive=true, locked=false, uebThread=Thread[UEB-source-POOLING,5,main], topicListeners=1, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=POOLING, #recentEvents=10, locked=false, #topicListeners=1]]]: unregistering org.onap.policy.drools.pooling.PoolingManagerImpl@43599640
10224 [2018-05-30T16:17:33.296+00:00|INFO|InlineBusTopicSink|Thread-51] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=a94d13ce-8e1f-4056-a7d2-b0c49195abfd, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=CambriaConsumerWrapper [fetchTimeout=15000], alive=true, locked=false, uebThread=Thread[UEB-source-POOLING,5,main], topicListeners=0, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=POOLING, #recentEvents=10, locked=false, #topicListeners=0]]]: stopping
10225 [2018-05-30T16:17:33.296+00:00|INFO|PoolingManagerImpl|Thread-51] publish Offline to _admin on topic POOLING
10226 [2018-05-30T16:17:33.297+00:00|INFO|TopicBase|Thread-51] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=dcae.policy.shared, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=CambriaConsumerWrapper [fetchTimeout=15000], alive=true, locked=false, uebThread=Thread[UEB-source-unauthenticated.DCAE_CL_OUTPUT,5,main], topicListeners=1, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=unauthenticated.DCAE_CL_OUTPUT, #recentEvents=0, locked=false, #topicListeners=1]]]: unregistering AggregatedPolicyController [name=amsterdam, alive=false, locked=false, droolsController=NullDroolsController []]
10227 [2018-05-30T16:17:33.297+00:00|INFO|InlineBusTopicSink|Thread-51] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=dcae.policy.shared, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=CambriaConsumerWrapper [fetchTimeout=15000], alive=true, locked=false, uebThread=Thread[UEB-source-unauthenticated.DCAE_CL_OUTPUT,5,main], topicListeners=0, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=unauthenticated.DCAE_CL_OUTPUT, #recentEvents=0, locked=false, #topicListeners=0]]]: stopping
10228 [2018-05-30T16:17:33.297+00:00|INFO|TopicBase|Thread-51] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=ce3c4088-f788-45e3-80d9-fa251f064341, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=CambriaConsumerWrapper [fetchTimeout=15000], alive=true, locked=false, uebThread=Thread[UEB-source-APPC-CL,5,main], topicListeners=1, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=APPC-CL, #recentEvents=0, locked=false, #topicListeners=1]]]: unregistering AggregatedPolicyController [name=amsterdam, alive=false, locked=false, droolsController=NullDroolsController []]
10229 [2018-05-30T16:17:33.297+00:00|INFO|InlineBusTopicSink|Thread-51] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=ce3c4088-f788-45e3-80d9-fa251f064341, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=CambriaConsumerWrapper [fetchTimeout=15000], alive=true, locked=false, uebThread=Thread[UEB-source-APPC-CL,5,main], topicListeners=0, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=APPC-CL, #recentEvents=0, locked=false, #topicListeners=0]]]: stopping
10230 [2018-05-30T16:17:33.297+00:00|INFO|TopicBase|Thread-51] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=94d49b6c-6461-49e0-b479-4dd0483b124e, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=CambriaConsumerWrapper [fetchTimeout=15000], alive=true, locked=false, uebThread=Thread[UEB-source-APPC-LCM-WRITE,5,main], topicListeners=1, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=APPC-LCM-WRITE, #recentEvents=0, locked=false, #topicListeners=1]]]: unregistering AggregatedPolicyController [name=amsterdam, alive=false, locked=false, droolsController=NullDroolsController []]
10231 [2018-05-30T16:17:33.297+00:00|INFO|InlineBusTopicSink|Thread-51] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=94d49b6c-6461-49e0-b479-4dd0483b124e, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=CambriaConsumerWrapper [fetchTimeout=15000], alive=true, locked=false, uebThread=Thread[UEB-source-APPC-LCM-WRITE,5,main], topicListeners=0, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=APPC-LCM-WRITE, #recentEvents=0, locked=false, #topicListeners=0]]]: stopping
10232 [2018-05-30T16:17:33.672+00:00|INFO|HttpClient|UEB-source-POOLING] --> HTTP/1.1 200 OK
10233 [2018-05-30T16:17:33.672+00:00|INFO|InlineBusTopicSink|UEB-source-POOLING] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=a94d13ce-8e1f-4056-a7d2-b0c49195abfd, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=null, alive=false, locked=false, uebThread=Thread[UEB-source-POOLING,5,main], topicListeners=0, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=POOLING, #recentEvents=10, locked=false, #topicListeners=0]]]: exiting thread
10235 [2018-05-30T16:17:34.309+00:00|WARN|HostSelector|pool-4-thread-1] All hosts were blacklisted; reverting to full set of hosts.
10236 [2018-05-30T16:17:34.309+00:00|INFO|HttpClient|pool-4-thread-1] POST http://message-router:3904/events/POOLING (anonymous) ...
10237 [2018-05-30T16:17:36.298+00:00|INFO|DmaapManager|Thread-51] stop publishing to topic POOLING
10238 [2018-05-30T16:17:37.960+00:00|INFO|InlineBusTopicSink|UEB-source-unauthenticated.DCAE_CL_OUTPUT] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=dcae.policy.shared, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=null, alive=false, locked=false, uebThread=Thread[UEB-source-unauthenticated.DCAE_CL_OUTPUT,5,main], topicListeners=0, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=unauthenticated.DCAE_CL_OUTPUT, #recentEvents=0, locked=false, #topicListeners=0]]]: exiting thread
10239 [2018-05-30T16:17:38.020+00:00|INFO|InlineBusTopicSink|UEB-source-APPC-CL] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=ce3c4088-f788-45e3-80d9-fa251f064341, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=null, alive=false, locked=false, uebThread=Thread[UEB-source-APPC-CL,5,main], topicListeners=0, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=APPC-CL, #recentEvents=0, locked=false, #topicListeners=0]]]: exiting thread
10240 [2018-05-30T16:17:38.028+00:00|INFO|InlineBusTopicSink|UEB-source-APPC-LCM-WRITE] SingleThreadedUebTopicSource [getTopicCommInfrastructure()=UEB, toString()=SingleThreadedBusTopicSource [consumerGroup=94d49b6c-6461-49e0-b479-4dd0483b124e, consumerInstance=dev-drools-0, fetchTimeout=15000, fetchLimit=100, consumer=null, alive=false, locked=false, uebThread=Thread[UEB-source-APPC-LCM-WRITE,5,main], topicListeners=0, toString()=BusTopicBase [apiKey=, apiSecret=, useHttps=false, allowSelfSignedCerts=false, toString()=TopicBase servers=[message-router], topic=APPC-LCM-WRITE, #recentEvents=0, locked=false, #topicListeners=0]]]: exiting thread
10241 [2018-05-30T16:17:43.820+00:00|INFO|HttpClient|UEB-source-PDPD-CONFIGURATION] --> HTTP/1.1 200 OK
10242 [2018-05-30T16:17:43.820+00:00|INFO|CambriaConsumerImpl|UEB-source-PDPD-CONFIGURATION] UEB GET /events/PDPD-CONFIGURATION/52035527-511e-4385-9ba6-dacf63d6b335/dev-drools-0?timeout=15000&limit=100
10243 [2018-05-30T16:17:43.820+00:00|INFO|HttpClient|UEB-source-PDPD-CONFIGURATION] GET http://message-router:3904/events/PDPD-CONFIGURATION/52035527-511e-4385-9ba6-dacf63d6b335/dev-drools-0?timeout=15000&limit=100 (anonymous) ...
…
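The sequence in the excerpt is easier to follow when entries are grouped by thread: pool-9-thread-1 reports the communication failure, and Thread-51 then performs the controller stop. A small sketch for splitting such lines into fields, assuming the `[timestamp|LEVEL|logger|thread] message` layout seen above with an optional leading line number; the `PdpLogLine` class is hypothetical and not part of the policy code base.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdpLogLine {

    // Optional leading line number, then [timestamp|level|logger|thread] and the message.
    private static final Pattern LINE = Pattern.compile(
            "^(?:\\d+\\s+)?\\[([^|\\]]+)\\|([^|\\]]+)\\|([^|\\]]+)\\|([^\\]]+)\\]\\s?(.*)$",
            Pattern.DOTALL);

    public final String timestamp;
    public final String level;
    public final String logger;
    public final String thread;
    public final String message;

    private PdpLogLine(Matcher m) {
        this.timestamp = m.group(1);
        this.level = m.group(2);
        this.logger = m.group(3);
        this.thread = m.group(4);
        this.message = m.group(5);
    }

    /** Parses one log line; returns empty if the line does not match the layout. */
    public static Optional<PdpLogLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? Optional.of(new PdpLogLine(m)) : Optional.empty();
    }
}
```

Filtering the parsed entries on `level.equals("ERROR")` or on `thread.equals("Thread-51")` isolates the failure report and the stop sequence, respectively.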