- Bug
- Resolution: Done
- Medium
- New Delhi Release
- None
Description:
If Strimzi Kafka is used with authorization enabled and authorization of the Kafka consumer fails, the Kafka consumer in NCMP stops and is restarted only by a pod restart.
Any temporary Kafka bootstrap issue can cause a similar error.
By default, the Spring Kafka consumer container does not retry on authorization/authentication failures, so it stops the consumer container. As a result, the NCMP application keeps running but its Kafka consumers stay stopped until the next pod restart.
This can happen especially during deployment if Kafka starts together with NCMP, but any later Kafka bootstrap failure can have a similar effect.
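The difference between stopping and retrying can be illustrated with a small stand-alone model. This is only a simplified sketch of the behaviour described above, not Spring Kafka's actual implementation: the class `AuthRetrySketch`, the `Poller` interface, the `AuthFailure` exception and the helper `brokerDenyingFirst` are all hypothetical names introduced for illustration.

```java
import java.time.Duration;

/** Simplified model of a listener container's poll loop (not Spring Kafka code). */
final class AuthRetrySketch {

    /** Hypothetical stand-in for Kafka's authorization/authentication exceptions. */
    static final class AuthFailure extends RuntimeException {
        AuthFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a consumer poll; throws AuthFailure while access is denied. */
    interface Poller {
        void poll();
    }

    /**
     * Runs at most maxPolls polls and returns the number of successful ones.
     * A null retryInterval models the default: the first auth failure ends the loop.
     */
    static int run(Poller poller, Duration retryInterval, int maxPolls) {
        int successes = 0;
        for (int i = 0; i < maxPolls; i++) {
            try {
                poller.poll();
                successes++;
            } catch (AuthFailure e) {
                if (retryInterval == null) {
                    // No retry interval set: the loop ends and the consumer stays stopped.
                    break;
                }
                try {
                    // Retry interval set: wait, then poll again.
                    Thread.sleep(retryInterval.toMillis());
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return successes;
    }

    /** Simulated broker that denies the first `denied` polls and accepts all later ones. */
    static Poller brokerDenyingFirst(int denied) {
        int[] calls = {0};
        return () -> {
            if (calls[0]++ < denied) {
                throw new AuthFailure("Not authorized to access topics");
            }
        };
    }

    public static void main(String[] args) {
        // Without a retry interval the consumer never recovers.
        System.out.println(run(brokerDenyingFirst(2), null, 5));
        // With a retry interval it recovers once access is granted.
        System.out.println(run(brokerDenyingFirst(2), Duration.ofMillis(10), 5));
    }
}
```

In this model the first call returns 0 successful polls because the loop stops at the first failure, while the second call returns 3 because the two denied polls are retried.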
Logs:
The logs below show the Kafka consumer stopping on an authorization exception because no retry interval is set:
{"version":"1.2.0","timestamp":"2024-02-05T10:40:36.673+0000","severity":"warning","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-async-rest-request-event-group-3, groupId=ncmp-async-rest-request-event-group] Error while fetching metadata with correlation id 2 : {ncmp-async-m2m=TOPIC_AUTHORIZATION_FAILED}","extra_data":{"logger":"org.apache.kafka.clients.NetworkClient","thread_info":{"thread_name":"org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1"}},"metadata":{"pod_name":"ncmp-865785d685-b6rg9","pod_uid":"8376c34a-ecf9-4501-ba38-3c1b431d5d5c","container_name":"ncmp","node_name":"node-10-63-135-214","namespace":"testtopic"}}
{"version":"1.2.0","timestamp":"2024-02-05T10:40:36.673+0000","severity":"error","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-async-rest-request-event-group-3, groupId=ncmp-async-rest-request-event-group] Topic authorization failed for topics [ncmp-async-m2m]","extra_data":{"logger":"org.apache.kafka.clients.Metadata","thread_info":{"thread_name":"org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1"}},"metadata":{"pod_name":"ncmp-865785d685-b6rg9","pod_uid":"8376c34a-ecf9-4501-ba38-3c1b431d5d5c","container_name":"ncmp","node_name":"node-10-63-135-214","namespace":"testtopic"}}
{"version":"1.2.0","timestamp":"2024-02-05T10:40:36.674+0000","severity":"error","service_id":"ncmp","message":"Authentication/Authorization Exception and no authExceptionRetryInterval set","extra_data":{"logger":"org.springframework.kafka.listener.KafkaMessageListenerContainer","thread_info":{"thread_name":"org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1"},"exception":{"stack_trace":"org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [ncmp-async-m2m]\n"}},"metadata":{"pod_name":"ncmp-865785d685-b6rg9","pod_uid":"8376c34a-ecf9-4501-ba38-3c1b431d5d5c","container_name":"ncmp","node_name":"node-10-63-135-214","namespace":"testtopic"}}
Faulty version:
CPS 3.4.1
Reproduction:
- Set authorization but turn off anonymous access in the Strimzi Kafka resource (found under kafkas.kafka.strimzi.io):
  spec:
    kafka:
      authorization:
        superUsers:
          - nonexistinguser
        type: simple
- Restart the NCMP pod. The consumer stops.
- Enable anonymous access in the Strimzi Kafka resource (found under kafkas.kafka.strimzi.io):
  spec:
    kafka:
      authorization:
        superUsers:
          - ANONYMOUS
        type: simple
- The Kafka consumers in NCMP do not recover, and the consumer-related functionality (such as async requests) does not work.
Expected behavior:
The Kafka consumer container should retry on these temporary Kafka auth failures.
In org.onap.cps.ncmp.api.impl.config.kafka.KafkaConfig.java, an auth retry interval can be added wherever a containerFactory is created:
containerFactory.getContainerProperties().setAuthExceptionRetryInterval(Duration.ofSeconds(10L));
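For context, a minimal sketch of how that line might sit inside a container factory bean. This is a configuration fragment under stated assumptions, not the actual KafkaConfig code: the class name `AuthRetryKafkaConfigSketch`, the bean name `exampleListenerContainerFactory` and the `String` key/value types are hypothetical, and it needs Spring for Apache Kafka on the classpath.

```java
import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

// Hypothetical configuration class; the real KafkaConfig may define several factories.
@Configuration
class AuthRetryKafkaConfigSketch {

    // Hypothetical bean name and String/String types, chosen only for illustration.
    @Bean
    ConcurrentKafkaListenerContainerFactory<String, String> exampleListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> containerFactory =
                new ConcurrentKafkaListenerContainerFactory<>();
        containerFactory.setConsumerFactory(consumerFactory);
        // Retry every 10 seconds on authentication/authorization failures instead of stopping.
        containerFactory.getContainerProperties()
                .setAuthExceptionRetryInterval(Duration.ofSeconds(10L));
        return containerFactory;
    }
}
```

The same call would have to be repeated for each container factory, since the retry interval is a property of each factory's container properties.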
With this correction, retries continue until authorization succeeds:
{"version":"1.2.0","timestamp":"2024-02-05T10:31:05.040+0000","severity":"warning","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-data-operation-event-group-5, groupId=ncmp-data-operation-event-group] Error while fetching metadata with correlation id 101385 : {ncmp-async-m2m=TOPIC_AUTHORIZATION_FAILED}","extra_data":{"logger":"org.apache.kafka.clients.NetworkClient","thread_info":{"thread_name":"kafka-coordinator-heartbeat-thread | ncmp-data-operation-event-group"}},"metadata":{"pod_name":"ncmp-55b555bf96-q8svg","pod_uid":"31f768ae-2449-41d9-a799-547917612e24","container_name":"ncmp","node_name":"node-10-63-135-219","namespace":"testnamespace"}}
{"version":"1.2.0","timestamp":"2024-02-05T10:31:05.040+0000","severity":"error","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-data-operation-event-group-5, groupId=ncmp-data-operation-event-group] Topic authorization failed for topics [ncmp-async-m2m]","extra_data":{"logger":"org.apache.kafka.clients.Metadata","thread_info":{"thread_name":"kafka-coordinator-heartbeat-thread | ncmp-data-operation-event-group"}},"metadata":{"pod_name":"ncmp-55b555bf96-q8svg","pod_uid":"31f768ae-2449-41d9-a799-547917612e24","container_name":"ncmp","node_name":"node-10-63-135-219","namespace":"testnamespace"}}
{"version":"1.2.0","timestamp":"2024-02-05T10:31:05.041+0000","severity":"error","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-data-operation-event-group-5, groupId=ncmp-data-operation-event-group] Heartbeat thread failed due to unexpected error","extra_data":{"logger":"org.apache.kafka.clients.consumer.internals.ConsumerCoordinator","thread_info":{"thread_name":"kafka-coordinator-heartbeat-thread | ncmp-data-operation-event-group"},"exception":{"stack_trace":"org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [ncmp-async-m2m]\n"}},"metadata":{"pod_name":"ncmp-55b555bf96-q8svg","pod_uid":"31f768ae-2449-41d9-a799-547917612e24","container_name":"ncmp","node_name":"node-10-63-135-219","namespace":"testnamespace"}}
{"version":"1.2.0","timestamp":"2024-02-05T10:31:14.855+0000","severity":"error","service_id":"ncmp","message":"Authentication/Authorization Exception, retrying in 10000 ms","extra_data":{"logger":"org.springframework.kafka.listener.KafkaMessageListenerContainer","thread_info":{"thread_name":"org.springframework.kafka.KafkaListenerEndpointContainer#1-0-C-1"},"exception":{"stack_trace":"org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [ncmp-async-m2m]\n"}},"metadata":{"pod_name":"ncmp-55b555bf96-q8svg","pod_uid":"31f768ae-2449-41d9-a799-547917612e24","container_name":"ncmp","node_name":"node-10-63-135-219","namespace":"testnamespace"}}