  1. Configuration Persistence Service
  2. CPS-2070

No retry on Kafka consumer authorization failure: the consumer stops and can only be restarted by restarting the pod


    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • New Delhi Release
    • New Delhi Release
    • NCMP
    • None

      Description:
      If Strimzi Kafka is used with authorization enabled and authorization of the Kafka consumer fails, the Kafka consumer in NCMP stops and is only restarted by a pod restart.
      Any temporary Kafka bootstrap issue can cause a similar error.
      By default, the Spring Kafka consumer container does not retry on authorization/authentication failures, so it stops the consumer container. As a result, the NCMP application keeps running but its Kafka consumers remain stopped until the next pod restart.
      This is most likely during deployment, when Kafka starts together with NCMP, but any later Kafka bootstrap failure can have the same effect.
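
      The difference between stopping and retrying can be modelled in plain Java. This is only an illustrative sketch and not Spring Kafka code: the class AuthRetryModel, its run method and the simulated broker are hypothetical names introduced here, and a null interval stands in for "no auth retry interval set".

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Plain-Java model of how a consumer loop could react to auth failures (hypothetical, not Spring Kafka). */
public class AuthRetryModel {

    /**
     * Checks authorization until it succeeds, the consumer stops, or maxPolls is reached.
     * A null authRetryInterval models the default: stop on the first auth failure.
     * Returns the events a log might show.
     */
    public static List<String> run(BooleanSupplier authorized, Duration authRetryInterval, int maxPolls) {
        List<String> events = new ArrayList<>();
        for (int poll = 1; poll <= maxPolls; poll++) {
            if (authorized.getAsBoolean()) {
                events.add("poll " + poll + ": consuming");
                return events;
            }
            if (authRetryInterval == null) {
                // No retry configured: the loop ends and only a restart brings it back.
                events.add("poll " + poll + ": auth failed, no retry interval set, consumer stopped");
                return events;
            }
            events.add("poll " + poll + ": auth failed, retrying in " + authRetryInterval.toMillis() + " ms");
            try {
                Thread.sleep(authRetryInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                events.add("interrupted while waiting to retry");
                return events;
            }
        }
        events.add("gave up after " + maxPolls + " polls (limit of this model only)");
        return events;
    }

    public static void main(String[] args) {
        // Simulated broker: authorization fails twice, then succeeds from the third check on.
        int[] checks = {0};
        BooleanSupplier fixedLater = () -> ++checks[0] >= 3;

        run(fixedLater, Duration.ofMillis(10), 10).forEach(System.out::println);

        checks[0] = 0;
        run(fixedLater, null, 10).forEach(System.out::println);
    }
}
```

      With a retry interval the model reaches the consuming state once the simulated broker starts authorizing; without one it stops on the first failure, even though the broker would have recovered.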

      Logs:
      The logs below show the Kafka consumer stopping on an authorization exception because no retry interval is set:

      {"version":"1.2.0","timestamp":"2024-02-05T10:40:36.673+0000","severity":"warning","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-async-rest-request-event-group-3, groupId=ncmp-async-rest-request-event-group] Error while fetching metadata with correlation id 2 : {ncmp-async-m2m=TOPIC_AUTHORIZATION_FAILED}","extra_data":{"logger":"org.apache.kafka.clients.NetworkClient","thread_info":{"thread_name":"org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1"}},"metadata":{"pod_name":"ncmp-865785d685-b6rg9","pod_uid":"8376c34a-ecf9-4501-ba38-3c1b431d5d5c","container_name":"ncmp","node_name":"node-10-63-135-214","namespace":"testtopic"}}
      {"version":"1.2.0","timestamp":"2024-02-05T10:40:36.673+0000","severity":"error","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-async-rest-request-event-group-3, groupId=ncmp-async-rest-request-event-group] Topic authorization failed for topics [ncmp-async-m2m]","extra_data":{"logger":"org.apache.kafka.clients.Metadata","thread_info":{"thread_name":"org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1"}},"metadata":{"pod_name":"ncmp-865785d685-b6rg9","pod_uid":"8376c34a-ecf9-4501-ba38-3c1b431d5d5c","container_name":"ncmp","node_name":"node-10-63-135-214","namespace":"testtopic"}}
      {"version":"1.2.0","timestamp":"2024-02-05T10:40:36.674+0000","severity":"error","service_id":"ncmp","message":"Authentication/Authorization Exception and no authExceptionRetryInterval set","extra_data":{"logger":"org.springframework.kafka.listener.KafkaMessageListenerContainer","thread_info":{"thread_name":"org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1"},"exception":{"stack_trace":"org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [ncmp-async-m2m]\n"}},"metadata":{"pod_name":"ncmp-865785d685-b6rg9","pod_uid":"8376c34a-ecf9-4501-ba38-3c1b431d5d5c","container_name":"ncmp","node_name":"node-10-63-135-214","namespace":"testtopic"}}

       

      Faulty version
      CPS 3.4.1

       

      Reproduction:

      • Enable authorization without anonymous access in the Strimzi Kafka resource (found under kafkas.kafka.strimzi.io):
        spec:
          kafka:
            authorization:
              superUsers:
              - nonexistinguser
              type: simple
      • Restart the NCMP pod. The Kafka consumer stops.
      • Enable anonymous access in the Strimzi Kafka resource (found under kafkas.kafka.strimzi.io):
        spec:
          kafka:
            authorization:
              superUsers:
              - ANONYMOUS
              type: simple
      • The Kafka consumers in NCMP do not recover, and consumer-related functionality (such as async requests) does not work.
             

      Expected behavior:
      The Kafka consumer container should retry on these temporary Kafka auth failures.
      In org.onap.cps.ncmp.api.impl.config.kafka.KafkaConfig.java, an auth retry interval can be set wherever a containerFactory is created:
              containerFactory.getContainerProperties().setAuthExceptionRetryInterval(Duration.ofSeconds(10L));
              
      With this correction, the consumer retries until authorization succeeds:

      {"version":"1.2.0","timestamp":"2024-02-05T10:31:05.040+0000","severity":"warning","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-data-operation-event-group-5, groupId=ncmp-data-operation-event-group] Error while fetching metadata with correlation id 101385 : {ncmp-async-m2m=TOPIC_AUTHORIZATION_FAILED}","extra_data":{"logger":"org.apache.kafka.clients.NetworkClient","thread_info":{"thread_name":"kafka-coordinator-heartbeat-thread | ncmp-data-operation-event-group"}},"metadata":{"pod_name":"ncmp-55b555bf96-q8svg","pod_uid":"31f768ae-2449-41d9-a799-547917612e24","container_name":"ncmp","node_name":"node-10-63-135-219","namespace":"testnamespace"}}
      {"version":"1.2.0","timestamp":"2024-02-05T10:31:05.040+0000","severity":"error","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-data-operation-event-group-5, groupId=ncmp-data-operation-event-group] Topic authorization failed for topics [ncmp-async-m2m]","extra_data":{"logger":"org.apache.kafka.clients.Metadata","thread_info":{"thread_name":"kafka-coordinator-heartbeat-thread | ncmp-data-operation-event-group"}},"metadata":{"pod_name":"ncmp-55b555bf96-q8svg","pod_uid":"31f768ae-2449-41d9-a799-547917612e24","container_name":"ncmp","node_name":"node-10-63-135-219","namespace":"testnamespace"}}
      {"version":"1.2.0","timestamp":"2024-02-05T10:31:05.041+0000","severity":"error","service_id":"ncmp","message":"[Consumer clientId=consumer-ncmp-data-operation-event-group-5, groupId=ncmp-data-operation-event-group] Heartbeat thread failed due to unexpected error","extra_data":{"logger":"org.apache.kafka.clients.consumer.internals.ConsumerCoordinator","thread_info":{"thread_name":"kafka-coordinator-heartbeat-thread | ncmp-data-operation-event-group"},"exception":{"stack_trace":"org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [ncmp-async-m2m]\n"}},"metadata":{"pod_name":"ncmp-55b555bf96-q8svg","pod_uid":"31f768ae-2449-41d9-a799-547917612e24","container_name":"ncmp","node_name":"node-10-63-135-219","namespace":"testnamespace"}}
      {"version":"1.2.0","timestamp":"2024-02-05T10:31:14.855+0000","severity":"error","service_id":"ncmp","message":"Authentication/Authorization Exception, retrying in 10000 ms","extra_data":{"logger":"org.springframework.kafka.listener.KafkaMessageListenerContainer","thread_info":{"thread_name":"org.springframework.kafka.KafkaListenerEndpointContainer#1-0-C-1"},"exception":{"stack_trace":"org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [ncmp-async-m2m]\n"}},"metadata":{"pod_name":"ncmp-55b555bf96-q8svg","pod_uid":"31f768ae-2449-41d9-a799-547917612e24","container_name":"ncmp","node_name":"node-10-63-135-219","namespace":"testnamespace"}}

            mpriyank Priyank Maheshwari
            csaba.eder csaba Eder