Configuration Persistence Service / CPS-2186

CPS NCMP: Async task failed, no error in topic and no information about failed cmhandles


    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • New Delhi Release
    • New Delhi Release
    • NCMP
    • None

      Description
      We had issues with async calls: the request was not forwarded to the dmi plugin and "async task failed" errors appeared in the ncmp logs.
      We aligned notification.executor.time-out-value-in-ms with the ncmp.dmi.httpclient.connectionTimeoutInSeconds and spring.datasource.hikari.connectionTimeout values to avoid async task execution errors, but there can still be cases where an error occurs and we get only a basic error printout.
      If async task execution fails in CpsNcmpTaskExecutor, there is only an error printout. No error is sent to the topic and no details are logged, so we do not know exactly which async task (with which cmhandles) failed. The dmi plugin has no information about the failed request either, because the request never arrived there. Without detailed information, clients cannot handle the failed requests and resend them if needed.

      Based on the CPS NCMP code, several things happen while an async request is handled:
      The actual request to the dmi plugin is sent in DmiDataOperations.sendDataOperationRequestToDmiService. On any dmi plugin communication failure or request timeout, a 102/103 error is written to the Kafka topic, which is fine.
      There is also a DB call to check for READY cmhandles. Any DB exception raised there is caught only in CpsNcmpTaskExecutor during async execution.
      Any other code fault can cause an exception that is handled only in CpsNcmpTaskExecutor during async execution.
      An async executor timeout can also still occur here.
      In the last three cases there won't be any error response in the topic and no detailed information in the logs.
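The failure paths listed above can be illustrated with a minimal, self-contained sketch. It assumes the executor is built on a CompletableFuture with a timeout, which is an assumption for illustration and not a copy of the CpsNcmpTaskExecutor code; the class and method names (AsyncFailureSketch, runAndDescribe, describe) are hypothetical. It shows that a task exception (such as a DB failure) and an executor timeout both surface only in the completion handler, which by itself knows nothing about the request or its cmhandles.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch, not the CPS NCMP implementation.
class AsyncFailureSketch {

    // Runs the task asynchronously with a timeout and reports how it ended.
    static String runAndDescribe(Supplier<Object> task, long timeoutInMs) {
        CompletableFuture<String> outcome = CompletableFuture
                .supplyAsync(task)
                .orTimeout(timeoutInMs, TimeUnit.MILLISECONDS)
                // The handler only receives a result or a throwable:
                // no request id and no cmhandle ids are available here.
                .handle((result, throwable) -> describe(throwable));
        return outcome.join();
    }

    static String describe(Throwable throwable) {
        if (throwable == null) {
            return "completed";
        }
        // Exceptions thrown inside the task arrive wrapped in a CompletionException.
        Throwable cause = (throwable instanceof CompletionException && throwable.getCause() != null)
                ? throwable.getCause()
                : throwable;
        if (cause instanceof TimeoutException) {
            return "executor timeout";
        }
        return "task exception: " + cause.getMessage();
    }
}
```

For example, a task that throws IllegalStateException("db down") is reported as "task exception: db down", and a task that outlives a very small timeout is reported as "executor timeout". In both cases the handler would need extra context passed in to say which request failed.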

      Affected version:
      CPS NCMP 3.4.6, 3.4.7

      Reproduction

      • Unreachable Database
        • Restart postgres
        • Send in an async request
        • DB connection errors appear in the logs, an "Async task failed" error message appears in the logs, and no error event is sent to the async response topic
      • Async executor timeout
        • Set notification.executor.time-out-value-in-ms to a really low value like 1ms
        • Send in an async request
        • An "Async task failed" error message appears in the logs, and no error event is sent to the async response topic

      Expected behavior
      Send an error to the topic if an exception is handled in the CpsNcmpTaskExecutor async task executor.
      Preferably use code 102, because clients retry on that code, though the right code depends on the nature of the exception.
      If this cannot be solved at all, we need at least detailed error logs.
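One possible shape of the expected behavior is sketched below, assuming Java 16 or later. It is only an illustration: ErrorEvent, executeTask and the errorPublisher callback are hypothetical names, the real NCMP event schema and Kafka producer are not shown in this issue, and using 102 for every failure is a simplification of the "depends on the nature of the exception" remark above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch, not the CPS NCMP implementation.
class AsyncErrorReportingSketch {

    // Hypothetical error event carrying the context a client needs to retry.
    record ErrorEvent(String requestId, List<String> cmHandleIds, String statusCode, String details) { }

    // Assumption: 102 is used for all failures because clients retry on it.
    static final String RETRYABLE_STATUS_CODE = "102";

    // Runs the task with a timeout; on any failure, publishes an error event
    // that names the request and the affected cmhandles.
    static CompletableFuture<Void> executeTask(Supplier<Object> task,
                                               long timeoutInMs,
                                               String requestId,
                                               List<String> cmHandleIds,
                                               Consumer<ErrorEvent> errorPublisher) {
        return CompletableFuture
                .supplyAsync(task)
                .orTimeout(timeoutInMs, TimeUnit.MILLISECONDS)
                .<Void>handle((result, throwable) -> {
                    if (throwable != null) {
                        // Unwrap the CompletionException added for task exceptions.
                        Throwable cause = (throwable instanceof CompletionException
                                && throwable.getCause() != null) ? throwable.getCause() : throwable;
                        errorPublisher.accept(new ErrorEvent(
                                requestId, List.copyOf(cmHandleIds), RETRYABLE_STATUS_CODE, cause.toString()));
                    }
                    return null;
                });
    }
}
```

In this sketch the publisher is a plain callback, so a test can pass `events::add` on an in-memory list, while a real service would hand in something that writes to the async response topic. The same handler is also a natural place for the detailed error log asked for as a fallback.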

            danielhanrahan Daniel Hanrahan
            csaba.eder csaba Eder