Conquest Delivery Errors

  • Hi Marcel, we've been having a lot of intermittent "Forward failed to connect" errors for quite some time, with no apparent rhyme or reason, and it only seems to be getting worse. We archive several days' worth of logs (serverstatus.log, PacsTrouble, etc.). After grepping these files on 3 of our Conquest servers (all running CentOS and Conquest 1417e), we can tally how often and on which days we've seen errors over the last several days (see below). Sometimes these failed forwards just need a reboot and everything magically works; other times our only option is to manually delete the potentially faulty file. But we often don't know what's faulty about it, because the logs don't return detailed enough information.


    We're wondering a couple things.

    1) Is there a way to get better logging for when image-sends fail? We have logging set to debug, but we're not getting an actual reason behind the "forward failed to connect..." message.

    2) Do you think upgrading to Conquest 1419b will either

    a) fix some of these failure instances,

    b) and/or give us better logging?

    3) Do you have any other recommendations?


    We're also seeing that sometimes images are successfully delivered to Conquest and then no further movement, errors, or logging occurs for those images... We're not sure what to make of this. I'll post the config below as well.


    Errors | forward failed to connect, HERE.



    Our dicom.ini file, posted separately.

  • dicom.ini file, HERE. (posted separately from post above)

  • I'm working with kakarr0t on this issue. Our biggest question right now revolves around the fact that stopping and starting the process results in successful deliveries. It's as if the ExportConverter thread itself becomes unstable. We have observed this issue on several export converters delivering to several different destinations. I captured some packets but did not find any answers in the TCP traffic itself.


    When this issue begins, the EC will begin processing data, then instantly fail delivery. It remains in that state until the dgate process is restarted. If we begin deleting data from the EC queue, the next data all fails as well. If we stop/start, everything is retried and stored very rapidly.

  • We upgraded yesterday actually. Unfortunately that did not resolve the issue.


    /edit: we have also now added an additional vCPU to the guest in case this was somehow related to processing power.


    /edit2: Marcel - I am reading in other threads about how you recommend setting up a forwarder, and I'm unclear on the best way to do so. We have settings similar to what you have outlined here Exporter Failures, but we sometimes see images missing from series that then need to be redelivered. Speed is a big factor in what we do, so adding in a delay sounds a little scary... but if we can't rely on the ECs to send everything without one, that would be good to know. For a group using ConQuest as a router, how would you recommend we develop our import/export converters to ensure 100% store accuracy?


    /edit3: we are running on Linux just as an fyi
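
    For context, the kind of setup being discussed is a hypothetical dicom.ini fragment along these lines (the destination AE titles PACS1 and ARCHIVE1 are placeholders and would need matching entries in acrnema.map):

    ```
    # sketch only - converter count and destination names are examples
    ExportConverters = 2
    ExportConverter0 = forward to PACS1
    ExportConverter1 = forward series to ARCHIVE1 after 50
    ```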

  • Hi,


    Too bad. Can you strip the export converters down to their bare essence and try again? And what happens if you disable the delete converter? 'Failed to connect' of course points at a network error. Is the listed host correct? There may be a lookup error in acrnema.map. Otherwise it would be relatively easy to add a quick retry there, as indicated below:


    if (!PDU[part].Connect((unsigned char *)&host, (unsigned char *)&port))
    { Sleep(rand() % 1000 + 500); // wait 500-1499 ms, randomized to avoid network congestion on retry
      if (!PDU[part].Connect((unsigned char *)&host, (unsigned char *)&port)) // retry once
      { OperatorConsole.printf("*** ExportConverter%d.%d: Forward failed to connect to host %s\n", N, part, line+offset);
        delete DDO;
        return TRUE; // may be a good idea to retry later
      }
    } // extra closing brace added by the retry
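
    The same jittered single-retry pattern can be sketched in isolation (connect_once here is a hypothetical stand-in for PDU::Connect, rigged to fail a set number of times):

    ```cpp
    #include <chrono>
    #include <cstdlib>
    #include <thread>

    // Hypothetical stand-in for PDU::Connect; fails a fixed number of
    // times so the retry path can be exercised.
    static int failures_left = 1;
    static bool connect_once() {
        if (failures_left > 0) { --failures_left; return false; }
        return true;
    }

    // Try once; on failure sleep a random 500-1499 ms (so many converters
    // do not hammer the destination in lockstep) and retry exactly once.
    bool connect_with_retry() {
        if (connect_once()) return true;
        int wait_ms = std::rand() % 1000 + 500;
        std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));
        return connect_once();
    }
    ```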


    Also, the 'forward series' converter, e.g. 'forward series to ARCHIVE1 after 50', applies a different mechanism, so it would be interesting to see if the same behavior occurs there.


    If you use 'forward series to ARCHIVE1 after 50', this will wait 50s after the first image. In 1.4.19c (not yet released) it will wait 50s after the last image. The relevant change in into_queue_unique is shown below:


    // search for identical items
    for (i = q->bottom; i != (q->top+1) % q->num; i = (i+1) % q->num)
    { if (memcmp(q->data + i * q->entrysize, in, numunique) == 0)
      { memcpy(q->data + i * q->entrysize, in, q->entrysize); // overwrite to let the delay work from the last image
        LeaveCriticalSection(&q->critical);
        return FALSE;
      }
    }
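
    The same idea in a simplified stand-alone sketch (std::vector stands in for the ring buffer, locking is omitted, and Entry and its fields are illustrative rather than the real dgate types):

    ```cpp
    #include <cstring>
    #include <vector>

    struct Entry {
        char key[8]; // the leading "unique" bytes that identify a series
        long stamp;  // e.g. arrival time of the latest image
    };

    // Simplified into_queue_unique: if an entry with the same key already
    // exists, overwrite it (refreshing its stamp) and return false, so the
    // flush delay is measured from the *last* image; otherwise append.
    bool into_queue_unique(std::vector<Entry>& q, const Entry& in) {
        for (Entry& e : q) {
            if (std::memcmp(e.key, in.key, sizeof e.key) == 0) {
                e = in; // overwrite: delay now counts from this image
                return false;
            }
        }
        q.push_back(in);
        return true;
    }
    ```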



    Marcel van Herk is developer of the Conquest DICOM server together with Lambert Zijp.

  • We have been holding steady (4 full days) since adding the additional vCPU to nodes in the cluster (from 2 vCPUs to 3 on 4 nodes). We want to avoid additional changes so we can be more certain about the root cause.


    As of a few weeks ago, our import and export converters call out to scripts that pass data to InfluxDB using curl so we can visualize stats through Grafana. It looks like the additional overhead from curl/HTTP was somehow causing this behavior.

  • This occurred again today on one node in the cluster. This time, it didn't appear to be driven by load as we have observed in the past.

    The red line in the top-left of the graph marks the rough window in which vlconquest01 was misbehaving (failing to send all studies that were stored). Stopping and starting dgate returned the behavior to normal.

  • The memory increase has had no effect. We observed this issue again today.


    What's so curious is that this affects individual ExportConverters. We have more than one ExportConverter sending to the same destination, and one will fire successfully while the other does not. Stop/start dgate and everything is fine again.