
Can't a CPU bound task just make itself preemptible in Python by calling some dummy async function, like asyncio.sleep(0)?


Yes, but you shouldn't just do that every time, as calling into the event loop has its own processing cost. If you do that on every iteration of the loop, it will probably become much slower.
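For reference, a toy sketch (names are mine) of what awaiting asyncio.sleep(0) does: it suspends the current task for one pass of the event loop, letting other runnable tasks proceed:

```python
import asyncio

order = []

async def worker(name, steps):
    for i in range(steps):
        order.append((name, i))
        await asyncio.sleep(0)  # yield: let the other task run one step

async def main():
    await asyncio.gather(worker("a", 3), worker("b", 3))

asyncio.run(main())
# entries from "a" and "b" interleave rather than one task running to completion
```

Without the `await asyncio.sleep(0)`, worker "a" would append all of its entries before "b" ever ran.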

At a previous job, I wrote a wrapper that tracked how long it had been since the event loop last got control, and if that exceeded some threshold, it would yield (by sleeping for 0 ms). This was intended for loops of varying length, which could be long or short depending on external factors; yielding on every iteration made them slower by an order of magnitude.

IIRC, it was sending messages to RabbitMQ, which somewhere down the line writes to a socket, but not necessarily blocking: if the socket flushes its buffer fast enough (faster than the processing code can send messages), writing to it may never block, resulting in a CPU-bound loop (since the loop may perform work other than sending).

We didn't want people to have to think too hard about how their loop was going to behave (especially as it might mean reasoning about the internals of a third party library), and so the wrapper was born. If your loop was short enough, all it did was compare ints so the cost was negligible.
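A minimal sketch of such a wrapper (names and the threshold are mine, not the original code): it only touches the event loop when enough wall time has passed, so the common case is a cheap float comparison.

```python
import asyncio
import time

class Yielder:
    """Yield to the event loop only when max_block seconds have
    elapsed since the last yield (max_block is illustrative)."""

    def __init__(self, max_block=0.05):
        self.max_block = max_block
        self._last = time.monotonic()

    async def maybe_yield(self):
        now = time.monotonic()
        if now - self._last >= self.max_block:
            await asyncio.sleep(0)  # hand control back to the scheduler
            self._last = time.monotonic()
        # otherwise: just a float comparison, negligible cost

async def process(items):
    y = Yielder()
    for item in items:
        # ... CPU-bound work per item ...
        await y.maybe_yield()
```

Short loops never hit the threshold and pay almost nothing; long loops yield at a bounded rate regardless of how fast individual iterations are.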


Doesn't that mean you have to somehow know when it's been running too long, and then yield control back? What if the slow operation is atomic, like multiplying some huge numbers?


Well, the example given is decoding JSON. If that's happening in a long loop, you can yield once per iteration and be safe. Not all problems break apart that neatly, but in those cases, how much of a chance does the server have of not timing out regardless, you know?

Note that once per iteration might be too often, but you can measure how long an iteration typically takes, compare that to how soon you want to preempt the task, and yield at the right interval.


Seems like abstractions will bite you there: most code will just call some variant of cool_library.unmarshall(request), and those libraries won't expose the same yielding mechanism you have.


Abstractions are meant to be broken! One could probably work around this by adding new functions to cool_library, or modifying existing ones, whose code would be copy-pasted from the library but with some asyncio.sleep(0) calls spliced in at strategic places :). For legacy projects, it may make more sense to cheat like this than to rewrite the whole project in a saner tech stack.


Before web workers, this was how you did things in the browser to avoid the "page is not responding" popup during computationally expensive operations: break each big operation into many small ones and step to the next phase using setTimeout(.., 1).


Many CPU-bound tasks don't have convenient break points spaced closely enough in time for this to be useful.


Yes, `gevent.sleep()` or `time.sleep()` (if you monkey-patch) will yield control back to the scheduler.
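For completeness, a minimal gevent sketch (assuming gevent is installed; the names are mine): `gevent.sleep(0)` hands control to the hub so other greenlets get a turn, the cooperative analogue of `asyncio.sleep(0)`.

```python
import gevent

def crunch(name, n, log):
    # CPU-ish loop that cooperatively yields on each iteration
    for i in range(n):
        log.append((name, i))
        gevent.sleep(0)  # yield to the hub so other greenlets can run

log = []
gevent.joinall([
    gevent.spawn(crunch, "a", 3, log),
    gevent.spawn(crunch, "b", 3, log),
])
# entries from "a" and "b" interleave instead of running back to back
```

With `gevent.monkey.patch_all()`, plain `time.sleep(0)` in existing code behaves the same way, which is what makes the monkey-patching approach attractive for legacy codebases.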



