try piggy-backing on tornado for proactor loop support #1524
Conversation
Force-pushed from a6f6e8c to 49661cc:
- use vendored copy of tornado's AddThread as a separate SelectorThread object
- try to avoid leaking loop closers
@minrk Quick question: when using asyncio on Windows, I get a warning when using the Proactor event loop. I have tornado 6.1 installed. Is this expected?
I am having a persistent issue on Windows where some tasks seem to suddenly stop receiving messages. I am implementing the MDP protocol in Python, and I have automated tests that create multiple workers as asyncio tasks to simulate a busy server. As the first few workers complete, the rest suddenly stop reporting heartbeats and the test hangs forever. I imagine this is something I have done, painting myself into a corner somehow, but it would be great to know if any of this sounds suspect ;)
You can try changing the event loop policy before starting the loop.
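A minimal sketch of that policy change, assuming Python 3.8+ on Windows where Proactor is the default; the exact call shown is an assumption about the usual fix, not the literal suggestion made here:

```python
import sys
import asyncio

import zmq
import zmq.asyncio


async def main() -> None:
    ctx = zmq.asyncio.Context()
    sock = ctx.socket(zmq.DEALER)
    # ... application code using the asyncio socket ...
    sock.close(linger=0)
    ctx.term()


if sys.platform == "win32":
    # Proactor (the default loop on Windows since Python 3.8) lacks the
    # add_reader/add_writer support that zmq.asyncio relies on, so switch
    # to the selector-based loop before any loop is created.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

asyncio.run(main())
```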
Thanks! I did try that; the result was that the warning does go away, at least. I'm still having issues where the test will progress for a while but then eventually hangs. I can't say 100% for sure that it's due to this though; I'm also having problems on Linux, so for the moment I can't really prove that the hang is due to the event loop policy. If that changes I'll report back.
If changing the policy still hangs, then I think it's probably not that but something else, possibly related to edge-triggering issues. These things can be hard to track down!
Indeed! I am digging; the problem is that it's difficult to reproduce reliably. There are other things in here besides pyzmq, for example the Python logging module. I am currently removing all logging to check that it is not a factor, and proceeding to eliminate things by removing them where I can. Will let you know if anything points back at ZMQ.
Ok, I have a question: I am seeing an error on Linux now which suggests I am exhausting the file descriptor quota. I had a look at the offending process, and it does look like it is accumulating fds, but I wondered if you'd be able to tell me whether this looks like something the asyncio pyzmq sockets might use. The majority of the fds in use are of type eventfd. The man page basically says these are used as an event wait/notify mechanism by user-space applications, so I am guessing they are coming from either pyzmq or something in my own code. It seems like they are not being released, but I can't confirm; the process hung, so they might have been released if it had closed :)
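A small Linux-only diagnostic sketch that could help here (the helper names are mine, not pyzmq API); libzmq does use eventfds internally where available, so watching these counts over time can show whether socket-related descriptors are piling up:

```python
import os

FD_DIR = "/proc/self/fd"  # Linux-only view of this process's open descriptors


def open_fd_count() -> int:
    """Total file descriptors currently open in this process."""
    return len(os.listdir(FD_DIR))


def eventfd_count() -> int:
    """How many of those descriptors are eventfds (shown as anon_inode:[eventfd])."""
    count = 0
    for name in os.listdir(FD_DIR):
        try:
            target = os.readlink(os.path.join(FD_DIR, name))
        except OSError:
            continue  # the fd may have been closed while we iterate
        if "eventfd" in target:
            count += 1
    return count


# Call these periodically (e.g. from a heartbeat) and log the numbers.
print(f"open fds: {open_fd_count()}, eventfds: {eventfd_count()}")
```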
Certainly possible, but I can't be sure; I don't know exactly what operations create these, so you might check where they are being created. It's conceivable you have launched some task/future that you lose track of without awaiting or cancelling. This could be due to your code, or even a pyzmq bug.
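As a generic illustration of that last point (the class and method names here are made up, not anything from pyzmq or the test suite), one pattern is to keep a reference to every task you launch so each one is cancelled and awaited on shutdown:

```python
import asyncio


class WorkerPool:
    """Hypothetical helper: owns every task it spawns so none are orphaned."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def spawn(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # Drop the reference automatically once the task finishes.
        task.add_done_callback(self._tasks.discard)
        return task

    async def shutdown(self) -> None:
        pending = list(self._tasks)
        for task in pending:
            task.cancel()
        # Awaiting the cancelled tasks lets them run their cleanup handlers
        # (closing sockets, releasing fds) before the loop goes away.
        await asyncio.gather(*pending, return_exceptions=True)
```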
Ok, I think I am getting close (though honestly, concurrent programming can certainly prove me wrong, it seems). I have several workers which each run as an asyncio task. Each worker has a zmq.DEALER socket, plus I create a monitor socket for each of the dealers using get_monitor_socket. During a shutdown I call cancel on each worker task, which triggers a shutdown handler that calls the disable_monitor() method on the dealer socket. This is where the loop hangs. It seems a little bit random, as sometimes a few workers will all be able to shut down cleanly, but then one will hang the loop on the call to disable_monitor. I get the feeling that I may have abused disable_monitor, or sockets, or both here. Is there a right way to clean up a socket and its monitor socket? I am willing to bet that, when multiple sockets with monitors attached are concerned, I am probably not doing it right.
OK, so, no need to wait: I decided to simply comment out the line that called disable_monitor and 'give it a ripper of a go', so to speak. Now the loop no longer hangs; in fact the entire test suite seems to be passing consistently. So it seems calling disable_monitor was the wrong thing to do? I just don't know why, or what I should be doing instead.
If disable_monitor causes a hang, this suggests to me that there is a LINGER or ordering issue - that perhaps there are some messages not yet consumed by the monitor socket receiver, and the sender is blocking waiting for messages to be delivered. That's a bit of a guess, though. From this discussion, you need to call disable before close on the monitor socket (disable closes the internal socket that sends the events; you still close the receiving monitor socket yourself).
Thanks :) I eventually got the test suite to pass on macOS and Windows using the following:

```python
self.zmq_socket.disable_monitor()
self.mon_sock.close(linger=0)
self.zmq_socket.close(linger=0)
```

This matches the discussion you just referred to, which I noticed I've actually been a part of. Seems that came back to bite me for not paying attention to it! Confirming this now works fine on Windows and macOS, and also on Linux. I still have a runaway condition of too many open fds happening on Linux, but that's a story for another day, I think.
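Spelled out as a full worker lifecycle, the teardown order above might look something like this sketch (the function and endpoint names are illustrative; only the zmq calls come from the snippet above):

```python
import zmq
import zmq.asyncio


async def run_worker(ctx: zmq.asyncio.Context, endpoint: str) -> None:
    dealer = ctx.socket(zmq.DEALER)
    monitor = dealer.get_monitor_socket()  # PAIR socket receiving monitor events
    dealer.connect(endpoint)
    try:
        ...  # worker loop: heartbeats, requests, draining monitor events
    finally:
        # Teardown order matters: stop monitoring first, then close the
        # monitor receiver, then the DEALER itself, all without lingering.
        dealer.disable_monitor()
        monitor.close(linger=0)
        dealer.close(linger=0)
```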
Tornado 6.1 enables support for the proactor event loop by running a separate selector loop in a thread.
Try piggy-backing on that functionality by using tornado's AddThreadEventLoop when someone attempts to use zmq.asyncio with proactor. Went with vendoring SelectorThread from tornadoweb/tornado#3029 so no dependency is added.
closes #1521
closes #1423
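A minimal sketch of what this should enable, assuming this branch is installed: zmq.asyncio sockets working under the default ProactorEventLoop on Windows, with reader/writer callbacks routed to the selector running in a background thread (the PAIR round-trip is just for illustration):

```python
import asyncio

import zmq
import zmq.asyncio


async def main() -> None:
    ctx = zmq.asyncio.Context()
    a = ctx.socket(zmq.PAIR)
    b = ctx.socket(zmq.PAIR)
    port = a.bind_to_random_port("tcp://127.0.0.1")
    b.connect(f"tcp://127.0.0.1:{port}")
    await b.send(b"ping")
    print(await a.recv())  # b'ping'
    a.close(linger=0)
    b.close(linger=0)
    ctx.term()


# Note: no WindowsSelectorEventLoopPolicy override here; on Windows this runs
# under the default Proactor loop, exercising the vendored SelectorThread path.
asyncio.run(main())
```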