Codec Error: Broken Pipe #153
Comments
Not sure that I have enough information to help..
According to ZMQ spec client (the side that does
So in terms of application logic it seems reasonable to catch this error and resend the same message to other clients, if there are any. Alternatively, wait a short time to allow the same client to reconnect, then resend the message to it. So I would say that in this case you should implement a small loop with back-off that is responsible for resending messages when such errors occur.
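A minimal sketch of such a resend loop, using only the Rust standard library. The `send_with_backoff` helper and its parameters are hypothetical and not part of zmq.rs. It also assumes the failure surfaces as an `std::io::Error` of kind `BrokenPipe`, whereas the real zmq.rs error is a codec error wrapping it, and the real `send` is async, so the sleep would need an async timer instead of `std::thread::sleep`.

```rust
use std::io;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not part of zmq.rs): calls `send` until it succeeds,
/// retrying only on a broken pipe and doubling the delay between attempts.
/// Returns the number of attempts that were needed.
/// At least one attempt is always made, even if `max_attempts` is 0.
pub fn send_with_backoff<F>(
    mut send: F,
    max_attempts: u32,
    initial_delay: Duration,
) -> io::Result<u32>
where
    F: FnMut() -> io::Result<()>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match send() {
            Ok(()) => return Ok(attempt),
            // Assumption: a dropped peer shows up as ErrorKind::BrokenPipe.
            Err(e) if e.kind() == io::ErrorKind::BrokenPipe && attempt < max_attempts => {
                // Give the peer time to reconnect before trying again.
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
            // Any other error, or running out of attempts, goes to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

The closure passed as `send` would wrap the actual socket call. Errors other than a broken pipe are returned immediately so that they are not hidden by the retry logic.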
|
Thanks for the quick reply. :)
Seems like the initial connection and first few packets work okay, so it's an intermittent error. After reconnecting and resending, it seems to work for another handful of packets before the pipe breaks again. |
Pub socket has different behaviour and has this error handled internally - https://github.com/zeromq/zmq.rs/blob/master/src/pub.rs#L179
|
Sorry, you're right. Second link should have been here. Wireshark capture attached with one good sequence (execute_request + execute_reply) followed by one failed sequence. It's not obvious to me that there is any difference between the packets, although it's easy to see the TCP RST from Jupyter after the failed sequence. |
If I drop a debug message in the notebook application where it receives ZMQ messages, I can see that half the messages never make it up the stack, and the pattern seems to be that every other packet is dropped (sometimes the pipe closes when a message is dropped, sometimes it doesn't). After about 8 messages the drops stop, and it seems to work fine from there on out. A partial stack trace from the app (most recent call in the stack at the bottom) is:
It seems most likely that the packets are getting dropped somewhere in |
I'll be able to take a look later this week. Currently busy with other stuff.. |
OK, so I've done some debugging and checked the network dump you provided. I've run both packets through a decoder and both of them contain valid messages:
So this issue is not related to incorrectly formed or damaged messages. My thoughts on this situation are the following:
So according to the spec you should have some logic in your application code that handles this error appropriately (by waiting and resending the message, or by passing the error on to a higher level).
|
Good catch on the port numbers, I think that's symptomatic of the problem. I would think that's because the TCP connection dropped and the client opened up a new connection with an ephemeral port. It's odd that we don't see a new TCP SYN / ACK sequence and that the new port issues a TCP RST indicating that it wasn't expecting packets on that port. I'm not sure why our ZMQ socket would suddenly switch ports without a new TCP connection? According to the Jupyter spec their client is using a Dealer socket. |
I've taken a quick look at the specs and it seems that you might be missing some parts in your implementation. For example the doc says:
As far as I can see from your code, you just create sockets with default parameters, and they get random UUIDs as identities. You can check this example to see how to assign a specific identity to your sockets - https://github.com/zeromq/zmq.rs/blob/master/examples/socket_client_with_options.rs#L10 I guess it might be reasonable to look at a correct implementation in Python and see how it's built. |
@Alexei-Kornienko thanks for help in debugging and pointing out problem with I'm still getting |
@bartlomieju you use a dealer socket to bind a port, so it acts as a server. The client (pyzmq) is responsible for reconnecting, so you will still get these errors (in theory) and you just need to handle them somehow. A dealer socket (when acting as a server) will not handle this for you because it has no guarantee that the client will ever reconnect. |
Thanks, you are obviously right. I will keep digging into how that situation is handled in other kernels. |
When sending multiple packets in a relatively short period of time, it appears that sometimes the underlying communication pipe breaks. The error is `Codec Error: Broken pipe (os error 32)`. Here's a log with two examples: one, two. Here's a demo video so you can get a sense of timing. Not sure if this is somehow related to #151 or not.