z411 <z411%omaera.org@localhost> writes:

> So with an increased value it can occasionally hit higher speeds,
> but it's not much. And what's worse, it makes it very unstable.

TCP functions by assuming that loss is due to congestion. (Plus, my
impression is that CUBIC doesn't follow the congestion rules and is
unfairly aggressive. However, it seems that the IETF gave up on that
and defined it as ok.) If there is loss not due to congestion, TCP
performs badly. However, absent flaky links, it is reasonably likely
that most loss really is congestion.

> In Wireshark, with kern.sbmax=262144 I very rarely see any packet loss
> or retransmissions. Any higher than that and packet loss and slow
> retransmissions become very frequent. Switching CC to CUBIC improves
> speeds but not the heavy packet loss and slow recovery.

I am assuming that "slow retransmissions" means retransmissions that
appear to be from timeout rather than "fast retransmit". This is a
clue that more than one packet per window is being lost.

> I also often see duplicate ACKs by the client for a good while before
> slow retransmissions start (RTO?) with no fast retransmit happening
> before that.

That is perhaps a bug. You would have to go over the RFC and the code
to figure it out.

> But again, no loss happens in the first place when the
> buffer is low.
>
> I uploaded a sample pcap:
> http://u.omaera.org/nbsd.pcapng

I analyze TCP by using xplot (in pkgsrc as graphics/xplot and
graphics/xplot-devel), and have modified tcpdump2xplot to deal with
the drift in tcpdump output over the years (since xplot was written
in about 1989!).

Looking at your file, it appears there is sudden massive packet loss,
and there is fast retransmit, but the loss is beyond what fast
retransmit can repair.

> - Why could it be that packet loss almost doesn't happen with the
>   default buffer size, but becomes very frequent when it's increased?

Because when the buffer is smaller, TCP does not transmit enough to
provoke loss.
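As a back-of-the-envelope illustration of that last point: the socket
buffer caps the window, and once the window exceeds the path's
bandwidth-delay product, the excess packets only build a queue
somewhere and eventually get dropped. A minimal sketch of the
arithmetic, with hypothetical link numbers chosen for illustration
(not a claim about your actual path):

```python
# Rough bandwidth-delay-product arithmetic (illustrative numbers only).
# A socket buffer much larger than the BDP lets the sender build a
# standing queue in the network and provoke loss.

def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product: bytes the path itself can hold in flight."""
    return bandwidth_bits_per_s * rtt_s / 8

# Hypothetical path: 100 Mbit/s with a 20 ms RTT.
bdp = bdp_bytes(100e6, 0.020)
print(int(bdp))   # 250000 bytes -- in the same ballpark as 262144
```

If the real path's BDP is near 256 KiB, a buffer around that size would
fill the pipe without overflowing queues, which would be consistent
with what you observe.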
> - Is fast retransmit not being triggered?

Long ago, I became suspicious of a bug in fast retransmit but never
figured it out. I think the third duplicate ACK is supposed to trigger
it, in addition to new segments being clocked out by SACKs -- but it's
been years since I read the RFC.

> - Would it be appropriate to take this to tech-net@ as well?

So far no, because you have not identified any misbehavior in NetBSD
code. You're basically asking for help debugging what is going on,
which is 100% appropriate here. If you have an argument that there is
a bug (relative to the RFCs), that's fair on tech-net too, but I don't
think you'd reach relevant people who are not here. If you have a
proposal for code to auto-tune some of the buffer sizes, one that will
stand up to scrutiny across a range of ports, memory sizes, interfaces,
and network environments, that seems in scope.

> [implied question: what do you think is going on?]

It's really hard to be sure what's going on with only a trace at the
receiver. It would be good to run tcpdump at both the sender and the
receiver (with synchronized clocks). Then you can see the packets
being sent and which ones go missing. I think something in the network
is badly behaved. Plus, I question the retransmit behavior, but
without seeing the sender's view one can't say. We don't know if the
ACKs we see are making it, and we don't know if more retransmits
happened but didn't arrive.

There is a TCP feature called "pacing" that attempts to spread out
transmissions, but I am really unclear on whether it exists in
deployed code. My guess is no, not really.
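For reference, the duplicate-ACK rule I mentioned -- fast retransmit
firing on the third duplicate ACK, per RFC 5681 -- can be sketched as
a toy model (this is not the NetBSD implementation, just the rule):

```python
# Toy model of the RFC 5681 fast-retransmit trigger: on the third
# duplicate ACK for the same sequence number, retransmit immediately
# instead of waiting for the retransmission timeout (RTO).

DUPACK_THRESHOLD = 3

class FastRetransmitDetector:
    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_seq):
        """Return True when a fast retransmit should fire for this ACK."""
        if ack_seq == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUPACK_THRESHOLD:
                return True   # third duplicate -> retransmit now
        else:
            self.last_ack = ack_seq   # new data ACKed; reset the counter
            self.dup_count = 0
        return False

d = FastRetransmitDetector()
acks = [1000, 2000, 2000, 2000, 2000]   # one advance, then duplicates
fired = [d.on_ack(a) for a in acks]
print(fired)   # [False, False, False, False, True]
```

If your trace shows many duplicate ACKs arriving at the sender with no
retransmission until the RTO, that is the behavior worth comparing
against this rule.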
Attachments: view1.png, view2.png, view3.png, view4.png (PNG images)