[clug-talk] nasty problem
lewkor at gmail.com
Thu May 18 23:40:18 PDT 2006
Well, I think that I got lucky. One thing that I noticed is that I had some
service. What happened today is that I started getting UDP packets again.
Someone who has a lot more money than me or my client must have raised the
problem to a level where it got fixed. There are lots of places where you
just can't use TCP.
Question: did the explanation I gave shed light on the thing or confuse the
hell out of everyone?
On 5/18/06, Gustin Johnson <gustin at echostar.ca> wrote:
> Robert Lewko wrote:
> > On 5/18/06, Gustin Johnson <gustin at echostar.ca> wrote:
> > > Why are you using UDP instead of TCP? Since your app is going over
> > > satellite, and latency is not as much of an issue as reliability, TCP
> > > seems better suited. If a UDP packet is lost on the wire, there is no
> > > mechanism to help you isolate the problem. It is possible that your port
> > > is being filtered. You could try to change the port, but I would
> > > seriously consider TCP. Keep in mind I am a network admin and not a
> > > developer, you may have a valid reason for using UDP, it is just
> > > from my point of view.
> > The reason that I used UDP is that no fork/exec/new socket is used.
> > First you have to understand it's not the latency that makes satellite
> > communication hard - although without altering any socket options TCP
> > will detect the latency and start to back off.
> > What happens is that you have two things to worry about. On the client
> > end you have to worry about losing the connection and reconnecting - ie.
> > one satellite passes out of LOS (line of sight) and it can be up to
> > 20-25 minutes before the next one is in sight but the average is 10
> > minutes without service. When the next satellite comes into view you
> > have a 2-3 minute period when you may have very sporadic network
> > availability. You may
> > have a 6 second period with network availability, just enough time to
> > dial and get a connection without getting data through. Once the next
> > satellite gets in sight you can have 90 minutes to 2 hours with good
> > service.
> This almost sounds like the old Iridium satellite phone, where the
> actual satellites flew around in a LEO, with no handoff between the
> > So let's consider what it would look like if we used TCP. This
> > application gets a file in a directory (not my design in this part)
> > every 5 minutes. It parses the file, puts it in packets then sends it
> > to the server. So it does a write to a TCP socket and gets an error:
> > "No route to host". So it closes the socket and calls the windoze shit
> > that dials the network through the modem. Great! Now you have a
> > connection. OK you are in one of those 6 second spots of connectivity
> > at the start of getting a new satellite. So you start to send a 4k
> > packet at 9600 baud. Do you see a problem? You won't get your packet
> > sent before the network goes down. Remember the 3 way handshake that
> > TCP needs to use before a connection is made. Well that uses up about 3
> > seconds right there. Using UDP you can actually get two 2k packets
> > through with their ack returned in 6 seconds. BTW I have restricted
> > myself to a 2k packet size in my program.
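A rough check of the numbers above. This is only a sketch: it assumes 10 bits on the wire per byte (8 data bits plus start and stop bits, which the thread does not state), treats 9600 baud as 9600 bits per second, takes the 3 second handshake estimate as given, and ignores protocol headers and satellite round-trip time. The function names are made up for illustration.

```python
BAUD = 9600          # line speed from the post, treated as bits per second
BITS_PER_BYTE = 10   # assumption: 8 data bits plus start and stop bits
WINDOW_S = 6.0       # the short connectivity window from the post
HANDSHAKE_S = 3.0    # the post's estimate for the TCP three-way handshake


def transmit_seconds(nbytes, baud=BAUD, bits_per_byte=BITS_PER_BYTE):
    """Time to push nbytes through the modem, ignoring protocol headers."""
    return nbytes * bits_per_byte / baud


def fits_in_window(nbytes, setup_s=0.0, window_s=WINDOW_S):
    """True if setup time plus transmit time fits inside the window."""
    return setup_s + transmit_seconds(nbytes) <= window_s


tcp_total = HANDSHAKE_S + transmit_seconds(4096)   # one 4k packet over TCP
udp_total = 2 * transmit_seconds(2048)             # two 2k packets over UDP

print(f"TCP, 4k after handshake: {tcp_total:.2f} s")
print(f"UDP, two 2k packets:     {udp_total:.2f} s")
```

Under these assumptions the TCP case comes to about 7.3 seconds, which overshoots the 6 second window, while the two 2k UDP packets take about 4.3 seconds and leave a little under 2 seconds for the acks to come back.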
> > What's happening on the server? There are two things you could do:
> > construct a single threaded server or one that uses fork/exec. They
> > each exhibit a different form of the same problem. The server accepts a
> > new connection. The accept system call takes a pending connection off
> > the listening socket's queue and returns a new fd for it; the new
> > socket shares the listening port rather than taking a free one.
> > The single threaded server will use fd's and the fork/exec server will
> > use slots in the process table. To make that clearer the single
> > threaded server will get activity that indicates a new connection, call
> > accept to get a new fd to communicate with the new connection and manage
> > that new connection in the next select call. The concurrent server (one
> > process per connection) will wait for accept to return a new fd for a
> > new connection, then it will fork/exec to make a new process to handle
> > that connection.
> > OK so data comes into that fd for a while until the client gets a broken
> > connection. At that point the server socket will wait for hours,
> > literally indefinitely for more data on that fd. So now you have to put
> > a timer on each fd/process so you can detect when no data has been
> > received for whatever timeout period that you decide to use (what do you
> > use for the timeout period?). Keep in mind that 2-3 minute period can
> > generate 10-12 broken connections. So depending on which server design
> > is used there will be either 10-12 unused fd's that have no client or
> > 10-12 processes that are there listening with no client to give them data.
> > Also know that there are possibly 10 to 12 mobile systems doing that and
> > now you have the possibility of 100 to 150 unused resources that need to
> > be cleaned up and that each process has a maximum number of fd's and a
> > maximum number of processes that can run. So, what if they bought
> > another company with a similar number of trucks or another company 10
> > times larger bought them 'cause of the way that they do "real time"
> > testing? Instant problem!
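To make the "timer on each fd" bookkeeping concrete, here is a minimal sketch for the single threaded (select based) variant. It is an illustration, not a description of any existing server: the class name IdleTracker and the timeout values are made up, and the right timeout is exactly the open question raised above.

```python
import time


class IdleTracker:
    """Remembers when each connection last produced data."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}          # connection id (e.g. an fd) -> timestamp

    def touch(self, conn_id, now=None):
        """Call whenever data arrives on a connection (or it is accepted)."""
        self.last_seen[conn_id] = time.monotonic() if now is None else now

    def forget(self, conn_id):
        """Call after closing a connection."""
        self.last_seen.pop(conn_id, None)

    def expired(self, now=None):
        """Return the connections that have been silent longer than the timeout."""
        now = time.monotonic() if now is None else now
        return [c for c, seen in self.last_seen.items()
                if now - seen > self.timeout_s]


# Example with explicit clock values instead of real time:
tracker = IdleTracker(timeout_s=60)
tracker.touch("truck-1", now=0)
tracker.touch("truck-2", now=50)
stale = tracker.expired(now=100)     # only truck-1 has been silent > 60 s
```

In a select loop the server would call touch() on every readable fd, then after each pass close whatever expired() returns and forget() it. The fork/exec design needs the same idea per process, for instance a read timeout in each child.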
> > This whole discussion assumes that when the client sends some data and
> > gets a broken connection, there is no way to tell the socket that it
> > can try again. If someone knows how to do that and can point me
> > to docs then I will be glad of the info. In my reading of Stevens I
> > didn't see how to recover from a broken connection.
> > Using UDP just sidesteps these issues. You put the responsibility for
> > the communication on the client. The client is the one that detects
> > when the packet has not been sent by putting a sequence/timestamp in
> > each packet and comparing that tuple to each packet that is returned.
> > If the seq/ts does not match the one you are looking for then dump it.
> > When it does match you can process the packet and transmit the next one
> > (there are more efficient ways of handling multiple packets, but I'm
> > keeping it simple).
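The client side described above is roughly a stop-and-wait loop. A minimal sketch follows, with several assumptions that are not from the thread: the header layout (32-bit sequence number plus 64-bit millisecond timestamp), the retransmit delay, the retry count, and all function names are hypothetical. Only the 2k payload limit comes from the earlier description.

```python
import socket
import struct
import time

HEADER = struct.Struct("!IQ")   # hypothetical layout: sequence, timestamp in ms
MAX_PAYLOAD = 2048              # the 2k limit mentioned earlier in the thread


def make_packet(seq, ts_ms, payload):
    """Prefix the payload with the seq/timestamp tuple."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload larger than 2k")
    return HEADER.pack(seq, ts_ms) + payload


def ack_matches(reply, seq, ts_ms):
    """True if the reply carries the seq/timestamp we are waiting for."""
    if len(reply) < HEADER.size:
        return False
    return HEADER.unpack_from(reply) == (seq, ts_ms)


def send_until_acked(sock, server, seq, payload, retransmit_s=3.0, max_tries=5):
    """Send one packet, retransmitting until a matching ack arrives."""
    ts_ms = int(time.time() * 1000)
    packet = make_packet(seq, ts_ms, payload)
    for _ in range(max_tries):
        sock.sendto(packet, server)
        deadline = time.monotonic() + retransmit_s
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                      # retransmit delay is over, resend
            sock.settimeout(remaining)
            try:
                reply, _addr = sock.recvfrom(HEADER.size + MAX_PAYLOAD)
            except socket.timeout:
                break                      # nothing came back in time, resend
            if ack_matches(reply, seq, ts_ms):
                return True
            # seq/ts does not match the one we are looking for: dump it
    return False
```

A caller would create one UDP socket with socket.socket(socket.AF_INET, socket.SOCK_DGRAM), then call send_until_acked() for each 2k chunk in turn and move on to the next sequence number only when it returns True.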
> > The UDP server can be MUCH simpler by handling each packet as a self
> > contained entity. It gets a packet from the client, processes that
> > packet, then uses the source address as the destination for the return
> > packet. No fork, no exec, no accept/new fd, no cleanup. With UDP you
> > can have one process that deals with one packet at a time. What you
> > have to ensure is that the server does not get so busy that clients
> > fail to get their response before the end of the retransmit delay.
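The server loop described above can be sketched in a few lines. Again this is an illustration under assumptions: the header layout (32-bit sequence plus 64-bit timestamp), the function names, and the 4096 byte receive buffer are hypothetical, and the real processing of the payload is left as a placeholder.

```python
import socket
import struct

HEADER = struct.Struct("!IQ")   # hypothetical layout: sequence, timestamp in ms


def handle_datagram(data):
    """Process one self-contained packet; return the ack bytes or None."""
    if len(data) < HEADER.size:
        return None                          # too short to be ours, ignore it
    seq, ts_ms = HEADER.unpack_from(data)
    payload = data[HEADER.size:]
    # Placeholder: real handling of `payload` would go here.
    return HEADER.pack(seq, ts_ms)           # echo the tuple back as the ack


def serve(sock, max_packets=None):
    """One process, one packet at a time: no fork, no accept, no cleanup."""
    handled = 0
    while max_packets is None or handled < max_packets:
        data, source = sock.recvfrom(4096)
        reply = handle_datagram(data)
        if reply is not None:
            sock.sendto(reply, source)       # source address is the destination
        handled += 1
```

To run it, bind a UDP socket (socket.socket(socket.AF_INET, socket.SOCK_DGRAM) followed by bind() on whatever port the deployment uses) and pass it to serve(). Because no per-client state is kept, a client that drops off mid-transfer leaves nothing behind on the server.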
> It sounds like you are between a rock and a hard place, since tracking
> down UDP packet loss is not fun. Personally, if I was asked to support
> this environment, I would push to replace Layers 1&2 with something more
> robust. I am assuming that this is not an option. I am also going to
> guess that GPRS and TDMA (cell data networks) are not viable options? I
> do not envy the work you have in front of you. You really were not
> being overly dramatic with the subject line.