[clug-talk] nasty problem
gustin at echostar.ca
Thu May 18 16:23:31 PDT 2006
Robert Lewko wrote:
> On 5/18/06, Gustin Johnson <gustin at echostar.ca> wrote:
> Why are you using UDP instead of TCP? Since your app is going over
> satellite, and latency is not as much of an issue as reliability, TCP
> seems better suited. If a UDP packet is lost on the wire, there is no
> mechanism to help you isolate the problem. It is possible that your app
> is being filtered. You could try to change the port, but I would
> seriously consider TCP. Keep in mind I am a network admin and not a
> developer, you may have a valid reason for using UDP, it is just S.E.P.
> from my point of view.
> The reason that I used UDP is that no fork/exec/new socket is used.
> First you have to understand it's not the latency that makes satellite
> communication hard - although without altering any socket options TCP
> will detect the latency and start to back off.
> What happens is that you have two things to worry about. On the client
> end you have to worry about losing the connection and reconnecting - ie.
> one satellite passes out of LOS (line of sight) and it can be up to
> 20-25 minutes before the next one is in sight, but the average is 10
> minutes without service. When the next satellite arrives you have a 2-3
> minute period when you may have very sporadic network availability. You may
> have a 6 second period with network availability, just enough time to
> dial and get a connection without getting data through. Once the next
> satellite gets in sight you can have 90 minutes to 2 hours with good
This almost sounds like the old Iridium satellite phone, where the
actual satellites flew around in a LEO, with no handoff between the
> So let's consider what it would look like if we used TCP. This
> application gets a file in a directory (not my design in this part)
> every 5 minutes. It parses the file, puts it in packets then sends it
> to the server. So it does a write to a TCP socket and gets an error:
> "No route to host". So it closes the socket and calls the windoze shit
> that dials the network through the modem. Great! Now you have a
> connection. OK you are in one of those 6 second spots of connectivity
> at the start of getting a new satellite. So you start to send a 4k
> packet at 9600 baud. Do you see a problem? You won't get your packet
> sent before the network goes down. Remember the 3 way handshake that
> TCP needs to use before a connection is made. Well that uses up about 3
> seconds right there. Using UDP you can actually get two 2k packets
> through with their ack returned in 6 seconds. BTW I have restricted
> myself to a 2k packet size in my program.
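For anyone who wants to check the arithmetic above, here is a rough sketch. The 9600 baud rate, the 6 second window and the roughly 3 second handshake come from the post; the 10 bits per byte (8N1 serial framing) is my assumption, and protocol headers and round-trip latency are ignored.

```python
def transfer_seconds(payload_bytes, baud=9600, bits_per_byte=10):
    """Time to push payload_bytes over the link.

    bits_per_byte=10 assumes 8N1 framing (start bit + 8 data bits + stop
    bit). That is an assumption; the post does not say how the modem
    frames data.
    """
    return payload_bytes * bits_per_byte / baud


WINDOW = 6.0     # seconds of connectivity at the start of a pass (from the post)
HANDSHAKE = 3.0  # rough cost of the TCP three-way handshake (from the post)

tcp_4k = HANDSHAKE + transfer_seconds(4096)
udp_2k = transfer_seconds(2048)

print(f"TCP, handshake + one 4k packet: {tcp_4k:.2f} s")
print(f"UDP, one 2k packet:             {udp_2k:.2f} s")
print(f"UDP, two 2k packets:            {2 * udp_2k:.2f} s")
print(f"Time left in the window for acks: {WINDOW - 2 * udp_2k:.2f} s")
```

Under those assumptions the TCP case needs a bit over 7 seconds and misses the window, while two 2k UDP packets take a bit over 4 seconds and leave some room for the acks.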
> What's happening on the server? There are two things you could do:
> construct a single threaded server or one that uses fork/exec. They
> each exhibit a different form of the same problem. The server accepts a
> new connection. The accept system call takes the new connection off the
> listening socket's queue and assigns a new fd for it (the new socket
> keeps the listening port; no extra port is used).
> The single threaded server will use fd's and the fork/exec server will
> use slots in the process table. To make that clearer the single
> threaded server will get activity that indicates a new connection, call
> accept to get a new fd to communicate with the new connection and manage
> that new connection in the next select call. The concurrent server (one
> process per connection) will wait for accept to return a new fd for a
> new connection, then it will fork/exec to make a new process to handle
> that connection.
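To make the single-threaded variant concrete, here is a minimal sketch of one pass through a select loop. The function name `poll_once` and the `clients` dictionary are made up for illustration; this is not the code from the application being discussed.

```python
import select
import socket


def poll_once(listener, clients, timeout=1.0):
    """One pass of a single-threaded select loop.

    `clients` maps each connected socket to its peer address.
    Returns a list of (address, data) pairs for clients that sent data.
    """
    readable, _, _ = select.select([listener, *clients], [], [], timeout)
    received = []
    for sock in readable:
        if sock is listener:
            # Activity on the listening socket means a new connection;
            # accept() hands back a fresh socket (a new fd) for it.
            conn, addr = listener.accept()
            clients[conn] = addr
            continue
        try:
            data = sock.recv(2048)
        except ConnectionError:
            data = b""
        if data:
            received.append((clients[sock], data))
        else:
            # Empty read or a reset: the peer is gone, so free the fd.
            del clients[sock]
            sock.close()
    return received
```

A real server would call this in a `while True:` loop. Note that it only frees an fd when the kernel reports the peer as gone; a client that silently vanishes never triggers that branch.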
> OK so data comes into that fd for a while until the client gets a broken
> connection. At that point the server socket will wait for hours,
> literally indefinitely for more data on that fd. So now you have to put
> a timer on each fd/process so you can detect when no data has been
> received for whatever timeout period that you decide to use (what do you
> use for the timeout period?). Keep in mind that 2-3 minute period can
> generate 10-12 broken connections. So depending on which server design
> is used there will be either 10-12 unused fd's that have no client or 10-12
> processes that are there listening with no client to give them data.
> Also keep in mind that there are possibly 10 to 12 mobile systems doing
> that, so now you have the possibility of 100 to 150 unused resources that
> need to be cleaned up, and there is a limit on both the number of fd's
> per process and the number of processes that can run. So, what if they bought
> another company with a similar number of trucks or another company 10
> times larger bought them 'cause of the way that they do "real time"
> testing? Instant problem!
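The per-fd timer described above could look something like this sketch. `reap_idle` and `idle_limit` are hypothetical names, and what value to use for the limit is exactly the open question raised in the post.

```python
import time


def reap_idle(last_seen, idle_limit, now=None):
    """Close and forget every connection silent for longer than idle_limit.

    `last_seen` maps a connection object (anything with a close() method)
    to the time.monotonic() value recorded when it last delivered data.
    Returns the number of connections that were reaped.
    """
    now = time.monotonic() if now is None else now
    stale = [conn for conn, seen in last_seen.items() if now - seen > idle_limit]
    for conn in stale:
        conn.close()
        del last_seen[conn]
    return len(stale)
```

The server would update `last_seen[conn]` on every successful read and call `reap_idle` once per pass through its main loop. It works, but it is extra bookkeeping that exists only to clean up after dead TCP connections.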
> This whole discussion is based on the assumption that when the client
> gets a broken connection while sending data, there is no way to tell the
> socket that it can try again. If someone knows how to do that and can point me
> to docs then I will be glad of the info. In my reading of Stevens I
> didn't see how to recover from a broken connection.
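As far as I know there is no way to revive a TCP socket once it has reported a broken connection; the usual approach is to close it and connect again with a fresh socket. A minimal sketch of that retry pattern follows. `send_with_reconnect` and its parameters are invented for illustration, and a successful `sendall` only means the data was handed to the local stack, not that the server received it.

```python
import socket
import time


def send_with_reconnect(addr, payload, attempts=3, delay=1.0, timeout=5.0):
    """Try to deliver payload over TCP, using a fresh socket per attempt.

    Returns the number of the attempt that succeeded; raises the last
    OSError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # A broken socket cannot be reused, so each try starts over
            # with a new connection (and a new three-way handshake).
            with socket.create_connection(addr, timeout=timeout) as sock:
                sock.sendall(payload)
            return attempt
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

Every retry pays for the handshake again, which is the cost being complained about for those short windows of connectivity.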
> Using UDP just sidesteps these issues. You put the responsibility for
> the communication on the client. The client is the one that detects
> when the packet has not been sent by putting a sequence/timestamp in
> each packet and comparing that tuple to each packet that is returned.
> If the seq/ts does not match the one you are looking for then dump it.
> When it does match you can process the packet and transmit the next one
> (there are more efficient ways of handling multiple packets, but I'm
> keeping it simple).
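The client side of that scheme might be sketched as below. The header layout (a 32-bit sequence number plus a 64-bit timestamp), the name `send_reliably`, and the retry and timeout values are all assumptions for illustration; the post does not describe the real packet format.

```python
import socket
import struct
import time

# Hypothetical header: 32-bit sequence number + 64-bit float timestamp.
HEADER = struct.Struct("!Id")


def send_reliably(sock, server, seq, payload, retries=5, rto=2.0):
    """Send one datagram and wait for an ack echoing the same (seq, ts).

    Returns True once a matching ack arrives, False if retries run out.
    """
    ts = time.time()
    packet = HEADER.pack(seq, ts) + payload
    for _ in range(retries):
        sock.sendto(packet, server)
        deadline = time.monotonic() + rto
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # retransmit delay expired, send again
            sock.settimeout(remaining)
            try:
                reply, _ = sock.recvfrom(2048)
            except socket.timeout:
                break
            if len(reply) >= HEADER.size and HEADER.unpack_from(reply) == (seq, ts):
                return True
            # Stale or foreign packet: dump it and keep listening.
    return False
```

This is the simple stop-and-wait version: one packet in flight, and the next one goes out only after the matching ack comes back.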
> The UDP server can be MUCH simpler by handling each packet as a self
> contained entity. It gets a packet from the client, processes that
> packet, then uses the source address as the destination for the return
> packet. No fork, no exec, no accept/new fd, no cleanup. With UDP you
> can have one process that deals with one packet at a time. What you
> have to ensure is that the server does not get so busy that clients
> fail to get their response before the end of the retransmit delay.
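A server along those lines can be sketched in a few lines. The header layout (32-bit sequence number plus 64-bit timestamp) and the names `handle_one` and `process` are assumptions for illustration, not the actual program.

```python
import socket
import struct

# Hypothetical header: 32-bit sequence number + 64-bit float timestamp.
HEADER = struct.Struct("!Id")


def handle_one(sock, process):
    """Receive one datagram, process it, and ack it to whoever sent it.

    `process` is a caller-supplied function taking (seq, body).
    Returns the sequence number handled, or None for a runt packet.
    """
    data, addr = sock.recvfrom(2048)
    if len(data) < HEADER.size:
        return None  # too short to carry a header, ignore it
    seq, ts = HEADER.unpack_from(data)
    process(seq, data[HEADER.size:])
    # The source address of the request is the destination of the ack.
    sock.sendto(HEADER.pack(seq, ts), addr)
    return seq
```

The whole server is then a bound UDP socket and `while True: handle_one(sock, process)`, with no per-client state to clean up. The catch is the one noted above: `process` has to finish well inside the client's retransmit delay.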
It sounds like you are between a rock and a hard place, since tracking
down UDP packet loss is not fun. Personally, if I was asked to support
this environment, I would push to replace Layers 1&2 with something more
robust. I am assuming that this is not an option. I am also going to
guess that GPRS and TDMA (cell data networks) are not viable options? I
do not envy the work you have in front of you. You really were not
being overly dramatic with the subject line.