[clug-talk] nasty problem

Gustin Johnson gustin at echostar.ca
Thu May 18 16:23:31 PDT 2006

Robert Lewko wrote:
> On 5/18/06, Gustin Johnson <gustin at echostar.ca> wrote:
>     Why are you using UDP instead of TCP?  Since your app is going over
>     satellite, and latency is not as much of an issue as reliability, TCP
>     seems better suited.  If a UDP packet is lost on the wire, there is no
>     mechanism to help you isolate the problem.  It is possible that your app
>     is being filtered.  You could try to change the port, but I would
>     seriously consider TCP.  Keep in mind I am a network admin and not a
>     developer; you may have a valid reason for using UDP.  It is just S.E.P.
>     from my point of view.
> The reason that I used UDP is that no fork/exec/new socket is used. 
> First you have to understand it's not the latency that makes satellite
> communication hard - although without altering any socket options TCP
> will detect the latency and start to back off.
> What happens is that you have two things to worry about.  On the client
> end you have to worry about losing the connection and reconnecting - ie.
> one satellite passes out of LOS (line of sight) and it can be up to
> 20-25 minutes before the next one is in sight, but the average is 10
> minutes without service.  When the next satellite comes into view you
> have a 2-3 minute period when you may have very sporadic network
> availability.  You may
> have a 6 second period with network availability, just enough time to
> dial and get a connection without getting data through.  Once the next
> satellite gets in sight you can have 90 minutes to 2 hours with good
> service.

This almost sounds like the old Iridium satellite phone, where the
actual satellites flew around in a LEO, with no handoff between the

> So let's consider what it would look like if we used TCP.  This
> application gets a file in a directory (not my design in this part)
> every 5 minutes.  It parses the file, puts it in packets then sends it
> to the server.  So it does a write to a TCP socket and gets an error:
> "No route to host".  So it closes the socket and calls the windoze shit
> that dials the network through the modem.  Great! Now you have a
> connection.  OK you are in one of those 6 second spots of connectivity
> at the start of getting a new satellite.  So you start to send a 4k
> packet at 9600 baud.  Do you see a problem?  You won't get your packet
> sent before the network goes down.  Remember the 3 way handshake that
> TCP needs to use before a connection is made.  Well that uses up about 3
> seconds right there.  Using UDP you can actually get two 2k packets
> through with their acks returned in 6 seconds.  BTW I have restricted
> myself to a 2k packet size in my program.
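
The timing argument above can be checked with some rough arithmetic. This is
only a sketch under stated assumptions: 10 bits on the wire per byte (8N1
async framing, which the post does not specify), "4k" and "2k" taken as 4096
and 2048 bytes, the 3 second handshake figure taken from the post, and ack
transit time and protocol overhead ignored.

```python
# Rough serialization-time estimate for a 9600 baud link.
# Assumption (not from the post): 10 line bits per payload byte (8N1 framing).

def tx_seconds(num_bytes, baud=9600, bits_per_byte=10):
    """Seconds needed just to clock num_bytes onto the line."""
    return num_bytes * bits_per_byte / baud

HANDSHAKE_S = 3.0   # handshake cost quoted in the post
WINDOW_S = 6.0      # short connectivity window quoted in the post

tcp_4k = HANDSHAKE_S + tx_seconds(4096)   # handshake plus one 4k write
udp_two_2k = 2 * tx_seconds(2048)         # two 2k datagrams, no handshake

print(f"TCP, 4k after handshake: {tcp_4k:.1f} s (window is {WINDOW_S:.0f} s)")
print(f"UDP, two 2k datagrams:   {udp_two_2k:.1f} s (window is {WINDOW_S:.0f} s)")
```

Under these assumptions the TCP case needs a bit over 7 seconds and does not
fit in the 6 second window, while the two UDP datagrams need a bit over 4
seconds and leave some room for the acks.
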
> What's happening on the server?  There are two things you could do:
> construct a single threaded server or one that uses fork/exec.  They
> each exhibit a different form of the same problem.  The server accepts a
> new connection. The accept system call takes the new connection off the
> listening socket's queue and returns a new fd for it; the new socket
> keeps the same local port as the listener.
> The single threaded server will use fd's and the fork/exec server will
> use slots in the process table.  To make that clearer the single
> threaded server will get activity that indicates a new connection, call
> accept to get a new fd to communicate with the new connection and manage
> that new connection in the next select call.  The concurrent server (one
> process per connection) will wait for accept to return a new fd for a
> new connection, then it will fork/exec to make a new process to handle
> that connection.
> OK so data comes into that fd for a while until the client gets a broken
> connection.  At that point the server socket will wait for hours,
> literally indefinitely for more data on that fd.  So now you have to put
> a timer on each fd/process so you can detect when no data has been
> received for whatever timeout period that you decide to use (what do you
> use for the timeout period?).  Keep in mind that 2-3 minute period can
> generate 10-12 broken connections.  So depending on which server design
> is used there will be either 10-12 unused fd's that have no client or
> 10-12 processes that are listening with no client to give them data.
> Also consider that there may be 10 to 12 mobile systems doing that, and
> now you have the possibility of 100 to 150 unused resources that need to
> be cleaned up.  Each process has a maximum number of fd's, and there is a
> maximum number of processes that can run.  So, what if they bought
> another company with a similar number of trucks or another company 10
> times larger bought them 'cause of the way that they do "real time"
> testing?  Instant problem!
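
For the single threaded TCP design, the per-fd timer described above could be
kept as a table of last-activity times that is swept between select calls.
This is a minimal sketch, not anyone's actual server: the class name
IdleTracker, its methods, and the timeout value are all hypothetical, and the
post itself leaves open what the timeout should be.

```python
import time

class IdleTracker:
    """Remember when each fd last produced data and report the stale ones."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout   # seconds; the right value is an open question
        self.clock = clock
        self.last_seen = {}                # fd -> time of last activity

    def touch(self, fd):
        """Call after accept() and after every successful read on fd."""
        self.last_seen[fd] = self.clock()

    def forget(self, fd):
        """Call when fd is closed for any reason."""
        self.last_seen.pop(fd, None)

    def expired(self):
        """Return the fds that have been silent longer than idle_timeout."""
        now = self.clock()
        return [fd for fd, seen in self.last_seen.items()
                if now - seen > self.idle_timeout]
```

A select loop would call touch() on every readable fd, then close and
forget() whatever expired() returns. It works, but it is exactly the extra
bookkeeping the post is trying to avoid.
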
> This whole discussion assumes that when the client gets a broken
> connection while sending data, there is no way to tell the socket to
> try again.  If someone knows how to do that and can point me to docs
> then I will be glad of the info.  In my reading of Stevens I
> didn't see how to recover from a broken connection.
> Using UDP just side steps these issues.  You put the responsibility for
> the communication on the client.  The client detects when a packet has
> not been acknowledged by putting a sequence/timestamp in each packet and
> comparing that tuple to each packet that is returned.
> If the seq/ts does not match the one you are looking for then dump it. 
> When it does match you can process the packet and transmit the next one
> (there are more efficient ways of handling multiple packets, but I'm
> keeping it simple).
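
The client side described above amounts to stop-and-wait with a
sequence/timestamp tag. A rough sketch of that idea follows. The header
layout (32-bit sequence plus 64-bit millisecond timestamp), the function
names, and the timeout and retry values are assumptions made for
illustration, not details of the actual program.

```python
import socket
import struct
import time

# Hypothetical header: 32-bit sequence number + 64-bit timestamp in ms.
HEADER = struct.Struct("!IQ")

def make_packet(seq, ts_ms, payload):
    return HEADER.pack(seq, ts_ms) + payload

def send_with_retry(sock, server_addr, seq, payload, timeout=2.0, max_tries=5):
    """Send one datagram; return True once a reply echoes the same (seq, ts)."""
    ts_ms = int(time.time() * 1000)
    packet = make_packet(seq, ts_ms, payload)
    for _ in range(max_tries):
        sock.sendto(packet, server_addr)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                      # retransmit delay elapsed, resend
            sock.settimeout(remaining)
            try:
                reply, _addr = sock.recvfrom(4096)
            except socket.timeout:
                break                      # nothing came back, resend
            if (len(reply) >= HEADER.size
                    and HEADER.unpack_from(reply) == (seq, ts_ms)):
                return True                # this is the ack we wanted
            # Anything else is stale or foreign: dump it and keep waiting.
    return False
```

The caller would move on to the next packet only after a True return, and
treat False as "link is down, redial and try this packet again later".
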
> The UDP server can be MUCH simpler by handling each packet as a self
> contained entity.  It gets a packet from the client, processes that
> packet, then uses the source address as the destination for the return
> packet.  No fork, no exec, no accept/new fd, no cleanup.  With UDP you
> can have one process that deals with one packet at a time.  What you
> have to ensure is that the server does not get so busy that clients
> fail to get their response before the end of the retransmit delay.
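
The one-packet-at-a-time server described above might look roughly like the
sketch below. The 12-byte header length (4-byte sequence plus 8-byte
timestamp), the names handle_one and serve_forever, and the process callback
are assumptions for illustration only.

```python
import socket

HEADER_SIZE = 12  # assumed: 4-byte sequence + 8-byte timestamp

def handle_one(sock, process):
    """Receive one datagram, process it, and ack it back to its source."""
    data, source = sock.recvfrom(4096)
    if len(data) < HEADER_SIZE:
        return False                  # too short to carry a header, drop it
    header, payload = data[:HEADER_SIZE], data[HEADER_SIZE:]
    process(payload)                  # application work on this one packet
    sock.sendto(header, source)       # echo seq/ts to the source address
    return True

def serve_forever(bind_addr, process):
    """Single process, single socket: no accept, no fork, no per-client state."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(bind_addr)
        while True:
            handle_one(sock, process)
```

Because the server keeps no per-client state, a client that vanishes mid
transfer leaves nothing behind to clean up. The cost is that process() must
stay fast relative to the clients' retransmit delay.
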

It sounds like you are between a rock and a hard place, since tracking
down UDP packet loss is not fun.  Personally, if I were asked to support
this environment, I would push to replace layers 1 and 2 with something more
robust.  I am assuming that this is not an option.  I am also going to
guess that GPRS and TDMA (cell data networks) are not viable options?  I
do not envy the work you have in front of you.  You really were not
being overly dramatic with the subject line.