[clug-talk] nasty problem

Robert Lewko lewkor at gmail.com
Thu May 18 06:48:43 PDT 2006

On 5/18/06, Gustin Johnson <gustin at echostar.ca> wrote:
> Why are you using UDP instead of TCP?  Since your app is going over
> satellite, and latency is not as much of an issue as reliability, TCP
> seems better suited.  If a UDP packet is lost on the wire, there is no
> mechanism to help you isolate the problem.  It is possible that your app
> is being filtered.  You could try to change the port, but I would
> seriously consider TCP.  Keep in mind I am a network admin and not a
> developer, you may have a valid reason for using UDP, it is just S.E.P.
> from my point of view.

The reason that I used UDP is that no fork/exec/new socket is needed.  First
you have to understand that it's not the latency that makes satellite
communication hard, although without altering any socket options TCP will
detect the latency and start to back off.

What happens is that you have two things to worry about.  On the client end
you have to worry about losing the connection and reconnecting, i.e. one
satellite passes out of LOS (line of sight) and it can be up to 20-25
minutes before the next one is in sight, though the average is 10 minutes
without service.  When the next satellite comes into view you have a 2-3
minute period when you may have very sporadic network availability.  You may
have a 6 second period with network availability, just enough time to dial
and get a connection without getting data through.  Once the next satellite
is fully in sight you can have 90 minutes to 2 hours with good service.

So let's consider what it would look like if we used TCP.  This application
gets a file in a directory (not my design in this part) every 5 minutes.  It
parses the file, puts it in packets, then sends it to the server.  So it does
a write to a TCP socket and gets an error: "No route to host".  So it closes
the socket and calls the windoze shit that dials the network through the
modem.  Great!  Now you have a connection.  OK, you are in one of those 6
second spots of connectivity at the start of getting a new satellite.  So
you start to send a 4k packet at 9600 baud.  Do you see a problem?  You
won't get your packet sent before the network goes down.  Remember the 3-way
handshake that TCP needs to complete before a connection is made?  Well, that
uses up about 3 seconds right there.  Using UDP you can actually get two 2k
packets through with their acks returned in 6 seconds.  BTW I have restricted
myself to a 2k packet size in my program.
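
As a rough sanity check on those numbers, here is a back-of-the-envelope
sketch.  It assumes 9600 baud means 9600 bits per second, that each byte
costs 10 bits on the wire (8 data bits plus start and stop bits), and it
ignores protocol headers and satellite latency.  The 3 second handshake
figure is the one quoted above; the function name is made up for this
example.

```python
WINDOW_SECONDS = 6.0  # the short burst of connectivity described above


def transfer_seconds(payload_bytes, baud=9600, bits_per_byte=10,
                     setup_seconds=0.0):
    """Seconds to push payload_bytes down the link, plus any setup cost.

    bits_per_byte=10 is an assumption (8N1 serial framing).
    """
    return setup_seconds + payload_bytes * bits_per_byte / baud


# TCP: roughly 3 s of handshake, then one 4k packet.
tcp_4k = transfer_seconds(4096, setup_seconds=3.0)   # about 7.3 s

# UDP: no handshake, 2k packets.
udp_2k = transfer_seconds(2048)                      # about 2.1 s each

print(f"TCP, one 4k packet : {tcp_4k:.1f} s (window is {WINDOW_SECONDS} s)")
print(f"UDP, two 2k packets: {2 * udp_2k:.1f} s")
```

Under these assumptions the TCP case overshoots the 6 second window, while
two 2k UDP packets take about 4.3 seconds and leave some room for the acks.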

What's happening on the server?  There are two things you could do:
construct a single threaded server or one that uses fork/exec.  They each
exhibit a different form of the same problem.  The server accepts a new
connection.  The accept system call takes the new connection off the
listening socket's queue, creates a new socket for it (still on the same
local port) and assigns a new fd for that new connection.

The single threaded server will use fd's and the fork/exec server will use
slots in the process table.  To make that clearer the single threaded server
will get activity that indicates a new connection, call accept to get a new
fd to communicate with the new connection and manage that new connection in
the next select call.  The concurrent server (one process per connection)
will wait for accept to return a new fd for a new connection, then it will
fork/exec to make a new process to handle that connection.

OK, so data comes into that fd for a while until the client gets a broken
connection.  At that point the server socket will wait for hours, literally
indefinitely, for more data on that fd.  So now you have to put a timer on
each fd/process so you can detect when no data has been received for
whatever timeout period you decide to use (what do you use for the
timeout period?).  Keep in mind that the 2-3 minute period can generate 10-12
broken connections.  So depending on which server design is used there will
be either 10-12 unused fd's that have no client or 10-12 processes that are
sitting there listening with no client to give them data.  Also know that
there are possibly 10 to 12 mobile systems doing that, so now you have the
possibility of 100 to 150 unused resources that need to be cleaned up.
Remember that each process has a maximum number of fd's and the system has a
maximum number of processes that can run.  So, what if they bought another
company with a similar number of trucks, or another company 10 times larger
bought them 'cause of the way that they do "real time" testing?  Instant
problem!
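
To make the "timer on each fd" idea concrete, here is a minimal sketch of
the single threaded TCP variant with an idle timeout.  It is only an
illustration, not the server discussed here: the names, the 600 second
timeout and the max_loops parameter (there so the loop can be stopped) are
all made up for the example.

```python
import select
import socket
import time

IDLE_TIMEOUT = 600.0  # seconds; hypothetical, roughly one satellite gap


def expired(last_seen, now, timeout=IDLE_TIMEOUT):
    """Return the connections whose last activity is older than timeout."""
    return [conn for conn, seen in last_seen.items() if now - seen > timeout]


def serve(listener, timeout=IDLE_TIMEOUT, poll=1.0, max_loops=None):
    """Single threaded TCP server that closes connections gone quiet."""
    last_seen = {}  # connected socket -> time data was last received
    loops = 0
    while max_loops is None or loops < max_loops:
        loops += 1
        readable, _, _ = select.select([listener, *last_seen], [], [], poll)
        now = time.monotonic()
        for sock in readable:
            if sock is listener:
                conn, _addr = listener.accept()  # new fd for this client
                last_seen[conn] = now
                continue
            try:
                data = sock.recv(2048)
            except OSError:
                data = b""
            if data:
                last_seen[sock] = now   # real code would process data here
            else:                       # orderly close or socket error
                sock.close()
                del last_seen[sock]
        # Reap connections whose client vanished without closing.
        for sock in expired(last_seen, now, timeout):
            sock.close()
            del last_seen[sock]
    return last_seen
```

Even in this small form the server has to carry per-connection state and a
cleanup pass, and the question of what timeout to pick is still open.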

This whole discussion is based on the assumption that when the client sends
some data and gets a broken connection, there is no way to tell the socket
that it can try again.  If someone knows how to do that and can point me to
docs then I will be glad of the info.  In my reading of Stevens I didn't see
how to recover from a broken connection.

Using UDP just sidesteps these issues.  You put the responsibility for the
communication on the client.  The client is the one that detects when a
packet has not got through, by putting a sequence/timestamp in each packet
and comparing that tuple to each packet that is returned.  If the seq/ts does
not match the one you are looking for then dump it.  When it does match you
can process the packet and transmit the next one (there are more efficient
ways of handling multiple packets, but I'm keeping it simple).
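
A minimal sketch of that client-side loop might look like the following.
The wire format (a 4-byte sequence number plus an 8-byte timestamp in front
of the payload), the function names and the retry defaults are all
assumptions made up for this example, not what my program actually uses.

```python
import socket
import struct
import time

HEADER = struct.Struct("!IQ")  # hypothetical: sequence, timestamp (usec)
MAX_PAYLOAD = 2048             # the 2k packet limit


def make_packet(seq, ts, payload):
    """Prefix the payload with the seq/ts tuple."""
    return HEADER.pack(seq, ts) + payload


def is_ack_for(packet, seq, ts):
    """True if a returned packet carries the seq/ts we are waiting on."""
    return (len(packet) >= HEADER.size
            and HEADER.unpack_from(packet) == (seq, ts))


def send_reliably(sock, server, seq, payload,
                  retransmit_delay=5.0, max_tries=10):
    """Send one packet and wait for its ack, retransmitting on timeout."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload larger than MAX_PAYLOAD")
    ts = time.time_ns() // 1000
    packet = make_packet(seq, ts, payload)
    for _ in range(max_tries):
        try:
            sock.sendto(packet, server)
        except OSError:
            pass  # link is down; wait out the delay and try again
        deadline = time.monotonic() + retransmit_delay
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break              # no ack in time: retransmit
            sock.settimeout(remaining)
            try:
                reply, _addr = sock.recvfrom(65535)
            except OSError:        # includes the timeout case
                break
            if is_ack_for(reply, seq, ts):
                return True
            # stale or foreign packet: dump it and keep waiting
    return False
```

A caller would bump seq and call send_reliably again for the next packet
once this returns True.  Nothing here needs a connection to be set up or
torn down when the link drops.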

The UDP server can be MUCH simpler by handling each packet as a self
contained entity.  It gets a packet from the client, processes that packet,
then uses the source address as the destination for the return packet.  No
fork, no exec, no accept/new fd, no cleanup.  With UDP you can have one
process that deals with one packet at a time.  What you have to ensure is
that the server does not get so busy that clients fail to get their
response before the end of the retransmit delay.
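
For illustration, a server along those lines can be sketched in a few
lines.  The 12-byte header (4-byte sequence plus 8-byte timestamp) that gets
echoed back as the ack is an assumption for this example, as are the
function names and the max_packets parameter, which only exists so the loop
can be stopped.

```python
import socket

HEADER_SIZE = 12  # hypothetical: 4-byte sequence + 8-byte timestamp


def handle(packet):
    """Process one self-contained packet and build the reply.

    Real processing of packet[HEADER_SIZE:] would go here; this sketch
    just echoes the seq/ts header back as the ack.
    """
    return packet[:HEADER_SIZE]


def serve_udp(sock, max_packets=None):
    """One process, one socket, one packet at a time."""
    handled = 0
    while max_packets is None or handled < max_packets:
        packet, source = sock.recvfrom(65535)
        if len(packet) < HEADER_SIZE:
            continue  # too short to carry a header: ignore it
        # The source address of the request is the reply destination.
        sock.sendto(handle(packet), source)
        handled += 1
    return handled
```

There is no per-client state to time out or clean up.  A client that
disappears simply stops sending packets.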