I'm having an odd problem with a largish FTP download. With wget, Python's ftplib, and lftp, when I download this particular file (~241 MB of text/CSV data), the entire file will download but then the transfer stalls and the client has to be killed manually. The same file downloads perfectly from another Arch machine in a different location. I ran pdb on the Python download, and it hung here (if that means anything to anyone):
ipdb>
> /usr/lib/python2.7/socket.py(447)readline()
446 try:
--> 447 data = self._sock.recv(self._rbufsize)
448 except error, e:
The offending machine uses Comcast as its ISP and is hooked up to the internet through a consumer router and a switch. Even more confusingly, a smaller file (~6 MB) from the same location downloads just fine (unfortunately I can't give out the URL for anyone to test, as it's work-related).
Anyone have any ideas about where to begin with troubleshooting? I'm about ready to just stick that download in a separate thread and kill the thread when it looks like the download is complete!
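Something like this is what I have in mind for that workaround (the host, credentials, and paths below are just placeholders, since I can't share the real server; Python can't actually kill a thread, so a daemon thread plus a join timeout is about as close as it gets):
import threading
import ftplib

def fetch():
    # Placeholder host/credentials/paths -- not the real work server.
    ftps = ftplib.FTP('ftp.example.com', 'user', 'password')
    with open('/tmp/itemx3.out', 'wb') as out:
        ftps.retrbinary('RETR itemx3.out', out.write)

worker = threading.Thread(target=fetch)
worker.daemon = True      # let the process exit even if the transfer never returns
worker.start()
worker.join(1200)         # stop waiting after 20 minutes and move on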
Thanks!
Scott
edit: I was slightly mistaken...the file doesn't get completely downloaded. It's missing the last line completely plus about the last 1032 characters of the previous line.
Last edited by firecat53 (2012-01-30 03:34:28)
Heh. This sounds extremely familiar:
http://projects.archlinux.org/pacman.gi … 4f146f232b
Try using curl, which enables keepalives by default.
Thanks falconindy! Well...it sort of works:
curl -O ftp://user:pw@url/itemx3.out
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  240M  100  240M    0     0   314k      0  0:13:04  0:13:04 --:--:--     0
curl: (28) FTP response timeout
After the timeout, it appears that the file is all there...which is a good thing!! But it's still timing out and not finishing normally.
I also tried (per a stackoverflow question):
import socket
from ftplib import FTP as ftp
ftps = ftp(URL, USER, PW)
# Enable TCP keepalives on the control connection
ftps.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
ftps.retrbinary("RETR {}{}".format(FTP_PATH, FILENAME), open(SAVE_PATH, 'wb').write)
But this didn't even time out to give me the entire file...it just hung and cut off the last line and a half or so, like before.
I guess for now I'll try the curl method with subprocess.Popen and ignore the timeout.
Any other ideas to try and lose the timeout/hang?
Thanks!
Scott
Well, you'll want to set the two additional tuning knobs that Linux provides -- TCP_KEEPINTVL and TCP_KEEPIDLE. curl lets you do this with the --keepalive-time=N flag (the value 'N' is applied to both KEEPINTVL and KEEPIDLE), which alters the way-too-long default values. I can't find it in the documentation, but I think KEEPIDLE defaults to 7200 seconds, meaning that keepalives aren't sent for 2 hours. Try something like 60 and see if that makes curl happier.
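For reference, those kernel-wide defaults are visible under /proc on Linux; a quick check (just reading the sysctls, nothing curl-specific) looks something like this:
# Print the system-wide TCP keepalive defaults on Linux.
# tcp_keepalive_time is what KEEPIDLE falls back to (typically 7200 seconds).
for name in ('tcp_keepalive_time', 'tcp_keepalive_intvl', 'tcp_keepalive_probes'):
    with open('/proc/sys/net/ipv4/' + name) as f:
        print('{}: {}'.format(name, f.read().strip()))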
Last edited by falconindy (2012-01-28 04:20:45)
That killed the curl timeout, but why would I be getting this warning from inside a fully updated Arch system (tested on 2 different systems...x86_64 and i686)??
$ curl -O --keepalive-time 60 ftp://user:pw@url/itemx3.out
Warning: Keep-alive functionality somewhat crippled due to missing support in
Warning: your operating system!
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
Warning: Keep-alive functionality somewhat crippled due to missing support in
Warning: your operating system!
100  240M  100  240M    0     0   423k      0  0:09:41  0:09:41 --:--:--  374k
$
Is that normal?
Thanks!
Scott
Hmmm...now I've gotta see if I can set that same keepalive from within python somehow.
Edit: I see that the SO_KEEPALIVE socket setting can be set using 'setsockopt', but I don't see anything about adjusting the keepalive interval. Am I missing it in the setsockopt manpage?
Last edited by firecat53 (2012-01-28 05:47:29)
Huh. That warning isn't right at all. Looks like a regression from when the code for the CLI tool was split up into a bunch of different files. The netinet/tcp.h header isn't included and those constants aren't defined. Incidentally, my patch to pacman sparked me to file a pair of patches with curl to move control of tcp keepalives to the library side (it's currently done via a socket callback in the front end tool). Those will be merged, which means this regression will be silently overlooked. I'll backport the fix for us.
The idle and intvl options aren't in setsockopt(3P) because they're not POSIX options. See tcp(7). Don't know offhand if Python will have these options available.
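If you want to check whether your Python build even exposes those constants (they're only defined on platforms that have them), a quick sanity check would be something like:
import socket
# These TCP_* constants are only defined where the platform supports them (e.g. Linux).
for name in ('SO_KEEPALIVE', 'TCP_KEEPIDLE', 'TCP_KEEPINTVL', 'TCP_KEEPCNT'):
    print('{}: {}'.format(name, hasattr(socket, name)))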
late update: curl 7.24.0-2 in testing properly sets the keepalive knobs.
Last edited by falconindy (2012-01-28 15:36:09)
I tested your fix in 7.24.0-2 and it works fine...no error messages. I've updated my code to use curl for now.
I also tried this in my python code with ftplib:
import socket
import ftplib
ftps = ftplib.FTP(url, user, password)
ftps.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
ftps.sock.setsockopt(socket.SOL_SOCKET, socket.TCP_KEEPINTVL, 60)
ftps.retrbinary(......
but it still didn't work (it hangs right before the download completes). Did I set that TCP_KEEPINTVL correctly? It seems like, if that's the issue and --keepalive-time fixes it with curl, a similar fix should work with Python. I'm running at the edge of my networking knowledge here...
Thanks!
Scott
Last edited by firecat53 (2012-01-30 01:08:22)
Awesome, I got it working with Python...I figured out I needed a different level for the TCP options (IPPROTO_TCP instead of SOL_SOCKET), plus the TCP_KEEPIDLE option. Like:
import socket
ftps.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
ftps.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 75)
ftps.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
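Put together with the transfer itself (the host, credentials, and paths here are placeholders), the whole thing ends up looking roughly like:
import socket
import ftplib

ftps = ftplib.FTP('ftp.example.com', 'user', 'password')
# Enable keepalives on the control connection and shorten the kernel defaults
# so the idle control channel isn't dropped (e.g. by a NAT router) during the
# long data transfer.
ftps.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
ftps.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
ftps.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 75)
with open('/tmp/itemx3.out', 'wb') as out:
    ftps.retrbinary('RETR itemx3.out', out.write)
ftps.quit()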
Thanks so much for sticking with me, falconindy!
Scott