
#1 2009-01-26 04:11:24

Tg
Member
Registered: 2008-04-23
Posts: 35

[SOLVED] wget problem with ftp ( and symlink )

Hi,

I tried to download some files recursively from FTP, but it seems that wget can't find the files under symlinked folders by itself.

At first I used

wget -r -N ftp://ftp.ncbi.nih.gov/snp/database/organism_schema

(All folders in organism_schema are symlinks, and I want the files underneath them)

but this downloads only the symlinked folders, not the files underneath them. I tried --retr-symlinks, but it gave me errors:

preecha@preecha-laptop:~/dbsnp$ wget -r -N --retr-symlink ftp://ftp.ncbi.nih.gov/snp/database/organism_data | tee error.txt
--2009-01-26 11:07:52--  ftp://ftp.ncbi.nih.gov/snp/database/organism_data
           => `ftp.ncbi.nih.gov/snp/database/.listing'
Resolving ftp.ncbi.nih.gov... 130.14.29.30
Connecting to ftp.ncbi.nih.gov|130.14.29.30|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /snp/database ... done.
==> PASV ... done.    ==> LIST ... done.

    [ <=>                                   ] 843         5.24K/s   in 0.2s    

2009-01-26 11:08:02 (5.24 KB/s) - `ftp.ncbi.nih.gov/snp/database/.listing' saved [843]

Removed `ftp.ncbi.nih.gov/snp/database/.listing'.
--2009-01-26 11:08:02--  ftp://ftp.ncbi.nih.gov/snp/database/organism_data/organism_data
           => `ftp.ncbi.nih.gov/snp/database/organism_data/.listing'
==> CWD /snp/database/organism_data ... done.
==> PASV ... done.    ==> LIST ... done.

    [    <=>                                ] 4,955       2.14K/s   in 2.3s    

2009-01-26 11:08:06 (2.14 KB/s) - `ftp.ncbi.nih.gov/snp/database/organism_data/.listing' saved [4955]

Removed `ftp.ncbi.nih.gov/snp/database/organism_data/.listing'.
--2009-01-26 11:08:06--  ftp://ftp.ncbi.nih.gov/snp/database/organism_data/arabidopsis_3702
           => `ftp.ncbi.nih.gov/snp/database/organism_data/arabidopsis_3702'
==> CWD not required.
==> PASV ... done.    ==> RETR arabidopsis_3702 ... 
No such file `arabidopsis_3702'.

--2009-01-26 11:08:07--  ftp://ftp.ncbi.nih.gov/snp/database/organism_data/bee_7460
           => `ftp.ncbi.nih.gov/snp/database/organism_data/bee_7460'
==> CWD not required.
==> PASV ... done.    ==> RETR bee_7460 ... 
No such file `bee_7460'.

--2009-01-26 11:08:08--  ftp://ftp.ncbi.nih.gov/snp/database/organism_data/bison_9901
           => `ftp.ncbi.nih.gov/snp/database/organism_data/bison_9901'
==> CWD not required.
==> PASV ... done.    ==> RETR bison_9901 ... 
No such file `bison_9901'.

--2009-01-26 11:08:10--  ftp://ftp.ncbi.nih.gov/snp/database/organism_data/blackbird_39638
           => `ftp.ncbi.nih.gov/snp/database/organism_data/blackbird_39638'
==> CWD not required.
==> PASV ... done.    ==> RETR blackbird_39638 ... 
No such file `blackbird_39638'.

--2009-01-26 11:08:11--  ftp://ftp.ncbi.nih.gov/snp/database/organism_data/bonobo_9597
           => `ftp.ncbi.nih.gov/snp/database/organism_data/bonobo_9597'
==> CWD not required.
==> PASV ... done.    ==> RETR bonobo_9597 ... 
No such file `bonobo_9597'.

Did I do something wrong here? Any suggestions would help a lot. Thanks!

Last edited by Tg (2009-01-27 04:00:46)


#2 2009-01-26 05:34:41

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: [SOLVED] wget problem with ftp ( and symlink )

wget man page wrote:

When --retr-symlinks is specified, however, symbolic links are traversed and the pointed-to files are retrieved. At this time, this option does not cause Wget to traverse symlinks to directories and recurse through them, but in the future it should be enhanced to do this.

I think I've found a good workaround:

#!/bin/bash
URL=$1
# Strip the scheme to get the local mirror path (host/dir/...)
DIRPATH=$(echo "$URL" | sed 's-ftp://--')
# The hostname is the first path component
BASEURL=$(echo "$DIRPATH" | cut -d '/' -f1)
wget -r -N "$URL"
SYMLINKS=$(find "$DIRPATH" -type l)
for SYMLINK in $SYMLINKS
do
    TARGET=$(readlink "$SYMLINK")
    if [ "${TARGET:0:1}" = "/" ]
    then
        # Absolute target: resolve against the root of the host
        URI="ftp://$BASEURL$TARGET"
    else
        # Relative target: resolve against the symlink's own directory
        URI="ftp://$(dirname "$SYMLINK")/$TARGET"
    fi
    echo "retrieving $URI"
    wget -r -N "$URI"
    # ./wget_symlinks "$URI"
done

You can save that as "wget_symlinks", then

chmod 744 wget_symlinks
./wget_symlinks ftp://ftp.ncbi.nih.gov/snp/database/organism_schema

I didn't download everything, so I don't know if there are further symlinks. If there are, comment out the "wget -r -N" line inside the loop and uncomment the recursive call below it. It should then download everything recursively and exit.
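For clarity, the absolute-versus-relative handling in the loop boils down to the mapping below; here it is as a standalone sketch (with a made-up host and made-up paths) that can be run without touching the network:

```shell
#!/bin/bash
# Sketch of the URI-resolution step only: map a symlink found in the
# local mirror back to a remote FTP URI. Host and paths are examples.
resolve_uri() {
    local host=$1 linkpath=$2 target=$3
    if [ "${target:0:1}" = "/" ]; then
        # Absolute target: resolve against the root of the host
        printf 'ftp://%s%s\n' "$host" "$target"
    else
        # Relative target: resolve against the symlink's own directory
        printf 'ftp://%s/%s\n' "$(dirname "$linkpath")" "$target"
    fi
}

resolve_uri ftp.example.org ftp.example.org/snp/database/schema_link /snp/organisms/at_schema
# -> ftp://ftp.example.org/snp/organisms/at_schema
resolve_uri ftp.example.org ftp.example.org/snp/database/schema_link ../shared/at_schema
# -> ftp://ftp.example.org/snp/database/../shared/at_schema
```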

Last edited by Xyne (2009-01-26 05:36:10)


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone


#3 2009-01-26 07:23:48

Tg
Member
Registered: 2008-04-23
Posts: 35

Re: [SOLVED] wget problem with ftp ( and symlink )

Xyne wrote:

I think I've found a good workaround: […]

I never thought of using a bash script (I've never written more than 5 lines of bash). I guess I'll try that, thanks a lot.

Last edited by Tg (2009-01-26 07:27:56)


#4 2009-01-26 17:31:50

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: [SOLVED] wget problem with ftp ( and symlink )

Np, I enjoy having little excuses to learn more about bash scripting. wink

If the script works as expected, please edit your original post and add [SOLVED] to the beginning of the subject line.


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone


#5 2016-02-10 05:20:43

truthling
Member
Registered: 2016-02-10
Posts: 1

Re: [SOLVED] wget problem with ftp ( and symlink )

It seems like this might be addressing a problem along the same lines as the one I am trying to solve, but I tried running the script and it didn't work for me.

At ftp://ftp.ncbi.nlm.nih.gov/genomes/genb … _versions/  there are several directories, all prefixed with "GCA". The files I need are found by following these links, which are not being recognized as directories, perhaps due to a bug on the server that presents symbolic links to directories incorrectly. I believe that if these links were treated as directories, the following command would do what I need, which is to go into those directories and get all .fna.gz files.

    wget -nd -rl 0 -A '*.fna.gz' ftp://ftp.ncbi.nlm.nih.gov/genomes/genb … ocomialis/

So I am wondering if there is a workaround. Is there a way to tell `wget` to download files whose complete URLs match a regex? I tried

    wget -nd -rl 0 --accept-regex ".*\/GCA.*\/.*\.fna\.gz$" ftp://ftp.ncbi.nlm.nih.gov/genomes/genb … _versions/

which does indeed match, for example, ftp://ftp.ncbi.nlm.nih.gov/genomes/genb … mic.fna.gz (as per https://regex101.com/r/pY0bI8/1), but I'm still not getting the files I need.
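For what it's worth, here is how I sanity-checked the regex locally, using grep -E as a rough stand-in for wget's regex matching and a made-up URL of the same shape (the real paths are longer):

```shell
# Check the accept regex against a sample URL with POSIX extended
# regex matching. The URL below is a fabricated example, not a real
# path on the NCBI server.
regex='.*/GCA.*/.*\.fna\.gz$'
url='ftp://ftp.example.org/genomes/GCA_0000001/sample_genomic.fna.gz'
if printf '%s\n' "$url" | grep -Eq "$regex"; then
    echo "match"
else
    echo "no match"
fi
# prints "match"
```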

Ultimately, all I want is .fna.gz files located at ftp://ftp.ncbi.nlm.nih.gov/genomes/genb … sions/GCA*


Any suggestions about how I can modify Xyne's script to work for me?


#6 2016-02-10 06:28:22

x33a
Forum Fellow
Registered: 2009-08-15
Posts: 4,587

Re: [SOLVED] wget problem with ftp ( and symlink )

@truthling,

Please open a new thread and link to this one. Also, use code tags for posting your commands.

https://wiki.archlinux.org/index.php/Fo … bumping.22

Closing.

