
#1 2010-07-25 18:16:51

tindzk
Member
Registered: 2010-05-10
Posts: 25

[Patch] Reduce syscalls

Looking at the strace output, pacman seems to invoke lots of unnecessary syscalls.

This patch has reduced the number of needed syscalls on my system by 70%:
http://pastebin.archlinux.fr/406557

Reading the "desc" files is still inefficient:

open("/var/lib/pacman/sync/extra/firefox-i18n-3.6.8-1/desc", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=321, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb770e000
read(4, "%FILENAME%\nfirefox-i18n-3.6.8-1-"..., 4096) = 321
read(4, "", 4096)                       = 0
close(4)                                = 0
munmap(0xb770e000, 4096)                = 0

This could be reduced to three syscalls:

open("/var/lib/pacman/sync/extra/firefox-i18n-3.6.8-1/desc", O_RDONLY) = 4
read(4, "%FILENAME%\nfirefox-i18n-3.6.8-1-"..., 4096) = 321
close(4)                                = 0

However, this would mean moving away from glibc's stdio functions (fopen(), etc.) and using the syscalls directly. What do you think?
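Here is a minimal C sketch of that three-syscall read. It assumes a hypothetical helper named `read_desc()` and a caller-supplied 4096-byte buffer. Like the strace output above, it issues a single read() and treats whatever comes back as the whole file. That is only a safe assumption for a regular local file that fits in the buffer.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read a "desc" file using only open(), read() and
 * close(). Returns the number of bytes read, or -1 on error. It assumes
 * the whole file arrives in one read(), as in the strace output above. */
static ssize_t read_desc(const char *path, char *buf, size_t size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, size);  /* one read, no stdio buffer */
    int saved_errno = errno;          /* keep read()'s errno across close() */
    close(fd);
    errno = saved_errno;
    return n;
}
```

A caller would declare `char buf[4096];` and call `read_desc(path, buf, sizeof buf)`. A result equal to `sizeof buf` means the file may have been cut off.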

The 4096 bytes should be large enough for all "desc" files. A simple heuristic could be included to check whether some bytes are missing: 1) all sections are covered (NAME, VERSION, DESC, URL, etc.) and 2) the buffer ends with \n\n.

With a flushed disk cache, a single "pacman -Syu" run takes 1 min 18 s! With the patch applied it takes 1 min 7 s, which is still very slow. I suspect it would be a lot faster to read the .gz-compressed databases into memory than to use the uncompressed package tree.
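For scale, convert those cold-cache timings to seconds: 1 min 18 s is 78 s and 1 min 7 s is 67 s. The wall-clock saving is then about 14%, far smaller than the 70% drop in syscalls:

```latex
\frac{78\,\mathrm{s} - 67\,\mathrm{s}}{78\,\mathrm{s}} = \frac{11}{78} \approx 0.141 \approx 14\%
```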

Last edited by tindzk (2010-07-26 13:30:24)

#2 2010-07-26 01:11:41

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,111

Re: [Patch] Reduce syscalls

Interesting stuff. You'll want to subscribe and post this to pacman-dev@archlinux.org if you want this to get any real attention.

#3 2010-07-26 01:29:26

flamelab
Member
From: Athens, Hellas (Greece)
Registered: 2007-12-26
Posts: 2,160

Re: [Patch] Reduce syscalls

File a bug report (feature request) on the bug tracker and, as falconindy suggested, propose it on the pacman-dev mailing list.

#4 2010-07-26 01:56:59

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,472

Re: [Patch] Reduce syscalls

tindzk wrote:

I guess it would be a lot faster to read the .gz compressed databases into memory rather than using the uncompressed package tree.

Which is currently being worked on...


Still, send your patch to the pacman-dev list and it will get reviewed by the people involved.

#5 2010-07-26 13:24:27

tindzk
Member
Registered: 2010-05-10
Posts: 25

Re: [Patch] Reduce syscalls

Alright. Thanks for the responses.

#6 2010-07-26 22:38:36

Rip-Rip
Member
Registered: 2008-11-09
Posts: 32

Re: [Patch] Reduce syscalls

tindzk wrote:

The 4096 bytes should be large enough for all "desc" files. A simple heuristic could be included to check whether some bytes are missing: 1) all sections are covered (NAME, VERSION, DESC, URL, etc.) and 2) the buffer ends with \n\n.

Or you could simply compare the value returned by read with 4096...

int readed;
while ((readed = read(fd, buf, 4096)) == 4096)
        continue;

This is the best "heuristic" you could have dreamed of. :D

#7 2010-07-27 00:02:00

diegonc
Member
Registered: 2008-12-13
Posts: 42

Re: [Patch] Reduce syscalls

Rip-Rip wrote:
int readed =0;
while ((readed += read(fd, buf, 4096 -  readed )) == 4096)
        continue;

This is the best "heuristic" you could have dreamed of. :D

Did you mean that instead? ;)

EDIT: Hmm... maybe not. :P

Last edited by diegonc (2010-07-27 00:08:10)
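Below is a minimal sketch of the loop both snippets above seem to be aiming for. It uses a hypothetical helper name, `read_all()`, and differs from them in three ways. First, each read() writes at `buf + total` instead of at the start of the buffer. Second, a return of -1 is treated as an error, with a retry on EINTR. Third, the loop ends only when read() returns 0, so it does not depend on every read filling the buffer. Growing the buffer with realloc() is an assumption chosen for illustration, not pacman's code.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read everything from fd into a malloc'd buffer.
 * Each read() appends at buf + total; the buffer doubles whenever it
 * fills, and the loop stops only when read() returns 0 (end of file).
 * Returns NULL on error; the caller frees the result. */
static char *read_all(int fd, size_t *out_len)
{
    size_t cap = 4096, total = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (total == cap) {                  /* buffer full: grow it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;                    /* interrupted by a signal: retry */
            free(buf);
            return NULL;
        }
        if (n == 0)
            break;                           /* end of file */
        total += (size_t)n;
    }
    *out_len = total;
    return buf;
}
```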

#8 2010-07-27 00:16:05

tindzk
Member
Registered: 2010-05-10
Posts: 25

Re: [Patch] Reduce syscalls

Rip-Rip wrote:

Or you could simply compare the value returned by read with 4096...

int readed;
while ((readed = read(fd, buf, 4096)) == 4096)
        continue;

This is the best "heuristic" you could have dreamed of. :D

Yes, but that would also imply that each file contains 4096 (or more) bytes, which does not seem to be the case. :)

Another potential issue is that read() may return fewer than 4096 bytes even though the file contains 4096 bytes or more. I've never seen this happen with normal disk files, but it's quite common with TCP sockets. Still, I wouldn't want to rely on read() always returning as many bytes as are available, because the directory could be mounted over the network, and then that assumption no longer holds.
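You don't need a network mount to see a short read. POSIX allows read() to return fewer bytes than requested when a signal interrupts it. It can also do so when a pipe, FIFO or other special file has fewer bytes immediately available. The sketch below uses a pipe and a hypothetical function name. Its first read() returns 11 bytes even though the writer is still open, so a loop that took "less than 4096" to mean end of file would stop too early on such a descriptor.

```c
#include <unistd.h>

/* Shows that a short read is not the same as end of file: the pipe
 * still has an open writer, yet read() returns only the bytes that are
 * currently available instead of waiting for 4096 of them. */
static ssize_t short_read_demo(void)
{
    int p[2];
    char buf[4096];
    ssize_t n;

    if (pipe(p) != 0)
        return -1;
    if (write(p[1], "%NAME%\nfoo\n", 11) != 11) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    n = read(p[0], buf, sizeof buf);   /* returns 11, not 4096 */
    close(p[0]);
    close(p[1]);
    return n;
}
```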

What I thought of in my initial post was something like this:

    size_t len;
    String s = StackString(4096);

    do {
        len = File_Read(&file,
            s.buf  + s.len,
            s.size - s.len);

        s.len += len;
    } while (len > 0 && s.len < s.size);

    if (!String_EndsWith(s, String("\n\n"))) {
        /* The file is either a) invalid or b) larger than
         * 4096 bytes (unlikely) and thus the final \n\n
         * is missing here.
         */
    }

Is it even true that a complete "desc" file always has to end with \n\n?

Last edited by tindzk (2010-08-01 00:27:44)
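For readers without the String/File_Read helpers used above, here is a rough plain-POSIX equivalent of that loop plus the trailing "\n\n" check. The name `read_desc_checked()` and its return codes are hypothetical. The check also carries over a limitation of the original: if the data fills the buffer exactly, it may end in "\n\n" by coincidence even though the file continues.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical plain-POSIX version of the File_Read loop above.
 * Fills buf (up to size bytes) and stores the byte count in *out_len.
 * Returns 0 if the data ends with "\n\n", 1 if it does not (the file is
 * invalid or larger than the buffer), and -1 on error. If *out_len ends
 * up equal to size, the file may still continue past the buffer. */
static int read_desc_checked(const char *path, char *buf, size_t size,
                             size_t *out_len)
{
    size_t total = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while (total < size) {
        ssize_t n = read(fd, buf + total, size - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                  /* end of file */
        total += (size_t)n;
    }
    close(fd);

    *out_len = total;
    if (total < 2 || buf[total - 2] != '\n' || buf[total - 1] != '\n')
        return 1;
    return 0;
}
```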
