
#1 2011-08-17 06:36:43

graysky
Member
From: The worse toilet in Scotland
Registered: 2008-12-01
Posts: 8,824

python folks - can someone fix cacheclean in the wiki?

https://wiki.archlinux.org/index.php/CacheClean

The current version of cacheclean throws a non-critical error. Can someone with python mojo have a look and edit the wiki page to fix it?

# cacheclean 3
File 'pulseaudio-alsa-1-2-any.pkg.tar.xz' doesn't match package pattern!

Last edited by graysky (2011-08-17 06:37:01)


CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs


#2 2011-08-17 09:00:19

steabert
Member
Registered: 2011-04-18
Posts: 78

Re: python folks - can someone fix cacheclean in the wiki?

It's the regex that doesn't match because of the single digit version number.

^(.+)-\d[^-]+-.+?(-i686|-x86_64|-any)?\.pkg\.tar\.(gz|bz2|xz)(\.aria2)?$
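A quick check of that diagnosis, as a minimal sketch. The pattern is the one quoted above and the failing filename comes from the first post; the second filename, with version 1.0, is a made-up variant used only for contrast.

```python
import re

# The pattern quoted above, copied unchanged.
old_pattern = re.compile(
    r"^(.+)-\d[^-]+-.+?(-i686|-x86_64|-any)?\.pkg\.tar\.(gz|bz2|xz)(\.aria2)?$"
)

# Filename from the error message in the first post.
failing = "pulseaudio-alsa-1-2-any.pkg.tar.xz"
# Hypothetical variant with a longer version string, for contrast only.
passing = "pulseaudio-alsa-1.0-2-any.pkg.tar.xz"

# \d[^-]+ needs a digit plus at least one more non-hyphen character,
# so a version consisting of just "1" can never satisfy it.
print(old_pattern.match(failing))           # None
print(old_pattern.match(passing).group(1))  # pulseaudio-alsa
```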

I don't really have any regex mojo, nor do I know the details of allowed package naming, but the problem doesn't seem easy to solve: package names can be alphanumeric, so they may contain digits themselves.  I would use something like:

^((?:[a-z][\w-]*)+)\d[^-]*-\d+(-i686|-x86_64|-any)?\.pkg\.tar\.(gz|bz2|xz)(\.aria2)?$

allowing only parts of package names that start with a lower-case letter.

For more general version designations that can start with non-digit characters:

^((?:[a-z][\w-]*)+)-[^-]+-\d+(-i686|-x86_64|-any)?\.pkg\.tar\.(gz|bz2|xz)(\.aria2)?$

I don't want to update the wiki page, as I haven't thoroughly tested the latter regex with the actual python program.

Last edited by steabert (2011-08-17 10:08:21)


#3 2011-08-17 09:11:45

dodo3773
Member
Registered: 2011-03-17
Posts: 801

Re: python folks - can someone fix cacheclean in the wiki?

Accidental post. Please ignore.

Last edited by dodo3773 (2011-08-17 09:12:59)


#4 2011-08-17 14:22:41

kachelaqa
Member
Registered: 2010-09-26
Posts: 215

Re: python folks - can someone fix cacheclean in the wiki?

the following python regexp is based on how the makepkg script creates packages:

bpat = re.compile(r"""
    ^([^-/][^/]*)-          # (1) package name
    [^-/\s]+-               # (2) epoch:version
    [^-/\s]+                # (3) release
    (-i686|-x86_64|-any)    # (4) architecture
    \.pkg\.tar              # (5) extension
    (?:\.(gz|bz2|xz|Z))?    # (6) compression extension
    (\.aria2)?$             # (7) other extension
""", re.X)

it has been successfully tested against all the package names/versions in the core/extra/community repos and the aur.

(nb: i also used it in the cacheclean script in preview mode and it produced no errors).
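For anyone wanting to try it outside the script, here is a minimal, self-contained sketch of the pattern applied to the filename from the first post. The pattern is the one above, written as a raw string; the variable names other than `bpat` are just for illustration. Note that the numbers in the comments label parts of the pattern, not capture groups: only the name, architecture, compression and .aria2 parts are captured.

```python
import re

bpat = re.compile(r"""
    ^([^-/][^/]*)-          # package name (captured)
    [^-/\s]+-               # epoch:version
    [^-/\s]+                # release
    (-i686|-x86_64|-any)    # architecture (captured, with its hyphen)
    \.pkg\.tar              # extension
    (?:\.(gz|bz2|xz|Z))?    # compression extension (captured)
    (\.aria2)?$             # other extension (captured)
""", re.X)

# Filename from the error message in the first post.
match = bpat.match("pulseaudio-alsa-1-2-any.pkg.tar.xz")
name, arch, compression, aria2 = match.groups()
print(match.groups())  # ('pulseaudio-alsa', '-any', 'xz', None)
```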


#5 2011-08-17 15:03:57

graysky
Member
From: The worse toilet in Scotland
Registered: 2008-12-01
Posts: 8,824

Re: python folks - can someone fix cacheclean in the wiki?

@kach - thanks!



#6 2011-08-17 15:10:00

steabert
Member
Registered: 2011-04-18
Posts: 78

Re: python folks - can someone fix cacheclean in the wiki?

kachelaqa wrote:

the following python regexp is based on how the makepkg script creates packages:

bpat = re.compile(r"""
    ^([^-/][^/]*)-          # (1) package name
    [^-/\s]+-               # (2) epoch:version
    [^-/\s]+                # (3) release
    (-i686|-x86_64|-any)    # (4) architecture
    \.pkg\.tar              # (5) extension
    (?:\.(gz|bz2|xz|Z))?    # (6) compression extension
    (\.aria2)?$             # (7) other extension
""", re.X)

it has been successfully tested against all the package names/versions in the core/extra/community repos and the aur.

(nb: i also used it in the cacheclean script in preview mode and it produced no errors).

I think makepkg only does a basic sanity check, so this regex will e.g. match "pkg%  ^&*(name-1-2-i686.pkg.tar.gz".  I don't know if this is a problem or not?  Based on the AUR guidelines for package names, this would become:

bpat = re.compile(r"""
    ^((?:\w[\w-]*)+)-     # (1) package name (alphanumeric, hyphen allowed inside)
    [\w\.]+-              # (2) epoch:version (numbers, letters, and a dot is allowed)
    \d+                   # (3) release (only a number?)
    (-i686|-x86_64|-any)  # (4) architecture
    \.pkg\.tar            # (5) extension
    (?:\.(gz|bz2|xz|Z))?  # (6) compression extension
    (\.aria2)?$           # (7) other extension
""", re.X)


#7 2011-08-17 16:04:25

kachelaqa
Member
Registered: 2010-09-26
Posts: 215

Re: python folks - can someone fix cacheclean in the wiki?

steabert wrote:

I think makepkg only does a basic sanity check, so this regex will e.g. match "pkg%  ^&*(name-1-2-i686.pkg.tar.gz".

afaik, the only requirement for package names is that they are unix-friendly (i.e. no forward-slashes) and that they don't start with a hyphen.
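Those two rules are what the first group of the pattern from post #4 encodes. A minimal sketch, using that pattern on one line as a raw string; the odd filename is steabert's example, while the two rejected filenames are made up for illustration.

```python
import re

# Pattern from post #4, condensed to a single line.
permissive = re.compile(
    r"^([^-/][^/]*)-[^-/\s]+-[^-/\s]+(-i686|-x86_64|-any)"
    r"\.pkg\.tar(?:\.(gz|bz2|xz|Z))?(\.aria2)?$"
)

odd_name = "pkg%  ^&*(name-1-2-i686.pkg.tar.gz"   # steabert's example
leading_hyphen = "-foo-1-2-any.pkg.tar.xz"        # hypothetical
with_slash = "a/b-1-2-any.pkg.tar.xz"             # hypothetical

print(permissive.match(odd_name).group(1))  # pkg%  ^&*(name
print(permissive.match(leading_hyphen))     # None
print(permissive.match(with_slash))         # None
```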

I don't know if this is a problem or not?

well, i've successfully tested the regexp against all name-version strings in the official and aur repos. but if you can somehow come up with any real-world counter-examples that cause a genuine problem, by all means post them here and i'm sure the regexp can be easily amended.

Based on the AUR guidelines for package names, this would become:

bpat = re.compile(r"""
    ^((?:\w[\w-]*)+)-     # (1) package name (alphanumeric, hyphen allowed inside)
    [\w\.]+-              # (2) epoch:version (numbers, letters, and a dot is allowed)
    \d+                   # (3) release (only a number?)
    (-i686|-x86_64|-any)  # (4) architecture
    \.pkg\.tar            # (5) extension
    (?:\.(gz|bz2|xz|Z))?  # (6) compression extension
    (\.aria2)?$           # (7) other extension
""", re.X)

pacman packages are built from pkgbuilds using makepkg. if you used the above regexp to test a name-version it could fail for several reasons.

i suggest you have a look at the pkgbuild manpage and the makepkg script to see why.


#8 2011-08-17 16:35:18

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,097

Re: python folks - can someone fix cacheclean in the wiki?

Why validate what makepkg already has validated and sweat over version conformance? pacman packages are always going to have a .pkg.tar suffix, followed possibly by a compression suffix.

paccache uses:
1) a shell glob to find files: *.pkg.tar?(.+([^.]))
2) a simple split on dashes to trim off the $pkgver-$pkgrel.$extension and reassemble the package name and version.
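The split-on-dashes idea can be sketched in Python as follows. This is not paccache's code (paccache itself is written in shell); the function name `split_package_filename` is hypothetical, and the sketch assumes the name-pkgver-pkgrel-arch layout seen in the filenames in this thread.

```python
def split_package_filename(filename):
    """Split 'name-pkgver-pkgrel-arch.pkg.tar[.ext]' into its four fields."""
    # Everything from the last '.pkg.tar' onwards is the extension.
    stem, marker, _suffix = filename.rpartition(".pkg.tar")
    if not marker:
        raise ValueError(f"not a package filename: {filename!r}")
    # The name may contain hyphens but the other three fields may not,
    # so split from the right exactly three times.
    fields = stem.rsplit("-", 3)
    if len(fields) != 4:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    name, pkgver, pkgrel, arch = fields
    return name, pkgver, pkgrel, arch


print(split_package_filename("pulseaudio-alsa-1-2-any.pkg.tar.xz"))
# ('pulseaudio-alsa', '1', '2', 'any')
```

No validation of the individual fields is attempted, which is the point being made above: the filename is trusted to be what makepkg produced.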

Until we start changing how makepkg generates filenames (unlikely to be any time soon) this is solid.


#9 2011-08-17 17:04:56

steabert
Member
Registered: 2011-04-18
Posts: 78

Re: python folks - can someone fix cacheclean in the wiki?

@kachelaqa: I think you might have missed my point.  What I was trying to say was that it was strange to see that makepkg (which I looked at since you referred to it as a basis for your rules) does not check the rules as put here, that's all.  Of course you are right that you should use the makepkg rules, otherwise things could break.  I just wanted to point out that those basic rules allow for strange package names.


#10 2011-08-17 17:06:56

kachelaqa
Member
Registered: 2010-09-26
Posts: 215

Re: python folks - can someone fix cacheclean in the wiki?

falconindy wrote:

Why validate what makepkg already has validated and sweat over version conformance?

that's a fair question, i suppose. my aim was simply to fix the existing regexp so that it was more compliant with what makepkg allows.

personally, if i was going to write a cache cleaner, i would do things completely differently (e.g. use alpm_pkg_load and alpm_pkg_vercmp).


#11 2011-08-17 17:18:26

kachelaqa
Member
Registered: 2010-09-26
Posts: 215

Re: python folks - can someone fix cacheclean in the wiki?

steabert wrote:

@kachelaqa: I think you might have missed my point.

quite possibly :)

What I was trying to say was that it was strange to see that makepkg (which I looked at since you referred to it as a basis for your rules) does not check the rules as put here, that's all.  Of course you are right that you should use the makepkg rules, otherwise things could break.  I just wanted to point out that those basic rules allow for strange package names.

exactly: there may be a big difference between what makepkg allows and what the guidelines recommend. and i'm sure there must be many packages in the aur that don't follow the guidelines in one way or another. so i think it's better to be more permissive when checking candidate package names.


#12 2011-08-17 17:38:42

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,097

Re: python folks - can someone fix cacheclean in the wiki?

kachelaqa wrote:
falconindy wrote:

Why validate what makepkg already has validated and sweat over version conformance?

that's a fair question, i suppose. my aim was simply to fix the existing regexp so that it was more compliant with what makepkg allows.

personally, if i was going to write a cache cleaner, i would do things completely differently (e.g. use alpm_pkg_load and alpm_pkg_vercmp).

It's been done, and I don't see the benefit for the pain involved compared to writing it in just-as-portable shell/awk.

I wrote paccache, only later realizing that I was depending on GNU sort's -V flag (unportable), which crappily reimplements our vercmp function (but works 9 times out of 10). Of course, the rational and sane thing to do was... write a new sort util that actually implements alpm_pkg_vercmp.

Last edited by falconindy (2011-08-17 17:40:37)


#13 2011-08-17 18:09:10

kachelaqa
Member
Registered: 2010-09-26
Posts: 215

Re: python folks - can someone fix cacheclean in the wiki?

falconindy wrote:
kachelaqa wrote:

personally, if i was going to write a cache cleaner, i would do things completely differently (e.g. use alpm_pkg_load and alpm_pkg_vercmp).

It's been done, and I don't see the benefit for the pain involved compared to writing it in just-as-portable shell/awk.

yeah: having thought about it a bit more, i think using alpm_pkg_load would probably be a lot slower than a script that only extracted the name and version from the filename.


#14 2011-08-27 19:57:45

steabert
Member
Registered: 2011-04-18
Posts: 78

Re: python folks - can someone fix cacheclean in the wiki?

kachelaqa wrote:

exactly: there may be a big difference between what makepkg allows and what the guidelines recommend.

Yes, I get that now, just that I read the guidelines 10 times in the hopes not to offend anyone: "Contributed PKGBUILDs must conform to the Arch Packaging Standards otherwise they will be deleted".  Maybe I'm too much of a chicken...

falconindy wrote:
kachelaqa wrote:

personally, if i was going to write a cache cleaner, i would do things completely differently (e.g. use alpm_pkg_load and alpm_pkg_vercmp).

It's been done, and I don't see the benefit for the pain involved compared to writing it in just-as-portable shell/awk.

I wrote one (pacclean), as a mini-exercise in python to see how I would do it, so I started from scratch, only using your regex.  I use the regular sort, with a way to cross-check with timestamps; I still have to look at alpm_pkg_vercmp to make it behave properly.  It also removes non-local packages left in the cache.

EDIT: of course I overlooked pyalpm, which seems to work with python3. Fixed: the sort now uses vercmp from alpm.

Last edited by steabert (2011-08-28 09:05:19)


#15 2011-08-28 19:07:27

kachelaqa
Member
Registered: 2010-09-26
Posts: 215

Re: python folks - can someone fix cacheclean in the wiki?

steabert wrote:
kachelaqa wrote:

exactly: there may be a big difference between what makepkg allows and what the guidelines recommend.

Yes, I get that now, just that I read the guidelines 10 times in the hopes not to offend anyone: "Contributed PKGBUILDs must conform to the Arch Packaging Standards otherwise they will be deleted".  Maybe I'm too much of a chicken...

I think you should take those guidelines with a large pinch of salt. Looking at the "Package Naming" section, for instance, it states:

    Package names should consist of alphanumeric characters only; all letters should be lowercase.

Which is obviously false, because there are thousands of packages (both official and unofficial) with names that contain hyphens, and quite a few that contain pluses (e.g. timidity++, crypto++, etc). I even found a few official packages that have at signs in their names (e.g. extra/koffice-l10n-ca@valencia).

In the same section, it also states:

    Version tags may not include hyphens! Letters, numbers, and periods only.

Which, again, is obviously false. There are many packages (both official and unofficial) with versions that contain underscores, colons (to indicate the epoch), and tildes (e.g. extra/foomatic-db, community/lash, etc).
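To make that concrete, here is a rough sketch that runs the guideline-based pattern from post #6 against filenames of this kind. The package names timidity++ and koffice-l10n-ca@valencia are the ones mentioned above; the name "foo" and every version number are made up for illustration, so these are not real cache entries.

```python
import re

# Guideline-based pattern from post #6, as a raw string.
strict = re.compile(r"""
    ^((?:\w[\w-]*)+)-     # package name
    [\w\.]+-              # epoch:version
    \d+                   # release
    (-i686|-x86_64|-any)  # architecture
    \.pkg\.tar            # extension
    (?:\.(gz|bz2|xz|Z))?  # compression extension
    (\.aria2)?$           # other extension
""", re.X)

candidates = [
    "timidity++-1.0-1-x86_64.pkg.tar.xz",             # pluses in the name
    "koffice-l10n-ca@valencia-1.0-1-any.pkg.tar.xz",  # at sign in the name
    "foo-1:1.0-1-any.pkg.tar.xz",                     # epoch colon in the version
    "foo-1.0~rc1-1-any.pkg.tar.xz",                   # tilde in the version
    "foo-1.0_beta-1-any.pkg.tar.xz",                  # underscore in the version
]

for filename in candidates:
    verdict = "matches" if strict.match(filename) else "rejected"
    print(f"{verdict:8}  {filename}")
```

Only the last filename matches, and only because \w happens to include the underscore; the other four are rejected.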

I wrote one (pacclean), as a mini-exercise in python to see how I would do it, so I started from scratch, only using your regex.  I use the regular sort, with a way to cross-check with timestamps; I still have to look at alpm_pkg_vercmp to make it behave properly.  It also removes non-local packages left in the cache.

EDIT: of course I overlooked pyalpm, which seems to work with python3. Fixed: the sort now uses vercmp from alpm.

You were very wise to switch to vercmp. Correctly comparing/sorting versions can be quite tricky ;)
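A small illustration of the trickiness, as a sketch only. The version strings are made up, `numeric_key` is a hypothetical naive helper, and nothing here implements vercmp; it just shows two obvious approaches falling short.

```python
# Made-up version strings, for illustration only.
versions = ["1.9", "1.10", "1.2"]

# Plain string sorting compares character by character, so "1.10" sorts first.
print(sorted(versions))  # ['1.10', '1.2', '1.9']


def numeric_key(version):
    """Naive key: treat every dot-separated part as an integer."""
    return tuple(int(part) for part in version.split("."))


# Better for purely numeric versions...
print(sorted(versions, key=numeric_key))  # ['1.2', '1.9', '1.10']

# ...but it breaks as soon as a part contains letters or an epoch colon.
for awkward in ("2.0rc1", "1:2.0"):
    try:
        numeric_key(awkward)
    except ValueError as error:
        print(f"naive key fails on {awkward!r}: {error}")
```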


#16 2011-08-28 20:17:16

steabert
Member
Registered: 2011-04-18
Posts: 78

Re: python folks - can someone fix cacheclean in the wiki?

kachelaqa wrote:

I think you should take those guidelines with a large pinch of salt.

Seems so, I'll keep it in mind.

kachelaqa wrote:

You were very wise to switch to vercmp. Correctly comparing/sorting versions can be quite tricky ;)

Yes, wise, ahum :)  After going through how vercmp handles versions, my hopes of having my naive solution actually work completely vanished...  But at least I got a working and simple cache cleaner in the end, hooray!
