#1 2009-07-30 06:19:30

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

A discussion about universally/programmatically parsable PKGBUILDs.

[edit]
I've moved most of this post here to make it easier to update. I'll mention changes in posts as I make them.

As I said before, I understand that there is a lot to read but please make sure that you understand what this is about before replying. I am not suggesting that we change anything at this point. I'm just playing around with the idea of how things could be improved and made more accessible.
[/edit]




Questions
1) What advantages and disadvantages do you see with such a system?
2) Can you see any way to improve this system?

Last edited by Xyne (2009-08-02 05:37:21)


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone

Offline

#2 2009-07-30 06:47:17

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,521
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

1)
There are some bash tricks in use that we would need a way to replicate:

e.g. the klibc package.  This sets up a provides within the build function.  This information is only available after the package is built.

I'm sure there would be ways around these, just as there are workarounds currently.

2)
Cons I see:

The PKGBUILD would be much more verbose.  Currently I can look at a PKGBUILD and parse it in seconds.  There is no way I could do that for XML.

makepkg is in bash...

Those two cons are the major ones that I can see.


I wonder if an alternative would be to create a git repo (or a folder within the pacman source) with scripts designed to parse PKGBUILDs in various languages.  I had started a very basic one for my needs in python.  That way, no-one would need to reinvent the wheel and we could then improve on such scripts incrementally.  I know it is creating a bash parser and so not easy, but I highly doubt XML will catch on for PKGBUILDs...
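A minimal sketch of what such a "very basic" Python parser might look like. This is not Allan's script; the function name `parse_simple_pkgbuild` and the example URL are made up for illustration. It only handles literal, single-line assignments at the top level, which is exactly why the bash tricks mentioned above defeat it.

```python
import re
import shlex

# Matches simple top-level assignments such as  pkgname=foo  or  depends=('a' 'b>=1').
ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_simple_pkgbuild(text):
    """Extract literal scalar and single-line array assignments.

    Deliberately naive: no variable expansion, no multi-line arrays,
    no command substitution, and nothing that is set inside build().
    """
    data = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue  # indented lines, function bodies, comments, ...
        name, value = match.groups()
        value = value.strip()
        if value.startswith("(") and value.endswith(")"):
            # shlex handles the quoting inside a one-line bash array.
            data[name] = shlex.split(value[1:-1])
        else:
            parts = shlex.split(value)
            data[name] = parts[0] if parts else ""
    return data


# Hypothetical PKGBUILD; the URL is a placeholder.
example = """\
pkgname=powerpill
pkgver=16.0
depends=('aria2>=1.2.0' 'perl')
source=("http://example.invalid/$pkgname-$pkgver.tar.gz")
build() {
  provides=('set-at-build-time')
}
"""

parsed = parse_simple_pkgbuild(example)
print(parsed["depends"])
print(parsed["source"])       # $pkgname and $pkgver are left unexpanded
print("provides" in parsed)   # metadata set inside build() is invisible here
```

The last two lines show the limits: variable references stay unexpanded, and anything assigned inside build() (as in the klibc case) is never seen.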

Offline

#3 2009-07-30 07:11:52

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

I suppose it would be possible to append return values to a string variable and then check that after the build function has completed. There's no extra effort involved compared to the current practice because any change of metadata within the build function is already referring to variables outside of the build function.

While I agree that XML is usually verbose, I don't think it's too verbose in this case. It's just a matter of familiarity, but I know that many people would not approach it with an open mind and I realize that's a problem. I used XML as an example because despite such things it seems logical in this case, but I'm really just interested in a parsable structured format.

I'm not worried about makepkg either. As I said, I know that such a transition would be too much to get the devs on board, but I'm trying to imagine how it would work with all the necessary tools already in place.

The problem that I see with your alternative is that it remains bash. Writing and maintaining a bash parser would not be trivial and I doubt that anyone would be up for that.

As for xml catching on, I doubt it too, but that's on the user side. I'm trying to think of PKGBUILDs as encapsulated data. With a properly structured capsule, it wouldn't be too difficult to write simple editors for those users who can't manually edit it themselves. Even if it happens to be xml, people will manage. Basic web pages are easy enough to create and don't require that much effort to learn. The kneejerk "uuuuugh, xml" is often due to a superficial familiarity with it. While I think that xml is unfit for some things, I think it is perfect for others. I'm not sure about this case though, but the more I think about it, the more I like it and see how useful it could be.

Still, these are all just ideas floating around in my head and I'm still developing them.


Offline

#4 2009-07-30 09:22:53

fukawi2
Ex-Administratorino
From: .vic.au
Registered: 2007-09-28
Posts: 6,231
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Xyne wrote:

2) What do you see as the pros and cons of using XML, from a developer's point of view?

I'm not an Arch dev, but I imagine from an Arch dev's point of view, it departs from KISS fairly significantly... Especially as Allan said in simple terms of readability.

Reuse is a big thing in Linux, and reusing bash for PKGBUILDs is a prime example: rather than going off and writing our own system for parsing the file, we're utilising the work someone else has done on bash, and all we have to do is tell it how to interpret what it has parsed.

Offline

#5 2009-07-30 10:14:43

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

fukawi2 wrote:

it departs from KISS fairly significantly... Especially as Allan said in simple terms of readability.

I get that, but please keep in mind 4 things:
1) KISS in Arch is from the developer's point of view, not the user's (code correctness is given priority over user simplicity). In this case we're talking about the development of tools to work with PKGBUILDs, so the readability of xml from the manual editor's point of view is not at all a focus for this discussion. I'm not saying it's irrelevant, but it's secondary and it also hinges on xml (see the next point).

2) I've only used xml as an example for now. The main point that I'm interested in is the possibility to encapsulate the PKGBUILD data in something that can be easily parsed in other languages. I would like to be able to write scripts to do different things using different languages and the current options are limited. It just feels like the data in the PKGBUILD transcends bash. It would be nice to have something that wasn't directly tied to it. Think of it as a minor abstraction.

3) I'm not actually suggesting that Arch change. Even if I honestly believed that there were a superior solution (which I don't at this point), be it xml or something else, I would not expect a group of devs with disparate opinions to adopt such a drastic change (getting feature requests through the gauntlet is hard enough). I say that in a completely neutral tone. I understand that devs who have an intimate knowledge of the code and how it works will often be in a better position to evaluate a proposed change. At the same time, I also believe that people in general stick to what they know and the status quo, resisting change without considering its benefits (that goes for users too... every time anything gets a major update, there is a vocal group of people throwing a hissy fit). Sometimes that preserves that which should be preserved and sometimes it impedes progress, whence the quote about "good enough" being the biggest obstacle. *

4) Marbles are round.

As for code reuse... that's a good thing sometimes, but not always. Reinventing the wheel can be a waste of time, but if you've discovered rubber since they made that last stone wheel, it might be worth another go and be better than coating the stone wheel in rubber. I'm not saying that this applies here, only that the "reuse" mantra should be examined on a case by case basis.


* on a tangent, I really respect Guido van Rossum's decision to break backwards compatibility with Python 3.0 to fix previously bad choices.

*edited for typos*

Last edited by Xyne (2009-07-30 10:18:50)


Offline

#6 2009-07-30 10:27:54

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,521
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Xyne wrote:

With a properly structured capsule, it wouldn't be too difficult to write simple editors for those users who can't manually edit it themselves.

This highlights my concern here...  the need for an editor to help create PKGBUILDs.

Offline

#7 2009-07-30 10:36:54

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Allan wrote:
Xyne wrote:

With a properly structured capsule, it wouldn't be too difficult to write simple editors for those users who can't manually edit it themselves.

This highlights my concern here...  the need for an editor to help create PKGBUILDs.

Most people who use openbox manage to handle rc.xml and menu.xml, but the focus on xml is already diverting attention from the main point (my fault for dumping so much of it in there when I wrote it), so forget that for a moment...

Can you think of a way, other than bash or xml, that could encapsulate the data in a PKGBUILD in such a way that it would be easily accessible from other languages?
Can you conceive that there would be benefits to having some structured description of PKGBUILD data which does not rely on a single language/application?


Offline

#8 2009-07-30 11:25:14

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: A discussion about universally/programmatically parsable PKGBUILDs.

To start let me say that I'm not a dev, just a user that tries to learn and get things working.

Your idea to get things structured and easy to parse in several languages seems reasonable from a developer point of view. Also like you say the "good enough" problem seems to apply here but as a user I have to agree with Allan on the readability part.

This is because I am not too familiar with xml, as you mentioned. But consider that many of the contributors to the AUR may also be unfamiliar with xml, and they might not have contributed anything if they had to work with it. Right now I am able to open a PKGBUILD with a simple text editor, read it, understand what is there and slightly modify it to my needs. With xml it would be a lot harder to do that, again maybe because I'm not too familiar with it.

Besides, you would still need bash to actually build the package; I guess it would be a bit like rubber-coating the stone wheel ;) The nesting of PKGBUILDs (or another way to fetch and build dependencies) seems a nice idea though, for the 1st or 2nd degree dependencies at least (in the case of PKGBUILD nesting), but that could also make it big and quite hard to read and maintain. Think of how many PKGBUILDs would have to be updated every time something gets updated (although it would also solve the problem of things that only work with an older version of some package).

I know you have said to consider this from a developer's point of view, but I believe that this user-side part cannot be forgotten because, in my opinion, it is an important part of what makes Arch what it is.

Edit:
Bugger, took so long to write this that you and Allan already posted something ^^;

Last edited by R00KIE (2009-07-30 11:28:47)


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline

#9 2009-07-30 11:27:04

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,521
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Xyne wrote:

Can you conceive that there would be benefits to having some structured description of PKGBUILD data which does not rely on a single language/application?

Yes - I'm not debating that at all.  As I said, I have been working on a python PKGBUILD parser so I know the need.

Xyne wrote:

Can you think of a way, other than bash or xml, that could encapsulate the data in a PKGBUILD in such a way that it would be easily accessible from other languages?

Sure.  I think the RPM spec file does a reasonable job. (e.g. http://cvs.fedoraproject.org/viewvc/rpm … sion=1.49).  I think a simplified version of that or something like the db files in /var/lib/pacman would be the way to go.  Let's just remove XML from the discussion! :P

Offline

#10 2009-07-30 12:06:36

siddhant
Member
Registered: 2008-07-29
Posts: 18
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

I'm not sure if I'm correct, but you could use something like YAML (or maybe even JSON). There are bindings available for both of them for almost every language, and they're both quite human-readable as well as easy to parse.

Offline

#11 2009-07-30 12:29:53

scio
Member
From: Buffalo, NY
Registered: 2008-08-05
Posts: 366

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Allow me to make what I think is an analogy here:
If you think of PKGBUILDs as Makefiles, then this parser would be like CMake.  It could have a simpler syntax or parts could be automatically generated for you, but the end result is still a PKGBUILD.

If this sounds right, it might make things a bit simpler to follow.  I like the idea as a way to put together a good starting PKGBUILD for the programmer to then tweak.  If you wanted to get really fancy it could then parse some of the namcap output to start filling in depends and such.  Again, it would not be just a "run this program and you have the perfect PKGBUILD", but more of a "run this program to get a starting PKGBUILD".

Last edited by scio (2009-07-30 12:30:06)

Offline

#12 2009-07-30 21:22:48

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

I've updated the OP with a new format example that I came up with after reading Allan's and siddhant's replies.

Forget that I ever even used xml as an example.


Anyway, I think the new format example is perfectly KISS.


Offline

#13 2009-07-30 22:30:40

scio
Member
From: Buffalo, NY
Registered: 2008-08-05
Posts: 366

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Ah I think I understand what you meant better now.  The files are essentially PKGBUILDs themselves just without the bash specific syntax.

The only thing I could comment on at this point is that introducing this extra layer on top of the bash script seems excessive.
The information is all the same, and the syntax is very similar; as Allan said, RPM does a good job and is very similar.

As a side note: I think if you could start writing scripts that broke down the different packagers into your format, sort of like bytecode, then you could also write code to go the opposite direction and have a set of tools to use any package type on any distribution.

Offline

#14 2009-07-30 23:03:04

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

scio wrote:

Ah I think I understand what you meant better now.  The files are essentially PKGBUILDs themselves just without the bash specific syntax.

That's it, only they really would be full PKGBUILDs.

scio wrote:

The only thing I could comment on at this point is that introducing this extra layer on top of the bash script seems excessive.
The information is all the same, and the syntax is very similar; as Allan said, RPM does a good job and is very similar.

The idea though is to have something that isn't bash. As mentioned in the OP, you either have to write an intricate bash parser in another language or run the PKGBUILD with bash and parse the output if you want to extract the data.

Writing bash parsers in C, Python, Perl, Haskell, Lisp, etc. is just not reasonable; you would need to catch all of the little bash tricks that people could use in a valid PKGBUILD in its current format.

If you source the file, then you expose yourself to malicious code (intentional or not) and you also take a speed hit in other languages because you need to invoke a shell and pipe everything through that.

If the only intention here was to use the new format with makepkg (in bash), then yeah, this would not bring anything to the table, but that's not what I want to do. I want to be able to rewrite makepkg in other languages while remaining fully compliant with a common PKGBUILD structure. I also want to be able to write tools which can parse dependencies, source files and anything else from PKGBUILDs to determine things such as dependency trees.

The current implementation still has vestiges of what seems to be a hacked-together origin. Don't read anything too negative into that though. It works and it works well with makepkg. I just want to separate the logic from bash and create something that is amenable to a wide variety of other uses.

scio wrote:

As a side note: I think if you could start writing scripts that broke down the different packagers into your format, sort of like bytecode, then you could also write code to go the opposite direction and have a set of tools to use any package type on any distribution.

That's also something that I've had in mind. I don't have experience with other package formats so I can't pass any judgement on them, but I really like the simplicity of Arch's PKGBUILDs. I think it would be possible to use a bash-independent format like the example in the OP to create a package "lingua franca", even though that's not a main goal. I think it would be useful to have tools that could automatically convert other package formats into PKGBUILDs. That includes both packages from other distros and distro-agnostic packages such as CPAN (currently done with pacpan and cpan4pacman), CRAN, CTAN, Hackage (currently done with cabal2arch), etc.




Think of it this way:
Current PKGBUILDs are like signs written in a foreign language, and they sometimes use local slang. If you don't speak the language, you need an interpreter. My proposal is to use an easily understandable diagram/picture that anyone can understand. Obviously this would be unnecessary if the signs only appeared in a country where everyone speaks the language, but that's not the case.

A better analogy might even be to think of it as handicapped accessibility. If you can walk, then you don't care about having a ramp, but if you can't, then you need someone who can to carry you if there is no ramp. (PKGBUILDs in bash: steps, bash-free PKGBUILDs: ramps, working legs: bash interpreters)


*edit*
Sorry for being so verbose... I tend to overclarify, but that should be better than ambiguity.

Last edited by Xyne (2009-07-30 23:05:30)


Offline

#15 2009-07-30 23:09:14

fukawi2
Ex-Administratorino
From: .vic.au
Registered: 2007-09-28
Posts: 6,231
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Can I ask... What is driving this? Regardless of whether it's better or worse or the bee's knees, things like this usually come up because of some kind of problem, not just a random desire to attempt to improve something "just cuz" :)

Offline

#16 2009-07-30 23:19:38

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

I don't really see the benefit here. The power of ABS is that PKGBUILDs are plain bash. This means we have no limitations and don't need hacks to solve certain problems. Also, we don't need to implement our own parser or grammar; it's already there. And let's not underestimate the fact that most users (even those who never used Arch) are able to read the PKGBUILDs.

Btw: Parsing split PKGBUILDs without bash is a real pain :-)

Offline

#17 2009-07-30 23:21:22

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

The basic idea has popped up in my head a few times while writing various tools for which I needed to parse PKGBUILDs. I gave it more thought yesterday when doing something with the AUR's JSON RPC interface.

Generally I actually do attempt to improve things "just cuz". I like to reduce things to their logical components and make things interoperable when possible. Maybe it's just the Unix philosophy kicking in with the drops of functional programming that I'm picking up from Haskell.

I definitely have some concrete ideas for things that I would like to do with this too, irrespective of whether it were ever adopted in Arch, but I avoid going into things that I would like to do in case I never get around to them. Suffice it to say that I've been mulling a few ideas in my head for months now.


Offline

#18 2009-07-30 23:26:35

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Now that looks good :D
If anything I would put the md5sum in the same line as the source as in

pkg
  name: powerpill
  ver: 16.0
  sources
    http://xyne.archlinux.ca/src/$$name$$-$$ver$$.tar.gz  f3b443b6238029474ad9eb07ddc13ae0
    other_source  (no md5sum here or something to indicate the absence of the md5sum, maybe the word none)
    and_another_source  17f3144a7f3664dc7456e82a15df7613

I guess this breaks the structuring, but it is more readable.
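A rough sketch of how a tool might read one of these "source plus checksum" lines, including the $$name$$ / $$ver$$ placeholders. The helper names `expand` and `parse_source_line` are hypothetical, and the sketch assumes that URLs contain no whitespace and that a missing checksum is either omitted or written as the word none.

```python
import re

# A placeholder is a field name wrapped in double dollar signs, e.g. $$name$$.
PLACEHOLDER = re.compile(r"\$\$([A-Za-z_][A-Za-z0-9_]*)\$\$")


def expand(template, fields):
    """Replace each $$key$$ with fields[key]; unknown keys raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(fields[m.group(1)]), template)


def parse_source_line(line, fields):
    """Split 'URL  checksum' into (url, checksum).

    The checksum may be absent or given as the word 'none'.
    """
    parts = line.split()
    url = expand(parts[0], fields)
    checksum = parts[1] if len(parts) > 1 and parts[1] != "none" else None
    return url, checksum


fields = {"name": "powerpill", "ver": "16.0"}
line = "http://xyne.archlinux.ca/src/$$name$$-$$ver$$.tar.gz  f3b443b6238029474ad9eb07ddc13ae0"
print(parse_source_line(line, fields))
print(parse_source_line("other_source  none", fields))
```

Putting two values on one line keeps the file short, but the parser then has to know that the second column is an md5sum, which is the structural cost mentioned above.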

On the depends and optdepends adding the reason looks cool, but I would keep the required version with the name of the package. Like this

  depends
    aria2  >=1.2.0
    perl  5.10.0
    perl-xyne-arch  >=0.68
    perl-xyne-common  (maybe add something here to indicate any version will do, maybe the word any)
  optdepends
    foo  >=1.4
      reason: required for foobar formatting

On the different architectures, maybe omit the depends inside each architecture: if we are dealing with dependencies we shouldn't need to say it twice, and it keeps to the one-thing-per-line rule. Adding what I said before, and if we can allow empty lines for readability, then it could look something like this:

depends
  foo  any
  bottle  >=0.75

  arch_i686
    baz  1.0
    wine  >=12.5
      reason: We need wine to fill the bottle

  arch_x86_64
    lib32-baz  1.0.1
    glass  any
      reason: We need a glass to drink the wine from the bottle

For the md5sum and the required version of packages, the words 'none' and 'any' could be added if that makes parsing easier; otherwise it's better not to put anything there. The less one writes the better :P


Offline

#19 2009-07-30 23:28:21

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

*edit*
quoted myself instead of editing my previous post



Pierre wrote:

And let's not underestimate the fact that most users (even those who never used Arch) are able to read the PKGBUILDs.

Well, the new example that I've given is not difficult to read, so I don't see how that is a valid point. I'm sure that it's possible to write a valid PKGBUILD that would be difficult to read using cryptic bash though (not that I know why you would... maybe to hide malicious code?)... the point is that readability is not an issue here. I mean, honestly, can you tell me that the current example is difficult to read? It's probably even easier for people without bash experience.

Pierre wrote:

Btw: Parsing split PKGBUILDs without bash is a real pain :-)

How do you figure that? There's no way that you can have conceived, let alone even tried, all possible ways to do it. Having only considered a limited subset of alternatives, there is no way that you can conclude that bash is fundamental to this. You're thinking within the current "it's all bash" context. I think bash is actually quite limited as a full scripting language compared to the versatility of others and I'm pretty sure that you could reimplement the current functionality of makepkg in something else.

I'm not suggesting that we do, but I'm fairly sure it would be possible.


I admittedly haven't looked at the implementation of split packages though, but I think our disagreement here depends only on our respective approach to this.

Last edited by Xyne (2009-07-31 00:04:17)


Offline

#20 2009-07-30 23:48:43

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

@R00KIE
This is exactly the kind of discussion that I'm looking for. Thank you.

I agree that it would be a simplification if "required" were removed. If that is the only thing which could show up in a dependency field then the colon syntax could be used, e.g. "aria2 : >=1.2.0". It might even work to just concatenate the two as already done in PKGBUILDs, e.g. "aria2>=1.2.0". I suppose that the additional name parsing would be no more complicated than parsing the line or the one that follows. The advantage of having "required: >=1.2.0" is that it remains extensible in case more dependency information is ever required. Future compatibility is a real concern and I prefer to have something which remains as general as possible. In the case of depends though, maybe the version is all that could ever be needed.
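To give a feel for how little extra parsing the compact forms need, here is a sketch that accepts both "aria2 : >=1.2.0" and "aria2>=1.2.0", as well as a bare package name. The `Dependency` type and `parse_dependency` function are invented for this example, and the set of comparison operators is an assumption. A bare version with no operator (e.g. "perl  5.10.0") is deliberately rejected.

```python
import re
from typing import NamedTuple, Optional

# name, optional colon, optional comparison operator followed by a version
DEP = re.compile(r"^\s*([^\s<>=:]+)\s*:?\s*(?:(>=|<=|=|>|<)\s*(\S+))?\s*$")


class Dependency(NamedTuple):
    name: str
    operator: Optional[str]
    version: Optional[str]


def parse_dependency(text):
    """Parse 'name', 'name>=ver' or 'name : >=ver' into a Dependency."""
    match = DEP.match(text)
    if match is None:
        raise ValueError(f"unrecognised dependency: {text!r}")
    return Dependency(*match.groups())


for entry in ("aria2>=1.2.0", "aria2 : >=1.2.0", "perl-xyne-common"):
    print(parse_dependency(entry))
```

The trade-off described above still applies: this is compact, but there is nowhere to hang extra per-dependency information later without changing the syntax.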

The same consideration goes for "md5sum" in the source fields. The reason for the example format I gave is to make it possible to easily add other checksums, e.g.

  sources
    http://xyne.archlinux.ca/src/powerpill-16.0.tar.gz
      md5sum: f3b443b6238029474ad9eb07ddc13ae0
      sha1sum: 4ceda6ec486aed5489ac848f2ca5cc44a413d5e9

If you only have an md5sum, you could write it as

  sources
    http://xyne.archlinux.ca/src/powerpill-16.0.tar.gz: md5sum: f3b443b6238029474ad9eb07ddc13ae0

Note that keeping the ability to split the line will increase readability for long URLs.
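As an illustration of why named checksum fields are convenient, a checker can loop over whatever fields happen to be present. This is only a sketch: the `verify` function is hypothetical, and it assumes each field name is a hashlib algorithm name with "sum" appended (md5sum, sha1sum, sha256sum, ...).

```python
import hashlib


def verify(data, checksums):
    """Return the names of the checksum fields that do NOT match data."""
    failed = []
    for field, expected in checksums.items():
        # Assumed naming convention: "md5sum" -> "md5", "sha1sum" -> "sha1".
        algorithm = field[:-3] if field.endswith("sum") else field
        digest = hashlib.new(algorithm, data).hexdigest()
        if digest != expected.lower():
            failed.append(field)
    return failed


# Stand-in for a downloaded source file.
payload = b"example tarball contents"
entry = {
    "md5sum": hashlib.md5(payload).hexdigest(),
    "sha1sum": hashlib.sha1(payload).hexdigest(),
}

print(verify(payload, entry))                # empty list: everything matches
print(verify(payload + b"tampered", entry))  # both fields are reported
```

Adding a new checksum type then needs no change to the format or to this loop, only a new field under the source.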

The use of colons still needs some thought though to make sure that there is never any ambiguity. Keywords such as "name", "version", "depends", etc. are fine, but things such as URLs would need some consideration... maybe some type of delimiter (quotation marks?). The build function has to remain verbatim though for the sake of simplicity.


R00KIE wrote:

if we can allow empty lines for readability

I see absolutely no problem with empty lines for readability. I keep thinking it would work like Python syntax where empty lines are ignored and only the indentation on subsequent lines matters.
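A minimal sketch of such an indentation-driven reader, under a few assumptions that are not settled in this thread: indentation uses spaces only, a key and its value are separated by a colon followed by a space (so the "://" in a URL is not mistaken for a separator), and blank lines are skipped. The function name `parse_indented` is made up.

```python
def parse_indented(text):
    """Parse indentation-structured text into nested nodes.

    Each node is a dict with 'key', 'value' (or None) and 'children'.
    """
    root = {"key": None, "value": None, "children": []}
    stack = [(-1, root)]  # (indent width, node) for each currently open block
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no meaning
        indent = len(raw) - len(raw.lstrip(" "))
        key, sep, value = raw.strip().partition(": ")
        node = {"key": key, "value": value if sep else None, "children": []}
        # Close every block that is indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((indent, node))
    return root["children"]


example = """\
pkg
  name: powerpill
  ver: 16.0

  sources
    http://xyne.archlinux.ca/src/powerpill-16.0.tar.gz
      md5sum: f3b443b6238029474ad9eb07ddc13ae0
  depends
    aria2
      required: >=1.2.0
    perl
"""

tree = parse_indented(example)
pkg = tree[0]
print([child["key"] for child in pkg["children"]])
```

The parser knows nothing about package metadata; it only recovers the nesting, and a second pass would give meaning to keys such as sources or depends.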

I don't know about moving the arch-specific stuff into different sections though. The example that you gave would require a parser to be able to distinguish between architectures and dependencies when reading the depends section. This might work with a fixed set of architectures, but then if someone adds i586, other parsers will take that as a dep.

Hmm, this goes back to what I said before about extensibility: a solution to do it your way would be to add an architecture tag to each dependency in the depends section. I don't know if I like that idea though. I think it's simpler to specify common depends in one place and then keep all architecture-specific stuff somewhere else. That would make it easier for maintainers on different architectures to work on the same PKGBUILD I think, but that's debatable.

Last edited by Xyne (2009-07-31 00:01:17)


Offline

#21 2009-07-31 00:50:45

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Xyne.
Two things.
1. I think instead of something more simplified like JSON, you would want to use the superset which is yaml. yaml offers a few things json does not, while retaining the pretty good library support with many languages.

2. I think part of what makes a bash pkgbuild 'easy' is the 'build' section. If you write a pkgbuild in another language, you _still_ have to fork off a bash (or sh) instance, and feed it the build section as well as set any required variables, to perform actions within the build section. It is certainly possible to do it this way, but it is a bit more work than just having bash source the file, and then do things (call the build function).

I really do like the idea of a more parsable pkgbuild format. Parsing pkgbuild files is something that has been painful (to some degree) for many community developed tools. That said, nothing is forcing you to use makepkg. I say try experimenting with other solutions, and see if other people find merit in them. The beauty of pacman and friends, is that you can use _just about anything_ to make a package. Just so long as it has those few meta files that pacman wants, and that they are correct, you end up with a pacman package!

If you end up with something good enough, take it to the devs and see if it would warrant replacing makepkg with it.

:)

ps. if you write it in python and use yaml, i might be interested in helping. ;)


"Be conservative in what you send; be liberal in what you accept." -- Postel's Law
"tacos" -- Cactus' Law
"t̥͍͎̪̪͗a̴̻̩͈͚ͨc̠o̩̙͈ͫͅs͙͎̙͊ ͔͇̫̜t͎̳̀a̜̞̗ͩc̗͍͚o̲̯̿s̖̣̤̙͌ ̖̜̈ț̰̫͓ạ̪͖̳c̲͎͕̰̯̃̈o͉ͅs̪ͪ ̜̻̖̜͕" -- -̖͚̫̙̓-̺̠͇ͤ̃ ̜̪̜ͯZ͔̗̭̞ͪA̝͈̙͖̩L͉̠̺͓G̙̞̦͖O̳̗͍

Offline

#22 2009-07-31 01:20:15

rcoyner
Member
From: Washington D.C.
Registered: 2008-05-16
Posts: 30
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

cactus wrote:

Xyne.
Two things.
1. I think instead of something more simplified like JSON, you would want to use the superset which is yaml. yaml offers a few things json does not, while retaining the pretty good library support with many languages.

2. I think part of what makes a bash pkgbuild 'easy' is the 'build' section. If you write a pkgbuild in another language, you _still_ have to fork off a bash (or sh) instance, and feed it the build section as well as set any required variables, to perform actions within the build section. It is certainly possible to do it this way, but it is a bit more work than just having bash source the file, and then do things (call the build function).

I really do like the idea of a more parsable pkgbuild format. Parsing pkgbuild files is something that has been painful (to some degree) for many community developed tools. That said, nothing is forcing you to use makepkg. I say try experimenting with other solutions, and see if other people find merit in them. The beauty of pacman and friends, is that you can use _just about anything_ to make a package. Just so long as it has those few meta files that pacman wants, and that they are correct, you end up with a pacman package!

If you end up with something good enough, take it to the devs and see if it would warrant replacing makepkg with it.

:)

ps. if you write it in python and use yaml, i might be interested in helping. ;)

So what we are trying to achieve are:

1) Have a metadata file that contains information about a package that can be easily parsed by several different programming languages.
2) Maintain the ease of a bash-based build() function.

So why not just separate PKGBUILD into two separate files? Keep the metadata in a file called PKGDATA in YAML or JSON, and keep the build() function in PKGBUILD as a bash script. When you call makepkg -c have it parse through the PKGDATA file to automatically create necessary bash variables ($pkgname, etc) and then run build() in PKGBUILD.

EDIT: Now that I think about it, this isn't much different from what we already have, except that if the metadata is declared in a standardized format like XML/YAML/JSON, parsing the data would be easier because most languages already have libraries for parsing those types of files.
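A sketch of the PKGDATA half of this idea, assuming JSON for the metadata. The function `to_bash_assignments` is hypothetical and is not part of makepkg; it turns parsed metadata into bash variable assignments that a wrapper could evaluate before running build(). Values are quoted with shlex.quote so that metadata cannot smuggle in shell commands.

```python
import json
import re
import shlex

VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_bash_assignments(metadata):
    """Render a metadata dict as bash assignments (lists become arrays)."""
    lines = []
    for name, value in metadata.items():
        if not VALID_NAME.fullmatch(name):
            raise ValueError(f"not a valid variable name: {name!r}")
        if isinstance(value, list):
            items = " ".join(shlex.quote(str(item)) for item in value)
            lines.append(f"{name}=({items})")
        else:
            lines.append(f"{name}={shlex.quote(str(value))}")
    return "\n".join(lines)


# Stand-in for the contents of a PKGDATA file.
pkgdata = json.loads("""
{
  "pkgname": "powerpill",
  "pkgver": "16.0",
  "depends": ["aria2>=1.2.0", "perl"]
}
""")

print(to_bash_assignments(pkgdata))
```

Other tools would read the same JSON directly with their own language's library and never touch bash at all, which is the point of the split.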

Last edited by rcoyner (2009-07-31 01:27:57)

Offline

#23 2009-07-31 01:49:59

Square
Member
Registered: 2008-06-11
Posts: 435

Re: A discussion about universally/programmatically parsable PKGBUILDs.

The way you've done this, I can see one viable option that could remain KISS.
What I can see so far is that you've simplified the format in order to make it more human readable (read: eliminated punctuation). The only issue is that now, your format would require either parsing two pieces to put together, or (what I see as) better - two steps where one step was.

I imagine that, with your method, this could work well if it went like so:
1. The user grabs the simplified PKGBUILD.
2. A command is issued, which essentially parses the simplified PKGBUILD to output a traditional PKGBUILD.
3. The command calls makepkg on the new PKGBUILD.

If done correctly, the user ends up with the same process as usual, just calling a different command.

For an example, let's call the simplified version you've created a PKGFILE, and maintain the name PKGBUILD for the traditional. The process would look something like this:

[user@hostname ~ ]$ cd pkg
[user@hostname ~/pkg ]$ ls
PKGFILE
[user@hostname ~/pkg ]$ pkgmake -sic
Creating PKGBUILD...
Passing to makepkg with options -sic...
(makepkg starts here)

This eliminates the need to change anything at all in the build system we have, but allows for the changes you wish to make with less work on the developers' end.
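A rough sketch of what such a pkgmake wrapper might look like, in Python. Everything here is an assumption made for illustration: the PKGFILE layout (one "key value" pair per line), the convert() helper, and the pkgmake() function itself. It ignores the build section entirely and only shows the convert-then-delegate flow.

```python
import shlex
import subprocess
from pathlib import Path


def convert(pkgfile_text):
    """Hypothetical converter: 'key value' lines become bash assignments."""
    out = []
    for line in pkgfile_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.strip().partition(" ")
        out.append(f"{key}={shlex.quote(value.strip())}")
    return "\n".join(out) + "\n"


def pkgmake(directory, options, run=subprocess.call):
    """Write a PKGBUILD next to PKGFILE, then hand over to makepkg.

    `run` is injectable so the wrapper can be exercised without makepkg.
    """
    directory = Path(directory)
    print("Creating PKGBUILD...")
    pkgbuild = convert((directory / "PKGFILE").read_text())
    (directory / "PKGBUILD").write_text(pkgbuild)
    print("Passing to makepkg with options", " ".join(options) + "...")
    # makepkg itself is untouched; it just sees an ordinary PKGBUILD.
    return run(["makepkg", *options], cwd=directory)
```

Calling pkgmake(".", ["-sic"]) in a directory containing a PKGFILE would then mirror the session above.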

Last edited by Square (2009-07-31 01:53:27)


 

Offline

#24 2009-07-31 04:11:09

u_no_hu
Member
Registered: 2008-06-15
Posts: 453

Re: A discussion about universally/programmatically parsable PKGBUILDs.

Xyne wrote:

Forget that I ever even used xml as an example.

Now we can have a discussion smile

And +1 to what square said. With the level of perl-fu that you possess, it will be trivial to write a parser which creates a normal PKGBUILD from the new format (and vice versa) and a makepkg wrapper/port.
Basically a POC (I think you have built enough tools already to need a separate repo/build tools tongue). If it is in python I can also help a bit. That will be far more productive than having a TGN discussion and will bring out the merits and flaws of the new approach.


Don't be a HELP VAMPIRE. Please search before you ask.

Subscribe to The Arch Daily News.

Offline

#25 2009-07-31 11:41:11

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965
Website

Re: A discussion about universally/programmatically parsable PKGBUILDs.

cactus wrote:

1. I think instead of something more simplified like JSON, you would want to use the superset which is yaml. yaml offers a few things json does not, while retaining the pretty good library support with many languages.

I only looked at JSON before, not YAML. I can take a peek, but as JSON is a subset of YAML, I would expect YAML to be at least as "complex" as JSON. The disadvantage that I see with that is that it adds more formatting to the file. Maybe the trade-off would be worth it, but so far I think the plain-text, white-space formatted example is able to convey all the necessary information, though that's still debatable.

cactus wrote:

2. I think part of what makes a bash pkgbuild 'easy' is the 'build' section. If you write a pkgbuild in another language, you _still_ have to fork off a bash (or sh) instance, and feed it the build section as well as set any required variables, to perform actions within the build section. It is certainly possible to do it this way, but it is a bit more work than just having bash source the file, and then do things (call the build function).

Part of the reason for considering this is that a PKGBUILD contains information beyond just how to build the file. Sometimes it is interesting to be able to trace a dep tree, determine source downloads, etc without actually building the file. I see a PKGBUILD as a collection of meta-info about the package, not just a recipe for building it.
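As a small illustration of using the metadata without building anything, here is a sketch in Python that traces a dependency tree. It assumes the metadata has already been parsed into plain dictionaries. The dependency_tree function and the package names are made up for the example.

```python
def dependency_tree(name, packages, seen=None):
    """Return every package reachable from `name` via 'depends', depth-first.

    `packages` maps a package name to its parsed metadata (an assumed layout).
    """
    if seen is None:
        seen = []
    for dep in packages.get(name, {}).get("depends", []):
        if dep not in seen:
            # Record the dependency before recursing so cycles terminate.
            seen.append(dep)
            dependency_tree(dep, packages, seen)
    return seen


packages = {
    "foo": {"depends": ["bar", "baz"]},
    "bar": {"depends": ["baz", "qux"]},
}
print(dependency_tree("foo", packages))
```

No bash is involved at any point, which is the whole appeal.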

As for running the build function, it's only an ordered set of commands. Most languages can run system commands so that's not that difficult and the overhead from opening a pipe should not make any noticeable difference. You won't need to set up any variables in the shell environment either because whatever's piping in the commands will have already replaced them with their literals.

As an example, if you had a Python version of makepkg, it wouldn't pipe in "cd $srcdir", it would pipe in "cd /tmp/build/foo/src" or whatever. Everything outside of the build function would be handled directly too, so it would be Python building the metadata files etc. The only thing that needs to be considered is how to get info back into the main app, but that should be as simple as echoing a variable once the build function completes.
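A minimal sketch of that idea in Python, assuming a POSIX sh is available on the path. The run_build function, its parameters and the sample commands are hypothetical, not an existing makepkg port. Known variables are replaced with their literals before the commands are piped in, and one shell variable is echoed back at the end so the calling program can read it.

```python
import subprocess
from string import Template


def run_build(commands, variables, report=None):
    """Substitute known variables, pipe the commands to sh, return its stdout."""
    # "$srcdir" becomes its literal value; unknown "$names" are left for the shell.
    script = [Template(c).safe_substitute(variables) for c in commands]
    if report is not None:
        # Echo one shell variable once the build commands have finished,
        # so information set inside the build can get back to the caller.
        script.append('echo "' + report + '=$' + report + '"')
    result = subprocess.run(
        ["sh"],
        input="\n".join(script) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    output = run_build(
        ["echo building in $srcdir", "provides=bar"],
        {"srcdir": "/tmp/build/foo/src"},
        report="provides",
    )
    print(output, end="")
```

The echoed line covers cases like a provides entry that is only known after the build has run.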

Pure bash would still work too. The PKGBUILD parser would set up the variables and then everything else would just work as it does now.

cactus wrote:

I say try experimenting with other solutions, and see if other people find merit in them.

As I said, I have a few ideas in mind. I posted this thread to explore this and to get insightful feedback from others. smile


rcoyner wrote:

EDIT: Now that I think about it, this isn't much different from what we already have, except that if the metadata is declared in a standardized format like XML/YAML/JSON, parsing the data would be easier because most languages already have libraries for parsing those types of files.

The first example that I gave was an XML-formatted PKGBUILD, citing the same argument of readily-available parsing libraries in various languages. I think it fits the purpose programmatically, but as you can tell from the first few posts, people hate working with XML. I don't agree with the argument that it's harder to learn XML than quantum physics (people manage to configure Openbox after all), but I do agree that it's not user-friendly.

I'll take another look at YAML, as suggested by cactus too, but I think having a very simple, whitespace formatted file would be the most KISS if that format can encapsulate all of the data. So far I think it might.

Square wrote:

I imagine that, with your method, this could work well if it went like so:
1. The user grabs the simplified PKGBUILD.
2. A command is issued, which essentially parses the simplified PKGBUILD to output a traditional PKGBUILD.
3. The command calls makepkg on the new PKGBUILD.

This system would definitely be backwards-compatible with the current system via a converter. If this system ever caught on in Arch, then makepkg could get patched with a parser for the new format (which I don't think would be that difficult, as it would just need to configure a limited set of variables, but I haven't tested it yet). If this system never catches on in Arch, then it would still be trivial to write a converter to create the current bash-PKGBUILDs from this format.


u_no_hu wrote:

And +1 to what square said. With the level of perl-fu that you possess, it will be trivial to write a parser which creates a normal PKGBUILD from the new format (and vice versa) and a makepkg wrapper/port.
Basically a POC (I think you have built enough tools already to need a separate repo/build tools tongue). If it is in python I can also help a bit. That will be far more productive than having a TGN discussion and will bring out the merits and flaws of the new approach.

I think developing the idea before diving in to code it is actually quite productive. Think of this thread as trying to formulate the ideology of such a system before trying to implement it. After all, the example format given in the OP is just that, an example, and it was off the top of my head when I wrote it. I'll probably write a mini-specification for it soon and then start playing around with actual code, such as the aforementioned converters or bash parser, unless I suddenly realize that this is a horrible idea.

In any case, the point of this discussion is not to convince the devs to change PKGBUILDs.

I wrote this post in leafpad as I read through the replies, which explains why I've repeated myself a bit in different sections. tongue



Offline
