packages and systems

scruffidog · 2020-06-02 21:40:25

Long time lurker and Arch user suffering from a case of confirmation bias. The distribution's main selling points are:
1) rapid turnaround to match upstream versions of a HUUUUGGGEE number of codebases
2) sensibly balanced packaging system between simplicity and comprehensiveness in features and capabilities
3) extensive documentation of not just the Arch Way, but the majority of the ecosystem as well
4) flexibility

For these reasons, Arch is my go-to for any type of deployment: servers, desktops, VMs, containers, embedded, etc. To this end, pacman, makepkg and ABS are amazing self-contained, complementary management tools.

Have there been any discussions/considerations about evolving a more structured packaging and/or naming convention using the standards, concepts and features already in place? For me, different system use cases (i.e. server, client, development systems, embedded, containers, VM, etc.) may only require a subset of the files in a compiled code-base. The official repositories do seem to have some ad-hoc movement in that direction with groups, meta-pkgs, and random pkgs like: gnome, base, gcc*, haskell*, kdevelop*, smbclient/samba etc.

I kludged/built my own repository infrastructure by repackaging needed pkgs based on the following ideas:

Each code-base or upstream project that is packaged would use either a group or a meta-pkg, broken out into its respective named components:
ie: -clt/tools, -svr/daemon, -lib, -include/dev, -doc/man/examples/samples, -localizations, -configs

I think there are several benefits to incorporating a formal sub-component structure to pkg building:
*) higher levels of granularity and more flexible customization for different use cases
*) faster and smaller install/upgrades by segregation of fast vs. slow changing files (binaries vs. includes/docs, etc)
*) easier maintenance and potentially better security by eliminating unnecessary bloat by use case
*) no need to maintain my own AUR (which is still better than using Gentoo)

What does everyone think about this? Would there be any interest in pursuing this?

Trilby · 2020-06-02 22:57:18

So you are basically asking why we don't split packages like Debian-based systems do? Except it sounds like you want to go further than pkg and pkg-devel and have many other subsets/variants?

I highly doubt this will get any traction in arch. It's a lot more work, for no objective benefit. It also goes against the principle of sticking close to upstream. Upstream sources don't distribute sometool-runtime, sometool-devel, and sometool-docs, they create sometool and distribute it as one entity.

But there should be no need to rebuild most packages if this is what you really want on some of your systems. Just use NoExtract rules. You could certainly make some nice pacman.conf templates that are adjusted for your different targets of servers, or embedded, or whatever. For example, if you know one flavor of system will not need man pages, put all the manpages in NoExtract.
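A minimal sketch of such per-target templates, assuming two hypothetical roles (`server` and `embedded`) and purely illustrative glob patterns. The `noextract_for` helper is made up for this example; only the `NoExtract` directive it prints belongs to pacman.conf:

```shell
#!/bin/sh
# Print a NoExtract line for a given system role.
# Role names and pattern choices are assumptions for illustration only.
noextract_for() {
    case "$1" in
        server)
            # Drop man pages and docs, keep everything else.
            echo 'NoExtract = usr/share/man/* usr/share/doc/*'
            ;;
        embedded)
            # Production image: additionally drop C headers.
            echo 'NoExtract = usr/share/man/* usr/share/doc/* usr/include/*'
            ;;
        *)
            echo "unknown role: $1" >&2
            return 1
            ;;
    esac
}
```

The printed line would go into the `[options]` section of that target's pacman.conf, for example via `noextract_for server`. Note that the patterns are written without a leading slash.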

scruffidog · 2020-06-03 03:26:48

Actually I saw this type of setup in Solaris' package management way back when Sun was still relevant. Each software pkg is broken down into a set of components: clients, servers, man pages, includes/archives/objects, etc.

The benefits would not be for the package maintainer, but the administrator/manager of the environment. To be fair, it should not be significantly more work for the package maintainer as there are already quite a few pkgs that sort of do this already in their PKGBUILD. It would not be hard to use one of those as a template to segregate the files out into their respective components. I did originally start with the NoExtract and NoUpgrade directives, but they are not as useful, especially when building both a development and a production embedded environment for a specific code-base.

This does not go against the principle of upstream, in fact I think this adheres even more closely to the spirit and letter of autoconf and m4 managed projects. ie: make include; make man, etc, etc.

Due to the tools' design and features there really should be no contradiction about a single entity. The way I see that working is a meta-pkg that names the compiled project and depends on the pkg sub-components. This way, a normal user would just choose to install the meta-pkg and someone with a different use case can choose the specific components.
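As a rough illustration of that layout, here is a split-PKGBUILD sketch for a hypothetical upstream project called `sometool`. The project name, version, binary names and file paths are all assumptions, and the `source` array and checksums are left out, so this is not a buildable package as it stands:

```shell
# PKGBUILD sketch for a hypothetical upstream project "sometool".
# Names, version and file layout are assumptions, not a real package.
pkgbase=sometool
pkgname=(sometool sometool-client sometool-server sometool-doc)
pkgver=1.0
pkgrel=1
arch=(x86_64)

build() {
    cd "$srcdir/$pkgbase-$pkgver"
    ./configure --prefix=/usr
    make
}

# Meta-package: ships no files, only pulls in the components.
package_sometool() {
    pkgdesc="sometool (meta package)"
    depends=("sometool-client=$pkgver" "sometool-server=$pkgver" "sometool-doc=$pkgver")
}

package_sometool-client() {
    pkgdesc="sometool (client)"
    # Assumes the build tree contains a client binary named "sometool".
    install -Dm755 "$srcdir/$pkgbase-$pkgver/sometool" "$pkgdir/usr/bin/sometool"
}

package_sometool-server() {
    pkgdesc="sometool (server)"
    # Assumes the build tree contains a daemon binary named "sometoold".
    install -Dm755 "$srcdir/$pkgbase-$pkgver/sometoold" "$pkgdir/usr/bin/sometoold"
}

package_sometool-doc() {
    pkgdesc="sometool (man pages)"
    install -Dm644 "$srcdir/$pkgbase-$pkgver/sometool.1" "$pkgdir/usr/share/man/man1/sometool.1"
}
```

With something like this, `pacman -S sometool` would pull in every component through the meta-pkg, while `pacman -S sometool-server` would install only the daemon.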

There is also the added benefit of limiting attack surfaces for security. Take the example of openssh: the pkg contains both the server and the client. On a server running internet facing services behind a firewall, why would you want to make it easier for someone to jump off the system in case of a compromise ?

Again, I see this kind of split being done for a number of pkgs already. It would be great to see this extended in a structured way that would facilitate automation and maintenance. And if it does not, well, I have my own abs and AUR...

Last edited by scruffidog (2020-06-03 03:31:15)

jasonwryan · 2020-06-03 03:43:46

scruffidog wrote:

Take the example of openssh: the pkg contains both the server and the client. On a server running internet facing services behind a firewall, why would you want to make it easier for someone to jump off the system in case of a compromise ?

Err, if they have compromised the server, not having an SSH client is the least of your worries...

None of what you are proposing either makes sense to me, or is likely to appeal to anyone actually developing Arch. For the simple reason that it adds a shitload of complexity for no benefit. Arch has evolved into what it is because it actively avoids that approach.

This is even explained in the wiki:

Simplicity wrote:

Packages are only split when compelling advantages exist, such as to save disk space in particularly bad cases of waste.

https://wiki.archlinux.org/index.php/Arch_Linux

Awebb · 2020-06-03 04:36:45

While some Arch devs frequent the bbs regularly, you'll have more luck getting an answer on the respective mailing lists. This bbs is mostly for support, so the answers you'll receive will mostly be about correcting either your config files or your views to adhere to the status quo.

@topic: Arch does not have a mechanism, that installs packages based on some sort of a role. There is no dpkg-reconfigure and no USE flag. At the moment, I can install a fresh desktop system to my wishes without a single package query, because the granularity of the packages is not too fine to be committed to memory. I certainly would not be overwhelmed by openssh-{client,server,sftp-server}, although I'd certainly be annoyed every time I forget sftp-server. Let's say we have a split package, pkgbase is "openssh" with those three packages, so we'd like a group "openssh", too, and an openssh-meta package, because if I want "all things openssh" installed, without manually checking if there is something new on every update, a group won't do it.

This is what finding a package looks like right now:

$ pacman -Ss openssh
core/openssh 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol
community/lxqt-openssh-askpass 0.15.0-1 (lxqt)
    LXQt openssh password prompt
community/openssh-askpass 2.1.0-2
    A plasma-like passphrase dialog for ssh
community/perl-net-openssh 0.78-2
    Perl SSH client package implemented on top of OpenSSH
community/python-sshpubkeys 3.1.0-3
    OpenSSH public key parser for Python
community/tinyssh-convert 0.2-5
    Convert Ed25519 keys from OpenSSH to TinySSH format

This is what it would look like with a split package:

$ pacman -Ss openssh
core/openssh-client 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (client)
core/openssh-meta 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (meta package)
core/openssh-server 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (server)
core/openssh-sftp-server 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (sftp-server component)
community/lxqt-openssh-askpass 0.15.0-1 (lxqt)
    LXQt openssh password prompt
community/openssh-askpass 2.1.0-2
    A plasma-like passphrase dialog for ssh
community/perl-net-openssh 0.78-2
    Perl SSH client package implemented on top of OpenSSH
community/python-sshpubkeys 3.1.0-3
    OpenSSH public key parser for Python
community/tinyssh-convert 0.2-5
    Convert Ed25519 keys from OpenSSH to TinySSH format

This extended complexity will do four things:

1. I will not have to comment out the sftp subsystem in the sshd_config in case I don't want the ssh server to have sftp.
2. I will not have to not enable and start the sshd service on a non-server.
3. I will not have to worry about an attacker "getting out of my server" with my own ssh client and force the poor soul to bring his own.
4. It will reduce the installed size of a server-only to roughly a third, which could be interesting for very tiny systems... which isn't something Arch is very good at without rebuilding a lot anyway.

Then, how far is this going to go? I'd like to use dropbear-scp with the regular ssh client. I'd then want an openssh-client-scp package in the split, probably asking for an openssh-client group and an openssh-client-meta package? Then the openssh-client package needs to be renamed, because frankly I don't know what pacman does, when a group and a package have the same name.

$ pacman -Ss openssh
core/openssh-client 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (client)
core/openssh-client-meta 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (client meta package)
core/openssh-client-scp 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (scp client)
core/openssh-meta 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (meta package)
core/openssh-server 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (server)
core/openssh-sftp-server 8.3p1-1
    Premier connectivity tool for remote login with the SSH protocol (sftp-server component)
community/lxqt-openssh-askpass 0.15.0-1 (lxqt)
    LXQt openssh password prompt
community/openssh-askpass 2.1.0-2
    A plasma-like passphrase dialog for ssh
community/perl-net-openssh 0.78-2
    Perl SSH client package implemented on top of OpenSSH
community/python-sshpubkeys 3.1.0-3
    OpenSSH public key parser for Python
community/tinyssh-convert 0.2-5
    Convert Ed25519 keys from OpenSSH to TinySSH format

Now I need a) a Wiki article to understand what to install and b) will bite my own ass every time I go "openssh-client" and forget the scp package.

This is just one example. There are good examples for split packages and meta packages, where the added complexity dishes out benefits. For example, a lack of granularity in the KDE packaging once led to the KDEmod repository and ultimately to the birth of the Chakra distribution. This would probably not happen today, because KDE/plasma is quite granular these days.

Trilby · 2020-06-03 04:53:16

scruffidog wrote:

The benefits would not be for the package maintainer, but the administrator/manager of the environment.

This would be putting user-friendliness over user centrality. More importantly ...

scruffidog wrote:

To be fair, it should not be significantly more work for the package maintainer ...

Despite your claim to the contrary, I don't think that is being fair. It would be a lot more work. A template doesn't help - the issue isn't conceptually how one could imagine splitting a package - that's trivial - it's the work of actually doing it.

Your security argument has been addressed effectively in the previous responses, but to add one more nail to the coffin: I too see little consolation for a sysadmin in knowing that an attacker who has shelled into the server and gained access doesn't have a tool available to shell out to some other system ... actually less than little: precisely none; once they have access to the system, they can simply install openssh-client and be on their way. Don't worry about upgrading the locks on the window if the front door is wide open.

In any case, I'm really just a bit confused about why you are pitching this idea here. I see nothing wrong with your goals, though I certainly don't share them, but there are other distros that really focus on the type of design you're describing. It is, however, quite divergent from Arch's history and philosophy. So why try to make Arch more like those other distros that do what you want rather than just using one that does this from the start?

Last edited by Trilby (2020-06-03 04:56:18)

eschwartz · 2020-06-03 15:52:57

Awebb wrote:

3. I will not have to worry about an attacker "getting out of my server" with my own ssh client and force the poor soul to bring his own.

This basically summarizes the discussion for me.

Awebb wrote:

4. It will reduce the installed size of a server-only to roughly a third, which could be interesting for very tiny systems... which isn't something Arch is very good at without rebuilding a lot anyway.

Arch is fairly decent at this, we just call it NoExtract rules.

Awebb wrote:

Then, how far is this going to go? I'd like to use dropbear-scp with the regular ssh client. I'd then want an openssh-client-scp package in the split, probably asking for an openssh-client group and an openssh-client-meta package? Then the openssh-client package needs to be renamed, because frankly I don't know what pacman does, when a group and a package have the same name.

If a group and a package have the same name, pacman sees the package first and doesn't check for a group. You can still explicitly ask for the group using pacman -Sqg <groupname> to get the list of individual members, implement your own homebrew mechanism to interactively select which ones you want, then pass them as package name arguments to pacman -S.
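A rough sketch of such a homebrew mechanism, using a non-interactive pattern filter in place of an interactive prompt. `filter_members` is a made-up helper and the patterns in the usage comment are arbitrary; only `pacman -Sqg` and `pacman -S` come from the description above:

```shell
#!/bin/sh
# Read package names on stdin (one per line) and print those that match
# none of the shell patterns given as arguments.
filter_members() {
    while IFS= read -r pkg; do
        keep=1
        for pat in "$@"; do
            # $pat is left unquoted on purpose so it acts as a glob pattern.
            case "$pkg" in
                $pat) keep=0 ;;
            esac
        done
        if [ "$keep" -eq 1 ]; then
            printf '%s\n' "$pkg"
        fi
    done
}

# Intended use on an Arch system (not executed here, patterns are examples):
#   pacman -S --needed $(pacman -Sqg gnome | filter_members 'epiphany' 'gnome-m*')
```

The filter itself knows nothing about pacman, so it can be tried on any list of names before pointing it at a real group.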

eschwartz · 2020-06-03 16:06:02

scruffidog wrote:

Have there been any discussions/considerations about evolving a more structured packaging and/or naming convention using the standards, concepts and features already in place? For me, different system use cases (i.e. server, client, development systems, embedded, containers, VM, etc.) may only require a subset of the files in a compiled code-base. The official repositories do seem to have some ad-hoc movement in that direction with groups, meta-pkgs, and random pkgs like: gnome, base, gcc*, haskell*, kdevelop*, smbclient/samba etc.

The base package is a special case of a policy document.

gcc-libs and gcc are specifically meant to be split upstream, because most people don't need a giant compiler installed, but gcc embeds runtime libs into programs it compiles.

haskell-* is a set of individual projects written in haskell, providing one software product per package. The naming pattern is indeed something arch does, just like we do python-* module packages. This happens when a package is only useful as an ecosystem-specific library.

gnome, kde, etc are as far as I know only packaged with one package per upstream project and upstream versioned release tarball.

smbclient is fairly small, and things link to it. I have it installed because mpv. I don't have samba installed, but it's 62 MB and also forces me to install ceph-libs, which is 52 MB, so saving users over 110 MB of install footprint *in the common case* is indeed a good reason to split packages.

No, "autotools has make include and make man, so it makes sense to split" is not a good rationale. Handling dependencies between -include packages just on its own would be a godawful pain.

scruffidog · 2020-06-03 20:17:33

What attracted me to the distro in the first place IS the ability to dig deep without graphical cruft, and the fact that it requires some amount of expertise. Because of that, I wind up with many machines, real and virtual, that need anywhere from 200-600 pkgs to be managed/rebuilt/installed. Arch with its simple philosophy and flexible tools made it easy for me to keep all of that in lockstep without needing more complicated dev, build and rollout infrastructure. That probably would be the right way to do things but I'm lazy and trust the Arch team enough to want to leverage that so I can do other things.

@eschwartz: As stated before, I have used the various No* directives, but they are not as flexible. The official tools provide me a way to quickly throw in/take out debugging facilities without having to expend much brainpower on my end. There are many ways to do things and

My kludgy, hacked-up approach grew out of how my computing sprawl evolved and the decisions made over time. It works for me now and probably well into the future, and if more pkgs get broken out over time, as I have seen, it could wind up being a moot point.

Trilby · 2020-06-03 20:44:01

scruffidog wrote:

... without having to expend much brainpower on my end. There are many ways to do things ...

Yes, but don't you realize that you are suggesting that someone else do all the work (expend their brainpower) so that you don't have to?

scruffidog · 2020-06-04 00:02:41

@Trilby: I would gladly contribute some of my hacked together PKGBUILDs to the effort if there was interest. Not looking to freeload, more like I've scratched my itch this way due to some of my specific use cases and maybe other people might find it useful. And of course if it does go upstream, one less thing for me to maintain.
