
#1 2015-10-15 00:31:18

uberscientist
Member
Registered: 2012-01-27
Posts: 81

IPFS adapter for pacman?

I'm curious how feasible it would be to make an adapter of sorts for pacman so that it gets files from IPFS?

We could have a mirror server that would expose all the files via IPFS and then users could download and help distribute packages.

Thoughts?

Info about IPFS: https://ipfs.io/ distributed file system

Last edited by uberscientist (2015-10-15 00:32:05)

Offline

#2 2015-10-15 01:07:06

fukawi2
Forum Moderator
From: .vic.au
Registered: 2007-09-28
Posts: 5,531
Website

Re: IPFS adapter for pacman?

Well, pacman does have the `XferCommand` directive, so if you could wrap the IPFS tool in a script that parses the URL pacman passes, then I guess it could work.
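
A minimal sketch of such a wrapper, for illustration only. pacman substitutes the download URL for %u and the local output file for %o. Everything else here is an assumption: the script path, the local gateway address (127.0.0.1:8080), the hypothetical IPNS name mirror.example.org, and the idea that the published tree follows the usual $repo/os/$arch layout.

```shell
# Hypothetical XferCommand wrapper. In pacman.conf (the path is an assumption):
#   XferCommand = /usr/local/bin/ipfs-xfer %u %o

GATEWAY="${GATEWAY:-http://127.0.0.1:8080}"         # local IPFS HTTP gateway (assumed)
IPNS_ROOT="${IPNS_ROOT:-/ipns/mirror.example.org}"   # hypothetical IPNS name

# Map a mirror URL to a gateway URL by keeping the last four path
# components ($repo/os/$arch/$file) of the URL pacman asked for.
to_gateway_url() {
    local url="$1" tail
    tail=$(printf '%s\n' "$url" |
        awk -F/ '{ print $(NF-3) "/" $(NF-2) "/" $(NF-1) "/" $NF }')
    printf '%s%s/%s\n' "$GATEWAY" "$IPNS_ROOT" "$tail"
}

# Try the IPFS gateway first (20 s limit), then the original mirror URL.
fetch() {
    local url="$1" out="$2"
    curl -fsSL --max-time 20 -o "$out" "$(to_gateway_url "$url")" ||
        curl -fL -o "$out" "$url"
}

# Only download when called with exactly the two arguments pacman passes.
if [ "$#" -eq 2 ]; then
    fetch "$1" "$2"
fi
```

The fallback keeps pacman working when the gateway is down or the file is not yet available on IPFS.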

Offline

#3 2015-10-15 02:28:48

Allan
Supreme Leader
From: Brisbane, AU
Registered: 2007-06-09
Posts: 10,740
Website

Re: IPFS adapter for pacman?

This will be slower for the majority of packages.  Big packages will see some gain, assuming you are not maxing your connection out when downloading from a mirror.

Offline

#4 2015-10-27 19:51:49

anatolik
Developer
Registered: 2012-09-27
Posts: 424

Re: IPFS adapter for pacman?

After playing with IPFS I can say that it is a really interesting bit of technology. I tried copying files between computers in different locations (both machines on a campus, in different cities, on a local network segment) and found that the latency and copying speed are very good.

An IPFS package for Arch has been added to the [community] repository (go-ipfs), and I recommend that everyone give it a try. There has been a lot of talk about implementing an easy-to-use distributed global filesystem, and finally it is done. Read more about IPFS, watch the videos - I promise you will like this technology.

And I would also love to see better pacman + IPFS integration.

I see the following advantages of using IPFS for distributing Arch binaries (packages, ISO files):
- It reduces our infrastructure load. Our servers become seeding servers, and as people start downloading packages, more and more traffic will be handled by IPFS: peers will send data to each other instead of requesting it from our servers.
- No need for mirrors and, as a result, no more dead or out-of-date mirrors. Peers essentially become the mirrors themselves and store some parts of packages in their cache.
- Much faster downloads in isolated Internet segments. University campuses in developing countries come to mind: they have a good local network, but the outside bandwidth is not always great, so sharing data between local peers improves download speed. The same goes for local ISPs in many cities, which usually have a fast local network and a slow connection to the Internet backbone.

Offline

#5 2015-11-14 12:02:59

Diesel4Power
Member
Registered: 2010-12-07
Posts: 6

Re: IPFS adapter for pacman?

I was thinking about just the same idea! :) But why do we need the XferCommand functionality if IPFS supports mounting as a directory?

Offline

#6 2015-12-06 19:18:44

robcat
Member
From: Fermo
Registered: 2009-02-21
Posts: 19

Re: IPFS adapter for pacman?

IPFS is a perfect fit for package distribution, and we have the important pieces ready:

  • hash of the package: already included in the package database

  • local cache: already present by default on every Arch installation

Definitely the fastest way to get it running is to edit the XferCommand parameter, which (with a couple of trivial modifications) could essentially be set to:

ipfs get -o %o %h

(the above command gets the file with hash %h and places it in the cache)

We just need pacman to expose the hash of the package via XferCommand (it currently does not), which requires a patch to /src/pacman/conf.c (following the same pattern found at lines 234-241).
But I'm not familiar with the pacman source code, and I haven't managed to produce a patch that replaces %h with the package hash. Any suggestions?

Last edited by robcat (2015-12-06 19:19:15)

Offline

#7 2015-12-06 20:59:57

Allan
Supreme Leader
From: Brisbane, AU
Registered: 2007-06-09
Posts: 10,740
Website

Re: IPFS adapter for pacman?

Create a wrapper script around pacman to test this out.  Prove this is worthwhile and the modifications can be made to pacman.  (and given pacman-5.0 is already frozen, you will be waiting at least six months for the following release anyway).

Offline

#8 2015-12-07 00:39:09

robcat
Member
From: Fermo
Registered: 2009-02-21
Posts: 19

Re: IPFS adapter for pacman?

Allan wrote:

Create a wrapper script around pacman to test this out.

OK, I explored the external script option and hit the same problem (i.e. pacman does not make the package hash very accessible).

A basic wrapper could support the "-S <package-name>" option following these steps:

Get the list of all the packages that need to be downloaded:

pacman -Sp --print-format "%n" <package-name>

For each package name in the returned list, get the extended info (that includes the hash):

pacman -Sii <target>

From the -Sii output, use grep to extract the SHA256 hash and the filename of the tarball; then reconstruct the standard ipfs multihash and figure out the package path in the cache.

Retrieve and pin the package via ipfs:

ipfs get -o <path> <hash>
ipfs pin add <hash>

And finally, call pacman transparently:

pacman -S <package-name>

I did not implement it because this "wrapper script" approach presents a lot of tricky issues (handling of more generic pacman options, root permissions, the user cannot interact with pacman before all the downloading happens, ...).

But since the low-hanging fruit is the downloading part, I don't think the full power of a wrapper is needed immediately. Instead, I'll try to hack together an XferCommand script that roughly follows the steps above.
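
For illustration, the first two steps above (listing the targets and extracting the checksum) might be sketched as follows. The function names are hypothetical, and the sketch assumes the checksum field in the `pacman -Sii` output is labelled "SHA-256 Sum" in the C locale; check the output of your pacman version. The step that turns the checksum into an IPFS hash is deliberately left out.

```shell
# List the package names pacman would download for the given targets.
list_targets() {
    pacman -Sp --print-format '%n' "$@"
}

# Read `pacman -Sii <pkg>` output on stdin and print the SHA-256 checksum.
# Assumes the field label is exactly "SHA-256 Sum".
extract_sha256() {
    awk -F' *: *' '$1 == "SHA-256 Sum" { print $2; exit }'
}

# Print "<name> <sha256>" for every package needed by the targets.
print_checksums() {
    local pkg
    for pkg in $(list_targets "$@"); do
        printf '%s %s\n' "$pkg" \
            "$(LC_ALL=C pacman -Sii "$pkg" | extract_sha256)"
    done
}
```

Nothing here needs root: both `pacman -Sp` and `pacman -Sii` only read the sync databases.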

Last edited by robcat (2015-12-07 00:41:44)

Offline

#9 2016-01-03 05:59:41

H3g3m0n
Member
Registered: 2009-02-01
Posts: 17

Re: IPFS adapter for pacman?

Allan wrote:

This will be slower for the majority of packages.  Big packages will see some gain, assuming you are not maxing your connection out when downloading from a mirror.

Since it's distributed maybe we could have quite a few simultaneous downloads going on. Might end up being faster (although I already max out my connection on a single download). But I'm guessing some people will have super fast connections with the possibility of grabbing files from people on the same ISP.

Has anyone got an idea for how to do 'fallthrough' for things not available in IPFS, in a timely manner?

Unless we see large-scale adoption or official mirrors supporting it, I suspect this will be slow due to IPFS cache misses. The only solution I can think of would be to start an HTTP download while querying IPFS for the rest of the hashes in the background. That way the first few packages might be retrieved over HTTP, but the rest could come from IPFS unless they're not available.
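
One simple way to get a timely fallthrough is to give each source a time limit and move on when it fails. The sketch below is that sequential variant, not the background-download idea described above; the function name and the 20-second limit in the usage comment are assumptions, and `timeout` is the GNU coreutils tool.

```shell
# Run each command string in order, allowing each at most <limit> seconds.
# Stops at the first command that succeeds; returns 1 if all of them fail.
first_success() {
    local limit="$1" cmd
    shift
    for cmd in "$@"; do
        if timeout "$limit" sh -c "$cmd"; then
            return 0
        fi
    done
    return 1
}

# Hypothetical use: try IPFS for 20 s, then fall back to an HTTP mirror.
#   first_success 20 \
#       "ipfs get -o '$out' '/ipfs/$hash'" \
#       "curl -fL -o '$out' '$url'"
```

The cost is that every miss adds up to the full time limit before the mirror download starts, so a short limit matters when many packages are missing.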

Last edited by H3g3m0n (2016-01-03 06:24:04)

Offline

#10 2016-01-09 15:40:50

Ape
Member
From: Finland
Registered: 2009-10-15
Posts: 46
Website

Re: IPFS adapter for pacman?

An IPFS package downloader would automatically share packages in LAN environments (or virtual servers) without configuring anything besides the IPFS daemon. This would significantly boost download speeds and save backbone bandwidth. Also, in case of Internet connection failures, people could still download packages from local users transparently.

I tested IPFS a bit and was surprised at how low a latency it can achieve. I think a few good mirrors running IPFS with all packages, in addition to the regular users participating, would be enough to enable really good download speeds.

Offline

#11 2016-03-29 01:00:26

chungy
Member
Registered: 2009-09-07
Posts: 32

Re: IPFS adapter for pacman?

I believe pacman supports local file-path repositories, and if so, a wrapper or adapter shouldn't really be necessary. As long as /ipfs and /ipns are mounted, pacman should be able to look into them as if they were part of the rest of the file system.

Offline

#12 2016-07-16 13:11:56

symen
Member
Registered: 2014-10-13
Posts: 10

Re: IPFS adapter for pacman?

For those who stumble upon this thread, the subject has been worked on further elsewhere:
https://github.com/ipfs/notes/issues/84
For now there is an up-to-date Arch Linux mirror on IPFS at /ipns/mirror.rxv.cc

I only discovered that after experimenting a bit myself and noticing that multiple IPs would seed me the packages I had just added on another machine. :)

Offline

#13 2016-07-17 05:11:28

Xyne
Moderator/TU
Registered: 2008-08-03
Posts: 6,360
Website

Re: IPFS adapter for pacman?

I missed this thread when it was first posted. I am unfamiliar with IPFS but it's definitely interesting and I like the idea of using it for Pacman downloads.

This is what I have understood after skimming through the linked thread and the docs:

  • To download a file via IPFS, you need the IPFS hash.

  • The IPFS hash cannot be generated from the metadata in the pacman sync database.

  • The ideal would be to have the IPFS hash included in the sync database along with the other checksums.

  • Someone may have set up a host to sync and hash all repo packages?

  • The challenge right now is generating and distributing the hashes (in a timely fashion).

So, in the absence of official support, is there any way to generate the hashes in a distributed fashion? E.g. whenever a user downloads a package directly from a mirror, he generates the IPFS hash and sends it to a host that publishes a table matching file names to IPFS hashes? I see no way to do this directly via IPFS (fortunately you can't just push data to a host), but if someone already has a host running 24/7 on IPFS then adding a little daemon to collect and serve hashes may be acceptable.

The download protocol would then be

  • Get the latest hash list.

  • Download available target packages via IPFS. Security is not an issue because the packages are signed. (Edit: I have just realized that maliciously large files may be a problem.)

  • Download remaining packages from mirrors and upload their hashes.
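
The lookup step of this protocol could be sketched as below. The hash-list format (one "<file name> <IPFS hash>" pair per line) and both function names are assumptions made for the example; no such list format has been defined anywhere.

```shell
# Look up the IPFS hash for a package file name in a hash list.
# Assumed list format: "<file name> <IPFS hash>", one pair per line.
# Exits non-zero when the file name is not listed.
lookup_hash() {
    local list="$1" file="$2"
    awk -v f="$file" \
        '$1 == f { print $2; found = 1; exit } END { exit !found }' "$list"
}

# Decide where each target file should be fetched from:
# prints "ipfs <file> <hash>" or "mirror <file>" per target.
plan_downloads() {
    local list="$1" file ipfs_hash
    shift
    for file in "$@"; do
        if ipfs_hash=$(lookup_hash "$list" "$file"); then
            printf 'ipfs %s %s\n' "$file" "$ipfs_hash"
        else
            printf 'mirror %s\n' "$file"
        fi
    done
}
```

Files planned as "mirror" are the ones whose hashes would be uploaded after the download.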

I could add optional support for IPFS to Pacboy relatively easily. The logic to query a source before falling back on the mirrors is already there for Pacserve. It would benefit from fully parallel downloads from IPFS while pulling everything else from the mirrors in parallel at the same time. Eventually this could be extended to source downloads for building packages too, but that would also require official support for IPFS hashes in PKGBUILDs (ipfssums=(...)).

Last edited by Xyne (2016-07-17 05:16:04)


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone

Offline

#14 2016-07-18 14:51:55

symen
Member
Registered: 2014-10-13
Posts: 10

Re: IPFS adapter for pacman?

I am not as deep into IPFS and this experiment as other people in this thread (and the GitHub issue), but I can give you an answer based on what I read and tried on my own.

About the distribution of hashes, it may be simpler and more idiomatic to fully rely on IPFS.
Here is the core idea behind what I am experimenting with at the moment:

  • Someone mirrors the package repo and hashes the whole directory. This results in a hash that represents the state of the package repo at this moment (e.g. /ipfs/QmU1qYFBYxvDSZPBAEL8ijKz4uA6kBVeRvexMWzHkytQvf).
    Since it is a Merkle DAG, this hash is all a user needs to retrieve any package (e.g. the zlib package is at /ipfs/QmU1qYFBYxvDSZPBAEL8ijKz4uA6kBVeRvexMWzHkytQvf/core/os/i686/zlib-1.2.8-4-i686.pkg.tar.xz).

  • The host uses IPNS to publish the hash it just computed. In practice, this means that the same /ipns/<hash> path always resolves to the current IPFS hash of the package repo (<hash> is the ID of the host in the swarm). It is possible to make this path prettier with DNS and a special TXT record.

  • The user has an IPFS daemon running in the background, and adds the HTTP or FUSE gateway (provided by the daemon) as a pacman mirror, e.g.:
    Server = http://127.0.0.1:8080/ipns/mirror.archl … o/os/$arch
    or
    Server = file:///ipns/mirror.archlinux.org/archlinux/$repo/os/$arch

This solution is nice because it can be naively implemented in a few commands on the server (rsync, ipfs add -r, ipfs name publish) and requires no modifications to pacman on the client.
It requires a third party that hashes the package repo and runs an IPFS daemon. However, that party doesn't have to be trusted (so it should be fine to rely on a few unofficial hosts). Also, the hosts don't need much bandwidth since most of the network load is distributed.
There is already at least one such mirror at /ipns/mirror.rxv.cc (although I don't know if it is intended to be used seriously).
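
The server side described above could be sketched roughly as follows. The function name is hypothetical, the rsync options are one reasonable choice rather than a recommendation from any mirror operator, and the source URL and target directory are supplied by the caller.

```shell
# Mirror the repo, add it to IPFS, and publish the new root hash under
# this node's IPNS name.
# Arguments: <rsync source URL> <local directory>
publish_mirror() {
    local src="$1" dir="$2" root
    rsync -rtlH --delete-after "$src" "$dir" || return 1
    # -r adds the directory recursively; -Q prints only the final (root) hash.
    root=$(ipfs add -r -Q "$dir") || return 1
    ipfs name publish "/ipfs/$root"
}

# Hypothetical use, e.g. from a cron job:
#   publish_mirror rsync://mirror.example.org/archlinux/ /srv/archlinux
```

Each run publishes a fresh root hash, so /ipns/<host_id> keeps pointing at the latest state of the repo.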

The main issues I encountered seem related to the (young and inefficient) go-ipfs implementation. We should be able to find workarounds for most of them:

  • The FUSE filesystem is slow and unreliable, though the HTTP gateway works fine.

  • IPNS name resolution via the swarm is very unreliable for now, at least for rarely accessed paths. This could be mitigated by setting the IPFS hash of the repo directly in DNS,
    i.e. /ipns/domain.tld -> /ipfs/<current_hash>
    instead of /ipns/domain.tld -> /ipns/<host_id> -> /ipfs/current_hash

  • When using pacman, the latency for the retrieval of each package can be high. Because of this, pacman often aborts the transfer and uses a different mirror. This could be mitigated by using a wrapper (pacboy/powerpill?) that downloads packages in parallel and cleanly falls back to standard mirrors when needed, instead of outputting verbose error messages.

  • Host-to-host bandwidth doesn't exceed a few MB/s, so it doesn't really replace pacserve for now.

  • Packages downloaded via standard mirrors should also be hashed and seeded. This could be done via an alpm hook or by the pacman wrapper itself.

Unfortunately this wouldn't work for source downloads in PKGBUILDs. Using an additional ipfssums field sounds like a good idea in the long term.
Maybe a centralized service, as you suggested, could be used in the meantime?

Offline

#15 2016-07-18 21:42:42

Xyne
Moderator/TU
Registered: 2008-08-03
Posts: 6,360
Website

Re: IPFS adapter for pacman?

In response to the issues:

  • I see no immediate advantage in using FUSE instead of HTTP. It would just require further configuration, unless the idea is then to dump the entire user's package cache onto IPFS via FUSE.

  • The name resolution is prohibitively slow. "ipfs ls /ipns/mirror.rxv.cc" fails to resolve for me. Listing entries on the host via "ipfs ls /ipfs/<hash>/core/os/x86_64" seems to hang indefinitely while eating around 250 KiB/s, although listing the contents of .../os works.

  • Silent fallbacks are possible with pacboy. Each package can have a list of download "jobs" where the order of the list indicates preference. Currently it uses Pacserve first when Pacserve reports that it has the package and falls back to a multi-mirror parallel download if not or if the Pacserve download fails. It would be easy to add an IPFS download either before or after Pacserve. Timeouts would be handled by Aria2c as determined by the user's settings.

  • Is that due to name resolution issues or problems with the data transfer algorithms in the current Go implementation?

  • Yep.

I have added preliminary support for downloading packages via IPFS. For the moment I've just wedged it in between Pacserve and mirror downloads. I will likely modularize it later so that the user can select the order of preference, etc. Right now it doesn't add downloaded packages to IPFS. Given how central that would need to be, it should probably be done outside of Pacboy (either via Pacman hooks as suggested, if possible, otherwise with some simple wrapper script that adds the package cache afterwards).
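
As a rough illustration of the "add the package cache afterwards" idea (not part of Pacboy), a script along these lines could be run after an upgrade or called from a hook. The function name is hypothetical; /var/cache/pacman/pkg is pacman's default cache directory.

```shell
# Add every package file in a cache directory to the local IPFS node and
# print "<hash> <file name>" for each one.
# Defaults to pacman's standard cache; pass another path to override.
seed_cache() {
    local dir="${1:-/var/cache/pacman/pkg}" pkg
    for pkg in "$dir"/*.pkg.tar.*; do
        [ -f "$pkg" ] || continue                 # glob matched nothing
        case "$pkg" in *.sig) continue ;; esac    # skip detached signatures
        # -Q prints only the resulting hash.
        printf '%s %s\n' "$(ipfs add -Q "$pkg")" "${pkg##*/}"
    done
}
```

The IPFS daemon has to be running for the added packages to be served to other peers.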

To test it, grab the latest pacboytest.tar.xz and run the following in the extracted directory:

./testrun.sh ./pacboy --pb-verbose --pb-config ./config-ipfs_test.json -Sw --cachedir cache pacman

I haven't documented anything yet but the config-ipfs_test.json file should hopefully be self-explanatory. You'll need to have the IPFS daemon running. The default timeout is set to 20 seconds. You can easily test other aria2 parameters via the dictionary.

Note that "pacserve": null is just to override the pacserve setting in config.json, which serves as a base for config-ipfs_test.json.



Offline

#16 2018-03-08 13:54:35

VictorBjelkholm
Member
From: Barcelona, Catalunya
Registered: 2018-03-08
Posts: 1
Website

Re: IPFS adapter for pacman?

Hello everyone!

I'm new to these forums (and new to Arch in general, been running it for ~2 months and loving it so far! Thanks to everyone who makes it so awesome!)

I work full-time on IPFS and decided that I would like to download all my updates and new packages over IPFS rather than HTTP. I also want to run Arch on my laptop, and it would be a waste to fetch packages from external servers when my desktop already has them.

After following the discussion over at GitHub (https://github.com/ipfs/notes/issues/84) for a while (heh, more than 2 years already...) I've successfully set up an Arch mirror over IPFS!

There is a GitHub repository describing how it works and how you can use it here: https://github.com/VictorBjelkholm/arch-mirror

I'm happy to hear any feedback and/or suggestions on how to make it better.

If everything works OK and I hit no issues in two weeks, I plan to announce this mirror on the mirror mailing list.

Thanks and thanks for all the hard work the community does on Arch. It really shows.

Offline
