
#1 2010-02-12 20:59:55

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

design the output of every tool we use to look like merge-able chains in a DSCM.

this is something i've been thinking about long before i came to Arch.  i want to see next-generation package/configuration/change management in a distributed distribution.

i am familiar with git, and most of what i have tried relates to it; however, that's only because i know it well.  other possibilities would be bazaar?/mercurial/fossil/etc.  i like fossil; i have not tried it, but it looks closest to what i want to achieve.  i don't think it could scale to the levels we'd need, however.

ASSERTIONS

) all PKGBUILD are DSCM (git/?) based
) bugs should ride along with software, and be merge-able when branches merge
) cryptographic signatures for each user
) wiki for each software
) forum "channels" for each software
) P2P sharing of SCM (blobs/trees/commits in git) units
) P2P sharing of common SCM packs (packs in git)
) P2P sharing of user configs and ABS build trees; each user may host their own binary/source repo, and sign their packages
) P2P and distribution are good

essentially, everything is a branch/tree and we use facilities of DSCM with a P2P layer above.  the arch servers could become another node in the system and a long term record keeping peer.  others could add servers.  you could open the wiki/bugs/etc offline, in a web browser, and merge later.  when you edit your PKGBUILDS, they can be forked by others and improved, maybe pushed to the core/community repos.  official repo builds could be signed by an official Arch GPG key.  bring everything as close to source as possible, and spread it out.

this is completely brainstorming right now, but i have done some tricky cool stuff with git.  i want to keep all/most of the logic/information within the git DAG (commit graph).  i think we could do neat stuff with the git index and git grafts, and several operations could safely be done in parallel.  we could do mapreduce type calculations on the "ArchNet" to get crazy statistics and visualizations.

i intend to actually build something soon-ish-awhile.  right now im working on an app that can produce 3D visualizations in VPython from any kind of input stream... i want to hook that kind of stuff up and visualize the arch/linux/gnu/buzz.

another offshoot project for me was to use VPython (that app is really fun) to navigate and manipulate git repositories in real time.  imagine visualizing your system in 3D while working on it.  like a 3D admin panel where you overlay others' configs and entire systems onto your own to see what changes/etc.  DSCM can do this.

thoughts?  what other kinds of things could we do if everything Arch behaved like a P2P super-repository?

Last edited by extofme (2010-02-14 01:34:37)


what am i but an extension of you?

Offline

#2 2010-02-14 02:56:53

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

ok, perhaps i was too ambiguous, conceptual commands don't make good titles, everyone hates regex, or some combination of the above :)

what i want to accomplish is a conceptual restructure as to what a (distribution) community can be, and how contributions can be realized and rippled.  when i came to Arch, and i haven't been here long, two things in particular were clear:

1) this community does not have corporate sponsorship or roots, and thus its direction is not dictated by any such entity
2) this community doesn't have end-user experience rabies, nor does it aspire to acquire the disease; instead it focuses on completeness, transparency, simplicity, and interaction within its technically inclined base

and that rocks.  i would like to see an entire distribution go, well, distributed.  this means all and everything that it is (title).

the purpose of a package manager is to merge unique/interdependent sections/views of the filesystem hierarchy into a complete and usable system.  it retains said information for clean removal, reversal, and interchangeability with other independent filesystem sections.

isn't this what a DSCM does? and with great efficiency?  why not, instead of tarball packages, make each package simply a developmental branch in a DSCM?  this would provide abilities out of the box (there are others too):

1) clean application of the software to the filesystem
2) efficient management of package "versions".  old packages are simply ancestors of the current package HEAD.  that means we get all the benefits of differential updates, and we can revert to any version, at any time
3) DSCM like operations on anything.  diff, blame, archive (actually i have tried this... if i unpack a package into a git branch, using git to generate a tarball produces an INSTALLABLE arch package... i am experimenting with this to "eat" my package cache into git repos, thus allowing me to downgrade if need be, or generate an older package whenever i want)

moving on.  when i want a package for reason x/y/z, i don't care where it comes from; i couldn't care less whether i got it from official arch servers, my closest mirror, or joe shmo with stupid fast upload speed around the block.  i just want the bits, and so long as they match the signature in the end...

wait a minute, don't we have that technology already? i think it likes to be called by its rapper name, P2P.

we all have computers.  some of us have servers.  i'd bet most of us have decent broadband connections, and i'd also bet that our collective processing power and bandwidth capability is a nice round power of 10 number, best expressed in scientific notation.  i want to tap this.  P2P draws its strength from utilizing the normally idle leaf nodes at the internet edge, and we are all at the edge.

many DSCMs store their data in chunks addressable by the cryptographic hash of their contents.  this is great for P2P: integrity and basic chunking are already present!!  P2P has to find the unique chunks, jack them into the repo, and bang, you've got the package/message/bug/whatever you were looking for.
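as a rough sketch of that content addressing: git names a blob by the SHA-1 of a short header plus the file contents, so any peer can recheck a chunk it receives before keeping it.  the function names and the accept step below are illustrative assumptions, not part of git or of any existing P2P layer.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id git assigns to a blob holding `content`."""
    # git hashes "blob <size>\0" followed by the raw bytes
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def accept_chunk(expected_id: str, content: bytes) -> bool:
    """Hypothetical peer-side check: keep a chunk only if it hashes to the id requested."""
    return git_blob_id(content) == expected_id


if __name__ == "__main__":
    chunk = b"hello\n"
    oid = git_blob_id(chunk)
    print(oid, accept_chunk(oid, chunk), accept_chunk(oid, b"garbage"))
```

because the id is derived from the content, it does not matter which peer the bytes came from; a mismatch is simply dropped.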

im trying not to ramble, but i want to generate some interest in this, as this is a real project that i intend to devote time to; distributed/parallel computing is very interesting to me.  right now there is little more than the visions of grandeur trapped in my skull, but that will change, and i have done preliminary testing on some concepts.

this kind of distributed, cryptographic sharing can be applied to the entire operation.  i want to see people creating their own binary "repos", pulling from their repos, and being able to merge packages/config.  i want self-configuring and "explosive" source code (this needs further thought/explanation, as i think it would have to be tackled at the Make level, but i would like to see a package manager that is distribution/architecture/configuration agnostic, and is able to recombobulate itself into alternative configurations, and no that isn't a word).  i also want source/rev based dependency resolution: install package X because its revision is known to work with revision Y of package Z, NOT because a human said version X works with version Y or greater.  i want history/SHA1 based dependency resolution graphing, AUTOMATIC.

i want to see all repositories disappear, and become peers instead.  every distro is using the same software from the same upstream.  we just Make/Prepare/Install differently, and we should recognize and address this.  we could all be using a P2P net where all the code lives, and we all benefit from killer download speeds.

i didnt really talk much about it, but the same concepts can be applied to our forums/bugs/wiki.  imagine a forum thread as a collection of files in a DSCM branch.  when someone "posts", they are really adding a file to the branch, committed using their GPG signature.  this lets us do some tricky stuff with threads, as they are now diffable, blamable, possess verifiable integrity, and are transferable in a P2P/distributed system.

same with bugs, i would like to see bugs follow the actual project they link to, but i need to think on this one more.

wiki, same.  lets make the wiki a shareable asset.  you can add to it at any time, or view it at anytime, without being online.

the aur will become a thing of the past as we can share our source and binary builds with each other at awesome speeds.  since we arent in the illegal type of P2P, we can have dedicated servers that can act as ultrapeers/object mirrors.

i'm all over the place, but do you guys see where i am going with all this?  i have much more to share within my head, but its just not fleshed out to any workable degree at this point, and i want to see what others envision as being possible.

hit me with some flame at least! :D help me generate ideas for how/why/if something like this could change the way a distribution exists, what kinds of new possibilities would emerge, etc. etc....


what am i but an extension of you?

Offline

#3 2010-02-14 03:34:43

bruenig
Member
Registered: 2007-05-20
Posts: 175

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

dependencies

Offline

#4 2010-02-14 05:53:55

jb
Member
From: Florida
Registered: 2006-06-22
Posts: 466

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

The term "simple" comes to mind.  Where does that fit in all this? :D

extofme wrote:

since we arent in the illegal type of P2P, we can have dedicated servers that can act as ultrapeers/object mirrors.

Illegal or not, there are quite a few networks out there that flat-out block p2p.  Blocks on http/ftp are far less frequent.


...

Offline

#5 2010-02-14 07:42:38

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

people could easily create the many "spinoff" arch distros i see in the forums and publish them, signed.

im trying to get simple by eliminating any critical/supervisory infrastructure for archlinux.org, along with the layers of indirection between the various "cold" states a software can assume and the one i want it to exist in, in cooperation with the rest of my system.  i figure source code can live very efficiently within a "hot swappable", distributed, history based filesystem, perhaps other things could fit too. enable everyone to publish things with a signature, and spread it on the <Insert Here>.  your published branch/configuration could merge with mine (index comes in handy here.  merge with index.. view, then apply to target system).  i think as it fleshes out it could get pretty... neat.

im still researching p2p methods, and i haven't even got into the Make type stuff or even know much at all about C/C++ and all the headers information and how they link together.  i'm from the outside of "how its done now" as applies to each process involved in transforming text based header/src to obj/exe/whatever.... i only know scripting/interpreted languages in detail, and will be writing in python.  ill keep reading though... i see a couple of software type units (repo/branch):

) src: "express" (checkout) itself into any of its supported architectures and any one of its build configurations
) bin: install if it matches the target system; else if available, "become" binary by P2P gathering the real binary pack from peers; else gather source and build for the target system
) src/bin + history: a super binary/source package that can assume any of its descendants
) src + bin + history: the definitive super package... can become any source or binary package since the software's inception as a package, and revert to src only before that
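the "bin" fallback chain above can be sketched as a small decision function.  this is only a guess at the logic; `Target`, `plan_install`, and the returned labels are made-up names for illustration, and a real version would have to define what "matches the target system" means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    arch: str     # e.g. "x86_64"
    config: str   # hypothetical build configuration label


def plan_install(target, local_binaries, peer_binaries, has_source):
    """Decide how to obtain a package for `target`.

    local_binaries / peer_binaries are sets of Target values for which a
    binary exists locally or somewhere on the network.
    """
    if target in local_binaries:
        return "install-local-binary"       # binary already matches the system
    if target in peer_binaries:
        return "fetch-binary-from-peers"    # "become" binary via P2P
    if has_source:
        return "build-from-source"          # last resort: build for the target
    return "unavailable"


if __name__ == "__main__":
    me = Target("x86_64", "default")
    print(plan_install(me, set(), {me}, True))
```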

i need to do a lot more research into the previous building part of it, but that stuff is all for distro agnostic progress.  we could still use the system to install arch packages in nearly the same way, similar to the many pacman wrapper scripts.

arent only default p2p ports blocked? i need to research more how p2p actually punches holes to unite peers.

initially im going to play with the other aspects like how to represent a forum as DSCM streams, and the wiki.  i really want to somehow mix bugs and src together, in a way that benefits both.  link chains of forum and wiki conversations to the software package also... everything p2p sharable and cachable for offline work/viewing.

bah im tired though so to be continued


what am i but an extension of you?

Offline

#6 2010-02-14 08:17:52

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,384
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

seriously, the combination of your three posts is tl:dr...   Can I have the bullet points?

Offline

#7 2010-02-14 08:32:00

sand_man
Member
From: Australia
Registered: 2008-06-10
Posts: 2,164

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

Allan wrote:

seriously, the combination of your three posts is tl:dr...   Can I have the bullet points?

100% agree and the thread title doesn't give any idea what you might be on about.


neutral

Offline

#8 2010-02-14 21:59:01

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

tldr?  i guess you learn a new internet acronym everyday.  title? c'mon! command "Distrib" matches the glob Arch*, and it's applied to the pseudo file thoughts... heh thats crystal clear isn't it?  what would you suggest as a better title?

ok, well im trying to make sense of my own thoughts and ideas, and just get them on "paper", so yeah, i realize my posts are probably not that cohesive.  ill try to bullet some out:

Archlinux.org DOMAIN
) users can write threads/posts/wiki using the GPG signature assigned to their username when they sign up
) binary packages are GPG signed by any user, or an "official" GPG key from the standard Arch, or users can apply for a "spinoff" signature.  that way every derivative has their own top level signature
) all of this is handled by the DSCM subsystem
) threads/posts/wiki/bugs/packages become branches in the DSCM, and are fully trackable and history focused

PACKAGE MANAGEMENT
this is only the first wave/thoughts for this part.  i have a radically different end goal for dependency resolution and how software exposes itself to the system and other softwares. but here what we could do now:
) each "current" package is the branch tip (HEAD) in the packages repository...

--+--------------+--------------+--------------+--------------+ HEAD (pkgver 1.0)
   \              \              \              \
    `TAG (v0.6)    `TAG (v0.7)    `TAG (v0.8)    `TAG (v0.9)

this lets me revert to any version at any time, easily "diff" the changes between releases, see who wrote what file and who changed what, etc. etc... anything that you could do with a DSCM
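a minimal sketch of the "diff between releases" idea, assuming each tagged release can be flattened to a mapping of path to content hash (roughly what a git tree records).  `diff_releases` and the sample paths/hashes are invented for illustration.

```python
def diff_releases(old, new):
    """Compare two release trees given as {path: content_hash} mappings."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # tags map a version label to the tree of that release (toy data)
    tags = {
        "v0.9": {"usr/bin/foo": "h1", "etc/foo.conf": "h2"},
        "v1.0": {"usr/bin/foo": "h3", "usr/share/man/foo.1": "h4"},
    }
    print(diff_releases(tags["v0.9"], tags["v1.0"]))
```

only the hashes are compared, so unchanged files cost nothing to check and never need to be transferred again.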

WIKI AS DSCM
) wiki pages are little more than the current HEAD of the topic they describe.  DSCM is perfect for this, here is an implementation of git as a wiki:

http://github.com/minad/git-wiki
DEMO:
http://git.awiki.org/Home

that project started here: http://atonie.org/2008/02/git-wiki, but that fork seems to be the most active.  anyway its an example of how it can be done

FORUM AS DSCM
) this is similar to wiki, with twists.  forum changes more rapidly.  current branch HEAD would be updated each time a new post is added or a post is edited.  when you add/edit, you use your global GPG key to sign it, just like everything else

BUGS AS DSCM
this i need to research more.  i want to link bugs to source as closely as possible... imagine viewing the bug in your web browser, clicking a link/whatever, and the packages on your system are "transformed" into the state the bug reporter had (protected environment of course, maybe LXC).  i need to look into how fossil does this, and play around with some more ideas

P2P-esque IMPLEMENTATION OF git-receive-pack AND git-upload-pack (OR SIMILAR)
) DSCM uses globally unique methods for identifying content chunks.  this means that similar packages can help "seed" other, non related packages simply because they shared common files (libraries/etc.)
) DSCM and P2P will work well with each other, they both depend on integrity and universally unique chunking
) and wouldnt it just be cool?  i dont think there is anyone else doing anything like this from my research
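to make the seeding point concrete, here is a toy planner that assigns each missing chunk to some peer that happens to hold it, regardless of which package that peer got it from.  everything here (`plan_fetch`, the peer names, the least-loaded rule) is an assumption for illustration, not an existing protocol.

```python
from collections import Counter


def plan_fetch(wanted, have, peers):
    """Assign each missing chunk id to one peer holding it.

    wanted: set of chunk ids needed; have: set already stored locally;
    peers: {peer_name: set of chunk ids that peer can serve}.
    Returns (assignments {chunk: peer}, unresolved set of chunk ids).
    """
    load = Counter()
    assignments = {}
    unresolved = set()
    for chunk in sorted(wanted - have):
        holders = sorted(p for p, chunks in peers.items() if chunk in chunks)
        if not holders:
            unresolved.add(chunk)
            continue
        # spread requests: prefer the holder with the fewest assignments so far
        peer = min(holders, key=lambda p: (load[p], p))
        assignments[chunk] = peer
        load[peer] += 1
    return assignments, unresolved


if __name__ == "__main__":
    peers = {"p1": {"b", "c"}, "p2": {"c"}}
    print(plan_fetch({"a", "b", "c", "d"}, {"a"}, peers))
```

a peer that only ever installed an unrelated package can still serve a shared library chunk, since the chunk id is the same everywhere.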

all of this is for content only.  the P2P super repo would behave like an API of sorts, and clients could jack into it (render to webpage, render to X, cache forum branches, desktop applications, offline content viewing)

eh?


what am i but an extension of you?

Offline

#9 2010-02-14 22:52:45

smakked
Member
From: Gold Coast , Australia
Registered: 2008-08-14
Posts: 420

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

Hmmmm Isnt arch meant to stay simple?


Certified Android Junkie
Arch 64

Offline

#10 2010-02-14 23:25:47

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,384
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

so...  in one line; you want the packages, wiki, forums, bug-tracker to all be encompassed in one big git repo to allow easy interaction between each?

Kind of goes against the whole one program for one job mantra.

Offline

#11 2010-02-14 23:29:16

mikesd
Member
From: Australia
Registered: 2008-02-01
Posts: 788
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

I'm all for finding new ways of using git. You have some interesting ideas. I'm not sure if using git, or another DSCM, for *every* aspect of a distro is the way to go. Good luck though if you decide to do something with this.

Offline

#12 2010-02-15 00:08:27

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

smakked wrote:

Hmmmm Isnt arch meant to stay simple?

well simplicity to me = similar code path for many similar operations, vs. the myriad of levels/programs/websites all trying to accomplish the same thing.  i mean, think of everything source code goes thru between "commit" and "install", and all the people that have to interact with it.  so far i have only really talked about 2 concepts, DSCM + P2P.  i think if you try to break yourself from the paradigm that exists now you can maybe see the destination i seek... i want rapid interaction between the people surrounding an idea... be they upstream/distributor/developer/user

Allan wrote:

so...  in one line; you want the packages, wiki, forums, bug-tracker to all be encompassed in one big git repo to allow easy interaction between each?

Kind of goes against the whole one program for one job mantra.

well its not all in a single repo by any means, any more than the entire wiki is in a single DB table.  what i'm proposing is more akin to the filesystem or database engine you decide to use for a project.  there are many uses for Ext3, and there are many uses for MySQL.  git/DSCM is little more than a stupid engine for perfectly retaining the connection between information, changesets, and people in a decentralized manner.  in the beginning, git was touted for its filesystem-like capabilities, not for its ability to be used as an SCM.  i wish to apply its unique abilities to places that have not been thoroughly explored, and i think would fit very nicely

mikesd wrote:

I'm all for finding new ways of using git. You have some interesting ideas. I'm not sure if using git, or another DSCM, for *every* aspect of a distro is the way to go. Good luck though if you decide to do something with this.

i think some parts will be more difficult than others, but i just dont see anything that couldnt benefit from being decentralized.  thanks for encouragement, i do expect to prototype some stuff out in the coming months; im mainly trying to invoke the thought processes in others, get ideas flowing, and provoke some discussion on what we as avid linux users, think could be better/possible


what am i but an extension of you?

Offline

#13 2010-02-18 23:01:44

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

the forums in this sense seem pretty cool:

thread=forum thread
tree=a tree object in git

) every thread is a branch in refs/heads
) each branch is a tree of text files (posts) and media (images/vid/whatever allowed)
) each advancement/commit to branch is someone adding/editing the thread
) each branch/thread has the same origin (a source/empty commit of some sort)

a working tree isn't necessary for many operations thanks to the index file, and i think this could be fast.  you could do "logical multiposts"...  edit multiple threads at one time, i.e. an announcement of some kind in multiple categories to link threads.  you could merge threads together, and have some threads that could only be committed to by admins/etc.
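a toy model of the thread-as-branch layout, just to pin the idea down.  it is not git's object format: commits are plain dicts hashed via JSON, and `Forum`, `post`, and `history` are hypothetical names.  every thread starts from the same empty root commit, and each post or edit advances that thread's ref.

```python
import hashlib
import json


def make_commit(parent, author, files):
    """Build a toy commit and derive its id from its contents."""
    body = {"parent": parent, "author": author, "files": files}
    raw = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha1(raw).hexdigest(), body


class Forum:
    def __init__(self):
        self.objects = {}
        self.root, root_body = make_commit(None, "root", {})
        self.objects[self.root] = root_body
        self.refs = {}  # thread name -> tip commit id, like refs/heads/*

    def new_thread(self, name):
        self.refs[name] = self.root  # all threads share one origin

    def post(self, thread, author, filename, text):
        """Add or edit a post file and advance the thread's branch tip."""
        tip = self.refs[thread]
        files = dict(self.objects[tip]["files"])
        files[filename] = text
        oid, body = make_commit(tip, author, files)
        self.objects[oid] = body
        self.refs[thread] = oid
        return oid

    def history(self, thread):
        """Commit ids from the tip back to the shared root."""
        out, oid = [], self.refs[thread]
        while oid is not None:
            out.append(oid)
            oid = self.objects[oid]["parent"]
        return out
```

an edit is just another `post` call with an existing filename, so the old text stays reachable through the parent commit.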

stickies/releases/etc could be tags

imagine this hooked up to browser/libwebkit directly and viewed as nice html5?  if you weren't online, you would only see actual content (threads/media) in the repo and no avatars/css/anything not cached.  when you came online, that stuff could be ajax loaded "around" the content that is the repository.

ill need to play around with pack-sharing/SHA1 resolution between peers; you shouldn't have to need the entire forum when you start, and we should be able to have caching schemes for non-viewed threads.


what am i but an extension of you?

Offline

#14 2010-02-19 01:19:04

keenerd
Package Maintainer (PM)
Registered: 2007-02-22
Posts: 647
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

I'd really like to hear how you plan to scale to the current user base.

For a few weeks I've been working on a very similar but much less ambitious project, putting the AUR into Git.  That step is actually really simple.  The whole AUR (comments and everything) is much smaller than you probably would guess*.  Very little challenge in building the repo.

The impossible part is the users and permissions.  There are too many users.  Two thousand package submitters, 6 thousand people who have only commented and another 12 thousand who are silent (voting and notifications).  That is a lot of people, and public key systems fall flat down.  Don't even get me started on permissions for package ownership.  The author of Gitolite has been tremendously helpful sorting that out.  I am slowly writing a customized server to try to get around these problems, but this is way outside of my crank-out-an-app-now comfort zone.

* Email me if you'd like a dump of the AUR.

Offline

#15 2010-02-19 03:21:58

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

@keenerd, well thats nice to be confirmed; i expected the AUR and friends probably will not have a tremendous amount of actual data, and most people will be willing to have the entire thing.

im not sure about the users thing, i haven't got into it very deep yet.  i expect that there would be a dedicated repo for housing the auth lists/branches.  when a new user is added/changes group, a special "official auth user" will update the branch/tree (not sure what's faster, packed refs might allow each user to have a dedicated tag/ref object), and sign the commit with their signature.  objects need only be verified on xfer; if someone commits garbage, it will be rejected by peers.

peers probably dont need the full auth list, but it wouldnt be very big and i think it would simplify things.  each peer could authenticate any commit (might be worth signing each commit).  in git.git the maintainer stores his GPG sig in a tagged blob, something like that would be nice if packed-refs works nicely for 20,000 refs, and it would be fast to search/lookup... will need more research
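for the packed-refs idea, a small sketch of reading a packed-refs file into a lookup table.  the file layout (header line starting with "#", "<id> <refname>" entries, "^<id>" peeled lines after annotated tags) is git's; the `refs/tags/users/<name>` naming and `user_key_ref` are assumptions made up here, and this says nothing about how git itself performs with 20,000 refs.

```python
def parse_packed_refs(text):
    """Parse packed-refs file contents into {refname: object id}."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # '#' marks the header; '^' lines give the peeled target of the previous tag
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        oid, _, name = line.partition(" ")
        refs[name] = oid
    return refs


def user_key_ref(refs, username):
    """Look up the object holding a user's key under a hypothetical per-user tag."""
    return refs.get("refs/tags/users/" + username)


if __name__ == "__main__":
    sample = (
        "# pack-refs with: peeled fully-peeled sorted\n"
        + "a" * 40 + " refs/tags/users/alice\n"
        + "^" + "b" * 40 + "\n"
    )
    print(user_key_ref(parse_packed_refs(sample), "alice"))
```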

my current project is a precursor to this one; i am working with multiprocess/threading, ssh channels + protocol buffers... once i get that working reasonably well, i will apply it to this project.

Last edited by extofme (2010-02-19 16:37:14)


what am i but an extension of you?

Offline

#16 2010-02-19 21:26:08

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

forums could be closely linked to the state of my machine (via a "dpackage manager") and the ability to broadcast to peers.  i could make posts like this:

http://bbs.archlinux.org/viewtopic.php? … 26#p712526

much more integrated.  the CODE block could be expanded via an unlimited number of plugins like CODE=(patch|package|branch|bug).  you could expose other objects like RSS feeds or media.  desktop clients could be further integrated; i could "project" my system onto the forums, and embed various operations into the (multi)post.  someone viewing the post could simply click a link to test a patch, or update their package tree and try a package variant via an available binary or building/creating/sharing one.  we could test packages and make changes p2p, then directly integrate it back into the official trees when its ready.

upstream, or anything "Out Of Band", can be integrated with feeds or automated posting.  automated posting to upstream bug systems with clear information, after being "developed" in our forum/bug tracker via DSCM subsystem.

imagine github, but its running on your localhost and linking you to others' machines.  think of everything forum/* as being a work in progress development tree.

MORE DPACKAGE MANAGER IDEAS

i think the way of versions has served well.  alas, a nice first step transition is maybe like this:

PACK[1]|----[v.01]----------[v.02]-------[v.03]--|HEAD

PACK[2]|---------[v.01]--[v.02]----------[v.03]--|HEAD

PACK[3]|-----[v.01]----[v.02]--[v.03]------------|HEAD

PACK[4]|---[v.01]---[v.02]----------------[v.03]-|HEAD

as package/source development progresses and an API/breaking change is needed, the parent package can record the SHA for each of its affected dependencies.  if the dependency breaks, record the final commit in the dependency package that is acceptable for use (blocks/stalls upgrade if none available for end-user).  if the dependency does not break, record the last known commit that can support this change (this is a dependency marker; earlier revisions of this dep package would not be compatible with the now current revision in the parent package.  this could be a SHA in the dep's past...).  we can use this information to autopull deps from the p2p web, or rebuild source packages.
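one way to read those markers, as a sketch: treat the dependency's history as an ordered list of commit ids (oldest first), with an optional earliest-compatible marker and an optional final-acceptable marker recorded by the parent.  the names `acceptable_commits` and `pick_dependency`, and the linear-history simplification, are assumptions for illustration.

```python
def acceptable_commits(history, min_commit=None, max_commit=None):
    """Return the part of `history` (oldest first) allowed by the markers, inclusive."""
    start = history.index(min_commit) if min_commit is not None else 0
    end = history.index(max_commit) + 1 if max_commit is not None else len(history)
    return history[start:end]


def pick_dependency(history, available, min_commit=None, max_commit=None):
    """Newest acceptable commit that is actually obtainable, or None (upgrade stalls)."""
    for oid in reversed(acceptable_commits(history, min_commit, max_commit)):
        if oid in available:
            return oid
    return None


if __name__ == "__main__":
    dep_history = ["c1", "c2", "c3", "c4", "c5"]
    # parent needs at least c2, and c5 is known to break it
    print(pick_dependency(dep_history, {"c1", "c3", "c5"}, "c2", "c4"))
```

a `None` result corresponds to the "blocks/stalls upgrade" case: nothing in the acceptable range can be pulled or rebuilt yet.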

upstart could help out here; i'd like to see some kind of (dbus) interface so the package manager could be wired to any distribution.  upstart could intelligently handle system state and respond to events triggered by the package manager.


what am i but an extension of you?

Offline

#17 2010-02-19 23:20:10

bruenig
Member
Registered: 2007-05-20
Posts: 175

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

To be honest, this looks like a brand new distro. I don't know why you don't fork arch and go that route. I would be interested in playing around with it, but to suggest arch is going to change in such a fundamental way is fairly unlikely.

Offline

#18 2010-02-20 09:28:51

Stythys
Member
From: SF Bay Area
Registered: 2008-05-18
Posts: 878
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

agreed. the arch devs are never going to bother with any of this :P make your own and see if it catches on.


[home page] -- [code / configs]

"Once you go Arch, you must remain there for life or else Allan will track you down and break you."
-- Bregol

Offline

#19 2010-02-20 09:34:27

jwwolf
Member
Registered: 2009-06-29
Posts: 74

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

Wat?

Offline

#20 2010-02-20 23:47:29

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

meh...

a distro framework.

Last edited by extofme (2010-02-21 01:08:47)


what am i but an extension of you?

Offline

#21 2010-09-10 12:15:58

Dieter@be
Forum Fellow
From: Belgium
Registered: 2006-11-05
Posts: 2,001
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

Interesting ideas, I think i like them.
but trying to store too many big files (ie package files) inside a VCS seems like a bad idea.  space requirements will be much bigger than what we have now, unless you make the VCS "forget" about older versions or something...


< Daenyth> and he works prolifically
4 8 15 16 23 42

Offline

#22 2010-09-10 12:44:28

Anntoin
Member
Registered: 2009-08-10
Posts: 42
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

Some interesting ideas. But you need to start small first and make a proof of concept before you try and tackle everything. A detailed plan of how packages are handled and a basic implementation would be a start for example (still not a small job though), then testing the behaviours that you are interested with that framework. You have an idea where you want to go with this but you will need to focus on a few core features and show their benefit before anyone will consider this.

Offline

#23 2010-09-10 13:32:03

drcouzelis
Member
From: Connecticut, USA
Registered: 2009-11-09
Posts: 4,092
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

For anyone as confused as me, "DSCM" appears to mean "distributed source code management", or, as defined on the Wikipedia, distributed revision control.

Offline

#24 2010-09-10 16:03:53

stefanwilkens
Member
From: Enschede, the Netherlands
Registered: 2008-12-10
Posts: 624

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

Dieter@be wrote:

Interesting ideas, I think i like them.
but trying to store too many big files (ie package files) inside a VCS seems like a bad idea.  space requirements will be much bigger than what we have now, unless you make the VCS "forget" about older versions or something...

mostly this.

the base of what you're proposing is a tremendous amount of data that would never be touched after a new version is released.  how do your suggestions fit the rolling release model?  Especially relatively large packages updated with high frequency (nvidia binary drivers, for instance) could cause the space requirement to increase rapidly unless moderated.


Arch i686 on Phenom X4 | GTX760

Offline

#25 2010-09-11 01:00:37

extofme
Member
From: here + now
Registered: 2009-10-10
Posts: 174
Website

Re: Distrib -e "Arch(org|code|pkgs|aur|forum|wiki|bugs|.*)?" -- thoughts

Anntoin wrote:

Some interesting ideas. But you need to start small first and make a proof of concept before you try and tackle everything. A detailed plan of how packages are handled and a basic implementation would be a start for example (still not a small job though), then testing the behaviours that you are interested with that framework. You have an idea where you want to go with this but you will need to focus on a few core features and show their benefit before anyone will consider this.

ah yes of course.  the first step is getting a distributed index, and a "package" format (packages become fuzzy items, below/above); this will be realized in the form of:

"AUR3 [aur-pyjs] implementation in python (pyjs) + JSON-RPC"
https://bbs.archlinux.org/viewtopic.php?pid=823972

i have a package in the AUR for that [aur-pyjs], but it's old as i haven't been able to update in awhile, and won't be until i secure a development job in my new city (next week hopefully).  aur-pyjs will be built on top of the concepts i have outlined in this thread, and will in time become prototype.  check it out; pretty neat even though it can't do much yet :-).  soon though, i will update the package, and it will then be able to run as a native python desktop app (pyjamas allows the same code to run as a website or a desktop app).  at that point, it will be trivial to implement connectivity to the old AUR/repos, and we will in effect have a pacman+aur replacement.  from there i will tackle bugs+forum, of which there are already several implementations on top of DSCM sub-systems to research and learn from.

stefanwilkens wrote:
Dieter@be wrote:

Interesting ideas, I think i like them.
but trying to store too many big files (i.e. package files) inside a VCS seems like a bad idea.  space requirements will be much bigger than what we have now, unless you make the VCS "forget" about older versions or something...

mostly this.

the base of what you're proposing is a tremendous amount of data that would never be touched after a new version is released; how do your suggestions fit the rolling release model? Especially relatively large packages updated with high frequency (nvidia binary drivers, for instance) could cause the space requirement to increase rapidly unless moderated.

packages are not stored in the DSCM, their contents are.  the package itself is simply a top-level tree object in git, linking to all other trees and blobs comprising the package state, plus a reference (ref) pointing at said tree.  this means everything and anything that is common between _any_ package and _any_ version will be reused; if ten unrelated packages reference the same file, only one copy (one blob) will ever exist.  however, some packages may indeed create gigantic, singular blob type objects that always change, and this will be addressed (next...).
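
a minimal sketch of that dedup, using only python's hashlib and git's object naming (SHA-1 over a "kind size\0" header plus the body).  the `store` dict, `add_package` helper and the two packages are made up for illustration; trees here are flat and hold regular files only.

```python
import hashlib


def git_object_id(kind: bytes, body: bytes) -> str:
    """Return the SHA-1 id git assigns to an object of the given kind."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(data: bytes) -> str:
    return git_object_id(b"blob", data)


def tree_id(entries: dict) -> str:
    """Hash a flat tree mapping file name -> blob id (regular files only)."""
    body = b"".join(
        b"100644 " + name.encode() + b"\0" + bytes.fromhex(oid)
        for name, oid in sorted(entries.items())
    )
    return git_object_id(b"tree", body)


store = {}  # object id -> content; stands in for the shared object cache


def add_package(files: dict) -> str:
    """Store every file as a blob and return the id of the package's tree."""
    entries = {}
    for name, data in files.items():
        oid = blob_id(data)
        store[oid] = data  # identical content lands on the same key
        entries[name] = oid
    return tree_id(entries)


# two hypothetical, unrelated packages shipping the same LICENSE file
license_text = b"hello world\n"
pkg_a = add_package({"LICENSE": license_text, "a.conf": b"alpha\n"})
pkg_b = add_package({"LICENSE": license_text, "b.conf": b"beta\n"})
```

four files went in, but `store` holds three blobs: the shared LICENSE exists once, and both package trees point at it by id.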

git compresses the individual objects itself, with zlib (deflate, as in gz); this could be changed to use the xz format, or anything else.  it also generates pack files full of delta-compressed objects. it would not always be necessary to have the full history of a package (if you look somewhere above, i briefly touch on this point with various "kinds" of packages, some capable of source rebuild, some capable of becoming any past source/binary version, some a single version/binary only, etc.).  you would not have to retain all versions of "packages" if you did not want to, but you could retrieve them at any time so long as their components existed somewhere on the network.  servers could be set up to provide all packs, all versions, effectively and automatically performing the intended duty of the "arch rollback machine".  the exact mechanism is not defined yet, but it will likely involve some sort of SHA routing protocol, to resolve missing chunks.
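
to make the compression point concrete, here is a small stdlib-only sketch: the same loose object packed once with zlib (what git does today) and once with xz via python's lzma module.  the sample content is invented, and a stock git client would not read the xz variant; this only shows that the compression layer is swappable.

```python
import lzma
import zlib


def loose_object(kind: bytes, body: bytes) -> bytes:
    """Build the uncompressed bytes of a git loose object: header + body."""
    return kind + b" " + str(len(body)).encode() + b"\0" + body


# hypothetical, highly repetitive file content
raw = loose_object(b"blob", b"pkgname=example\n" * 200)

as_zlib = zlib.compress(raw)                       # git's current on-disk form
as_xz = lzma.compress(raw, format=lzma.FORMAT_XZ)  # the alternative mentioned above

# both round-trip to the same bytes
assert zlib.decompress(as_zlib) == raw
assert lzma.decompress(as_xz) == raw

print(len(raw), len(as_zlib), len(as_xz))
```

the object id is computed over the uncompressed bytes, so changing the compression would not change any ids or links between objects.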

git's data model is stupid simple; structures can be created to represent a package, its history, its bugs/status, and its information (wiki/etc.), in an independent way so they do not depend on each other, but still relate to each other, and possess knowledge of how to "complete" and find each other.  it will not be structured in the typical way git is used now.  unfortunately this is very low level git stuff, and difficult to explain properly, so i won't go there; just know that ultimately the system will only pull the objects you need to fulfill the directive you gave it, and there will be rules to control your object cache.  your object cache can then be used to fulfill the requests of others; ie. P2P.
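
a toy sketch of "pull only what you need, then serve it to others".  nothing here is the real protocol (that isn't defined yet); the `Node` class, the peer list and the plain SHA-1 ids are assumptions for illustration, and the cache rules (eviction, limits) are left out.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Content address of an object (plain SHA-1 here, for brevity)."""
    return hashlib.sha1(data).hexdigest()


class Node:
    """A peer with a local object cache and a list of peers to ask."""

    def __init__(self, name, peers=()):
        self.name = name
        self.cache = {}
        self.peers = list(peers)

    def put(self, data: bytes) -> str:
        oid = object_id(data)
        self.cache[oid] = data
        return oid

    def get(self, oid: str, _seen=None):
        """Return the object, asking peers only when it is missing locally."""
        seen = _seen if _seen is not None else set()
        seen.add(self.name)
        if oid in self.cache:
            return self.cache[oid]
        for peer in self.peers:
            if peer.name in seen:
                continue  # do not ask the same node twice
            data = peer.get(oid, seen)
            # verify before trusting: the id is the hash of the content
            if data is not None and object_id(data) == oid:
                self.cache[oid] = data  # now we can serve it to others too
                return data
        return None


archive = Node("archive")             # long-term record keeping peer
mirror = Node("mirror", [archive])
laptop = Node("laptop", [mirror])

wanted = archive.put(b"contents of some old package file\n")
fetched = laptop.get(wanted)
```

after the call, both `laptop` and `mirror` hold the object, so the next request for it never has to reach `archive`.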

since git itself is in a rather poor state when it comes to bindings, i will be using the pure python git library, dulwich, instead.  while in time this could be changed to use proper bindings, or some bits written as C modules, it's possible pypy will make all that unnecessary.  i don't need anything git core offers except its data structures and concepts; although, i intend to make the entire system (adding bugs/updating packages/editing wiki/editing forum/etc.) _completely_ 100% accessible from a basic git client.  for example, you could write a post in the forum by "committing" to a special branch; you could search the entire wiki, and its history from the terminal while installing; you could add a bug, and link a patch to it, directly usable and buildable by others for testing; this could all be done offline, and pushed once a connection was available... this will lead to all sorts of interesting paths...
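
as a rough idea of what "posting by committing to a special branch" could look like at the object level, here is a stdlib-only sketch that builds git-format blob/tree/commit objects in memory.  in practice dulwich would do this part; the branch name, the authors, the `post.txt` file name and the one-commit-per-post layout are all assumptions.

```python
import hashlib

objects = {}  # in-memory stand-in for a git object store: id -> (kind, body)
refs = {}     # ref name -> id of the newest commit on it


def store(kind: bytes, body: bytes) -> str:
    """Store an object under its git-style SHA-1 id and return the id."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    oid = hashlib.sha1(header + body).hexdigest()
    objects[oid] = (kind, body)
    return oid


def post(ref: str, author: str, text: str, when: int) -> str:
    """Record one forum post as a commit on `ref`; its parent is the previous post."""
    blob = store(b"blob", text.encode())
    tree = store(b"tree", b"100644 post.txt\0" + bytes.fromhex(blob))
    lines = ["tree " + tree]
    if ref in refs:
        lines.append("parent " + refs[ref])
    stamp = f"{author} {when} +0000"
    subject = text.splitlines()[0]
    lines += ["author " + stamp, "committer " + stamp, "", subject]
    refs[ref] = store(b"commit", ("\n".join(lines) + "\n").encode())
    return refs[ref]


def thread(ref: str):
    """Yield post subjects from newest to oldest by walking parent links."""
    oid = refs.get(ref)
    while oid:
        headers, _, message = objects[oid][1].decode().partition("\n\n")
        yield message.strip()
        parents = [l.split()[1] for l in headers.splitlines() if l.startswith("parent ")]
        oid = parents[0] if parents else None


ref = "refs/heads/forum/distrib-thoughts"  # hypothetical branch name
first = post(ref, "alice <alice@example.invalid>", "start small first", 1284166837)
second = post(ref, "bob <bob@example.invalid>", "first step: a distributed index", 1284166900)
```

reading the thread is then just a walk down the commit chain (`list(thread(ref))`), and since posts are ordinary commits they can be written offline and pushed later.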

in one super run-on sentence:

i intend to construct a "social", 100% distributed distribution platform, where everyone is a [potentially] contributing node and has [nearly] full access to all information, each node's contributions are cryptographically verifiable, each node may easily participate in testing/discussion or lend computing resources, each node may republish variations of any object or collection under their own signature, each node may "track" or "follow" any number of signatures with configurable aggressiveness (no such thing as "official repos"; your personal "repo" is the unique overlay of other nodes you trust, and by proxy some nodes they trust; "official repos" degrade into an Arch signature, a Debian signature, Fedora, etc.), and finally, do all of this in a way that is agnostic to the customized distribution (or other package managers) above it, or its goals, and eventually spread to other distros, thus creating a monstrous pool of shared bandwidth, space, ideas, and workload, whilst at the same time converging user/developer/tester/vendor/packager/contributor/etc. toward: person.

piece of cake :-)
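
one way the "overlay of nodes you trust, and by proxy some nodes they trust" could be computed, as a sketch only.  it assumes "aggressiveness" means a hop limit on the follow graph and that the nearest trusted signer wins; both are guesses, and the signer names and versions are invented.

```python
from collections import deque


def trust_overlay(follows: dict, me: str, max_hops: int) -> dict:
    """Breadth-first walk of the follow graph; returns {signer: hops from me}."""
    hops = {me: 0}
    queue = deque([me])
    while queue:
        signer = queue.popleft()
        if hops[signer] == max_hops:
            continue  # do not trust anyone beyond the hop limit
        for other in follows.get(signer, ()):
            if other not in hops:
                hops[other] = hops[signer] + 1
                queue.append(other)
    del hops[me]
    return hops


def resolve(packages: dict, overlay: dict) -> dict:
    """For each package, pick the offer signed by the nearest trusted signer."""
    chosen = {}
    for name, offers in packages.items():
        trusted = [(overlay[s], s, v) for s, v in offers if s in overlay]
        if trusted:
            _, signer, version = min(trusted)
            chosen[name] = (signer, version)
    return chosen


# hypothetical follow graph and package offers
follows = {"me": {"arch"}, "arch": {"debian"}, "debian": {"fedora"}}
overlay = trust_overlay(follows, "me", max_hops=2)

packages = {
    "vim": [("debian", "8.0"), ("arch", "8.1")],
    "foo": [("fedora", "1.0")],
}
repo = resolve(packages, overlay)
```

with a limit of two hops, "fedora" falls outside the overlay, so "foo" is simply not part of this node's personal repo.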

C Anthony

Last edited by extofme (2010-09-11 05:23:46)


what am i but an extension of you?
