
#1 2010-09-24 07:18:06

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

pkgstats round two: take your vote and help improving Arch

Two years ago, we introduced pkgstats.
This time, after a major revamp, we are re-introducing it to you!

Contributing is as easy as installing the package - a weekly cron job
will take care of the rest. You will be sending us a list of packages
installed on your system, along with the architecture and mirror you
use. This information is anonymous and cannot be used to identify you,
but it will help us prioritize our efforts and make Arch even better.
So, go ahead and spread the word!

For more details see pkgstats -h or just read the simple source code.
You can view the collected data at the Statistics page.

Note: If you set up a cron job for pkgstats in the past, please remove it, and don't create a new one.

Offline

#2 2010-09-24 07:19:12

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

Some more details:

Data that are sent by pkgstats:
* list of installed packages without version numbers
* the architecture
* the mirror used (without any username/password scheme)
* the version of pkgstats in use
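
As a rough illustration of the list above, the data could be assembled along these lines. This is only a sketch, not the actual pkgstats script: the function names (strip_credentials, names_only, build_payload) and the key names in the output are made up for this example, and the package list is passed in as an argument so the snippet does not depend on pacman being installed.

```shell
#!/bin/bash
# Hypothetical sketch; not the real pkgstats client.

# Remove a "user:password@" (or "user@") part from a mirror URL.
strip_credentials() {
    printf '%s\n' "$1" | sed -E 's#^([a-zA-Z][a-zA-Z0-9+.-]*://)[^/@]*@#\1#'
}

# Keep only the first field of each "name version" line, i.e. the package name.
names_only() {
    awk '{ print $1 }'
}

# Print the four pieces of data as key=value lines (key names are assumptions).
build_payload() {
    local packages="$1" arch="$2" mirror="$3" version="$4"
    printf 'packages=%s\n' "$(printf '%s\n' "$packages" | names_only | paste -sd, -)"
    printf 'arch=%s\n' "$arch"
    printf 'mirror=%s\n' "$(strip_credentials "$mirror")"
    printf 'pkgstatsver=%s\n' "$version"
}
```

On a real system one might call it as build_payload "$(pacman -Q)" "$(uname -m)" "$mirror" "$version", where the last two variables would have to be filled in by the caller.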

Data that are saved on the server:
* sha1 hash of the IP
* number of packages submitted
* package occurrence is counted but not connected to the IP hash
* time of submission
* Country of the sender IP (determined by geoIP)

Submissions are limited to 10 per IP per day. (Note: we only save the
hash of the IP, though.)
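
To make the limit concrete, here is one way such a check could look. It is a guess at the logic for illustration only, not the server's code: the flat log file, the function names hash_ip and accept_submission, and the "day hash" line format are all assumptions.

```shell
#!/bin/bash
# Hypothetical sketch of "at most 10 submissions per IP per day".
# Only the SHA-1 hash of the IP is written to the log, never the IP itself.

hash_ip() {
    printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

# accept_submission LOGFILE IP DAY
# Returns 0 and records the submission if the IP is under the limit, else 1.
accept_submission() {
    log="$1"
    ip_hash="$(hash_ip "$2")"
    day="$3"
    limit=10
    # Count earlier submissions from this hash on this day (0 if the log is missing).
    count="$(grep -c -x -F "$day $ip_hash" "$log" 2>/dev/null || true)"
    [ "${count:-0}" -lt "$limit" ] || return 1
    printf '%s %s\n' "$day" "$ip_hash" >> "$log"
}
```

A caller would pass the current date, for example accept_submission submissions.log "$ip" "$(date +%F)", and reject the request when the function returns non-zero.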

For completeness here is the source code:
https://github.com/archlinux-de/pkgstats-cli
https://github.com/archlinux-de/pkgstats.archlinux.de

Offline

#3 2010-09-24 08:26:52

Finkregh
Member
From: Germany/Hannover
Registered: 2007-12-04
Posts: 44

Re: pkgstats round two: take your vote and help improving Arch

Nice, but you should consider adding a random sleep to the cron job, as your webserver might get hammered when a whole timezone of installations comes to your door ;)

Offline

#4 2010-09-24 12:54:19

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

I'll see how many submissions we get at the same time. If needed, I could add something like "sleep $(($RANDOM % 120))" to the cron job, where 120 is a number we need to figure out first.
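
For illustration, the random delay could be wrapped in a small helper like the one below. It is a sketch only: random_delay is a made-up name, and 120 seconds is just the placeholder from the post. Since $RANDOM is a bash feature rather than part of POSIX sh, the sketch falls back to reading /dev/urandom when the variable is not set.

```shell
#!/bin/bash
# Hypothetical helper: print a delay in seconds between 0 and max-1.
random_delay() {
    max="$1"
    # $RANDOM exists in bash; otherwise read two random bytes as an unsigned number.
    rand="${RANDOM:-$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')}"
    echo $(( rand % max ))
}

# A cron script could then start with something like:
#   sleep "$(random_delay 120)"
# before running pkgstats.
```

The modulo spreads submissions over the chosen window; how wide that window has to be depends on how many clients report at the same time.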

Offline

#5 2010-09-24 17:12:05

Leonid.I
Member
From: Aethyr
Registered: 2009-03-22
Posts: 999

Re: pkgstats round two: take your vote and help improving Arch

I agree, gathering statistics is a good idea. However, I don't quite understand the necessity of a weekly cron job. Say, on my workstation I update only when a new kernel comes out. Aren't I going to send you the same package list multiple times? Or do you have measures against it? And even if I update every day, it is not likely that my package list will change (as far as I remember, pkgstats uses pacman -Qq, right?)...


Arch Linux is more than just GNU/Linux -- it's an adventure
pkill -9 systemd

Offline

#6 2010-09-24 19:10:52

bcat
Member
From: New York, NY, USA
Registered: 2009-01-02
Posts: 30
Website

Re: pkgstats round two: take your vote and help improving Arch

Leonid.I wrote:

I agree, gathering statistics is a good idea. However, I don't quite understand the necessity of a weekly cron job. Say, on my workstation I update only when a new kernel comes out. Aren't I going to send you the same package list multiple times? Or do you have measures against it? And even if I update every day, it is not likely that my package list will change (as far as I remember, pkgstats uses pacman -Qq, right?)...

I assume the IP hash will help to prevent this. My guess is that if the hashed IP address of a submission matches a record already in the database, it will update that record instead of creating a new one. I'm curious to know how it will cope with dynamic IP addresses, though.

Last edited by bcat (2010-09-24 19:11:27)


Running Arch on a Dell Studio 1735. xmonad FTW! Dotfiles here.

Offline

#7 2010-09-24 20:18:16

Leonid.I
Member
From: Aethyr
Registered: 2009-03-22
Posts: 999

Re: pkgstats round two: take your vote and help improving Arch

Right, plus IPs behind proxies. However, I remember there was a related discussion on the arch-dev-public mailing list (I forget what the outcome was).

My concern was more related to the server load. Isn't it more efficient to store the list of installed packages locally, and submit only deltas when they appear (i.e. when a package is added or removed)?
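
To show what "submitting only deltas" could mean in practice, here is a small sketch. It is not something pkgstats does; package_delta is a made-up function, and both inputs are assumed to be plain files with one package name per line.

```shell
#!/bin/bash
# Hypothetical sketch: compare the previously submitted package list with
# the current one and print only the changes.
package_delta() {
    old_sorted="$(mktemp)"
    new_sorted="$(mktemp)"
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    # comm -13 keeps lines found only in the new list (added packages).
    comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+/'
    # comm -23 keeps lines found only in the old list (removed packages).
    comm -23 "$old_sorted" "$new_sorted" | sed 's/^/-/'
    rm -f "$old_sorted" "$new_sorted"
}
```

An empty result means nothing changed since the last run, so nothing would need to be sent. Note that the server could only apply such deltas if it kept a stored state for each client.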


Arch Linux is more than just GNU/Linux -- it's an adventure
pkill -9 systemd

Offline

#8 2010-09-24 20:38:07

GCN
Member
From: France
Registered: 2005-06-29
Posts: 14
Website

Re: pkgstats round two: take your vote and help improving Arch

Maybe I'll write something stupid, but... Instead of the IP hash, why not use a UUID (generated by uuidgen or similar) and store it locally on the user's machine?

The only drawback I can see with this method is when the user does a full reinstall of their box, but I think it is better than identifying a user by their IP address...
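
A locally stored ID of this kind could be created roughly as follows. This is purely a sketch of the suggestion, not part of pkgstats: get_or_create_id is a made-up name and the ID file path is left to the caller.

```shell
#!/bin/bash
# Hypothetical sketch: create a random UUID once and reuse it afterwards.
get_or_create_id() {
    id_file="$1"
    if [ ! -s "$id_file" ]; then
        if [ -r /proc/sys/kernel/random/uuid ]; then
            # The Linux kernel hands out a fresh random UUID on every read.
            cat /proc/sys/kernel/random/uuid > "$id_file"
        else
            # Fall back to the uuidgen tool where /proc is not available.
            uuidgen > "$id_file"
        fi
    fi
    cat "$id_file"
}
```

The first call writes the ID file and every later call returns the same value. The ID stays entirely under the client's control, so the server has no way to check that it is genuine.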

Last edited by GCN (2010-09-24 20:38:39)

Offline

#9 2010-09-24 20:39:21

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

There is simply no sane way to prevent users from spamming us (while protecting your privacy). Of course the first set of data won't be usable. But over time, hopefully only fair users will be left.

With pkgstats 2 I am trying to get some time-sensitive data to measure trends. That's why the weekly cron job was added: we need data on a regular basis.

Offline

#10 2010-09-24 22:15:05

senjin
Member
Registered: 2006-09-15
Posts: 181
Website

Re: pkgstats round two: take your vote and help improving Arch

Good work, thank you!

I think it would be polite to inform the user during the installation about the new cron job.

Offline

#11 2010-09-24 22:36:13

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

senjin wrote:

Good work, thank you!

I think it would be polite to inform the user during the installation about the new cron job.

Good point. Will add that one.

Offline

#12 2010-09-25 03:53:42

ppawel
Member
Registered: 2010-09-25
Posts: 2

Re: pkgstats round two: take your vote and help improving Arch

1470 of 1471 people use grep - this one person must really dislike it :-)

Offline

#13 2010-09-25 05:29:12

kgas
Member
From: Qatar
Registered: 2008-11-08
Posts: 718

Re: pkgstats round two: take your vote and help improving Arch

On all my PCs (notebook/netbook), I have almost the same setup and applications. Will submitting from each of them be helpful for the statistics?

Offline

#14 2010-09-25 06:09:55

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

Just install pkgstats on all your computers and then forget about it. :-)

Offline

#15 2010-09-25 06:43:33

sicpsnake
Member
From: Austin, TX.
Registered: 2010-02-25
Posts: 128
Website

Re: pkgstats round two: take your vote and help improving Arch

Pierre wrote:

Just install pkgstats on all your computers and then forget about it. :-)

Done. :)

Offline

#16 2010-09-25 08:45:48

solstice
Member
Registered: 2006-10-27
Posts: 235
Website

Re: pkgstats round two: take your vote and help improving Arch

Why is this hosted on archlinux.de rather than archlinux.org?
Is this officially supported? I guess so, given the message on the arch-announce ML.
So why not migrate this to archlinux.org?

Offline

#17 2010-09-25 08:56:33

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,365
Website

Re: pkgstats round two: take your vote and help improving Arch

Mainly because the Arch developer who implemented it is also in charge of archlinux.de. It is official.

Offline

#18 2010-09-25 09:45:31

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

It's on archlinux.de for historical reasons. But I'll definitely move it to .org. I'll send a mail to arch-dev-public about that soon.

Offline

#19 2010-09-25 09:52:07

Phlogiston
Member
Registered: 2009-02-14
Posts: 39

Re: pkgstats round two: take your vote and help improving Arch

Does this also collect installed packages from AUR? (pacman -Qm) Or is AUR just handled by the votes?

Offline

#20 2010-09-25 09:53:13

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

It does not know about the AUR or repos. pkgstats just sends a list of all installed packages.

Offline

#21 2010-09-25 10:20:14

oldherl
Member
Registered: 2008-08-31
Posts: 23

Re: pkgstats round two: take your vote and help improving Arch

Many people have non-unique IPs, for example with ADSL, Wi-Fi with DHCP, or behind routers.
So some other means of identifying the system may be needed.

For example, generate a unique ID on the filesystem?

Offline

#22 2010-09-25 10:33:53

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

And how do I make sure that you don't send me a made-up UUID? This has been discussed to death. Just quoting myself: https://bbs.archlinux.org/viewtopic.php … 40#p830740

Offline

#23 2010-09-25 12:04:32

jarda-wien
Member
Registered: 2008-03-13
Posts: 104

Re: pkgstats round two: take your vote and help improving Arch

Installed :-)

Now to the point of IPs. I do not have a static IP with my ISP here, but somehow I always get the same one. I used to live in Vienna and had ADSL there; every time I connected I would get a new IP. I know there are lots of people with ADSL around... I also have a laptop which I use at school, at home, at my grandma's, my parents', friends' and wherever else. I imagine I would submit the package list from a great many IPs. Maybe this can be fixed with IPv6 sometime in the future.

I am not sure if a partition's UUID is unique even among thousands of hard drives, or if it can be dangerous to send a UUID out to the world, but what about using the root partition's UUID to identify an install uniquely? I am sure most users have more than one partition, so maybe a hash of all partition UUIDs would be helpful.

There may be other data in the hardware that is visible to the system and unique to it (like HDD serial numbers...). Maybe this data could be used, directly or rather in some combination and in hashed form, to identify a particular system without sending any dangerous information away. Some might argue that, e.g., upgrading hardware will change the identification, but I wouldn't care too much about this, because hardware upgrades are not as frequent as IP address changes, and buying new iron might also make people use new packages (like switching to x86_64, for example).

Might this be somehow useful?

Offline

#24 2010-09-25 12:22:50

Pierre
Developer
From: Bonn
Registered: 2004-07-05
Posts: 1,964
Website

Re: pkgstats round two: take your vote and help improving Arch

Not at all.

See:
1) I don't want to track individual users, for privacy reasons.
2) Everything that is sent by pkgstats can easily be manipulated by the user without us noticing. So any idea based on sending data is flawed.
3) The IP hash is only used to prevent flooding that is too easy, not to track users or make the stats any more accurate.
4) There is no way to get exact values, but over time, as more and more people use pkgstats, individual variations (e.g. when someone sends garbage) won't matter.

Offline

#25 2010-09-25 13:44:21

jarda-wien
Member
Registered: 2008-03-13
Posts: 104

Re: pkgstats round two: take your vote and help improving Arch

ok I got it, so no problem then!

btw are you going to show us something that came out of this in the future?

Offline
