Two years ago, we introduced pkgstats.
This time, after a major revamp, we are re-introducing it to you!
Contributing is as easy as installing the package - a weekly cron job
will take care of the rest. You will be sending us a list of packages
installed on your system, along with the architecture and mirror you
use. This information is anonymous and cannot be used to identify you,
but it will help us prioritize our efforts and make Arch even better.
So, go ahead and spread the word!
For more details see pkgstats -h or just read the simple source code.
You can view the collected data at the Statistics page.
Note: If you previously set up a cron job for pkgstats, please remove it, and do not create a new one.
Offline
Some more details:
Data that are sent by pkgstats:
* list of installed packages without version numbers
* the architecture
* the mirror used (without any username/password scheme)
* the version of pkgstats in use
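As an illustration of the list above, here is a minimal sketch of how such a submission could be assembled. It is not the actual pkgstats code (see the source for that); the function names and the dictionary keys are made up for this example, and the input is assumed to be "name version" lines as printed by pacman -Q.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(mirror_url):
    """Return the mirror URL without any user:password@ part."""
    parts = urlsplit(mirror_url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: urlsplit drops the brackets, put them back
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def build_submission(installed, arch, mirror_url, client_version):
    """Assemble the four pieces of data named in the post.

    `installed` is an iterable of "name version" lines; only the name is kept.
    The dictionary keys are hypothetical, not the real wire format.
    """
    names = sorted({line.split()[0] for line in installed if line.strip()})
    return {
        "packages": names,                        # no version numbers
        "arch": arch,
        "mirror": strip_credentials(mirror_url),  # no username/password
        "pkgstats_version": client_version,
    }
```

For example, build_submission(["pacman 3.4.1-1"], "x86_64", "ftp://user:secret@mirror.example.org/archlinux/", "2.0") keeps only "pacman" and the credential-free mirror URL.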
Data that are saved on the server:
* sha1 hash of the IP
* number of packages submitted
* package occurrence is counted but not connected to the IP hash
* time of submission
* country of the sender IP (determined by GeoIP)
Submissions are limited to 10 per IP address per day. (Note: we only
store the hash of the IP, though.)
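To make the server-side rules above more concrete, here is a rough sketch under stated assumptions. It is not the real server code (that is linked in this post); the class and attribute names are invented, everything is kept in memory, and the country lookup and submission time are left out. It only shows the three ideas: store a SHA-1 hash instead of the IP, cap submissions at 10 per hash per day, and count packages without tying them to the hash.

```python
import hashlib
from collections import Counter, defaultdict
from datetime import date

MAX_PER_DAY = 10  # the limit stated in the post


class StatsStore:
    """Hypothetical in-memory stand-in for the statistics database."""

    def __init__(self):
        # (ip_hash, day) -> number of accepted submissions
        self.submissions = defaultdict(int)
        # package name -> occurrences; deliberately has no reference to ip_hash
        self.package_counts = Counter()

    def submit(self, ip, packages, day=None):
        """Accept a submission unless this IP hash reached today's limit."""
        day = day or date.today()
        ip_hash = hashlib.sha1(ip.encode("utf-8")).hexdigest()
        key = (ip_hash, day)
        if self.submissions[key] >= MAX_PER_DAY:
            return False
        self.submissions[key] += 1
        # Count each package once per submission
        self.package_counts.update(set(packages))
        return True
```

With this sketch, the eleventh call to submit() for the same IP on the same day returns False, and package_counts alone cannot tell you which hash a package came from.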
For completeness here is the source code:
https://github.com/archlinux-de/pkgstats-cli
https://github.com/archlinux-de/pkgstats.archlinux.de
Offline
Nice, but you should consider adding a random sleep to the cron job, as your web server might get hammered when a whole time zone of installations comes to your door.
Offline
I'll see how many submissions we get at the same time. If needed, I could add something like "sleep $(($RANDOM % 120))" to the cron job, where 120 stands in for a value we still need to figure out.
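For readers who want to see the idea spelled out, here is a small sketch of the same random delay in Python. It is only an illustration, not part of pkgstats; the function names are invented and 120 is the same placeholder bound as in the shell fragment.

```python
import random
import time


def jitter_delay(max_seconds=120):
    """Pick a whole number of seconds in the range 0..max_seconds-1,
    the same range as $RANDOM % max_seconds."""
    return random.randrange(max_seconds)


def submit_with_jitter(submit, max_seconds=120, sleep=time.sleep):
    """Wait a random amount of time, then call `submit`.

    `submit` is any callable that performs the upload; `sleep` can be
    swapped out so the function can be tried without actually waiting.
    """
    sleep(jitter_delay(max_seconds))
    return submit()
```

Spreading the start times this way means clients whose cron jobs fire at the same moment do not all reach the server in the same second.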
Offline
I agree, gathering statistics is a good idea. However, I don't quite understand the necessity of a weekly cron job. Say, on my workstation I update only when a new kernel comes out. Won't I be sending you the same package list multiple times? Or do you have measures against that? And even if I update every day, my package list is not likely to change (AFAIR pkgstats uses pacman -Qq, right?)...
Arch Linux is more than just GNU/Linux -- it's an adventure
pkill -9 systemd
Offline
I assume the IP hash will help to prevent this. My guess is that if the hashed IP address of a submission matches a record already in the database, it will update that record instead of creating a new one. I'm curious to know how it will cope with dynamic IP addresses, though.
Last edited by bcat (2010-09-24 19:11:27)
Offline
Right, plus IPs behind proxies. However, I remember there was a related discussion on the arch-dev-public mailing list (I forgot what the outcome was).
My concern was more related to the server load. Isn't it more efficient to store the list of installed packages locally, and submit only deltas when they appear (i.e. package add/remove)?
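To illustrate what I mean by deltas, here is a minimal sketch. It is only an assumption about how this could work, not something pkgstats does; the function name and the "added"/"removed" keys are made up.

```python
def package_delta(previous, current):
    """Compare the locally stored package list with the current one.

    Returns the names that were installed and the names that were
    removed since the last submission.
    """
    prev, cur = set(previous), set(current)
    return {"added": sorted(cur - prev), "removed": sorted(prev - cur)}
```

If both lists are equal, both entries are empty and nothing would need to be sent.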
Offline
Maybe I'll write something stupid but... Instead of the IP hash, why not use a UUID (generated by uuidgen or similar) and store it locally on the user's machine?
The only drawback I can see with this method is when the user does a full reinstall of their box, but I think it is better than identifying a user by their IP address...
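A rough sketch of the idea, purely as an illustration: generate a random UUID once, keep it in a file, and reuse it on every later run. The function name and the file location are hypothetical, and nothing like this exists in pkgstats.

```python
import tempfile
import uuid
from pathlib import Path


def load_or_create_id(path):
    """Return the ID stored at `path`, creating a random one on first use."""
    path = Path(path)
    if path.exists():
        return path.read_text().strip()
    new_id = str(uuid.uuid4())  # random UUID, not derived from any hardware
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_id + "\n")
    return new_id


if __name__ == "__main__":
    # Demo in a throwaway directory; a real client would use a fixed path
    with tempfile.TemporaryDirectory() as tmp:
        id_file = Path(tmp) / "pkgstats-id"
        first = load_or_create_id(id_file)
        print(first == load_or_create_id(id_file))
```

The ID survives reboots and IP changes, but a full reinstall (or simply deleting the file) produces a new one.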
Last edited by GCN (2010-09-24 20:38:39)
Offline
There is simply no sane way to prevent users from spamming us (while protecting your privacy). Of course the first set of data won't be usable. But over time hopefully only fair users will be left.
With pkgstats 2 I am trying to get some time-sensitive data, to measure some trends. That's why the weekly cron job was added. We need data on a regular basis.
Offline
Good work, thank you!
I think it would be polite to inform the user during the installation about the new cron job.
Offline
Good point. Will add that one.
Offline
1470 of 1471 people use grep - this one person must really dislike it :-)
Offline
I have almost the same setup and applications on all my PCs (notebook/netbook). Would submitting from each of them still be helpful for the statistics?
Offline
Just install pkgstats on all your computers and then forget about it. :-)
Offline
Why is this hosted on archlinux.de rather than archlinux.org?
Is this officially supported? I guess so, given the message on the arch-announce ML.
So why not migrate this to archlinux.org?
Offline
Mainly because the Arch developer who implemented it also is in charge of archlinux.de. It is official.
Offline
It's on archlinux.de for historical reasons. But I'll definitely move it to .org. I'll send a mail to arch-dev-public about that soon.
Offline
Does this also collect installed packages from AUR? (pacman -Qm) Or is AUR just handled by the votes?
Offline
It does not know about the AUR or repos. pkgstats just sends a list of all installed packages.
Offline
Many people have non-unique IPs, for example with ADSL, Wi-Fi with DHCP, or behind routers.
So it may be necessary to identify the system by some other means.
For example, generate a unique ID on the filesystem?
Offline
And how do I make sure that you don't send me a made-up UUID? This has been discussed to death. Just quoting myself: https://bbs.archlinux.org/viewtopic.php … 40#p830740
Offline
Installed :-)
Now to the point of IPs. I do not have a static IP with my ISP here, but somehow I always get the same one. I used to live in Vienna and had ADSL there. Every time I connected I would get a new IP. I know there are lots of people with ADSL around... I also have a laptop which I use at school, at home, and at my grandma's, parents' and friends' places. I imagine I would submit the package list from a great many IPs. Maybe this can be fixed with IPv6 sometime in the future.
I am not sure whether a partition's UUID is unique even among thousands of hard drives, or whether it can be dangerous to send a UUID out to the world, but what about using the root partition's UUID to identify a unique install? I am sure most users have more than one partition, so maybe a hash of all partition UUIDs would be helpful.
There may be other hardware data that is visible to the system and unique to it at the same time (like HDD serial numbers...). Maybe this data could be used, directly or rather in some combination and in hashed form, to identify a particular system without sending out any dangerous information. Some might argue that, e.g., upgrading hardware will change the identification, but I wouldn't care too much about this, because hardware upgrades are not as common as IP address changes, and buying new iron might also make people use new packages (like switching to x86_64, for example).
Might this be somehow useful?
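To make the suggestion concrete, here is a minimal sketch of a hash over all partition UUIDs. It is only an illustration of the idea in this post, not anything pkgstats does; the function name is invented, SHA-256 is an arbitrary choice (the post does not name a hash), and the UUIDs are assumed to have been collected already, for example from a tool such as blkid.

```python
import hashlib


def install_fingerprint(partition_uuids):
    """Hash a set of partition UUIDs into one hex string.

    The UUIDs are normalized and sorted first, so the result does not
    depend on the order in which the partitions were listed.
    """
    normalized = sorted(u.strip().lower() for u in partition_uuids)
    joined = "\n".join(normalized).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()
```

Only the digest would leave the machine; adding, removing or reformatting a partition changes it.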
Offline
Not at all.
See:
1) I don't want to track individual users, for privacy reasons.
2) Everything that is sent by pkgstats can easily be manipulated by the user without us noticing. So any idea based on sending more data is flawed.
3) The IP hash is only used to prevent flooding that is too easy; not to track users or make the stats any more accurate.
4) There is no way to get exact values, but over time, as more and more people use pkgstats, individual variations (e.g. when someone sends garbage) won't matter.
Offline
ok I got it, so no problem then!
btw are you going to show us something that came out of this in the future?
Offline