#1 2016-06-02 08:43:28

mDuo13
Member
Registered: 2010-04-25
Posts: 93

nltk-data upgrade is enormous? [answered]

I have nltk-data 3.0a3-1 installed, although admittedly I haven't used it much. When I went to upgrade today, I saw that the size difference between the old package and the new one (3.2.1-1) is enormous. The old package is already big, at an installed size of 1.7GB (to be expected, since it's a bunch of corpora of natural language text), but the new package is a whopping 7.8GB installed.

I ignored the upgrade for now, but I had to wonder... is such a huge increase in size expected, or might it be a bug? I don't see anyone talking about it, but I would think any time a package grows in size by over 6GB it would at least warrant a mention. Does the new version install the data uncompressed instead of leaving it compressed? Maybe there's just 4x as much data now?

Anyway, it's not a big problem, but it was surprising to me that nobody else seems to have taken notice. Seems Alexander Rødseth is the package maintainer, so maybe I should get in contact with him, but I'm not really sure how. Maybe this post will do the trick?

Last edited by mDuo13 (2016-06-21 21:41:54)


#2 2016-06-02 09:46:12

a821
Member
Registered: 2012-10-31
Posts: 381

Re: nltk-data upgrade is enormous? [answered]

I don't know about nltk-data, but the diff between versions 3.0a3 and 3.2.1 in the PKGBUILD [1] is just a version bump, so it seems that the package is just big. You're welcome to trim it down, of course...

[1] https://git.archlinux.org/svntogit/comm … caaabe2fcb

Edit: Maybe the "NoExtract" option of pacman.conf might help...

Last edited by a821 (2016-06-02 12:48:40)


#3 2016-06-08 09:07:57

bradst
Member
Registered: 2016-06-08
Posts: 1

Re: nltk-data upgrade is enormous? [answered]

The latest version of nltk includes support for the PanLex Lite database, which is huge:

$ ls -lh /usr/share/nltk_data/corpora/panlex_lite/db.sqlite
-rw-r--r-- 1 root root 5.1G May 13 2016 23:00 /usr/share/nltk_data/corpora/panlex_lite/db.sqlite
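
To check for other large files beyond the one `ls` call above, something like the following could be used. This is only a rough sketch: the helper name `largest_files` and the `top` parameter are made up for illustration, and the default path is simply the directory from the listing above.

```python
import os


def largest_files(root, top=5):
    """Return up to `top` (size_in_bytes, path) pairs under `root`, biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a linked file is not counted twice.
            if os.path.islink(path):
                continue
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Unreadable or vanished file: ignore it and carry on.
                continue
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Path taken from the listing above; adjust if your data lives elsewhere.
    for size, path in largest_files("/usr/share/nltk_data"):
        print(f"{size / 1024**3:6.2f} GiB  {path}")
```

If panlex_lite really is the bulk of it, the NoExtract option suggested earlier in the thread would presumably take a pattern along the lines of `usr/share/nltk_data/corpora/panlex_lite/*` (untested, and pacman expects no leading slash there).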

