
#1 2025-03-22 17:28:11

andreagg
Member
Registered: 2025-03-22
Posts: 4

python-magic doesn't seem to work reliably with zip files

Hi all.
Since python-magic is a wrapper for libmagic, I think this is the right place to discuss this problem.
After creating a zip file, calling

magic.from_buffer()

on that file's contents in Python returns 'application/octet-stream' instead of 'application/zip'.
I can replicate the problem in a Distrobox Arch Linux container.
Both the host and the container are up to date.

$ ls
$ echo testtext > test.txt
$ zip test test.txt 
  adding: test.txt (stored 0%)
$ ls
test.txt  test.zip
$ mv test.zip test
$ file test
test: Zip archive data, made by v3.0 UNIX, extract using at least v1.0, last modified, last modified Sun, Mar 22 2025 17:43:44, uncompressed size 9, method=store
$ python
Python 3.13.2 (main, Feb  5 2025, 08:05:21) [GCC 14.2.1 20250128] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> magic.from_file('test', mime=True)
'application/zip'
>>> with open('test', 'rb') as f: file = f.read()
... 
>>> magic.from_buffer(file, mime=True)
'application/octet-stream'

I tried to reproduce this on a fresh Ubuntu Server installation, but there magic always reported 'application/zip', so I suspect the problem is specific to Arch Linux.

Do you have the same problem?
Thank you




Quick copy-and-paste script to reproduce (install python-magic before running):

echo testtext > test.txt
zip test test.txt
mv test.zip test
file test
python << 'EOF'
import magic
print("Reading from file:", magic.from_file('test', mime=True))
with open('test', 'rb') as f: file = f.read()
print("Reading from buffer:", magic.from_buffer(file, mime=True))
EOF


#2 2025-03-22 17:32:49

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,156

Re: python-magic doesn't seem to work reliably with zip files

From what I gather from a quick google, this is normal and expected since you're not reading enough of the buffer? https://github.com/ahupp/python-magic/issues/185

Hmm, but read() should get you the entire buffer. Have you checked e.g. version differences between Ubuntu and Arch?
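
For reference, the python-magic README recommends passing at least the first 2048 bytes to from_buffer. A minimal sketch of reading such a prefix with only the standard library; `read_prefix` and the throwaway demo file are hypothetical names, not part of python-magic:

```python
import os
import tempfile

def read_prefix(path, size=2048):
    """Return up to `size` bytes from the start of the file at `path`."""
    with open(path, "rb") as f:
        return f.read(size)

# Demo on a throwaway file that is larger than the prefix size.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 5000)
prefix = read_prefix(tmp.name)
os.unlink(tmp.name)
print(len(prefix))
```

In the transcript from the first post the whole file is read, so a short buffer would not explain the result there.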

Last edited by V1del (2025-03-22 17:40:44)


#3 2025-03-22 17:39:53

andreagg
Member
Registered: 2025-03-22
Posts: 4

Re: python-magic doesn't seem to work reliably with zip files

V1del wrote:

From what I gather from a quick google, this is normal and expected since you're not reading enough of the buffer? https://github.com/ahupp/python-magic/issues/185

Well, the zip file signature, as I understand from reading Wikipedia, is at the beginning of the file, so that shouldn't be the problem.

Also, on Ubuntu, magic always returns the zip MIME type, so if you were right, I'd expect the wrong result there too...
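
A quick way to check the signature claim without libmagic, as a rough sketch: build a small archive in memory with the standard zipfile module and look at its first four bytes. The helper name `starts_with_zip_signature` is made up for illustration.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # signature of a zip local file header

def starts_with_zip_signature(data):
    """True if the buffer begins with the local file header signature."""
    return data[:4] == ZIP_LOCAL_HEADER

# Build a one-entry, uncompressed archive similar to the one in the first post.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("test.txt", "testtext\n")
data = buf.getvalue()

print(starts_with_zip_signature(data))
```

This only looks at the leading bytes; it is not a substitute for what libmagic does, and an empty archive starts with a different record.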

Last edited by andreagg (2025-03-22 17:41:41)


#4 2025-03-22 17:43:55

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,156

Re: python-magic doesn't seem to work reliably with zip files

Could be a change in the underlying libmagic. In particular, Arch ships a zip-specific patch: https://gitlab.archlinux.org/archlinux/ … 1c8dbf8ff1 but that should help with precisely this kind of zip misdetection.


#5 2025-03-22 17:48:41

andreagg
Member
Registered: 2025-03-22
Posts: 4

Re: python-magic doesn't seem to work reliably with zip files

V1del wrote:

Hmm, but read() should get you the entire buffer. Have you checked e.g. version differences between Ubuntu and Arch?

Both on Arch Linux and Ubuntu Server:

$ pip show python-magic
Name: python-magic
Version: 0.4.27
Summary: File type identification using libmagic
Home-page: http://github.com/ahupp/python-magic
Author: Adam Hupp
Author-email: adam@hupp.org
License: MIT
Location: /usr/lib/python3.13/site-packages
Requires: 
Required-by:


#6 2025-03-22 18:00:03

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,156

Re: python-magic doesn't seem to work reliably with zip files

Yeah, but what about the "file" package? It's what is ultimately used under the hood.


#7 2025-03-22 18:04:23

andreagg
Member
Registered: 2025-03-22
Posts: 4

Re: python-magic doesn't seem to work reliably with zip files

V1del wrote:

Yeah, but what about the "file" package? It's what is ultimately used under the hood.

You're right, sorry.
Here it is:

$ pacman -Qi file        
Name            : file
Version         : 5.46-3
Description     : File type identification utility
Architecture    : x86_64
URL             : https://www.darwinsys.com/file/
Licenses        : custom
Groups          : None
Provides        : libmagic.so=1-64
Depends On      : glibc  zlib  xz  bzip2  libseccomp  libseccomp.so=2-64  zstd
                  libzstd.so=1-64
Optional Deps   : None
Required By     : base  base-devel  nano  python-magic  util-linux  xdg-utils
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 10,09 MiB
Packager        : Christian Hesse <eworm@archlinux.org>
Build Date      : sat 4 gen 2025, 23:16:45
Install Date    : sat 11 gen 2025, 10:31:20
Install Reason  : Installed as a dependency for another package
Install Script  : No
Validated By    : Signature

I don't know if it's needed, but here is the "file" package version on Ubuntu too:

$ apt show file
Package: file
Version: 1:5.45-3build1
Priority: standard
Section: utils
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Christoph Biedl <debian.axhn@manchmal.in-ulm.de>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 64.5 kB
Depends: libc6 (>= 2.38), libmagic1t64 (= 1:5.45-3build1)
Breaks: debhelper (<< 12.2~)
Homepage: https://www.darwinsys.com/file/
Task: standard, ubuntu-wsl
Download-Size: 22.0 kB
APT-Manual-Installed: no
APT-Sources: http://it.archive.ubuntu.com/ubuntu noble/main amd64 Packages
Description: Recognize the type of data in a file using "magic" numbers
 The file command is "a file type guesser", a command-line tool that
 tells you in words what kind of data a file contains.
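
The libmagic version can also be queried from Python, which makes the two systems easier to compare. This is only a sketch: it assumes the library found by ctypes is the same one python-magic loads, and `libmagic_version` is a hypothetical helper name. `magic_version()` is the libmagic C function that returns the version as an integer (for example 546 for 5.46).

```python
import ctypes
import ctypes.util

def libmagic_version():
    """Return libmagic's version as an int, or None if it cannot be loaded."""
    name = ctypes.util.find_library("magic")
    if name is None:
        return None  # libmagic not found on this system
    try:
        lib = ctypes.CDLL(name)
        lib.magic_version.restype = ctypes.c_int
        return lib.magic_version()
    except (OSError, AttributeError):
        # Library failed to load, or is too old to export magic_version.
        return None

version = libmagic_version()
print(version)
```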

Last edited by andreagg (2025-03-22 18:18:27)


#8 2025-05-14 09:49:27

railla
Member
Registered: 2025-05-14
Posts: 2

Re: python-magic doesn't seem to work reliably with zip files

The patch mentioned above doesn't appear to fix the issue, which is still reproducible with `file` 5.46-4.
Unless there was a regression, the patch in 5.46-3 was fixing something else.

Also, curiously enough, while `from_buffer` doesn't work correctly, `from_file` returns `application/zip` for the same file.
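
Since `from_file` still gives the right answer, one possible stopgap is to write the buffer to a temporary file and detect from that. A rough sketch follows; `mime_from_buffer_via_file` and `fake_detect` are hypothetical names, and the detector is passed in so the snippet runs even without python-magic installed.

```python
import os
import tempfile

def mime_from_buffer_via_file(data, detect):
    """Write `data` to a temporary file, run `detect(path)` on it, then clean up."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return detect(path)
    finally:
        os.unlink(path)

# Intended use, assuming python-magic is installed:
#   import magic
#   mime_from_buffer_via_file(data, lambda p: magic.from_file(p, mime=True))

# Stand-in detector for demonstration only; it just checks the zip signature.
def fake_detect(path):
    with open(path, "rb") as f:
        head = f.read(4)
    return "application/zip" if head == b"PK\x03\x04" else "application/octet-stream"

result = mime_from_buffer_via_file(b"PK\x03\x04rest", fake_detect)
print(result)
```

This costs an extra write to disk per call, so it is a workaround rather than a fix for whatever changed in libmagic.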


#9 2025-05-14 10:47:38

railla
Member
Registered: 2025-05-14
Posts: 2

Re: python-magic doesn't seem to work reliably with zip files

I've fished older versions of `file` out of the pacman cache, and it turns out the regression was introduced somewhere between 5.45-1 and 5.46-4:

$ python -c "import magic; print(magic.from_buffer(open('/tmp/tmprotnj7zu/theme.zip', 'rb').read(), mime=True))" 
application/octet-stream
$ pacman -Qi file | grep Version
Version         : 5.46-4
$ python -c "import magic; print(magic.from_buffer(open('/tmp/tmprotnj7zu/theme.zip', 'rb').read(), mime=True))" 
application/zip
$ pacman -Qi file | grep Version
Version         : 5.45-1

