Hi all.
Since python-magic is a wrapper for libmagic, I think this is the right place to discuss this problem.
After creating a zip file, calling magic.from_buffer() in Python on that file's contents returns 'application/octet-stream' instead of 'application/zip'.
I can replicate the problem in a Distrobox Arch Linux container.
Both the host and the container are up to date.
$ ls
$ echo testtext > test.txt
$ zip test test.txt
adding: test.txt (stored 0%)
$ ls
test.txt test.zip
$ mv test.zip test
$ file test
test: Zip archive data, made by v3.0 UNIX, extract using at least v1.0, last modified, last modified Sun, Mar 22 2025 17:43:44, uncompressed size 9, method=store
$ python
Python 3.13.2 (main, Feb 5 2025, 08:05:21) [GCC 14.2.1 20250128] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> magic.from_file('test', mime=True)
'application/zip'
>>> with open('test', 'rb') as f: file = f.read()
...
>>> magic.from_buffer(file, mime=True)
'application/octet-stream'
I tried replicating this on a fresh Ubuntu Server installation, but magic always reported 'application/zip', so I thought the problem might be specific to Arch Linux.
Do you have the same problem?
Thank you
Fast copy-and-paste script to replicate (before running, install python-magic):
echo testtext > test.txt
zip test test.txt
mv test.zip test
file test
python << 'EOF'
import magic
print("Reading from file:", magic.from_file('test', mime=True))
with open('test', 'rb') as f: file = f.read()
print("Reading from buffer", magic.from_buffer(file, mime=True))
EOF
From what I gather from a quick Google search, this is normal and expected since you're not reading enough of the magic buffer? https://github.com/ahupp/python-magic/issues/185
hmm but read() should get you the entire buffer. have you checked e.g. version differences between ubuntu and Arch?
Last edited by V1del (2025-03-22 17:40:44)
From what I gather from a quick Google search, this is normal and expected since you're not reading enough of the magic buffer? https://github.com/ahupp/python-magic/issues/185
Well, the zip file signature, as I understood from reading Wikipedia, is at the beginning of the file, so that shouldn't be the problem.
Also, on Ubuntu, magic always returns the zip mime, so if you're right, I'd expect a different result there too...
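For what it's worth, the signature claim is easy to check without libmagic. This is only a minimal sketch, assuming a non-empty archive (which begins with a local file header, `PK\x03\x04`). The helper name `starts_with_zip_signature` is made up for illustration, and the archive is built in memory with Python's zipfile module rather than the zip CLI, so its bytes will not be identical to the file from my first post:

```python
import io
import zipfile

# Local file header signature found at offset 0 of a non-empty zip archive.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def starts_with_zip_signature(data: bytes) -> bool:
    """Return True if the buffer begins with the zip local file header."""
    return data[:4] == ZIP_LOCAL_HEADER


# Build a small stored (uncompressed) archive in memory, similar in spirit
# to `zip test test.txt`.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("test.txt", "testtext\n")
data = buf.getvalue()

print(starts_with_zip_signature(data))  # True for a non-empty archive
```

So a full read() certainly contains the signature, which is why a too-short buffer doesn't look like the explanation here.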
Last edited by andreagg (2025-03-22 17:41:41)
Could be a change in the backing libmagic. In particular, Arch ships a zip-specific patch: https://gitlab.archlinux.org/archlinux/ … 1c8dbf8ff1, but that should precisely help with zip misdetection.
hmm but read() should get you the entire buffer. have you checked e.g. version differences between ubuntu and Arch?
Both on Arch Linux and Ubuntu Server:
$ pip show python-magic
Name: python-magic
Version: 0.4.27
Summary: File type identification using libmagic
Home-page: http://github.com/ahupp/python-magic
Author: Adam Hupp
Author-email: adam@hupp.org
License: MIT
Location: /usr/lib/python3.13/site-packages
Requires:
Required-by:
Yeah, but the "file" package? It's what is ultimately used under the hood.
Yeah, but the "file" package? It's what is ultimately used under the hood.
You're right, sorry
Here it is:
$ pacman -Qi file
Name : file
Version : 5.46-3
Description : File type identification utility
Architecture : x86_64
URL : https://www.darwinsys.com/file/
Licenses : custom
Groups : None
Provides : libmagic.so=1-64
Depends On : glibc zlib xz bzip2 libseccomp libseccomp.so=2-64 zstd
libzstd.so=1-64
Optional Deps : None
Required By : base base-devel nano python-magic util-linux xdg-utils
Optional For : None
Conflicts With : None
Replaces : None
Installed Size : 10,09 MiB
Packager : Christian Hesse <eworm@archlinux.org>
Build Date : sat 4 gen 2025, 23:16:45
Install Date : sat 11 gen 2025, 10:31:20
Install Reason : Installed as a dependency for another package
Install Script : No
Validated By : Signature
I don't know if it is needed, but here is the "file" package version for Ubuntu too:
$ apt show file
Package: file
Version: 1:5.45-3build1
Priority: standard
Section: utils
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Christoph Biedl <debian.axhn@manchmal.in-ulm.de>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 64.5 kB
Depends: libc6 (>= 2.38), libmagic1t64 (= 1:5.45-3build1)
Breaks: debhelper (<< 12.2~)
Homepage: https://www.darwinsys.com/file/
Task: standard, ubuntu-wsl
Download-Size: 22.0 kB
APT-Manual-Installed: no
APT-Sources: http://it.archive.ubuntu.com/ubuntu noble/main amd64 Packages
Description: Recognize the type of data in a file using "magic" numbers
The file command is "a file type guesser", a command-line tool that
tells you in words what kind of data a file contains.
Last edited by andreagg (2025-03-22 18:18:27)
The patch mentioned above doesn't appear to fix the issue, which is still reproducible with `file` 5.46-4.
Unless there was a regression, the 5.46-3 patch was fixing something else.
Also, curiously enough, while `from_buffer` doesn't work correctly, `from_file` returns `application/zip` for the same file.
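A self-contained way to compare the two calls side by side, in case anyone wants to test on another distro. This is a rough sketch: `make_test_zip` and `compare_detection` are hypothetical helper names, and the archive is written with Python's zipfile module instead of the zip CLI, so I'm only assuming it triggers the same behaviour. It uses `magic.from_file` and `magic.from_buffer` from python-magic exactly as in the first post, and skips the comparison if the module isn't installed:

```python
import os
import tempfile
import zipfile

try:
    import magic  # python-magic (third-party wrapper around libmagic)
except ImportError:
    magic = None


def make_test_zip(directory):
    """Write a small stored zip named 'test' (no extension) and return its path."""
    path = os.path.join(directory, "test")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("test.txt", "testtext\n")
    return path


def compare_detection(path):
    """Return the MIME type reported by from_file and from_buffer for one file."""
    with open(path, "rb") as f:
        data = f.read()  # whole file, not just a prefix
    return {
        "from_file": magic.from_file(path, mime=True),
        "from_buffer": magic.from_buffer(data, mime=True),
    }


with tempfile.TemporaryDirectory() as tmp:
    zip_path = make_test_zip(tmp)
    if magic is None:
        print("python-magic is not installed; skipping comparison")
    else:
        print(compare_detection(zip_path))
```

If the two values differ on your machine, posting them together with your `file` package version would help narrow this down.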
I've fished out older versions of `file` from the pacman cache, and it turns out the issue was introduced somewhere between 5.45-1 and 5.46-4:
$ python -c "import magic; print(magic.from_buffer(open('/tmp/tmprotnj7zu/theme.zip', 'rb').read(), mime=True))"
application/octet-stream
$ pacman -Qi file | grep Version
Version : 5.46-4
$ python -c "import magic; print(magic.from_buffer(open('/tmp/tmprotnj7zu/theme.zip', 'rb').read(), mime=True))"
application/zip
$ pacman -Qi file | grep Version
Version : 5.45-1