#1 2018-09-12 20:01:43

danielquinn
Member
From: Cambridge, UK
Registered: 2018-09-02
Posts: 13

Python's Pillow returns different values for system-wide Pillow

It's a weird thing, and I have no idea what's causing it.  If I run this code in a virtualenv where I've installed the Pillow module, I get the same hash regardless of Python version, architecture, or even Linux distro. But if I run it against the system-wide Pillow package on Arch, with the same Pillow version installed, I get a different hash:

import PIL.Image
import hashlib

with PIL.Image.open("test-file.jpg") as im:
    print(hashlib.md5(im.tobytes()).hexdigest())

Could there be something happening behind the scenes in the Arch version of Pillow that's making .tobytes() act differently?  How would I get the expected behaviour?

Both the virtualenv and my Arch system have the same everything:

Python 3.7.0
Pillow 5.2.0

Any illumination I can get on this is appreciated.

#2 2018-09-12 20:54:11

schard
Forum Moderator
From: Hannover
Registered: 2016-05-06
Posts: 2,374

Re: Python's Pillow returns different values for system-wide Pillow

Are you sure test-file.jpg is the same on both systems?
Could you provide the MD5 sum returned by the md5sum command line tool on both systems?


Unofficial first vice president of the Rust Evangelism Strike Force

#3 2018-09-14 22:11:37

danielquinn
Member
From: Cambridge, UK
Registered: 2018-09-02
Posts: 13

Re: Python's Pillow returns different values for system-wide Pillow

It's most definitely the same file on both systems.  It's actually the same file on the same system!  Here are the complete steps to reproduce.

# Install Pillow system-wide
sudo pacman -S python-pillow

# Grab a test JPG image (there doesn't seem to be a problem with .png files)
wget https://www.fileformat.info/format/jpeg/sample/ffe8869cda8748559f0780765c3ba9d8/download --output-document=test.jpg

# Run the above test to generate a hash from .tobytes()
echo -e "import PIL.Image\nimport hashlib\n\nwith PIL.Image.open('test.jpg') as im:\n    print(hashlib.md5(im.tobytes()).hexdigest())\n" | python

# Create a virtualenv (I'm using pipenv here) and install Pillow into it
pipenv install pillow
pipenv shell

# Run the same test
echo -e "import PIL.Image\nimport hashlib\n\nwith PIL.Image.open('test.jpg') as im:\n    print(hashlib.md5(im.tobytes()).hexdigest())\n" | python

The result for the first test (for me anyway) was ad6d9b1b5710326e369b880481ec887a while the result for the second test was 8e4652422b3adbd047e4aafdf0d4bcce -- I can't explain it.  In both cases, if I run pip freeze | grep Pillow the result is Pillow==5.2.0.

#4 2018-09-14 22:27:51

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Python's Pillow returns different values for system-wide Pillow

Well, what are the actual print(im.tobytes()) differences? Never mind their hashes.


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)

#5 2018-09-16 23:38:45

danielquinn
Member
From: Cambridge, UK
Registered: 2018-09-02
Posts: 13

Re: Python's Pillow returns different values for system-wide Pillow

It's really weird actually.  The output of im.tobytes() is similar but not identical between the two installs.  I ran the above command, but instead of using hashlib to generate a hash, I just printed the output of im.tobytes() and redirected it into two separate files: "raw" and "virtualenv".

The "raw" file, the result of running this *outside* the virtualenv, came out shorter and looked mostly like this:

b'\\x01\\x01\\xff\\x01\\x01\\xff\\x01\\x01\\xff\\x01\\x01\\xff\\x01\\x01\\xff\\x01\\x01\\xff\\x01\\x01\\xff\\x01\\x01\\xff\\x

But the length of that chain was 345496, while the length of the virtualenv output was 345532.  What's more, the first 2297 bytes were the same... I have no idea why.  I am totally in the dark on this.
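
A more direct comparison than reading the printed repr might be to write the raw bytes from each install to a file and look for the first offset where they diverge. This is only a sketch: the file names system.bin and venv.bin and the helpers first_difference and count_differences are made up for illustration and are not part of Pillow.

```python
def first_difference(a, b):
    """Return the index of the first differing byte, or None if a == b."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    # One buffer is a prefix of the other: they diverge where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def count_differences(a, b):
    """Count differing positions, treating any extra trailing bytes as differences."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


# Hypothetical usage. In each environment, first save the raw bytes with
#     open("system.bin", "wb").write(im.tobytes())   # system-wide Pillow
#     open("venv.bin", "wb").write(im.tobytes())     # virtualenv Pillow
# and then compare the two files:
#     a = open("system.bin", "rb").read()
#     b = open("venv.bin", "rb").read()
#     print(len(a), len(b), first_difference(a, b), count_differences(a, b))
```

Comparing the raw bytes rather than their printed form avoids differences that come only from how each byte happens to be escaped in the repr.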

#6 2018-09-17 07:47:50

brebs
Member
Registered: 2007-04-03
Posts: 3,742

Re: Python's Pillow returns different values for system-wide Pillow

danielquinn wrote:

the same everything

Arch contains patches - check bpo34056-always-return-bytes-from-_HackedGetData.get_data.patch

#7 2018-09-17 16:02:06

twelveeighty
Member
Registered: 2011-09-04
Posts: 1,371

Re: Python's Pillow returns different values for system-wide Pillow

Compare the output of both versions of im.tobytes() to an actual binary dump of the file, using 'xxd', for example, or any other binary viewer. The version that's different from the binary dump is the odd one to be investigated.

#8 2018-09-17 18:51:20

eschwartz
Fellow
Registered: 2014-08-08
Posts: 4,097

Re: Python's Pillow returns different values for system-wide Pillow

brebs wrote:
danielquinn wrote:

the same everything

Arch contains patches - check bpo34056-always-return-bytes-from-_HackedGetData.get_data.patch

I doubt the way we open .pyc files matters here, though.


Managing AUR repos The Right Way -- aurpublish (now a standalone tool)

#9 2018-09-19 12:56:15

danielquinn
Member
From: Cambridge, UK
Registered: 2018-09-02
Posts: 13

Re: Python's Pillow returns different values for system-wide Pillow

So can anyone reproduce what I'm getting, or is this somehow unique to my system?

@twelveeighty: I'm afraid I don't know how to do that.  Can you provide some example code?

@brebs: From the looks of those patches, the only thing that looked relevant to this case was a bit of code to force the opening of files in binary format if those files are indeed binary.  However, as I'm using Pillow's `Image.open()` here, rather than Python's `open()`, I don't think that applies.

#10 2018-09-19 21:25:20

remag_xela
Member
Registered: 2018-09-09
Posts: 1

Re: Python's Pillow returns different values for system-wide Pillow

Given that Pillow uses an external library to decode JPEG files, could it be due to the fact that the manylinux wheel uses libjpeg while Arch uses libjpeg-turbo?

$ imagetest=$'import PIL.Image, hashlib\nwith PIL.Image.open("/usr/share/backgrounds/xfce/alone.jpg") as im:\n\tprint(hashlib.sha256(im.tobytes()).hexdigest())'

$ python -c "$imagetest"
62cc8dc5186f2b117571b4d9724ab3cba32df7be9535a4c554392ff856e709df

$ python -m venv pillow-wheel
$ pillow-wheel/bin/pip install Pillow
$ pillow-wheel/bin/python -c "$imagetest"
7f1c91fd36ff73900a83177f3d6e066d4f386b8286edc2aac0a1f8001bbbae3b

$ python -m venv pillow-src
$ pillow-src/bin/pip install --no-binary ':all:' Pillow
$ pillow-src/bin/python -c "$imagetest"
62cc8dc5186f2b117571b4d9724ab3cba32df7be9535a4c554392ff856e709df

$ find pillow-{wheel,src} -iname '*jpeg*so*'
pillow-wheel/lib/python3.7/site-packages/PIL/.libs/libjpeg-3fe7dfc0.so.9.3.0
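
The same check as the find command above can be sketched in Python, which may be handy for running it from inside whichever environment is under test. The helper name find_jpeg_libs is hypothetical, and unlike find it only looks at file names, not directory names.

```python
import fnmatch
import os


def find_jpeg_libs(root):
    """Return sorted paths of files under root matching *jpeg*so*, ignoring case."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lowercase the name so the match is case-insensitive, like -iname.
            if fnmatch.fnmatchcase(name.lower(), "*jpeg*so*"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


# Hypothetical usage, with the venv directories from the commands above:
#     for venv in ("pillow-wheel", "pillow-src"):
#         print(venv, find_jpeg_libs(venv))
```

A venv that bundles its own JPEG library, as the wheel does here, shows a hit, while a source build that links against the system library shows none.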

Last edited by remag_xela (2018-09-19 21:27:56)

#11 2018-09-19 22:10:45

twelveeighty
Member
Registered: 2011-09-04
Posts: 1,371

Re: Python's Pillow returns different values for system-wide Pillow

danielquinn wrote:

I'm afraid I don't know how to do that.

For xxd, there may be more than one package that provides it, but gvim owns the executable in my case. "man xxd" is pretty self-explanatory. Use xxd to hexdump your test-file.jpg file to test-file-xxd.txt.

Next, write a python script that grabs im.tobytes(), creates a text file "test-file-version1.txt" and then iterates over the tobytes() array and dumps it to hex into the text file. Repeat for version 2 to create test-file-version2.txt. Then diff each version with the xxd text file. The one that's different from xxd is the version that should be investigated.

You could also use the PyPI "hexdump" library to dump binary to hex, but that could introduce another variable into this investigation.
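
A rough sketch of such a script, using only the standard library plus Pillow, could look like this. The function names and output layout are my own invention, and the layout only resembles xxd rather than matching it exactly. Keep in mind that tobytes() returns decoded pixel data, so it would not be expected to match a dump of the compressed .jpg file itself; the two version dumps can still be diffed against each other.

```python
def hexdump_lines(data, width=16):
    """Yield one text line per `width` bytes: a hex offset, then hex byte pairs."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        pairs = " ".join("{:02x}".format(byte) for byte in chunk)
        yield "{:08x}: {}".format(offset, pairs)


def write_hexdump(data, out_path, width=16):
    """Write the hex dump of `data` to a text file, one line per chunk."""
    with open(out_path, "w") as handle:
        for line in hexdump_lines(data, width):
            handle.write(line + "\n")


def dump_image_pixels(image_path, out_path):
    """Dump im.tobytes() for one image; run once per Pillow install."""
    import PIL.Image  # imported here so the helpers above work without Pillow

    with PIL.Image.open(image_path) as im:
        write_hexdump(im.tobytes(), out_path)


# Hypothetical usage, once in each environment:
#     dump_image_pixels("test-file.jpg", "test-file-version1.txt")
```

The resulting text files can then be compared with diff.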

#12 2018-09-21 08:45:16

danielquinn
Member
From: Cambridge, UK
Registered: 2018-09-02
Posts: 13

Re: Python's Pillow returns different values for system-wide Pillow

@remag_xela that appears to be it, thank you for the clarification!  I guess I can't count on `.tobytes()` to be consistent across platforms then and will have to find another option.
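
One possible option, assuming the goal is only to tell whether two files are byte-for-byte identical rather than to compare decoded pixels, would be to hash the file contents directly so that no JPEG decoder is involved. The helper name file_digest and the chunk size are arbitrary choices for this sketch.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file's raw bytes in chunks, without decoding the image."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage:
#     print(file_digest("test-file.jpg"))
```

This would not treat two files as equal if they hold the same picture with different metadata or encoding, so it only fits the identical-file case.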
