OpenCL package for free amdgpu stack

iuno · 2016-11-29 04:06:18

Hi there,

I'm using the free amdgpu stack: amdgpu, Mesa radeonsi, radv and VDPAU. Sadly, Clover's OpenCL support is not sufficient, so I use the proprietary driver for that. However, I don't want to use the full hybrid stack and give up Mesa for it, so I created this package:
https://github.com/grmat/amdgpocl

Is this something that anybody wants to see in the AUR? I know there are already packages for amdgpu-pro, but this is really just OpenCL (+libdrm).

I have not done much testing, but Blender Cycles and LuxMark work and run as fast as on Windows.

Last edited by iuno (2016-12-15 01:26:34)

R00KIE · 2016-11-29 14:07:44

If there isn't a package there already, submit yours. I suppose more people will use it if it is available, and even if only you use it, you can consider it a backup.

Dokter Bibber · 2016-11-30 22:23:38

I would like to know whether AMD's OCL implementation is an ICD, because I have the ocl-icd package installed: https://www.archlinux.org/packages/extr … 4/ocl-icd/

I would certainly be interested if this speeds up my GPU renders.
At the moment, the GPU version of the Cycles Benchmark (the blend file with the 2 BMWs) from the code.blender.org website takes almost 40 minutes on my RX 480 8GB with the open source drivers.
For others with older and weaker cards, it takes 5 to 7 minutes.
(That is, if the benchmark even renders on the GPU. I open the GPU version, but that's no guarantee, I guess.)

The big BUT though, is that I first want/have to resolve the graphics glitches that I'm getting with the open source drivers in Blender.
This happened on a fresh installation of Gnome and KDE, and still happens on a fresh installation of Deepin.
So it is definitely driver/kernel related. I'm on kernel 4.8.11 and amdgpu 1.2.0 now and the graphical glitches persist.

Even on a finished render I can see 2 artifacts each time, at the exact same spot. So I don't know if the feedback I could give would be usable or taken seriously for your testing.

I'm not asking for help with the graphical glitches, because I will open another thread for that.

Last edited by Dokter Bibber (2016-11-30 22:25:23)

iuno · 2016-12-01 03:56:30

Yes, thanks for the hint; I actually forgot to add ocl-icd as a dependency.
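For context on the ICD question: with an ICD loader such as ocl-icd, each vendor driver registers itself by placing a small .icd text file in /etc/OpenCL/vendors, and the loader opens every library named in those files. The AMD platform in the clinfo dumps later in this thread also lists cl_khr_icd among its extensions. The minimal Python sketch below only lists those registration files; the function name list_icds is made up for illustration, and the exact file name this package installs is not shown in the thread.

```python
from pathlib import Path

# Directory that ICD loaders such as ocl-icd scan for vendor registrations.
VENDOR_DIR = "/etc/OpenCL/vendors"


def list_icds(vendor_dir=VENDOR_DIR):
    """Return {icd_file_name: library_name} for every *.icd file in vendor_dir."""
    vendor_dir = Path(vendor_dir)
    icds = {}
    if not vendor_dir.is_dir():
        return icds  # no loader registrations at all
    for icd in sorted(vendor_dir.glob("*.icd")):
        # Each .icd file holds one line: the vendor library for the loader to dlopen().
        icds[icd.name] = icd.read_text().strip()
    return icds


if __name__ == "__main__":
    found = list_icds()
    if not found:
        print("no OpenCL ICDs registered in", VENDOR_DIR)
    for name, library in found.items():
        print(f"{name}: {library}")
```

If the AMD driver is registered, its .icd file should show up here next to any other vendor (for example Mesa's Clover) that is installed.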

The BMW scene should render in under 2 minutes on your 480 with this package. Remember to adjust the tile size in Blender. In this specific benchmark one large tile works best.

To enable GPU rendering in Cycles, you have to go to
File > User Preferences > System
Set "Compute Device" from "None" to "OpenCL" [then your card should appear in the drop-down menu]
Close User Preferences, move to the "Render" tab in the sidebar, select "Device: GPU Compute"

I adjusted the name of the package to fit the existing opencl-mesa and opencl-nvidia packages.

https://aur.archlinux.org/packages/opencl-amd/

Dokter Bibber · 2016-12-01 09:13:23

[off-topic]
But I have to resolve the graphics glitches in the Blender UI first. Before that I can't make any settings. That's also why I can't check if rendering is done with the GPU or CPU. It sucks.

Clicking the top level menu items corrupts the Blender UI. So much that I can't read the menu options.
I can open .blend files by pressing Ctrl+O to get the file open dialog, counting the position of the file in the list, blindly tapping the down arrow to get there (which causes immediate UI corruption), and hitting Enter.
Also, just moving the mouse over certain parts of the Blender UI randomly corrupts the content of the Blender window.

I can't even change the dpi of the Blender UI. So almost all text and icons are tiny.

Screenshots here : https://bbs.archlinux.org/viewtopic.php?id=220172
[/off-topic]

Last edited by Dokter Bibber (2016-12-01 10:03:50)

iuno · 2016-12-01 11:02:58

Yes, I'm aware of this bug. I can't help you with that but maybe provide some files to render from CLI later, if you like.
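Since the Blender UI is unusable here, rendering from the command line sidesteps it entirely. A minimal Python sketch, assuming blender is on PATH, the .blend file was saved with "Device: GPU Compute", and OpenCL is already selected in the user preferences: blender -b loads a file without the UI and -f renders one frame using the settings saved in the file. The helper names are hypothetical, and the measured time includes Blender start-up and scene loading, so it will read a little higher than the render time Cycles itself reports.

```python
import shlex
import subprocess
import sys
import time


def blender_render_cmd(blend_file, frame=1, blender="blender"):
    """Build a headless render command for one frame.

    -b (--background) loads the .blend file without opening the UI and
    -f (--render-frame) renders the given frame with the settings saved in
    the file. Blender processes arguments in order, so -b must come first.
    """
    return [blender, "-b", str(blend_file), "-f", str(frame)]


def time_render(blend_file, frame=1, blender="blender"):
    """Run the render and return wall-clock seconds.

    The figure includes Blender start-up and scene loading, and on a first
    run also OpenCL kernel compilation.
    """
    cmd = blender_render_cmd(blend_file, frame, blender)
    print("running:", shlex.join(cmd))
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"render took {time_render(sys.argv[1]):.1f} s")
```

Saved as, say, time_render.py (a hypothetical name), it would be run as "python3 time_render.py scene.blend"; the rendered image goes to the output path stored in the .blend file.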

Dokter Bibber · 2016-12-01 11:11:10

@iuno You're da man!
I installed your package opencl-amd.
By comparing screenshots in Google Images with the corrupted menus in Blender, and by estimated/blind clicking, I then managed to set Blender to use the compute device OpenCL Ellesmere for rendering.

The first run took 7 minutes and 3 seconds. There was a message that compute engines had to be loaded and that it might take a few minutes the first time. (Sorry but the line was partially corrupted, and then disappeared.)
The second run took 5 minutes and 56 seconds. The message mentioned at the start of the first run wasn't shown.

Oh my what an improvement over nearly 40 minutes. It might get faster if I adjust the tile size like you suggested. I will do that later today and let you know. (If I manage to make the setting of course.)
If I can resolve the glitches, I can continue with writing shaders.

I'm finally seeing the potential of this card.
Thanks very much for this.

[rant]Bloody AMD![/rant]

Disclaimer : The glitches in the screenshots have nothing to do with this OCL driver. They were already present before it was installed.

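First run:
https://s17.postimg.org/k9h8ff8h7/blender_78a_gpu_render_with_package_opencl_amd.png

Second run:
https://s17.postimg.org/kar68uaaz/blender_78a_gpu_render_with_package_opencl_amd.png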

Last edited by Dokter Bibber (2016-12-01 11:12:44)

iuno · 2016-12-01 11:23:02

Thanks for reporting back.
The first run takes longer because of kernel compilation time. The kernels get cached, so subsequent runs are faster. Remember to adjust the tile size. And if you switch back to CPU rendering, choose 16x16. BTW, may I ask which CPU you use? >40 min seems very slow.
If you want readable output, you can start blender from the terminal.
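To illustrate why only the first run pays the compile cost, here is a hypothetical sketch of a compile cache keyed by a hash of the kernel source and build options. It is not how the AMD driver or Cycles actually implements its cache (a real one would presumably also need to key on device and driver version); compile_fn is a stand-in for the slow vendor compiler.

```python
import hashlib
from pathlib import Path


class KernelCache:
    """Hypothetical on-disk cache for compiled kernel binaries.

    Entries are keyed by a SHA-256 hash of the kernel source and build options,
    so an unchanged kernel is compiled once and later runs read it from disk.
    """

    def __init__(self, cache_dir, compile_fn):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # compile_fn(source, options) -> bytes stands in for the slow driver compiler.
        self.compile_fn = compile_fn

    def _path(self, source, options):
        digest = hashlib.sha256()
        digest.update(source.encode())
        digest.update(b"\0")  # separator so source and options can't run together
        digest.update(options.encode())
        return self.cache_dir / (digest.hexdigest() + ".bin")

    def get(self, source, options=""):
        path = self._path(source, options)
        if path.exists():
            return path.read_bytes()  # hit: no compilation needed
        binary = self.compile_fn(source, options)  # miss: compile once
        path.write_bytes(binary)
        return binary
```

A second KernelCache pointed at the same directory returns the stored binary without calling compile_fn, which mirrors the faster second run reported above; changing the build options produces a new key and triggers a fresh compile.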

Last edited by iuno (2016-12-01 11:23:56)

Dokter Bibber · 2016-12-01 12:07:19

It's getting better and better. (I couldn't wait.)

I managed to adjust the tile size. (Same method.)
First to 512 x 512 (from what I think was 256 x 256) which resulted in 4 tiles (2 big tiles and 2 tiny tiles). Then 1024 x 576 (1 huge tile).
512 x 512 : 4 min 46 sec
1024x756 : 4 min 23 sec
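The "2 big tiles and 2 tiny tiles" effect comes from splitting the frame into a grid of fixed-size tiles and clipping the last row and column to the frame edge. A simplified Python model, assuming a 960 x 540 frame (an assumption for illustration only; check the Dimensions panel of the actual .blend) and ignoring Cycles' tile ordering:

```python
def tile_grid(width, height, tile_x, tile_y):
    """Split a width x height frame into tiles of at most tile_x x tile_y pixels.

    Returns (w, h) tile sizes row by row. Tiles in the last row and column are
    clipped to the frame edge, which is where small left-over tiles come from.
    """
    tiles = []
    for y in range(0, height, tile_y):
        for x in range(0, width, tile_x):
            tiles.append((min(tile_x, width - x), min(tile_y, height - y)))
    return tiles


if __name__ == "__main__":
    frame = (960, 540)  # assumed frame size, for illustration only
    for tile in [(256, 256), (512, 512), (1024, 576)]:
        grid = tile_grid(*frame, *tile)
        print(f"{tile[0]} x {tile[1]}: {len(grid)} tiles {grid}")
```

Under that assumption, 256 x 256 gives 12 tiles, 512 x 512 gives 512x512, 448x512, 512x28 and 448x28 (two big, two tiny), and any tile at least as large as the frame, such as 1024 x 576, gives a single tile.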

This is on my desktop, which is ancient (Intel Q9550), but it will be replaced if AMD Zen is not too disappointing.
I don't have Blender on my laptop because I use it solely for work, and that involves Windows.

No screenshot of 512 x 512 tile size. The Blender window turned completely black when I started Shutter.

With one 1024 x 756 tile :

Last edited by Dokter Bibber (2016-12-01 12:09:52)

Dokter Bibber · 2016-12-01 17:45:26

I don't know what the difference is between the BMW GPU blend file from code.blender.org and the blend file from blenderartists.org.
But I'm getting even faster render times with the file from blenderartists.org.
I used the tile setting (960 x 540) from this post : https://blenderartists.org/forum/showth … ost3072292 and got a wicked :
960 x 540 : 01 min 28 sec (see screenshot).

The fastest I can get from the code.blender.org blend file is in my previous post (4 min 23 sec).

Thanks for making this package.

GPU render with blend file from blenderartists.org 01:28:72 :

CPU render time went down to 29:37:60 with the 16 x 16 tile size that you suggested. This is with the CPU blend file from code.blender.org :

iuno · 2016-12-01 23:43:55

Oh, the sample count does differ: 20^2 (400) vs 35^2 (1225) samples, which explains the difference in render time. I had never noticed this before, though.
~1:30 seems like a 'valid' result with that card, finally
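A quick sanity check of that explanation, assuming the slower code.blender.org file is the 35^2 one and that render time scales roughly linearly with the sample count. The tile sizes also differ between the two runs, so this is only a rough comparison:

```python
# Rough check: does the sample count explain the render-time gap?
# Assumption: the slower code.blender.org file uses 35^2 samples and the
# blenderartists.org file 20^2 (Cycles' "Square Samples" squares the value).
slow_samples = 35 ** 2  # 1225
fast_samples = 20 ** 2  # 400

slow_time = 4 * 60 + 23  # 4 min 23 s, single tile, code.blender.org file
fast_time = 1 * 60 + 28  # 1 min 28 s, single tile, blenderartists.org file

sample_ratio = slow_samples / fast_samples
time_ratio = slow_time / fast_time

print(f"sample ratio: {sample_ratio:.2f}")
print(f"time ratio:   {time_ratio:.2f}")
```

The two ratios come out at about 3.06 and 2.99, which fits the sample count being the main difference between the two files.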

Dokter Bibber · 2016-12-02 08:20:15

So it is the sample count. I can't see that because the render settings are mostly corrupted.

And yes, amazing results. \o/
Thanks for your efforts with making this package.

iuno · 2016-12-09 04:12:39

Package updated to 16.50, released yesterday.

This update gave me a small performance improvement, but I haven't done much testing, tbh.

Dokter Bibber · 2016-12-09 17:18:50

At the moment I cannot do any testing.
I uninstalled Blender because I cannot do anything with it really, due to its severe UI glitches.

[off-topic]
I'm trying to set up my shader development environment like under Windows, but the open source driver is killing all hope so far.
Most of AMD's OCL SDKs (APP SDK) that I want to use require components from the Pro driver, so they are Ubuntu and Red Hat only. The installation requirements also list that. EDIT: It's on page 7 of the Getting Started Guide PDF linked from here: http://developer.amd.com/tools-and-sdks … g-app-sdk/
I've come full circle, and I've stopped going round and round.
I don't really start off with 3D modelers (I have little to no experience with them; the last one I used was 3DS Max 2009).
I only use ready-made, rigged models to start with, and shade them through standalone renderers or Unity.
[/off-topic]

Last edited by Dokter Bibber (2016-12-09 17:26:27)

ase1590 · 2017-01-12 03:07:24

Dokter Bibber wrote:

@iuno You're da man!
I installed your package opencl-amd.
By comparing screenshots in Google images with the corrupted menus in Blender, and estimated/blind clicking, I then managed to make the setting to let Blender use compute device OpenCL Ellesmere for rendering.
The first run took 7 minutes and 3 seconds. There was a message that compute engines had to be loaded and that it might take a few minutes the first time. (Sorry but the line was partially corrupted, and then disappeared.)
The second run took 5 minutes and 56 seconds. The message mentioned at the start of the first run wasn't shown.
Oh my what an improvement over nearly 40 minutes. It might get faster if I adjust the tile size like you suggested. I will do that later today and let you know. (If I manage to make the setting of course.)
If I can resolve the glitches, I can continue with writing shaders.
I'm finally seeing the potential of this card.
Thanks very much for this.
[rant]Bloody AMD![/rant]
Disclaimer : The glitches in the screenshots have nothing to do with this OCL driver. They were already present before it was installed.
First run :
https://s17.postimg.org/k9h8ff8h7/blender_78a_gpu_render_with_package_opencl_amd.png
Second run :
https://s17.postimg.org/kar68uaaz/blender_78a_gpu_render_with_package_opencl_amd.png

Please take a look at what I added to the blender wiki page about how to fix your graphical corruption.

Also, thank you so much, iuno, for this package. I didn't want to deal with amdgpu-pro but wanted the Blender OpenCL support, and this meets my needs perfectly. It also seems to be working fine for a quick glass bottle scene I whipped up. Thank you!

Edit: AMD is coming out with an OpenCL ProRender plugin for Blender that promises to be very accurate and fast, so this package is a great way to prepare for its release.
http://pro.radeon.com/en-us/radeon-pror … r-blender/

Last edited by ase1590 (2017-01-12 15:01:09)

Perry3D · 2017-01-27 16:05:48

Has anyone had success with this package and darktable? I get this with a GCN 1.0 card:

[opencl_init] opencl related configuration options:
[opencl_init] 
[opencl_init] opencl: 1
[opencl_init] opencl_library: ''
[opencl_init] opencl_memory_requirement: 768
[opencl_init] opencl_memory_headroom: 300
[opencl_init] opencl_device_priority: '*/!0,*/*/*'
[opencl_init] opencl_size_roundup: 16
[opencl_init] opencl_async_pixelpipe: 0
[opencl_init] opencl_synch_cache: 0
[opencl_init] opencl_number_event_handles: 25
[opencl_init] opencl_micro_nap: 1000
[opencl_init] opencl_use_pinned_memory: 0
[opencl_init] opencl_use_cpu_devices: 0
[opencl_init] opencl_avoid_atomics: 0
[opencl_init] opencl_enable_markesteijn: 1
[opencl_init] 
[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 2 devices
[opencl_init] device 0 `Hainan' supports image sizes of 16384 x 16384
[opencl_init] device 0 `Hainan' allows GPU memory allocations of up to 1733MB
[opencl_init] device 0: Hainan 
     GLOBAL_MEM_SIZE:          2504MB
     MAX_WORK_GROUP_SIZE:      256
     MAX_WORK_ITEM_DIMENSIONS: 3
     MAX_WORK_ITEM_SIZES:      [ 256 256 256 ]
     DRIVER_VERSION:           2264.10
     DEVICE_VERSION:           OpenCL 1.2 AMD-APP (2264.10)
[opencl_init] could not create command queue for device 0: -6
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
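The -6 in "could not create command queue for device 0: -6" is an OpenCL status code; in the Khronos CL/cl.h header, -6 is CL_OUT_OF_HOST_MEMORY. The name alone may not explain the underlying problem, but decoding it is a useful first step. A small Python lookup with a subset of the codes from cl.h (the helper name explain is hypothetical):

```python
import re

# Subset of OpenCL status codes, as defined in the Khronos CL/cl.h header.
CL_ERRORS = {
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -7: "CL_PROFILING_INFO_NOT_AVAILABLE",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -32: "CL_INVALID_PLATFORM",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# A negative integer at the very end of a log line, e.g. "...device 0: -6".
TRAILING_CODE = re.compile(r":\s*(-\d+)\s*$")


def explain(line):
    """Append the symbolic OpenCL error name to a log line that ends in a status code."""
    match = TRAILING_CODE.search(line)
    if not match:
        return line  # positive config values such as "...: 768" are left alone
    name = CL_ERRORS.get(int(match.group(1)), "unknown OpenCL status code")
    return f"{line.rstrip()} ({name})"


if __name__ == "__main__":
    print(explain("[opencl_init] could not create command queue for device 0: -6"))
```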

iuno · 2017-01-27 21:31:09

ase1590 wrote:

Also, thank you so much, iuno, for this package. I didn't want to deal with amdgpu-pro but wanted the Blender OpenCL support, and this meets my needs perfectly. It also seems to be working fine for a quick glass bottle scene I whipped up. Thank you!

You're welcome! Thanks for adding that section about DRI3/triple buffering to the wiki.

Perry3D wrote:

Has anyone had success with this package and darktable? I get this with a GCN 1.0 card:

IIRC not all GCN 1.0 cards are supported by AMD, but I'm not sure. What chip is it exactly? Could you provide a test case for darktable?

I updated the package to yesterday's release (16.60). If you are updating, you might need to pass '--cleanbuild' to makepkg. That should not happen again in the future.
Also, hybrid code popped up on the web that makes it possible to patch libdrm instead of having two versions around. I'll look into that when I have some time, but right now it is not much of a priority for me, tbh.

Last edited by iuno (2017-01-27 21:31:29)

Perry3D · 2017-01-28 09:07:24

I own a 280X:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X]

But maybe you gave me the hint: do I need another version of libdrm? Can someone point me to a how-to on using the free amdgpu stack?

Output of clinfo:

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.0 AMD-APP (2264.10)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 2
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon HD 7900 Series
  Device Topology:				 PCI[ B#1, D#0, F#0 ]
  Max compute units:				 5
  Max work items dimensions:			 3
    Max work items[0]:				 256
    Max work items[1]:				 256
    Max work items[2]:				 256
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 1020Mhz
  Address bits:					 32
  Max memory allocation:			 1849826304
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 2048
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 2669084672
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Max pipe arguments:				 0
  Max pipe active reservations:			 0
  Max pipe packet size:				 0
  Max global variable size:			 0
  Max global variable preferred total size:	 0
  Max read/write image args:			 0
  Max on device events:				 0
  Queue on device max size:			 0
  Max on device queues:				 0
  Queue on device preferred size:		 0
  SVM capabilities:				 
    Coarse grain buffer:			 No
    Fine grain buffer:				 No
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 No
    Profiling :					 No
  Platform ID:					 0x7f80062faad8
  Name:						 Hainan
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 2264.10
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (2264.10)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 


  Device Type:					 CL_DEVICE_TYPE_CPU
  Vendor ID:					 1002h
  Board name:					 
  Max compute units:				 8
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 1024
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 8
  Preferred vector width double:		 4
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 8
  Native vector width double:			 4
  Max clock frequency:				 3581Mhz
  Address bits:					 64
  Max memory allocation:			 4193436672
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 64
  Max image 2D width:				 8192
  Max image 2D height:				 8192
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 4096
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 32768
  Global memory size:				 16773746688
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Global
  Local memory size:				 32768
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 4193436672
  Max global variable size:			 1879048192
  Max global variable preferred total size:	 1879048192
  Max read/write image args:			 64
  Max on device events:				 0
  Queue on device max size:			 0
  Max on device queues:				 0
  Queue on device preferred size:		 0
  SVM capabilities:				 
    Coarse grain buffer:			 No
    Fine grain buffer:				 No
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 1
  Error correction support:			 0
  Unified memory for Host and Device:		 1
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 Yes
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 No
    Profiling :					 No
  Platform ID:					 0x7f80062faad8
  Name:						 Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
  Vendor:					 GenuineIntel
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 2264.10 (sse2,avx)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (2264.10)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event

iuno · 2017-01-28 19:11:33

No, the included libdrm from amdgpu-pro has it all. Patching libdrm would make this solution more elegant, not more functional.

Your outputs say something about Hainan, but 280X is definitely Tahiti (also: "Number of devices: 2"). Do you have a second GPU or is your CPU in fact an APU?

Last edited by iuno (2017-01-28 19:12:24)

Perry3D · 2017-01-28 19:44:42

iuno wrote:

No, the included libdrm from amdgpu-pro has it all. Patching libdrm would make this solution more elegant, not more functional.
Your outputs say something about Hainan, but 280X is definitely Tahiti (also: "Number of devices: 2"). Do you have a second GPU or is your CPU in fact an APU?

Ok, got it. The libdrm library is in the package.

I built a new kernel from git, and the device name is now correct. But the error is the same.

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.0 AMD-APP (2264.10)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 2
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon HD 7900 Series
  Device Topology:				 PCI[ B#1, D#0, F#0 ]
  Max compute units:				 16
  Max work items dimensions:			 3
    Max work items[0]:				 256
    Max work items[1]:				 256
    Max work items[2]:				 256
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 1020Mhz
  Address bits:					 32
  Max memory allocation:			 2069483520
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 2048
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 2913701888
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Max pipe arguments:				 0
  Max pipe active reservations:			 0
  Max pipe packet size:				 0
  Max global variable size:			 0
  Max global variable preferred total size:	 0
  Max read/write image args:			 0
  Max on device events:				 0
  Queue on device max size:			 0
  Max on device queues:				 0
  Queue on device preferred size:		 0
  SVM capabilities:				 
    Coarse grain buffer:			 No
    Fine grain buffer:				 No
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 No
    Profiling :					 No
  Platform ID:					 0x7f20692e1ad8
  Name:						 Tahiti
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 2264.10
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (2264.10)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 


  Device Type:					 CL_DEVICE_TYPE_CPU
  Vendor ID:					 1002h
  Board name:					 
  Max compute units:				 8
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 1024
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 8
  Preferred vector width double:		 4
  Native vector width char:			 16
  Native vector width short:			 8
  Native vector width int:			 4
  Native vector width long:			 2
  Native vector width float:			 8
  Native vector width double:			 4
  Max clock frequency:				 3700Mhz
  Address bits:					 64
  Max memory allocation:			 4193408000
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 64
  Max image 2D width:				 8192
  Max image 2D height:				 8192
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 16
  Max size of kernel argument:			 4096
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 32768
  Global memory size:				 16773632000
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Global
  Local memory size:				 32768
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 4193408000
  Max global variable size:			 1879048192
  Max global variable preferred total size:	 1879048192
  Max read/write image args:			 64
  Max on device events:				 0
  Queue on device max size:			 0
  Max on device queues:				 0
  Queue on device preferred size:		 0
  SVM capabilities:				 
    Coarse grain buffer:			 No
    Fine grain buffer:				 No
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 1
  Error correction support:			 0
  Unified memory for Host and Device:		 1
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 Yes
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 No
    Profiling :					 No
  Platform ID:					 0x7f20692e1ad8
  Name:						 Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
  Vendor:					 GenuineIntel
  Device OpenCL C version:			 OpenCL C 1.2 
  Driver version:				 2264.10 (sse2,avx)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 AMD-APP (2264.10)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event

The second device is my CPU.

Maybe someone can post their clinfo output so I can compare.
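Comparing clinfo dumps like the ones above by eye is tedious. A minimal Python sketch that parses the "Key: value" layout of this clinfo output into one dictionary per device and prints the fields that differ between two dumps. It assumes exactly the layout shown above (two-space indented keys, nested keys indented further); the column-aligned format without colons printed by the other common clinfo tool would need a different parser. Function names are hypothetical.

```python
import sys


def parse_clinfo(text):
    """Parse AMD APP SDK style clinfo output into one {key: value} dict per device.

    Nested keys (e.g. "Profiling" under "Queue on Host properties") are
    prefixed with their section so they don't overwrite each other.
    """
    devices, current, section = [], None, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        indent = len(raw) - len(raw.lstrip())
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Device Type":  # every device block starts here
            current, section = {}, ""
            devices.append(current)
        if current is None:
            continue  # platform lines before the first device
        if indent <= 2:
            section = ""  # back at top level
        if not sep or not value:
            section = key  # a header such as "SVM capabilities:"
            continue
        if indent > 2 and section:
            key = f"{section} / {key}"
        current[key] = value
    return devices


def diff_devices(a, b):
    """Yield (key, value_in_a, value_in_b) for every key whose value differs."""
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            yield key, a.get(key), b.get(key)


def first_gpu(devices):
    """Return the first device whose type is CL_DEVICE_TYPE_GPU."""
    return next(d for d in devices if d.get("Device Type") == "CL_DEVICE_TYPE_GPU")


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        old, new = parse_clinfo(f_old.read()), parse_clinfo(f_new.read())
    for key, before, after in diff_devices(first_gpu(old), first_gpu(new)):
        print(f"{key}: {before} -> {after}")
```

Run on the two GPU dumps above (saved as two text files), it would flag Name (Hainan vs Tahiti) and Max compute units (5 vs 16), among other fields.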

iuno · 2017-01-28 22:24:52

clinfo for Hawaii:

Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (2264.10)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 2
  Device Name                                     Hawaii
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2264.10)
  Driver Version                                  2264.10
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         AMD Radeon R9 200 Series
  Device Topology (AMD)                           PCI-E, 01:00.0
  Max compute units                               44
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1000MHz
  Graphics IP (AMD)                               7.2
  Device Partition                                (core)
    Max number of sub-devices                     44
    Supported partition types                     none specified
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              3969855488 (3.697GiB)
  Global free memory (AMD)                        3858248 (3.68GiB)
  Global memory channels (AMD)                    16
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           2820304896 (2.627GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max constant buffer size                        2820304896 (2.627GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1485631842879031925ns (Sat Jan 28 20:30:42 2017)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

[...]

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  AMD Accelerated Parallel Processing
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [AMD]
  clCreateContext(NULL, ...) [default]            Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Hawaii
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (2)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Hawaii
    Device Name                                   Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.10
  ICD loader Profile                              OpenCL 2.1

Have you tried using the full amdgpu-pro stack to see if the problem remains?

Last edited by iuno (2017-01-28 22:25:37)

Perry3D · 2017-01-29 11:49:23

Thank you for your log. I actually found something:
Your log:

Max constant buffer size                        2820304896 (2.627GiB)

And mine:

Max constant buffer size                        65536 (64KiB)

And the error in LuxMark is:

OpenCL ERROR: clCreateCommandQueue(-6)

which is

cl.h:#define CL_OUT_OF_HOST_MEMORY                       -6
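clinfo and LuxMark print raw numeric status codes, so decoding them means looking them up in cl.h as above. Here is a minimal sketch of that lookup in C. The values are the standard ones from the Khronos cl.h header (only a subset is listed), copied into a local table so it compiles without OpenCL headers installed. The function names `cl_status_name` and `cl_status_code` are hypothetical and not part of any OpenCL API.

```c
#include <stddef.h>
#include <string.h>

/* Subset of the status codes defined in the Khronos cl.h header.
 * Copied here instead of including <CL/cl.h> so the sketch builds
 * on a machine without OpenCL development headers. */
struct cl_status_entry {
    int code;
    const char *name;
};

static const struct cl_status_entry CL_STATUS_TABLE[] = {
    {0, "CL_SUCCESS"},
    {-1, "CL_DEVICE_NOT_FOUND"},
    {-2, "CL_DEVICE_NOT_AVAILABLE"},
    {-3, "CL_COMPILER_NOT_AVAILABLE"},
    {-4, "CL_MEM_OBJECT_ALLOCATION_FAILURE"},
    {-5, "CL_OUT_OF_RESOURCES"},
    {-6, "CL_OUT_OF_HOST_MEMORY"},
    {-11, "CL_BUILD_PROGRAM_FAILURE"},
    {-30, "CL_INVALID_VALUE"},
    {-31, "CL_INVALID_DEVICE_TYPE"},
    {-32, "CL_INVALID_PLATFORM"},
    {-33, "CL_INVALID_DEVICE"},
    {-34, "CL_INVALID_CONTEXT"},
    {-35, "CL_INVALID_QUEUE_PROPERTIES"},
    {-36, "CL_INVALID_COMMAND_QUEUE"},
};

#define CL_STATUS_COUNT (sizeof CL_STATUS_TABLE / sizeof CL_STATUS_TABLE[0])

/* Hypothetical helper: map a numeric status, such as the -6 in
 * "clCreateCommandQueue(-6)", to its cl.h name. */
const char *cl_status_name(int code)
{
    for (size_t i = 0; i < CL_STATUS_COUNT; i++) {
        if (CL_STATUS_TABLE[i].code == code)
            return CL_STATUS_TABLE[i].name;
    }
    return "UNKNOWN_CL_STATUS";
}

/* Hypothetical reverse lookup: cl.h name -> numeric code.
 * Returns 1 and stores the code in *code on success, 0 if the name is unknown. */
int cl_status_code(const char *name, int *code)
{
    for (size_t i = 0; i < CL_STATUS_COUNT; i++) {
        if (strcmp(CL_STATUS_TABLE[i].name, name) == 0) {
            *code = CL_STATUS_TABLE[i].code;
            return 1;
        }
    }
    return 0;
}
```

For example, `cl_status_name(-6)` returns `"CL_OUT_OF_HOST_MEMORY"`, and the `error -33` that clinfo prints for the Tahiti card later in the thread decodes to `CL_INVALID_DEVICE`.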

Does anyone know where I can file an upstream bug report?

I also tried the drm-next kernel. It doesn't help.

vaulteleven · 2017-07-13 11:00:13

I recently started using Blender and noticed that it was not using my R9 280X GPU for rendering. While searching for a solution, I found the wiki entry on installing opencl-amd from the AUR alongside the open-source drivers. I did that, but now Blender crashes every time I try to open the settings. The error it gives me:

Read new prefs: /home/vault/.config/blender/2.78/config/userpref.blend
amdgpu_device_initialize: DRM version is 2.49.0 but this driver is only compatible with 3.x.x.
Writing: /tmp/blender.crash.txt
Segmentation fault (core dumped)

clinfo gives me the same error, so I'm guessing I'm doing something majorly stupid here.

I'm using the standard kernel and amdgpu driver.

Perry3D · 2017-07-13 11:13:19

I think you didn't deactivate the radeon kernel module. The DRM version is 2.x for radeon and 3.x for amdgpu.

You have to deactivate it explicitly, or amdgpu will not be loaded.

/edit: https://wiki.archlinux.org/index.php/AM … 29_support
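The Blender error and the explanation above come down to a major-version check: radeon reports DRM 2.x, amdgpu reports 3.x, and the userspace amdgpu code refuses anything that is not 3.x. The C sketch below only illustrates that rule; it is not the actual libdrm or amdgpu driver code, and `parse_drm_version` and `drm_version_is_amdgpu` are hypothetical names. On a real system the version would come from the kernel (libdrm exposes it through `drmGetVersion`), not from a string.

```c
#include <stdio.h>

/* Parse a "major.minor.patch" string such as the "2.49.0" in the
 * Blender log. Returns 1 on success, 0 if the string does not match. */
int parse_drm_version(const char *s, int *major, int *minor, int *patch)
{
    return sscanf(s, "%d.%d.%d", major, minor, patch) == 3;
}

/* Hypothetical check mirroring the rule described in the thread:
 * only a DRM major version of 3 (the amdgpu kernel driver) is accepted.
 * Returns 1 if the version looks like amdgpu, 0 otherwise. */
int drm_version_is_amdgpu(const char *s)
{
    int major, minor, patch;
    if (!parse_drm_version(s, &major, &minor, &patch))
        return 0;
    return major == 3;
}
```

With this rule, `"2.49.0"` (radeon still bound to the card) is rejected, while `"3.10.0"`, the DRM version Clover reports for the Tahiti card further down the thread, is accepted.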

Last edited by Perry3D (2017-07-13 11:34:21)

vaulteleven · 2017-07-13 12:07:05

OK, very nice. Now the GDM login manager is smoother on login than before, and clinfo gives me information:

Number of platforms                               2
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (2348.3)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Extensions function suffix             AMD

  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 17.1.4
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 2
  Device Name                                     Tahiti
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2348.3)
  Driver Version                                  2348.3
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         AMD Radeon HD 7900 Series
  Device Topology (AMD)                           PCI-E, 01:00.0
  Max compute units                               16
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1020MHz
  Graphics IP (AMD)                               6.0
  Device Partition                                (core)
    Max number of sub-devices                     16
    Supported partition types                     none specified
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    32, Little-Endian
  Global memory size                              2991472640 (2.786GiB)
  Global free memory (AMD)                        <printDeviceInfo:72: get number of CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>
  Global memory channels (AMD)                    12
  Global memory banks per channel (AMD)           4
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           2060473344 (1.919GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1499947070229582478ns (Thu Jul 13 13:57:50 2017)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  No
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

  Device Name                                     Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2348.3)
  Driver Version                                  2348.3 (sse2,avx)
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     CPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         
  Device Topology (AMD)                           (n/a)
  Max compute units                               4
  Max clock frequency                             3698MHz
  Device Partition                                (core, cl_ext_device_fission)
    Max number of sub-devices                     4
    Supported partition types                     equally, by counts, by affinity domain
    Supported affinity domains                    L3 cache, L2 cache, L1 cache, next partitionable
    Supported partition types (ext)               equally, by counts, by affinity domain
    Supported affinity domains (ext)              L3 cache, L2 cache, L1 cache, next fissionable
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              1
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 4 / 4        (n/a)
    float                                                8 / 8       
    double                                               4 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              16679866368 (15.53GiB)
  Error Correction support                        No
  Max memory allocation                           4169966592 (3.884GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768 (32KiB)
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     8
  Max size of kernel argument                     4096 (4KiB)
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1499947070229582478ns (Thu Jul 13 13:57:50 2017)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            65536 (64KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event 

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD TAHITI (DRM 3.10.0 / 4.11.9-1-ARCH, LLVM 4.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 17.1.4
  Driver Version                                  17.1.4
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               32
  Max clock frequency                             1020MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Compiler Available                              Yes
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              3220066304 (2.999GiB)
  Error Correction support                        No
  Max memory allocation                           2254046412 (2.099GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        2147483647 (2GiB)
  Max number of constant args                     16
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  AMD Accelerated Parallel Processing
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [AMD]
  clCreateContext(NULL, ...) [default]            Success [AMD]
  clCreateContext(NULL, ...) [other]              Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Tahiti
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (2)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   Tahiti
    Device Name                                   Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

I just went back to Arch after not liking Fedora much, and apparently I forgot to install amdgpu. Everything worked with GNOME on Wayland, though, so I didn't give it any thought.

Now I can select the GPU in Blender, but when I try to render, Blender reports CL_OUT_OF_HOST_MEMORY. clinfo sees the 3 GB of VRAM my card has, so I don't know what the problem is.

EDIT: I guess this line is the problem

  Global free memory (AMD)                        <printDeviceInfo:72: get number of CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>
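The "3 GB" figure and clinfo's numbers use different units: the values clinfo prints in parentheses are binary (1 GiB = 2^30 bytes), while card capacities are usually quoted in decimal gigabytes. The dump also lists "Max memory allocation" separately from global memory size; in OpenCL terms that is CL_DEVICE_MAX_MEM_ALLOC_SIZE, the cap on any single buffer. The small C sketch below converts the raw byte counts so they can be compared. The helper names are hypothetical, and it only makes the numbers comparable; it is not a diagnosis of the CL_OUT_OF_HOST_MEMORY error.

```c
#include <stdio.h>

/* Binary-unit labels; each step is a factor of 1024. */
static const char *const BIN_UNITS[] = {"B", "KiB", "MiB", "GiB", "TiB"};

/* Scale a byte count down by powers of 1024 until it is below 1024
 * (or the label list runs out). Returns the scaled value and stores
 * the matching unit label in *unit. */
double scale_binary(unsigned long long bytes, const char **unit)
{
    double value = (double)bytes;
    size_t i = 0;
    while (value >= 1024.0 && i + 1 < sizeof BIN_UNITS / sizeof BIN_UNITS[0]) {
        value /= 1024.0;
        i++;
    }
    *unit = BIN_UNITS[i];
    return value;
}

/* Decimal gigabytes (10^9 bytes), the unit usually printed on the box. */
double to_decimal_gb(unsigned long long bytes)
{
    return (double)bytes / 1e9;
}

/* Print one line: raw bytes, binary-unit value, and decimal GB. */
void print_size(const char *label, unsigned long long bytes)
{
    const char *unit;
    double v = scale_binary(bytes, &unit);
    printf("%-24s %llu bytes = %.4g %s = %.3f GB\n",
           label, bytes, v, unit, to_decimal_gb(bytes));
}
```

Applied to the Tahiti values, 2991472640 bytes of global memory works out to about 2.786 GiB (roughly 2.99 GB), and the 2060473344-byte per-allocation limit to about 1.919 GiB.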

Last edited by vaulteleven (2017-07-13 12:12:08)
