Hi there,
I'm using the free amdgpu stack: amdgpu, Mesa radeonsi, radv and vdpau. Sadly, clover OpenCL support is not sufficient, so I use the proprietary driver for that. However, I don't want to use the full hybrid stack and give up Mesa for it, so I created this package:
https://github.com/grmat/amdgpocl
Is this something anybody wants to see in the AUR? I know there are already packages for amdgpu-pro, but this is really just OpenCL (+libdrm).
I have not done much testing, but Blender Cycles and LuxMark work and run as fast as on Windows.
Last edited by iuno (2016-12-15 01:26:34)
If there isn't a package there already, submit yours. I suppose more people will use it if it is available, and even if only you use it, you can consider it a backup.
R00KIE
I would like to know whether AMD's OCL implementation is an ICD, because I have the ocl-icd package installed: https://www.archlinux.org/packages/extr … 4/ocl-icd/
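On the ICD question: as far as I know, the ocl-icd loader finds vendor implementations through small text files in /etc/OpenCL/vendors, each naming one library to load (the AMD platform also advertises cl_khr_icd in the clinfo output later in this thread). A minimal sketch that lists what the loader would see, assuming that default directory and that OCL_ICD_VENDORS, when set, points at a directory; the function name is made up for illustration:

```python
import glob
import os


def list_icd_vendors(vendors_dir=None):
    """Return {icd_file_name: library_name} for every *.icd file found.

    Assumption: each .icd file is plain text whose first non-empty line is
    the vendor library the loader should dlopen(). The directory defaults
    to OCL_ICD_VENDORS (if set to a directory) or /etc/OpenCL/vendors.
    """
    if vendors_dir is None:
        vendors_dir = os.environ.get("OCL_ICD_VENDORS", "/etc/OpenCL/vendors")
    vendors = {}
    for path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(path) as fh:
            lines = [ln.strip() for ln in fh if ln.strip()]
        if lines:
            vendors[os.path.basename(path)] = lines[0]
    return vendors


if __name__ == "__main__":
    # Prints nothing if the directory does not exist or holds no .icd files.
    for icd, lib in list_icd_vendors().items():
        print(f"{icd}: {lib}")
```

If an AMD entry shows up next to other vendors' files, the implementation is being picked up through the ICD loader rather than replacing libOpenCL.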
I would certainly be interested if this speeds up my GPU renders.
At the moment the GPU version of the Cycles Benchmark (the blend file with the 2 BMWs) at the code.blender.org website is taking almost 40 minutes with my RX 480 8GB with open source drivers.
While for others with older and lesser cards it takes 5 - 7 minutes.
(If the benchmark even renders on the GPU, that is. I open the GPU version, but that's no guarantee, I guess.)
The big BUT though, is that I first want/have to resolve the graphics glitches that I'm getting with the open source drivers in Blender.
This happened on a fresh installation of Gnome and KDE, and still happens on a fresh installation of Deepin.
So it is definitely driver/kernel related. I'm on kernel 4.8.11 and amdgpu 1.2.0 now and the graphical glitches persist.
Even on a finished render I can see 2 artifacts, each time, at exactly the same spot. So I don't know whether the feedback I could give would be usable or taken seriously for your testing.
I'm not asking for help with the graphical glitches, because I will open another thread for that.
Last edited by Dokter Bibber (2016-11-30 22:25:23)
Yes, thanks for the hint; I actually forgot to add ocl-icd as a dependency.
The BMW scene should render in under 2 minutes on your 480 with this package. Remember to adjust the tile size in Blender. In this specific benchmark one large tile works best.
To enable GPU rendering in Cycles, you have to go to
File > User Preferences > System
Set "Compute Device" from "None" to "OpenCL" [your card should then appear in the drop-down menu]
Close User Preferences, move to the "Render" tab in the sidebar, select "Device: GPU Compute"
I adjusted the name of the package to fit the existing opencl-mesa and opencl-nvidia packages.
[off-topic]
But I have to resolve the graphics glitches in the Blender UI first. Before that I can't make any settings. That's also why I can't check if rendering is done with the GPU or CPU. It sucks.
Clicking the top level menu items corrupts the Blender UI. So much that I can't read the menu options.
I can open Blend files because I use Ctrl+O to get the file open dialog. Count the position of the file in the list, and then blindly tap the down arrow (causes immediate UI corruption) to get there and hit enter.
Also, just moving the mouse over certain parts of the Blender UI randomly corrupts the content of the Blender window.
I can't even change the dpi of the Blender UI. So almost all text and icons are tiny.
Screenshots here : https://bbs.archlinux.org/viewtopic.php?id=220172
[/off-topic]
Last edited by Dokter Bibber (2016-12-01 10:03:50)
Yes, I'm aware of this bug. I can't help you with that but maybe provide some files to render from CLI later, if you like.
@iuno You're da man!
I installed your package opencl-amd.
By comparing screenshots in Google Images with the corrupted menus in Blender, and estimated/blind clicking, I then managed to make the setting to let Blender use compute device OpenCL Ellesmere for rendering.
The first run took 7 minutes and 3 seconds. There was a message that compute engines had to be loaded and that it might take a few minutes the first time. (Sorry but the line was partially corrupted, and then disappeared.)
The second run took 5 minutes and 56 seconds. The message mentioned at the start of the first run wasn't shown.
Oh my what an improvement over nearly 40 minutes. It might get faster if I adjust the tile size like you suggested. I will do that later today and let you know. (If I manage to make the setting of course.)
If I can resolve the glitches, I can continue with writing shaders.
I'm finally seeing the potential of this card.
Thanks very much for this.
[rant]Bloody AMD![/rant]
Disclaimer : The glitches in the screenshots have nothing to do with this OCL driver. They were already present before it was installed.
First run :
Second run :
Last edited by Dokter Bibber (2016-12-01 11:12:44)
Thanks for reporting back.
The first run takes longer because of kernel compilation time. The kernels get cached, so subsequent runs are faster. Remember to adjust the tile size. And if you switch back to CPU rendering, choose 16*16. BTW, may I ask which CPU you use? >40 min seems very slow.
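The caching idea can be illustrated with a toy compile-once cache. This is a hypothetical sketch, not how the AMD driver or Cycles actually store compiled kernels: the key is a hash of kernel source plus build options, so an unchanged kernel is compiled once and reused afterwards. As a rough check against the runs reported above, 7:03 vs. 5:56 is 423 s vs. 356 s, so about 67 s of the first run can be attributed to one-off work such as compilation, assuming nothing else changed between runs.

```python
import hashlib


class KernelCache:
    """Toy compile-once cache keyed on kernel source + build options.

    Hypothetical illustration only; compile_fn stands in for an
    expensive real build step (e.g. an OpenCL program build).
    """

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}
        self.compilations = 0

    def get(self, source, options=""):
        key = hashlib.sha256((options + "\0" + source).encode()).hexdigest()
        if key not in self._store:
            # First request for this source/options pair: pay the compile cost.
            self._store[key] = self._compile(source, options)
            self.compilations += 1
        # Later requests: reuse the stored binary without recompiling.
        return self._store[key]
```

Changing the build options produces a new key, which is one reason a driver or application update can trigger a slow "first run" again.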
If you want readable output, you can start blender from the terminal.
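Since the UI is hard to use here, rendering from the terminal also avoids the menus entirely. A small sketch that builds such a command line, assuming the Blender 2.7x flags -b (background), --python-expr and -f (render frame), and assuming OpenCL has already been chosen as the compute device in User Preferences (the expression only switches the scene to GPU Compute). The helper and the file name are hypothetical:

```python
import shlex


def cycles_gpu_render_cmd(blend_file, frame=1, blender="blender"):
    """Build a headless Blender command rendering one frame with Cycles on the GPU.

    Assumes Blender 2.7x CLI flags and that the OpenCL compute device is
    already enabled in User Preferences.
    """
    expr = "import bpy; bpy.context.scene.cycles.device = 'GPU'"
    # Blender processes arguments left to right: load the file, run the
    # expression, and only then start the render with -f.
    return [blender, "-b", blend_file, "--python-expr", expr, "-f", str(frame)]


if __name__ == "__main__":
    # "bmw27_gpu.blend" is a placeholder file name.
    print(shlex.join(cycles_gpu_render_cmd("bmw27_gpu.blend")))
```

The list can be passed straight to subprocess.run(), and the render progress and timing then appear as readable text in the terminal.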
Last edited by iuno (2016-12-01 11:23:56)
It's getting better and better. (I couldn't wait.)
I managed to adjust the tile size. (Same method.)
First to 512 x 512 (from what I think was 256 x 256) which resulted in 4 tiles (2 big tiles and 2 tiny tiles). Then 1024 x 576 (1 huge tile).
512 x 512 : 4 min 46 sec
1024x756 : 4 min 23 sec
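The tile counts above follow from simple ceiling division. A sketch assuming a 960 x 540 render resolution (an assumption; the thread only mentions 960 x 540 as a tile size) and that edge tiles are clipped to the frame:

```python
import math


def tile_grid(width, height, tile_w, tile_h):
    """Return (columns, rows, tile sizes) for a frame cut into tiles.

    Ceiling-division model: tiles on the right/bottom edges are clipped
    to the frame, which is where small leftover tiles come from.
    """
    cols = math.ceil(width / tile_w)
    rows = math.ceil(height / tile_h)
    sizes = [
        (min(tile_w, width - c * tile_w), min(tile_h, height - r * tile_h))
        for r in range(rows)
        for c in range(cols)
    ]
    return cols, rows, sizes
```

Under that assumption, 512 x 512 gives a 2 x 2 grid of 512x512, 448x512, 512x28 and 448x28 tiles (two big, two tiny), while 1024 x 576 covers the whole frame with a single tile, which suits a GPU that wants as much work per tile as possible.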
This is on my desktop, which is ancient (Intel Q9550), but it will be replaced if AMD Zen is not too disappointing.
I don't have Blender on my laptop because I use that solely for work. And that involves Windows.
No screenshot of 512 x 512 tile size. The Blender window turned completely black when I started Shutter.
With one 1024 x 756 tile :
Last edited by Dokter Bibber (2016-12-01 12:09:52)
I don't know what the difference is between the BMW GPU blend file from code.blender.org and the blend file from blenderartists.org.
But I'm getting even faster render times with the file from blenderartists.org.
I used the tile setting (960 x 540) from this post : https://blenderartists.org/forum/showth … ost3072292 and got a wicked :
960 x 540 : 01 min 28 sec (see screenshot).
The fastest I can get from the code.blender.org blend file is in my previous post (4 min 23 sec).
Thanks for making this package.
GPU render with blend file from blenderartists.org 01:28:72 :
CPU render time went down to 29:37:60 with the 16 x 16 tile size that you suggested. This is with the CPU blend file from code.blender.org :
Oh, the sample counts do differ: 20(^2) vs. 35(^2) samples. That explains the difference in computing time. I had never noticed this before, though.
~1:30 seems like a 'valid' result with that card, finally
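A quick sanity check of that explanation, assuming the code.blender.org file uses 35^2 samples and the blenderartists.org file 20^2, and that render time scales roughly linearly with the sample count. It compares the sample ratio with the ratio of the two best GPU times reported above (4:23 and 01:28.72):

```python
def to_seconds(mm_ss):
    """Convert 'MM:SS' or 'MM:SS.hh' into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + float(seconds)


# Assumption: 35^2 samples in the slower file, 20^2 in the faster one.
samples_ratio = 35**2 / 20**2
time_ratio = to_seconds("4:23") / to_seconds("1:28.72")
print(f"samples ratio {samples_ratio:.2f}, time ratio {time_ratio:.2f}")
```

Both ratios come out at roughly 3, which is consistent with the sample count accounting for most of the gap.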
So it is the sample count. I can't see that because the render settings are mostly corrupted.
And yes, amazing results. \o/
Thanks for your efforts with making this package.
Package updated to 16.50, which was released yesterday.
This update gave me a small performance improvement, but I haven't done much testing, tbh.
At the moment I cannot do any testing.
I uninstalled Blender because I cannot do anything with it really, due to its severe UI glitches.
[off-topic]
I'm trying to set up my shader development environment like under Windows, but the open-source driver is killing all hope so far.
Most of the AMD OCL SDKs (APP SDK) that I want to use require components from the Pro driver, which means Ubuntu and Red Hat only. The installation requirements say so as well. EDIT: It's on page 7 of the Getting Started Guide PDF linked from here: http://developer.amd.com/tools-and-sdks … g-app-sdk/
I've come full circle, and I've stopped going round and round.
I don't really start off with 3D modelers (I have little to no experience with them; the last one I used was 3DS Max 2009).
I only use ready made and rigged models to start with. And shade them through standalone renderers or Unity.
[/off-topic]
Last edited by Dokter Bibber (2016-12-09 17:26:27)
@iuno You're da man!
I installed your package opencl-amd.
By comparing screenshots in Google Images with the corrupted menus in Blender, and estimated/blind clicking, I then managed to make the setting to let Blender use compute device OpenCL Ellesmere for rendering.
The first run took 7 minutes and 3 seconds. There was a message that compute engines had to be loaded and that it might take a few minutes the first time. (Sorry but the line was partially corrupted, and then disappeared.)
The second run took 5 minutes and 56 seconds. The message mentioned at the start of the first run wasn't shown.
Oh my what an improvement over nearly 40 minutes. It might get faster if I adjust the tile size like you suggested. I will do that later today and let you know. (If I manage to make the setting of course.)
If I can resolve the glitches, I can continue with writing shaders.
I'm finally seeing the potential of this card.
Thanks very much for this.
[rant]Bloody AMD![/rant]
Disclaimer : The glitches in the screenshots have nothing to do with this OCL driver. They were already present before it was installed.
First run :
https://s17.postimg.org/k9h8ff8h7/blender_78a_gpu_render_with_package_opencl_amd.png
Second run :
https://s17.postimg.org/kar68uaaz/blender_78a_gpu_render_with_package_opencl_amd.png
Please take a look at what I added to the blender wiki page about how to fix your graphical corruption.
Also, thank you so much, Iuno, for this package. I didn't want to deal with amdgpu-pro but wanted the Blender OpenCL support, and this meets my needs perfectly. It also seems to be working fine for a quick glass bottle scene I whipped up. Thank you!
Edit: AMD is coming out with an OpenCL ProRender plugin for Blender that promises to be very accurate and fast, so this package is a great way to prepare for its release.
http://pro.radeon.com/en-us/radeon-pror … r-blender/
Last edited by ase1590 (2017-01-12 15:01:09)
Has anyone had success with this package and darktable? I get this with a GCN 1.0 card:
[opencl_init] opencl related configuration options:
[opencl_init]
[opencl_init] opencl: 1
[opencl_init] opencl_library: ''
[opencl_init] opencl_memory_requirement: 768
[opencl_init] opencl_memory_headroom: 300
[opencl_init] opencl_device_priority: '*/!0,*/*/*'
[opencl_init] opencl_size_roundup: 16
[opencl_init] opencl_async_pixelpipe: 0
[opencl_init] opencl_synch_cache: 0
[opencl_init] opencl_number_event_handles: 25
[opencl_init] opencl_micro_nap: 1000
[opencl_init] opencl_use_pinned_memory: 0
[opencl_init] opencl_use_cpu_devices: 0
[opencl_init] opencl_avoid_atomics: 0
[opencl_init] opencl_enable_markesteijn: 1
[opencl_init]
[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 2 devices
[opencl_init] device 0 `Hainan' supports image sizes of 16384 x 16384
[opencl_init] device 0 `Hainan' allows GPU memory allocations of up to 1733MB
[opencl_init] device 0: Hainan
GLOBAL_MEM_SIZE: 2504MB
MAX_WORK_GROUP_SIZE: 256
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_ITEM_SIZES: [ 256 256 256 ]
DRIVER_VERSION: 2264.10
DEVICE_VERSION: OpenCL 1.2 AMD-APP (2264.10)
[opencl_init] could not create command queue for device 0: -6
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
Also, thank you so much, Iuno, for this package. I didn't want to deal with amdgpu-pro but wanted the Blender OpenCL support, and this meets my needs perfectly. It also seems to be working fine for a quick glass bottle scene I whipped up. Thank you!
You're welcome. Thanks for adding that section about DRI3/triple buffering to the wiki.
Has anyone had success with this package and darktable? I get this with a GCN 1.0 card:
IIRC not all GCN 1.0 cards are supported by AMD, but I'm not sure. What chip is it exactly? Could you provide a test case for darktable?
I updated the package to yesterday's release (16.60). If you are updating, you might need to pass '--cleanbuild' to makepkg. That should not happen again in the future.
Also, hybrid code popped up on the web, making it possible to patch libdrm instead of having two versions around. I'll look into that when I have some time. But right now it is not of much priority for me, tbh.
Last edited by iuno (2017-01-27 21:31:29)
I own a 280X:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X]
But maybe you gave me a hint: do I need another version of libdrm? Can someone point me to a how-to on using the free amdgpu stack?
Output of clinfo:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2264.10)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon HD 7900 Series
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 5
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1020Mhz
Address bits: 32
Max memory allocation: 1849826304
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 2669084672
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f80062faad8
Name: Hainan
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2264.10
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2264.10)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 3581Mhz
Address bits: 64
Max memory allocation: 4193436672
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 16773746688
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 4193436672
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f80062faad8
Name: Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 2264.10 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2264.10)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
No, the included libdrm from amdgpu-pro has it all. Patching libdrm would make this solution more elegant, not more functional.
Your outputs say something about Hainan, but 280X is definitely Tahiti (also: "Number of devices: 2"). Do you have a second GPU or is your CPU in fact an APU?
Last edited by iuno (2017-01-28 19:12:24)
Ok, got it. The libdrm library is in the package.
I built a new kernel from git and the device name is now correct, but the error is the same.
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2264.10)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon HD 7900 Series
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 16
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1020Mhz
Address bits: 32
Max memory allocation: 2069483520
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 2913701888
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f20692e1ad8
Name: Tahiti
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2264.10
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2264.10)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 3700Mhz
Address bits: 64
Max memory allocation: 4193408000
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 16773632000
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 4193408000
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f20692e1ad8
Name: Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 2264.10 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2264.10)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
The second device is my CPU.
Maybe someone can post their clinfo output so I can compare.
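Comparing two of these dumps by eye is tedious. A sketch that splits the older "Key: value" clinfo format used in the two dumps above into one dict per device (a new device starts at each "Device Type:" line) and reports the properties that differ. It is a hypothetical helper, does not handle the newer whitespace-aligned clinfo format, and keeps only the last value when a key repeats within a device:

```python
def parse_clinfo_devices(text):
    """Split old-style 'Key: value' clinfo output into one dict per device."""
    devices = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # headings like 'Single precision floating point capability'
        key, value = key.strip(), value.strip()
        if key == "Device Type":
            devices.append({})
        if devices:  # platform lines before the first device are ignored
            devices[-1][key] = value
    return devices


def diff_devices(old, new):
    """Return {key: (old, new)} for keys present in both dicts whose values differ."""
    return {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
```

Fed the first two dumps in this thread, the GPU entries would differ in fields such as Name (Hainan vs. Tahiti) and Max compute units.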
clinfo for Hawaii:
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.0 AMD-APP (2264.10)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Extensions function suffix AMD
Platform Name AMD Accelerated Parallel Processing
Number of devices 2
Device Name Hawaii
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2264.10)
Driver Version 2264.10
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Board Name (AMD) AMD Radeon R9 200 Series
Device Topology (AMD) PCI-E, 01:00.0
Max compute units 44
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1000MHz
Graphics IP (AMD) 7.2
Device Partition (core)
Max number of sub-devices 44
Supported partition types none specified
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 3969855488 (3.697GiB)
Global free memory (AMD) 3858248 (3.68GiB)
Global memory channels (AMD) 16
Global memory banks per channel (AMD) 16
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 2820304896 (2.627GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max constant buffer size 2820304896 (2.627GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1485631842879031925ns (Sat Jan 28 20:30:42 2017)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
[...]
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [AMD]
clCreateContext(NULL, ...) [default] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Hawaii
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (2)
Platform Name AMD Accelerated Parallel Processing
Device Name Hawaii
Device Name Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.10
ICD loader Profile OpenCL 2.1
Have you tried using the full amdgpu-pro stack to see if the problem remains?
Last edited by iuno (2017-01-28 22:25:37)
Thank you for your log. I actually found something:
Your log:
Max constant buffer size 2820304896 (2.627GiB)
And mine:
Max constant buffer size 65536 (64KiB)
And the error in luxmark is
OpenCL ERROR: clCreateCommandQueue(-6)
which is
cl.h:#define CL_OUT_OF_HOST_MEMORY -6
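The same lookup can be done directly on log lines like the darktable and LuxMark ones above. A sketch with a subset of the error codes from the Khronos cl.h header; the helper simply names the last negative integer in the line, which is an assumption that fits these two messages but not every log format:

```python
import re

# Subset of OpenCL error codes as defined in the Khronos cl.h header.
CL_ERRORS = {
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
}


def decode_cl_error(log_line):
    """Return (code, name) for the last negative integer in a log line, or None."""
    codes = re.findall(r"-\d+", log_line)
    if not codes:
        return None
    code = int(codes[-1])
    return code, CL_ERRORS.get(code, "unknown (not in this table)")
```

Both "could not create command queue for device 0: -6" and "clCreateCommandQueue(-6)" decode to CL_OUT_OF_HOST_MEMORY, so darktable and LuxMark fail at the same call.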
Does anyone know where I can file an upstream bug report?
I also tried the drm-next kernel; it doesn't help.
Offline
I recently started using Blender and noticed that it was not using my R9 280X GPU for rendering. While searching for a solution I found the wiki entry on installing opencl-amd from the AUR alongside the open source drivers. I did that, but now Blender crashes every time I try to open the settings. The error it gives me:
Read new prefs: /home/vault/.config/blender/2.78/config/userpref.blend
amdgpu_device_initialize: DRM version is 2.49.0 but this driver is only compatible with 3.x.x.
Writing: /tmp/blender.crash.txt
Segmentation fault (core dumped)
clinfo gives me the same error, so I'm guessing I'm doing something majorly stupid here.
I'm using the standard kernel and amdgpu driver.
Offline
I think you didn't deactivate the radeon kernel module. The DRM version for radeon is 2.x, and for amdgpu it is 3.x.
You have to deactivate radeon explicitly, otherwise amdgpu will not be loaded.
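The rule of thumb above can be written as a tiny Python check. This is a sketch assuming only what is stated in this thread: radeon reports DRM 2.x, amdgpu reports DRM 3.x, and the proprietary OpenCL userspace refuses anything other than 3.x ("only compatible with 3.x.x"). Both function names are hypothetical.

```python
def kernel_driver_for_drm(version: str) -> str:
    """Guess which kernel DRM driver reported `version` (e.g. "2.49.0")."""
    major = int(version.split(".", 1)[0])
    # Rule of thumb from this thread: radeon -> 2.x, amdgpu -> 3.x.
    return {2: "radeon", 3: "amdgpu"}.get(major, "unknown")


def amdgpu_userspace_accepts(version: str) -> bool:
    """True if the DRM version comes from amdgpu, as the AMD userspace requires."""
    return kernel_driver_for_drm(version) == "amdgpu"


if __name__ == "__main__":
    for v in ("2.49.0", "3.10.0"):
        print(v, kernel_driver_for_drm(v), amdgpu_userspace_accepts(v))
```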
/edit: https://wiki.archlinux.org/index.php/AM … 29_support
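Tahiti is a Southern Islands (GCN 1.0) chip, which clinfo reports as "Graphics IP (AMD) 6.0". For these chips radeon is the default driver; one common way to hand them to amdgpu is the kernel parameters `radeon.si_support=0 amdgpu.si_support=1` (this assumes a kernel built with amdgpu SI support). The Python sketch below only checks whether a kernel command line contains both settings; `si_handoff_requested` is a hypothetical helper.

```python
from pathlib import Path


def si_handoff_requested(cmdline: str) -> bool:
    """True if the command line disables SI in radeon and enables it in amdgpu."""
    # Collect key=value tokens; flags without '=' (e.g. "rw") are ignored.
    params = dict(tok.split("=", 1) for tok in cmdline.split() if "=" in tok)
    return (
        params.get("radeon.si_support") == "0"
        and params.get("amdgpu.si_support") == "1"
    )


if __name__ == "__main__":
    proc = Path("/proc/cmdline")  # only present on Linux
    if proc.exists():
        print(si_handoff_requested(proc.read_text()))
```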
Last edited by Perry3D (2017-07-13 11:34:21)
Offline
OK, very nice: the GDM login is now smoother than before, and clinfo gives me information:
Number of platforms 2
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.0 AMD-APP (2348.3)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Extensions function suffix AMD
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 17.1.4
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name AMD Accelerated Parallel Processing
Number of devices 2
Device Name Tahiti
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2348.3)
Driver Version 2348.3
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Available Yes
Device Profile FULL_PROFILE
Device Board Name (AMD) AMD Radeon HD 7900 Series
Device Topology (AMD) PCI-E, 01:00.0
Max compute units 16
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1020MHz
Graphics IP (AMD) 6.0
Device Partition (core)
Max number of sub-devices 16
Supported partition types none specified
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Compiler Available Yes
Linker Available Yes
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 32, Little-Endian
Global memory size 2991472640 (2.786GiB)
Global free memory (AMD) <printDeviceInfo:72: get number of CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>
Global memory channels (AMD) 12
Global memory banks per channel (AMD) 4
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 2060473344 (1.919GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max constant buffer size 65536 (64KiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1499947070229582478ns (Thu Jul 13 13:57:50 2017)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) No
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device Name Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
Device Vendor GenuineIntel
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2348.3)
Driver Version 2348.3 (sse2,avx)
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Available Yes
Device Profile FULL_PROFILE
Device Board Name (AMD)
Device Topology (AMD) (n/a)
Max compute units 4
Max clock frequency 3698MHz
Device Partition (core, cl_ext_device_fission)
Max number of sub-devices 4
Supported partition types equally, by counts, by affinity domain
Supported affinity domains L3 cache, L2 cache, L1 cache, next partitionable
Supported partition types (ext) equally, by counts, by affinity domain
Supported affinity domains (ext) L3 cache, L2 cache, L1 cache, next fissionable
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Compiler Available Yes
Linker Available Yes
Preferred work group size multiple 1
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 4 / 4 (n/a)
float 8 / 8
double 4 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 16679866368 (15.53GiB)
Error Correction support No
Max memory allocation 4169966592 (3.884GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 32768 (32KiB)
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 64
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 65536 (64KiB)
Max number of constant args 8
Max size of kernel argument 4096 (4KiB)
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1499947070229582478ns (Thu Jul 13 13:57:50 2017)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 65536 (64KiB)
Built-in kernels
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
Platform Name Clover
Number of devices 1
Device Name AMD TAHITI (DRM 3.10.0 / 4.11.9-1-ARCH, LLVM 4.0.1)
Device Vendor AMD
Device Vendor ID 0x1002
Device Version OpenCL 1.1 Mesa 17.1.4
Driver Version 17.1.4
Device OpenCL C Version OpenCL C 1.1
Device Type GPU
Device Available Yes
Device Profile FULL_PROFILE
Max compute units 32
Max clock frequency 1020MHz
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Compiler Available Yes
Preferred work group size multiple 64
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 0 (n/a)
float 4 / 4
double 2 / 2 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 3220066304 (2.999GiB)
Error Correction support No
Max memory allocation 2254046412 (2.099GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type None
Image support No
Local memory type Local
Local memory size 32768 (32KiB)
Max constant buffer size 2147483647 (2GiB)
Max number of constant args 16
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Profiling timer resolution 0ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [AMD]
clCreateContext(NULL, ...) [default] Success [AMD]
clCreateContext(NULL, ...) [other] Success [MESA]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Tahiti
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (2)
Platform Name AMD Accelerated Parallel Processing
Device Name Tahiti
Device Name Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
I just came back to Arch after not liking Fedora much, and apparently I forgot to install amdgpu. Everything worked with GNOME on Wayland, though, so I didn't give it any thought.
Now I can select the GPU in Blender, but when I try to render, Blender reports CL_OUT_OF_HOST_MEMORY. clinfo sees the 3 GB of VRAM my card has, so I don't know what the problem is.
EDIT: I suspect this line is the problem:
Global free memory (AMD) <printDeviceInfo:72: get number of CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>
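For what it's worth, error -33 is `CL_INVALID_DEVICE` in the Khronos `cl.h` header, so clinfo is reporting that the AMD-specific free-memory query was rejected for this device; the log alone does not show whether that is related to Blender's CL_OUT_OF_HOST_MEMORY. A minimal Python sketch that extracts the property name and error code from such a clinfo line, assuming the exact format shown above (`parse_clinfo_error` is a hypothetical helper):

```python
import re
from typing import Optional, Tuple

# Based only on the clinfo line quoted in this thread:
#   <printDeviceInfo:72: get number of CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>
CLINFO_ERROR = re.compile(r"get number of (?P<param>\w+)\s*:\s*error (?P<code>-?\d+)")


def parse_clinfo_error(line: str) -> Optional[Tuple[str, int]]:
    """Return (queried property, OpenCL error code), or None if no error is reported."""
    m = CLINFO_ERROR.search(line)
    if m is None:
        return None
    return m.group("param"), int(m.group("code"))


if __name__ == "__main__":
    line = ("Global free memory (AMD) <printDeviceInfo:72: get number of "
            "CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>")
    print(parse_clinfo_error(line))
```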
Last edited by vaulteleven (2017-07-13 12:12:08)
Offline