#1 2014-01-08 14:51:49

snack
Member
From: Italy
Registered: 2009-01-13
Posts: 876

OpenCL: best way to resize memory objects

I'm a beginner in OpenCL and I have a question about memory management (like 99.99% of noob questions about OpenCL, I guess...). I enqueue the kernel for execution inside a loop in the host code; at each iteration, the buffers backing the memory objects passed as kernel arguments have a different size. Currently, I do something like this using the C++ wrapper API (which, BTW, I find much easier to use than the C interface, although the latter still seems to be the most widely used):

// Set platform, kernel, queue etc.
       . . . 

for (int i = 0; i < nIterations; i++){
  // Set buffer sizes and fill them
      . . .  

  // OpenCL section
  cl::Buffer clBuffer1 = cl::Buffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, bufferSize1*typeSize1, (void *) &(buffer1[0]));
  cl::Buffer clBuffer2 = cl::Buffer(context, CL_MEM_USE_HOST_PTR, bufferSize2*typeSize2, (void *) &(buffer2[0]));
  cl::Buffer clBuffer3 = cl::Buffer(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, bufferSize3*typeSize3, (void *) &(buffer3[0]));
  kernel.setArg(0, clBuffer1);
  kernel.setArg(1, clBuffer2);
  kernel.setArg(2, clBuffer3);
  queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(someValueForRange), cl::NullRange);
  queue.enqueueMapBuffer(clBuffer2, CL_FALSE, CL_MAP_READ, 0, bufferSize2*typeSize2);
  queue.enqueueMapBuffer(clBuffer3, CL_TRUE, CL_MAP_READ, 0, bufferSize3*typeSize3);

  // Do something with retrieved data
      . . . 
}

I'm sure the code above contains plenty of suboptimal memory access patterns, but for the moment I'm facing this issue: is there a way to modify the size of the cl::Buffer objects without having to reinstantiate them at each iteration of the loop? I fear that this continuous creation and destruction of cl::Buffer objects causes a full reallocation of memory on the CL device at each loop iteration, potentially degrading overall performance. I think that a size modification (à la STL, to be clearer) could also result in some memory reallocation, but hopefully less frequently. Or is it better to define oversized buffers once outside the loop and play with the size arguments of enqueueMapBuffer inside the loop, to transfer only the needed range of data to the GPU at each iteration?
Sorry for the possibly confusing question, but as I said I'm a noob with OpenCL and this particular kind of problem does not seem to be very common. Thanks.
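
To make the "à la STL" idea above more concrete, here is a minimal sketch, independent of OpenCL. GrowOnlyBuffer and countAllocations are hypothetical names invented for this illustration; they are not part of OpenCL or its C++ wrapper. The helper keeps one allocation alive and re-creates it only when a request exceeds the current capacity, growing geometrically in the spirit of std::vector. The allocator callback is a stand-in: with the wrapper API it could presumably be a lambda that returns a cl::Buffer created without a host pointer, but that part is an assumption and is not tested here.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not an OpenCL type): owns one handle and re-creates it
// only when the requested size exceeds the current capacity.
// Handle must be default-constructible and assignable.
template <typename Handle>
class GrowOnlyBuffer {
public:
    using Allocator = std::function<Handle(std::size_t)>;

    explicit GrowOnlyBuffer(Allocator alloc) : alloc_(std::move(alloc)) {}

    // Returns a handle with room for at least `bytes` bytes.
    // Old contents are NOT preserved when a reallocation happens.
    Handle& reserve(std::size_t bytes) {
        if (bytes > capacity_) {
            const std::size_t doubled = capacity_ * 2;
            capacity_ = bytes > doubled ? bytes : doubled;
            handle_ = alloc_(capacity_);
            ++reallocations_;
        }
        return handle_;
    }

    std::size_t capacity() const { return capacity_; }
    std::size_t reallocations() const { return reallocations_; }

private:
    Allocator alloc_;
    Handle handle_{};
    std::size_t capacity_ = 0;
    std::size_t reallocations_ = 0;
};

// Demo with a fake allocator: counts how many allocations a sequence of
// per-iteration sizes would trigger.
inline std::size_t countAllocations(const std::vector<std::size_t>& sizes) {
    GrowOnlyBuffer<int> buffer([](std::size_t) { return 0; });
    for (std::size_t s : sizes) {
        buffer.reserve(s);
    }
    return buffer.reallocations();
}
```

With a device buffer managed this way, each iteration would transfer only the bytes it actually needs, for example with queue.enqueueWriteBuffer(buffer, CL_FALSE, 0, neededBytes, hostPtr), and the kernel would receive the valid element count as an extra argument. The kernel argument only needs to be set again when the handle was re-created. Note that this scheme does not combine directly with CL_MEM_USE_HOST_PTR, which ties a buffer to one host pointer at creation time.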

Last edited by snack (2014-01-08 14:52:31)
