I create the kernel in the usual way and set its arguments like this:
clSetKernelArg(cl_kernel, 0, sizeof(cl_mem), (void*)&rt_kdtree_mem_c);
clSetKernelArg(cl_kernel, 1, sizeof(cl_mem), (void*)&rt_kdtree_mem_a);
clSetKernelArg(cl_kernel, 2, sizeof(cl_mem), (void*)&rt_kdtree_mem_B);
clSetKernelArg(cl_kernel, 3, sizeof(cl_mem), (void*)&rt_tree_bounds);
clSetKernelArg(cl_kernel, 4, sizeof(int), (void*)&rt_kdtree_items);
My kernel (just for testing) looks like this:
#pragma OPENCL EXTENSION cl_amd_printf : enable

__kernel void main(__global float* c, __global float* a, __global float* b, __global float* bounds, const int num)
{
    const int idx = get_global_id(0);
    printf("Idx %d, Bounds: %f %f %f\n", idx, bounds[0], bounds[1], bounds[2]);
    if(idx < num)
    {
        c[idx] = a[idx] + b[idx];
    }
}
What I really can't figure out is why the kernel sees my bounds as {-125, 0, 0, .... 0} when on the application side they are actually {-125, -5, -125, 1, ... 1}. In other words, only the first value gets passed.
The data is uploaded in the usual way:
clEnqueueWriteBuffer(cl_queue, rt_tree_bounds, CL_TRUE, 0, sizeof(float) * 8, data, 0, NULL, &cl_event);
clReleaseEvent(cl_event);
And of course I call clFinish before I execute the kernel.
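One check that might narrow this down (only a sketch; `first_mismatch` is a hypothetical helper of my own, not part of OpenCL): read the buffer back with a blocking clEnqueueReadBuffer right after the write and compare it with the host array. If the read-back already differs, the problem is probably on the upload side, for example the element type or size of data. If it matches, the kernel or the printf extension is the more likely suspect.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the index of the first element where the
 * two arrays differ, or n if all n elements are equal.
 *
 * Intended use (assumed, not verified on my setup): after the write,
 *   float readback[8];
 *   clEnqueueReadBuffer(cl_queue, rt_tree_bounds, CL_TRUE, 0,
 *                       sizeof(float) * 8, readback, 0, NULL, NULL);
 *   size_t bad = first_mismatch((const float*)data, readback, 8);
 * Note the cast assumes data really points to floats.
 */
static size_t first_mismatch(const float *expected, const float *actual, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (expected[i] != actual[i]) {
            return i; /* first differing element */
        }
    }
    return n; /* no difference found */
}
```

A result of 8 would mean all eight floats arrived on the device intact; a result of 1 would match the symptom above, where only the first value survives.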
I still can't figure it out after two hours, though I'm sure there is some silly bug in here. If anyone can spot it, I'd be very glad.












