@@ -22,7 +22,7 @@ implementations (for CPU, OpenCL, and/or CUDA), and invoke the task like
 a regular C function. The example below defines <c>my_task</c> which
 has a single implementation for CPU:
 
-\snippet hello_pragma.c To be included
+\snippet hello_pragma.c To be included. You should update doxygen if you see that text.
 
 The code can then be compiled and linked with GCC and the flag <c>-fplugin</c>:
 
@@ -352,7 +352,7 @@ vector_scal (unsigned size, float vector[size], float factor)
 Next, the body of the program, which uses the task defined above, can be
 implemented:
 
-\snippet hello_pragma2.c To be included
+\snippet hello_pragma2.c To be included. You should update doxygen if you see that text.
 
 The function <c>main</c> above does several things:
 
@@ -488,7 +488,7 @@ The actual implementation of the CUDA task goes into a separate
 compilation unit, in a <c>.cu</c> file. It is very close to the
 implementation when using StarPU's standard C API (\ref DefinitionOfTheCUDAKernel).
 
-\snippet scal_pragma.cu To be included
+\snippet scal_pragma.cu To be included. You should update doxygen if you see that text.
 
 The complete source code, in the directory <c>gcc-plugin/examples/vector_scal</c>
 of the StarPU distribution, also shows how an SSE-specialized
@@ -628,7 +628,7 @@ that the vector pointer returned by ::STARPU_VECTOR_GET_PTR is here a
 pointer in GPU memory, so that it can be passed as such to the
 kernel call <c>vector_mult_cuda</c>.
 
-\snippet vector_scal_cuda.cu To be included
+\snippet vector_scal_cuda.cu To be included. You should update doxygen if you see that text.
 
 \subsection DefinitionOfTheOpenCLKernel Definition of the OpenCL Kernel
 
@@ -650,7 +650,7 @@ which returns a <c>cl_mem</c> (which is not a device pointer, but an OpenCL
 handle), which can be passed as such to the OpenCL kernel. The difference is
 important when using partitioning, see \ref PartitioningData.
 
-\snippet vector_scal_opencl.c To be included
+\snippet vector_scal_opencl.c To be included. You should update doxygen if you see that text.
 
 \subsection DefinitionOfTheMainCode Definition of the Main Code
 
@@ -661,7 +661,7 @@ starpu_codelet::cuda_funcs and starpu_codelet::opencl_funcs are set to
 define the pointers to the CUDA and OpenCL implementations of the
 task.
 
-\snippet vector_scal_c.c To be included
+\snippet vector_scal_c.c To be included. You should update doxygen if you see that text.
 
 \subsection ExecutionOfHybridVectorScaling Execution of Hybrid Vector Scaling
 