@@ -821,16 +821,16 @@ indicates that it is only available on Cell SPUs.
 Is a function pointer to the CPU implementation of the codelet. Its prototype
 must be: @code{void cpu_func(void *buffers[], void *cl_arg)}. The first
 argument being the array of data managed by the data management library, and
-the second argument is a pointer to the argument passed from the @code{.cl_arg}
+the second argument is a pointer to the argument passed from the @code{cl_arg}
 field of the @code{starpu_task} structure.
 The @code{cpu_func} field is ignored if @code{STARPU_CPU} does not appear in
-the @code{.where} field, it must be non-null otherwise.
+the @code{where} field, it must be non-null otherwise.
 
 @item @code{cuda_func} (optional):
 Is a function pointer to the CUDA implementation of the codelet. @emph{This
 must be a host-function written in the CUDA runtime API}. Its prototype must
 be: @code{void cuda_func(void *buffers[], void *cl_arg);}. The @code{cuda_func}
-field is ignored if @code{STARPU_CUDA} does not appear in the @code{.where}
+field is ignored if @code{STARPU_CUDA} does not appear in the @code{where}
 field, it must be non-null otherwise.
 
 @item @code{opencl_func} (optional):
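The rule stated in the hunk above (a per-architecture function is ignored unless its flag appears in `where`, and must be non-null otherwise) can be sketched as a small check. This is only an illustration: `sketch_codelet`, `SKETCH_CPU`, `SKETCH_CUDA` and `sketch_codelet_is_valid` are hypothetical stand-ins defined here, not the StarPU types or constants, and the bit values are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flags; the real STARPU_CPU / STARPU_CUDA values come from the
 * StarPU headers and are NOT assumed to match these. */
#define SKETCH_CPU  (1u << 0)
#define SKETCH_CUDA (1u << 1)

/* Same shape as the prototype quoted in the manual text. */
typedef void (*sketch_kernel_t)(void *buffers[], void *cl_arg);

/* Hypothetical, reduced codelet: only the fields discussed above. */
struct sketch_codelet {
    unsigned where;            /* bitmask of allowed architectures */
    sketch_kernel_t cpu_func;  /* ignored unless SKETCH_CPU is set */
    sketch_kernel_t cuda_func; /* ignored unless SKETCH_CUDA is set */
};

/* Returns 1 when every architecture selected in `where` has a non-null
 * implementation, 0 otherwise. */
int sketch_codelet_is_valid(const struct sketch_codelet *cl)
{
    assert(cl != NULL);
    if ((cl->where & SKETCH_CPU) && cl->cpu_func == NULL)
        return 0;
    if ((cl->where & SKETCH_CUDA) && cl->cuda_func == NULL)
        return 0;
    return 1;
}

/* Trivial kernel used only to populate a function pointer. */
void sketch_noop_kernel(void *buffers[], void *cl_arg)
{
    (void)buffers;
    (void)cl_arg;
}
```

A codelet with `where = SKETCH_CPU` and a null `cuda_func` passes the check, while `where = SKETCH_CPU | SKETCH_CUDA` with the same pointers does not.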
		
	
	
		
			
@@ -838,7 +838,7 @@ Is a function pointer to the OpenCL implementation of the codelet. Its
 prototype must be:
 @code{void opencl_func(starpu_data_interface_t *descr, void *arg);}.
 This pointer is ignored if @code{OPENCL} does not appear in the
-@code{.where} field, it must be non-null otherwise.
+@code{where} field, it must be non-null otherwise.
 
 @item @code{gordon_func} (optional):
 This is the index of the Cell SPU implementation within the Gordon library.
		
	
	
		
			
@@ -847,7 +847,7 @@ TODO
 @item @code{nbuffers}:
 Specifies the number of arguments taken by the codelet. These arguments are
 managed by the DSM and are accessed from the @code{void *buffers[]}
-array. The constant argument passed with the @code{.cl_arg} field of the
+array. The constant argument passed with the @code{cl_arg} field of the
 @code{starpu_task} structure is not counted in this number.  This value should
 not be above @code{STARPU_NMAXBUFS}.
 
		
	
	
		
			
@@ -884,15 +884,15 @@ TODO
 @item @code{cl_arg} (optional) (default = NULL):
 This pointer is passed to the codelet through the second argument
 of the codelet implementation (e.g. @code{cpu_func} or @code{cuda_func}).
-In the specific case of the Cell processor, see the @code{.cl_arg_size}
+In the specific case of the Cell processor, see the @code{cl_arg_size}
 argument.
 
 @item @code{cl_arg_size} (optional, Cell specific):
-In the case of the Cell processor, the @code{.cl_arg} pointer is not directly
+In the case of the Cell processor, the @code{cl_arg} pointer is not directly
 given to the SPU function. A buffer of size @code{cl_arg_size} is allocated on
 the SPU. This buffer is then filled with the @code{cl_arg_size} bytes starting
 at address @code{cl_arg}. In that case, the argument given to the SPU codelet
-is therefore not the @code{.cl_arg} pointer, but the address of the buffer in
+is therefore not the @code{cl_arg} pointer, but the address of the buffer in
 local store (LS) instead. This field is ignored for CPU, CUDA and OpenCL
 codelets.
 
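The `cl_arg_size` behaviour described in the hunk above amounts to a copy: the SPU codelet receives the address of a fresh buffer holding `cl_arg_size` bytes taken from `cl_arg`, not the original pointer. The sketch below imitates that on the host with `malloc` and `memcpy`. `sketch_stage_cl_arg` is a hypothetical helper invented for illustration; the real transfer into local store is done by the runtime and is not shown in the source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated buffer (standing in for
 * SPU local store) filled with the cl_arg_size bytes starting at cl_arg.
 * Returns NULL when there is nothing to copy or allocation fails.
 * The caller owns the returned buffer and must free() it. */
void *sketch_stage_cl_arg(const void *cl_arg, size_t cl_arg_size)
{
    void *local_store;

    if (cl_arg == NULL || cl_arg_size == 0)
        return NULL;

    local_store = malloc(cl_arg_size);
    if (local_store != NULL)
        memcpy(local_store, cl_arg, cl_arg_size);
    return local_store;
}
```

Because the kernel only sees the copy, anything reachable through pointers stored inside the `cl_arg` buffer is not copied with it; under this model the argument should be a flat block of `cl_arg_size` bytes.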
		
	
	
		
			
@@ -1003,7 +1003,7 @@ Note that this function is automatically called by @code{starpu_task_destroy}.
 @item @emph{Description}:
 Free the resource allocated during @code{starpu_task_create}. This function can be
 called automatically after the execution of a task by setting the
-@code{.destroy} flag of the @code{starpu_task} structure (default behaviour).
+@code{destroy} flag of the @code{starpu_task} structure (default behaviour).
 Calling this function on a statically allocated task results in an undefined
 behaviour.
 
		
	
	
		
			
@@ -1268,7 +1268,7 @@ When calling this method, the offloaded function specified by the first argument
 executed by every StarPU worker that may execute the function.
 The second argument is passed to the offloaded function.
 The last argument specifies on which types of processing units the function
-should be executed. Similarly to the @code{.where} field of the
+should be executed. Similarly to the @code{where} field of the
 @code{starpu_codelet} structure, it is possible to specify that the function
 should be executed on every CUDA device and every CPU by passing
 @code{STARPU_CPU|STARPU_CUDA}.
		
	
	
		
			
@@ -1348,19 +1348,19 @@ A codelet is a structure that represents a computational kernel. Such a codelet
 may contain an implementation of the same kernel on different architectures
 (e.g. CUDA, Cell's SPU, x86, ...).
 
-The ''@code{.nbuffers}'' field specifies the number of data buffers that are
+The @code{nbuffers} field specifies the number of data buffers that are
 manipulated by the codelet: here the codelet does not access or modify any data
 that is controlled by our data management library. Note that the argument
-passed to the codelet (the ''@code{.cl_arg}'' field of the @code{starpu_task}
+passed to the codelet (the @code{cl_arg} field of the @code{starpu_task}
 structure) does not count as a buffer since it is not managed by our data
 management library.
 
 @c TODO need a crossref to the proper description of "where" see bla for more ...
-We create a codelet which may only be executed on the CPUs. The ''@code{.where}''
+We create a codelet which may only be executed on the CPUs. The @code{where}
 field is a bitmask that defines where the codelet may be executed. Here, the
 @code{STARPU_CPU} value means that only CPUs can execute this codelet
 (@pxref{Codelets and Tasks} for more details on that field).
-When a CPU core executes a codelet, it calls the @code{.cpu_func} function,
+When a CPU core executes a codelet, it calls the @code{cpu_func} function,
 which @emph{must} have the following prototype:
 
 @code{void (*cpu_func)(void *buffers[], void *cl_arg)}
		
	
	
		
			
@@ -1368,7 +1368,7 @@ which @emph{must} have the following prototype:
 In this example, we can ignore the first argument of this function which gives a
 description of the input and output buffers (e.g. the size and the location of
 the matrices). The second argument is a pointer to a buffer passed as an
-argument to the codelet by the means of the ''@code{.cl_arg}'' field of the
+argument to the codelet by the means of the @code{cl_arg} field of the
 @code{starpu_task} structure.
 
 @c TODO rewrite so that it is a little clearer ?
		
	
	
		
			
@@ -1428,28 +1428,28 @@ corresponding structure with the default settings (@pxref{starpu_task_create}),
 but it does not submit the task to StarPU.
 
 @c not really clear ;)
-The ''@code{.cl}'' field is a pointer to the codelet which the task will
+The @code{cl} field is a pointer to the codelet which the task will
 execute: in other words, the codelet structure describes which computational
 kernel should be offloaded on the different architectures, and the task
 structure is a wrapper containing a codelet and the piece of data on which the
 codelet should operate.
 
-The optional ''@code{.cl_arg}'' field is a pointer to a buffer (of size
-@code{.cl_arg_size}) with some parameters for the kernel
+The optional @code{cl_arg} field is a pointer to a buffer (of size
+@code{cl_arg_size}) with some parameters for the kernel
 described by the codelet. For instance, if a codelet implements a computational
 kernel that multiplies its input vector by a constant, the constant could be
 specified by the means of this buffer.
 
 Once a task has been executed, an optional callback function can be called.
 While the computational kernel could be offloaded on various architectures, the
-callback function is always executed on a CPU. The ''@code{.callback_arg}''
+callback function is always executed on a CPU. The @code{callback_arg}
 pointer is passed as an argument of the callback. The prototype of a callback
 function must be:
 @example
 void (*callback_function)(void *);
 @end example
 
-If the @code{.synchronous} field is non-null, task submission will be
+If the @code{synchronous} field is non-null, task submission will be
 synchronous: the @code{starpu_task_submit} function will not return until the
 task was executed. Note that the @code{starpu_shutdown} method does not
 guarantee that asynchronous tasks have been executed before it returns.
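The relationship described in the hunk above (a task wraps a kernel, passes it `cl_arg`, then invokes an optional callback with `callback_arg`) can be sketched without the runtime. Everything here is a hypothetical stand-in: `sketch_task` and `sketch_run_task` are not StarPU names, the kernel pointer replaces the real `cl` codelet pointer, and `buffers[0]` is treated as a raw `float *` for brevity, whereas in StarPU the array describes data managed by the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced task: only the fields discussed above. */
struct sketch_task {
    void (*kernel)(void *buffers[], void *cl_arg); /* stands in for cl */
    void *cl_arg;                                  /* kernel parameters */
    void (*callback_function)(void *);             /* optional, may be NULL */
    void *callback_arg;                            /* passed to the callback */
};

/* Runs the kernel, then the callback if one is set. This mimics a
 * synchronous submission: it returns only after both have run. */
void sketch_run_task(struct sketch_task *task, void *buffers[])
{
    assert(task != NULL && task->kernel != NULL);
    task->kernel(buffers, task->cl_arg);
    if (task->callback_function != NULL)
        task->callback_function(task->callback_arg);
}

/* Example kernel: multiplies one value by the constant found in cl_arg. */
void sketch_scale_kernel(void *buffers[], void *cl_arg)
{
    float *value = buffers[0];
    const float *factor = cl_arg;
    *value *= *factor;
}

/* Example callback: counts completed tasks through callback_arg. */
void sketch_count_completion(void *callback_arg)
{
    int *completed = callback_arg;
    (*completed)++;
}
```

Passing the constant through `cl_arg` keeps it out of the `buffers` array, which matches the statement that `cl_arg` is not managed by the data management library.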
		
	
	
		
			
@@ -1508,7 +1508,7 @@ Since the factor is constant, it does not need a preliminary declaration, and
 can just be passed through the @code{cl_arg} pointer like in the previous
 example.  The vector parameter is described by its handle.
 There are two fields in each element of the @code{buffers} array.
-@code{.handle} is the handle of the data, and @code{.mode} specifies how the
+@code{handle} is the handle of the data, and @code{mode} specifies how the
 kernel will access the data (@code{STARPU_R} for read-only, @code{STARPU_W} for
 write-only and @code{STARPU_RW} for read and write access).
 
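The `handle`/`mode` pair described in the hunk above can be pictured as a small descriptor with an access-mode bitmask. The sketch below assumes that the read-write mode is the union of the read and write bits. The names `sketch_access_mode`, `sketch_buffer_descr`, `sketch_may_read` and `sketch_may_write` are hypothetical, and the numeric values are illustrative rather than taken from the StarPU headers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in access modes; values are assumptions made for this sketch. */
enum sketch_access_mode {
    SKETCH_R  = 1 << 0,              /* read-only */
    SKETCH_W  = 1 << 1,              /* write-only */
    SKETCH_RW = SKETCH_R | SKETCH_W  /* read and write */
};

/* Hypothetical element of a task's buffers array. */
struct sketch_buffer_descr {
    void *handle;                    /* opaque handle of the data */
    enum sketch_access_mode mode;    /* how the kernel accesses it */
};

/* Returns 1 when the kernel is allowed to read the data. */
int sketch_may_read(const struct sketch_buffer_descr *descr)
{
    assert(descr != NULL);
    return (descr->mode & SKETCH_R) != 0;
}

/* Returns 1 when the kernel is allowed to write the data. */
int sketch_may_write(const struct sketch_buffer_descr *descr)
{
    assert(descr != NULL);
    return (descr->mode & SKETCH_W) != 0;
}
```

For the vector-scaling example, the vector would be declared with the read-write mode, since the kernel both reads each element and stores the scaled result.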
		
	
	
		
			
@@ -1543,7 +1543,7 @@ The second argument of the @code{scal_func} function contains a pointer to the
 parameters of the codelet (given in @code{task->cl_arg}), so that we read the
 constant factor from this pointer. The first argument is an array that gives
 a description of every buffers passed in the @code{task->buffers}@ array. The
-size of this array is given by the @code{.nbuffers} field of the codelet
+size of this array is given by the @code{nbuffers} field of the codelet
 structure. For the sake of generality, this array contains pointers to the
 different interfaces describing each buffer.  In the case of the @b{vector
 interface}, the location of the vector (resp. its length) is accessible in the