@c -*-texinfo-*-

@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012, 2013  Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Insert Task::
* Tracing support::
* MPI Interface::
* Defining a new data interface::
* Multiformat Data Interface::
* Task Bundles::
* Task Lists::
* Using Parallel Tasks::
* Scheduling Contexts::
* Defining a new scheduling policy::
* Running drivers::
* Expert mode::
@end menu

@node Insert Task
@section Insert Task

@deftypefun int starpu_insert_task (struct starpu_codelet *@var{cl}, ...)
Create and submit a task corresponding to @var{cl} with the following
arguments. The argument list must be zero-terminated.

The arguments following the codelet can be of the following types:

@itemize
@item
@code{STARPU_R}, @code{STARPU_W}, @code{STARPU_RW}, @code{STARPU_SCRATCH}, @code{STARPU_REDUX} an access mode followed by a data handle;
@item
@code{STARPU_DATA_ARRAY} followed by an array of data handles and its number of elements;
@item
the specific values @code{STARPU_VALUE}, @code{STARPU_CALLBACK},
@code{STARPU_CALLBACK_ARG}, @code{STARPU_CALLBACK_WITH_ARG},
@code{STARPU_PRIORITY}, @code{STARPU_TAG}, followed by the appropriate objects
as defined below.
@end itemize

When using @code{STARPU_DATA_ARRAY}, the access mode of the data
handles is not defined.

Parameters to be passed to the codelet implementation are defined
through the type @code{STARPU_VALUE}. The function
@code{starpu_codelet_unpack_args} must be called within the codelet
implementation to retrieve them.
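
The way @code{STARPU_VALUE} parameters travel from a zero-terminated
argument list into a buffer, and back out inside the codelet, can be
pictured with the following stand-alone sketch. It does not use StarPU
and is not StarPU's implementation: @code{SKETCH_VALUE},
@code{sketch_pack_args}, @code{sketch_unpack_args} and the buffer layout
(a count, then each value preceded by its size) are assumptions made
only for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker standing in for STARPU_VALUE; 0 ends the list. */
#define SKETCH_VALUE 1

/* Pack every (SKETCH_VALUE, pointer, size) triple of a zero-terminated
 * argument list into one allocated buffer.  Assumed layout: a size_t
 * count, then for each value its size_t size followed by its bytes.
 * Returns 0 on success, -1 if memory allocation fails. */
static int sketch_pack_args(char **buffer, size_t *buffer_size, ...)
{
	va_list ap;
	size_t count = 0;
	size_t used = sizeof(count);	/* room for the leading count */
	char *buf = malloc(used);
	int marker;

	if (buf == NULL)
		return -1;

	va_start(ap, buffer_size);
	while ((marker = va_arg(ap, int)) != 0) {
		void *ptr;
		size_t size;
		char *grown;

		assert(marker == SKETCH_VALUE);	/* only kind handled here */
		ptr = va_arg(ap, void *);
		size = va_arg(ap, size_t);

		grown = realloc(buf, used + sizeof(size) + size);
		if (grown == NULL) {
			free(buf);
			va_end(ap);
			return -1;
		}
		buf = grown;
		memcpy(buf + used, &size, sizeof(size));
		memcpy(buf + used + sizeof(size), ptr, size);
		used += sizeof(size) + size;
		count++;
	}
	va_end(ap);

	memcpy(buf, &count, sizeof(count));
	*buffer = buf;
	*buffer_size = used;
	return 0;
}

/* Copy the packed values back out, in packing order, into the
 * destinations given as variadic pointers (one per packed value). */
static void sketch_unpack_args(const char *buffer, ...)
{
	va_list ap;
	size_t count, i, offset = sizeof(count);

	memcpy(&count, buffer, sizeof(count));
	va_start(ap, buffer);
	for (i = 0; i < count; i++) {
		size_t size;
		void *dst = va_arg(ap, void *);

		memcpy(&size, buffer + offset, sizeof(size));
		memcpy(dst, buffer + offset + sizeof(size), size);
		offset += sizeof(size) + size;
	}
	va_end(ap);
}
```

In this sketch the caller passes each size as a @code{size_t} (for
instance with @code{sizeof}) and must unpack exactly as many values, in
the same order, as were packed.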
@end deftypefun

@defmac STARPU_VALUE
This macro is used when calling @code{starpu_insert_task}, and must be
followed by a pointer to a constant value and the size of the constant.
@end defmac

@defmac STARPU_CALLBACK
This macro is used when calling @code{starpu_insert_task}, and must be
followed by a pointer to a callback function.
@end defmac

@defmac STARPU_CALLBACK_ARG
This macro is used when calling @code{starpu_insert_task}, and must be
followed by a pointer to be given as an argument to the callback
function.
@end defmac

@defmac STARPU_CALLBACK_WITH_ARG
This macro is used when calling @code{starpu_insert_task}, and must be
followed by two pointers: one to a callback function, and the other to
be given as an argument to the callback function; this is equivalent
to using both @code{STARPU_CALLBACK} and
@code{STARPU_CALLBACK_ARG}.
@end defmac

@defmac STARPU_PRIORITY
This macro is used when calling @code{starpu_insert_task}, and must be
followed by an integer defining a priority level.
@end defmac

@defmac STARPU_TAG
This macro is used when calling @code{starpu_insert_task}, and must be
followed by a tag.
@end defmac

@deftypefun void starpu_codelet_pack_args ({char **}@var{arg_buffer}, {size_t *}@var{arg_buffer_size}, ...)
Pack arguments of type @code{STARPU_VALUE} into a buffer which can be
given to a codelet and later unpacked with the function
@code{starpu_codelet_unpack_args} defined below.
@end deftypefun

@deftypefun void starpu_codelet_unpack_args ({void *}@var{cl_arg}, ...)
Retrieve the arguments of type @code{STARPU_VALUE} associated with a
task automatically created using the function
@code{starpu_insert_task} defined above.
@end deftypefun

@node Tracing support
@section Tracing support

@deftypefun void starpu_fxt_start_profiling (void)
Start recording the trace.
By default, the trace is started by the call to
@code{starpu_init()}, but recording can be paused with
@code{starpu_fxt_stop_profiling}, in which case
@code{starpu_fxt_start_profiling} should be called to resume
recording events.
@end deftypefun

@deftypefun void starpu_fxt_stop_profiling (void)
Stop recording the trace. By default, the trace is stopped by the call
to @code{starpu_shutdown()}. @code{starpu_fxt_stop_profiling} can however be used to
stop it earlier. @code{starpu_fxt_start_profiling} can then be called to start
recording it again, and so on.
@end deftypefun

@node MPI Interface
@section MPI Interface

@menu
* Initialisation::
* Communication::
* Communication cache::
@end menu

@node Initialisation
@subsection Initialisation

@deftypefun int starpu_mpi_init (int *@var{argc}, char ***@var{argv}, int @var{initialize_mpi})
Initializes the starpumpi library. @code{initialize_mpi} indicates whether
MPI should be initialized by StarPU. If the value is not @code{0},
MPI will be initialized by calling @code{MPI_Init_thread(argc, argv,
MPI_THREAD_SERIALIZED, ...)}.
@end deftypefun

@deftypefun int starpu_mpi_initialize (void)
This function is deprecated; use the function
@code{starpu_mpi_init()} defined above instead.
This function does not call @code{MPI_Init}, which must be called beforehand.
@end deftypefun

@deftypefun int starpu_mpi_initialize_extended (int *@var{rank}, int *@var{world_size})
This function is deprecated; use the function
@code{starpu_mpi_init()} defined above instead.
MPI will be initialized by starpumpi by calling @code{MPI_Init_thread(argc, argv,
MPI_THREAD_SERIALIZED, ...)}.
@end deftypefun

@deftypefun int starpu_mpi_shutdown (void)
Cleans the starpumpi library. This must be called after the last call
to a @code{starpu_mpi} function and before @code{starpu_shutdown()}.
@code{MPI_Finalize()} will be called if StarPU-MPI has been initialized
by @code{starpu_mpi_init()}.
@end deftypefun

@deftypefun void starpu_mpi_comm_amounts_retrieve (size_t *@var{comm_amounts})
Retrieve the current amount of communications from the current node in
the array @code{comm_amounts}, which must have a size greater than or equal
to the world size. Communication statistics must be enabled
(@pxref{STARPU_COMM_STATS}).
@end deftypefun

@node Communication
@subsection Communication

The standard point-to-point communications of MPI have been
implemented. The semantics are similar to those of MPI, but adapted to the
DSM provided by StarPU. An MPI request will only be submitted when the
data is available in the main memory of the node submitting the
request.

@deftypefun int starpu_mpi_send (starpu_data_handle_t @var{data_handle}, int @var{dest}, int @var{mpi_tag}, MPI_Comm @var{comm})
Performs a standard-mode, blocking send of @var{data_handle} to the
node @var{dest} using the message tag @code{mpi_tag} within the
communicator @var{comm}.
@end deftypefun

@deftypefun int starpu_mpi_recv (starpu_data_handle_t @var{data_handle}, int @var{source}, int @var{mpi_tag}, MPI_Comm @var{comm}, MPI_Status *@var{status})
Performs a standard-mode, blocking receive in @var{data_handle} from the
node @var{source} using the message tag @code{mpi_tag} within the
communicator @var{comm}.
@end deftypefun

@deftypefun int starpu_mpi_isend (starpu_data_handle_t @var{data_handle}, starpu_mpi_req *@var{req}, int @var{dest}, int @var{mpi_tag}, MPI_Comm @var{comm})
Posts a standard-mode, non blocking send of @var{data_handle} to the
node @var{dest} using the message tag @code{mpi_tag} within the
communicator @var{comm}. After the call, the pointer to the request
@var{req} can be used to test the completion of the communication.
@end deftypefun

@deftypefun int starpu_mpi_irecv (starpu_data_handle_t @var{data_handle}, starpu_mpi_req *@var{req}, int @var{source}, int @var{mpi_tag}, MPI_Comm @var{comm})
Posts a nonblocking receive in @var{data_handle} from the
node @var{source} using the message tag @code{mpi_tag} within the
communicator @var{comm}. After the call, the pointer to the request
@var{req} can be used to test the completion of the communication.
@end deftypefun

@deftypefun int starpu_mpi_isend_detached (starpu_data_handle_t @var{data_handle}, int @var{dest}, int @var{mpi_tag}, MPI_Comm @var{comm}, void (*@var{callback})(void *), void *@var{arg})
Posts a standard-mode, non blocking send of @var{data_handle} to the
node @var{dest} using the message tag @code{mpi_tag} within the
communicator @var{comm}. On completion, the @var{callback} function is
called with the argument @var{arg}.
@end deftypefun

@deftypefun int starpu_mpi_irecv_detached (starpu_data_handle_t @var{data_handle}, int @var{source}, int @var{mpi_tag}, MPI_Comm @var{comm}, void (*@var{callback})(void *), void *@var{arg})
Posts a nonblocking receive in @var{data_handle} from the
node @var{source} using the message tag @code{mpi_tag} within the
communicator @var{comm}. On completion, the @var{callback} function is
called with the argument @var{arg}.
@end deftypefun

@deftypefun int starpu_mpi_wait (starpu_mpi_req *@var{req}, MPI_Status *@var{status})
Returns when the operation identified by request @var{req} is complete.
@end deftypefun

@deftypefun int starpu_mpi_test (starpu_mpi_req *@var{req}, int *@var{flag}, MPI_Status *@var{status})
If the operation identified by @var{req} is complete, set @var{flag}
to 1. The @var{status} object is set to contain information on the
completed operation.
@end deftypefun

@deftypefun int starpu_mpi_barrier (MPI_Comm @var{comm})
Blocks the caller until all group members of the communicator
@var{comm} have called it.
@end deftypefun

@deftypefun int starpu_mpi_isend_detached_unlock_tag (starpu_data_handle_t @var{data_handle}, int @var{dest}, int @var{mpi_tag}, MPI_Comm @var{comm}, starpu_tag_t @var{tag})
Posts a standard-mode, non blocking send of @var{data_handle} to the
node @var{dest} using the message tag @code{mpi_tag} within the
communicator @var{comm}. On completion, @var{tag} is unlocked.
@end deftypefun

@deftypefun int starpu_mpi_irecv_detached_unlock_tag (starpu_data_handle_t @var{data_handle}, int @var{source}, int @var{mpi_tag}, MPI_Comm @var{comm}, starpu_tag_t @var{tag})
Posts a nonblocking receive in @var{data_handle} from the
node @var{source} using the message tag @code{mpi_tag} within the
communicator @var{comm}. On completion, @var{tag} is unlocked.
@end deftypefun

@deftypefun int starpu_mpi_isend_array_detached_unlock_tag (unsigned @var{array_size}, starpu_data_handle_t *@var{data_handle}, int *@var{dest}, int *@var{mpi_tag}, MPI_Comm *@var{comm}, starpu_tag_t @var{tag})
Posts @var{array_size} standard-mode, non blocking sends. Each post
sends the n-th data of the array @var{data_handle} to the n-th node of
the array @var{dest}
using the n-th message tag of the array @code{mpi_tag} within the n-th
communicator of the array
@var{comm}. On completion of all the requests, @var{tag} is unlocked.
@end deftypefun

@deftypefun int starpu_mpi_irecv_array_detached_unlock_tag (unsigned @var{array_size}, starpu_data_handle_t *@var{data_handle}, int *@var{source}, int *@var{mpi_tag}, MPI_Comm *@var{comm}, starpu_tag_t @var{tag})
Posts @var{array_size} nonblocking receives. Each post receives in the
n-th data of the array @var{data_handle} from the n-th
node of the array @var{source} using the n-th message tag of the array
@code{mpi_tag} within the n-th communicator of the array @var{comm}.
On completion of all the requests, @var{tag} is unlocked.
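
The rule shared by the two array variants, where the tag is unlocked
only once every posted request has completed, can be pictured as a
countdown decremented by each completion. The sketch below is
independent of StarPU and MPI: @code{struct sketch_countdown},
@code{sketch_countdown_init}, @code{sketch_request_done} and
@code{sketch_set_flag} are hypothetical names, and the plain counter is
an assumption that only holds for single-threaded use (completions
arriving from several threads would need an atomic counter or a lock).

```c
#include <assert.h>
#include <stddef.h>

/* Shared state for one batch of requests (hypothetical type). */
struct sketch_countdown {
	unsigned remaining;		/* requests still in flight */
	void (*unlock)(void *arg);	/* called once, when all are done */
	void *arg;
};

/* Prepare a countdown for array_size requests (array_size > 0). */
static void sketch_countdown_init(struct sketch_countdown *c,
				  unsigned array_size,
				  void (*unlock)(void *arg), void *arg)
{
	assert(c != NULL && array_size > 0);
	c->remaining = array_size;
	c->unlock = unlock;
	c->arg = arg;
}

/* To be called when one request completes.  Returns 1 if this call
 * completed the batch and triggered the unlock action, 0 otherwise. */
static int sketch_request_done(struct sketch_countdown *c)
{
	if (c->remaining == 0)
		return 0;		/* batch already finished */
	if (--c->remaining > 0)
		return 0;		/* other requests still pending */
	if (c->unlock != NULL)
		c->unlock(c->arg);
	return 1;
}

/* Example unlock action: raise the int flag passed as argument. */
static void sketch_set_flag(void *arg)
{
	*(int *)arg = 1;
}
```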
@end deftypefun

@node Communication cache
@subsection Communication cache

@deftypefun void starpu_mpi_cache_flush (MPI_Comm @var{comm}, starpu_data_handle_t @var{data_handle})
Clear the send and receive communication cache for the data
@var{data_handle}. The function has to be called synchronously by all
the MPI nodes.
The function does nothing if the cache mechanism is disabled (@pxref{STARPU_MPI_CACHE}).
@end deftypefun

@deftypefun void starpu_mpi_cache_flush_all_data (MPI_Comm @var{comm})
Clear the send and receive communication cache for all data. The
function has to be called synchronously by all the MPI nodes.
The function does nothing if the cache mechanism is disabled (@pxref{STARPU_MPI_CACHE}).
@end deftypefun

@node Defining a new data interface
@section Defining a new data interface

@menu
* Data Interface API::  Data Interface API
* An example of data interface::        An example of data interface
@end menu

@node Data Interface API
@subsection Data Interface API

@deftp {Data Type} {struct starpu_data_interface_ops}
@anchor{struct starpu_data_interface_ops}
Per-interface data transfer methods.

@table @asis
@item @code{void (*register_data_handle)(starpu_data_handle_t handle, uint32_t home_node, void *data_interface)}
Register an existing interface into a data handle.

@item @code{starpu_ssize_t (*allocate_data_on_node)(void *data_interface, uint32_t node)}
Allocate data for the interface on a given node.

@item @code{void (*free_data_on_node)(void *data_interface, uint32_t node)}
Free data of the interface on a given node.

@item @code{const struct starpu_data_copy_methods *copy_methods}
ram/cuda/spu/opencl synchronous and asynchronous transfer methods.

@item @code{void * (*handle_to_pointer)(starpu_data_handle_t handle, uint32_t node)}
Return the current pointer (if any) for the handle on the given node.

@item @code{size_t (*get_size)(starpu_data_handle_t handle)}
Return an estimation of the size of data, for performance models.

@item @code{uint32_t (*footprint)(starpu_data_handle_t handle)}
Return a 32-bit footprint which characterizes the data size.

@item @code{int (*compare)(void *data_interface_a, void *data_interface_b)}
Compare the data size of two interfaces.

@item @code{void (*display)(starpu_data_handle_t handle, FILE *f)}
Dump the sizes of a handle to a file.

@item @code{enum starpu_data_interface_id interfaceid}
An identifier that is unique to each interface.

@item @code{size_t interface_size}
The size of the interface data descriptor.

@item @code{int is_multiformat}
todo

@item @code{struct starpu_multiformat_data_interface_ops* (*get_mf_ops)(void *data_interface)}
todo

@item @code{int (*pack_data)(starpu_data_handle_t handle, uint32_t node, void **ptr, size_t *count)}
Pack the data handle into a contiguous buffer at the address
@code{ptr} and set the size of the newly created buffer in
@code{count}.

@item @code{int (*unpack_data)(starpu_data_handle_t handle, uint32_t node, void *ptr, size_t count)}
Unpack the data handle from the contiguous buffer at the address
@code{ptr} of size @var{count}.
@end table
@end deftp

@deftp {Data Type} {struct starpu_data_copy_methods}
Defines the per-interface methods.
@table @asis
@item @code{int @{ram,cuda,opencl,spu@}_to_@{ram,cuda,opencl,spu@}(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node)}
These 16 functions define how to copy data from the @var{src_interface}
interface on the @var{src_node} node to the @var{dst_interface} interface
on the @var{dst_node} node. They return 0 on success.

@item @code{int (*ram_to_cuda_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, cudaStream_t stream)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (in RAM) to the @var{dst_interface} interface on the
@var{dst_node} node (on a CUDA device), using the given @var{stream}. Return 0
on success.

@item @code{int (*cuda_to_ram_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, cudaStream_t stream)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (on a CUDA device) to the @var{dst_interface} interface on the
@var{dst_node} node (in RAM), using the given @var{stream}. Return 0
on success.

@item @code{int (*cuda_to_cuda_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, cudaStream_t stream)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (on a CUDA device) to the @var{dst_interface} interface on
the @var{dst_node} node (on another CUDA device), using the given @var{stream}.
Return 0 on success.

@item @code{int (*ram_to_opencl_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, /* cl_event * */ void *event)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (in RAM) to the @var{dst_interface} interface on the
@var{dst_node} node (on an OpenCL device), using @var{event}, a pointer to a
cl_event. Return 0 on success.

@item @code{int (*opencl_to_ram_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, /* cl_event * */ void *event)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (on an OpenCL device) to the @var{dst_interface} interface
on the @var{dst_node} node (in RAM), using the given @var{event}, a pointer to
a cl_event. Return 0 on success.

@item @code{int (*opencl_to_opencl_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, /* cl_event * */ void *event)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (on an OpenCL device) to the @var{dst_interface} interface
on the @var{dst_node} node (on another OpenCL device), using the given
@var{event}, a pointer to a cl_event.
Return 0 on success.
@end table
@end deftp

@deftypefun uint32_t starpu_crc32_be_n ({void *}@var{input}, size_t @var{n}, uint32_t @var{inputcrc})
Compute the CRC of a byte buffer seeded by the @var{inputcrc} "current
state". The return value should be considered as the new "current
state" for future CRC computation. This is used for computing data size
footprint.
@end deftypefun

@deftypefun uint32_t starpu_crc32_be (uint32_t @var{input}, uint32_t @var{inputcrc})
Compute the CRC of a 32-bit number seeded by the @var{inputcrc} "current
state". The return value should be considered as the new "current
state" for future CRC computation. This is used for computing data size
footprint.
@end deftypefun

@deftypefun uint32_t starpu_crc32_string ({char *}@var{str}, uint32_t @var{inputcrc})
Compute the CRC of a string seeded by the @var{inputcrc} "current state".
The return value should be considered as the new "current state" for
future CRC computation. This is used for computing data size footprint.
@end deftypefun

@node An example of data interface
@subsection An example of data interface

@deftypefun int starpu_data_interface_get_next_id (void)
Returns the next available id for a newly created data interface.
@end deftypefun

Let's define a new data interface to manage complex numbers.

@cartouche
@smallexample
/* interface for complex numbers */
struct starpu_complex_interface
@{
        double *real;
        double *imaginary;
        int nx;
@};
@end smallexample
@end cartouche

Registering such data with StarPU is easily done using the function
@code{starpu_data_register} (@pxref{Basic Data Management API}). The last
parameter of the function, @code{interface_complex_ops}, will be
described below.

@cartouche
@smallexample
void starpu_complex_data_register(starpu_data_handle_t *handle,
     uint32_t home_node, double *real, double *imaginary, int nx)
@{
        struct starpu_complex_interface complex =
        @{
                .real = real,
                .imaginary = imaginary,
                .nx = nx
        @};

        /* Allocate an interface id the first time this is called. */
        if (interface_complex_ops.interfaceid == -1)
        @{
                interface_complex_ops.interfaceid = starpu_data_interface_get_next_id();
        @}

        starpu_data_register(handle, home_node, &complex, &interface_complex_ops);
@}
@end smallexample
@end cartouche

Different operations need to be defined for a data interface through
the type @code{struct starpu_data_interface_ops} (@pxref{Data
Interface API}). We only define here the basic operations needed to
run simple applications. The source code for the different functions
can be found in the file
@code{examples/interface/complex_interface.c}.

@cartouche
@smallexample
static struct starpu_data_interface_ops interface_complex_ops =
@{
        .register_data_handle = complex_register_data_handle,
        .allocate_data_on_node = complex_allocate_data_on_node,
        .copy_methods = &complex_copy_methods,
        .get_size = complex_get_size,
        .footprint = complex_footprint,
        .interfaceid = -1,
        .interface_size = sizeof(struct starpu_complex_interface),
@};
@end smallexample
@end cartouche

Functions need to be defined to access the different fields of the
complex interface from a StarPU data handle.

@cartouche
@smallexample
double *starpu_complex_get_real(starpu_data_handle_t handle)
@{
        struct starpu_complex_interface *complex_interface =
          (struct starpu_complex_interface *) starpu_data_get_interface_on_node(handle, 0);
        return complex_interface->real;
@}

double *starpu_complex_get_imaginary(starpu_data_handle_t handle);
int starpu_complex_get_nx(starpu_data_handle_t handle);
@end smallexample
@end cartouche

Similar functions need to be defined to access the different fields of the
complex interface from a @code{void *} pointer to be used within codelet
implementations.
@cartouche @smallexample #define STARPU_COMPLEX_GET_REAL(interface) \ (((struct starpu_complex_interface *)(interface))->real) #define STARPU_COMPLEX_GET_IMAGINARY(interface) \ (((struct starpu_complex_interface *)(interface))->imaginary) #define STARPU_COMPLEX_GET_NX(interface) \ (((struct starpu_complex_interface *)(interface))->nx) @end smallexample @end cartouche Complex data interfaces can then be registered to StarPU. @cartouche @smallexample double real = 45.0; double imaginary = 12.0; starpu_complex_data_register(&handle1, 0, &real, &imaginary, 1); starpu_insert_task(&cl_display, STARPU_R, handle1, 0); @end smallexample @end cartouche and used by codelets. @cartouche @smallexample void display_complex_codelet(void *descr[], __attribute__ ((unused)) void *_args) @{ int nx = STARPU_COMPLEX_GET_NX(descr[0]); double *real = STARPU_COMPLEX_GET_REAL(descr[0]); double *imaginary = STARPU_COMPLEX_GET_IMAGINARY(descr[0]); int i; for(i=0 ; itype} is not a valid StarPU device type (STARPU_CPU_WORKER, STARPU_CUDA_WORKER or STARPU_OPENCL_WORKER). This is the same as using the following functions: calling @code{starpu_driver_init()}, then calling @code{starpu_driver_run_once()} in a loop, and eventually @code{starpu_driver_deinit()}. @end deftypefun @deftypefun int starpu_driver_init (struct starpu_driver *@var{d}) Initialize the given driver. Returns 0 on success, -EINVAL if @code{d->type} is not a valid StarPU device type (STARPU_CPU_WORKER, STARPU_CUDA_WORKER or STARPU_OPENCL_WORKER). @end deftypefun @deftypefun int starpu_driver_run_once (struct starpu_driver *@var{d}) Run the driver once, then returns 0 on success, -EINVAL if @code{d->type} is not a valid StarPU device type (STARPU_CPU_WORKER, STARPU_CUDA_WORKER or STARPU_OPENCL_WORKER). @end deftypefun @deftypefun int starpu_driver_deinit (struct starpu_driver *@var{d}) Deinitialize the given driver. 
Returns 0 on success, -EINVAL if @code{d->type} is not a valid StarPU device type
(STARPU_CPU_WORKER, STARPU_CUDA_WORKER or STARPU_OPENCL_WORKER).
@end deftypefun

@deftypefun void starpu_drivers_request_termination (void)
Notify all running drivers that they should terminate.
@end deftypefun

@node Example
@subsection Example

@cartouche
@smallexample
int ret;
struct starpu_driver d = @{
    .type = STARPU_CUDA_WORKER,
    .id.cuda_id = 0
@};
ret = starpu_driver_init(&d);
if (ret != 0)
    error();
while (some_condition) @{
    ret = starpu_driver_run_once(&d);
    if (ret != 0)
        error();
@}
ret = starpu_driver_deinit(&d);
if (ret != 0)
    error();
@end smallexample
@end cartouche

@node Expert mode
@section Expert mode

@deftypefun void starpu_wake_all_blocked_workers (void)
Wake all the workers, so they can inspect data requests and task submissions
again.
@end deftypefun

@deftypefun int starpu_progression_hook_register (unsigned (*@var{func})(void *arg), void *@var{arg})
Register a progression hook, to be called when workers are idle.
@end deftypefun

@deftypefun void starpu_progression_hook_deregister (int @var{hook_id})
Unregister a given progression hook.
@end deftypefun
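
To make the register/deregister contract of progression hooks more
concrete, here is a stand-alone sketch of a small hook table. It is not
StarPU's implementation: @code{SKETCH_MAX_HOOKS},
@code{sketch_hook_register}, @code{sketch_hook_deregister},
@code{sketch_run_hooks} and @code{sketch_count_hook} are hypothetical
names, the fixed table size is an assumption, and since the meaning of
the hook's return value is not specified above, the sketch simply
ignores it.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_HOOKS 8	/* arbitrary capacity for this sketch */

struct sketch_hook {
	unsigned (*func)(void *arg);
	void *arg;
	int active;
};

static struct sketch_hook sketch_hooks[SKETCH_MAX_HOOKS];

/* Store the hook in the first free slot.  Returns the slot index, to be
 * used as the hook id, or -1 if the table is full. */
static int sketch_hook_register(unsigned (*func)(void *arg), void *arg)
{
	int id;

	assert(func != NULL);
	for (id = 0; id < SKETCH_MAX_HOOKS; id++) {
		if (!sketch_hooks[id].active) {
			sketch_hooks[id].func = func;
			sketch_hooks[id].arg = arg;
			sketch_hooks[id].active = 1;
			return id;
		}
	}
	return -1;
}

/* Free the slot identified by hook_id; out-of-range ids are ignored. */
static void sketch_hook_deregister(int hook_id)
{
	if (hook_id >= 0 && hook_id < SKETCH_MAX_HOOKS)
		sketch_hooks[hook_id].active = 0;
}

/* What an idle worker might do in this sketch: call every registered
 * hook once.  Returns the number of hooks that were called. */
static unsigned sketch_run_hooks(void)
{
	unsigned called = 0;
	int id;

	for (id = 0; id < SKETCH_MAX_HOOKS; id++) {
		if (sketch_hooks[id].active) {
			(void) sketch_hooks[id].func(sketch_hooks[id].arg);
			called++;
		}
	}
	return called;
}

/* Example hook: increment the unsigned counter passed as argument. */
static unsigned sketch_count_hook(void *arg)
{
	unsigned *counter = arg;

	return ++*counter;
}
```

Returning the slot index as the hook id keeps deregistration a simple
array access, at the price of a fixed upper bound on the number of
hooks.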