@c -*-texinfo-*-

@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012, 2013  Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Insert Task::
* Tracing support::
* MPI Interface::
* Defining a new data interface::
* Multiformat Data Interface::
* Task Bundles::
* Task Lists::
* Using Parallel Tasks::
* Scheduling Contexts::
* Defining a new scheduling policy::
* Running drivers::
* Expert mode::
@end menu

@node Insert Task
@section Insert Task

@deftypefun int starpu_insert_task (struct starpu_codelet *@var{cl}, ...)
Create and submit a task corresponding to @var{cl} with the following
arguments.  The argument list must be zero-terminated.

The arguments following the codelet can be of the following types:

@itemize
@item
an access mode (@code{STARPU_R}, @code{STARPU_W}, @code{STARPU_RW},
@code{STARPU_SCRATCH} or @code{STARPU_REDUX}) followed by a data handle;
@item
@code{STARPU_DATA_ARRAY} followed by an array of data handles and its number of elements;
@item
the specific values @code{STARPU_VALUE}, @code{STARPU_CALLBACK},
@code{STARPU_CALLBACK_ARG}, @code{STARPU_CALLBACK_WITH_ARG},
@code{STARPU_PRIORITY}, @code{STARPU_TAG}, followed by the appropriate objects
as defined below.
@end itemize

When using @code{STARPU_DATA_ARRAY}, the access mode of the data
handles is not defined.

Parameters to be passed to the codelet implementation are defined
through the type @code{STARPU_VALUE}.
The function
@code{starpu_codelet_unpack_args} must be called within the codelet
implementation to retrieve them.
@end deftypefun

@defmac STARPU_VALUE
This macro is used when calling @code{starpu_insert_task}, and must be
followed by a pointer to a constant value and the size of the constant.
@end defmac

@defmac STARPU_CALLBACK
This macro is used when calling @code{starpu_insert_task}, and must be
followed by a pointer to a callback function.
@end defmac

@defmac STARPU_CALLBACK_ARG
This macro is used when calling @code{starpu_insert_task}, and must be
followed by a pointer to be given as an argument to the callback
function.
@end defmac

@defmac STARPU_CALLBACK_WITH_ARG
This macro is used when calling @code{starpu_insert_task}, and must be
followed by two pointers: one to a callback function, and the other to
be given as an argument to the callback function; this is equivalent
to using both @code{STARPU_CALLBACK} and
@code{STARPU_CALLBACK_ARG}.
@end defmac

@defmac STARPU_PRIORITY
This macro is used when calling @code{starpu_insert_task}, and must be
followed by an integer defining a priority level.
@end defmac

@defmac STARPU_TAG
This macro is used when calling @code{starpu_insert_task}, and must be
followed by a tag.
@end defmac

@deftypefun void starpu_codelet_pack_args ({char **}@var{arg_buffer}, {size_t *}@var{arg_buffer_size}, ...)
Pack arguments of type @code{STARPU_VALUE} into a buffer which can be
given to a codelet and later unpacked with the function
@code{starpu_codelet_unpack_args} defined below.
@end deftypefun

@deftypefun void starpu_codelet_unpack_args ({void *}@var{cl_arg}, ...)
Retrieve the arguments of type @code{STARPU_VALUE} associated with a
task automatically created using the function
@code{starpu_insert_task} defined above.
@end deftypefun

@node Tracing support
@section Tracing support

@deftypefun void starpu_fxt_start_profiling (void)
Start recording the trace.
By default, the trace is started by the call to
@code{starpu_init()}, unless @code{no_auto_start_trace} is set to 1 in
@code{starpu_conf}, in which case @code{starpu_fxt_start_profiling} should be
called to specify when to start recording events.
@end deftypefun

@deftypefun void starpu_fxt_stop_profiling (void)
Stop recording the trace. By default, the trace is stopped by the call to
@code{starpu_shutdown()}. @code{starpu_fxt_stop_profiling} can however be
used to stop it earlier. @code{starpu_fxt_start_profiling} can then be called to
start recording it again, and so on.
@end deftypefun

@node MPI Interface
@section MPI Interface

@menu
* Initialisation::
* Communication::
* Communication cache::
@end menu

@node Initialisation
@subsection Initialisation

@deftypefun int starpu_mpi_init (int *@var{argc}, char ***@var{argv}, int @var{initialize_mpi})
Initializes the starpumpi library. @var{initialize_mpi} indicates whether
MPI should be initialized by StarPU. If the value is not @code{0},
MPI will be initialized by calling @code{MPI_Init_thread(argc, argv,
MPI_THREAD_SERIALIZED, ...)}.
@end deftypefun

@deftypefun int starpu_mpi_initialize (void)
This function is deprecated. Use the
function @code{starpu_mpi_init()} defined above instead.
This function does not call @code{MPI_Init}, which must be called beforehand.
@end deftypefun

@deftypefun int starpu_mpi_initialize_extended (int *@var{rank}, int *@var{world_size})
This function is deprecated. Use the
function @code{starpu_mpi_init()} defined above instead.
MPI will be initialized by starpumpi by calling @code{MPI_Init_thread(argc, argv,
MPI_THREAD_SERIALIZED, ...)}.
@end deftypefun

@deftypefun int starpu_mpi_shutdown (void)
Cleans the starpumpi library.
This must be called after the last call to a
@code{starpu_mpi} function and before @code{starpu_shutdown()}.
@code{MPI_Finalize()} will be called if StarPU-MPI has been initialized
by @code{starpu_mpi_init()}.
@end deftypefun

@deftypefun void starpu_mpi_comm_amounts_retrieve (size_t *@var{comm_amounts})
Retrieve the current amount of communications from the current node in
the array @var{comm_amounts}, which must have a size greater than or equal
to the world size. Communication statistics must be enabled
(@pxref{STARPU_COMM_STATS}).
@end deftypefun

@node Communication
@subsection Communication

The standard point-to-point communications of MPI have been
implemented. The semantics are similar to those of MPI, but adapted to
the distributed shared memory (DSM) provided by StarPU. An MPI request will
only be submitted when the data is available in the main memory of the node
submitting the request.

@deftypefun int starpu_mpi_send (starpu_data_handle_t @var{data_handle}, int @var{dest}, int @var{mpi_tag}, MPI_Comm @var{comm})
Performs a standard-mode, blocking send of @var{data_handle} to the
node @var{dest} using the message tag @var{mpi_tag} within the
communicator @var{comm}.
@end deftypefun

@deftypefun int starpu_mpi_recv (starpu_data_handle_t @var{data_handle}, int @var{source}, int @var{mpi_tag}, MPI_Comm @var{comm}, MPI_Status *@var{status})
Performs a standard-mode, blocking receive in @var{data_handle} from the
node @var{source} using the message tag @var{mpi_tag} within the
communicator @var{comm}.
@end deftypefun

@deftypefun int starpu_mpi_isend (starpu_data_handle_t @var{data_handle}, starpu_mpi_req *@var{req}, int @var{dest}, int @var{mpi_tag}, MPI_Comm @var{comm})
Posts a standard-mode, nonblocking send of @var{data_handle} to the
node @var{dest} using the message tag @var{mpi_tag} within the
communicator @var{comm}.
After the call, the pointer to the request
@var{req} can be used to test the completion of the communication.
@end deftypefun

@deftypefun int starpu_mpi_irecv (starpu_data_handle_t @var{data_handle}, starpu_mpi_req *@var{req}, int @var{source}, int @var{mpi_tag}, MPI_Comm @var{comm})
Posts a nonblocking receive in @var{data_handle} from the
node @var{source} using the message tag @var{mpi_tag} within the
communicator @var{comm}. After the call, the pointer to the request
@var{req} can be used to test the completion of the communication.
@end deftypefun

@deftypefun int starpu_mpi_isend_detached (starpu_data_handle_t @var{data_handle}, int @var{dest}, int @var{mpi_tag}, MPI_Comm @var{comm}, void (*@var{callback})(void *), void *@var{arg})
Posts a standard-mode, nonblocking send of @var{data_handle} to the
node @var{dest} using the message tag @var{mpi_tag} within the
communicator @var{comm}. On completion, the @var{callback} function is
called with the argument @var{arg}.
@end deftypefun

@deftypefun int starpu_mpi_irecv_detached (starpu_data_handle_t @var{data_handle}, int @var{source}, int @var{mpi_tag}, MPI_Comm @var{comm}, void (*@var{callback})(void *), void *@var{arg})
Posts a nonblocking receive in @var{data_handle} from the
node @var{source} using the message tag @var{mpi_tag} within the
communicator @var{comm}. On completion, the @var{callback} function is
called with the argument @var{arg}.
@end deftypefun

@deftypefun int starpu_mpi_wait (starpu_mpi_req *@var{req}, MPI_Status *@var{status})
Returns when the operation identified by request @var{req} is complete.
@end deftypefun

@deftypefun int starpu_mpi_test (starpu_mpi_req *@var{req}, int *@var{flag}, MPI_Status *@var{status})
If the operation identified by @var{req} is complete, set @var{flag}
to 1.
The @var{status} object is set to contain information on the
completed operation.
@end deftypefun

@deftypefun int starpu_mpi_barrier (MPI_Comm @var{comm})
Blocks the caller until all group members of the communicator
@var{comm} have called it.
@end deftypefun

@deftypefun int starpu_mpi_isend_detached_unlock_tag (starpu_data_handle_t @var{data_handle}, int @var{dest}, int @var{mpi_tag}, MPI_Comm @var{comm}, starpu_tag_t @var{tag})
Posts a standard-mode, nonblocking send of @var{data_handle} to the
node @var{dest} using the message tag @var{mpi_tag} within the
communicator @var{comm}. On completion, @var{tag} is unlocked.
@end deftypefun

@deftypefun int starpu_mpi_irecv_detached_unlock_tag (starpu_data_handle_t @var{data_handle}, int @var{source}, int @var{mpi_tag}, MPI_Comm @var{comm}, starpu_tag_t @var{tag})
Posts a nonblocking receive in @var{data_handle} from the
node @var{source} using the message tag @var{mpi_tag} within the
communicator @var{comm}. On completion, @var{tag} is unlocked.
@end deftypefun

@deftypefun int starpu_mpi_isend_array_detached_unlock_tag (unsigned @var{array_size}, starpu_data_handle_t *@var{data_handle}, int *@var{dest}, int *@var{mpi_tag}, MPI_Comm *@var{comm}, starpu_tag_t @var{tag})
Posts @var{array_size} standard-mode, nonblocking sends. Each post
sends the n-th data of the array @var{data_handle} to the n-th node of
the array @var{dest}
using the n-th message tag of the array @var{mpi_tag} within the n-th
communicator of the array
@var{comm}. On completion of all the requests, @var{tag} is unlocked.
@end deftypefun

@deftypefun int starpu_mpi_irecv_array_detached_unlock_tag (unsigned @var{array_size}, starpu_data_handle_t *@var{data_handle}, int *@var{source}, int *@var{mpi_tag}, MPI_Comm *@var{comm}, starpu_tag_t @var{tag})
Posts @var{array_size} nonblocking receives.
Each post receives in the
n-th data of the array @var{data_handle} from the n-th
node of the array @var{source} using the n-th message tag of the array
@var{mpi_tag} within the n-th communicator of the array @var{comm}.
On completion of all the requests, @var{tag} is unlocked.
@end deftypefun

@node Communication cache
@subsection Communication cache

@deftypefun void starpu_mpi_cache_flush (MPI_Comm @var{comm}, starpu_data_handle_t @var{data_handle})
Clear the send and receive communication cache for the data
@var{data_handle}. The function has to be called synchronously by all
the MPI nodes.
The function does nothing if the cache mechanism is disabled (@pxref{STARPU_MPI_CACHE}).
@end deftypefun

@deftypefun void starpu_mpi_cache_flush_all_data (MPI_Comm @var{comm})
Clear the send and receive communication cache for all data. The
function has to be called synchronously by all the MPI nodes.
The function does nothing if the cache mechanism is disabled (@pxref{STARPU_MPI_CACHE}).
@end deftypefun

@node Defining a new data interface
@section Defining a new data interface

@menu
* Data Interface API::  Data Interface API
* An example of data interface::        An example of data interface
@end menu

@node Data Interface API
@subsection Data Interface API

@deftp {Data Type} {struct starpu_data_interface_ops}
@anchor{struct starpu_data_interface_ops}
Per-interface data transfer methods.

@table @asis
@item @code{void (*register_data_handle)(starpu_data_handle_t handle, uint32_t home_node, void *data_interface)}
Register an existing interface into a data handle.

@item @code{starpu_ssize_t (*allocate_data_on_node)(void *data_interface, uint32_t node)}
Allocate data for the interface on a given node.

@item @code{void (*free_data_on_node)(void *data_interface, uint32_t node)}
Free data of the interface on a given node.

@item @code{const struct starpu_data_copy_methods *copy_methods}
ram/cuda/spu/opencl synchronous and asynchronous transfer methods.

@item @code{void * (*handle_to_pointer)(starpu_data_handle_t handle, uint32_t node)}
Return the current pointer (if any) for the handle on the given node.

@item @code{size_t (*get_size)(starpu_data_handle_t handle)}
Return an estimate of the size of data, for performance models.

@item @code{uint32_t (*footprint)(starpu_data_handle_t handle)}
Return a 32-bit footprint which characterizes the data size.

@item @code{int (*compare)(void *data_interface_a, void *data_interface_b)}
Compare the data size of two interfaces.

@item @code{void (*display)(starpu_data_handle_t handle, FILE *f)}
Dump the sizes of a handle to a file.

@item @code{enum starpu_data_interface_id interfaceid}
An identifier that is unique to each interface.

@item @code{size_t interface_size}
The size of the interface data descriptor.

@item @code{int is_multiformat}
todo

@item @code{struct starpu_multiformat_data_interface_ops* (*get_mf_ops)(void *data_interface)}
todo

@item @code{int (*pack_data)(starpu_data_handle_t handle, uint32_t node, void **ptr, size_t *count)}
Pack the data handle into a contiguous buffer at the address @code{ptr} and set the size of the newly created buffer in @code{count}.

@item @code{int (*unpack_data)(starpu_data_handle_t handle, uint32_t node, void *ptr, size_t count)}
Unpack the data handle from the contiguous buffer at the address @code{ptr} of size @code{count}.
@end table
@end deftp

@deftp {Data Type} {struct starpu_data_copy_methods}
Defines the per-interface methods.
@table @asis
@item @code{int @{ram,cuda,opencl,spu@}_to_@{ram,cuda,opencl,spu@}(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node)}
These 16 functions define how to copy data from the @var{src_interface}
interface on the @var{src_node} node to the @var{dst_interface} interface
on the @var{dst_node} node.
They return 0 on success.

@item @code{int (*ram_to_cuda_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, cudaStream_t stream)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (in RAM) to the @var{dst_interface} interface on the
@var{dst_node} node (on a CUDA device), using the given @var{stream}. Return 0
on success.

@item @code{int (*cuda_to_ram_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, cudaStream_t stream)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (on a CUDA device) to the @var{dst_interface} interface on the
@var{dst_node} node (in RAM), using the given @var{stream}. Return 0
on success.

@item @code{int (*cuda_to_cuda_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, cudaStream_t stream)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (on a CUDA device) to the @var{dst_interface} interface on
the @var{dst_node} node (on another CUDA device), using the given @var{stream}.
Return 0 on success.

@item @code{int (*ram_to_opencl_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, /* cl_event * */ void *event)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (in RAM) to the @var{dst_interface} interface on the
@var{dst_node} node (on an OpenCL device), using @var{event}, a pointer to a
cl_event. Return 0 on success.

@item @code{int (*opencl_to_ram_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, /* cl_event * */ void *event)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (on an OpenCL device) to the @var{dst_interface} interface
on the @var{dst_node} node (in RAM), using the given @var{event}, a pointer to
a cl_event.
Return 0 on success.

@item @code{int (*opencl_to_opencl_async)(void *src_interface, unsigned src_node, void *dst_interface, unsigned dst_node, /* cl_event * */ void *event)}
Define how to copy data from the @var{src_interface} interface on the
@var{src_node} node (on an OpenCL device) to the @var{dst_interface} interface
on the @var{dst_node} node (on another OpenCL device), using the given
@var{event}, a pointer to a cl_event. Return 0 on success.
@end table
@end deftp

@deftypefun uint32_t starpu_crc32_be_n ({void *}@var{input}, size_t @var{n}, uint32_t @var{inputcrc})
Compute the CRC of a byte buffer seeded by the @var{inputcrc} "current
state". The return value should be considered as the new "current
state" for future CRC computation. This is used for computing the data size
footprint.
@end deftypefun

@deftypefun uint32_t starpu_crc32_be (uint32_t @var{input}, uint32_t @var{inputcrc})
Compute the CRC of a 32-bit number seeded by the @var{inputcrc} "current
state". The return value should be considered as the new "current
state" for future CRC computation. This is used for computing the data size
footprint.
@end deftypefun

@deftypefun uint32_t starpu_crc32_string ({char *}@var{str}, uint32_t @var{inputcrc})
Compute the CRC of a string seeded by the @var{inputcrc} "current state".
The return value should be considered as the new "current state" for
future CRC computation. This is used for computing the data size footprint.
@end deftypefun

@node An example of data interface
@subsection An example of data interface

@deftypefun int starpu_data_interface_get_next_id (void)
Returns the next available id for a newly created data interface.
@end deftypefun

Let's define a new data interface to manage complex numbers.

@cartouche
@smallexample
/* interface for complex numbers */
struct starpu_complex_interface
@{
        double *real;
        double *imaginary;
        int nx;
@};
@end smallexample
@end cartouche

Registering such data with StarPU is easily done using the function
@code{starpu_data_register} (@pxref{Basic Data Management API}).
The last
parameter of the function, @code{interface_complex_ops}, will be
described below.

@cartouche
@smallexample
void starpu_complex_data_register(starpu_data_handle_t *handle,
     uint32_t home_node, double *real, double *imaginary, int nx)
@{
        struct starpu_complex_interface complex =
        @{
                .real = real,
                .imaginary = imaginary,
                .nx = nx
        @};

        /* allocate an interface id the first time the interface is used */
        if (interface_complex_ops.interfaceid == -1)
        @{
                interface_complex_ops.interfaceid = starpu_data_interface_get_next_id();
        @}

        starpu_data_register(handle, home_node, &complex, &interface_complex_ops);
@}
@end smallexample
@end cartouche

Different operations need to be defined for a data interface through
the type @code{struct starpu_data_interface_ops} (@pxref{Data
Interface API}). We only define here the basic operations needed to
run simple applications. The source code for the different functions
can be found in the file
@code{examples/interface/complex_interface.c}.

@cartouche
@smallexample
static struct starpu_data_interface_ops interface_complex_ops =
@{
        .register_data_handle = complex_register_data_handle,
        .allocate_data_on_node = complex_allocate_data_on_node,
        .copy_methods = &complex_copy_methods,
        .get_size = complex_get_size,
        .footprint = complex_footprint,
        .interfaceid = -1,
        .interface_size = sizeof(struct starpu_complex_interface),
@};
@end smallexample
@end cartouche

Functions need to be defined to access the different fields of the
complex interface from a StarPU data handle.

@cartouche
@smallexample
double *starpu_complex_get_real(starpu_data_handle_t handle)
@{
        struct starpu_complex_interface *complex_interface =
          (struct starpu_complex_interface *) starpu_data_get_interface_on_node(handle, 0);
        return complex_interface->real;
@}

double *starpu_complex_get_imaginary(starpu_data_handle_t handle);
int starpu_complex_get_nx(starpu_data_handle_t handle);
@end smallexample
@end cartouche

Similar functions need to be defined to access the different fields of the
complex interface from a @code{void *} pointer to be used within codelet
implementations.

@cartouche
@smallexample
#define STARPU_COMPLEX_GET_REAL(interface) \
        (((struct starpu_complex_interface *)(interface))->real)
#define STARPU_COMPLEX_GET_IMAGINARY(interface) \
        (((struct starpu_complex_interface *)(interface))->imaginary)
#define STARPU_COMPLEX_GET_NX(interface) \
        (((struct starpu_complex_interface *)(interface))->nx)
@end smallexample
@end cartouche

Complex data interfaces can then be registered with StarPU.

@cartouche
@smallexample
starpu_data_handle_t handle1;
double real = 45.0;
double imaginary = 12.0;
starpu_complex_data_register(&handle1, 0, &real, &imaginary, 1);
starpu_insert_task(&cl_display, STARPU_R, handle1, 0);
@end smallexample
@end cartouche

and used by codelets.

@cartouche
@smallexample
void display_complex_codelet(void *descr[], __attribute__ ((unused)) void *_args)
@{
        int nx = STARPU_COMPLEX_GET_NX(descr[0]);
        double *real = STARPU_COMPLEX_GET_REAL(descr[0]);
        double *imaginary = STARPU_COMPLEX_GET_IMAGINARY(descr[0]);
        int i;

        for(i=0 ; i<nx ; i++)
        @{
                fprintf(stderr, "Complex[%d] = %3.2f + %3.2f i\n", i, real[i], imaginary[i]);
        @}
@}
@end smallexample
@end cartouche

The whole code for this complex data interface is available in the
directory @code{examples/interface/}.

@node Multiformat Data Interface
@section Multiformat Data Interface

@deftp {Data Type} {struct starpu_multiformat_data_interface_ops}
The different fields are:
@table @asis
@item @code{size_t cpu_elemsize}
the size of each element on CPUs,

@item @code{size_t opencl_elemsize}
the size of each element on OpenCL devices,

@item @code{struct starpu_codelet *cpu_to_opencl_cl}
pointer to a codelet which converts from CPU to OpenCL

@item @code{struct starpu_codelet *opencl_to_cpu_cl}
pointer to a codelet which converts from OpenCL to CPU

@item @code{size_t cuda_elemsize}
the
size of each element on CUDA devices,

@item @code{struct starpu_codelet *cpu_to_cuda_cl}
pointer to a codelet which converts from CPU to CUDA

@item @code{struct starpu_codelet *cuda_to_cpu_cl}
pointer to a codelet which converts from CUDA to CPU
@end table
@end deftp

@deftypefun void starpu_multiformat_data_register (starpu_data_handle_t *@var{handle}, uint32_t @var{home_node}, void *@var{ptr}, uint32_t @var{nobjects}, struct starpu_multiformat_data_interface_ops *@var{format_ops})
Register a piece of data that can be represented in different ways, depending upon
the processing unit that manipulates it. It allows the programmer, for instance, to
use an array of structures when working on a CPU, and a structure of arrays when
working on a GPU.

@var{nobjects} is the number of elements in the data. @var{format_ops} describes
the format.
@end deftypefun

@defmac STARPU_MULTIFORMAT_GET_CPU_PTR ({void *}@var{interface})
returns the local pointer to the data with CPU format.
@end defmac

@defmac STARPU_MULTIFORMAT_GET_CUDA_PTR ({void *}@var{interface})
returns the local pointer to the data with CUDA format.
@end defmac

@defmac STARPU_MULTIFORMAT_GET_OPENCL_PTR ({void *}@var{interface})
returns the local pointer to the data with OpenCL format.
@end defmac

@defmac STARPU_MULTIFORMAT_GET_NX ({void *}@var{interface})
returns the number of elements in the data.
@end defmac

@node Task Bundles
@section Task Bundles

@deftp {Data Type} {starpu_task_bundle_t}
Opaque structure describing a list of tasks that should be scheduled
on the same worker whenever it's possible.
It must be considered as a
hint given to the scheduler as there is no guarantee that they will be
executed on the same worker.
@end deftp

@deftypefun void starpu_task_bundle_create ({starpu_task_bundle_t *}@var{bundle})
Factory function creating and initializing @var{bundle}. When the call
returns, the memory needed is allocated and @var{bundle} is ready to use.
@end deftypefun

@deftypefun int starpu_task_bundle_insert (starpu_task_bundle_t @var{bundle}, {struct starpu_task *}@var{task})
Insert @var{task} in @var{bundle}. Until @var{task} is removed from
@var{bundle}, its expected length and data transfer time will be considered
along with those of the other tasks of @var{bundle}.
This function must not be called if @var{bundle} is already closed and/or
@var{task} is already submitted.
@end deftypefun

@deftypefun int starpu_task_bundle_remove (starpu_task_bundle_t @var{bundle}, {struct starpu_task *}@var{task})
Remove @var{task} from @var{bundle}.
@var{task} must have been previously inserted in @var{bundle}.
This function must not be called if @var{bundle} is already closed and/or
@var{task} is already submitted. Doing so would result in undefined behaviour.
@end deftypefun

@deftypefun void starpu_task_bundle_close (starpu_task_bundle_t @var{bundle})
Inform the runtime that the user won't modify @var{bundle} anymore, i.e. no
more tasks will be inserted or removed.
Thus the runtime can destroy it when possible.
@end deftypefun

@deftypefun double starpu_task_bundle_expected_length (starpu_task_bundle_t @var{bundle}, {enum starpu_perf_archtype} @var{arch}, unsigned @var{nimpl})
Return the expected duration of the entire task bundle in µs.
@end deftypefun

@deftypefun double starpu_task_bundle_expected_power (starpu_task_bundle_t @var{bundle}, enum starpu_perf_archtype @var{arch}, unsigned @var{nimpl})
Return the expected power consumption of the entire task bundle in J.
@end deftypefun

@deftypefun double starpu_task_bundle_expected_data_transfer_time (starpu_task_bundle_t @var{bundle}, unsigned @var{memory_node})
Return the time (in µs) expected to transfer all data used within the bundle.
@end deftypefun

@node Task Lists
@section Task Lists

@deftp {Data Type} {struct starpu_task_list}
Stores a doubly-linked list of tasks.
@end deftp

@deftypefun void starpu_task_list_init ({struct starpu_task_list *}@var{list})
Initialize a list structure.
@end deftypefun

@deftypefun void starpu_task_list_push_front ({struct starpu_task_list *}@var{list}, {struct starpu_task *}@var{task})
Push a task at the front of a list.
@end deftypefun

@deftypefun void starpu_task_list_push_back ({struct starpu_task_list *}@var{list}, {struct starpu_task *}@var{task})
Push a task at the back of a list.
@end deftypefun

@deftypefun {struct starpu_task *} starpu_task_list_front ({struct starpu_task_list *}@var{list})
Get the front of the list (without removing it).
@end deftypefun

@deftypefun {struct starpu_task *} starpu_task_list_back ({struct starpu_task_list *}@var{list})
Get the back of the list (without removing it).
@end deftypefun

@deftypefun int starpu_task_list_empty ({struct starpu_task_list *}@var{list})
Test if a list is empty.
@end deftypefun

@deftypefun void starpu_task_list_erase ({struct starpu_task_list *}@var{list}, {struct starpu_task *}@var{task})
Remove an element from the list.
@end deftypefun

@deftypefun {struct starpu_task *} starpu_task_list_pop_front ({struct starpu_task_list *}@var{list})
Remove the element at the front of the list.
@end deftypefun

@deftypefun {struct starpu_task *} starpu_task_list_pop_back ({struct starpu_task_list *}@var{list})
Remove the element at the back of the list.
@end deftypefun

@deftypefun {struct starpu_task *} starpu_task_list_begin ({struct starpu_task_list *}@var{list})
Get the first task of the list.
@end deftypefun

@deftypefun {struct starpu_task *} starpu_task_list_end ({struct starpu_task_list *}@var{list})
Get the end of the list.
@end deftypefun

@deftypefun {struct starpu_task *} starpu_task_list_next ({struct starpu_task *}@var{task})
Get the next task of the list. This is not erase-safe.
@end deftypefun

@node Using Parallel Tasks
@section Using Parallel Tasks

These are used by parallel tasks:

@deftypefun int starpu_combined_worker_get_size (void)
Return the size of the current combined worker, i.e. the total number of CPUs
running the same task in the case of SPMD parallel tasks, or the total number
of threads that the task is allowed to start in the case of FORKJOIN parallel
tasks.
@end deftypefun

@deftypefun int starpu_combined_worker_get_rank (void)
Return the rank of the current thread within the combined worker.
Can only be
used in FORKJOIN parallel tasks, to know which part of the task to work on.
@end deftypefun

Most of these are used for schedulers which support parallel tasks.

@deftypefun unsigned starpu_combined_worker_get_count (void)
Return the number of different combined workers.
@end deftypefun

@deftypefun int starpu_combined_worker_get_id (void)
Return the identifier of the current combined worker.
@end deftypefun

@deftypefun int starpu_combined_worker_assign_workerid (int @var{nworkers}, int @var{workerid_array}[])
Register a new combined worker and get its identifier.
@end deftypefun

@deftypefun int starpu_combined_worker_get_description (int @var{workerid}, {int *}@var{worker_size}, {int **}@var{combined_workerid})
Get the description of a combined worker.
@end deftypefun

@deftypefun int starpu_combined_worker_can_execute_task (unsigned @var{workerid}, {struct starpu_task *}@var{task}, unsigned @var{nimpl})
Variant of @code{starpu_worker_can_execute_task} compatible with combined workers.
@end deftypefun

@deftp {Data Type} {struct starpu_machine_topology}
@table @asis
@item @code{unsigned nworkers}
Total number of workers.

@item @code{unsigned ncombinedworkers}
Total number of combined workers.

@item @code{hwloc_topology_t hwtopology}
Topology as detected by hwloc.

To maintain ABI compatibility when hwloc is not available, the field
is replaced with @code{void *dummy}.

@item @code{unsigned nhwcpus}
Total number of CPUs, as detected by the topology code. May be different from
the actual number of CPU workers.

@item @code{unsigned nhwcudagpus}
Total number of CUDA devices, as detected. May be different from the actual
number of CUDA workers.

@item @code{unsigned nhwopenclgpus}
Total number of OpenCL devices, as detected.
May be different from the actual
number of OpenCL workers.

@item @code{unsigned ncpus}
Actual number of CPU workers used by StarPU.

@item @code{unsigned ncudagpus}
Actual number of CUDA workers used by StarPU.

@item @code{unsigned nopenclgpus}
Actual number of OpenCL workers used by StarPU.

@item @code{unsigned workers_bindid[STARPU_NMAXWORKERS]}
Indicates the successive CPU identifiers that should be used to bind the
workers. It is filled either according to the user's explicit
parameters (from @code{starpu_conf}) or according to the
@code{STARPU_WORKERS_CPUID} environment variable. Otherwise, a round-robin
policy is used to distribute the workers over the CPUs.

@item @code{unsigned workers_cuda_gpuid[STARPU_NMAXWORKERS]}
Indicates the successive CUDA device identifiers that should be used by the
CUDA driver. It is filled either according to the user's explicit parameters
(from @code{starpu_conf}) or according to the @code{STARPU_WORKERS_CUDAID}
environment variable. Otherwise, they are taken in ID order.

@item @code{unsigned workers_opencl_gpuid[STARPU_NMAXWORKERS]}
Indicates the successive OpenCL device identifiers that should be used by the
OpenCL driver. It is filled either according to the user's explicit parameters
(from @code{starpu_conf}) or according to the @code{STARPU_WORKERS_OPENCLID}
environment variable.
Otherwise, they are taken in ID order.
@end table
@end deftp

@node Scheduling Contexts
@section Scheduling Contexts

StarPU allows grouping workers into combined workers in order to execute a
parallel task, and grouping tasks into bundles that will be executed by a
single specified worker.
In contrast, when workers are grouped into scheduling contexts, StarPU tasks
are submitted to the contexts and scheduled with the policy assigned to each
context.
Scheduling contexts can be created, deleted and modified dynamically.

@deftypefun unsigned starpu_sched_ctx_create (const char *@var{policy_name}, int *@var{workerids_ctx}, int @var{nworkers_ctx}, const char *@var{sched_ctx_name})
This function creates a scheduling context which uses the scheduling policy
indicated in the first argument and assigns the workers indicated in the
second argument to execute the tasks submitted to it.
The return value is the identifier of the context that has just been created.
It is used afterwards to indicate the context the tasks will be submitted to.
The return value should be at most @code{STARPU_NMAX_SCHED_CTXS}.
@end deftypefun

@deftypefun void starpu_sched_ctx_delete (unsigned @var{sched_ctx_id})
Delete scheduling context @var{sched_ctx_id} and transfer remaining workers to
the inheritor scheduling context.
@end deftypefun

@deftypefun void starpu_sched_ctx_add_workers ({int *}@var{workerids_ctx}, int @var{nworkers_ctx}, unsigned @var{sched_ctx_id})
This function dynamically adds the workers indicated in the first argument to
the context indicated in the last argument. The last argument cannot be
greater than @code{STARPU_NMAX_SCHED_CTXS}.
@end deftypefun

@deftypefun void starpu_sched_ctx_remove_workers ({int *}@var{workerids_ctx}, int @var{nworkers_ctx}, unsigned @var{sched_ctx_id})
This function removes the workers indicated in the first argument from the
context indicated in the last argument.
The last argument cannot be greater than @code{STARPU_NMAX_SCHED_CTXS}.
@end deftypefun

A scheduling context manages a collection of workers that can be stored using
different data structures. A generic structure is therefore provided to
simplify the choice of its type.
Only the list data structure is currently available, but implementations of
further data structures (such as trees) are foreseen.

@deftp {Data Type} {struct starpu_sched_ctx_worker_collection}
@table @asis
@item @code{void *workerids}
The workerids managed by the collection.

@item @code{unsigned nworkers}
The number of workerids.

@item @code{pthread_key_t cursor_key} (optional)
The cursor needed to iterate over the collection (depending on the data
structure).

@item @code{int type}
The type of structure (currently STARPU_WORKER_LIST is the only one available).

@item @code{unsigned (*has_next)(struct starpu_sched_ctx_worker_collection *workers)}
Checks if there is a next worker.

@item @code{int (*get_next)(struct starpu_sched_ctx_worker_collection *workers)}
Gets the next worker.

@item @code{int (*add)(struct starpu_sched_ctx_worker_collection *workers, int worker)}
Adds a worker to the collection.

@item @code{int (*remove)(struct starpu_sched_ctx_worker_collection *workers, int worker)}
Removes a worker from the collection.

@item @code{void* (*init)(struct starpu_sched_ctx_worker_collection *workers)}
Initialize the collection.

@item @code{void (*deinit)(struct starpu_sched_ctx_worker_collection *workers)}
Deinitialize the collection.

@item @code{void (*init_cursor)(struct starpu_sched_ctx_worker_collection *workers)} (optional)
Initialize the cursor if there is one.

@item @code{void (*deinit_cursor)(struct starpu_sched_ctx_worker_collection *workers)} (optional)
Deinitialize the cursor if there is one.
@end table
@end deftp

@deftypefun {struct starpu_sched_ctx_worker_collection *} starpu_sched_ctx_create_worker_collection (unsigned @var{sched_ctx_id}, int @var{type})
Create a worker collection of the type indicated by the last parameter for the context
specified through the first parameter.
@end deftypefun

@deftypefun void starpu_sched_ctx_delete_worker_collection (unsigned @var{sched_ctx_id})
Delete the worker collection of the specified scheduling context.
@end deftypefun

@deftypefun {struct starpu_sched_ctx_worker_collection *} starpu_sched_ctx_get_worker_collection (unsigned @var{sched_ctx_id})
Return the worker collection managed by the indicated context.
@end deftypefun

@deftypefun pthread_mutex_t* starpu_get_changing_ctx_mutex (unsigned @var{sched_ctx_id})
TODO
@end deftypefun

@deftypefun void starpu_task_set_context (unsigned *@var{sched_ctx_id})
Set the scheduling context the subsequent tasks will be submitted to.
@end deftypefun

@deftypefun unsigned starpu_task_get_context (void)
Return the scheduling context the tasks are currently submitted to.
@end deftypefun

@deftypefun unsigned starpu_sched_ctx_get_nworkers (unsigned @var{sched_ctx_id})
Return the number of workers managed by the specified context (usually needed
to check whether it manages any workers or whether it should be blocked).
@end deftypefun

@deftypefun unsigned starpu_sched_ctx_get_nshared_workers (unsigned @var{sched_ctx_id}, unsigned @var{sched_ctx_id2})
Return the number of workers shared by two contexts.
@end deftypefun

@node Defining a new scheduling policy
@section Defining a new scheduling policy

TODO

A full example showing how to define a new scheduling policy is available in
the StarPU sources in the directory @code{examples/scheduler/}.

@menu
* Scheduling Policy API:: Scheduling Policy API
* Source code::
@end menu

@node Scheduling Policy API
@subsection Scheduling Policy API

While StarPU comes with a variety of scheduling policies (@pxref{Task
scheduling policy}), it may sometimes be desirable to implement custom
policies to address specific problems.  The API described below allows
users to write their own scheduling policy.

@deftp {Data Type} {struct starpu_sched_policy}
This structure contains all the methods that implement a scheduling policy.
An application may specify which scheduling strategy to use in the
@code{sched_policy} field of the @code{starpu_conf} structure passed to the
@code{starpu_init} function. The different fields are:

@table @asis
@item @code{void (*init_sched)(unsigned sched_ctx_id)}
Initialize the scheduling policy.

@item @code{void (*deinit_sched)(unsigned sched_ctx_id)}
Clean up the scheduling policy.

@item @code{int (*push_task)(struct starpu_task *)}
Insert a task into the scheduler.

@item @code{void (*push_task_notify)(struct starpu_task *, int workerid)}
Notify the scheduler that a task was pushed on a given worker. This method is
called when a task that was explicitly assigned to a worker becomes ready and
is about to be executed by the worker. This method therefore makes it possible
to keep the state of the scheduler coherent even when StarPU bypasses the
scheduling strategy.

@item @code{struct starpu_task *(*pop_task)(unsigned sched_ctx_id)} (optional)
Get a task from the scheduler. The mutex associated with the worker is already
taken when this method is called. If this method is defined as @code{NULL}, the
worker will only execute tasks from its local queue. In this case, the
@code{push_task} method should use the @code{starpu_push_local_task} method to
assign tasks to the different workers.

@item @code{struct starpu_task *(*pop_every_task)(unsigned sched_ctx_id)}
Remove all available tasks from the scheduler (tasks are chained by means
of the prev and next fields of the starpu_task structure). The mutex associated
with the worker is already taken when this method is called.
This is currently
not used.

@item @code{void (*pre_exec_hook)(struct starpu_task *)} (optional)
This method is called every time a task is starting.

@item @code{void (*post_exec_hook)(struct starpu_task *)} (optional)
This method is called every time a task has been executed.

@item @code{void (*add_workers)(unsigned sched_ctx_id, int *workerids, unsigned nworkers)}
Initialize scheduling structures corresponding to each worker used by the policy.

@item @code{void (*remove_workers)(unsigned sched_ctx_id, int *workerids, unsigned nworkers)}
Deinitialize scheduling structures corresponding to each worker used by the policy.

@item @code{const char *policy_name} (optional)
Name of the policy.

@item @code{const char *policy_description} (optional)
Description of the policy.
@end table
@end deftp

@deftypefun {struct starpu_sched_policy **} starpu_sched_get_predefined_policies ()
Return a NULL-terminated array of all the predefined scheduling policies.
@end deftypefun

@deftypefun void starpu_sched_ctx_set_worker_mutex_and_cond (unsigned @var{sched_ctx_id}, int @var{workerid}, pthread_mutex_t *@var{sched_mutex}, {pthread_cond_t *}@var{sched_cond})
This function specifies the condition variable associated with a worker in a
given context.
When there is no available task for a worker, StarPU blocks this worker on a
condition variable. This function specifies which condition variable (and the
associated mutex) should be used to block (and to wake up) a worker. Note that
multiple workers may use the same condition variable.
For instance, in the case
of a scheduling strategy with a single task queue, the same condition variable
would be used to block and wake up all workers.
The initialization method of a scheduling strategy (@code{init_sched}) must
call this function once per worker.
@end deftypefun

@deftypefun void starpu_sched_ctx_get_worker_mutex_and_cond (unsigned @var{sched_ctx_id}, int @var{workerid}, {pthread_mutex_t **}@var{sched_mutex}, {pthread_cond_t **}@var{sched_cond})
This function returns the mutex and condition variable associated with a
worker in a context.
It is used in the policy to access the local queue of the worker.
@end deftypefun

@deftypefun void starpu_sched_ctx_set_policy_data (unsigned @var{sched_ctx_id}, {void *} @var{policy_data})
Each scheduling policy uses some specific data (queues, variables, additional
condition variables).
This data is stored in a local structure. This function assigns it to a
scheduling context.
@end deftypefun

@deftypefun void* starpu_sched_ctx_get_policy_data (unsigned @var{sched_ctx_id})
Returns the policy data previously assigned to a context.
@end deftypefun

@deftypefun void starpu_sched_set_min_priority (int @var{min_prio})
Defines the minimum priority level supported by the scheduling policy. The
default minimum priority level is the same as the default priority level which
is 0 by convention.  The application may access that value by calling the
@code{starpu_sched_get_min_priority} function. This function should only be
called from the initialization method of the scheduling policy, and should not
be used directly from the application.
@end deftypefun

@deftypefun void starpu_sched_set_max_priority (int @var{max_prio})
Defines the maximum priority level supported by the scheduling policy. The
default maximum priority level is 1.  The application may access that value by
calling the @code{starpu_sched_get_max_priority} function.
This function should
only be called from the initialization method of the scheduling policy, and
should not be used directly from the application.
@end deftypefun

@deftypefun int starpu_sched_get_min_priority (void)
Returns the current minimum priority level supported by the
scheduling policy.
@end deftypefun

@deftypefun int starpu_sched_get_max_priority (void)
Returns the current maximum priority level supported by the
scheduling policy.
@end deftypefun

@deftypefun int starpu_push_local_task (int @var{workerid}, {struct starpu_task *}@var{task}, int @var{back})
The scheduling policy may put tasks directly into a worker's local queue so
that it is not always necessary to create its own queue when the local queue
is sufficient. If @var{back} is nonzero, @var{task} is put at the back of the
queue where the worker will pop tasks first. Setting @var{back} to 0 therefore
ensures a FIFO ordering.
@end deftypefun

@deftypefun int starpu_worker_can_execute_task (unsigned @var{workerid}, {struct starpu_task *}@var{task}, unsigned @var{nimpl})
Check if the worker specified by @var{workerid} can execute the codelet.
Schedulers need to call it before assigning a task to a worker, otherwise the
task may fail to execute.
@end deftypefun

@deftypefun double starpu_timing_now (void)
Return the current date in µs.
@end deftypefun

@deftypefun double starpu_task_expected_length ({struct starpu_task *}@var{task}, {enum starpu_perf_archtype} @var{arch}, unsigned @var{nimpl})
Returns the expected task duration in µs.
@end deftypefun

@deftypefun double starpu_worker_get_relative_speedup ({enum starpu_perf_archtype} @var{perf_archtype})
Returns an estimated speedup factor relative to CPU speed.
@end deftypefun

@deftypefun double starpu_task_expected_data_transfer_time (uint32_t @var{memory_node}, {struct starpu_task *}@var{task})
Returns the expected data transfer time in µs.
@end deftypefun

@deftypefun double starpu_data_expected_transfer_time (starpu_data_handle_t @var{handle}, unsigned @var{memory_node}, {enum starpu_access_mode} @var{mode})
Predict the transfer time (in µs) to move a handle to a memory node.
@end deftypefun

@deftypefun double starpu_task_expected_power ({struct starpu_task *}@var{task}, {enum starpu_perf_archtype} @var{arch}, unsigned @var{nimpl})
Returns the expected power consumption in J.
@end deftypefun

@deftypefun double starpu_task_expected_conversion_time ({struct starpu_task *}@var{task}, {enum starpu_perf_archtype} @var{arch}, unsigned @var{nimpl})
Returns the expected conversion time in ms (multiformat interface only).
@end deftypefun

@node Source code
@subsection Source code

@cartouche
@smallexample
static struct starpu_sched_policy dummy_sched_policy = @{
    .init_sched = init_dummy_sched,
    .deinit_sched = deinit_dummy_sched,
    .add_workers = dummy_sched_add_workers,
    .remove_workers = dummy_sched_remove_workers,
    .push_task = push_task_dummy,
    .push_prio_task = NULL,
    .pop_task = pop_task_dummy,
    .post_exec_hook = NULL,
    .pop_every_task = NULL,
    .policy_name = "dummy",
    .policy_description = "dummy scheduling strategy"
@};
@end smallexample
@end cartouche

@node Running drivers
@section Running
drivers

@menu
* Driver API::
* Example::
@end menu

@node Driver API
@subsection Driver API

@deftypefun int starpu_driver_run ({struct starpu_driver *}@var{d})
Initialize the given driver, run it until it receives a request to terminate,
deinitialize it and return 0 on success. It returns -EINVAL if @code{d->type}
is not a valid StarPU device type (STARPU_CPU_WORKER, STARPU_CUDA_WORKER or
STARPU_OPENCL_WORKER). This is the same as calling
@code{starpu_driver_init()}, then calling @code{starpu_driver_run_once()} in a
loop, and finally @code{starpu_driver_deinit()}.
@end deftypefun

@deftypefun int starpu_driver_init ({struct starpu_driver *}@var{d})
Initialize the given driver. Returns 0 on success, -EINVAL if
@code{d->type} is not a valid StarPU device type (STARPU_CPU_WORKER,
STARPU_CUDA_WORKER or STARPU_OPENCL_WORKER).
@end deftypefun

@deftypefun int starpu_driver_run_once ({struct starpu_driver *}@var{d})
Run the driver once. Returns 0 on success, -EINVAL if
@code{d->type} is not a valid StarPU device type (STARPU_CPU_WORKER,
STARPU_CUDA_WORKER or STARPU_OPENCL_WORKER).
@end deftypefun

@deftypefun int starpu_driver_deinit ({struct starpu_driver *}@var{d})
Deinitialize the given driver.
Returns 0 on success, -EINVAL if
@code{d->type} is not a valid StarPU device type (STARPU_CPU_WORKER,
STARPU_CUDA_WORKER or STARPU_OPENCL_WORKER).
@end deftypefun

@deftypefun void starpu_drivers_request_termination (void)
Notify all running drivers that they should terminate.
@end deftypefun

@node Example
@subsection Example

@cartouche
@smallexample
int ret;
struct starpu_driver d = @{
    .type = STARPU_CUDA_WORKER,
    .id.cuda_id = 0
@};
ret = starpu_driver_init(&d);
if (ret != 0)
    error();
while (some_condition) @{
    ret = starpu_driver_run_once(&d);
    if (ret != 0)
        error();
@}
ret = starpu_driver_deinit(&d);
if (ret != 0)
    error();
@end smallexample
@end cartouche

@node Expert mode
@section Expert mode

@deftypefun void starpu_wake_all_blocked_workers (void)
Wake all the workers, so they can inspect data requests and task submissions
again.
@end deftypefun

@deftypefun int starpu_progression_hook_register (unsigned (*@var{func})(void *arg), void *@var{arg})
Register a progression hook, to be called when workers are idle.
@end deftypefun

@deftypefun void starpu_progression_hook_deregister (int @var{hook_id})
Unregister a given progression hook.
@end deftypefun
 |