- \input texinfo @c -*-texinfo-*-
- @c %**start of header
- @setfilename starpu.info
- @settitle StarPU
- @c %**end of header
- @setchapternewpage odd
- @titlepage
- @title StarPU
- @page
- @vskip 0pt plus 1filll
- @comment For the @value{version-GCC} Version*
- @end titlepage
- @summarycontents
- @contents
- @page
- @node Top
- @top Preface
- @cindex Preface
- This manual documents the usage of StarPU
- @comment
- @comment When you add a new menu item, please keep the right hand
- @comment aligned to the same column. Do not use tabs. This provides
- @comment better formatting.
- @comment
- @menu
- * Introduction:: A basic introduction to using StarPU.
- * Installing StarPU:: How to configure, build and install StarPU.
- * Configuration options::      Configuration options
- * Environment variables:: Environment variables used by StarPU.
- * StarPU API:: The API to use StarPU.
- * Basic Examples:: Basic examples of the use of StarPU.
- * Advanced Topics:: Advanced use of StarPU.
- @end menu
- @c ---------------------------------------------------------------------
- @c Introduction to StarPU
- @c ---------------------------------------------------------------------
- @node Introduction
- @chapter Introduction to StarPU
- @menu
- * Motivation::                  Why StarPU?
- * StarPU in a Nutshell:: The Fundamentals of StarPU
- @end menu
- @node Motivation
- @section Motivation
- @c complex machines with heterogeneous cores/devices
- The use of specialized hardware such as accelerators or coprocessors offers an
- interesting approach to overcome the physical limits encountered by processor
- architects. As a result, many machines are now equipped with one or several
- accelerators (eg. a GPU), in addition to the usual processor(s). While a lot of
- efforts have been devoted to offloading computation onto such accelerators, very
- little attention has been paid to portability concerns on the one hand, and to the
- possibility of having heterogeneous accelerators and processors interact on the other hand.
- StarPU is a runtime system that offers support for heterogeneous multicore
- architectures. It not only offers a unified view of the computational resources
- (ie. CPUs and accelerators at the same time), but it also takes care to
- efficiently map and execute tasks onto a heterogeneous machine while
- transparently handling low-level issues in a portable fashion.
- @c this leads to a complicated distributed memory design
- @c which is not (easily) manageable by hand
- @c added value/benefits of StarPU
- @c - portability
- @c - scheduling, perf. portability
- @node StarPU in a Nutshell
- @section StarPU in a Nutshell
- From a programming point of view, StarPU is not a new language but a library
- that executes tasks explicitly submitted by the application. The data that a
- task manipulates are automatically transferred to the accelerator so that the
- programmer does not have to take care of complex data movements. StarPU also
- takes particular care of scheduling those tasks efficiently and allows
- scheduling experts to implement custom scheduling policies in a portable
- fashion.
- @c explain the notion of codelet and task (ie. g(A, B)
- @subsection Codelet and Tasks
- One of StarPU's primary data structures is the @b{codelet}. A codelet describes a
- computational kernel that can possibly be implemented on multiple architectures
- such as a CPU, a CUDA device or a Cell's SPU.
- @c TODO insert illustration f : f_spu, f_cpu, ...
- Another important data structure is the @b{task}. Executing a StarPU task
- consists in applying a codelet on a data set, on one of the architectures on
- which the codelet is implemented. In addition to the codelet that it
- implements, a task also describes which data are accessed, and how they are
- accessed during the computation (read and/or write).
- StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
- operation. The task structure can also specify a @b{callback} function that is
- called once StarPU has properly executed the task. It also contains optional
- fields that the application may use to give hints to the scheduler (such as
- priority levels).
- A task may be identified by a unique 64-bit number which we refer to as a @b{tag}.
- Task dependencies can be enforced either by the means of callback functions, or
- by expressing dependencies between tags.
- @c TODO insert illustration f(Ar, Brw, Cr) + ..
- @c DSM
- @subsection StarPU Data Management Library
- Because StarPU schedules tasks at runtime, data transfers have to be
- done automatically and ``just-in-time'' between processing units,
- relieving the application programmer from explicit data transfers.
- Moreover, to avoid unnecessary transfers, StarPU keeps data
- where it was last needed, even if it was modified there, and it
- allows multiple copies of the same data to reside at the same time on
- several processing units as long as it is not modified.
- @c ---------------------------------------------------------------------
- @c Installing StarPU
- @c ---------------------------------------------------------------------
- @node Installing StarPU
- @chapter Installing StarPU
- StarPU can be built and installed by the standard means of the GNU
- autotools. This chapter briefly explains how these tools can be used to
- install StarPU.
- @section Configuring StarPU
- @subsection Generating Makefiles and configuration scripts
- This step is not necessary when using the tarball releases of StarPU. If you
- are using the source code from the svn repository, you first need to generate
- the configure scripts and the Makefiles.
- @example
- $ autoreconf -vfi
- @end example
- @subsection Configuring StarPU
- @example
- $ ./configure
- @end example
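- As with any autotools-based package, the installation prefix may for instance
- be chosen at this step with the standard @code{--prefix} option (shown here
- with a hypothetical @code{$HOME/starpu} directory):
- @example
- $ ./configure --prefix=$HOME/starpu
- @end example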
- @c TODO enumerate the list of interesting options: refer to a specific section
- @section Building and Installing StarPU
- @subsection Building
- @example
- $ make
- @end example
- @subsection Sanity Checks
- In order to make sure that StarPU is working properly on the system, it is also
- possible to run a test suite.
- @example
- $ make check
- @end example
- @subsection Installing
- In order to install StarPU at the location that was specified during
- configuration:
- @example
- $ make install
- @end example
- @subsection pkg-config configuration
- It is possible that compiling and linking an application against StarPU
- requires specific flags or libraries (for instance @code{CUDA} or
- @code{libspe2}). Therefore, it is convenient to use the @code{pkg-config} tool.
- If StarPU was not installed at a standard location, the path of StarPU's
- library must be specified in the @code{PKG_CONFIG_PATH} environment variable so
- that @code{pkg-config} can find it. For instance, if StarPU was installed in
- @code{$prefix_dir}:
- @example
- $ PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$prefix_dir/lib/pkgconfig
- @end example
- The flags required to compile or link against StarPU are then
- accessible with the following commands:
- @example
- $ pkg-config --cflags libstarpu # options for the compiler
- $ pkg-config --libs libstarpu # options for the linker
- @end example
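- For instance, a single source file (here a hypothetical @code{hello.c}) can be
- compiled and linked directly with:
- @example
- $ gcc hello.c -o hello $(pkg-config --cflags libstarpu) $(pkg-config --libs libstarpu)
- @end example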
- @c ---------------------------------------------------------------------
- @c Configuration options
- @c ---------------------------------------------------------------------
- @node Configuration options
- @chapter Configuration options
- TODO
- @c ---------------------------------------------------------------------
- @c Environment variables
- @c ---------------------------------------------------------------------
- @node Environment variables
- @chapter Environment variables
- @menu
- * Workers:: Configuring workers
- * Scheduling:: Configuring the Scheduling engine
- * Misc:: Miscellaneous and debug
- @end menu
- TODO: note that explicit configuration (passed to @code{starpu_init}) overrides these environment variables.
- @node Workers
- @section Configuring workers
- @menu
- * NCPUS :: Number of CPU workers
- * NCUDA :: Number of CUDA workers
- * NGORDON :: Number of SPU workers (Cell)
- * WORKERS_CPUID :: Bind workers to specific CPUs
- * WORKERS_GPUID :: Select specific CUDA devices
- @end menu
- @node NCPUS
- @subsection @code{NCPUS} -- Number of CPU workers
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @node NCUDA
- @subsection @code{NCUDA} -- Number of CUDA workers
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @node NGORDON
- @subsection @code{NGORDON} -- Number of SPU workers (Cell)
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @node WORKERS_CPUID
- @subsection @code{WORKERS_CPUID} -- Bind workers to specific CPUs
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @node WORKERS_GPUID
- @subsection @code{WORKERS_GPUID} -- Select specific CUDA devices
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @node Scheduling
- @section Configuring the Scheduling engine
- @menu
- * SCHED :: Scheduling policy
- * CALIBRATE :: Calibrate performance models
- * PREFETCH :: Use data prefetch
- @end menu
- @node SCHED
- @subsection @code{SCHED} -- Scheduling policy
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @node CALIBRATE
- @subsection @code{CALIBRATE} -- Calibrate performance models
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @node PREFETCH
- @subsection @code{PREFETCH} -- Use data prefetch
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @node Misc
- @section Miscellaneous and debug
- @menu
- * LOGFILENAME :: Select debug file name
- @end menu
- @node LOGFILENAME
- @subsection @code{LOGFILENAME} -- Select debug file name
- @table @asis
- @item @emph{Description}:
- TODO
- @end table
- @c ---------------------------------------------------------------------
- @c StarPU API
- @c ---------------------------------------------------------------------
- @node StarPU API
- @chapter StarPU API
- @menu
- * Initialization and Termination:: Initialization and Termination methods
- * Workers' Properties:: Methods to enumerate workers' properties
- * Data Library:: Methods to manipulate data
- * Codelets and Tasks:: Methods to construct tasks
- * Tags:: Task dependencies
- @end menu
- @node Initialization and Termination
- @section Initialization and Termination
- @menu
- * starpu_init:: Initialize StarPU
- * struct starpu_conf:: StarPU runtime configuration
- * starpu_shutdown:: Terminate StarPU
- @end menu
- @node starpu_init
- @subsection @code{starpu_init} -- Initialize StarPU
- @table @asis
- @item @emph{Description}:
- This is StarPU's initialization method, which must be called prior to any other
- StarPU call. It is possible to specify StarPU's configuration (eg. scheduling
- policy, number of cores, ...) by passing a non-null argument. Default
- configuration is used if the passed argument is @code{NULL}.
- @item @emph{Return value}:
- Upon successful completion, this function returns 0. Otherwise, @code{-ENODEV}
- indicates that no worker was available (and thus StarPU was not initialized).
- @item @emph{Prototype}:
- @code{int starpu_init(struct starpu_conf *conf);}
- @end table
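- As a minimal sketch, the return value can be checked to detect the case where
- no worker is available:
- @example
- #include <errno.h>
- #include <stdio.h>
- #include <starpu.h>
-
- int main(int argc, char **argv)
- @{
-         /* use the default configuration */
-         int ret = starpu_init(NULL);
-         if (ret == -ENODEV)
-         @{
-                 fprintf(stderr, "no worker is available\n");
-                 return 1;
-         @}
-
-         /* ... submit tasks ... */
-
-         starpu_shutdown();
-         return 0;
- @}
- @end example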
- @node struct starpu_conf
- @subsection @code{struct starpu_conf} -- StarPU runtime configuration
- @table @asis
- @item @emph{Description}:
- This structure is passed to the @code{starpu_init} function in order to
- configure StarPU. When the default value is used, StarPU automatically selects
- the number of processing units and takes the default scheduling policy. These
- parameters override the equivalent environment variables.
- @item @emph{Fields}:
- @table @asis
- @item @code{sched_policy} (default = NULL):
- This is the name of the scheduling policy. This can also be specified with the
- @code{SCHED} environment variable.
- @item @code{ncpus} (default = -1):
- This is the maximum number of CPU cores that StarPU can use. This can also be
- specified with the @code{NCPUS} environment variable.
- @item @code{ncuda} (default = -1):
- This is the maximum number of CUDA devices that StarPU can use. This can also be
- specified with the @code{NCUDA} environment variable.
- @item @code{nspus} (default = -1):
- This is the maximum number of Cell SPUs that StarPU can use. This can also be
- specified with the @code{NGORDON} environment variable.
- @item @code{calibrate} (default = 0):
- If this flag is set, StarPU will calibrate the performance models when
- executing tasks. This can also be specified with the @code{CALIBRATE}
- environment variable.
- @end table
- @end table
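- For instance, the following sketch (the values are arbitrary) restricts StarPU
- to 2 CPU cores and a single CUDA device, and leaves the other settings to
- their defaults:
- @example
- struct starpu_conf conf;
-
- conf.sched_policy = NULL;  /* use the default scheduling policy */
- conf.ncpus = 2;            /* use at most 2 CPU cores */
- conf.ncuda = 1;            /* use at most 1 CUDA device */
- conf.nspus = -1;           /* default number of Cell SPUs */
- conf.calibrate = 0;        /* do not calibrate performance models */
-
- starpu_init(&conf);
- @end example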
- @node starpu_shutdown
- @subsection @code{starpu_shutdown} -- Terminate StarPU
- @table @asis
- @item @emph{Description}:
- This is StarPU's termination method. It must be called at the end of the
- application: statistics and other post-mortem debugging information are not
- guaranteed to be available until this method has been called.
- @item @emph{Prototype}:
- @code{void starpu_shutdown(void);}
- @end table
- @node Workers' Properties
- @section Workers' Properties
- @menu
- * starpu_get_worker_count:: Get the number of processing units
- * starpu_get_worker_id:: Get the identifier of the current worker
- * starpu_get_worker_type:: Get the type of processing unit associated to a worker
- * starpu_get_worker_name:: Get the name of a worker
- @end menu
- @node starpu_get_worker_count
- @subsection @code{starpu_get_worker_count} -- Get the number of processing units
- @table @asis
- @item @emph{Description}:
- This function returns the number of workers (ie. processing units executing
- StarPU tasks). The returned value should be at most @code{STARPU_NMAXWORKERS}.
- @item @emph{Prototype}:
- @code{unsigned starpu_get_worker_count(void);}
- @end table
- @node starpu_get_worker_id
- @subsection @code{starpu_get_worker_id} -- Get the identifier of the current worker
- @table @asis
- @item @emph{Description}:
- This function returns the identifier of the worker associated to the calling
- thread. The returned value is either -1 if the current context is not a StarPU
- worker (ie. when called from the application outside a task or a callback), or
- an integer between 0 and @code{starpu_get_worker_count() - 1}.
- @item @emph{Prototype}:
- @code{int starpu_get_worker_id(void);}
- @end table
- @node starpu_get_worker_type
- @subsection @code{starpu_get_worker_type} -- Get the type of processing unit associated to a worker
- @table @asis
- @item @emph{Description}:
- This function returns the type of worker associated to an identifier (as
- returned by the @code{starpu_get_worker_id} function). The returned value
- indicates the architecture of the worker: @code{STARPU_CORE_WORKER} for a CPU
- core, @code{STARPU_CUDA_WORKER} for a CUDA device, and
- @code{STARPU_GORDON_WORKER} for a Cell SPU. The value returned for an invalid
- identifier is unspecified.
- @item @emph{Prototype}:
- @code{enum starpu_archtype starpu_get_worker_type(int id);}
- @end table
- @node starpu_get_worker_name
- @subsection @code{starpu_get_worker_name} -- Get the name of a worker
- @table @asis
- @item @emph{Description}:
- StarPU associates a unique human readable string to each processing unit. This
- function copies at most the first @code{maxlen} bytes of the unique string
- associated to the worker identified by @code{id} into the
- @code{dst} buffer. The caller is responsible for ensuring that @code{dst}
- is a valid pointer to a buffer of at least @code{maxlen} bytes. Calling this
- function with an invalid identifier results in unspecified behaviour.
- @item @emph{Prototype}:
- @code{void starpu_get_worker_name(int id, char *dst, size_t maxlen);}
- @end table
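- The following sketch, for instance, lists the detected workers (the buffer
- size is arbitrary):
- @example
- unsigned nworkers = starpu_get_worker_count();
-
- unsigned worker;
- for (worker = 0; worker < nworkers; worker++)
- @{
-         char name[128];
-         starpu_get_worker_name(worker, name, sizeof(name));
-
-         enum starpu_archtype type = starpu_get_worker_type(worker);
-         const char *archname =
-                 (type == STARPU_CORE_WORKER) ? "CPU core" :
-                 (type == STARPU_CUDA_WORKER) ? "CUDA device" : "Cell SPU";
-
-         printf("worker %u: %s (%s)\n", worker, name, archname);
- @}
- @end example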
- @node Data Library
- @section Data Library
- @c data_handle_t
- @c void starpu_delete_data(struct starpu_data_state_t *state);
- @c user interaction with the DSM
- @c void starpu_sync_data_with_mem(struct starpu_data_state_t *state);
- @c void starpu_notify_data_modification(struct starpu_data_state_t *state, uint32_t modifying_node);
- @node Codelets and Tasks
- @section Codelets and Tasks
- @menu
- * struct starpu_codelet:: StarPU codelet structure
- * struct starpu_task:: StarPU task structure
- * starpu_task_init:: Initialize a Task
- * starpu_task_create:: Allocate and Initialize a Task
- * starpu_task_destroy:: Destroy a dynamically allocated Task
- * starpu_submit_task:: Submit a Task
- * starpu_wait_task:: Wait for the termination of a Task
- * starpu_wait_all_tasks:: Wait for the termination of all Tasks
- @end menu
- @c struct starpu_task
- @c struct starpu_codelet
- @node struct starpu_codelet
- @subsection @code{struct starpu_codelet} -- StarPU codelet structure
- @table @asis
- @item @emph{Description}:
- The codelet structure describes a kernel that is possibly implemented on
- various targets.
- @item @emph{Fields}:
- @table @asis
- @item @code{where}:
- Indicates which types of processing units are able to execute that codelet.
- @code{CORE|CUDA} for instance indicates that the codelet is implemented for
- both CPU cores and CUDA devices while @code{GORDON} indicates that it is only
- available on Cell SPUs.
- @item @code{core_func} (optional):
- This is a function pointer to the CPU implementation of the codelet. Its
- prototype must be: @code{void core_func(starpu_data_interface_t *descr, void *arg)}.
- The first argument is the array of data managed by the data management library,
- and the second argument is a pointer to the argument (possibly a copy of it)
- passed through the @code{.cl_arg} field of the @code{starpu_task} structure. This
- pointer is ignored if @code{CORE} does not appear in the @code{.where} field;
- it must be non-null otherwise.
- @item @code{cuda_func} (optional):
- This is a function pointer to the CUDA implementation of the codelet. @emph{This
- must be a host function written in the CUDA runtime API}. Its prototype must
- be: @code{void cuda_func(starpu_data_interface_t *descr, void *arg);}. This
- pointer is ignored if @code{CUDA} does not appear in the @code{.where} field;
- it must be non-null otherwise.
- @item @code{gordon_func} (optional):
- This is the index of the Cell SPU implementation within the Gordon library.
- TODO
- @item @code{nbuffers}:
- Specifies the number of arguments taken by the codelet. These arguments are
- managed by the DSM and are accessed from the @code{starpu_data_interface_t *}
- array. The constant argument passed with the @code{.cl_arg} field of the
- @code{starpu_task} structure is not counted in this number. This value should
- not be above @code{STARPU_NMAXBUFS}.
- @item @code{model} (optional):
- This is a pointer to the performance model associated with this codelet. This
- optional field is ignored when null. TODO
- @end table
- @end table
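- As an illustration, here is a sketch of a codelet that provides both a CPU and
- a CUDA implementation of the same kernel (the @code{scal_cpu_func} and
- @code{scal_cuda_func} functions are hypothetical and assumed to be defined
- elsewhere):
- @example
- extern void scal_cpu_func(starpu_data_interface_t *descr, void *arg);
- extern void scal_cuda_func(starpu_data_interface_t *descr, void *arg);
-
- starpu_codelet scal_cl = @{
-         .where = CORE|CUDA,         /* CPU cores and CUDA devices */
-         .core_func = scal_cpu_func,
-         .cuda_func = scal_cuda_func,
-         .nbuffers = 1               /* one buffer managed by the DSM */
- @};
- @end example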
- @node struct starpu_task
- @subsection @code{struct starpu_task} -- StarPU task structure
- @table @asis
- @item @emph{Description}:
- The starpu_task structure describes a task that can be offloaded on the various
- processing units managed by StarPU. It instantiates a codelet. It can either be
- allocated dynamically with the @code{starpu_task_create} method, or declared
- statically. In the latter case, the programmer has to zero the
- @code{starpu_task} structure and to fill the different fields properly. The
- indicated default values correspond to the configuration of a task allocated
- with @code{starpu_task_create}.
- @item @emph{Fields}:
- @table @asis
- @item @code{cl}:
- Is a pointer to the corresponding @code{starpu_codelet} data structure. This
- describes where the kernel should be executed, and supplies the appropriate
- implementations. When set to @code{NULL}, no code is executed during the task;
- such empty tasks can be useful for synchronization purposes.
- @item @code{buffers}:
- TODO
- @item @code{cl_arg} (optional) (default = NULL):
- TODO
- @item @code{cl_arg_size} (optional):
- TODO
- @c ignored if only executable on CPUs or CUDA ...
- @item @code{callback_func} (optional) (default = @code{NULL}):
- This is a function pointer of prototype @code{void (*f)(void *)} which
- specifies a possible callback. If that pointer is non-null, the callback
- function is executed @emph{on the host} after the execution of the task. The
- callback is passed the value contained in the @code{callback_arg} field. No
- callback is executed if that field is null.
- @item @code{callback_arg} (optional) (default = @code{NULL}):
- This is the pointer passed to the callback function. This field is ignored if
- the @code{callback_func} is null.
- @item @code{use_tag} (optional) (default = 0):
- If set, this flag indicates that the task should be associated with the tag
- contained in the @code{tag_id} field. Tags allow the application to synchronize
- with the task and to express task dependencies easily.
- @item @code{tag_id}:
- This field contains the tag associated with the task if the @code{use_tag} field
- was set; it is ignored otherwise.
- @item @code{synchronous}:
- If this flag is set, the @code{starpu_submit_task} function is blocking and
- returns only when the task has been executed (or if no worker is able to
- process the task). Otherwise, @code{starpu_submit_task} returns immediately.
- @item @code{priority} (optional) (default = @code{DEFAULT_PRIO}):
- This field indicates a level of priority for the task. This is an integer value
- that must be selected between @code{MIN_PRIO} (for the least important tasks)
- and @code{MAX_PRIO} (for the most important tasks), inclusive. Default priority
- is @code{DEFAULT_PRIO}. Scheduling strategies that take priorities into
- account can use this parameter to take better scheduling decisions, but the
- scheduling policy may also ignore it.
- @item @code{execute_on_a_specific_worker} (default = 0):
- If this flag is set, StarPU will bypass the scheduler and directly assign this
- task to the worker specified by the @code{workerid} field.
- @item @code{workerid} (optional):
- If the @code{execute_on_a_specific_worker} field is set, this field indicates
- the identifier of the worker that should process this task (as
- returned by @code{starpu_get_worker_id}). This field is ignored if the
- @code{execute_on_a_specific_worker} field is set to 0.
- @item @code{detach} (optional) (default = 1):
- If this flag is set, it is not possible to synchronize with the task
- by the means of @code{starpu_wait_task} later on. Internal data structures
- are only guaranteed to be freed once @code{starpu_wait_task} is called
- if that flag is not set.
- @item @code{destroy} (optional) (default = 1):
- If that flag is set, the task structure will automatically be freed, either
- after the execution of the callback if the task is detached, or during
- @code{starpu_wait_task} otherwise. If this flag is not set, dynamically allocated data
- structures will not be freed until @code{starpu_task_destroy} is called
- explicitly. Setting this flag for a statically allocated task structure will
- result in undefined behaviour.
- @end table
- @end table
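- For instance, a statically allocated task could be set up as follows (a sketch
- assuming a codelet @code{cl} declared elsewhere; the @code{destroy} flag is
- explicitly cleared since the structure must not be freed automatically):
- @example
- struct starpu_task task;
- starpu_task_init(&task);
-
- task.cl = &cl;         /* assumed to be declared elsewhere */
- task.synchronous = 1;  /* block until the task has been executed */
- task.destroy = 0;      /* never free a statically allocated structure */
-
- starpu_submit_task(&task);
- @end example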
- @node starpu_task_init
- @subsection @code{starpu_task_init} -- Initialize a Task
- @table @asis
- @item @emph{Description}:
- TODO
- @item @emph{Prototype}:
- @code{void starpu_task_init(struct starpu_task *task);}
- @end table
- @node starpu_task_create
- @subsection @code{starpu_task_create} -- Allocate and Initialize a Task
- @table @asis
- @item @emph{Description}:
- TODO
- (Describe the different default fields ...)
- @item @emph{Prototype}:
- @code{struct starpu_task *starpu_task_create(void);}
- @end table
- @node starpu_task_destroy
- @subsection @code{starpu_task_destroy} -- Destroy a dynamically allocated Task
- @table @asis
- @item @emph{Description}:
- Free the resources allocated by @code{starpu_task_create}. This function is
- called automatically after the execution of a task when the
- @code{.destroy} flag of the @code{starpu_task} structure is set (default behaviour).
- Calling this function on a statically allocated task results in undefined
- behaviour.
- @item @emph{Prototype}:
- @code{void starpu_task_destroy(struct starpu_task *task);}
- @end table
- @node starpu_wait_task
- @subsection @code{starpu_wait_task} -- Wait for the termination of a Task
- @table @asis
- @item @emph{Description}:
- This function blocks until the task has been executed. It is not possible to
- synchronize with a task more than once. It is not possible to wait for
- synchronous or detached tasks.
- @item @emph{Return value}:
- Upon successful completion, this function returns 0. Otherwise, @code{-EINVAL}
- indicates that the specified task was either synchronous or detached.
- @item @emph{Prototype}:
- @code{int starpu_wait_task(struct starpu_task *task);}
- @end table
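- The following sketch shows how to wait for a dynamically allocated task
- (assuming a codelet @code{cl} declared elsewhere); the task is made
- non-detached so that it can be waited for, and is destroyed explicitly
- afterwards:
- @example
- struct starpu_task *task = starpu_task_create();
- task->cl = &cl;       /* assumed to be declared elsewhere */
- task->detach = 0;     /* make it possible to wait for this task */
- task->destroy = 0;    /* the task will be destroyed explicitly */
-
- starpu_submit_task(task);
-
- starpu_wait_task(task);
- starpu_task_destroy(task);
- @end example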
- @node starpu_submit_task
- @subsection @code{starpu_submit_task} -- Submit a Task
- @table @asis
- @item @emph{Description}:
- This function submits task @code{task} to StarPU. Calling this function does
- not mean that the task will be executed immediately as there can be data or task
- (tag) dependencies that are not fulfilled yet: StarPU will take care to
- schedule this task with respect to such dependencies.
- This function returns immediately if the @code{synchronous} field of the
- @code{starpu_task} structure was set to 0, and blocks until the termination of
- the task otherwise. It is also possible to synchronize the application with
- asynchronous tasks by the means of tags, using the @code{starpu_tag_wait}
- function for instance.
- In case of success, this function returns 0. A return value of @code{-ENODEV}
- means that there is no worker able to process that task (eg. there is no GPU
- available and this task is only implemented on top of CUDA).
- @item @emph{Prototype}:
- @code{int starpu_submit_task(struct starpu_task *task);}
- @end table
- @node starpu_wait_all_tasks
- @subsection @code{starpu_wait_all_tasks} -- Wait for the termination of all Tasks
- @table @asis
- @item @emph{Description}:
- This function blocks until all the tasks that were submitted are terminated.
- @item @emph{Prototype}:
- @code{void starpu_wait_all_tasks(void);}
- @end table
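- For instance, a sketch that submits a batch of asynchronous tasks (assuming a
- codelet @code{cl} declared elsewhere) and then waits for all of them could be:
- @example
- unsigned i;
- for (i = 0; i < 16; i++)
- @{
-         struct starpu_task *task = starpu_task_create();
-         task->cl = &cl;            /* assumed to be declared elsewhere */
-         task->synchronous = 0;     /* asynchronous submission (default) */
-
-         starpu_submit_task(task);
- @}
-
- /* block until all submitted tasks have been executed */
- starpu_wait_all_tasks();
- @end example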
- @c Callbacks : what can we put in callbacks ?
- @node Tags
- @section Tags
- @menu
- * starpu_tag_t:: Task identifier
- * starpu_tag_declare_deps:: Declare the Dependencies of a Tag
- * starpu_tag_declare_deps_array:: Declare the Dependencies of a Tag
- * starpu_tag_wait:: Block until a Tag is terminated
- * starpu_tag_wait_array:: Block until a set of Tags is terminated
- * starpu_tag_remove:: Destroy a Tag
- * starpu_tag_notify_from_apps:: Feed a tag explicitely
- @end menu
- @node starpu_tag_t
- @subsection @code{starpu_tag_t} -- Task identifier
- @table @asis
- @item @emph{Description}:
- It is possible to associate a task with a unique "tag" and to express
- dependencies between tasks by the means of those tags. To do so, fill the
- @code{tag_id} field of the @code{starpu_task} structure with a tag number (can
- be arbitrary) and set the @code{use_tag} field to 1.
- If @code{starpu_tag_declare_deps} is called with that tag number, the task will
- not be started until the tasks which wear the declared dependency tags are
- complete.
- @end table
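- As a sketch (assuming a codelet @code{cl} declared elsewhere), a task can be
- associated with tag @code{0x64} and made to depend on tag @code{0x1} as
- follows:
- @example
- struct starpu_task *task = starpu_task_create();
- task->cl = &cl;      /* assumed to be declared elsewhere */
-
- /* associate the task with tag 0x64 */
- task->use_tag = 1;
- task->tag_id = (starpu_tag_t)0x64;
-
- /* tag 0x64 (and thus this task) depends on tag 0x1 */
- starpu_tag_declare_deps((starpu_tag_t)0x64, 1, (starpu_tag_t)0x1);
-
- starpu_submit_task(task);
- @end example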
- @node starpu_tag_declare_deps
- @subsection @code{starpu_tag_declare_deps} -- Declare the Dependencies of a Tag
- @table @asis
- @item @emph{Description}:
- Specify the dependencies of the task identified by tag @code{id}. The first
- argument specifies the tag which is configured, the second argument gives the
- number of tag(s) on which @code{id} depends. The following arguments are the
- tags which have to be terminated to unlock the task.
- This function must be called before the associated task is submitted to StarPU
- with @code{starpu_submit_task}.
- @item @emph{Remark}
- Because of the variable arity of @code{starpu_tag_declare_deps}, note that the
- last arguments @emph{must} be of type @code{starpu_tag_t}: constant values
- typically need to be explicitly cast. Using the
- @code{starpu_tag_declare_deps_array} function avoids this hazard.
- @item @emph{Prototype}:
- @code{void starpu_tag_declare_deps(starpu_tag_t id, unsigned ndeps, ...);}
- @item @emph{Example}:
- @example
- @c @cartouche
- /* Tag 0x1 depends on tags 0x32 and 0x52 */
- starpu_tag_declare_deps((starpu_tag_t)0x1,
-         2, (starpu_tag_t)0x32, (starpu_tag_t)0x52);
- @c @end cartouche
- @end example
- @end table
- @node starpu_tag_declare_deps_array
- @subsection @code{starpu_tag_declare_deps_array} -- Declare the Dependencies of a Tag
- @table @asis
- @item @emph{Description}:
- This function is similar to @code{starpu_tag_declare_deps}, except that it
- does not take a variable number of arguments but an array of tags of size
- @code{ndeps}.
- @item @emph{Prototype}:
- @code{void starpu_tag_declare_deps_array(starpu_tag_t id, unsigned ndeps, starpu_tag_t *array);}
- @item @emph{Example}:
- @example
- @c @cartouche
- /* Tag 0x1 depends on tags 0x32 and 0x52 */
- starpu_tag_t tag_array[2] = @{0x32, 0x52@};
- starpu_tag_declare_deps_array((starpu_tag_t)0x1, 2, tag_array);
- @c @end cartouche
- @end example
- @end table
- @node starpu_tag_wait
- @subsection @code{starpu_tag_wait} -- Block until a Tag is terminated
- @table @asis
- @item @emph{Description}:
- This function blocks until the task associated to tag @code{id} has been
- executed. This is a blocking call which must therefore not be called within
- tasks or callbacks, but only from the application directly. It is possible to
- synchronize with the same tag multiple times, as long as the
- @code{starpu_tag_remove} function is not called. Note that it is still
- possible to synchronize with a tag associated to a task whose @code{starpu_task}
- data structure was freed (eg. if the @code{destroy} flag of the
- @code{starpu_task} was enabled).
- @item @emph{Prototype}:
- @code{void starpu_tag_wait(starpu_tag_t id);}
- @end table
- @node starpu_tag_wait_array
- @subsection @code{starpu_tag_wait_array} -- Block until a set of Tags is terminated
- @table @asis
- @item @emph{Description}:
- This function is similar to @code{starpu_tag_wait} except that it blocks until
- @emph{all} the @code{ntags} tags contained in the @code{id} array are
- terminated.
- @item @emph{Prototype}:
- @code{void starpu_tag_wait_array(unsigned ntags, starpu_tag_t *id);}
- @end table
- @node starpu_tag_remove
- @subsection @code{starpu_tag_remove} -- Destroy a Tag
- @table @asis
- @item @emph{Description}:
- This function releases the resources associated with tag @code{id}. It can be
- called once the corresponding task has been executed and when no other tag
- depends on it anymore.
- @item @emph{Prototype}:
- @code{void starpu_tag_remove(starpu_tag_t id);}
- @end table
- @node starpu_tag_notify_from_apps
- @subsection @code{starpu_tag_notify_from_apps} -- Feed a Tag explicitely
- @table @asis
- @item @emph{Description}:
- This function explicitly unlocks tag @code{id}. It may be useful in the
- case of applications which execute part of their computation outside StarPU
- tasks (eg. third-party libraries). It is also provided as a
- convenient tool for the programmer, for instance to entirely construct the task
- DAG before actually giving StarPU the opportunity to execute the tasks.
- @item @emph{Prototype}:
- @code{void starpu_tag_notify_from_apps(starpu_tag_t id);}
- @end table
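- For instance, the following sketch makes a task wait for a computation
- performed outside StarPU (the @code{TAG_INIT_DONE} value and the
- @code{do_initialization} function are hypothetical):
- @example
- #define TAG_INIT_DONE  ((starpu_tag_t)0x100)
-
- /* the task associated with tag 0x1 waits for TAG_INIT_DONE */
- starpu_tag_declare_deps((starpu_tag_t)0x1, 1, TAG_INIT_DONE);
-
- /* ... submit the task associated with tag 0x1 ... */
-
- /* computation performed outside StarPU */
- do_initialization();
-
- /* unlock the dependent task */
- starpu_tag_notify_from_apps(TAG_INIT_DONE);
- @end example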
- @section Extensions
- @subsection CUDA extensions
- @c void starpu_malloc_pinned_if_possible(float **A, size_t dim);
- @subsection Cell extensions
- @c ---------------------------------------------------------------------
- @c Basic Examples
- @c ---------------------------------------------------------------------
- @node Basic Examples
- @chapter Basic Examples
- @menu
- * Compiling and linking:: Compiling and Linking Options
- * Hello World:: Submitting Tasks
- * Scaling a Vector:: Manipulating Data
- * Scaling a Vector (hybrid):: Handling Heterogeneous Architectures
- @end menu
- @node Compiling and linking
- @section Compiling and linking options
- The Makefile could for instance contain the following lines to define which
- options must be given to the compiler and to the linker:
- @example
- @c @cartouche
- CFLAGS+=$$(pkg-config --cflags libstarpu)
- LIBS+=$$(pkg-config --libs libstarpu)
- @c @end cartouche
- @end example
- @node Hello World
- @section Hello World
- In this section, we show how to implement a simple program that submits a task to StarPU.
- @subsection Required Headers
- The @code{starpu.h} header should be included in any code using StarPU.
- @example
- @c @cartouche
- #include <starpu.h>
- @c @end cartouche
- @end example
- @subsection Defining a Codelet
- @example
- @c @cartouche
- void cpu_func(starpu_data_interface_t *buffers, void *func_arg)
- @{
- float *array = func_arg;
- printf("Hello world (array = @{%f, %f@} )\n", array[0], array[1]);
- @}
- starpu_codelet cl =
- @{
- .where = CORE,
- .core_func = cpu_func,
- .nbuffers = 0
- @};
- @c @end cartouche
- @end example
- A codelet is a structure that represents a computational kernel. Such a codelet
- may contain an implementation of the same kernel on different architectures
- (eg. CUDA, Cell's SPU, x86, ...).
- The ''@code{.nbuffers}'' field specifies the number of data buffers that are
- manipulated by the codelet: here the codelet does not access or modify any data
- that is controlled by our data management library. Note that the argument
- passed to the codelet (the ''@code{.cl_arg}'' field of the @code{starpu_task}
- structure) does not count as a buffer since it is not managed by our data
- management library.
- @c TODO need a crossref to the proper description of "where" see bla for more ...
- We create a codelet which may only be executed on the CPUs. The ''@code{.where}''
- field is a bitmask that defines where the codelet may be executed. Here, the
- @code{CORE} value means that only CPUs can execute this codelet
- (@pxref{Codelets and Tasks} for more details on that field).
- When a CPU core executes a codelet, it calls the @code{.core_func} function,
- which @emph{must} have the following prototype:
- @code{void (*core_func)(starpu_data_interface_t *, void *)}
- In this example, we can ignore the first argument of this function which gives a
- description of the input and output buffers (eg. the size and the location of
- the matrices). The second argument is a pointer to a buffer passed as an
- argument to the codelet by the means of the ''@code{.cl_arg}'' field of the
- @code{starpu_task} structure. Be aware that this may be a pointer to a
- @emph{copy} of the actual buffer, and not the pointer given by the programmer:
- if the codelet modifies this buffer, there is no guarantee that the initial
- buffer will be modified as well: this for instance implies that the buffer
- cannot be used as a synchronization medium.
- @subsection Submitting a Task
- @example
- @c @cartouche
- void callback_func(void *callback_arg)
- @{
- printf("Callback function (arg %x)\n", callback_arg);
- @}
- int main(int argc, char **argv)
- @{
- /* initialize StarPU */
- starpu_init(NULL);
- struct starpu_task *task = starpu_task_create();
- task->cl = &cl;
-
- float array[2] = @{1.0f, -1.0f@};
- task->cl_arg = &array;
- task->cl_arg_size = 2*sizeof(float);
- task->callback_func = callback_func;
- task->callback_arg = (void *)0x42;
- /* starpu_submit_task will be a blocking call */
- task->synchronous = 1;
- /* submit the task to StarPU */
- starpu_submit_task(task);
- /* terminate StarPU */
- starpu_shutdown();
- return 0;
- @}
- @c @end cartouche
- @end example
- Before submitting any tasks to StarPU, @code{starpu_init} must be called. The
- @code{NULL} argument specifies that we use the default configuration. Tasks cannot
- be submitted after the termination of StarPU by a call to
- @code{starpu_shutdown}.
- In the example above, a task structure is allocated by a call to
- @code{starpu_task_create}. This function only allocates and fills the
- corresponding structure with the default settings (@pxref{starpu_task_create}),
- but it does not submit the task to StarPU.
- @c not really clear ;)
- The ''@code{.cl}'' field is a pointer to the codelet which the task will
- execute: in other words, the codelet structure describes which computational
- kernel should be offloaded on the different architectures, and the task
- structure is a wrapper containing a codelet and the piece of data on which the
- codelet should operate.
- The optional ''@code{.cl_arg}'' field is a pointer to a buffer (of size
- @code{.cl_arg_size}) with some parameters for the kernel
- described by the codelet. For instance, if a codelet implements a computational
- kernel that multiplies its input vector by a constant, the constant could be
- specified by the means of this buffer.
- Once a task has been executed, an optional callback function can be called.
- While the computational kernel could be offloaded on various architectures, the
- callback function is always executed on a CPU. The ''@code{.callback_arg}''
- pointer is passed as an argument of the callback. The prototype of a callback
- function must be:
- @example
- void (*callback_function)(void *);
- @end example
- If the @code{.synchronous} field is non-null, task submission will be
- synchronous: the @code{starpu_submit_task} function will not return until the
- task has been executed. Note that the @code{starpu_shutdown} method does not
- guarantee that asynchronous tasks have been executed before it returns.
- @node Scaling a Vector
- @section Manipulating Data: Scaling a Vector
- The previous example has shown how to submit tasks. In this section we show how
- StarPU tasks can manipulate data.
- Programmers can describe the data layout of their application so that StarPU is
- responsible for enforcing data coherency and availability across the machine.
- Instead of handling complex (and non-portable) mechanisms to perform data
- movements, programmers only declare which piece of data is accessed and/or
- modified by a task, and StarPU makes sure that when a computational kernel
- starts somewhere (eg. on a GPU), its data are available locally.
- Before submitting those tasks, the programmer first needs to declare the
- different pieces of data to StarPU using the @code{starpu_register_*_data}
- functions. To ease the development of applications for StarPU, it is possible
- to describe multiple types of data layout. A type of data layout is called an
- @b{interface}. By default, there are different interfaces available in StarPU:
- here we will consider the @b{vector interface}.
- The following lines show how to declare an array of @code{n} elements of type
- @code{float} using the vector interface:
- @example
- float tab[n];
- starpu_data_handle tab_handle;
- starpu_register_vector_data(&tab_handle, 0, tab, n, sizeof(float));
- @end example
- The first argument, called the @b{data handle}, is an opaque pointer which
- designates the array in StarPU. This is also the structure which is used to
- describe which data is used by a task.
- @c TODO: what is 0 ?
- It is possible to construct a StarPU
- task that multiplies this vector by a constant factor:
- @example
- float factor;
- struct starpu_task *task = starpu_task_create();
- task->cl = &cl;
- task->buffers[0].handle = tab_handle;
- task->buffers[0].mode = STARPU_RW;
- task->cl_arg = &factor;
- task->cl_arg_size = sizeof(float);
- @end example
- Since the factor is constant, it does not need a preliminary declaration, and
- can just be passed through the @code{cl_arg} pointer like in the previous
- example. The vector parameter is described by its handle.
- There are two fields in each element of the @code{buffers} array.
- @code{.handle} is the handle of the data, and @code{.mode} specifies how the
- kernel will access the data (@code{STARPU_R} for read-only, @code{STARPU_W} for
- write-only and @code{STARPU_RW} for read and write access).
- The definition of the codelet can be written as follows:
- @example
- void scal_func(starpu_data_interface_t *buffers, void *arg)
- @{
- unsigned i;
- float *factor = arg;
- /* length of the vector */
- unsigned n = buffers[0].vector.nx;
- /* local copy of the vector pointer */
- float *val = (float *)buffers[0].vector.ptr;
- for (i = 0; i < n; i++)
- val[i] *= *factor;
- @}
- starpu_codelet cl = @{
- .where = CORE,
- .core_func = scal_func,
- .nbuffers = 1
- @};
- @end example
- The second argument of the @code{scal_func} function contains a pointer to the
- parameters of the codelet (given in @code{task->cl_arg}), so that we read the
- constant factor from this pointer. The first argument is an array that gives
- a description of every buffer passed in the @code{task->buffers} array, the
- number of which is given by the @code{.nbuffers} field of the codelet structure.
- In the @b{vector interface}, the location of the vector (resp. its length)
- is accessible in the @code{.vector.ptr} (resp. @code{.vector.nx}) field of this
- array. Since the vector is accessed in a read-write fashion, any modification
- will automatically affect future accesses to that vector made by other tasks.
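- Putting everything together, a possible @code{main} function for this example
- could look as follows (a sketch: the @code{NX} size is arbitrary, and the
- codelet @code{cl} is the one defined above):
- @example
- #define NX 2048   /* arbitrary vector size */
-
- int main(int argc, char **argv)
- @{
-         starpu_init(NULL);
-
-         float tab[NX];
-         /* ... fill tab ... */
-
-         starpu_data_handle tab_handle;
-         starpu_register_vector_data(&tab_handle, 0, tab, NX, sizeof(float));
-
-         float factor = 3.14f;
-
-         struct starpu_task *task = starpu_task_create();
-         task->cl = &cl;
-         task->buffers[0].handle = tab_handle;
-         task->buffers[0].mode = STARPU_RW;
-         task->cl_arg = &factor;
-         task->cl_arg_size = sizeof(float);
-         task->synchronous = 1;
-
-         starpu_submit_task(task);
-
-         starpu_shutdown();
-         return 0;
- @}
- @end example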
- @node Scaling a Vector (hybrid)
- @section Vector Scaling on a Hybrid CPU/GPU Machine
- Contrary to the previous examples, the task submitted in this example may not
- only be executed by CPUs, but also by a CUDA device.
- TODO
- @c ---------------------------------------------------------------------
- @c Advanced Topics
- @c ---------------------------------------------------------------------
- @node Advanced Topics
- @chapter Advanced Topics
- @bye