- @c -*-texinfo-*-
- @c This file is part of the StarPU Handbook.
- @c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
- @c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
- @c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
- @c See the file starpu.texi for copying conditions.
- @menu
- * Motivation:: Why StarPU?
- * StarPU in a Nutshell:: The Fundamentals of StarPU
- @end menu
- @node Motivation
- @section Motivation
- @c complex machines with heterogeneous cores/devices
- The use of specialized hardware such as accelerators or coprocessors offers an
- interesting approach to overcome the physical limits encountered by processor
- architects. As a result, many machines are now equipped with one or several
- accelerators (e.g. a GPU), in addition to the usual processor(s). While much
- effort has been devoted to offloading computation onto such accelerators, very
- little attention has been paid to portability concerns on the one hand, and to the
- possibility of having heterogeneous accelerators and processors interact on the other hand.
- StarPU is a runtime system that offers support for heterogeneous multicore
- architectures. Not only does it offer a unified view of the computational resources
- (i.e. CPUs and accelerators at the same time), but it also takes care of
- efficiently mapping and executing tasks onto a heterogeneous machine while
- transparently handling low-level issues such as data transfers in a portable
- fashion.
- @c this leads to a complicated distributed memory design
- @c which is not (easily) manageable by hand
- @c added value/benefits of StarPU
- @c - portability
- @c - scheduling, perf. portability
- @node StarPU in a Nutshell
- @section StarPU in a Nutshell
- StarPU is a software tool aiming to allow programmers to exploit the
- computing power of the available CPUs and GPUs, while relieving them
- from the need to specially adapt their programs to the target machine
- and processing units.
- At the core of StarPU is its run-time support library, which is
- responsible for scheduling application-provided tasks on heterogeneous
- CPU/GPU machines. In addition, StarPU comes with programming language
- support, in the form of extensions to languages of the C family
- (@pxref{C Extensions}), as well as an OpenCL front-end (@pxref{SOCL
- OpenCL Extensions}).
- @cindex task-based programming model
- StarPU's run-time and programming language extensions support a
- @dfn{task-based programming model}. Applications submit computational
- tasks, with CPU and/or GPU implementations, and StarPU schedules these
- tasks and associated data transfers on available CPUs and GPUs. The
- data that a task manipulates are automatically transferred among
- accelerators and the main memory, so that programmers are freed from the
- scheduling issues and technical details associated with these transfers.
- StarPU takes particular care of scheduling tasks efficiently, using
- well-known algorithms from the literature (@pxref{Task scheduling
- policy}). In addition, it allows scheduling experts, such as compiler
- or computational library developers, to implement custom scheduling
- policies in a portable fashion (@pxref{Scheduling Policy API}).
- The remainder of this section describes the main concepts used in StarPU.
- @menu
- * Codelet and Tasks::
- * StarPU Data Management Library::
- * Glossary::
- * Research Papers::
- @end menu
- @c explain the notion of codelet and task (i.e. g(A, B)
- @node Codelet and Tasks
- @subsection Codelet and Tasks
- @cindex codelet
- One of StarPU's primary data structures is the @b{codelet}. A codelet describes a
- computational kernel that can possibly be implemented on multiple architectures
- such as a CPU, a CUDA device or an OpenCL device.
- @c TODO insert illustration f: f_spu, f_cpu, ...
- @cindex task
- Another important data structure is the @b{task}. Executing a StarPU task
- consists of applying a codelet to a data set, on one of the architectures on
- which the codelet is implemented. A task thus describes not only the codelet that it
- uses, but also which data are accessed, and how they are
- accessed during the computation (read and/or write).
- StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
- operation. The task structure can also specify a @b{callback} function that is
- called once StarPU has properly executed the task. It also contains optional
- fields that the application may use to give hints to the scheduler (such as
- priority levels).
- @cindex tag
- By default, StarPU infers task dependencies from data dependencies (sequential
- consistency). The application can however disable sequential consistency
- for some data, and then express dependencies by hand.
- A task may be identified by a unique 64-bit number chosen by the application,
- which we refer to as a @b{tag}.
- Task dependencies can be enforced by hand either by means of callback functions, by
- submitting other tasks, or by expressing dependencies
- between tags (which can thus correspond to tasks that have not been submitted
- yet).
- @c TODO insert illustration f(Ar, Brw, Cr) + ..
- @c DSM
- @node StarPU Data Management Library
- @subsection StarPU Data Management Library
- Because StarPU schedules tasks at runtime, data transfers have to be
- done automatically and ``just-in-time'' between processing units,
- relieving the application programmer from explicit data transfers.
- Moreover, to avoid unnecessary transfers, StarPU keeps data
- where it was last needed, even if it was modified there, and it
- allows multiple copies of the same data to reside at the same time on
- several processing units as long as it is not modified.
- @node Glossary
- @subsection Glossary
- A @b{codelet} records pointers to various implementations of the same
- theoretical function.
- A @b{memory node} can be either the main RAM or GPU-embedded memory.
- A @b{bus} is a link between memory nodes.
- A @b{data handle} keeps track of replicates of the same data (@b{registered} by the
- application) over various memory nodes. The data management library keeps
- them coherent.
- The @b{home} memory node of a data handle is the memory node from which the data
- was registered (usually the main memory node).
- A @b{task} represents a scheduled execution of a codelet on some data handles.
- A @b{tag} is a rendez-vous point. Tasks typically have their own tag, and can
- depend on other tags. The value is chosen by the application.
- A @b{worker} executes tasks. There is typically one per CPU computation core and
- one per accelerator (for which a whole CPU core is dedicated).
- A @b{driver} drives a given kind of worker. There are currently CPU, CUDA,
- and OpenCL drivers. A driver usually starts several workers and then drives
- them.
- A @b{performance model} is a (dynamic or static) model of the performance of a
- given codelet. Codelets can have execution-time performance models as well as
- power-consumption performance models.
- A data @b{interface} describes the layout of the data: for a vector, a pointer
- to the start, the number of elements and the size of each element; for a matrix, a
- pointer to the start, the number of elements per row, the offset between rows,
- and the size of each element; etc. To access their data, codelet functions are
- given the interfaces of the replicates that reside on the local memory node, one
- for each data handle of the scheduled task.
- @b{Partitioning} data means dividing the data of a given data handle (called
- @b{father}) into a series of @b{children} data handles which designate various
- portions of the former.
- A @b{filter} is the function which computes children data handles from a father
- data handle, and thus describes how the partitioning should be done (horizontal,
- vertical, etc.).
- @b{Acquiring} a data handle can be done from the main application, to safely
- access the data of a data handle from its home node, without having to
- unregister it.
- @node Research Papers
- @subsection Research Papers
- Research papers about StarPU can be found at
- @indicateurl{http://runtime.bordeaux.inria.fr/Publis/Keyword/STARPU.html}.
- A good overview is available in the research report at
- @indicateurl{http://hal.archives-ouvertes.fr/inria-00467677}.