@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011  Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011  Centre National de la Recherche Scientifique
@c Copyright (C) 2011 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@node Introduction
@chapter Introduction to StarPU

@menu
* Motivation::                  Why StarPU?
* StarPU in a Nutshell::        The Fundamentals of StarPU
@end menu
 
@node Motivation
@section Motivation

@c complex machines with heterogeneous cores/devices
 
The use of specialized hardware such as accelerators or coprocessors offers an
interesting approach to overcome the physical limits encountered by processor
architects. As a result, many machines are now equipped with one or several
accelerators (e.g. a GPU), in addition to the usual processor(s). While a lot
of effort has been devoted to offloading computation onto such accelerators,
very little attention has been paid to portability concerns on the one hand,
and to the possibility of having heterogeneous accelerators and processors
interact on the other hand.
 
StarPU is a runtime system that offers support for heterogeneous multicore
architectures. It not only offers a unified view of the computational resources
(i.e. CPUs and accelerators at the same time), but also takes care of
efficiently mapping and executing tasks onto a heterogeneous machine while
transparently handling low-level issues such as data transfers in a portable
fashion.

@c this leads to a complicated distributed memory design
@c which is not (easily) manageable by hand

@c added value/benefits of StarPU
@c   - portability
@c   - scheduling, perf. portability
 
@node StarPU in a Nutshell
@section StarPU in a Nutshell

@menu
* Codelet and Tasks::
* StarPU Data Management Library::
* Glossary::
* Research Papers::
@end menu
 
From a programming point of view, StarPU is not a new language but a library
that executes tasks explicitly submitted by the application.  The data that a
task manipulates are automatically transferred onto the accelerator so that
the programmer does not have to take care of complex data movements.  StarPU
also takes particular care of scheduling those tasks efficiently and allows
scheduling experts to implement custom scheduling policies in a portable
fashion. The target audience is typically developers of compilers or
computation libraries who want to seamlessly extend them to support
heterogeneous architectures.

@c explain the notion of codelet and task (i.e. g(A, B)
 
@node Codelet and Tasks
@subsection Codelet and Tasks

One of StarPU's primary data structures is the @b{codelet}. A codelet
describes a computational kernel that can possibly be implemented on multiple
architectures such as a CPU, a CUDA device or a Cell's SPU.

@c TODO insert illustration f : f_spu, f_cpu, ...
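
As a minimal sketch, a codelet with a CPU and a CUDA implementation could be
declared as follows (the field names follow the StarPU 1.x C API and may
differ in other releases; @code{scal_cpu_func} and @code{scal_cuda_func}
stand for hypothetical kernel implementations provided by the application):

@cartouche
@smallexample
struct starpu_codelet scal_cl =
@{
    .cpu_funcs = @{scal_cpu_func, NULL@},   /* CPU implementation */
    .cuda_funcs = @{scal_cuda_func, NULL@}, /* CUDA implementation */
    .nbuffers = 1,               /* the kernel accesses one data handle */
    .modes = @{STARPU_RW@}         /* which it both reads and writes */
@};
@end smallexample
@end cartouche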
 
Another important data structure is the @b{task}. Executing a StarPU task
consists in applying a codelet to a data set, on one of the architectures on
which the codelet is implemented. A task thus describes the codelet that it
uses, but also which data are accessed, and how they are accessed during the
computation (read and/or write).

StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
operation. The task structure can also specify a @b{callback} function that is
called once StarPU has properly executed the task. It also contains optional
fields that the application may use to give hints to the scheduler (such as
priority levels).
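
For illustration, here is a minimal sketch of creating and submitting a task
which uses the hypothetical @code{scal_cl} codelet above
(@code{vector_handle} and @code{my_callback} are likewise hypothetical, and
the field names follow the StarPU 1.x C API):

@cartouche
@smallexample
struct starpu_task *task = starpu_task_create();
task->cl = &scal_cl;               /* codelet to execute */
task->handles[0] = vector_handle;  /* data handle accessed by the task */
task->callback_func = my_callback; /* called once the task has executed */
task->priority = STARPU_MAX_PRIO;  /* optional hint to the scheduler */
starpu_task_submit(task);          /* returns without waiting for execution */
@end smallexample
@end cartouche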
 
By default, task dependencies are inferred from data dependencies (sequential
coherency) by StarPU. The application can however disable sequential coherency
for some data, in which case dependencies have to be expressed by hand.

A task may be identified by a unique 64-bit number chosen by the application,
which we refer to as a @b{tag}.
 
Task dependencies can be enforced by hand either by means of callback
functions, by submitting other tasks, or by expressing dependencies between
tags (which can thus correspond to tasks that have not been submitted yet).
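
As a minimal sketch, assuming three tasks to which the application has
attached the arbitrary tag values 40, 41 and 42, a dependency of the third
task on the first two could be declared as follows (@code{task_c} is
hypothetical):

@cartouche
@smallexample
/* The task with tag 42 will not start before the tasks with tags 40 and
 * 41 have completed; the tags can be declared before the corresponding
 * tasks are submitted. */
starpu_tag_declare_deps((starpu_tag_t)42, 2,
                        (starpu_tag_t)40, (starpu_tag_t)41);

/* Attach the tag to the task. */
task_c->tag_id = 42;
task_c->use_tag = 1;
@end smallexample
@end cartouche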
 
@c TODO insert illustration f(Ar, Brw, Cr) + ..

@c DSM
 
@node StarPU Data Management Library
@subsection StarPU Data Management Library

Because StarPU schedules tasks at runtime, data transfers have to be
done automatically and ``just-in-time'' between processing units,
relieving the application programmer from explicit data transfers.
Moreover, to avoid unnecessary transfers, StarPU keeps data
where it was last needed, even if it was modified there, and it
allows multiple copies of the same data to reside at the same time on
several processing units as long as it is not modified.
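
For instance, registering a vector once is enough for StarPU to then transfer
and replicate it between processing units as tasks require. A minimal sketch,
with names following the StarPU 1.x C API:

@cartouche
@smallexample
float vector[1024];
starpu_data_handle_t vector_handle;

/* Register the vector with the data management library; memory node 0
 * (main memory) becomes its home node.  Later transfers between
 * processing units are handled by StarPU. */
starpu_vector_data_register(&vector_handle, 0, (uintptr_t)vector,
                            1024, sizeof(vector[0]));
@end smallexample
@end cartouche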
 
@node Glossary
@subsection Glossary

A @b{codelet} records pointers to various implementations of the same
theoretical function.

A @b{memory node} can be either the main RAM or GPU-embedded memory.

A @b{bus} is a link between memory nodes.
 
A @b{data handle} keeps track of replicates of the same data (@b{registered}
by the application) over various memory nodes. The data management library
takes care of keeping them coherent.

The @b{home} memory node of a data handle is the memory node from which the
data was registered (usually the main memory node).
 
A @b{task} represents a scheduled execution of a codelet on some data handles.

A @b{tag} is a rendez-vous point. Tasks typically have their own tag, and can
depend on other tags. The value is chosen by the application.

A @b{worker} executes tasks. There is typically one per CPU computation core
and one per accelerator (for which a whole CPU core is dedicated).
 
A @b{driver} drives a given kind of worker. There are currently CPU, CUDA,
OpenCL and Gordon drivers. They usually start several workers to actually
drive them.

A @b{performance model} is a (dynamic or static) model of the performance of a
given codelet. Codelets can have an execution time performance model as well
as a power consumption performance model.
 
A data @b{interface} describes the layout of the data: for a vector, a pointer
to the start, the number of elements and the size of each element; for a
matrix, a pointer to the start, the number of elements per row, the offset
between rows, and the size of each element; etc. To access their data, codelet
functions are given interfaces for the local memory node replicates of the
data handles of the scheduled task.
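
As a minimal sketch, a CPU implementation of the hypothetical
@code{scal_cpu_func} kernel mentioned earlier could access its vector through
the interface accessor macros of the StarPU 1.x C API:

@cartouche
@smallexample
void scal_cpu_func(void *buffers[], void *cl_arg)
@{
    /* Interface for the local replicate of the first data handle. */
    struct starpu_vector_interface *vector = buffers[0];
    unsigned nx = STARPU_VECTOR_GET_NX(vector);          /* element count */
    float *val = (float *)STARPU_VECTOR_GET_PTR(vector); /* local pointer */
    unsigned i;

    for (i = 0; i < nx; i++)
        val[i] *= 2.0f;
@}
@end smallexample
@end cartouche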
 
@b{Partitioning} data means dividing the data of a given data handle (called
@b{father}) into a series of @b{children} data handles which designate various
portions of the former.

A @b{filter} is the function which computes children data handles from a
father data handle, and thus describes how the partitioning should be done
(horizontal, vertical, etc.).
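
A minimal sketch, assuming the StarPU 1.x C API (in particular the predefined
@code{starpu_vector_filter_block} filter, whose name has varied across
releases), could partition the vector registered above into four blocks:

@cartouche
@smallexample
struct starpu_data_filter f =
@{
    .filter_func = starpu_vector_filter_block, /* split into equal blocks */
    .nchildren = 4
@};
starpu_data_partition(vector_handle, &f);

/* Children are regular data handles; fetch the first one. */
starpu_data_handle_t sub = starpu_data_get_sub_data(vector_handle, 1, 0);
@end smallexample
@end cartouche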
 
@b{Acquiring} a data handle can be done from the main application, to safely
access the data of a data handle from its home node, without having to
unregister it.
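
A minimal sketch, reusing the hypothetical @code{vector} and
@code{vector_handle} from above:

@cartouche
@smallexample
/* Block until pending tasks working on the handle have completed and a
 * coherent copy is available in main memory, then read it safely. */
starpu_data_acquire(vector_handle, STARPU_R);
printf("first element: %f\n", vector[0]);
starpu_data_release(vector_handle);
@end smallexample
@end cartouche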
 
@node Research Papers
@subsection Research Papers

Research papers about StarPU can be found at
@indicateurl{http://runtime.bordeaux.inria.fr/Publis/Keyword/STARPU.html}.

A good overview is notably given in the research report
@indicateurl{http://hal.archives-ouvertes.fr/inria-00467677}.
 
 