@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012 Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Motivation::                  Why StarPU?
* StarPU in a Nutshell::        The Fundamentals of StarPU
@end menu
@node Motivation
@section Motivation

@c complex machines with heterogeneous cores/devices
The use of specialized hardware such as accelerators or coprocessors offers an
interesting approach to overcome the physical limits encountered by processor
architects. As a result, many machines are now equipped with one or several
accelerators (e.g. a GPU), in addition to the usual processor(s). While much
effort has been devoted to offloading computation onto such accelerators, very
little attention has been paid to portability concerns on the one hand, and to
the possibility of having heterogeneous accelerators and processors interact on
the other hand.

StarPU is a runtime system that offers support for heterogeneous multicore
architectures. It not only offers a unified view of the computational resources
(i.e. CPUs and accelerators at the same time), but also takes care of
efficiently mapping and executing tasks onto a heterogeneous machine while
transparently handling low-level issues such as data transfers in a portable
fashion.
@c this leads to a complicated distributed memory design
@c which is not (easily) manageable by hand
@c added value/benefits of StarPU
@c - portability
@c - scheduling, perf. portability

@node StarPU in a Nutshell
@section StarPU in a Nutshell
StarPU is a software tool aiming to allow programmers to exploit the
computing power of the available CPUs and GPUs, while relieving them
from the need to specially adapt their programs to the target machine
and processing units.

At the core of StarPU is its run-time support library, which is
responsible for scheduling application-provided tasks on heterogeneous
CPU/GPU machines. In addition, StarPU comes with programming language
support, in the form of extensions to languages of the C family
(@pxref{C Extensions}), as well as an OpenCL front-end (@pxref{SOCL
OpenCL Extensions}).

@cindex task-based programming model
StarPU's run-time and programming language extensions support a
@dfn{task-based programming model}. Applications submit computational
tasks, with CPU and/or GPU implementations, and StarPU schedules these
tasks and associated data transfers on available CPUs and GPUs. The
data that a task manipulates are automatically transferred among
accelerators and the main memory, so that programmers are freed from the
scheduling issues and technical details associated with these transfers.

StarPU takes particular care of scheduling tasks efficiently, using
well-known algorithms from the literature (@pxref{Task scheduling
policy}). In addition, it allows scheduling experts, such as compiler
or computational library developers, to implement custom scheduling
policies in a portable fashion (@pxref{Scheduling Policy API}).

The remainder of this section describes the main concepts used in StarPU.

@menu
* Codelet and Tasks::
* StarPU Data Management Library::
* Glossary::
* Research Papers::
@end menu
@c explain the notion of codelet and task (i.e. g(A, B))
@node Codelet and Tasks
@subsection Codelet and Tasks

@cindex codelet
One of StarPU's primary data structures is the @b{codelet}. A codelet describes a
computational kernel that can possibly be implemented on multiple architectures
such as a CPU, a CUDA device or an OpenCL device.

@c TODO insert illustration f: f_spu, f_cpu, ...

@cindex task
Another important data structure is the @b{task}. Executing a StarPU task
consists in applying a codelet on a data set, on one of the architectures on
which the codelet is implemented. A task thus describes the codelet that it
uses, but also which data are accessed, and how they are
accessed during the computation (read and/or write).

StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
operation. The task structure can also specify a @b{callback} function that is
called once StarPU has properly executed the task. It also contains optional
fields that the application may use to give hints to the scheduler (such as
priority levels).
@cindex tag
By default, task dependencies are inferred by StarPU from data dependencies
(sequential coherence). The application can however disable sequential
coherence for some data, and express dependencies by hand.

A task may be identified by a unique 64-bit number chosen by the application,
which we refer to as a @b{tag}.
Task dependencies can be enforced by hand either by the means of callback
functions, by submitting other tasks, or by expressing dependencies between
tags (which can thus correspond to tasks that have not been submitted yet).
@c TODO insert illustration f(Ar, Brw, Cr) + ..

@c DSM
@node StarPU Data Management Library
@subsection StarPU Data Management Library

Because StarPU schedules tasks at runtime, data transfers have to be
done automatically and ``just-in-time'' between processing units,
relieving the application programmer from explicit data transfers.
Moreover, to avoid unnecessary transfers, StarPU keeps data
where it was last needed, even if it was modified there, and it
allows multiple copies of the same data to reside at the same time on
several processing units as long as it is not modified.
@node Glossary
@subsection Glossary

A @b{codelet} records pointers to various implementations of the same
theoretical function.

A @b{memory node} can be either the main RAM or GPU-embedded memory.

A @b{bus} is a link between memory nodes.

A @b{data handle} keeps track of replicates of the same data (@b{registered} by the
application) over various memory nodes. The data management library keeps
them coherent.

The @b{home} memory node of a data handle is the memory node from which the data
was registered (usually the main memory node).

A @b{task} represents a scheduled execution of a codelet on some data handles.

A @b{tag} is a rendez-vous point. Tasks typically have their own tag, and can
depend on other tags. The value is chosen by the application.

A @b{worker} executes tasks. There is typically one per CPU computation core and
one per accelerator (for which a whole CPU core is dedicated).

A @b{driver} drives a given kind of worker. There are currently CPU, CUDA,
and OpenCL drivers. They usually start several workers to actually drive
them.

A @b{performance model} is a (dynamic or static) model of the performance of a
given codelet. Codelets can have an execution time performance model as well as
a power consumption performance model.

A data @b{interface} describes the layout of the data: for a vector, a pointer
to the start, the number of elements and the size of each element; for a matrix, a
pointer to the start, the number of elements per row, the offset between rows,
and the size of each element; etc. To access their data, codelet functions are
given interfaces for the local memory node replicates of the data handles of the
scheduled task.
@b{Partitioning} data means dividing the data of a given data handle (called
@b{father}) into a series of @b{children} data handles which designate various
portions of the former.

A @b{filter} is the function which computes children data handles from a father
data handle, and thus describes how the partitioning should be done (horizontal,
vertical, etc.).
@b{Acquiring} a data handle can be done from the main application, to safely
access the data of a data handle from its home node, without having to
unregister it.
@node Research Papers
@subsection Research Papers

Research papers about StarPU can be found at
@indicateurl{http://runtime.bordeaux.inria.fr/Publis/Keyword/STARPU.html}.

A good overview is available in the research report at
@indicateurl{http://hal.archives-ouvertes.fr/inria-00467677}.