/*
 * This file is part of the StarPU Handbook.
 * Copyright (C) 2009--2011 Université de Bordeaux 1
 * Copyright (C) 2010, 2011, 2012, 2013 Centre National de la Recherche Scientifique
 * Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
 * See the file version.doxy for copying conditions.
 */

/*! \mainpage Introduction
\htmlonly
<h1><a class="anchor" id="Foreword"></a>Foreword</h1>
\endhtmlonly
\htmlinclude version.html
\htmlinclude foreword.html

\section Motivation Motivation

\internal
complex machines with heterogeneous cores/devices
\endinternal
The use of specialized hardware such as accelerators or coprocessors offers an
interesting approach to overcome the physical limits encountered by processor
architects. As a result, many machines are now equipped with one or several
accelerators (e.g. a GPU), in addition to the usual processor(s). While a lot of
effort has been devoted to offloading computation onto such accelerators, very
little attention has been paid to portability concerns on the one hand, and to the
possibility of having heterogeneous accelerators and processors interact on the other hand.

StarPU is a runtime system that offers support for heterogeneous multicore
architectures. It not only offers a unified view of the computational resources
(i.e. CPUs and accelerators at the same time), but also takes care of
efficiently mapping and executing tasks onto a heterogeneous machine while
transparently handling low-level issues such as data transfers in a portable
fashion.
\internal
this leads to a complicated distributed memory design
which is not (easily) manageable by hand
added value/benefits of StarPU
- portability
- scheduling, perf. portability
\endinternal
\section StarPUInANutshell StarPU in a Nutshell

StarPU is a software tool aiming to allow programmers to exploit the
computing power of the available CPUs and GPUs, while relieving them
from the need to specially adapt their programs to the target machine
and processing units.

At the core of StarPU is its run-time support library, which is
responsible for scheduling application-provided tasks on heterogeneous
CPU/GPU machines. In addition, StarPU comes with programming language
support, in the form of extensions to languages of the C family
(\ref cExtensions), as well as an OpenCL front-end (\ref SOCLOpenclExtensions).

StarPU's run-time and programming language extensions support a
task-based programming model. Applications submit computational
tasks, with CPU and/or GPU implementations, and StarPU schedules these
tasks and associated data transfers on available CPUs and GPUs. The
data that a task manipulates are automatically transferred among
accelerators and the main memory, so that programmers are freed from the
scheduling issues and technical details associated with these transfers.

StarPU takes particular care of scheduling tasks efficiently, using
well-known algorithms from the literature (\ref TaskSchedulingPolicy).
In addition, it allows scheduling experts, such as compiler or
computational library developers, to implement custom scheduling
policies in a portable fashion (\ref DefiningANewSchedulingPolicy).

The remainder of this section describes the main concepts used in StarPU.

\internal
explain the notion of codelet and task (i.e. g(A, B)
\endinternal
\subsection CodeletAndTasks Codelet and Tasks

One of StarPU's primary data structures is the \b codelet. A codelet describes a
computational kernel that can possibly be implemented on multiple architectures
such as a CPU, a CUDA device or an OpenCL device.

\internal
TODO insert illustration f: f_spu, f_cpu, ...
\endinternal
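As an illustration, a codelet for a vector-scaling kernel could be declared as
sketched below. This follows the style of the vector-scaling example referred
to later in this manual (\ref FullSourceCodeVectorScal); the function and
variable names here are hypothetical, and field details may vary slightly
between StarPU versions.

```c
#include <starpu.h>

/* Hypothetical CPU implementation of a scaling kernel. Buffer
 * descriptors and the codelet argument are unpacked with the
 * standard StarPU accessors. */
void scal_cpu_func(void *buffers[], void *cl_arg)
{
	struct starpu_vector_interface *vector = buffers[0];
	unsigned n = STARPU_VECTOR_GET_NX(vector);
	float *val = (float *)STARPU_VECTOR_GET_PTR(vector);
	float factor = *(float *)cl_arg;

	for (unsigned i = 0; i < n; i++)
		val[i] *= factor;
}

/* The codelet gathers the available implementations of the kernel;
 * CUDA or OpenCL variants would be listed in .cuda_funcs or
 * .opencl_funcs alongside the CPU one. */
static struct starpu_codelet scal_cl =
{
	.cpu_funcs = { scal_cpu_func, NULL },
	.nbuffers = 1,
	.modes = { STARPU_RW },
};
```

StarPU picks, for each task, whichever of the listed implementations matches
the processing unit the scheduler chose.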
Another important data structure is the \b task. Executing a StarPU task
consists of applying a codelet to a data set, on one of the architectures on
which the codelet is implemented. A task thus describes the codelet that it
uses, but also which data are accessed, and how they are
accessed during the computation (read and/or write).

StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
operation. The task structure can also specify a \b callback function that is
called once StarPU has properly executed the task. It also contains optional
fields that the application may use to give hints to the scheduler (such as
priority levels).

By default, task dependencies are inferred from data dependencies (sequential
coherence) by StarPU. The application can however disable sequential coherency
for some data, in which case dependencies can be expressed by hand.

A task may be identified by a unique 64-bit number chosen by the application,
which we refer to as a \b tag.
Task dependencies can be enforced by hand either by means of callback functions, by
submitting other tasks, or by expressing dependencies
between tags (which can thus correspond to tasks that have not been submitted
yet).

\internal
TODO insert illustration f(Ar, Brw, Cr) + ..
\endinternal

\internal
DSM
\endinternal
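Putting these pieces together, a task with a tag, a callback, and tag
dependencies could be built as sketched below. This is an illustration, not a
complete program: the codelet <c>scal_cl</c> and the tag values are assumed to
be defined by the application.

```c
#include <starpu.h>

/* Hypothetical codelet, defined elsewhere by the application. */
extern struct starpu_codelet scal_cl;

/* Callback invoked by StarPU once the task has completed. */
static void my_callback(void *arg)
{
	(void)arg; /* e.g. notify the application */
}

void submit_example(starpu_data_handle_t handle, float *factor)
{
	struct starpu_task *task = starpu_task_create();

	task->cl = &scal_cl;           /* which codelet to run */
	task->handles[0] = handle;     /* which data it accesses */
	task->cl_arg = factor;
	task->cl_arg_size = sizeof(*factor);

	task->use_tag = 1;             /* identify the task by a tag */
	task->tag_id = 42;

	task->callback_func = my_callback;

	/* Make tag 42 depend on tags 40 and 41, which may belong to
	 * tasks that have not been submitted yet. */
	starpu_tag_declare_deps((starpu_tag_t)42, 2,
				(starpu_tag_t)40, (starpu_tag_t)41);

	starpu_task_submit(task);      /* non-blocking */
}
```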
\subsection StarPUDataManagementLibrary StarPU Data Management Library

Because StarPU schedules tasks at runtime, data transfers have to be
done automatically and "just-in-time" between processing units,
relieving the application programmer from explicit data transfers.
Moreover, to avoid unnecessary transfers, StarPU keeps data
where it was last needed, even if it was modified there, and it
allows multiple copies of the same data to reside at the same time on
several processing units, as long as the data is not modified.
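To place data under the control of the data management library, the
application \b registers it and obtains a handle, as sketched below for a
vector. The names are illustrative; node <c>0</c> designates the main memory
node where the data initially resides.

```c
#include <starpu.h>

float vector[1024];
starpu_data_handle_t handle;

void register_example(void)
{
	/* Register the vector with the data management library; after
	 * this point, the application accesses it only through the
	 * handle, and StarPU moves or replicates it between memory
	 * nodes as the scheduler requires. */
	starpu_vector_data_register(&handle, 0, (uintptr_t)vector,
				    1024, sizeof(vector[0]));

	/* ... submit tasks working on "handle" ... */

	/* Unregistering waits for pending tasks and brings the data
	 * back to its home node. */
	starpu_data_unregister(handle);
}
```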
\section ApplicationTaskification Application Taskification

TODO

\internal
TODO: section describing what taskifying an application means: before
porting to StarPU, turn the program into:
"pure" functions, which only access data from their passed parameters
a main function which just calls these pure functions
and then it's trivial to use StarPU or any other kind of task-based library:
simply replace calling the function with submitting a task.
\endinternal
\section Glossary Glossary

A \b codelet records pointers to various implementations of the same
theoretical function.

A <b>memory node</b> can be either the main RAM or GPU-embedded memory.

A \b bus is a link between memory nodes.

A <b>data handle</b> keeps track of replicates of the same data (\b registered by the
application) over various memory nodes. The data management library keeps
them coherent.

The \b home memory node of a data handle is the memory node from which the data
was registered (usually the main memory node).

A \b task represents a scheduled execution of a codelet on some data handles.

A \b tag is a rendez-vous point. Tasks typically have their own tag, and can
depend on other tags. The value is chosen by the application.

A \b worker executes tasks. There is typically one per CPU computation core and
one per accelerator (for which a whole CPU core is dedicated).

A \b driver drives a given kind of worker. There are currently CPU, CUDA,
and OpenCL drivers. They usually start several workers to actually drive
them.

A <b>performance model</b> is a (dynamic or static) model of the performance of a
given codelet. Codelets can have an execution time performance model as well as
a power consumption performance model.

A data \b interface describes the layout of the data: for a vector, a pointer
to the start, the number of elements and the size of each element; for a matrix, a
pointer to the start, the number of elements per row, the offset between rows,
and the size of each element; etc. To access their data, codelet functions are
given interfaces for the local memory node replicates of the data handles of the
scheduled task.

\b Partitioning data means dividing the data of a given data handle (called
the \b father) into a series of \b children data handles which designate various
portions of the former.

A \b filter is the function which computes children data handles from a father
data handle, and thus describes how the partitioning should be done (horizontally,
vertically, etc.).

\b Acquiring a data handle can be done from the main application, to safely
access the data of a data handle from its home node, without having to
unregister it.
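The partitioning, filter, and acquisition notions above can be sketched as
follows for a vector handle. The function and variable names are illustrative,
and the exact filter names may differ between StarPU versions.

```c
#include <starpu.h>

void partition_example(starpu_data_handle_t handle)
{
	/* A filter describing how to split the father handle into two
	 * equal blocks. */
	struct starpu_data_filter f =
	{
		.filter_func = starpu_vector_filter_block,
		.nchildren = 2,
	};
	starpu_data_partition(handle, &f);

	/* Children can be retrieved and used as regular data handles. */
	starpu_data_handle_t sub0 = starpu_data_get_sub_data(handle, 1, 0);
	/* ... submit tasks on sub0 and its sibling ... */
	(void)sub0;

	/* Gather the pieces back into the father handle, on the main
	 * memory node (node 0). */
	starpu_data_unpartition(handle, 0);

	/* Acquire the father handle to safely read the data from the
	 * application, then release it. */
	starpu_data_acquire(handle, STARPU_R);
	/* ... inspect the data on its home node ... */
	starpu_data_release(handle);
}
```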
\section ResearchPapers Research Papers

Research papers about StarPU can be found at
http://runtime.bordeaux.inria.fr/Publis/Keyword/STARPU.html.

A good overview is available in the research report at
http://hal.archives-ouvertes.fr/inria-00467677.

\section FurtherReading Further Reading

The documentation chapters include

<ol>
<li> Part: Using StarPU
<ul>
<li> \ref BuildingAndInstallingStarPU
<li> \ref BasicExamples
<li> \ref AdvancedExamples
<li> \ref HowToOptimizePerformanceWithStarPU
<li> \ref PerformanceFeedback
<li> \ref TipsAndTricksToKnowAbout
<li> \ref MPISupport
<li> \ref FFTSupport
<li> \ref cExtensions
<li> \ref SOCLOpenclExtensions
<li> \ref SchedulingContexts
<li> \ref SchedulingContextHypervisor
</ul>
</li>
<li> Part: Inside StarPU
<ul>
<li> \ref ExecutionConfigurationThroughEnvironmentVariables
<li> \ref CompilationConfiguration
<li> \ref ModuleDocumentation
<li> \ref deprecated
</ul>
<li> Part: Appendix
<ul>
<li> \ref FullSourceCodeVectorScal
<li> \ref GNUFreeDocumentationLicense
</ul>
</ol>

Make sure to have a look at those too!
*/