
/* StarPU --- Runtime system for heterogeneous multicore architectures.
 *
 * Copyright (C) 2009-2021 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
 *
 * StarPU is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or (at
 * your option) any later version.
 *
 * StarPU is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * See the GNU Lesser General Public License in COPYING.LGPL for more details.
 */
/*! \mainpage Introduction

\htmlonly
<h1><a class="anchor" id="Foreword"></a>Foreword</h1>
\endhtmlonly
\htmlinclude version.html
\htmlinclude foreword.html

\section Motivation Motivation

// This is a comment and it will be removed before the file is processed by doxygen
// complex machines with heterogeneous cores/devices
The use of specialized hardware such as accelerators or coprocessors offers an
interesting approach to overcome the physical limits encountered by processor
architects. As a result, many machines are now equipped with one or several
accelerators (e.g. a GPU), in addition to the usual processor(s). While a lot of
effort has been devoted to offloading computation onto such accelerators, very
little attention has been paid to portability concerns on the one hand, and to the
possibility of having heterogeneous accelerators and processors interact on the other hand.

StarPU is a runtime system that offers support for heterogeneous multicore
architectures. It not only offers a unified view of the computational resources
(i.e. CPUs and accelerators at the same time), but it also takes care of
efficiently mapping and executing tasks onto a heterogeneous machine while
transparently handling low-level issues such as data transfers in a portable
fashion.

// this leads to a complicated distributed memory design
// which is not (easily) manageable by hand

// added value/benefits of StarPU
// - portability
// - scheduling, perf. portability
\section StarPUInANutshell StarPU in a Nutshell

StarPU is a software tool aiming to allow programmers to exploit the
computing power of the available CPUs and GPUs, while relieving them
from the need to specially adapt their programs to the target machine
and processing units.

At the core of StarPU is its runtime support library, which is
responsible for scheduling application-provided tasks on heterogeneous
CPU/GPU machines. In addition, StarPU comes with programming language
support, in the form of an OpenCL front-end (\ref SOCLOpenclExtensions).

StarPU's runtime and programming language extensions support a
task-based programming model. Applications submit computational
tasks, with CPU and/or GPU implementations, and StarPU schedules these
tasks and associated data transfers on available CPUs and GPUs. The
data that a task manipulates are automatically transferred among
accelerators and the main memory, so that programmers are freed from the
scheduling issues and technical details associated with these transfers.

StarPU takes particular care of scheduling tasks efficiently, using
well-known algorithms from the literature (\ref TaskSchedulingPolicy).
In addition, it allows scheduling experts, such as compiler or
computational library developers, to implement custom scheduling
policies in a portable fashion (\ref HowToDefineANewSchedulingPolicy).

The remainder of this section describes the main concepts used in StarPU.

A video is available on the StarPU website
https://starpu.gitlabpages.inria.fr/ that presents these concepts in 26 minutes.

Some tutorials are also available on https://starpu.gitlabpages.inria.fr/tutorials/

// explain the notion of codelet and task (i.e. g(A, B)
\subsection CodeletAndTasks Codelet and Tasks

One of StarPU's primary data structures is the \b codelet. A codelet describes a
computational kernel that can possibly be implemented on multiple architectures
such as a CPU, a CUDA device or an OpenCL device.

// TODO insert illustration f: f_spu, f_cpu, ...

Another important data structure is the \b task. Executing a StarPU task
consists in applying a codelet to a data set, on one of the architectures on
which the codelet is implemented. A task thus describes the codelet that it
uses, but also which data are accessed, and how they are
accessed during the computation (read and/or write).

StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
operation. The task structure can also specify a \b callback function that is
called once StarPU has properly executed the task. It also contains optional
fields that the application may use to give hints to the scheduler (such as
priority levels).

By default, task dependencies are inferred from data dependencies (sequential
coherency) by StarPU. The application can however disable sequential coherency
for some data, and dependencies can also be expressed explicitly.

A task may be identified by a unique 64-bit number chosen by the application,
which we refer to as a \b tag.
Task dependencies can be enforced either by means of callback functions, by
submitting other tasks, or by expressing dependencies
between tags (which can thus correspond to tasks that have not yet been submitted).

// TODO insert illustration f(Ar, Brw, Cr) + ..
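A minimal sketch of these two structures is shown below: a codelet with a single
CPU implementation, and one task submitted on a registered vector. The kernel
name <c>scal_cpu_func</c>, the codelet name <c>scal_cl</c> and the scaling
operation are invented for this example; only the StarPU calls themselves are
actual API.

\code{.c}
#include <starpu.h>

// CPU implementation of the codelet: scale a vector in place
// (the kernel name and the operation are invented for this example)
void scal_cpu_func(void *buffers[], void *cl_arg)
{
	float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
	unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
	unsigned i;
	(void)cl_arg;
	for (i = 0; i < n; i++)
		val[i] *= 2.0f;
}

// The codelet lists the available implementations, the number of data
// buffers accessed by the kernel and their access modes
static struct starpu_codelet scal_cl =
{
	.cpu_funcs = { scal_cpu_func },
	.nbuffers = 1,
	.modes = { STARPU_RW },
};

int main(void)
{
	float vector[1024];
	starpu_data_handle_t handle;
	unsigned i;

	for (i = 0; i < 1024; i++)
		vector[i] = 1.0f;

	if (starpu_init(NULL) != 0)
		return 1;

	// register the vector so that StarPU can manage its transfers
	starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
				    (uintptr_t)vector, 1024, sizeof(vector[0]));

	// task submission is non-blocking; the read-write access to the
	// handle lets StarPU infer dependencies with other tasks
	starpu_task_insert(&scal_cl, STARPU_RW, handle, 0);

	starpu_task_wait_for_all();
	starpu_data_unregister(handle);
	starpu_shutdown();
	return 0;
}
\endcode

A CUDA or OpenCL implementation could be added to the same codelet through its
<c>cuda_funcs</c> or <c>opencl_funcs</c> fields, and the scheduler would then be
free to run the task on any unit that provides an implementation.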
// DSM

\subsection StarPUDataManagementLibrary StarPU Data Management Library

Because StarPU schedules tasks at runtime, data transfers have to be
done automatically and ``just-in-time'' between processing units,
relieving application programmers from explicit data transfers.
Moreover, to avoid unnecessary transfers, StarPU keeps data
where it was last needed, even if it was modified there, and it
allows multiple copies of the same data to reside at the same time on
several processing units as long as it is not modified.
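As a hedged sketch of this behaviour, assuming codelets <c>cl_read</c> (which
only reads its buffer) and <c>cl_write</c> (which reads and writes it), defined
along the lines of the previous sketch, no explicit transfer ever appears in the
application code:

\code{.c}
float vector[1024];
starpu_data_handle_t handle;
int i;

starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
			    (uintptr_t)vector, 1024, sizeof(vector[0]));

// read-only accesses: StarPU may replicate the vector on several
// processing units at the same time and run these tasks concurrently
for (i = 0; i < 8; i++)
	starpu_task_insert(&cl_read, STARPU_R, handle, 0);

// a subsequent write invalidates the extra copies; the data is
// transferred just in time to wherever the writing task executes
starpu_task_insert(&cl_write, STARPU_RW, handle, 0);

starpu_task_wait_for_all();

// unregistering waits for the remaining accesses and writes the
// up-to-date value back into the original buffer in main memory
starpu_data_unregister(handle);
\endcode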
\section ApplicationTaskification Application Taskification

TODO

// TODO: section describing what taskifying an application means: before
// porting to StarPU, turn the program into:
// "pure" functions, which only access data from their passed parameters
// a main function which just calls these pure functions
// and then it's trivial to use StarPU or any other kind of task-based library:
// simply replace calling the function with submitting a task.
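A minimal sketch of this idea follows (all names are invented for the example):
taskifying a call site amounts to replacing the call to the pure function with a
task submission on the corresponding data handles.

\code{.c}
#include <starpu.h>

// a "pure" function: it only accesses the data passed as parameters
void vec_add(float *a, const float *b, unsigned n)
{
	unsigned i;
	for (i = 0; i < n; i++)
		a[i] += b[i];
}

// before taskification: the sequential program simply calls the function
void compute_sequential(float *a, const float *b, unsigned n)
{
	vec_add(a, b, n);
}

// after taskification: the call site submits a task instead; vec_add_cl
// is assumed to be a codelet whose CPU implementation wraps vec_add, and
// the handles are assumed to have been registered beforehand
extern struct starpu_codelet vec_add_cl;

void compute_with_starpu(starpu_data_handle_t a_handle,
			 starpu_data_handle_t b_handle)
{
	starpu_task_insert(&vec_add_cl,
			   STARPU_RW, a_handle,
			   STARPU_R, b_handle,
			   0);
}
\endcode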
\section Glossary Glossary

A \b codelet records pointers to various implementations of the same
theoretical function.

A <b>memory node</b> can be either the main RAM, GPU-embedded memory, or disk memory.

A \b bus is a link between memory nodes.

A <b>data handle</b> keeps track of replicates of the same data (\b registered by the
application) over various memory nodes. The data management library keeps
them coherent.

The \b home memory node of a data handle is the memory node from which the data
was registered (usually the main memory node).

A \b task represents a scheduled execution of a codelet on some data handles.

A \b tag is a rendez-vous point. Tasks typically have their own tag, and can
depend on other tags. The value is chosen by the application.

A \b worker executes tasks. There is typically one per CPU computation core and
one per accelerator (for which a whole CPU core is dedicated).

A \b driver drives a given kind of worker. There are currently CPU, CUDA,
and OpenCL drivers. They usually start several workers to actually drive
them.

A <b>performance model</b> is a (dynamic or static) model of the performance of a
given codelet. Codelets can have an execution time performance model as well as
an energy consumption performance model.
A data \b interface describes the layout of the data: for a vector, a pointer
to the start, the number of elements and the size of each element; for a matrix, a
pointer to the start, the number of elements per row, the offset between rows,
and the size of each element; etc. To access their data, codelet functions are
given interfaces for the local memory node replicates of the data handles of the
scheduled task.
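For instance, a CPU codelet function can retrieve the fields of the matrix
interface of its first buffer through accessor macros, as in this small sketch
(the kernel name and the operation are invented):

\code{.c}
#include <starpu.h>

// CPU codelet function clearing the matrix passed as the task's first
// buffer, through the matrix data interface of its local replicate
void matrix_clear_cpu_func(void *buffers[], void *cl_arg)
{
	float *mat  = (float *)STARPU_MATRIX_GET_PTR(buffers[0]);
	unsigned nx = STARPU_MATRIX_GET_NX(buffers[0]);
	unsigned ny = STARPU_MATRIX_GET_NY(buffers[0]);
	unsigned ld = STARPU_MATRIX_GET_LD(buffers[0]);
	unsigned i, j;
	(void)cl_arg;

	// nx elements per row, ny rows, ld elements between two rows
	for (j = 0; j < ny; j++)
		for (i = 0; i < nx; i++)
			mat[j*ld + i] = 0.0f;
}
\endcode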
\b Partitioning data means dividing the data of a given data handle (called
\b father) into a series of \b children data handles which designate various
portions of the former.

A \b filter is the function which computes children data handles from a father
data handle, and thus describes how the partitioning should be done (horizontal,
vertical, etc.).
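A hedged sketch of partitioning, reusing the <c>handle</c> and <c>scal_cl</c>
names from the sketches above: the predefined block filter splits a vector
handle into equal children, which can then be used like any other data handle.

\code{.c}
struct starpu_data_filter f =
{
	.filter_func = starpu_vector_filter_block,
	.nchildren = 4,
};
int i;

// split the father handle into 4 children designating equal blocks
starpu_data_partition(handle, &f);

for (i = 0; i < 4; i++)
{
	// each child is a regular data handle that tasks can work on
	starpu_data_handle_t child = starpu_data_get_sub_data(handle, 1, i);
	starpu_task_insert(&scal_cl, STARPU_RW, child, 0);
}

// gather the children back into the father handle, in main memory
starpu_data_unpartition(handle, STARPU_MAIN_RAM);
\endcode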
\b Acquiring a data handle can be done from the main application, to safely
access the data of a data handle from its home node, without having to
unregister it.
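For instance, reusing the <c>handle</c> and <c>vector</c> names from the
sketches above, the main application can safely inspect the current value of the
data between an acquire and a release:

\code{.c}
starpu_data_acquire(handle, STARPU_R);

// the home-node copy is now up to date and will not be modified by
// tasks until it is released; read it through the original pointer
printf("first element: %f\n", vector[0]);

starpu_data_release(handle);
\endcode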
\section ResearchPapers Research Papers

Research papers about StarPU can be found at
https://starpu.gitlabpages.inria.fr/publications/.

A good overview is available in the research report at
http://hal.archives-ouvertes.fr/inria-00467677.

\section StarPUApplications StarPU Applications

You can first have a look at the chapters \ref BasicExamples and \ref AdvancedExamples.

A tutorial is also installed in the directory <c>share/doc/starpu/tutorial/</c>.

Many examples are also available in the StarPU sources in the directory
<c>examples/</c>. Simple examples include:
<dl>
<dt> <c>incrementer/</c> </dt>
<dd> Trivial incrementation test. </dd>
<dt> <c>basic_examples/</c> </dt>
<dd>
Simple documented Hello world and vector/scalar product (as
shown in \ref BasicExamples), matrix
product examples (as shown in \ref PerformanceModelExample), an example using the blocked matrix data
interface, an example using the variable data interface, and an example
using different formats on CPUs and GPUs.
</dd>
<dt> <c>matvecmult/</c></dt>
<dd>
OpenCL example from NVidia, adapted to StarPU.
</dd>
<dt> <c>axpy/</c></dt>
<dd>
AXPY CUBLAS operation adapted to StarPU.
</dd>
<dt> <c>native_fortran/</c> </dt>
<dd>
Example of using StarPU's native Fortran support.
</dd>
<dt> <c>fortran90/</c> </dt>
<dd>
Example of Fortran 90 bindings, using C marshalling wrappers.
</dd>
<dt> <c>fortran/</c> </dt>
<dd>
Example of Fortran 77 bindings, using C marshalling wrappers.
</dd>
</dl>
More advanced examples include:
<dl>
<dt><c>filters/</c></dt>
<dd>
Examples using filters, as shown in \ref PartitioningData.
</dd>
<dt><c>lu/</c></dt>
<dd>
LU matrix factorization, see for instance <c>xlu_implicit.c</c>.
</dd>
<dt><c>cholesky/</c></dt>
<dd>
Cholesky matrix factorization, see for instance <c>cholesky_implicit.c</c>.
</dd>
</dl>
\section FurtherReading Further Reading

The documentation chapters include

<ul>
<li> Part 1: StarPU Basics
<ul>
<li> \ref BuildingAndInstallingStarPU
<li> \ref BasicExamples
</ul>
<li> Part 2: StarPU Quick Programming Guide
<ul>
<li> \ref AdvancedExamples
<li> \ref CheckListWhenPerformanceAreNotThere
</ul>
<li> Part 3: StarPU Inside
<ul>
<li> \ref TasksInStarPU
<li> \ref DataManagement
<li> \ref Scheduling
<li> \ref SchedulingContexts
<li> \ref SchedulingContextHypervisor
<li> \ref HowToDefineANewSchedulingPolicy
<li> \ref DebuggingTools
<li> \ref OnlinePerformanceTools
<li> \ref OfflinePerformanceTools
<li> \ref FrequentlyAskedQuestions
</ul>
<li> Part 4: StarPU Extensions
<ul>
<li> \ref PythonInterface
<li> \ref OutOfCore
<li> \ref MPISupport
<li> \ref FaultTolerance
<li> \ref FFTSupport
<li> \ref NativeFortranSupport
<li> \ref SOCLOpenclExtensions
<li> \ref SimGridSupport
<li> \ref OpenMPRuntimeSupport
<li> \ref ClusteringAMachine
<li> \ref InteroperabilitySupport
<li> \ref EclipsePlugin
</ul>
<li> Part 5: StarPU Reference API
<ul>
<li> \ref ExecutionConfigurationThroughEnvironmentVariables
<li> \ref CompilationConfiguration
<li> \ref ModuleDocumentation
<li> \ref FileDocumentation
<li> \ref deprecated
</ul>
<li> Part: Appendix
<ul>
<li> \ref FullSourceCodeVectorScal
<li> \ref GNUFreeDocumentationLicense
</ul>
</ul>
Make sure to have a look at those too!

*/