
@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011 Centre National de la Recherche Scientifique
@c Copyright (C) 2011 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@node Introduction
@chapter Introduction to StarPU

@menu
* Motivation::                  Why StarPU?
* StarPU in a Nutshell::        The Fundamentals of StarPU
@end menu

@node Motivation
@section Motivation

@c complex machines with heterogeneous cores/devices
The use of specialized hardware such as accelerators or coprocessors offers an
interesting approach to overcome the physical limits encountered by processor
architects. As a result, many machines are now equipped with one or several
accelerators (e.g. a GPU), in addition to the usual processor(s). While a lot of
effort has been devoted to offloading computation onto such accelerators, very
little attention has been paid to portability concerns on the one hand, and to the
possibility of having heterogeneous accelerators and processors interact on the
other hand.

StarPU is a runtime system that offers support for heterogeneous multicore
architectures. It not only offers a unified view of the computational resources
(i.e. CPUs and accelerators at the same time), but it also takes care of
efficiently mapping and executing tasks onto a heterogeneous machine while
transparently handling low-level issues such as data transfers in a portable
fashion.
@c this leads to a complicated distributed memory design
@c which is not (easily) manageable by hand
@c added value/benefits of StarPU
@c - portability
@c - scheduling, perf. portability

@node StarPU in a Nutshell
@section StarPU in a Nutshell

@menu
* Codelet and Tasks::
* StarPU Data Management Library::
* Glossary::
* Research Papers::
@end menu
From a programming point of view, StarPU is not a new language but a library
that executes tasks explicitly submitted by the application. The data that a
task manipulates are automatically transferred onto the accelerator so that
the programmer does not have to take care of complex data movements. StarPU
also takes particular care of scheduling those tasks efficiently and allows
scheduling experts to implement custom scheduling policies in a portable
fashion. The target audience is typically developers of compilers or computation
libraries who want to seamlessly extend them to support heterogeneous
architectures.
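
As a first illustration, here is a minimal sketch of what a host program built
on StarPU typically looks like: initialize the runtime, register data and
submit tasks, wait for their completion, and shut the runtime down. Error
handling is reduced to the bare minimum.

@cartouche
@smallexample
#include <starpu.h>

int main(void)
@{
    /* Initialize StarPU with the default configuration. */
    int ret = starpu_init(NULL);
    if (ret != 0)
        return 1;

    /* ... register data and submit tasks here ... */

    /* Wait for all submitted tasks to terminate. */
    starpu_task_wait_for_all();

    starpu_shutdown();
    return 0;
@}
@end smallexample
@end cartouche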
@c explain the notion of codelet and task (i.e. g(A, B)

@node Codelet and Tasks
@subsection Codelet and Tasks

One of StarPU's primary data structures is the @b{codelet}. A codelet describes a
computational kernel that can possibly be implemented on multiple architectures
such as a CPU, a CUDA device or a Cell's SPU.
@c TODO insert illustration f : f_spu, f_cpu, ...
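
As an illustration, the following is a minimal sketch of a codelet for a
vector scaling kernel with a single CPU implementation; the exact type and
field names may differ across StarPU versions.

@cartouche
@smallexample
#include <starpu.h>

/* CPU implementation of the kernel: multiply a vector by a factor. */
void scal_cpu_func(void *buffers[], void *cl_arg)
@{
    unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
    float *val = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
    float factor = *(float *)cl_arg;
    unsigned i;

    for (i = 0; i < n; i++)
        val[i] *= factor;
@}

/* The codelet gathers the available implementations under a single name. */
static starpu_codelet scal_cl =
@{
    .where = STARPU_CPU,        /* could also advertise STARPU_CUDA, etc. */
    .cpu_func = scal_cpu_func,
    .nbuffers = 1               /* number of data handles the task accesses */
@};
@end smallexample
@end cartouche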
Another important data structure is the @b{task}. Executing a StarPU task
consists in applying a codelet on a data set, on one of the architectures on
which the codelet is implemented. A task thus describes the codelet that it
uses, but also which data are accessed, and how they are
accessed during the computation (read and/or write).

StarPU tasks are asynchronous: submitting a task to StarPU is a non-blocking
operation. The task structure can also specify a @b{callback} function that is
called once StarPU has properly executed the task. It also contains optional
fields that the application may use to give hints to the scheduler (such as
priority levels).

By default, StarPU infers task dependencies from data dependencies (sequential
coherence). The application can however disable sequential coherence for some
data, in which case dependencies have to be expressed by hand.

A task may be identified by a unique 64-bit number chosen by the application,
which we refer to as a @b{tag}.
Task dependencies can be enforced by hand either by means of callback functions, by
submitting other tasks, or by expressing dependencies
between tags (which can thus correspond to tasks that have not been submitted
yet).
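
Continuing the sketch above, submitting a task that applies this codelet could
look as follows; @code{vector_handle} and @code{factor} are assumed to have
been set up by the application, and the tag values are arbitrary.

@cartouche
@smallexample
struct starpu_task *task = starpu_task_create();
task->cl = &scal_cl;                      /* codelet to apply */
task->buffers[0].handle = vector_handle;  /* data handle registered earlier */
task->buffers[0].mode = STARPU_RW;        /* accessed for reading and writing */
task->cl_arg = &factor;
task->cl_arg_size = sizeof(factor);

/* Optionally identify the task with a tag and make it depend on
 * two other tags, which may belong to tasks not yet submitted. */
task->use_tag = 1;
task->tag_id = 42;
starpu_tag_declare_deps((starpu_tag_t)42, 2,
                        (starpu_tag_t)40, (starpu_tag_t)41);

/* Submission is asynchronous: this call returns immediately. */
starpu_task_submit(task);
@end smallexample
@end cartouche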
@c TODO insert illustration f(Ar, Brw, Cr) + ..

@c DSM
@node StarPU Data Management Library
@subsection StarPU Data Management Library
Because StarPU schedules tasks at runtime, data transfers have to be
done automatically and ``just-in-time'' between processing units,
relieving the application programmer from explicit data transfers.
Moreover, to avoid unnecessary transfers, StarPU keeps data
where it was last needed, even if it was modified there, and it
allows multiple copies of the same data to reside at the same time on
several processing units as long as it is not modified.
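
For instance, the sketch below registers a vector with StarPU, after which the
library is free to replicate and transfer it among memory nodes as tasks
require; the variable names are illustrative.

@cartouche
@smallexample
#define NX 1024
float vector[NX];
starpu_data_handle vector_handle;

/* Hand the vector over to StarPU: from now on, transfers between
 * memory nodes are performed automatically when tasks need them. */
starpu_vector_data_register(&vector_handle, 0 /* home node: main RAM */,
                            (uintptr_t)vector, NX, sizeof(vector[0]));

/* ... submit tasks that access vector_handle ... */

/* Bring the data back to main memory and return it to the application. */
starpu_data_unregister(vector_handle);
@end smallexample
@end cartouche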
@node Glossary
@subsection Glossary
A @b{codelet} records pointers to various implementations of the same
theoretical function.

A @b{memory node} can be either the main RAM or GPU-embedded memory.

A @b{bus} is a link between memory nodes.

A @b{data handle} keeps track of replicates of the same data (@b{registered} by the
application) over various memory nodes. The data management library is in
charge of keeping them coherent.

The @b{home} memory node of a data handle is the memory node from which the data
was registered (usually the main memory node).

A @b{task} represents a scheduled execution of a codelet on some data handles.

A @b{tag} is a rendez-vous point. Tasks typically have their own tag, and can
depend on other tags. The value is chosen by the application.

A @b{worker} executes tasks. There is typically one per CPU computation core and
one per accelerator (for which a whole CPU core is dedicated).

A @b{driver} drives a given kind of worker. There are currently CPU, CUDA,
OpenCL and Gordon drivers. Each driver usually starts several workers to
actually drive them.

A @b{performance model} is a (dynamic or static) model of the performance of a
given codelet. Codelets can have an execution time performance model as well as
a power consumption performance model.
A data @b{interface} describes the layout of the data: for a vector, a pointer
to the start, the number of elements and the size of each element; for a matrix, a
pointer to the start, the number of elements per row, the offset between rows,
and the size of each element; etc. To access their data, codelet functions are
given interfaces for the local memory node replicates of the data handles of the
scheduled task.
@b{Partitioning} data means dividing the data of a given data handle (called
@b{father}) into a series of @b{children} data handles which designate various
portions of the former.

A @b{filter} is the function which computes children data handles from a father
data handle, and thus describes how the partitioning should be done (horizontal,
vertical, etc.).
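
As an illustration, the sketch below splits a vector handle into four children
using a predefined block filter; the filter name follows the StarPU versions
contemporary with this manual and may differ elsewhere.

@cartouche
@smallexample
/* Split the father handle into 4 children designating
 * contiguous portions of the vector. */
struct starpu_data_filter f =
@{
    .filter_func = starpu_block_filter_func_vector,
    .nchildren = 4
@};
starpu_data_partition(vector_handle, &f);

/* Retrieve the first child, at depth 1 of the partitioning tree. */
starpu_data_handle sub_handle = starpu_data_get_sub_data(vector_handle, 1, 0);

/* ... submit tasks on the children ... */

/* Gather the pieces back into the father handle, in main memory (node 0). */
starpu_data_unpartition(vector_handle, 0);
@end smallexample
@end cartouche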
@b{Acquiring} a data handle can be done from the main application, to safely
access the data of a data handle from its home node, without having to
unregister it.
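
For instance, the main application can read the vector registered above as
follows, without unregistering it:

@cartouche
@smallexample
/* Block until a coherent copy is available in the home node,
 * and prevent tasks from modifying the data meanwhile. */
starpu_data_acquire(vector_handle, STARPU_R);
printf("first element: %f\n", vector[0]);
starpu_data_release(vector_handle);
@end smallexample
@end cartouche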
@node Research Papers
@subsection Research Papers

Research papers about StarPU can be found at
@indicateurl{http://runtime.bordeaux.inria.fr/Publis/Keyword/STARPU.html}.

A good overview is notably given in the research report
@indicateurl{http://hal.archives-ouvertes.fr/inria-00467677}.