@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011 Centre National de la Recherche Scientifique
@c Copyright (C) 2011 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@node Performance feedback
@chapter Performance feedback

@menu
* On-line::             On-line performance feedback
* Off-line::            Off-line performance feedback
* Codelet performance:: Performance of codelets
@end menu
@node On-line
@section On-line performance feedback

@menu
* Enabling monitoring:: Enabling on-line performance monitoring
* Task feedback::       Per-task feedback
* Codelet feedback::    Per-codelet feedback
* Worker feedback::     Per-worker feedback
* Bus feedback::        Bus-related feedback
* StarPU-Top::          StarPU-Top interface
@end menu

@node Enabling monitoring
@subsection Enabling on-line performance monitoring
In order to enable on-line performance monitoring, the application can call
@code{starpu_profiling_status_set(STARPU_PROFILING_ENABLE)}. It is possible to
detect whether monitoring is already enabled or not by calling
@code{starpu_profiling_status_get()}. Enabling monitoring also reinitializes all
previously collected feedback. The @code{STARPU_PROFILING} environment variable
can also be set to 1 to achieve the same effect.

Likewise, performance monitoring is stopped by calling
@code{starpu_profiling_status_set(STARPU_PROFILING_DISABLE)}. Note that this
does not reset the performance counters, so that the application may consult
them later on.

More details about the performance monitoring API are available in section
@ref{Profiling API}.
@node Task feedback
@subsection Per-task feedback

If profiling is enabled, a pointer to a @code{starpu_task_profiling_info}
structure is put in the @code{.profiling_info} field of the @code{starpu_task}
structure when a task terminates.
This structure is automatically destroyed when the task structure is destroyed,
either automatically or by calling @code{starpu_task_destroy}.

The @code{starpu_task_profiling_info} structure indicates the dates when the
task was submitted (@code{submit_time}), started (@code{start_time}), and
terminated (@code{end_time}), relative to the initialization of
StarPU with @code{starpu_init}. It also specifies the identifier of the worker
that executed the task (@code{workerid}).
These dates are stored as @code{timespec} structures, which the user may convert
into microseconds using the @code{starpu_timing_timespec_to_us} helper
function.

It is worth noting that the application may directly access this structure from
the callback executed at the end of the task. The @code{starpu_task} structure
associated with the callback currently being executed is indeed accessible with
the @code{starpu_get_current_task()} function.
@node Codelet feedback
@subsection Per-codelet feedback

The @code{per_worker_stats} field of the @code{starpu_codelet_t} structure is
an array of counters. The i-th entry of the array is incremented every time a
task implementing the codelet is executed on the i-th worker.
This array is not reinitialized when profiling is enabled or disabled.
@node Worker feedback
@subsection Per-worker feedback

The second argument returned by the @code{starpu_worker_get_profiling_info}
function is a @code{starpu_worker_profiling_info} structure that gives
statistics about the specified worker. This structure specifies when StarPU
started collecting profiling information for that worker (@code{start_time}),
the duration of the profiling measurement interval (@code{total_time}), the
time spent executing kernels (@code{executing_time}), the time spent sleeping
because there is no task to execute at all (@code{sleeping_time}), and the
number of tasks that were executed while profiling was enabled.
These values give an estimation of the proportion of time spent doing real
work, and of the time either spent sleeping because there are not enough
executable tasks, or simply wasted in pure StarPU overhead.

Calling @code{starpu_worker_get_profiling_info} resets the profiling
information associated with a worker.

When an FxT trace is generated (see @ref{Generating traces}), it is also
possible to use the @code{starpu_top} script (described in @ref{starpu-top}) to
generate a graphic showing the evolution of these values over time, for
the different workers.
@node Bus feedback
@subsection Bus-related feedback

TODO
@c how to enable/disable performance monitoring
@c what kind of information do we get ?

The bus speed measured by StarPU can be displayed by using the
@code{starpu_machine_display} tool, for instance:

@example
StarPU has found :
        3 CUDA devices
                CUDA 0 (Tesla C2050 02:00.0)
                CUDA 1 (Tesla C2050 03:00.0)
                CUDA 2 (Tesla C2050 84:00.0)
from    to RAM        to CUDA 0     to CUDA 1     to CUDA 2
RAM     0.000000      5176.530428   5176.492994   5191.710722
CUDA 0  4523.732446   0.000000      2414.074751   2417.379201
CUDA 1  4523.718152   2414.078822   0.000000      2417.375119
CUDA 2  4534.229519   2417.069025   2417.060863   0.000000
@end example
@node StarPU-Top
@subsection StarPU-Top interface

StarPU-Top is an interface which remotely displays the on-line state of a StarPU
application and permits the user to change parameters on the fly.

Variables to be monitored can be registered by calling the
@code{starputop_add_data_boolean}, @code{starputop_add_data_integer},
@code{starputop_add_data_float} functions, e.g.:

@example
starputop_data *data = starputop_add_data_integer("mynum", 0, 100, 1);
@end example

The application should then call @code{starputop_init_and_wait} to give its name
and wait for StarPU-Top to get a start request from the user. The name is used
by StarPU-Top to quickly reload a previously-saved layout of parameter display.

@example
starputop_init_and_wait("the application");
@end example

The new values can then be provided thanks to
@code{starputop_update_data_boolean}, @code{starputop_update_data_integer},
@code{starputop_update_data_float}, e.g.:

@example
starputop_update_data_integer(data, mynum);
@end example
Updateable parameters can be registered thanks to
@code{starputop_register_parameter_boolean},
@code{starputop_register_parameter_integer},
@code{starputop_register_parameter_float}, e.g.:

@example
float alpha;
starputop_register_parameter_float("alpha", &alpha, 0, 10, modif_hook);
@end example

@code{modif_hook} is a function which will be called when the parameter is
modified; it can for instance print the new value:

@example
void modif_hook(struct starputop_param_t *d)
@{
  fprintf(stderr, "%s has been modified: %f\n", d->name, alpha);
@}
@end example
Task schedulers should notify StarPU-Top when they have decided when a task
will be scheduled, so that StarPU-Top can show it in its Gantt chart, for
instance:

@example
starputop_task_prevision(task, workerid, begin, end);
@end example

Starting StarPU-Top and the application can be done in two ways:

@itemize
@item The application is started by hand on some machine (and thus already
waiting for the start event). In the Preference dialog of StarPU-Top, the SSH
checkbox should be unchecked, and the hostname and port (default is 2011) on
which the application is already running should be specified. Clicking on the
connection button will then connect to the already-running application.

@item StarPU-Top is started first, and clicking on the connection button will
start the application itself (possibly on a remote machine). The SSH checkbox
should be checked, and a command line provided, e.g.:

@example
ssh myserver STARPU_SCHED=heft ./application
@end example

If port 2011 of the remote machine cannot be accessed directly, an SSH port
forwarding should be added:

@example
ssh -L 2011:localhost:2011 myserver STARPU_SCHED=heft ./application
@end example

and "localhost" should then be used as the IP address to connect to.
@end itemize
@node Off-line
@section Off-line performance feedback

@menu
* Generating traces:: Generating traces with FxT
* Gantt diagram::     Creating a Gantt Diagram
* DAG::               Creating a DAG with graphviz
* starpu-top::        Monitoring activity
@end menu

@node Generating traces
@subsection Generating traces with FxT

StarPU can use the FxT library (see
@indicateurl{https://savannah.nongnu.org/projects/fkt/}) to generate traces
with a limited runtime overhead.

You can either get a tarball:
@example
% wget http://download.savannah.gnu.org/releases/fkt/fxt-0.2.2.tar.gz
@end example

or use the FxT library from CVS (autotools are required):
@example
% cvs -d :pserver:anonymous@@cvs.sv.gnu.org:/sources/fkt co FxT
% ./bootstrap
@end example

Compiling and installing the FxT library in the @code{$FXTDIR} path is
done following the standard procedure:
@example
% ./configure --prefix=$FXTDIR
% make
% make install
@end example
In order to have StarPU generate traces, StarPU should be configured with
the @code{--with-fxt} option:
@example
$ ./configure --with-fxt=$FXTDIR
@end example

Or you can simply point @code{PKG_CONFIG_PATH} to
@code{$FXTDIR/lib/pkgconfig} and pass @code{--with-fxt} to @code{./configure}.

When FxT is enabled, a trace is generated when StarPU is terminated by calling
@code{starpu_shutdown()}. The trace is a binary file whose name has the form
@code{prof_file_XXX_YYY} where @code{XXX} is the user name, and
@code{YYY} is the pid of the process that used StarPU. This file is saved in the
@code{/tmp/} directory by default, or in the directory specified by
the @code{STARPU_FXT_PREFIX} environment variable.
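Putting these pieces together, a typical session could look as follows. This
is only a sketch: @code{./application} stands for any FxT-enabled StarPU
program, and whether the prefix needs a trailing slash may depend on the
StarPU version:

```shell
# Write the prof_file_XXX_YYY trace into the current directory
# instead of the default /tmp/.
export STARPU_FXT_PREFIX=$PWD/

# Run any StarPU application built against an FxT-enabled StarPU;
# the trace is written when the application calls starpu_shutdown().
./application
```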
@node Gantt diagram
@subsection Creating a Gantt Diagram

Once the FxT trace file @code{filename} has been generated, it is possible to
generate a trace in the Paje format by calling:
@example
% starpu_fxt_tool -i filename
@end example

Alternatively, setting the @code{STARPU_GENERATE_TRACE} environment variable
to 1 before application execution will make StarPU do it automatically at
application shutdown.

This will create a @code{paje.trace} file in the current directory that can be
inspected with the open-source trace visualization tool ViTE. More information
about ViTE is available at @indicateurl{http://vite.gforge.inria.fr/}. It is
possible to open the @code{paje.trace} file with ViTE by using the following
command:
@example
% vite paje.trace
@end example
@node DAG
@subsection Creating a DAG with graphviz

Once the FxT trace file @code{filename} has been generated, it is possible to
generate a task graph in the DOT format by calling:
@example
$ starpu_fxt_tool -i filename
@end example

This will create a @code{dag.dot} file in the current directory. This file is a
task graph described using the DOT language. It is possible to get a
graphical output of the graph by using the graphviz library:
@example
$ dot -Tpdf dag.dot -o output.pdf
@end example
@node starpu-top
@subsection Monitoring activity

Once the FxT trace file @code{filename} has been generated, it is possible to
generate an activity trace by calling:
@example
$ starpu_fxt_tool -i filename
@end example

This will create an @code{activity.data} file in the current
directory. A profile of the application showing the activity of StarPU
during the execution of the program can then be generated:
@example
$ starpu_top activity.data
@end example
This will create a file named @code{activity.eps} in the current directory.
This picture is composed of two parts.

The first part shows the activity of the different workers. The green sections
indicate which proportion of the time was spent executing kernels on the
processing unit. The red sections indicate the proportion of time spent in
StarPU: a large overhead may indicate that the granularity is too fine, and
that bigger tasks may be appropriate to use the processing unit more
efficiently. The black sections indicate that the processing unit was blocked
because there was no task to process: this may indicate a lack of parallelism,
which may be alleviated by creating more tasks when possible.

The second part of the @code{activity.eps} picture is a graph showing the
evolution of the number of tasks available in the system during the execution.
Ready tasks are shown in black, and tasks that are submitted but not
schedulable yet are shown in grey.
@node Codelet performance
@section Performance of codelets

The performance model of codelets (described in @ref{Performance model example})
can be examined by using the @code{starpu_perfmodel_display} tool:

@example
$ starpu_perfmodel_display -l
file: <malloc_pinned.hannibal>
file: <starpu_slu_lu_model_21.hannibal>
file: <starpu_slu_lu_model_11.hannibal>
file: <starpu_slu_lu_model_22.hannibal>
file: <starpu_slu_lu_model_12.hannibal>
@end example

Here, the codelets of the lu example are available. We can examine the
performance of the 22 kernel:

@example
$ starpu_perfmodel_display -s starpu_slu_lu_model_22
performance model for cpu
# hash      size       mean          dev           n
57618ab0    19660800   2.851069e+05  1.829369e+04  109
performance model for cuda_0
# hash      size       mean          dev           n
57618ab0    19660800   1.164144e+04  1.556094e+01  315
performance model for cuda_1
# hash      size       mean          dev           n
57618ab0    19660800   1.164271e+04  1.330628e+01  360
performance model for cuda_2
# hash      size       mean          dev           n
57618ab0    19660800   1.166730e+04  3.390395e+02  456
@end example

We can see that for the given size, over a sample of a few hundred
executions, the GPUs are about 20 times faster than the CPUs (times are in
microseconds). The standard deviation is extremely low for the GPUs, and less
than 10% for the CPUs.
The @code{starpu_regression_display} tool does the same for regression-based
performance models. It also writes a @code{.gp} file in the current directory,
to be run with the @code{gnuplot} tool, which shows the corresponding curve.