
@c -*-texinfo-*-
@c This file is part of the StarPU Handbook.
@c Copyright (C) 2009--2011 Universit@'e de Bordeaux 1
@c Copyright (C) 2010, 2011, 2012, 2013 Centre National de la Recherche Scientifique
@c Copyright (C) 2011, 2012 Institut National de Recherche en Informatique et Automatique
@c See the file starpu.texi for copying conditions.

@menu
* Task debugger:: Using the Temanejo task debugger
* On-line:: On-line performance feedback
* Off-line:: Off-line performance feedback
* Codelet performance:: Performance of codelets
* Theoretical lower bound on execution time API::
* Memory feedback::
* Data statistics::
@end menu
@node Task debugger
@section Using the Temanejo task debugger

StarPU can connect to Temanejo (see
@url{http://www.hlrs.de/temanejo}) to permit
nice visual task debugging. To do so, build Temanejo's @code{libayudame.so},
install @code{Ayudame.h} to e.g. @code{/usr/local/include}, apply the
@code{tools/patch-ayudame} patch to it to fix the C build, re-run
@code{./configure}, make sure that it finds it, and rebuild StarPU. Run the
Temanejo GUI, give it the path to your application, any options you want to
pass it, and the path to @code{libayudame.so}.

Make sure to specify at least as many CPUs in the dialog box as your machine
has, otherwise an error will occur during execution. Future versions of
Temanejo should be able to tell StarPU the number of CPUs to use.

Tag numbers have to be below @code{4000000000000000000ULL} to be usable with
Temanejo (so as to distinguish them from tasks).
@node On-line
@section On-line performance feedback

@menu
* Enabling on-line performance monitoring::
* Task feedback:: Per-task feedback
* Codelet feedback:: Per-codelet feedback
* Worker feedback:: Per-worker feedback
* Bus feedback:: Bus-related feedback
* StarPU-Top:: StarPU-Top interface
@end menu

@node Enabling on-line performance monitoring
@subsection Enabling on-line performance monitoring
In order to enable on-line performance monitoring, the application can call
@code{starpu_profiling_status_set(STARPU_PROFILING_ENABLE)}. It is possible to
detect whether monitoring is already enabled or not by calling
@code{starpu_profiling_status_get()}. Enabling monitoring also reinitializes
all previously collected feedback. The @code{STARPU_PROFILING} environment
variable can also be set to 1 to achieve the same effect.

Likewise, performance monitoring is stopped by calling
@code{starpu_profiling_status_set(STARPU_PROFILING_DISABLE)}. Note that this
does not reset the performance counters, so that the application may still
consult them later on.

More details about the performance monitoring API are available in section
@ref{Profiling API}.
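As a minimal sketch of the above (assuming StarPU has already been
initialized, and with @code{do_work} standing for the application's own
task-submitting code), monitoring can be enabled around the region of
interest:

@cartouche
@smallexample
starpu_profiling_status_set(STARPU_PROFILING_ENABLE);

do_work(); /* submit tasks and wait for them as usual */

/* the counters are kept after disabling, so they can
   still be consulted afterwards */
starpu_profiling_status_set(STARPU_PROFILING_DISABLE);
@end smallexample
@end cartouche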
@node Task feedback
@subsection Per-task feedback

If profiling is enabled, a pointer to a @code{starpu_task_profiling_info}
structure is put in the @code{.profiling_info} field of the @code{starpu_task}
structure when a task terminates.
This structure is automatically destroyed when the task structure is destroyed,
either automatically or by calling @code{starpu_task_destroy}.

The @code{starpu_task_profiling_info} structure indicates the date when the
task was submitted (@code{submit_time}), started (@code{start_time}), and
terminated (@code{end_time}), relative to the initialization of
StarPU with @code{starpu_init}. It also specifies the identifier of the worker
that has executed the task (@code{workerid}).
These dates are stored as @code{timespec} structures, which the user may
convert into microseconds using the @code{starpu_timing_timespec_to_us} helper
function.

It is worth noting that the application may directly access this structure from
the callback executed at the end of the task. The @code{starpu_task} structure
associated with the callback currently being executed is indeed accessible with
the @code{starpu_task_get_current()} function.
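For instance (a sketch; error handling omitted), a termination callback can
retrieve the current task and print its execution time and worker:

@cartouche
@smallexample
void terminate_callback(void *arg)
@{
        struct starpu_task *task = starpu_task_get_current();
        struct starpu_task_profiling_info *info = task->profiling_info;

        /* info is NULL when profiling was not enabled */
        if (info)
        @{
                double start = starpu_timing_timespec_to_us(&info->start_time);
                double end = starpu_timing_timespec_to_us(&info->end_time);
                fprintf(stderr, "task took %f us on worker %d\n",
                        end - start, info->workerid);
        @}
@}
@end smallexample
@end cartouche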
@node Codelet feedback
@subsection Per-codelet feedback

The @code{per_worker_stats} field of the @code{struct starpu_codelet} structure
is an array of counters. The i-th entry of the array is incremented every time
a task implementing the codelet is executed on the i-th worker.
This array is not reinitialized when profiling is enabled or disabled.
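For instance, at the end of the execution the counters of a codelet
@code{cl} can be dumped as follows (a sketch):

@cartouche
@smallexample
unsigned worker;
for (worker = 0; worker < starpu_worker_get_count(); worker++)
        fprintf(stderr, "codelet ran %lu times on worker %u\n",
                cl.per_worker_stats[worker], worker);
@end smallexample
@end cartouche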
@node Worker feedback
@subsection Per-worker feedback

The second argument returned by the @code{starpu_worker_get_profiling_info}
function is a @code{starpu_worker_profiling_info} structure that gives
statistics about the specified worker. This structure specifies when StarPU
started collecting profiling information for that worker (@code{start_time}),
the duration of the profiling measurement interval (@code{total_time}), the
time spent executing kernels (@code{executing_time}), the time spent sleeping
because there is no task to execute at all (@code{sleeping_time}), and the
number of tasks that were executed while profiling was enabled.
These values give an estimation of the proportion of time spent doing real
work, and of the time either spent sleeping because there are not enough
executable tasks or simply wasted in pure StarPU overhead.

Calling @code{starpu_worker_get_profiling_info} resets the profiling
information associated with the worker.

When an FxT trace is generated (see @ref{Generating traces}), it is also
possible to use the @code{starpu_workers_activity} script (described in
@ref{starpu-workers-activity}) to generate a graphic showing the evolution of
these values over time, for the different workers.
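As an illustration (a sketch; note that, as mentioned above, the call also
resets the counters), the proportions of time spent executing and sleeping
can be computed from these fields:

@cartouche
@smallexample
int worker;
for (worker = 0; worker < (int) starpu_worker_get_count(); worker++)
@{
        struct starpu_worker_profiling_info info;
        starpu_worker_get_profiling_info(worker, &info);

        double total = starpu_timing_timespec_to_us(&info.total_time);
        double executing = starpu_timing_timespec_to_us(&info.executing_time);
        double sleeping = starpu_timing_timespec_to_us(&info.sleeping_time);
        fprintf(stderr, "worker %d: %.1f%% executing, %.1f%% sleeping\n",
                worker, 100 * executing / total, 100 * sleeping / total);
@}
@end smallexample
@end cartouche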
@node Bus feedback
@subsection Bus-related feedback

TODO: add STARPU_BUS_STATS
@c how to enable/disable performance monitoring
@c what kind of information do we get ?

The bus speeds measured by StarPU can be displayed by using the
@code{starpu_machine_display} tool, for instance:

@example
StarPU has found:
        3 CUDA devices
                CUDA 0 (Tesla C2050 02:00.0)
                CUDA 1 (Tesla C2050 03:00.0)
                CUDA 2 (Tesla C2050 84:00.0)
from    to RAM          to CUDA 0       to CUDA 1       to CUDA 2
RAM     0.000000        5176.530428     5176.492994     5191.710722
CUDA 0  4523.732446     0.000000        2414.074751     2417.379201
CUDA 1  4523.718152     2414.078822     0.000000        2417.375119
CUDA 2  4534.229519     2417.069025     2417.060863     0.000000
@end example
@node StarPU-Top
@subsection StarPU-Top interface

StarPU-Top is an interface which remotely displays the on-line state of a
StarPU application and permits the user to change parameters on the fly.

Variables to be monitored can be registered by calling the
@code{starpu_top_add_data_boolean}, @code{starpu_top_add_data_integer},
@code{starpu_top_add_data_float} functions, e.g.:

@cartouche
@smallexample
starpu_top_data *data = starpu_top_add_data_integer("mynum", 0, 100, 1);
@end smallexample
@end cartouche

The application should then call @code{starpu_top_init_and_wait} to give its
name and wait for StarPU-Top to get a start request from the user. The name is
used by StarPU-Top to quickly reload a previously-saved layout of parameter
display.

@cartouche
@smallexample
starpu_top_init_and_wait("the application");
@end smallexample
@end cartouche

The new values can then be provided thanks to
@code{starpu_top_update_data_boolean}, @code{starpu_top_update_data_integer},
@code{starpu_top_update_data_float}, e.g.:

@cartouche
@smallexample
starpu_top_update_data_integer(data, mynum);
@end smallexample
@end cartouche

Updatable parameters can be registered thanks to
@code{starpu_top_register_parameter_boolean},
@code{starpu_top_register_parameter_integer},
@code{starpu_top_register_parameter_float}, e.g.:

@cartouche
@smallexample
float alpha;
starpu_top_register_parameter_float("alpha", &alpha, 0, 10, modif_hook);
@end smallexample
@end cartouche

@code{modif_hook} is a function which will be called when the parameter is
modified; it can for instance print the new value:

@cartouche
@smallexample
void modif_hook(struct starpu_top_param *d)
@{
        fprintf(stderr, "%s has been modified: %f\n", d->name, alpha);
@}
@end smallexample
@end cartouche

Task schedulers should notify StarPU-Top when they have decided when and where
a task will be scheduled, so that it can show it in its Gantt chart, for
instance:

@cartouche
@smallexample
starpu_top_task_prevision(task, workerid, begin, end);
@end smallexample
@end cartouche
Starting StarPU-Top@footnote{StarPU-Top is started via the binary
@code{starpu_top}.} and the application can be done in two ways:

@itemize
@item The application is started by hand on some machine (and thus already
waiting for the start event). In the Preference dialog of StarPU-Top, the SSH
checkbox should be unchecked, and the hostname and port (default is 2011) on
which the application is already running should be specified. Clicking on the
connection button will thus connect to the already-running application.
@item StarPU-Top is started first, and clicking on the connection button will
start the application itself (possibly on a remote machine). The SSH checkbox
should be checked, and a command line provided, e.g.:

@example
$ ssh myserver STARPU_SCHED=dmda ./application
@end example

If port 2011 of the remote machine cannot be accessed directly, an SSH port
forwarding should be added:

@example
$ ssh -L 2011:localhost:2011 myserver STARPU_SCHED=dmda ./application
@end example

and "localhost" should be used as the IP address to connect to.
@end itemize
@node Off-line
@section Off-line performance feedback

@menu
* Generating traces:: Generating traces with FxT
* Gantt diagram:: Creating a Gantt Diagram
* DAG:: Creating a DAG with graphviz
* starpu-workers-activity:: Monitoring activity
@end menu
@node Generating traces
@subsection Generating traces with FxT

StarPU can use the FxT library (see
@url{https://savannah.nongnu.org/projects/fkt/}) to generate traces
with a limited runtime overhead.

You can either get a tarball:
@example
$ wget http://download.savannah.gnu.org/releases/fkt/fxt-0.2.11.tar.gz
@end example

or use the FxT library from CVS (autotools are required):
@example
$ cvs -d :pserver:anonymous@@cvs.sv.gnu.org:/sources/fkt co FxT
$ ./bootstrap
@end example

Compiling and installing the FxT library in the @code{$FXTDIR} path is
done following the standard procedure:
@example
$ ./configure --prefix=$FXTDIR
$ make
$ make install
@end example

In order to have StarPU generate traces, StarPU should be configured with
the @code{--with-fxt} option:
@example
$ ./configure --with-fxt=$FXTDIR
@end example

Or you can simply point @code{PKG_CONFIG_PATH} to
@code{$FXTDIR/lib/pkgconfig} and pass @code{--with-fxt} to @code{./configure}.

When FxT is enabled, a trace is generated when StarPU is terminated by calling
@code{starpu_shutdown()}. The trace is a binary file whose name has the form
@code{prof_file_XXX_YYY}, where @code{XXX} is the user name and
@code{YYY} is the pid of the process that used StarPU. This file is saved in
the @code{/tmp/} directory by default, or in the directory specified by
the @code{STARPU_FXT_PREFIX} environment variable.
@node Gantt diagram
@subsection Creating a Gantt Diagram

When the FxT trace file @code{filename} has been generated, it is possible to
generate a trace in the Paje format by calling:
@example
$ starpu_fxt_tool -i filename
@end example

Alternatively, setting the @code{STARPU_GENERATE_TRACE} environment variable
to @code{1} before application execution will make StarPU do it automatically
at application shutdown.

This will create a @code{paje.trace} file in the current directory that
can be inspected with the @url{http://vite.gforge.inria.fr/, ViTE trace
visualizing open-source tool}. It is possible to open the
@code{paje.trace} file with ViTE by using the following command:
@example
$ vite paje.trace
@end example

To get names of tasks instead of "unknown", fill the optional @code{name} field
of the codelets, or use a performance model for them.

In the MPI execution case, collect the trace files from the MPI nodes, and
specify them all on the @code{starpu_fxt_tool} command line, for instance:
@smallexample
$ starpu_fxt_tool -i filename1 -i filename2
@end smallexample

By default, all tasks are displayed using a green color. To display tasks with
varying colors, pass option @code{-c} to @code{starpu_fxt_tool}.
@node DAG
@subsection Creating a DAG with graphviz

When the FxT trace file @code{filename} has been generated, it is possible to
generate a task graph in the DOT format by calling:
@example
$ starpu_fxt_tool -i filename
@end example

This will create a @code{dag.dot} file in the current directory. This file is a
task graph described using the DOT language. It is possible to get a
graphical output of the graph by using the graphviz library:
@example
$ dot -Tpdf dag.dot -o output.pdf
@end example
@node starpu-workers-activity
@subsection Monitoring activity

When the FxT trace file @code{filename} has been generated, it is possible to
generate an activity trace by calling:
@example
$ starpu_fxt_tool -i filename
@end example

This will create an @code{activity.data} file in the current
directory. A profile of the application showing the activity of StarPU
during the execution of the program can be generated:
@example
$ starpu_workers_activity activity.data
@end example

This will create a file named @code{activity.eps} in the current directory.
This picture is composed of two parts.
The first part shows the activity of the different workers. The green sections
indicate which proportion of the time was spent executing kernels on the
processing unit. The red sections indicate the proportion of time spent in
StarPU: an important overhead may indicate that the granularity is too
low, and that bigger tasks may be needed to use the processing units more
efficiently. The black sections indicate that the processing unit was blocked
because there was no task to process: this may indicate a lack of parallelism,
which may be alleviated by creating more tasks when possible.

The second part of the @code{activity.eps} picture is a graph showing the
evolution of the number of tasks available in the system during the execution.
Ready tasks are shown in black, and tasks that are submitted but not
schedulable yet are shown in grey.
@node Codelet performance
@section Performance of codelets

The performance model of codelets (described in @ref{Performance model
example}) can be examined by using the @code{starpu_perfmodel_display} tool:

@example
$ starpu_perfmodel_display -l
file: <malloc_pinned.hannibal>
file: <starpu_slu_lu_model_21.hannibal>
file: <starpu_slu_lu_model_11.hannibal>
file: <starpu_slu_lu_model_22.hannibal>
file: <starpu_slu_lu_model_12.hannibal>
@end example

Here, the codelets of the lu example are available. We can examine the
performance of the 22 kernel (in microseconds), which is history-based:

@example
$ starpu_perfmodel_display -s starpu_slu_lu_model_22
performance model for cpu
# hash      size       mean          dev           n
57618ab0    19660800   2.851069e+05  1.829369e+04  109
performance model for cuda_0
# hash      size       mean          dev           n
57618ab0    19660800   1.164144e+04  1.556094e+01  315
performance model for cuda_1
# hash      size       mean          dev           n
57618ab0    19660800   1.164271e+04  1.330628e+01  360
performance model for cuda_2
# hash      size       mean          dev           n
57618ab0    19660800   1.166730e+04  3.390395e+02  456
@end example

We can see that for the given size, over a sample of a few hundred executions,
the GPUs are about 20 times faster than the CPUs (numbers are in us). The
standard deviation is extremely low for the GPUs, and less than 10% for the
CPUs.
This tool can also be used for regression-based performance models. It will
then display the regression formula, and in the case of non-linear regression,
the same performance log as for history-based performance models:

@example
$ starpu_perfmodel_display -s non_linear_memset_regression_based
performance model for cpu_impl_0
    Regression : #sample = 1400
    Linear: y = alpha size ^ beta
        alpha = 1.335973e-03
        beta = 8.024020e-01
    Non-Linear: y = a size ^b + c
        a = 5.429195e-04
        b = 8.654899e-01
        c = 9.009313e-01
# hash      size    mean          stddev        n
a3d3725e    4096    4.763200e+00  7.650928e-01  100
870a30aa    8192    1.827970e+00  2.037181e-01  100
48e988e9    16384   2.652800e+00  1.876459e-01  100
961e65d2    32768   4.255530e+00  3.518025e-01  100
...
@end example

The same can also be achieved by using StarPU's library API, see
@ref{Performance Model API} and notably the @code{starpu_perfmodel_load_symbol}
function. The source code of the @code{starpu_perfmodel_display} tool can be a
useful example.
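As a sketch of this library API route, a model can be loaded by its symbol
name (here the one from the example above) and then examined
programmatically:

@cartouche
@smallexample
struct starpu_perfmodel model;
int ret = starpu_perfmodel_load_symbol("starpu_slu_lu_model_22", &model);
/* on success, the fields of model can be inspected, in the same
   way as the starpu_perfmodel_display source code does */
@end smallexample
@end cartouche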
The @code{starpu_perfmodel_plot} tool can be used to draw performance models.
It writes a @code{.gp} file in the current directory, to be run with the
@code{gnuplot} tool, which shows the corresponding curve.

When the @code{flops} field of tasks is set, @code{starpu_perfmodel_plot} can
directly draw a GFlops curve, by simply adding the @code{-f} option:

@example
$ starpu_perfmodel_plot -f -s chol_model_11
@end example

This will however disable displaying the regression model, for which we cannot
compute GFlops.

When the FxT trace file @code{filename} has been generated, it is possible to
get a profiling of each codelet by calling:
@example
$ starpu_fxt_tool -i filename
$ starpu_codelet_profile distrib.data codelet_name
@end example

This will create profiling data files, and a @code{.gp} file in the current
directory, which draws the distribution of codelet time over the application
execution, according to data input size.

This is also available in the @code{starpu_perfmodel_plot} tool, by passing it
the FxT trace:
@example
$ starpu_perfmodel_plot -s non_linear_memset_regression_based -i /tmp/prof_file_foo_0
@end example

It will produce a @code{.gp} file which contains both the performance model
curves and the profiling measurements.

If you have the R statistical tool installed, you can additionally use
@example
$ starpu_codelet_histo_profile distrib.data
@end example

This will create one PDF file per codelet and per input size, showing a
histogram of the codelet execution time distribution.
@node Theoretical lower bound on execution time API
@section Theoretical lower bound on execution time

See @ref{Theoretical lower bound on execution time} for an example on how to
use this API. It permits recording a trace of which tasks are needed to
complete the application, and then, by solving a linear system, provides a
theoretical lower bound on the execution time (i.e. with an ideal scheduling).

The computed bound is not really correct when not taking into account
dependencies, but for an application which has enough parallelism it is very
close to the bound computed with dependencies enabled (which takes a lot more
time to compute), and thus provides a good-enough estimation of the ideal
execution time.

@deftypefun void starpu_bound_start (int @var{deps}, int @var{prio})
Start recording tasks (resets stats). @var{deps} tells whether
dependencies should be recorded too (this is quite expensive).
@end deftypefun

@deftypefun void starpu_bound_stop (void)
Stop recording tasks.
@end deftypefun

@deftypefun void starpu_bound_print_dot ({FILE *}@var{output})
Print the DAG that was recorded.
@end deftypefun

@deftypefun void starpu_bound_compute ({double *}@var{res}, {double *}@var{integer_res}, int @var{integer})
Get the theoretical bound (in ms) (needs glpk support, detected by the
@code{configure} script). It returns 0 if some performance models are not
calibrated.
@end deftypefun

@deftypefun void starpu_bound_print_lp ({FILE *}@var{output})
Emit the Linear Programming system on @var{output} for the recorded tasks, in
the lp format.
@end deftypefun

@deftypefun void starpu_bound_print_mps ({FILE *}@var{output})
Emit the Linear Programming system on @var{output} for the recorded tasks, in
the mps format.
@end deftypefun

@deftypefun void starpu_bound_print ({FILE *}@var{output}, int @var{integer})
Emit statistics of actual execution vs the theoretical bound. @var{integer}
permits to choose between integer solving (which takes a long time but is
correct), and relaxed solving (which provides an approximate solution).
@end deftypefun
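Putting these functions together, a typical use is to bracket the
task-submitting part of the application and then compare the measured time
with the bound (a sketch, with @code{do_work} standing for the application's
own code, and without recording dependencies since that is expensive):

@cartouche
@smallexample
double min_time, min_time_integer;

starpu_bound_start(0, 0);  /* no dependencies, no priorities */
do_work();                 /* submit the tasks and wait for them */
starpu_bound_stop();

/* relaxed (non-integer) solving; compare min_time with the
   measured execution time */
starpu_bound_compute(&min_time, &min_time_integer, 0);
fprintf(stderr, "theoretical bound: %f ms\n", min_time);
@end smallexample
@end cartouche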
@node Memory feedback
@section Memory feedback

It is possible to enable memory statistics. To do so, you need to pass the
option @code{--enable-memory-stats} when running @code{configure}. It is then
possible to call the function @code{starpu_display_memory_stats()} to
display statistics about the current data handles registered within StarPU.

Moreover, statistics will be displayed at the end of the execution on
data handles which have not been cleared out. This can be disabled by
setting the environment variable @code{STARPU_MEMORY_STATS} to 0.

For example, if you do not unregister data at the end of the complex
example, you will get something similar to:

@example
$ STARPU_MEMORY_STATS=0 ./examples/interface/complex
Complex[0] = 45.00 + 12.00 i
Complex[0] = 78.00 + 78.00 i
Complex[0] = 45.00 + 12.00 i
Complex[0] = 45.00 + 12.00 i
@end example
@example
$ STARPU_MEMORY_STATS=1 ./examples/interface/complex
Complex[0] = 45.00 + 12.00 i
Complex[0] = 78.00 + 78.00 i
Complex[0] = 45.00 + 12.00 i
Complex[0] = 45.00 + 12.00 i

#---------------------
Memory stats:
#-------
Data on Node #3
#-----
Data : 0x553ff40
Size : 16

#--
Data access stats
/!\ Work Underway
Node #0
        Direct access : 4
        Loaded (Owner) : 0
        Loaded (Shared) : 0
        Invalidated (was Owner) : 0

Node #3
        Direct access : 0
        Loaded (Owner) : 0
        Loaded (Shared) : 1
        Invalidated (was Owner) : 0

#-----
Data : 0x5544710
Size : 16

#--
Data access stats
/!\ Work Underway
Node #0
        Direct access : 2
        Loaded (Owner) : 0
        Loaded (Shared) : 1
        Invalidated (was Owner) : 1

Node #3
        Direct access : 0
        Loaded (Owner) : 1
        Loaded (Shared) : 0
        Invalidated (was Owner) : 0
@end example
@node Data statistics
@section Data statistics

Different data statistics can be displayed at the end of the execution
of the application. To enable them, you need to pass the option
@code{--enable-stats} when calling @code{configure}. When calling
@code{starpu_shutdown()}, various statistics will be displayed:
execution statistics, MSI cache statistics, allocation cache statistics, and
data transfer statistics. The display can be disabled by setting the
environment variable @code{STARPU_STATS} to 0.

@example
$ ./examples/cholesky/cholesky_tag
Computation took (in ms)
518.16
Synthetic GFlops : 44.21
#---------------------
MSI cache stats :
TOTAL MSI stats   hit 1622 (66.23 %)   miss 827 (33.77 %)
...
@end example

@example
$ STARPU_STATS=0 ./examples/cholesky/cholesky_tag
Computation took (in ms)
518.16
Synthetic GFlops : 44.21
@end example

@c TODO: data transfer stats are similar to the ones displayed when
@c setting STARPU_BUS_STATS