task->destroy = 0;
starpu_task_submit(task);

/* ... wait for the task to complete ... */
struct starpu_profiling_task_info *info = task->profiling_info;

/* How much time did it take before the task started? */
double delay = starpu_timing_timespec_delay_us(&info->submit_time, &info->start_time);
/* How long was the task execution? */
double length = starpu_timing_timespec_delay_us(&info->start_time, &info->end_time);

starpu_task_destroy(task);
\endcode

\code{.c}
int worker;
for (worker = 0; worker < starpu_worker_get_count(); worker++)
{
	struct starpu_profiling_worker_info worker_info;
	int ret = starpu_profiling_worker_get_info(worker, &worker_info);
	STARPU_ASSERT(!ret);

	double total_time = starpu_timing_timespec_to_us(&worker_info.total_time);
	double executing_time = starpu_timing_timespec_to_us(&worker_info.executing_time);
	double sleeping_time = starpu_timing_timespec_to_us(&worker_info.sleeping_time);
	double overhead_time = total_time - executing_time - sleeping_time;

	float executing_ratio = 100.0*executing_time/total_time;
	float sleeping_ratio = 100.0*sleeping_time/total_time;
	float overhead_ratio = 100.0 - executing_ratio - sleeping_ratio;

	char workername[128];
	starpu_worker_get_name(worker, workername, 128);

	fprintf(stderr, "Worker %s:\n", workername);
	fprintf(stderr, "\ttotal time: %.2lf ms\n", total_time*1e-3);
	fprintf(stderr, "\texec time: %.2lf ms (%.2f %%)\n", executing_time*1e-3, executing_ratio);
	fprintf(stderr, "\tblocked time: %.2lf ms (%.2f %%)\n", sleeping_time*1e-3, sleeping_ratio);
	fprintf(stderr, "\toverhead time: %.2lf ms (%.2f %%)\n", overhead_time*1e-3, overhead_ratio);
}
\endcode

\section PerformanceModelExample Performance Model Example

To achieve good scheduling, StarPU scheduling policies need to be able to
estimate in advance the duration of a task. This is done by giving to codelets
a performance model: define a structure starpu_perfmodel and
provide its address in the field starpu_codelet::model. The fields
starpu_perfmodel::symbol and starpu_perfmodel::type are mandatory; they
give the model a name and specify its type, since there are
several kinds of performance models. For compatibility, make sure to
initialize the whole structure to zero, either by using an explicit
memset(), or by letting the compiler implicitly do it as exemplified
below.

<ul>
<li>
Measured at runtime (model type ::STARPU_HISTORY_BASED). This assumes that for a
given set of data input/output sizes, the performance will always be about the
same. This holds very well for regular kernels on GPUs for instance (below 0.1%
error), and a bit less so on CPUs (around 1% error). It also assumes that there are
few different sets of data input/output sizes. StarPU will then keep record of
the average time of previous executions on the various processing units, and use
it as an estimation. History is kept per task size, by using a hash of the input
and output sizes as an index.
StarPU will also save the history in <c>$STARPU_HOME/.starpu/sampling/codelets</c>
for further executions, and it can be observed by using the tool
<c>starpu_perfmodel_display</c>, or drawn by using
the tool <c>starpu_perfmodel_plot</c> (\ref PerformanceModelCalibration). The
models are indexed by machine name. To
share the models between machines (e.g. for a homogeneous cluster), use
<c>export STARPU_HOSTNAME=some_global_name</c>. Measurements are only done
when using a task scheduler which makes use of performance models, such as
<c>dmda</c>. Measurements can also be provided explicitly by the application, by
using the function starpu_perfmodel_update_history().

The following is a small code example.
If e.g. the code is recompiled with other compilation options, or several
variants of the code are used, the <c>symbol</c> string should be changed to reflect
that, in order to recalibrate a new model from zero. The <c>symbol</c> string can even
be constructed dynamically at execution time, as long as this is done before
submitting any task using it.

\code{.c}
static struct starpu_perfmodel mult_perf_model =
{
	.type = STARPU_HISTORY_BASED,
	.symbol = "mult_perf_model"
};

struct starpu_codelet cl =
{
	.cpu_funcs = { cpu_mult },
	.cpu_funcs_name = { "cpu_mult" },
	.nbuffers = 3,
	.modes = { STARPU_R, STARPU_R, STARPU_W },

	/* for the scheduling policy to be able to use performance models */
	.model = &mult_perf_model
};
\endcode
</li>

<li>
Measured at runtime and refined by regression (model types
::STARPU_REGRESSION_BASED and ::STARPU_NL_REGRESSION_BASED). This
still assumes performance regularity, but works
with various data input sizes, by applying regression over the observed
execution times. ::STARPU_REGRESSION_BASED uses an <c>a*n^b</c> regression
form, ::STARPU_NL_REGRESSION_BASED uses an <c>a*n^b+c</c> form (more precise than
::STARPU_REGRESSION_BASED, but a lot more costly to compute).

For instance,
<c>tests/perfmodels/regression_based.c</c> uses a regression-based performance
model for the function memset().

Of course, the application has to issue
tasks with varying sizes so that the regression can be computed. StarPU will not
trust the regression unless there is at least a 10% difference between the minimum
and maximum observed input sizes. It can be useful to set the
environment variable \ref STARPU_CALIBRATE to <c>1</c> and run the application
on varying input sizes with \ref STARPU_SCHED set to the <c>dmda</c> scheduler,
so as to feed the performance model for a variety of
inputs. The application can also provide the measurements explicitly by
using the function starpu_perfmodel_update_history(). The tools
<c>starpu_perfmodel_display</c> and <c>starpu_perfmodel_plot</c> can
be used to observe how well the performance model is calibrated
(\ref PerformanceModelCalibration); when their output looks good,
\ref STARPU_CALIBRATE can be reset to <c>0</c> to let
StarPU use the resulting performance model without recording new measures, and
\ref STARPU_SCHED can be set to <c>dmda</c> to benefit from the performance models. If
the data input sizes vary a lot, it is really important to set
\ref STARPU_CALIBRATE back to <c>0</c>, otherwise StarPU will keep adding
measures, and end up with a very big performance model, which will take a
lot of time to load and save.

For non-linear regression, since computing it
is quite expensive, it is only done at termination of the application. This
means that the first execution of the application will use only the history-based
performance model to perform scheduling, without using regression. A minimal
model declaration for this type is sketched after this list.
</li>

<li>
Another type of model is ::STARPU_MULTIPLE_REGRESSION_BASED, which
is based on multiple linear regression. In this model, the user
defines both the relevant parameters and the equation for computing the
task duration.

\f[
T_{kernel} = a + b(M^{\alpha_1} * N^{\beta_1} * K^{\gamma_1}) + c(M^{\alpha_2} * N^{\beta_2} * K^{\gamma_2}) + ...
\f]

\f$M, N, K\f$ are the parameters of the task, added at task
creation. They need to be extracted by the <c>cl_perf_func</c>
function, which should be defined by the user. \f$\alpha, \beta,
\gamma\f$ are the exponents defined by the user in the
<c>model->combinations</c> table. Finally, the coefficients \f$a, b, c\f$
are computed automatically by StarPU at the end of the execution, using the
least-squares method of the <c>dgels_</c> LAPACK function.

The <c>examples/mlr/mlr.c</c> example provides more details on
the usage of ::STARPU_MULTIPLE_REGRESSION_BASED models; a declaration sketch
is also given after this list.

The coefficients computation is done at the end of the execution, and the
results are stored in standard codelet perfmodel files. Additional
files containing the duration of each task together with the value of each
parameter are stored in the <c>.starpu/sampling/codelets/tmp/</c>
directory. These files are reused when the \ref STARPU_CALIBRATE
environment variable is set to <c>1</c>, to recompute the coefficients
based on the current execution, but also on the previous
ones. Additionally, when multiple linear regression models are
disabled (using the \ref disable-mlr "--disable-mlr" configure option) or when the
<c>model->combinations</c> are not defined, StarPU will still write
output files into <c>.starpu/sampling/codelets/tmp/</c> to allow
performing an analysis. This analysis typically aims at finding the
most appropriate equation for the codelet, and the
<c>tools/starpu_mlr_analysis</c> script provides an example of how to
perform such a study.
</li>

<li>
Provided as an estimation from the application itself (model type
::STARPU_COMMON and field starpu_perfmodel::cost_function),
see for instance
<c>examples/common/blas_model.h</c> and <c>examples/common/blas_model.c</c>,
or the sketch after this list.
</li>

<li>
Provided explicitly by the application (model type ::STARPU_PER_ARCH):
either the field starpu_perfmodel::arch_cost_function, or
the fields <c>.per_arch[arch][nimpl].cost_function</c> have to be
filled with pointers to functions which return the expected duration
of the task in microseconds, one per architecture, see for instance
<c>tests/datawizard/locality.c</c>.
</li>
</ul>
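
As referenced in the list above, here is a minimal sketch of a regression-based
model declaration. Only the model type differs from the history-based example;
the codelet and the <c>cpu_vector_scal</c> kernel name are hypothetical.

\code{.c}
/* Hypothetical kernel scaling a vector; by default the total data size
 * is used as the regression input. */
static struct starpu_perfmodel vector_scal_perf_model =
{
	.type = STARPU_NL_REGRESSION_BASED, /* or STARPU_REGRESSION_BASED for a*n^b */
	.symbol = "vector_scal_perf_model"
};

struct starpu_codelet vector_scal_cl =
{
	.cpu_funcs = { cpu_vector_scal },
	.nbuffers = 1,
	.modes = { STARPU_RW },
	.model = &vector_scal_perf_model
};
\endcode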
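
The following is a declaration sketch for a ::STARPU_MULTIPLE_REGRESSION_BASED
model, loosely following <c>examples/mlr/mlr.c</c>. Here <c>cl_params()</c> plays
the role of the parameter-extraction function mentioned above, and the
<c>struct params</c> passed through <c>cl_arg</c> as well as the chosen exponent
combinations are assumptions of the example.

\code{.c}
/* Hypothetical parameters passed to the task through cl_arg */
struct params { double m, n, k; };

static const char *parameters_names[] = { "M", "N", "K" };

/* Extract the M, N, K parameters from the task (user-defined) */
static void cl_params(struct starpu_task *task, double *parameters)
{
	struct params *p = (struct params *) task->cl_arg;
	parameters[0] = p->m;
	parameters[1] = p->n;
	parameters[2] = p->k;
}

/* Exponents of each term: here T = a + b*M^2*N + c*N^3*K (assumed equation) */
static unsigned combi1[3] = { 2, 1, 0 };
static unsigned combi2[3] = { 0, 3, 1 };
static unsigned *combinations[] = { combi1, combi2 };

static struct starpu_perfmodel mlr_perf_model =
{
	.type = STARPU_MULTIPLE_REGRESSION_BASED,
	.symbol = "mlr_perf_model",
	.parameters = cl_params,
	.nparameters = 3,
	.parameters_names = parameters_names,
	.combinations = combinations,
	.ncombinations = 2
};
\endcode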
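
And a sketch of an application-provided cost function for a ::STARPU_COMMON
model; the cubic complexity and its constant are made up for the example. A
::STARPU_PER_ARCH starpu_perfmodel::arch_cost_function is written similarly,
except that it additionally receives the target architecture.

\code{.c}
/* Return the expected duration in microseconds, here assuming the first
 * handle is an n x n matrix processed by an O(n^3) kernel (made-up constant). */
static double cubic_cost(struct starpu_task *task, unsigned nimpl)
{
	(void) nimpl;
	uint32_t n = starpu_matrix_get_nx(task->handles[0]);
	return 0.005 * (double) n * n * n;
}

static struct starpu_perfmodel common_perf_model =
{
	.type = STARPU_COMMON,
	.symbol = "common_perf_model",
	.cost_function = cubic_cost
};
\endcode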

For ::STARPU_HISTORY_BASED, ::STARPU_REGRESSION_BASED, and
::STARPU_NL_REGRESSION_BASED, the dimensions of task data (both input
and output) are used as an index by default. ::STARPU_HISTORY_BASED uses a CRC
hash of the dimensions as an index to distinguish histories, while
::STARPU_REGRESSION_BASED and ::STARPU_NL_REGRESSION_BASED use the total
size as an index for the regression.

The starpu_perfmodel::size_base and starpu_perfmodel::footprint fields however
permit the application to override that, when for instance some of the data
do not matter for the task cost (e.g. a mere reference table), or when using sparse
structures (in which case it is the number of non-zeros which matters), or when
there is some hidden parameter such as the number of iterations, or when the
application actually has a very good idea of the complexity of the algorithm,
just not the speed of the processor, etc. The example in the directory
<c>examples/pi</c> uses this to include the number of iterations in the base
size. starpu_perfmodel::size_base should be used when the way the actual
performance varies is known (i.e. a bigger returned value means a longer
execution time), and is thus particularly useful for ::STARPU_REGRESSION_BASED or
::STARPU_NL_REGRESSION_BASED. starpu_perfmodel::footprint can be used when the
variation of the actual performance is unknown (irregular performance behavior,
etc.), and is thus only useful for ::STARPU_HISTORY_BASED.
starpu_task_data_footprint() can be used as a base and combined with other
parameters through starpu_hash_crc32c_be() for instance.
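
For instance, a minimal sketch of such overrides; the number of iterations
passed through <c>cl_arg</c> is an assumption of this example:

\code{.c}
/* Base the regression on a number of iterations passed through cl_arg
 * rather than on the registered data sizes. */
static size_t size_base_nb_iterations(struct starpu_task *task, unsigned nimpl)
{
	(void) nimpl;
	unsigned long *nb_iterations = (unsigned long *) task->cl_arg;
	return *nb_iterations;
}

static struct starpu_perfmodel iter_perf_model =
{
	.type = STARPU_REGRESSION_BASED,
	.symbol = "iter_perf_model",
	.size_base = size_base_nb_iterations
};

/* Or, for a history-based model, combine such a hidden parameter into the
 * footprint hash. */
static uint32_t footprint_with_iterations(struct starpu_task *task)
{
	unsigned long *nb_iterations = (unsigned long *) task->cl_arg;
	return starpu_hash_crc32c_be((uint32_t) *nb_iterations,
	                             starpu_task_data_footprint(task));
}

static struct starpu_perfmodel hist_perf_model =
{
	.type = STARPU_HISTORY_BASED,
	.symbol = "hist_perf_model",
	.footprint = footprint_with_iterations
};
\endcode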

StarPU will automatically determine when the performance model is calibrated,
or rather, it will assume the performance model is calibrated until the
application submits a task for which the performance cannot be predicted. For
::STARPU_HISTORY_BASED, StarPU will require 10 (STARPU_CALIBRATE_MINIMUM)
measurements for a given size before estimating that an average can be taken as
an estimation for further executions with the same size. For
::STARPU_REGRESSION_BASED and ::STARPU_NL_REGRESSION_BASED, StarPU will require
10 (STARPU_CALIBRATE_MINIMUM) measurements, and that the minimum measured
data size is smaller than 90% of the maximum measured data size (i.e. the
measurement interval is large enough for the regression to be meaningful).
Calibration can also be forced by setting the \ref STARPU_CALIBRATE environment
variable to <c>1</c>, or even reset by setting it to <c>2</c>.
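
For instance, a calibration run could be launched as follows, <c>./my_app</c>
being a hypothetical application to be run with several input sizes:

\verbatim
$ STARPU_SCHED=dmda STARPU_CALIBRATE=1 ./my_app
\endverbatim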

How to use schedulers which can benefit from such performance models is explained
in \ref TaskSchedulingPolicy.

The same can be done for task energy consumption estimation, by setting
the field starpu_codelet::energy_model the same way as the field
starpu_codelet::model. Note: for now, the application has to give the
energy consumption performance model a name which is different from that of
the execution time performance model.
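
A sketch building on the multiplication example above; the energy model is
simply registered alongside the time model, under a distinct symbol:

\code{.c}
static struct starpu_perfmodel mult_energy_model =
{
	.type = STARPU_HISTORY_BASED,
	.symbol = "mult_energy_model" /* must differ from "mult_perf_model" */
};

struct starpu_codelet cl =
{
	.cpu_funcs = { cpu_mult },
	.nbuffers = 3,
	.modes = { STARPU_R, STARPU_R, STARPU_W },
	.model = &mult_perf_model,
	.energy_model = &mult_energy_model
};
\endcode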

The application can request time estimations from the StarPU performance
models by filling a task structure as usual, without actually submitting
it. The data handles can be created by calling any of the functions
<c>starpu_*_data_register</c> with a <c>NULL</c> pointer and <c>-1</c>
node and the desired data sizes, and need to be unregistered as usual.
The functions starpu_task_expected_length() and
starpu_task_expected_energy() can then be called to get an estimation
of the task cost on a given arch. starpu_task_footprint() can also be
used to get the footprint used for indexing history-based performance
models. starpu_task_destroy() needs to be called to destroy the dummy
task afterwards. See <c>tests/perfmodels/regression_based.c</c> for an example.
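
A minimal sketch of such a query, assuming a codelet <c>memset_cl</c> with a
calibrated model, and picking the architecture of worker 0 in the default
scheduling context:

\code{.c}
starpu_data_handle_t handle;
/* Register only the size: NULL pointer and -1 home node, no memory is allocated */
starpu_vector_data_register(&handle, -1, (uintptr_t) NULL, 1024*1024, sizeof(int));

struct starpu_task *task = starpu_task_create();
task->cl = &memset_cl;
task->handles[0] = handle;
task->destroy = 0;

struct starpu_perfmodel_arch *arch = starpu_worker_get_perf_archtype(0, 0);
double expected_us = starpu_task_expected_length(task, arch, 0);
fprintf(stderr, "expected length: %f us\n", expected_us);

starpu_task_destroy(task);
starpu_data_unregister(handle);
\endcode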

\section DataTrace Data trace and tasks length

It is possible to get statistics about task lengths and data sizes by using:

\verbatim
$ starpu_fxt_data_trace filename [codelet1 codelet2 ... codeletn]
\endverbatim

where <c>filename</c> is the FxT trace file and <c>codeletX</c> the names of the codelets you
want to profile (if no names are specified, <c>starpu_fxt_data_trace</c> will profile them all).
This will create a file <c>data_trace.gp</c> which
can be executed to get a <c>.eps</c> image of these results. On the image, each point represents a
task, and each color corresponds to a codelet.
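
Assuming gnuplot is available, the generated script can for instance be run as:

\verbatim
$ gnuplot data_trace.gp
\endverbatim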

\image html data_trace.png
\image latex data_trace.eps "" width=\textwidth

*/