- # StarPU --- Runtime system for heterogeneous multicore architectures.
- #
- # Copyright (C) 2016-2021 Université de Bordeaux, CNRS (LaBRI UMR 5800), Inria
- #
- # StarPU is free software; you can redistribute it and/or modify
- # it under the terms of the GNU Lesser General Public License as published by
- # the Free Software Foundation; either version 2.1 of the License, or (at
- # your option) any later version.
- #
- # StarPU is distributed in the hope that it will be useful, but
- # WITHOUT ANY WARRANTY; without even the implied warranty of
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
- #
- # See the GNU Lesser General Public License in COPYING.LGPL for more details.
- #
- ```{r Setup, echo=FALSE}
- opts_chunk$set(echo=FALSE)
- ```
- ```{r Load_R_files_and_functions}
- # Print C code declaring the parameter combinations of the fitted model
- # `reg` for the StarPU codelet named `codelet`.
- print_codelet <- function(reg, codelet) {
-   cat(paste("/* ############################################ */", "\n"))
-   # The C comment opened here is closed after the adjusted R-squared line
-   cat(paste("/*\t Automatically generated code", "\n"))
-   cat(paste("\t Check for potential errors and make sure the parameter values are written in the right order (alphabetical by default)", "\n"))
-   cat(paste("\t Adjusted R-squared: ", summary(reg)$adj.r.squared, "*/\n\n"))
-   # Number of fitted terms, not counting the intercept
-   ncomb <- reg$rank - 1
-   cat(paste("\t ", codelet, ".model->ncombinations = ", ncomb, ";\n", sep=""))
-   cat(paste("\t ", codelet, ".model->combinations = (unsigned **) malloc(", codelet, ".model->ncombinations*sizeof(unsigned *))", ";\n\n", sep=""))
-   cat(paste("\t if (", codelet, ".model->combinations)", "\n", "\t {\n", sep=""))
-   cat(paste("\t for (unsigned i = 0; i < ", codelet, ".model->ncombinations; i++)", "\n", "\t {\n", sep=""))
-   cat(paste("\t ", codelet, ".model->combinations[i] = (unsigned *) malloc(", codelet, ".model->nparameters*sizeof(unsigned))", ";\n", "\t }\n", "\t }\n\n", sep=""))
-   # Computing combinations: one row per variable, one column per term
-   df <- data.frame(attr(reg$terms, "factors"))
-   df <- df/2
-   df$Params <- row.names(df)
-   # Drop the first row, which corresponds to the response (Duration)
-   df <- df[c(2:nrow(df)),]
-   # The numeric conversion below warns about the character column Params
-   options(warn=-1)
-   for(i in (1:nrow(df)))
-   {
-     name <- df[i,]$Params
-     # Rows such as "I(M^2)": scale by the exponent and keep the bare name
-     if (grepl("^I\\(", name))
-     {
-       exp <- as.numeric(gsub("(.*?)\\^(.*?)\\)", "\\2", name))
-       df[i,] <- as.numeric(df[i,]) * exp
-       df[i,]$Params <- as.character(gsub("I\\((.*?)\\^(.*?)\\)", "\\1", name))
-     }
-   }
-   # Sum the exponents of the rows that refer to the same parameter
-   df <- aggregate(. ~ Params, transform(df, Params), sum)
-   options(warn=0)
-   # Column 1 is Params, so column j holds term j-2 of the model
-   for(j in (2:length(df)))
-   {
-     for(i in (1:nrow(df)))
-     {
-       cat(paste("\t ", codelet, ".model->combinations[", j-2, "][", i-1, "] = ", as.numeric(df[i,j]), ";\n", sep=""))
-     }
-   }
-   cat(paste("/* ############################################ */", "\n"))
- }
- df<-read.csv(input_trace, header=TRUE)
- opts_chunk$set(echo=TRUE)
- ```
- # Multiple Linear Regression Model Example
- ## Introduction
- This document demonstrates the type of analysis needed to compute
- the multiple linear regression model of a task. It relies on input
- data benchmarked by StarPU (or by any other tool that follows the
- same format). The input data used in this example is generated by
- the task "mlr_init" from "examples/mlr/mlr.c".
- This document can be used as a template for the analysis of any other
- task.
- ### How to compile
-     ./starpu_mlr_analysis .starpu/sampling/codelets/tmp/mlr_init.out
- ### Software dependencies
- To run the analysis, you need to have R installed:
-     sudo apt-get install r-base
- To compile this document, you also need *knitr* (although the R code
- from this document can perfectly well be used without knitr). If you
- want to generate this document, start R (e.g., from a terminal) and
- install the knitr package:
-     R> install.packages("knitr")
- No additional R packages are needed.
- ## First glimpse at the data
- First, we show the relations between all parameters in a single plot.
- ```{r InitPlot}
- plot(df)
- ```
- For this example, all three parameters M, N, K have some influence,
- but their relation is not easy to understand.
- In general, this type of plot can show whether there are
- outliers. It can also show whether there is a group of parameters that
- are mutually perfectly correlated, in which case only one parameter
- from the group should be kept for further analysis. Additionally, the
- plot can show the parameters that have a constant value; since these
- cannot have an influence on the model, they should also be ignored.
- However, drawing conclusions based solely on visual analysis can be
- treacherous, and it is better to rely on statistical tools. The
- multiple linear regression methods used in the following sections will
- also be able to detect and ignore these irrelevant
- parameters. Therefore, this initial visual look should only be used to
- get a basic idea about the model, but all the parameters should be
- kept for now.
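
The two situations described above (constant parameters and perfectly correlated parameters) can also be checked numerically. The following is a minimal sketch on a small synthetic data frame; the name `trace`, the extra parameter `P` and all values are hypothetical and only stand in for a real benchmark trace.

```r
# Synthetic stand-in for the benchmark trace (hypothetical values).
trace <- data.frame(
  M = c(2, 4, 6, 8, 10, 12),
  N = c(3, 1, 4, 1, 5, 9),
  K = c(4, 8, 12, 16, 20, 24),  # K = 2 * M, perfectly correlated with M
  P = rep(7, 6)                 # constant parameter
)
trace$Duration <- c(10, 14, 31, 27, 48, 80)

params <- setdiff(names(trace), "Duration")

# Parameters whose value never changes cannot influence the model.
constant <- params[sapply(trace[params], function(x) length(unique(x)) == 1)]

# Among the remaining parameters, list the pairs with |correlation| = 1.
varying <- setdiff(params, constant)
cc <- cor(trace[varying])
idx <- which(abs(cc) > 1 - 1e-12 & upper.tri(cc), arr.ind = TRUE)
correlated <- data.frame(first = rownames(cc)[idx[, "row"]],
                         second = colnames(cc)[idx[, "col"]])

print(constant)
print(correlated)
```

Such a check only complements the plot: it does not reveal outliers, and near-perfect correlations would need a looser threshold.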
- ## Initial model
- At this point, an initial model is computed, using all the parameters,
- but not taking into account their exponents or the relations between
- them.
- ```{r Model1}
- model1 <- lm(data=df, Duration ~ M+N+K)
- summary(model1)
- ```
- For each parameter and for the constant (named in the first column),
- an estimate of the corresponding coefficient is provided along with
- its standard error, t value and p-value. If there are any parameters
- with an NA value, which suggests that they are correlated with another
- parameter or that their value is constant, these parameters should
- not be used in the following model computations. The stars in the
- last column indicate the significance of each parameter. However,
- having the maximum of three stars for each parameter does not
- necessarily mean that the model is perfect, and we should always
- inspect the adjusted R^2 value (the closer it is to 1, the better the
- model is). Users who are not familiar with multiple linear regression
- analysis and R tools are referred to the R documentation. Some
- explanations are also provided in the following article:
- https://hal.inria.fr/hal-01180272.
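
The quantities discussed above can also be extracted programmatically instead of being read from the printed summary. This is a small sketch on synthetic data (the data frame `d` and its coefficients are hypothetical); `confint` is used here to obtain the 95% confidence intervals, which the printed summary does not list.

```r
set.seed(42)
# Synthetic data (hypothetical), only to make the sketch runnable.
d <- data.frame(M = runif(50, 1, 10),
                N = runif(50, 1, 10),
                K = runif(50, 1, 10))
d$Duration <- 3 * d$M + 2 * d$N + d$K + rnorm(50, sd = 0.5)

fit <- lm(Duration ~ M + N + K, data = d)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
# 95% confidence interval of each coefficient
ci <- confint(fit, level = 0.95)
# Adjusted R^2 of the model
adj_r2 <- summary(fit)$adj.r.squared

print(cbind(coefs, ci))
print(adj_r2)
```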
- In this example, all parameters M, N, K are very important. However,
- it is not clear whether there are relations between them or whether
- some of these parameters should be used with an exponent. Moreover,
- the adjusted R^2 value is not very high and we hope we can get a
- better one. Thus, we proceed to a more advanced analysis.
- ## Refining the model
- Now, we can look for relations between the parameters. Note that
- trying all the possible combinations when there is a large number
- of parameters can take prohibitively long. Thus, it may be better to
- first get rid of the parameters that seem to have very little
- influence (typically the ones with no stars in the table from the
- previous section).
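
When there are many parameters, the screening step described above can be scripted. The sketch below uses synthetic data (hypothetical, with `K` given no real effect) and assumes that "no stars" corresponds to a p-value of at least 0.05, which is how R assigns its significance stars.

```r
set.seed(11)
# Synthetic data (hypothetical): Duration does not depend on K.
d <- data.frame(M = runif(100, 1, 10),
                N = runif(100, 1, 10),
                K = runif(100, 1, 10))
d$Duration <- 4 * d$M + 2 * d$N + rnorm(100, sd = 1)

fit <- lm(Duration ~ M + N + K, data = d)

# p-value of each parameter, without the intercept row
pvals <- summary(fit)$coefficients[-1, "Pr(>|t|)"]

# Keep the parameters that received at least one star
keep <- names(pvals)[pvals < 0.05]

# Formula restricted to the retained parameters
reduced <- reformulate(keep, response = "Duration")
print(pvals)
print(reduced)
```

In the example of this document, all three parameters were significant, so the models that follow keep M, N and K.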
- ```{r Model2}
- model2 <- lm(data=df, Duration ~ M*N*K)
- summary(model2)
- ```
- This model is more accurate, as the R^2 value increased. We can also
- try some of these parameters with exponents.
- ```{r Model3}
- model3 <- lm(data=df, Duration ~ I(M^2)+I(M^3)+I(N^2)+I(N^3)+I(K^2)+I(K^3))
- summary(model3)
- ```
- It seems like some parameters are important. Now we combine these and
- try to find the optimal combination (here we go directly to the final
- solution, although this process typically takes several iterations of
- trying different combinations).
- ```{r Model4}
- model4 <- lm(data=df, Duration ~ I(M^2):N+I(N^3):K)
- summary(model4)
- ```
- This seems to be the most accurate model, with a high R^2 value. We
- can proceed to its validation.
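
Since finding the final model usually takes several iterations, it can help to compare the candidates side by side. The following sketch fits the four formulas used in this document to synthetic data (hypothetical, generated with the same shape as the final model) and ranks them by adjusted R^2.

```r
set.seed(7)
# Synthetic data (hypothetical), generated from a model4-like relation.
d <- data.frame(M = sample(1:20, 200, TRUE),
                N = sample(1:20, 200, TRUE),
                K = sample(1:20, 200, TRUE))
d$Duration <- 5 + 0.3 * d$M^2 * d$N + 0.1 * d$N^3 * d$K + rnorm(200, sd = 20)

# Candidate formulas, in the order in which this document tries them.
candidates <- list(
  additive     = Duration ~ M + N + K,
  interactions = Duration ~ M * N * K,
  powers       = Duration ~ I(M^2) + I(M^3) + I(N^2) + I(N^3) + I(K^2) + I(K^3),
  combined     = Duration ~ I(M^2):N + I(N^3):K
)

# Adjusted R^2 of each candidate, best first.
adj_r2 <- sapply(candidates, function(f) summary(lm(f, data = d))$adj.r.squared)
print(sort(adj_r2, decreasing = TRUE))
```

The ranking is only a first filter; the selected model still needs the validation described in the next section.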
- ## Validation
- Once the model has been computed, we should validate it. Apart from
- a low adjusted R^2 value, the weaknesses of a model can be observed
- even more clearly by inspecting the residuals. The results in the two
- following plots (and thus the accuracy of the model) depend greatly
- on the variability of the measurements and on the design of experiments.
- ```{r Validation}
- par(mfrow=c(1,2))
- plot(model4, which=c(1:2))
- ```
- Generally speaking, if there are some structures on the left plot
- (residuals versus fitted values), this can indicate that there are
- certain phenomena not explained by the model. Many points on the same
- vertical line (that is, with the same fitted value) represent
- repeated occurrences of the task with the same parameter values,
- which is typical for a single experiment run with homogeneous
- data. Some variability is to be expected, as executing exactly the
- same code on a real machine will always take a slightly different
- time. However, a huge variability means that the benchmarks were very
- noisy, so deriving an accurate model from them will be hard.
- The plot on the right (normal Q-Q) may show that the residuals do not
- follow a normal distribution. Such a model would overall have
- limited predictive power.
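
The visual inspection can be complemented with numbers. The sketch below, on synthetic data (hypothetical), applies the Shapiro-Wilk test from base R to the residuals and measures their spread for each distinct combination of parameter values. This is one possible complement to the plots, not a replacement for them.

```r
set.seed(3)
# Synthetic data (hypothetical): 9 parameter combinations, 10 runs each.
d <- data.frame(M = rep(c(2, 4, 8), each = 30),
                N = rep(c(1, 3, 5), times = 30))
d$Duration <- 2 * d$M^2 * d$N + rnorm(nrow(d), sd = 1)

fit <- lm(Duration ~ I(M^2):N, data = d)
res <- residuals(fit)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals.
sw <- shapiro.test(res)

# Standard deviation of the residuals for each parameter combination;
# large values point to noisy measurements.
spread <- aggregate(res, by = list(M = d$M, N = d$N), FUN = sd)

print(sw$p.value)
print(spread)
```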
- If we are not satisfied with the accuracy of the observed models, we
- should go back to the previous section and try to find a better
- one. In some cases, the benchmarked data is just too noisy or the
- choice of the parameters is not appropriate, and thus the experiments
- should be redesigned and rerun.
- When we are finally satisfied with the model accuracy, we should
- modify our task code, so that StarPU knows which parameter
- combinations are used in the model.
- ## Generating C code
- Depending on the way the task codelet is programmed, this section may
- be of some use. It provides a simple helper that generates C code for
- the parameter combinations; this code should be copied to the task
- description in the application. The function generating the code is
- not very robust, so make sure that the generated code correctly
- corresponds to the computed model (e.g., parameters are considered in
- alphabetical order).
- ```{r Code}
- print_codelet(model4, "mlr_cl")
- ```
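
One way to cross-check the generated combinations is to rebuild the regressors from the exponents and compare them with the ones R used. The sketch below assumes that each entry of `combinations` holds the exponent of one parameter in one term, as the generated code suggests. The data and the matrix `comb` are hypothetical and written by hand for a formula of the same shape as model4.

```r
set.seed(5)
# Synthetic measurements (hypothetical), standing in for the trace.
d <- data.frame(M = sample(1:10, 60, TRUE),
                N = sample(1:10, 60, TRUE),
                K = sample(1:10, 60, TRUE))
d$Duration <- 0.5 * d$M^2 * d$N + 0.2 * d$N^3 * d$K + rnorm(60)

fit <- lm(Duration ~ I(M^2):N + I(N^3):K, data = d)

# Exponents per term, parameters in alphabetical order (K, M, N),
# written by hand from the model formula.
comb <- rbind(c(K = 0, M = 2, N = 1),   # I(M^2):N
              c(K = 1, M = 0, N = 3))   # I(N^3):K

# Rebuild each regressor as K^a * M^b * N^c, one column per term.
rebuilt <- apply(comb, 1, function(e) d$K^e["K"] * d$M^e["M"] * d$N^e["N"])

# Regressors used by R, with the intercept column dropped.
used <- model.matrix(fit)[, -1]

# A value close to zero means the exponents describe the fitted terms.
max_diff <- max(abs(used - rebuilt))
print(max_diff)
```

If the difference is not negligible, the exponents or the parameter order in the generated code probably do not match the model.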
- ## Conclusion
- We have computed the model for our benchmarked data using multiple
- linear regression. After encoding this model into the task code,
- StarPU will be able to automatically compute the coefficients and use
- the model to predict task duration.