							
```{r Setup, echo=FALSE}
opts_chunk$set(echo=FALSE)
```
 
```{r Load_R_files_and_functions}
print_codelet <- function(reg, codelet){
   cat(paste("/* ############################################ */", "\n"))
   cat(paste("/*\t Automatically generated code */", "\n"))
   cat(paste("\t Check for potential errors and make sure the parameter values are written in the right order (alphabetical by default)", "\n"))
   cat(paste("\t Adjusted R-squared: ", summary(reg)$adj.r.squared, "*/\n\n"))

   # Emit the allocation of the combinations table
   ncomb <- reg$rank - 1
   cat(paste("\t ", codelet, ".model->ncombinations = ", ncomb, ";\n", sep=""))
   cat(paste("\t ", codelet, ".model->combinations = (unsigned **) malloc(", codelet, ".model->ncombinations*sizeof(unsigned *))", ";\n\n", sep=""))
   cat(paste("\t if (", codelet, ".model->combinations)", "\n", "\t {\n", sep=""))
   cat(paste("\t   for (unsigned i = 0; i < ", codelet, ".model->ncombinations; i++)", "\n", "\t   {\n", sep=""))
   cat(paste("\t     ", codelet, ".model->combinations[i] = (unsigned *) malloc(", codelet, ".model->nparameters*sizeof(unsigned))", ";\n", "\t   }\n", "\t }\n\n", sep=""))

   # Computing combinations: recover the exponent of each parameter in each term
   df <- data.frame(attr(reg$terms, "factors"))
   df <- df/2
   df$Params <- row.names(df)
   df <- df[c(2:nrow(df)),]
   options(warn=-1)
   for(i in (1:nrow(df)))
   {
     name <- df[i,]$Params
     if (grepl("I\\(*", name))
     {
        exp <- as.numeric(gsub("(.*?)\\^(.*?)\\)", "\\2", name))
        df[i,] <- as.numeric(df[i,]) * exp
        df[i,]$Params <- as.character(gsub("I\\((.*?)\\^(.*?)\\)", "\\1", name))
     }
   }
   df <- aggregate(. ~ Params, transform(df, Params), sum)
   options(warn=0)

   # Emit one line of C code per (combination, parameter) pair
   for(j in (2:length(df)))
   {
     for(i in (1:nrow(df)))
     {
       cat(paste("\t ", codelet, ".model->combinations[", j-2, "][", i-1, "] = ", as.numeric(df[i,j]), ";\n", sep=""))
     }
   }
   cat(paste("/* ############################################ */", "\n"))
}

df <- read.csv(input_trace, header=TRUE)
opts_chunk$set(echo=TRUE)
```
 
# Multiple Linear Regression Model Example

## Introduction

This document demonstrates the type of analysis needed to compute the
multiple linear regression model of a task. It relies on input data
benchmarked by StarPU (or any other tool, as long as it follows the
same format). The input data used in this example is generated by the
task "mlr_init", from "examples/mlr/mlr.c".

This document can be used as a template for the analysis of any other
task.
 
### How to compile

    ./starpu_mlr_analysis .starpu/sampling/codelets/tmp/mlr_init.out
 
### Software dependencies

In order to run the analysis you need to have R installed:

    sudo apt-get install r-base

In order to compile this document, you need *knitr* (although you can
use the R code from this document without knitr). If you decide to
generate this document, start R (e.g., from a terminal) and install
the knitr package:

    R> install.packages("knitr")

No additional R packages are needed.
 
## First glimpse at the data

First, we show the relations between all parameters in a single plot.

```{r InitPlot}
plot(df)
```
 
For this example, all three parameters M, N, K have some influence,
but their relation is not easy to understand.

In general, this type of plot can show whether there are outliers. It
can also show whether there is a group of parameters that are mutually
perfectly correlated, in which case only one parameter from the group
should be kept for further analysis. Additionally, the plot can reveal
parameters with a constant value; since these cannot influence the
model, they should also be ignored. However, drawing conclusions based
solely on visual analysis can be treacherous, and it is better to rely
on statistical tools. The multiple linear regression methods used in
the following sections will also be able to detect and ignore such
irrelevant parameters. Therefore, this initial visual inspection
should only be used to get a basic idea about the model, and all the
parameters should be kept for now.
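These two degenerate cases can also be detected numerically rather than visually. A small sketch (not part of the original analysis), assuming the trace has already been loaded into `df` as in the setup chunk:

```r
# Sketch: flag columns that cannot contribute to the model.
constant <- sapply(df, function(x) length(unique(x)) == 1)
names(df)[constant]            # parameters with a constant value

# Pairwise correlations among the remaining columns; off-diagonal
# entries of +/-1 indicate perfectly correlated parameters, of which
# only one should be kept.
round(cor(df[, !constant]), 2)
```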
 
## Initial model

At this point, an initial model is computed using all the parameters,
but without taking into account their exponents or the relations
between them.

```{r Model1}
model1 <- lm(data=df, Duration ~ M+N+K)
summary(model1)
```
 
For each parameter and for the constant in the first column, an
estimation of the corresponding coefficient is provided along with the
95% confidence interval. If any parameters have an NA value, which
suggests that they are correlated with another parameter or that their
value is constant, these parameters should not be used in the
following model computations. The stars in the last column indicate
the significance of each parameter. However, having the maximum of
three stars for each parameter does not necessarily mean that the
model is perfect, and we should always inspect the adjusted R^2 value
(the closer it is to 1, the better the model). To users who are not
familiar with multiple linear regression analysis and the R tools, we
suggest consulting the R documentation. Some explanations are also
provided in the following article: https://hal.inria.fr/hal-01180272.
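The quantities discussed above can also be extracted from the fitted model programmatically; a small sketch:

```r
# Sketch: pull the diagnostics discussed above out of the fitted model.
summary(model1)$adj.r.squared   # adjusted R^2 (the closer to 1, the better)
confint(model1, level = 0.95)   # 95% confidence intervals for the coefficients
coef(summary(model1))           # estimates, std. errors, t values, p values
```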
 
 
In this example, all parameters M, N, K are very important. However,
it is not clear whether there are relations between them or whether
some of these parameters should be used with an exponent. Moreover,
the adjusted R^2 value is not extremely high, and we hope to get a
better one. Thus, we proceed to a more advanced analysis.
 
## Refining the model

Now, we can look for relations between the parameters. Note that
trying all possible combinations in cases with a huge number of
parameters can be prohibitively long. Thus, it may be better to first
get rid of the parameters which seem to have very little influence
(typically the ones with no stars in the table from the previous
section).
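For models with many parameters, one possible shortcut (not used in this document) is stepwise selection by AIC, which automatically drops terms with little influence. A hedged sketch; the manual exploration below remains the reference approach:

```r
# Sketch: automatic pruning of weak terms by AIC, starting from the
# full interaction model.
model_full <- lm(Duration ~ M * N * K, data = df)
model_pruned <- step(model_full, trace = 0)
summary(model_pruned)
```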
 
```{r Model2}
model2 <- lm(data=df, Duration ~ M*N*K)
summary(model2)
```
 
This model is more accurate, as the adjusted R^2 value has
increased. We can also try some of these parameters with exponents.
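Since model1 is nested in model2, the improvement can also be tested formally with an F-test rather than by comparing R^2 values by eye; a sketch:

```r
# Sketch: F-test comparing the nested models. A small p-value indicates
# that the interaction terms in model2 significantly improve the fit.
anova(model1, model2)
```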
 
```{r Model3}
model3 <- lm(data=df, Duration ~ I(M^2)+I(M^3)+I(N^2)+I(N^3)+I(K^2)+I(K^3))
summary(model3)
```
 
It seems that some of these parameters are important. Now we combine
them and try to find the optimal combination (here we go directly to
the final solution, although this process typically takes several
iterations of trying different combinations).
 
```{r Model4}
model4 <- lm(data=df, Duration ~ I(M^2):N+I(N^3):K)
summary(model4)
```
 
This seems to be the most accurate model, with a high adjusted R^2
value. We can proceed to its validation.
 
## Validation

Once the model has been computed, we should validate it. Apart from a
low adjusted R^2 value, weaknesses of the model can be observed even
better by inspecting the residuals. The results on the two following
plots (and thus the accuracy of the model) will greatly depend on the
variability of the measurements and on the design of the experiments.
 
```{r Validation}
par(mfrow=c(1,2))
plot(model4, which=c(1:2))
```
 
Generally speaking, if there is some structure in the left plot, this
can indicate that certain phenomena are not explained by the
model. Many points on the same horizontal line represent repeated
occurrences of the task with the same parameter values, which is
typical for a single experiment run on homogeneous data. The fact that
there is some variability is normal, as executing exactly the same
code on a real machine will always take a slightly different
duration. However, a huge variability means that the benchmarks were
very noisy, and deriving an accurate model from them will be hard.
 
The plot on the right may show that the residuals do not follow the
normal distribution. In that case, the model would overall have a
limited predictive power.
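Normality of the residuals can also be checked numerically, e.g. with a Shapiro-Wilk test. A sketch, not part of the original analysis (note that `shapiro.test` accepts between 3 and 5000 samples):

```r
# Sketch: Shapiro-Wilk normality test on the residuals of the final model.
# A very small p-value suggests the residuals are not normally distributed.
shapiro.test(residuals(model4))
```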
 
If we are not satisfied with the accuracy of the observed models, we
should go back to the previous section and try to find a better
one. In some cases, the benchmarked data is simply too noisy or the
choice of parameters is not appropriate, and the experiments should
then be redesigned and rerun.
 
When we are finally satisfied with the model accuracy, we should
modify our task code so that StarPU knows which parameter combinations
are used in the model.
 
## Generating C code

Depending on the way the task codelet is programmed, this section may
be useful. This is a simple helper to generate the C code for the
parameter combinations; its output should be copied into the task
description in the application. The function generating the code is
not very robust, so make sure that the generated code correctly
corresponds to the computed model (e.g., the parameters are considered
in alphabetical order).
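A quick way to double-check that caveat is to look at the order in which R stores the model terms and compare it against the emitted combinations; a sketch:

```r
# Sketch: list the coefficient names in the order R stores them, so the
# generated combinations can be cross-checked against the computed model.
names(coef(model4))
attr(model4$terms, "term.labels")   # the model terms without the intercept
```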
 
```{r Code}
print_codelet(model4, "mlr_cl")
```
 
## Conclusion

We have computed a model for our benchmarked data using multiple
linear regression. After encoding this model into the task code,
StarPU will be able to automatically compute the coefficients and use
the model to predict task durations.
 
 