# StarPU --- Runtime system for heterogeneous multicore architectures.
#
# Copyright (C) 2017                                     CNRS
# Copyright (C) 2016                                     Inria
#
# StarPU is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or (at
# your option) any later version.
#
# StarPU is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See the GNU Lesser General Public License in COPYING.LGPL for more details.
#

```{r Setup, echo=FALSE}
opts_chunk$set(echo=FALSE)
```

```{r Load_R_files_and_functions}
# Print the C code that describes the terms of the regression `reg`
# for the StarPU codelet named `codelet`.
print_codelet <- function(reg,codelet){
   # Header: a single C comment, closed on the "Adjusted R-squared" line
   cat(paste("/* ############################################ */", "\n"))
   cat(paste("/*\t Automatically generated code", "\n"))
   cat(paste("\t Check for potential errors and make sure the parameter values are written in the right order (alphabetical by default)", "\n"))
   cat(paste("\t Adjusted R-squared: ", summary(reg)$adj.r.squared, "*/\n\n"))

   # Number of model terms, not counting the intercept
   ncomb <- reg$rank - 1
   cat(paste("\t ", codelet, ".model->ncombinations = ", ncomb, ";\n", sep=""))
   cat(paste("\t ", codelet, ".model->combinations = (unsigned **) malloc(", codelet, ".model->ncombinations*sizeof(unsigned *))", ";\n\n", sep=""))

   # Allocate one row of exponents per combination
   cat(paste("\t if (", codelet, ".model->combinations)", "\n", "\t {\n", sep=""))
   cat(paste("\t   for (unsigned i = 0; i < ", codelet, ".model->ncombinations; i++)", "\n", "\t   {\n", sep=""))
   cat(paste("\t     ", codelet, ".model->combinations[i] = (unsigned *) malloc(", codelet, ".model->nparameters*sizeof(unsigned))", ";\n", "\t   }\n", "\t }\n\n", sep=""))

   # Computing combinations
   df <- data.frame(attr(reg$terms, "factors"))
   df <- df/2
   df$Params <- row.names(df)
   # Drop the first row, which corresponds to the response (Duration)
   df <-df[c(2:nrow(df)),]

   i=1
   options(warn=-1)
   for(i in (1:nrow(df)))
   {
     name <- df[i,]$Params
     # Only terms wrapped in I(...) carry an explicit exponent
     if (grepl("^I\\(", name))
     {
        exponent <- as.numeric(gsub("(.*?)\\^(.*?)\\)", "\\2", name))
        df[i,] <- as.numeric(df[i,]) * exponent
        df[i,]$Params <- as.character(gsub("I\\((.*?)\\^(.*?)\\)", "\\1", name))
     }
   }
   # Merge the rows that refer to the same parameter (sorted alphabetically)
   df <- aggregate(. ~ Params, transform(df, Params), sum)
   options(warn=0)

   # Emit one assignment per (combination, parameter) pair
   i=1
   j=1
   for(j in (2:length(df)))
   {
     for(i in (1:nrow(df)))
     {
       cat(paste("\t ", codelet, ".model->combinations[", j-2, "][", i-1, "] = ", as.numeric(df[i,j]), ";\n", sep=""))
     }
   }
   cat(paste("/* ############################################ */", "\n"))
}

df<-read.csv(input_trace, header=TRUE)
opts_chunk$set(echo=TRUE)
```

# Multiple Linear Regression Model Example

## Introduction

This document demonstrates the type of analysis needed to compute
the multiple linear regression model of a task. It relies on
input data benchmarked by StarPU (or by any other tool that follows
the same format).
The input data used in this example is generated by
the task "mlr_init", from "examples/mlr/mlr.c".

This document can be used as a template for the analysis of any other
task.

### How to compile

    ./starpu_mlr_analysis .starpu/sampling/codelets/tmp/mlr_init.out

### Software dependencies

In order to run the analysis you need to have R installed:

    sudo apt-get install r-base

In order to compile this document, you need *knitr* (although you can
perfectly well use the R code from this document without knitr). If
you want to generate this document, start R
(e.g., from a terminal) and install the knitr package:

    R> install.packages("knitr")

No additional R packages are needed.

## First glimpse at the data

First, we show the relations between all parameters in a single plot.

```{r InitPlot}
plot(df)
```

For this example, all three parameters M, N, K have some influence,
but their relation is not easy to understand.

In general, this type of plot can show whether there are
outliers. It can also show whether there is a group of parameters which are
mutually perfectly correlated, in which case only one parameter from
the group should be kept for further analysis. Additionally, the plot
can show the parameters that have a constant value; since these
cannot have an influence on the model, they should also be ignored.

However, drawing conclusions based solely on visual analysis can be
treacherous, and it is better to rely on statistical tools. The
multiple linear regression methods used in the following sections will
also be able to detect and ignore these irrelevant
parameters.
Therefore, this initial visual look should only be used to
get a basic idea about the model, and all the parameters should be
kept for now.

## Initial model

At this point, an initial model is computed, using all the parameters,
but not taking into account their exponents or the relations between
them.

```{r Model1}
model1 <- lm(data=df, Duration ~ M+N+K)
summary(model1)
```

For each parameter and for the constant (listed in the first column), an estimate
of the corresponding coefficient is provided along with its standard
error, t value and p-value. If any parameter has an NA value, which
suggests that it is correlated with another parameter or
that its value is constant, it should not be used in
the following model computations. The stars in the last column
indicate the significance of each parameter. However, having the maximum
of three stars for every parameter does not necessarily mean that the
model is perfect, and we should always inspect the adjusted R^2 value
(the closer it is to 1, the better the model). Users who
are not familiar with multiple linear regression analysis and R tools
should consult the R documentation. Some explanations are also provided
in the following article: https://hal.inria.fr/hal-01180272.

In this example, all parameters M, N, K are very important. However,
it is not clear whether there are relations between them or whether some of
these parameters should be used with an exponent. Moreover, the adjusted
R^2 value is not extremely high and we hope we can get a better
one. Thus, we proceed to a more advanced analysis.

## Refining the model

Now, we can look for relations between the parameters. Note that
trying all possible combinations when there is a huge number
of parameters can take prohibitively long.
Thus, it may be better to first
get rid of the parameters which seem to have very little influence
(typically the ones with no stars in the table from the previous
section).

```{r Model2}
model2 <- lm(data=df, Duration ~ M*N*K)
summary(model2)
```

This model is more accurate, as the R^2 value increased. We can also
try some of these parameters with exponents.

```{r Model3}
model3 <- lm(data=df, Duration ~ I(M^2)+I(M^3)+I(N^2)+I(N^3)+I(K^2)+I(K^3))
summary(model3)
```

It seems that some of these terms are important. Now we combine them and
try to find the optimal combination (here we go directly to the final
solution, although this process typically takes several iterations of
trying different combinations).

```{r Model4}
model4 <- lm(data=df, Duration ~ I(M^2):N+I(N^3):K)
summary(model4)
```

This seems to be the most accurate model, with a high R^2 value. We
can proceed to its validation.

## Validation

Once the model has been computed, we should validate it. Besides
a low adjusted R^2 value, weaknesses of the model can be observed,
often more clearly, by inspecting the residuals. The results in the two
following plots (and thus the accuracy of the model) will greatly
depend on the variability of the measurements and the design of experiments.

```{r Validation}
par(mfrow=c(1,2))
plot(model4, which=c(1:2))
```

Generally speaking, if there are some structures in the left plot
(residuals versus fitted values), this can indicate that there are
certain phenomena not explained by the model. Many points on the same
vertical line, i.e., with the same fitted value, represent
repeated occurrences of the task with the same parameter values,
which is typical for a single experiment run with homogeneous
data. Some variability is common, as executing
exactly the same code on a real machine will always give slightly
different durations. However, a huge variability means that the
benchmarks were very noisy, so deriving an accurate model from them
will be hard.

The plot on the right (a normal Q-Q plot) may show that the residuals
do not follow a normal distribution.
In that case, the model would overall have limited
predictive power.

If we are not satisfied with the accuracy of the observed models, we
should go back to the previous section and try to find a better
one. In some cases, the benchmarked data is just too noisy or the
choice of the parameters is not appropriate, and thus the experiments
should be redesigned and rerun.

When we are finally satisfied with the model accuracy, we should
modify our task code, so that StarPU knows which parameter
combinations are used in the model.

## Generating C code

Depending on how the task codelet is programmed, this section may
be of some use. This is a simple helper to generate C code for the
parameter combinations, and its output should be copied to the task
description in the application. The function generating the code is
not very robust, so make sure that the generated code correctly
corresponds to the computed model (e.g., check that the parameters are
considered in alphabetical order).

```{r Code}
print_codelet(model4, "mlr_cl")
```

## Conclusion

We have computed the model for our benchmarked data using multiple
linear regression. After encoding this model into the task code,
StarPU will be able to automatically compute the coefficients and use
the model to predict task duration.
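
To make the link between a fitted model and a table of exponents more
concrete, here is a minimal sketch on synthetic data. Everything in it
is an assumption made for illustration: the data frame `synthetic`, the
"true" coefficients, the helper `predict_by_hand` and the (M, N, K)
column order of `combinations` are hypothetical and do not come from
StarPU or from a real trace. It only shows that a model of the same form
as the final one above can be evaluated from its coefficients and one
row of exponents per term; how StarPU evaluates the model internally is
not covered here.

```r
# Synthetic stand-in for a benchmark trace (hypothetical data, not StarPU output).
set.seed(42)
n <- 200
synthetic <- data.frame(M = sample(1:50, n, replace = TRUE),
                        N = sample(1:50, n, replace = TRUE),
                        K = sample(1:50, n, replace = TRUE))
# Assumed "true" behaviour: constant + a*M^2*N + b*N^3*K + noise.
synthetic$Duration <- 5 + 0.002 * synthetic$M^2 * synthetic$N +
  0.0005 * synthetic$N^3 * synthetic$K + rnorm(n, sd = 1)

# Same model form as the final model of this document.
fit <- lm(Duration ~ I(M^2):N + I(N^3):K, data = synthetic)

# One row of exponents per model term, one column per parameter.
# The (M, N, K) column order is chosen here for readability only.
combinations <- rbind(c(M = 2, N = 1, K = 0),   # M^2 * N
                      c(M = 0, N = 3, K = 1))   # N^3 * K

# Evaluate the model by hand: intercept + sum_i coef_i * prod_j param_j^exponent_ij.
predict_by_hand <- function(coefs, combinations, params) {
  term_values <- apply(combinations, 1,
                       function(e) prod(params[colnames(combinations)]^e))
  unname(coefs[1] + sum(coefs[-1] * term_values))
}

params <- c(M = 10, N = 20, K = 30)
by_hand <- predict_by_hand(coef(fit), combinations, params)
by_lm <- unname(predict(fit, newdata = as.data.frame(as.list(params))))

# Both values should agree up to floating-point error.
print(c(by_hand = by_hand, by_lm = by_lm))
```

If the two printed values differ, the exponent table does not describe
the fitted formula. The same kind of check can be applied to the
generated C code, keeping in mind that it lists the parameters in
alphabetical order by default.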