@@ -56,13 +56,22 @@ print_codelet <- function(reg,codelet){
df<-read.csv(input_trace, header=TRUE)
```

-# Introduction
+# Multiple Linear Regression Model Example

-TODO
+## Introduction
+
+This document demonstrates the type of analysis needed to compute the
+multiple linear regression model of a task. It relies on input data
+benchmarked by StarPU (or by any other tool, as long as the same
+format is followed). The input data used in this example was generated
+by the task "mlr_init" from "examples/basic_examples/mlr.c".
+
+This document can be used as a template for the analysis of any other
+task.

### How to compile

-    ./starpu_mlr_analysis .starpu/sampling/codelets/tmp/test_mlr.out
+    ./starpu_mlr_analysis .starpu/sampling/codelets/tmp/mlr_init.out

### Software dependencies
@@ -70,13 +79,16 @@ In order to run the analysis you need to have R installed:

    sudo apt-get install r-base

-In order to compile this document, you need *knitr*. However, you can perfectly use the R code from this document without knitr in your own scripts. If you decided that you want to generate this document, then start R (e.g., from terminal) and install knitr package:
+In order to compile this document, you need *knitr* (although you can
+perfectly well use only the R code from this document without knitr).
+If you decide that you want to generate this document, start R (e.g.,
+from a terminal) and install the knitr package:

    R> install.packages("knitr")

No additional R packages are needed.
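+
+If you prefer to render the document by hand instead of going through
+the starpu_mlr_analysis script, something along these lines should
+work, producing a Markdown file (the exact .Rmd file name used here is
+an assumption):
+
+    R> knitr::knit("starpu_mlr_analysis.Rmd")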

-# First glimpse
+## First glimpse at the data
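+
+Before plotting, it does not hurt to take a quick look at the raw
+trace that was just read into *df* (a small optional check):
+
+```{r Glimpse}
+# structure and first rows of the benchmarked trace
+str(df)
+head(df)
+```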

First, we show the relations between all parameters in a single plot.

@@ -87,12 +99,12 @@ plot(df)
For this example, all three parameters M, N, K have some influence,
but their relation is not easy to understand.

-In general, this type of plots can typically show if there is a group
-of parameters which are mutually perfectly correlated, in which case
-only a one parameter from the group should be kept for the further
-analysis. Additionally, plot can show the parameters that have a
-constant value, and since these cannot have an influence on the model,
-they should also be ignored.
+In general, this type of plot can show whether there are outliers. It
+can also show whether there is a group of parameters that are mutually
+perfectly correlated, in which case only one parameter from the group
+should be kept for further analysis. Additionally, the plot can show
+parameters that have a constant value; since these cannot have an
+influence on the model, they should also be ignored.
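+
+A quick numerical counterpart to this visual check is to compute the
+pairwise correlations between the parameters and to flag parameters
+that never change (a small sketch; it assumes the parameter columns
+M, N and K used in the models below):
+
+```{r QuickChecks}
+# pairwise correlations between the benchmarked parameters
+cor(df[, c("M", "N", "K")])
+# TRUE marks a parameter that is constant over the whole trace
+sapply(df[, c("M", "N", "K")], function(x) length(unique(x)) == 1)
+```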

However, making conclusions based solely on the visual analysis can be
treacherous and it is better to rely on the statistical tools. The
@@ -102,7 +114,7 @@ parameters. Therefore, this initial visual look should only be used to
get a basic idea about the model, but all the parameters should be
kept for now.

-# Initial model
+## Initial model

At this point, an initial model is computed, using all the parameters,
but not taking into account their exponents or the relations between
@@ -127,14 +139,13 @@ are not common to the multiple linear regression analysis and R tools,
we refer to the R documentation. Some explanations are also provided
in the following article https://hal.inria.fr/hal-01180272.

-In this example, all parameters M, N, K are all very
-important. However, it is not clear if there are some relations
-between them or if some of these parameters should be used with an
-exponent. Moreover, adjusted R^2 value is not extremelly high and we
-hope we can get a better one. Thus, we proceed to the more advanced
-analysis.
+In this example, the parameters M, N and K are all very important.
+However, it is not clear whether there are relations between them or
+whether some of these parameters should be used with an exponent.
+Moreover, the adjusted R^2 value is not extremely high and we hope to
+get a better one. Thus, we proceed to a more advanced analysis.

-# Refining the model
+## Refining the model

Now, we can look for the relations between the parameters. Note that
trying all the possible combinations for the cases with a huge number
@@ -148,9 +159,8 @@ model2 <- lm(data=df, Duration ~ M*N*K)
summary(model2)
```

-This model is more accurate, as the R^2 value increased. Now when some
-relations are observed, we can try some of these parameters with the
-exponents.
+This model is more accurate, as the R^2 value increased. We can also
+try some of these parameters with exponents.

```{r Model3}
model3 <- lm(data=df, Duration ~ I(M^2)+I(M^3)+I(N^2)+I(N^3)+I(K^2)+I(K^3))
@@ -158,17 +168,19 @@ summary(model3)
```

It seems like some parameters are important. Now we combine these and
-try to find the optimal combination.
+try to find the optimal combination (here we go directly to the final
+solution, although this process typically takes several iterations of
+trying different combinations).

```{r Model4}
model4 <- lm(data=df, Duration ~ I(M^2):N+I(N^3):K)
summary(model4)
```

-Depending on the machine characteristics and the variability of
-benchmarks, this may be the best model.
+This seems to be the most accurate model, with a high R^2 value. We
+can proceed to its validation.
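+
+For instance, a quick way to compare the candidate models numerically
+is to put their adjusted R^2 values side by side (a small sketch that
+reuses the models fitted above):
+
+```{r CompareModels}
+# adjusted R^2 of each candidate model
+sapply(list(model2 = model2, model3 = model3, model4 = model4),
+       function(m) summary(m)$adj.r.squared)
+```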

-# Validation
+## Validation

Once the model has been computed, we should validate it. Apart from
the low adjusted R^2 value, the model weakness can also be observed
@@ -189,7 +201,7 @@ which is typical for a single experiment run with a homogeneous
data. The fact that there is some variability is common, as executing
exactly the same code on a real machine will always have slightly
different duration. However, having a huge variability means that the
-benchmarks were very noisy, thus having an accurate models from them
+benchmarks were very noisy, thus deriving an accurate model from them
will be hard.
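+
+Numerically, one can also look at how large the residuals are relative
+to the measured durations, and test whether they follow a normal
+distribution (a sketch reusing model4 from above; note that
+shapiro.test() only accepts between 3 and 5000 observations):
+
+```{r ResidualChecks}
+# distribution of the residuals relative to the measured durations
+summary(abs(residuals(model4)) / df$Duration)
+# normality test of the residuals
+shapiro.test(residuals(model4))
+```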

The plot on the right may show that the residuals do not follow the normal
@@ -198,19 +210,31 @@ predictive power.

If we are not satisfied with the accuracy of the observed models, we
should go back to the previous section and try to find a better
-one. In some cases, the benchmarked data will just be too noisy and
-they should be redesigned and run again.
+one. In some cases, the benchmarked data is just too noisy or the
+choice of the parameters is not appropriate, and thus the experiments
+should be redesigned and rerun.

When we are finally satisfied with the model accuracy, we should
modify our task code, so that StarPU knows which parameter
combinations are used in the model.

-# Generating C code
+## Generating C code

-This is a simple helper to generate C code which should be copied to
-the task description in your application. Make sure that the generated
-code correctly corresponds to computed model.
+Depending on the way the task codelet is programmed, this section may
+be somewhat useful. It is a simple helper that generates C code for
+the parameter combinations, which should be copied into the task
+description in the application. The function generating the code is
+not very robust, so make sure that the generated code correctly
+corresponds to the computed model (for example, parameters are
+considered in alphabetical order).

```{r Code}
print_codelet(model4, "mlr_cl")
```
+
+## Conclusion
+
+We have computed a model for our benchmarked data using multiple
+linear regression. After encoding this model into the task code,
+StarPU will be able to automatically compute the coefficients and use
+the model to predict task durations.