In order to generate the 300 Spark configurations, we use the OpenTuner library. The range of values for every parameter is defined in the src/python/spark-config-generator.py file.
To generate the 300 configurations, run the following from the src/python directory:
mkdir -p configs
# Each run writes one configuration to ./out; move it to configs/<i>
for i in $(seq 1 300)
do
    python3 spark-config-generator.py
    mv out "configs/$i"
done
The process may take a while. The 300 configurations will be available in the configs folder.
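If you would rather drive the loop from Python, the following is a minimal sketch of the same procedure. It assumes, as the mv command implies, that each generator run writes its configuration to a file named out in the working directory. generate_configs is a hypothetical helper, not part of the repository.

```python
import os
import subprocess
import sys


def generate_configs(n, command, out_name="out", dest_dir="configs"):
    """Run the generator n times, moving each result to dest_dir/<i>.

    Assumes (hypothetically) that every run writes its configuration
    to `out_name` in the current working directory.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for i in range(1, n + 1):
        # check=True stops on a failed run instead of moving a stale file.
        subprocess.run(command, check=True)
        # Rename ./out to configs/<i>, mirroring `mv out configs/$i`.
        os.replace(out_name, os.path.join(dest_dir, str(i)))
```

Usage, from src/python: `generate_configs(300, [sys.executable, "spark-config-generator.py"])`. Using check=True makes a crashing run (for example, the hashing error described below) abort the loop rather than silently producing fewer configurations.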
Each run of spark-config-generator.py generates one random Spark configuration covering 107 Spark parameters, using the OpenTuner library.
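To illustrate what "one random configuration" means, here is a sketch that draws one value per parameter from a small, hypothetical search space. It uses plain uniform sampling with Python's random module, not OpenTuner's API, and the four parameters and ranges below are illustrative only; the real ranges for all 107 parameters live in spark-config-generator.py. The output is rendered as "key value" lines in the spark-defaults.conf style, which may differ from the generator's actual output format.

```python
import random

# Hypothetical subset of the search space (illustrative ranges only).
SEARCH_SPACE = {
    "spark.executor.cores": ("int", 1, 8),
    "spark.executor.memory": ("int_mb", 512, 8192),
    "spark.shuffle.compress": ("bool",),
    "spark.serializer": ("enum", [
        "org.apache.spark.serializer.JavaSerializer",
        "org.apache.spark.serializer.KryoSerializer",
    ]),
}


def random_config(space, rng=random):
    """Draw one value uniformly at random for every parameter in `space`."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "int":
            config[name] = str(rng.randint(spec[1], spec[2]))
        elif kind == "int_mb":
            # Memory sizes are written with Spark's "m" (MiB) suffix.
            config[name] = f"{rng.randint(spec[1], spec[2])}m"
        elif kind == "bool":
            config[name] = rng.choice(["true", "false"])
        elif kind == "enum":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown parameter kind: {kind}")
    return config


def to_conf_lines(config):
    """Render the config as sorted 'key value' lines (spark-defaults.conf style)."""
    return "\n".join(f"{k} {v}" for k, v in sorted(config.items()))
```

Passing a seeded `random.Random(seed)` as `rng` makes a run reproducible, which is handy when comparing configurations across experiments.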
When running the script, you might get the following error:
TypeError: Unicode-objects must be encoded before hashing
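The error comes from passing a str to hashlib in Python 3, where hash functions accept only bytes (the exact wording of the message varies between Python versions). The sketch below reproduces it, assuming the unpatched OpenTuner line hands repr(...) straight to sha256, as the error message indicates; both helper names are hypothetical.

```python
import hashlib


def hash_value_unpatched(value):
    """Hash repr(value) without encoding: raises TypeError on Python 3."""
    return hashlib.sha256(repr(value)).hexdigest().encode("utf-8")


def hash_value(value):
    """Hash repr(value) after encoding it, as the corrected line does."""
    # repr() returns a str; sha256() only accepts bytes, so encode first.
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest().encode("utf-8")
```

Calling `hash_value_unpatched(4)` raises the TypeError, while `hash_value(4)` returns the 64-character hex digest of `b"4"` as bytes.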
To fix this, first locate the installation path of the opentuner package:
python3 -c "import opentuner; print(opentuner.__file__)"
Then edit line 858 of search/manipulator.py, inside the opentuner package directory printed above, so that the string is encoded to bytes before it is hashed:
@@ -858 +858 @@
- return hashlib.sha256(repr(self.get_value(config))).hexdigest().encode('utf-8')
+ return hashlib.sha256(repr(self.get_value(config)).encode('utf-8')).hexdigest().encode('utf-8')
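The manual edit can also be scripted. The sketch below finds search/manipulator.py through importlib instead of printing opentuner.__file__, then replaces the unencoded call with the encoded one. It assumes the unpatched file contains exactly the OLD string shown; if your OpenTuner version differs, patch_file returns 0 and the line must be edited by hand. Both helpers are hypothetical, not part of OpenTuner or this repository.

```python
import importlib.util
import os

OLD = "hashlib.sha256(repr(self.get_value(config))).hexdigest()"
NEW = "hashlib.sha256(repr(self.get_value(config)).encode('utf-8')).hexdigest()"


def manipulator_path(package="opentuner"):
    """Return the path of search/manipulator.py inside the installed package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError(f"package {package!r} is not installed")
    return os.path.join(spec.submodule_search_locations[0], "search", "manipulator.py")


def patch_file(path, old=OLD, new=NEW):
    """Replace every occurrence of `old` with `new`; return how many were replaced."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    count = source.count(old)
    if count:
        # NEW does not contain OLD, so running the patch twice is harmless.
        with open(path, "w", encoding="utf-8") as f:
            f.write(source.replace(old, new))
    return count
```

Typical use is `patch_file(manipulator_path())`; writing into site-packages may require the same permissions that were used to install OpenTuner.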