Same as above, but sets the minimum executor memory to zero. In this case it will be overridden by the Alpine conf.
Create a simple version of the AutoTuner options. This includes two basic constants used in the auto-tuning process.
If the data is large, we set the executors to the size of a YARN container. By default we set the driver to the same size as the executors. However, if your computation does not return much data to the driver, you may not need that much driver memory; this scalar provides a way to scale the size of the driver. If you have a highly parallelizable algorithm that returns little data to the driver (say, KMeans), try setting this to less than one. If you want the driver memory to be larger even on small input data (perhaps for an algorithm that aggregates by group and returns data to the driver), try setting it to more than one.
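The scaling described above can be sketched as follows. This is a minimal illustration, not Alpine's implementation; the names `executorMemoryMb` and `driverMemoryScalar` are assumed for the example.

```scala
// Hypothetical sketch of applying a driver-memory scalar.
// By default the scalar is 1.0, so the driver matches the executors.
object DriverMemorySketch {
  def driverMemoryMb(executorMemoryMb: Int, driverMemoryScalar: Double): Int =
    math.round(executorMemoryMb * driverMemoryScalar).toInt

  def main(args: Array[String]): Unit = {
    // KMeans-style job returning little data to the driver: scale down.
    println(driverMemoryMb(8192, 0.5))
    // Group-by aggregation returning data to the driver: scale up.
    println(driverMemoryMb(8192, 2.0))
  }
}
```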
In auto tuning we try to use enough executors that the input data can be read into memory in the compute layer of the executors. We estimate the in-memory size of the input data by multiplying the file size (which you can see by right-clicking a dataset in Alpine and viewing the "Hadoop File Properties" dialog) by a scalar Y = X * inputCachedSizeMultiplier, where X is a coefficient that accounts for the storage format, compression, and input column data types. For multiple tabular inputs we use the sum of the estimated in-memory sizes of all inputs. The default value of inputCachedSizeMultiplier is 1.0; adjust it based on your estimate of the resources required and on how expensive the operation is.
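The estimate above can be sketched in code. This is an illustrative calculation only; `storageCoefficient` stands in for the coefficient X, whose actual values Alpine derives from the storage format, compression, and column types.

```scala
// Hypothetical sketch of the cached-input-size estimate: Y = X * inputCachedSizeMultiplier.
object InputSizeSketch {
  // Estimated in-memory size of one input, in MB.
  def estimatedCachedSizeMb(fileSizeMb: Double,
                            storageCoefficient: Double,
                            inputCachedSizeMultiplier: Double): Double =
    fileSizeMb * storageCoefficient * inputCachedSizeMultiplier

  // For multiple tabular inputs, sum the per-input estimates.
  // Each input is (fileSizeMb, storageCoefficient).
  def totalCachedSizeMb(inputs: Seq[(Double, Double)],
                        inputCachedSizeMultiplier: Double): Double =
    inputs.map { case (sizeMb, coeff) =>
      estimatedCachedSizeMb(sizeMb, coeff, inputCachedSizeMultiplier)
    }.sum
}
```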
The minimum size of the executor memory. This is designed to ensure that, for algorithms that perform very expensive computations on the executors, the executor memory is set reasonably large even if the cluster or the input data is relatively small.
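Applying such a floor amounts to a simple clamp. A minimal sketch, assuming a `minExecutorMemoryMb` option as described above (names are illustrative, not Alpine's API):

```scala
// Hypothetical sketch: clamp the computed executor memory to a configured floor.
object ExecutorMemorySketch {
  def effectiveExecutorMemoryMb(computedMb: Int, minExecutorMemoryMb: Int): Int =
    math.max(computedMb, minExecutorMemoryMb)
}
```

With a small cluster or small input the computed value may fall below the floor and is raised to it; otherwise the computed value is kept unchanged.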