A more advanced method for adding Spark parameters. It also adds a "StorageLevel" parameter, which indicates what level of persistence to use within a Spark job. NOTE: the storage level parameter cannot be set automatically at runtime. To have any effect, the custom operator developer must implement RDD persistence with this value (retrievable with the 'getStorageLevel' method) in the Spark Job class of their operator. Takes:
- the default storage level, e.g. "NONE" or "MEMORY_AND_DISK".
- a list of additional Spark parameters.
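As a rough sketch, the helper plausibly behaves like the following; "DialogParam", "makeAdvancedSparkParams", and all field names are hypothetical stand-ins for the SDK's actual types, not confirmed by this documentation:

```scala
// Hypothetical sketch of the helper's shape; `DialogParam` and every name
// below are assumptions, not the SDK's real API.
final case class DialogParam(id: String, label: String, defaultValue: String)

def makeAdvancedSparkParams(
    defaultStorageLevel: String,          // e.g. "NONE" or "MEMORY_AND_DISK"
    additionalParams: List[DialogParam]   // extra Spark parameters to expose
): List[DialogParam] =
  DialogParam("storageLevel", "Storage Level", defaultStorageLevel) :: additionalParams
```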
Return true if the repartition parameter was added AND the user checked it; return false otherwise. Note that because this parameter is exposed as a checkbox by the Alpine engine, the value of the parameter will be either "true" or "false" (the string representation of a Java boolean).
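A minimal sketch of honoring that value inside a Spark Job class; the flag is assumed to arrive as the raw "true"/"false" string described above:

```scala
import org.apache.spark.rdd.RDD

// Sketch: repartition only when the user checked the "Repartition Data" box.
// The engine stores the checkbox value as the string "true" or "false".
def maybeRepartition[T](input: RDD[T], repartitionFlag: String, numPartitions: Int): RDD[T] =
  if (repartitionFlag.toBoolean) input.repartition(numPartitions)
  else input
```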
Retrieve the storage level parameter added via "makeStorageLevelParam" from the advanced parameters box. Return NONE if the parameter was not added. NOTE: this method does not validate the string, so if the user enters an invalid storage level, calling StorageLevel.fromString(s) on the result of this method will fail.
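Because the string is unvalidated, a Spark Job class may want to guard the parse. A sketch using Spark's StorageLevel.fromString; falling back to NONE is one possible policy, not prescribed by the SDK:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import scala.util.Try

// Parse the user-supplied string defensively: fall back to NONE
// (no persistence) instead of failing the whole job on a typo.
def persistWithLevel[T](rdd: RDD[T], raw: String): RDD[T] = {
  val level = Try(StorageLevel.fromString(raw)).getOrElse(StorageLevel.NONE)
  if (level == StorageLevel.NONE) rdd else rdd.persist(level)
}
```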
Retrieve the value of the "Number of Partitions" parameter added to the advanced Spark box. Note, however, that Alpine Auto Tuning determines an optimal number of partitions and assigns that value to spark.default.parallelism, so if this method returns None the custom operator developer should check whether that property is set before repartitioning.
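A sketch of that fallback chain; the Option[Int] argument stands in for the result of this retrieval method:

```scala
import org.apache.spark.SparkContext

// Prefer the user's explicit value; otherwise use spark.default.parallelism
// (which auto tuning may have set); finally fall back to Spark's own default.
def resolveNumPartitions(sc: SparkContext, userValue: Option[Int]): Int =
  userValue.getOrElse(
    sc.getConf.getInt("spark.default.parallelism", sc.defaultParallelism))
```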
Create a "Number of Partitions" parameter to let the user determine how many partitions should be used either when repartitioning the data (controlled by above param) or in when shuffling generally.
Create a "Number of Partitions" parameter to let the user determine how many partitions should be used either when repartitioning the data (controlled by above param) or in when shuffling generally. If this parameter is not set, a value will be selected by auto tuning. If a value is selected this value will be used to set the "spark.default.parallelism" parameter which controls the default number of parameter used in a wide transformation in Spark.
Create a "Repartition Data" checkbox to let the user determine whether the input data should be shuffled to increase the number of partitions.
Add the storage level parameter. The default must be the string representation of a Spark storage level, e.g. "MEMORY_AND_DISK".
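Since the retrieval method above never validates the string, it may be worth confirming at development time that a candidate default actually parses; a small check against Spark's real StorageLevel.fromString:

```scala
import org.apache.spark.storage.StorageLevel

// These candidate defaults all parse successfully...
Seq("NONE", "MEMORY_ONLY", "MEMORY_AND_DISK", "DISK_ONLY")
  .foreach(s => StorageLevel.fromString(s))
// ...whereas a typo such as "MEMORY_AND_DISC" throws IllegalArgumentException.
```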
Convenience functions for directly adding Spark-related options to the dialog window.