com.alpine.plugin.core.spark

AutoTunerOptions

object AutoTunerOptions extends Serializable

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. def apply(driverMemoryFraction: Double, fileSizeMultiplier: Double): AutoTunerOptions

    Create a simple version of the AutoTuner options. This includes two basic constants used in the auto-tuning process.

    driverMemoryFraction

    If the data is large, we set the executors to the size of a YARN container. By default, the driver is set to the same size as the executors. However, if your computation does not return much data to the driver, you may not need that much driver memory. This scalar scales the size of the driver. If you have a highly parallelizable algorithm that does not return much data to the driver (say, KMeans), try setting it to less than one. If you want the driver memory to be larger even on small input data (perhaps for an algorithm that aggregates by a group and returns data to the driver), try setting it to more than one.

    fileSizeMultiplier

    In auto-tuning, we try to use enough executors that the input data can be held in memory in the compute layer of the executors. We estimate the in-memory size of the input data by multiplying the file size (which you can see by right-clicking a dataset in Alpine and opening the "Hadoop File Properties" dialog) by this scalar.

    The default is 4; that is, we assume that you are using all of the input data and that it is perhaps a dataset of integers, which take roughly four times as much space in memory as on disk. Adjust this value based on your estimate of the resources required. For example, if your operation immediately filters the data down to a few integer columns, a better multiplier might be (selectedColumns / totalColumns) * 4.

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. val driverMemoryFractionId: String

  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. val fileSizeMultiplierId: String

  13. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  14. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. final def notify(): Unit

    Definition Classes
    AnyRef
  19. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  20. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  21. def toString(): String

    Definition Classes
    AnyRef → Any
  22. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
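The `driverMemoryFraction` parameter of `apply` (item 6 above) scales the driver relative to the executors, which by default are the same size. The Scala sketch below models only that documented arithmetic, assuming the executor size (for large data, one YARN container) is known in megabytes. The object `DriverMemorySketch` and the method `scaledDriverMemoryMb` are hypothetical and are not part of the Alpine plugin API.

```scala
// Hypothetical sketch: the driver defaults to the executor size and is then
// scaled by driverMemoryFraction. This is NOT the plugin's actual implementation.
object DriverMemorySketch {
  /** Scale the executor memory (MB) by driverMemoryFraction to get driver memory (MB). */
  def scaledDriverMemoryMb(executorMemoryMb: Long, driverMemoryFraction: Double): Long = {
    require(executorMemoryMb > 0, "executor memory must be positive")
    require(driverMemoryFraction > 0.0, "driverMemoryFraction must be positive")
    // math.round(Double) returns a Long, rounding to the nearest megabyte.
    math.round(executorMemoryMb * driverMemoryFraction)
  }
}

// A KMeans-like job that returns little data to the driver: shrink the driver.
val smallDriver = DriverMemorySketch.scaledDriverMemoryMb(4096L, 0.5) // 2048
// A group-by that returns results to the driver: grow the driver.
val largeDriver = DriverMemorySketch.scaledDriverMemoryMb(4096L, 1.5) // 6144
println(s"driver memory: $smallDriver MB (fraction 0.5), $largeDriver MB (fraction 1.5)")
```

Passing 1.0 reproduces the default behavior described above, in which the driver matches the executor size.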

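The `fileSizeMultiplier` parameter of `apply` (item 6 above) converts an on-disk file size into an estimate of the in-memory size, which auto-tuning then uses to choose how many executors to request. The sketch below reproduces the documented default of 4 and the column-adjusted multiplier (selectedColumns / totalColumns) * 4. The executor-count rule (divide the estimate by the compute memory per executor and round up) is an assumption about how "enough executors" might be computed, and `ExecutorSizingSketch` with its methods is hypothetical, not part of the Alpine plugin API.

```scala
// Hypothetical sketch of the sizing arithmetic described for fileSizeMultiplier.
// The executor-count rule is an ASSUMPTION, not the plugin's implementation.
object ExecutorSizingSketch {
  /** Documented default: data takes roughly 4x as much space in memory as on disk. */
  val DefaultFileSizeMultiplier: Double = 4.0

  /** Estimated in-memory size (MB): on-disk file size times the multiplier. */
  def estimatedInMemoryMb(fileSizeMb: Double, fileSizeMultiplier: Double): Double =
    fileSizeMb * fileSizeMultiplier

  /** Suggested multiplier when only some columns are used: (selected / total) * 4. */
  def columnAdjustedMultiplier(selectedColumns: Int, totalColumns: Int): Double = {
    require(totalColumns > 0, "totalColumns must be positive")
    require(selectedColumns >= 0 && selectedColumns <= totalColumns, "invalid column count")
    selectedColumns.toDouble / totalColumns * DefaultFileSizeMultiplier
  }

  /** Assumed rule: enough executors that their combined compute memory holds the estimate. */
  def executorsNeeded(inMemoryMb: Double, computeMemoryPerExecutorMb: Double): Int = {
    require(computeMemoryPerExecutorMb > 0, "compute memory per executor must be positive")
    math.max(1, math.ceil(inMemoryMb / computeMemoryPerExecutorMb).toInt)
  }
}

// A 10240 MB file read in full with the default multiplier.
val fullMb = ExecutorSizingSketch.estimatedInMemoryMb(10240.0, ExecutorSizingSketch.DefaultFileSizeMultiplier) // 40960.0
// The same file filtered to 5 of 20 columns: multiplier 5 / 20 * 4 = 1.0.
val filteredMultiplier = ExecutorSizingSketch.columnAdjustedMultiplier(5, 20) // 1.0
val filteredMb = ExecutorSizingSketch.estimatedInMemoryMb(10240.0, filteredMultiplier) // 10240.0
// Assume each executor has 3072 MB available for computation.
println(ExecutorSizingSketch.executorsNeeded(fullMb, 3072.0))     // 14
println(ExecutorSizingSketch.executorsNeeded(filteredMb, 3072.0)) // 4
```

Rounding up rather than down keeps the combined compute memory of the chosen executors at or above the estimated in-memory size, under the assumption stated above.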