Append contents to the given HDFS path.
The HDFS path that we want to append to.
OutputStream corresponding to the path.
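A minimal usage sketch, assuming the operation above is exposed as a method appendPath(path: String): OutputStream on the execution context (the method name is an assumption, not confirmed by this documentation):

import java.io.{OutputStream, PrintWriter}

// Hypothetical sketch: `appendPath` is an assumed method name for the append
// operation described above; `context` is a SparkExecutionContext instance.
def appendAuditLine(context: SparkExecutionContext, path: String, line: String): Unit = {
  val out: OutputStream = context.appendPath(path)
  val writer = new PrintWriter(out)
  try {
    writer.println(line)   // contents are appended after the existing data
  } finally {
    writer.close()         // closing flushes the stream back to HDFS
  }
}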
Create an HDFS path for writing.
The HDFS path that we want to create and write to.
Whether to overwrite the given path if it exists.
OutputStream corresponding to the path.
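A sketch of writing a new file, assuming a method createPath(path: String, overwrite: Boolean): OutputStream (the name and parameter order are assumptions):

import java.io.PrintWriter

// Hypothetical sketch: `createPath` is an assumed method name; overwrite = true
// replaces the file if it already exists.
def writeLines(context: SparkExecutionContext, path: String, lines: Seq[String]): Unit = {
  val writer = new PrintWriter(context.createPath(path, overwrite = true))
  try {
    lines.foreach(writer.println)
  } finally {
    writer.close()
  }
}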
Delete the given HDFS path.
The HDFS path that we want to delete.
If the path is a directory, whether to delete it recursively.
true if successful, false otherwise.
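A sketch of cleaning up a scratch directory, assuming a method deletePath(path: String, recursive: Boolean): Boolean (assumed name):

// Hypothetical sketch: `deletePath` is an assumed method name; recursive = true
// removes a directory together with its contents.
def cleanUpScratchDir(context: SparkExecutionContext, dir: String): Unit = {
  val deleted = context.deletePath(dir, recursive = true)
  if (!deleted) {
    println(s"Warning: could not delete $dir (it may not exist or the delete failed)")
  }
}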
Determine whether the given path exists in HDFS.
The path that we want to check.
true if it exists, false otherwise.
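A sketch that guards a destructive write behind the existence check, assuming methods exists(path: String): Boolean and deletePath as above (both names assumed):

// Hypothetical sketch: remove stale output before a job writes fresh results.
def ensureFreshOutput(context: SparkExecutionContext, outputPath: String): Unit = {
  if (context.exists(outputPath)) {
    context.deletePath(outputPath, recursive = true)
  }
}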
Returns the map of Spark parameters after the auto-tuning algorithm is applied. This is the final set of Spark properties, ready to be passed with the Spark job submission (except spark.job.name). It leverages:
- user-defined Spark properties at the operator level (Spark Advanced Settings box), workflow level, and data source level (in this order of precedence);
- the AutoTunerOptions set in the SparkJobConfiguration (the sparkConf parameter);
- auto-tuned parameters that were not user-specified.
Input type.
IO typed job class.
Input to the job. This automatically gets serialized.
Parameters into the job.
Spark job configuration.
Listener to pass to the job. The Spark job should be able to communicate directly with Alpine while it is running.
The map of relevant Spark properties after the auto-tuning algorithm has been applied.
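A sketch of inspecting the auto-tuned properties before submission; the method name getSparkAutoTunedParameters and the parameter types (drawn from the Alpine plugin SDK) are assumptions:

// Hypothetical sketch: log the final, merged Spark properties so the values that
// would actually be submitted (minus spark.job.name) are visible.
def logEffectiveSparkSettings[I <: IOBase](
    context: SparkExecutionContext,
    jobClass: Class[_ <: SparkIOTypedPluginJob[I, _]],
    input: I,
    params: OperatorParameters,
    sparkConf: SparkJobConfiguration,
    listener: OperatorListener): Unit = {
  val tuned: Map[String, String] =
    context.getSparkAutoTunedParameters(jobClass, input, params, sparkConf, listener)
  tuned.foreach { case (key, value) => println(s"$key = $value") }
}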
Create the directory path.
The directory path that we want to create.
true if it succeeds, false otherwise.
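A sketch that creates the directory only when it is missing, assuming methods mkdir(path: String): Boolean and exists as above (names assumed):

// Hypothetical sketch: fail fast if the directory cannot be created.
def prepareOutputDir(context: SparkExecutionContext, dir: String): Unit = {
  if (!context.exists(dir) && !context.mkdir(dir)) {
    throw new RuntimeException(s"Failed to create directory $dir")
  }
}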
Open an HDFS path for reading.
The HDFS path that we want to read from.
InputStream corresponding to the path.
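A sketch of line-based reading, assuming a method openPath(path: String): InputStream (assumed name):

import scala.io.Source

// Hypothetical sketch: wrap the InputStream in a Source to read the first n lines.
def readFirstLines(context: SparkExecutionContext, path: String, n: Int): Seq[String] = {
  val source = Source.fromInputStream(context.openPath(path))
  try {
    source.getLines().take(n).toList
  } finally {
    source.close()
  }
}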
This is intended to be the function for submitting an IO typed job to Spark. IO typed Spark jobs automatically serialize/deserialize inputs and outputs. TODO: Not supported yet.
Input type.
Output type.
The job type.
IO typed job class.
Input to the job. This automatically gets serialized.
Parameters into the job.
Spark job configuration.
Listener to pass to the job. The Spark job should be able to communicate directly with Alpine while it is running.
A submitted job object.
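For orientation only, a sketch of what a call could look like once this is supported; the method name submitJob, the result type SubmittedSparkJob, and the parameter types are all assumptions:

// Hypothetical sketch only: the documentation above marks this as not yet supported.
def runTypedJob[I <: IOBase, O <: IOBase](
    context: SparkExecutionContext,
    jobClass: Class[_ <: SparkIOTypedPluginJob[I, O]],
    input: I,
    params: OperatorParameters,
    sparkConf: SparkJobConfiguration,
    listener: OperatorListener): SubmittedSparkJob[O] = {
  // The input is serialized automatically; the listener lets the running Spark job
  // report progress back to Alpine while it executes.
  context.submitJob(jobClass, input, params, sparkConf, listener)
}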
This is a mock version of SparkExecutionContext, for use in tests. It defines the HDFSVisualModelHelper as HDFSVisualModelHelperMock and sets the chorusAPICaller according to the given argument.
It can be extended for different behaviour (e.g. mocking the file system).
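A sketch of wiring the mock into a test; the class and constructor names (SparkExecutionContextMock, ChorusAPICallerMock) are assumptions:

// Hypothetical sketch: obtain an execution context for tests without a real cluster.
object MockContextExample {
  def main(args: Array[String]): Unit = {
    val chorusAPICaller = new ChorusAPICallerMock()
    val context: SparkExecutionContext = new SparkExecutionContextMock(chorusAPICaller)
    // Pass `context` to the code under test; HDFS-backed visualization goes through
    // the HDFSVisualModelHelperMock that the mock context provides.
  }
}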