Converts an Alpine-specific 'ColumnType' to the corresponding Spark SQL type. If no match can be found for the type, returns a string type rather than throwing an exception. Used to define data frame schemas.
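As a rough illustration of the fallback behaviour described above, the sketch below maps a type name to a Spark SQL type and falls back to `StringType` when nothing matches. `ColumnTypeSketch` and the type names it matches on are hypothetical stand-ins, not the real Alpine 'ColumnType' API; only the objects from `org.apache.spark.sql.types` are real Spark names.

```scala
import org.apache.spark.sql.types._

object ColumnTypeMappingSketch {
  // Hypothetical stand-in for an Alpine column type; the real class is not shown here.
  final case class ColumnTypeSketch(name: String)

  // Map a recognised type name to a Spark SQL type. Anything unrecognised
  // becomes StringType instead of raising an exception.
  def toSparkSqlType(columnType: ColumnTypeSketch): DataType =
    columnType.name match {
      case "Int"    => IntegerType
      case "Long"   => LongType
      case "Float"  => FloatType
      case "Double" => DoubleType
      case _        => StringType // fallback for unknown types
    }
}
```

Falling back to a string type keeps schema construction total: an unexpected column type degrades to text rather than aborting the whole job.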
Converts from a Spark SQL data type to an Alpine-specific column type.
Converts from a Spark SQL schema to the Alpine 'TabularSchema' type. The 'TabularSchema' object this method returns can be used to create any of the tabular Alpine IO types (HDFSTabular dataset, dataTable, etc.).
- a Spark SQL DataFrame schema
the equivalent Alpine schema for that dataset
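The schema conversion described above can be sketched as a field-by-field walk over a Spark `StructType`. `ColumnDefSketch`, `TabularSchemaSketch`, and the Alpine-side type names are hypothetical stand-ins introduced only for this example; `StructType`, `StructField`, and the data type objects are real Spark classes.

```scala
import org.apache.spark.sql.types._

object SchemaConversionSketch {
  // Hypothetical stand-ins for an Alpine column definition and 'TabularSchema'.
  final case class ColumnDefSketch(name: String, typeName: String)
  final case class TabularSchemaSketch(columns: Seq[ColumnDefSketch])

  // Assumed naming for the Alpine-side types; Spark types that are not
  // listed here are treated as "String".
  def toTypeName(dataType: DataType): String = dataType match {
    case IntegerType => "Int"
    case LongType    => "Long"
    case FloatType   => "Float"
    case DoubleType  => "Double"
    case _           => "String"
  }

  // Convert each StructField in order, keeping the column names unchanged.
  def fromSparkSchema(schema: StructType): TabularSchemaSketch =
    TabularSchemaSketch(
      schema.fields.toSeq.map(f => ColumnDefSketch(f.name, toTypeName(f.dataType)))
    )
}
```

For example, a schema built as `StructType(Seq(StructField("id", LongType), StructField("label", StringType)))` would yield two column definitions, in the same order as the Spark fields.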
Converts the Alpine 'TabularSchema', with its column names and types, to the equivalent Spark SQL data frame header.
An Alpine 'TabularSchemaOutline' object with fixed column definitions containing a name and Alpine specific type.
Checks whether the given file path already exists (which would cause a 'PathAlreadyExists' exception when we try to write to it) and, if it does, deletes the directory so that existing results at that path do not block the write.
- the full HDFS path
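A minimal sketch of this check-then-delete step, written against the Hadoop `FileSystem` API. The object and method names are hypothetical, and the actual implementation may differ; the sketch only shows one way to clear a path before writing.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object PathCleanupSketch {
  // Recursively deletes `fullPath` if it exists. Returns true when the path
  // existed and the delete succeeded, false when there was nothing to delete.
  def deleteIfExists(fullPath: String, conf: Configuration = new Configuration()): Boolean = {
    val path = new Path(fullPath)
    // Resolve the file system (HDFS, local, ...) from the path and configuration.
    val fs = path.getFileSystem(conf)
    if (fs.exists(path)) fs.delete(path, true) else false
  }
}
```

Because the delete is recursive, a helper like this should only be pointed at output locations that are safe to discard.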
For use with Hive. Returns a Spark data frame given a Hive table.
Returns a DataFrame from an Alpine HdfsTabularDataset. The DataFrame's schema will correspond to the column header of the Alpine dataset.
Spark SQL DataFrame
Write a DataFrame as an HDFSAvro dataset, and return an instance of the Alpine HDFSAvroDataset type, which contains the 'TabularSchema' definition (created by converting the DataFrame schema) and the path to the saved data.
Write a DataFrame to HDFS as a Parquet file, and return an instance of the HDFSParquet IO base type, which contains the Alpine 'TabularSchema' definition (created by converting the DataFrame schema) and the path to the saved data.
Write a DataFrame to HDFS as a Tabular Delimited file, and return an instance of the Alpine HDFS delimited tabular dataset type, which contains the Alpine 'TabularSchema' definition (created by converting the DataFrame schema) and the path to the saved data.
Save a data frame to a path using the given storage format, and return a corresponding HdfsTabularDataset object that points to the path.
The path to which we'll save the data frame.
The data frame that we want to save.
The format that we want to store in.
Whether to overwrite any existing file at the path.
Mandatory source operator information to be included in the output object.
Mandatory addendum information to be included in the output object.
After saving the data frame, returns an HdfsTabularDataset object.
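The save step described by these parameters can be approximated with Spark's `DataFrameWriter`. This is only a sketch under stated assumptions: it assumes Spark 2.x or later (for the `csv` writer), `StorageFormatSketch` and its cases are hypothetical stand-ins for the real storage format argument, and the operator information and addendum are not modelled. It returns the path where the real method would return an HdfsTabularDataset.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object SaveDataFrameSketch {
  // Hypothetical stand-in for the storage format argument.
  sealed trait StorageFormatSketch
  case object ParquetFormat extends StorageFormatSketch
  case object TsvFormat extends StorageFormatSketch

  // Overwrite replaces existing data; ErrorIfExists makes Spark fail
  // when the target path already exists.
  def saveModeFor(overwrite: Boolean): SaveMode =
    if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists

  // Writes the data frame in the requested format and returns the path.
  def save(
      path: String,
      dataFrame: DataFrame,
      format: StorageFormatSketch,
      overwrite: Boolean
  ): String = {
    val writer = dataFrame.write.mode(saveModeFor(overwrite))
    format match {
      case ParquetFormat => writer.parquet(path)
      case TsvFormat     => writer.option("sep", "\t").csv(path) // tab-delimited text
    }
    path
  }
}
```

Mapping the overwrite flag to a `SaveMode` lets Spark handle the existing-path case itself, as an alternative to deleting the directory beforehand.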
:: AlpineSdkApi ::