Override this method to define an output schema instead of using automatic inference.
Override this method to define the output schema by assigning fixed column definitions (see the sketch below). If you want a variable number of output columns, override the defineEntireOutputSchema method instead. The default implementation of this method returns the same columns as the input data.
The Alpine 'TabularSchema' for the input DataFrame.
The parameters of the operator, including values set by the user.
A list of Column definitions used to create the output schema.
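A minimal sketch of the fixed-column override described above, written against the Alpine PluginSDK. The class names (SparkDataFrameGUINode, ColumnDef, ColumnType, TabularSchema) come from that SDK, but the exact signatures, the getDefinedColumns accessor, and the MyOperatorJob job class are assumptions to verify against your SDK version:

```scala
import com.alpine.plugin.core.OperatorParameters
import com.alpine.plugin.core.io.{ColumnDef, ColumnType, TabularSchema}
import com.alpine.plugin.core.spark.templates.SparkDataFrameGUINode

// MyOperatorJob is a hypothetical SparkDataFrameJob implementing the runtime.
class MyOperatorGUINode extends SparkDataFrameGUINode[MyOperatorJob] {

  override def defineOutputSchemaColumns(
      inputSchema: TabularSchema,
      parameters: OperatorParameters): Seq[ColumnDef] = {
    // Keep the input columns (the default behavior) and append one fixed
    // result column; "score" and the Double type are illustrative only.
    inputSchema.getDefinedColumns ++ Seq(ColumnDef("score", ColumnType.Double))
  }
}
```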
Calls 'updateOutputSchema' when the parameters are changed.
If the connected inputs contain tabular schemas, this is where they can be accessed, each keyed by a unique ID.
The current parameter values passed to the operator.
This should be used to change the input/output schema, etc.
A status object indicating whether the inputs and/or parameters are valid. The default implementation assumes that they are.
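A hedged sketch of this callback inside the same hypothetical GUINode subclass. The OperatorStatus shape and the setOutputSchema call on OperatorSchemaManager match my reading of the PluginSDK but should be verified against your version:

```scala
import com.alpine.plugin.core.{OperatorParameters, OperatorSchemaManager, OperatorStatus}
import com.alpine.plugin.core.io.TabularSchema

// Inside the hypothetical MyOperatorGUINode class from the earlier sketch.
override def onInputOrParameterChange(
    inputSchemas: Map[String, TabularSchema],
    params: OperatorParameters,
    operatorSchemaManager: OperatorSchemaManager): OperatorStatus = {
  inputSchemas.values.headOption match {
    case None =>
      // No tabular input connected yet: report an invalid status with a message.
      OperatorStatus(isValid = false, msg = Some("Connect a tabular input first."))
    case Some(inputSchema) =>
      // Recompute the output schema from the input schema and publish it.
      operatorSchemaManager.setOutputSchema(
        defineEntireOutputSchema(inputSchema, params))
      OperatorStatus(isValid = true, msg = None)
  }
}
```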
This is invoked by the GUI to customize the operator output visualization after the operator finishes running (see the sketch below). Each output has an associated default visualization, but the developer can customize it here.
The parameter values passed to the operator.
This is the output from running the operator.
For creating visual models.
The visual model to be sent to the GUI for visualization.
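A sketch of a customized visualization, again inside the hypothetical subclass. The factory method names (createCompositeVisualModel, createDefaultVisualModel, createTextVisualization) and the HdfsTabularDataset output type are assumptions drawn from the PluginSDK and should be checked:

```scala
import com.alpine.plugin.core.OperatorParameters
import com.alpine.plugin.core.io.HdfsTabularDataset
import com.alpine.plugin.core.visualization.{VisualModel, VisualModelFactory}

// Inside the hypothetical MyOperatorGUINode class.
override def onOutputVisualization(
    params: OperatorParameters,
    output: HdfsTabularDataset,
    visualFactory: VisualModelFactory): VisualModel = {
  // Combine the default visualization of the output with a short text tab.
  val composite = visualFactory.createCompositeVisualModel()
  composite.addVisualModel("Output", visualFactory.createDefaultVisualModel(output))
  composite.addVisualModel("Notes",
    visualFactory.createTextVisualization("Produced by MyOperator."))
  composite
}
```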
Defines the parameters the user will be able to select (see the sketch below). The default implementation asks for the desired output format and output location.
The operator dialog, to which the developer can add input text boxes and other controls that define the UI for parameter inputs.
Before the operator's runtime executes, the developer should determine the underlying platform that the runtime will run against. E.g., an operator may have access to two different Hadoop clusters or multiple databases, but a runtime can run on only one platform. A default platform is used if nothing is done here.
This can be used to provide information about the nature of the input/output schemas, e.g., to define the output schema.
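A sketch of onPlacement for the hypothetical subclass. The addStringBox signature and the choice to leave the default platform in place are assumptions (the dialog API varies across SDK versions):

```scala
import com.alpine.plugin.core.{OperatorDataSourceManager, OperatorSchemaManager}
import com.alpine.plugin.core.dialog.OperatorDialog

// Inside the hypothetical MyOperatorGUINode class.
override def onPlacement(
    operatorDialog: OperatorDialog,
    operatorDataSourceManager: OperatorDataSourceManager,
    operatorSchemaManager: OperatorSchemaManager): Unit = {
  // Add a text box so the user can name the appended column; the argument
  // order (id, label, default, regex, width, height) is an assumption.
  operatorDialog.addStringBox("newColumnName", "New column name", "score", ".+", 0, 0)
  // Nothing is done with operatorDataSourceManager here, so the default
  // platform (e.g. the first registered Hadoop cluster) will be used.
}
```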
Controls the GUI of your Spark job. Through this you can specify a visualization for the output of your job and the parameters the user will need to set. It uses the provided operator to generate an updated schema; this works for most operators, but if not (e.g., your operator doesn't handle empty data, or its output schema depends on the input data), you will have to perform your own schema update (see the sketch below).
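When the output schema does depend on the input data, one option flagged above is to override defineEntireOutputSchema. In this hedged sketch, the getTabularDatasetSelectedColumns accessor, the TabularSchema(...) constructor, and the 'columnsToKeep' dialog parameter id are all assumptions:

```scala
// Inside the hypothetical MyOperatorGUINode class: a variable number of
// output columns, driven by a column-selection parameter in the dialog.
override def defineEntireOutputSchema(
    inputSchema: TabularSchema,
    params: OperatorParameters): TabularSchema = {
  // Assumed accessor returning (sourceName, selectedColumns).
  val selected = params.getTabularDatasetSelectedColumns("columnsToKeep")._2.toSet
  val kept = inputSchema.getDefinedColumns.filter(c => selected.contains(c.columnName))
  TabularSchema(kept)
}
```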