Given a DataFrame, the parameters, and an instance of SparkRuntimeUtils, filters out all the rows containing null values. Writes those rows to a file according to the values of the 'dataToWriteParam' and the 'badDataPathParam' (provided in the HdfsParameterUtils class). The method returns the DataFrame with the null-containing rows removed, along with a string containing an HTML-formatted table describing what data was removed and if/where it was stored. The message is generated using the 'AddendumWriter' object in the Plugin Core module.
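As a minimal sketch, this is the core null-row split the method performs (the helper name is an assumption; the real method also writes the bad rows out and builds the report message):

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Sketch only: separate rows containing any null field from clean rows.
// The real filterBadDataAndReport additionally writes the bad rows
// (per 'dataToWriteParam'/'badDataPathParam') and generates the report.
def splitOnNulls(df: DataFrame): (DataFrame, DataFrame) = {
  val goodRows = df.filter((row: Row) => !row.anyNull) // rows with no nulls
  val badRows  = df.filter((row: Row) => row.anyNull)  // rows to report/write
  (goodRows, badRows)
}
```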
Same as 'filterBadDataAndReport', but rather than using the .anyNull function in the Row class, allows the user to define a function that returns a Boolean for each row indicating whether it contains bad data.
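For illustration, a hypothetical predicate of the shape this variant accepts (the "age" column is an assumption made up for the example):

```scala
import org.apache.spark.sql.Row

// Hypothetical bad-row test: flag rows whose "age" field is null or negative.
val rowIsBad: Row => Boolean = row =>
  row.isNullAt(row.fieldIndex("age")) || row.getAs[Int]("age") < 0
```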
Helper function that uses the 'AddendumWriter' object to generate a message about the bad data and to collect the data, if any, to write to the bad data file.
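The 'AddendumWriter' API is not shown in this documentation; as a rough sketch under that caveat, the generated message might be assembled along these lines:

```scala
// Rough sketch of an HTML summary like the one AddendumWriter produces
// (the exact table format and API are assumptions, not the actual output).
def badDataSummary(totalRows: Long, badRows: Long, path: Option[String]): String = {
  val location = path.map(p => s"Bad data written to: $p")
    .getOrElse("Bad data was not stored.")
  s"""<table>
     |<tr><td>Input rows</td><td>$totalRows</td></tr>
     |<tr><td>Rows removed</td><td>$badRows</td></tr>
     |</table>
     |<p>$location</p>""".stripMargin
}
```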
Rather than filtering a DataFrame, use this method if you already have the bad data as a DataFrame.
Split a DataFrame according to the value of the 'rowIsBad' parameter.
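A sketch of such a split, assuming 'rowIsBad' is a row-level predicate; caching the input avoids recomputing it for the two filter passes:

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Sketch: split into (good, bad) halves; both filters scan the same
// input, so cache it once rather than recomputing the source twice.
def splitDataFrame(df: DataFrame, rowIsBad: Row => Boolean): (DataFrame, DataFrame) = {
  val cached = df.cache()
  (cached.filter((r: Row) => !rowIsBad(r)), cached.filter((r: Row) => rowIsBad(r)))
}
```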
Rather than filtering the data, just provide an RDD of Strings containing the bad data; the data and report are then written according to the values of the other parameters.
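Writing an RDD of Strings is standard Spark; a minimal sketch (the function and parameter names here are assumptions, and the real method also produces the report):

```scala
import org.apache.spark.rdd.RDD

// Sketch: persist pre-collected bad rows as plain text on HDFS.
// Spark writes one part file per partition under badDataPath.
def writeBadData(badData: RDD[String], badDataPath: String): Unit =
  badData.saveAsTextFile(badDataPath)
```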