Adds integer suffixes to column names to make sure none are duplicated in the list.
Adds integer suffixes to column names to make sure none are duplicated in the list. The returned Seq will be the same length and have column types in the same order as the original, but column names may be changed.
Seq of column definitions, which may contain duplicate names.
New Seq of column definitions with non-duplicate names.
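The behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `ColumnDef`, `DedupSketch`, `dedupColumnNames`, and the `_<n>` suffix format are all hypothetical, since the source shows neither the column definition type nor the exact suffix scheme.

```scala
// Hypothetical column definition; the real type is not shown in the source.
final case class ColumnDef(name: String, dataType: String)

object DedupSketch {
  // Returns a Seq of the same length with the same types in the same order.
  // The first occurrence of a name is kept; later occurrences get an
  // integer suffix (assumed format: name_1, name_2, ...).
  def dedupColumnNames(columns: Seq[ColumnDef]): Seq[ColumnDef] = {
    val used = scala.collection.mutable.Set.empty[String]
    columns.map { col =>
      var candidate = col.name
      var suffix = 1
      // Keep trying suffixes until the name has not been emitted yet.
      while (used.contains(candidate)) {
        candidate = s"${col.name}_$suffix"
        suffix += 1
      }
      used += candidate
      col.copy(name = candidate)
    }
  }
}
```

Tracking every emitted name in `used` means a generated name such as `a_1` can never collide with a name that was already emitted earlier in the list.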
Cleans up a string to make it suitable as a column name.
Cleans up a string to make it suitable as a column name. These names are used in Pig / SQL scripts, so they need to satisfy some constraints.
Cleaning includes: stripping accents; replacing non-alphanumeric characters with "_"; prefixing with an "_" if the string starts with a digit; and, optionally, truncating characters beyond the maximum length.
String to be sanitized.
Optional maximum length at which to truncate the string.
Processed string suitable for use as a column name.
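The cleaning steps listed above might look something like the sketch below. The names `SanitizeSketch`, `sanitizeColumnName`, and `maxLength` are hypothetical, and the sketch makes two assumptions that the source does not confirm: "alphanumeric" means ASCII letters and digits, and truncation is applied last, after the "_" prefix has been added.

```scala
import java.text.Normalizer

object SanitizeSketch {
  // Hypothetical signature; the real method name and parameter types
  // are not shown in the source.
  def sanitizeColumnName(raw: String, maxLength: Option[Int] = None): String = {
    // Strip accents: decompose characters (NFD), then drop combining marks.
    val stripped = Normalizer.normalize(raw, Normalizer.Form.NFD).replaceAll("\\p{M}", "")
    // Replace every character that is not an ASCII letter or digit with "_".
    val replaced = stripped.replaceAll("[^A-Za-z0-9]", "_")
    // Prefix with "_" if the name would otherwise start with a digit.
    val prefixed = if (replaced.headOption.exists(_.isDigit)) "_" + replaced else replaced
    // Optionally truncate (assumed to happen after prefixing).
    maxLength.fold(prefixed)(n => prefixed.take(n))
  }
}
```

Under these assumptions, `SanitizeSketch.sanitizeColumnName("2nd place!")` yields `_2nd_place_`, and passing `Some(4)` as the maximum length shortens that to `_2nd`.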
This class is a utility for defining features, in particular the output features of models.