[Solved-1 Solution] Applying TRIM() in Pig for all fields in a tuple?



TRIM()

  • The TRIM() function in Pig accepts a string and returns a copy of it with the leading and trailing whitespace removed.

Syntax

  • The syntax of the TRIM() function is shown below.
grunt> TRIM(expression)

Example

  • Assume we have some unwanted spaces before and after the names of the employees in the records of the pers_data relation.
grunt> Dump pers_data; 
 
(1, Robin ,22,newyork)
(2,BOB,23,Kolkata) 
(3, Maya ,23,Tokyo)
(4,Sara,25,London)
(5, David ,23,Bhuwaneshwar) 
(6,maggy,22,Chennai)
(7,Robert,22,newyork) 
(8, Syam ,23,Kolkata)
  • By using the TRIM() function, we can remove these leading and trailing spaces from the names, as shown below.
grunt> trim_data = FOREACH pers_data GENERATE (id,name), TRIM(name);
  • The above statement returns a copy of each name with the leading and trailing spaces removed. The result is stored in the relation named trim_data.

Problem:

Suppose you are loading a CSV file with 56 fields and need to apply the TRIM() function to every field in the tuple. A first attempt might be:

B = FOREACH A GENERATE TRIM(*);

But it fails with the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function
for org.apache.pig.builtin.TRIM as multiple or none of them fit. Please use an explicit cast.

Solution 1:

  • To trim the fields of a tuple in Pig, we can create a UDF. Register the UDF, then apply it in a FOREACH statement to each field of the tuple that needs trimming.

Below is the code for a UDF that trims the string field passed to it.

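import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig UDF that returns its first argument with leading and trailing whitespace removed.
public class StrTrim extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // No arguments, or a null field: there is nothing to trim.
        if (input == null || input.size() == 0 || input.get(0) == null)
            return null;
        try {
            String str = (String) input.get(0);
            return str.trim();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row", e);
        }
    }
}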
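
Note that the UDF above trims only the single field it is given, so it still has to be applied field by field. The loop that a "trim every field" UDF would need can be sketched in plain Java. This is only an illustration under stated assumptions: a List stands in for Pig's Tuple, and the class name TrimAllFields and the method trimAll are hypothetical, not part of Pig.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: a List stands in for Pig's Tuple (an assumption, not the Pig API).
class TrimAllFields {
    // Returns a new list in which every String field is trimmed;
    // nulls and non-String fields are copied unchanged.
    static List<Object> trimAll(List<?> fields) {
        if (fields == null) {
            return null;
        }
        List<Object> out = new ArrayList<>(fields.size());
        for (Object field : fields) {
            if (field instanceof String) {
                out.add(((String) field).trim());
            } else {
                out.add(field);
            }
        }
        return out;
    }
}
```

For example, calling trimAll on the fields (1, " Robin ", 22, "newyork") gives (1, "Robin", 22, "newyork"). In a real UDF, the same loop would read each field from the input Tuple and write the result into a new output Tuple.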
