[Solved-1 Solution] Remove single quotes from data using Pig



Problem:

How can single quotes be removed from data using Pig?

Solution 1:

The approach below removes the ( and ) characters and the ' (single quote) character from your input.

  • Write the parentheses with a double backslash: \\( and \\).
  • The single quote ' is a special character in Pig (it delimits string literals), so it also has to be escaped. A double backslash is not enough here: the Pig parser reports an error for \\', most likely because it reads \\ as an escaped backslash and the following ' then closes the string early. That is why I used three backslashes, \\\', for the single quote.
  • [] is a character class; it matches exactly one character out of those listed inside the square brackets. Simply place the characters you want to match inside the brackets, in our case [()'].
  • The + symbol matches one or more occurrences of the preceding character class.
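
The pattern can be tried outside Pig as a quick sanity check. The sketch below assumes that Pig's REPLACE follows Java regular-expression semantics (as far as I know it delegates to Java's String.replaceAll) and that the Pig literal '[\\\'\\(\\)]+' reaches the regex engine as [\'\(\)]+. The class and constant names are made up for illustration.

```java
// Hypothetical demo class; not part of Pig.
class CharClassDemo {
    // Regex assumed to result from the Pig literal '[\\\'\\(\\)]+', i.e. [\'\(\)]+
    static final String ESCAPED = "[\\'\\(\\)]+";
    // The same character class without backslashes; inside [] these characters are literal in Java regex
    static final String PLAIN = "['()]+";

    // Remove every run of characters matched by the given regex
    static String strip(String regex, String s) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String line = "(10, 'ACCOUNTING', 'NEW YORK')";
        System.out.println(strip(ESCAPED, line)); // prints: 10, ACCOUNTING, NEW YORK
        System.out.println(strip(PLAIN, line));   // prints: 10, ACCOUNTING, NEW YORK
    }
}
```

Both variants give the same result in Java, which suggests the backslashes before ( and ) are harmless inside the character class; the escaping that really matters in Pig is the one for the single quote, because of the string-literal syntax.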

Here is the input:

(10, 'ACCOUNTING', 'NEW YORK')
(20, 'RESEARCH', 'DALLAS')
(30, 'SALES', 'CHICAGO')
(40, 'OPERATIONS', 'BOSTON')

PigScript1:

-- Load each record as a single string
A = LOAD 'input' AS (line:chararray);
-- Strip every single quote and parenthesis from the line
B = FOREACH A GENERATE REPLACE(line,'[\\\'\\(\\)]+','');
STORE B INTO 'output';

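PigScript2:

-- Split each record on commas into three columns
A = LOAD 'input' USING PigStorage(',') AS (col1:chararray,col2:chararray,col3:chararray);
-- Clean each column separately: ( from col1, quotes from col2, ) and quotes from col3
B = FOREACH A GENERATE REPLACE(col1,'[\\(]+',''),REPLACE(col2,'[\\\']',''),REPLACE(col3,'[\\)\\\']+','');
STORE B INTO 'output1' USING PigStorage(',');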

Output (the same for both scripts):

10, ACCOUNTING, NEW YORK
20, RESEARCH, DALLAS
30, SALES, CHICAGO
40, OPERATIONS, BOSTON
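
To see why the per-column script ends up with the same rows as the single-column one, the steps can be replayed in plain Java. This is only a sketch under two assumptions: that PigStorage(',') splits on every comma and joins the fields back with a comma when storing, and that REPLACE behaves like Java's String.replaceAll. The class and method names are hypothetical.

```java
// Hypothetical re-creation of the per-column script; not Pig code.
class PerColumnCheck {
    // Mimic LOAD ... PigStorage(','), three REPLACE calls, and STORE ... PigStorage(',')
    static String clean(String line) {
        String[] cols = line.split(",", -1);               // "(10", " 'ACCOUNTING'", " 'NEW YORK')"
        String c1 = cols[0].replaceAll("[\\(]+", "");      // drop the opening parenthesis
        String c2 = cols[1].replaceAll("[\\']", "");       // drop the single quotes
        String c3 = cols[2].replaceAll("[\\)\\']+", "");   // drop the quotes and closing parenthesis
        return String.join(",", c1, c2, c3);
    }

    public static void main(String[] args) {
        String[] input = {
            "(10, 'ACCOUNTING', 'NEW YORK')",
            "(20, 'RESEARCH', 'DALLAS')",
            "(30, 'SALES', 'CHICAGO')",
            "(40, 'OPERATIONS', 'BOSTON')"
        };
        for (String line : input) {
            System.out.println(clean(line));
        }
    }
}
```

The space after each comma in the output comes from the input itself: the split keeps the leading space in the second and third columns, and none of the patterns removes it.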
