Apache Pig - STRSPLIT()



What is STRSPLIT() ?

  • The STRSPLIT() function splits a string around matches of a given delimiter, which is written as a regular expression, and returns the pieces as a tuple.

Syntax:

  • The syntax of STRSPLIT() is given below. The function accepts the string to be split, a regular expression that marks the split points, and an integer limit that caps the number of substrings returned.
  • The function scans the string and splits it at each match of the regular expression. With a positive limit n, it produces at most n substrings, and the last substring holds the unsplit remainder of the input.
grunt> STRSPLIT(string, regex, limit)
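
The effect of the limit is easiest to see on a value with more than one delimiter. The following is a minimal sketch, not taken from the example data: it assumes a hypothetical relation named lines with a single chararray field named line, and a row whose value is a_b_c. It also assumes that the limit follows the semantics of Java's String.split, where 0 means no cap on the number of substrings.

```pig
-- `lines` and `line` are hypothetical names used only for this sketch.
-- Assume one row of `lines` holds the value a_b_c.

-- limit = 2: at most two substrings, the remainder stays in the last one.
limited = FOREACH lines GENERATE STRSPLIT(line, '_', 2);    -- tuple: (a,b_c)

-- limit = 0: no cap, so every '_' is a split point.
unlimited = FOREACH lines GENERATE STRSPLIT(line, '_', 0);  -- tuple: (a,b,c)

-- The delimiter is a regular expression, so a metacharacter such as '.'
-- has to be escaped to be matched literally.
by_dot = FOREACH lines GENERATE STRSPLIT(line, '\\.', 0);
```

Because a_b_c contains no dot, the last statement would leave the value unsplit and return it as a one-field tuple.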

Example:

Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/ as shown below. This file contains employee details: id, name, age, and city.

wikitechy_emp.txt

001,Aadav_Ajay,32,Tokyo
002,Aadhi_Amla,33,Kolkata
003,Charu_Anu,23,London
004,Daya_Avika,35,London
005,Hansa_Dhanu,22,Bhuwaneshwar
006,Hena_Ela,21,Chennai
007,Robert_Garv,24,Bhuwaneshwar
008,Kali_Heena,20,Kolkata
009,Leena_Hiya,22,Chennai
010,Mahi_Jay,22,newyork
011,Priya_Jaya,23,Tokyo
012,Rahul_Kanak,20,newyork

We have loaded this file into Pig as a relation named wikitechy_emp_data, as shown below.

grunt> wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, city:chararray);

Following is an example of the STRSPLIT() function. In the wikitechy_emp.txt file, the name column holds each employee's name and surname separated by the delimiter '_'. In this example, we split the name from the surname using the STRSPLIT() function with a limit of 2.

grunt> strsplit_data = FOREACH wikitechy_emp_data GENERATE (id,name), STRSPLIT (name,'_',2);

The result of the statement will be stored in the relation named strsplit_data. Verify the content of the relation strsplit_data, using the Dump operator as shown below.

grunt> Dump strsplit_data;
  
((1,Aadav_Ajay),(Aadav,Ajay))
((2,Aadhi_Amla),(Aadhi,Amla))
((3,Charu_Anu),(Charu,Anu))
((4,Daya_Avika),(Daya,Avika))
((5,Hansa_Dhanu),(Hansa,Dhanu))
((6,Hena_Ela),(Hena,Ela))
((7,Robert_Garv),(Robert,Garv))
((8,Kali_Heena),(Kali,Heena))
((9,Leena_Hiya),(Leena,Hiya))
((10,Mahi_Jay),(Mahi,Jay))
((11,Priya_Jaya),(Priya,Jaya))
((12,Rahul_Kanak),(Rahul,Kanak))
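
In the output above, the split result is a nested tuple with unnamed fields. If the two parts are needed as ordinary columns, one common approach is to wrap the call in FLATTEN and name the fields with an AS clause. The sketch below assumes the wikitechy_emp_data relation loaded earlier. The relation name strsplit_flat and the field names first_name and surname are hypothetical, chosen only for illustration.

```pig
-- Assumes wikitechy_emp_data has been loaded as in the example above.
-- strsplit_flat, first_name and surname are hypothetical names.
strsplit_flat = FOREACH wikitechy_emp_data GENERATE
    id,
    FLATTEN(STRSPLIT(name, '_', 2)) AS (first_name:chararray, surname:chararray);

-- Inspect the resulting schema, then the data.
DESCRIBE strsplit_flat;
DUMP strsplit_flat;
```

With this form, later statements can refer to first_name and surname directly, for example in a FILTER or GROUP, instead of addressing the tuple fields by position.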
