Apache Pig STRSPLITTOBAG()



What is STRSPLITTOBAG()?

  • This function is similar to the STRSPLIT() function: it splits a string around matches of a given delimiter, but returns the resulting substrings in a bag rather than a tuple.

Syntax:

  • The syntax of STRSPLITTOBAG() is given below.
  • This function accepts the string to be split, a regular expression used as the delimiter, and an integer limit that caps the number of substrings produced.
  • The function scans the string and splits it wherever the regular expression matches, producing at most n substrings, where n is the value passed as limit. The last substring holds whatever remains of the input.
grunt> STRSPLITTOBAG(string, regex, limit)
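The effect of the limit argument can be sketched outside Pig. The Python function below is a rough stand-in, not Pig's implementation: the name strsplit_to_bag is hypothetical, and it assumes the limit follows the same convention as Java's String.split(regex, limit) (a positive limit caps the number of pieces, zero drops trailing empty strings, a negative value keeps everything). Edge cases such as empty input or zero-width matches may behave differently in Pig.

```python
import re

def strsplit_to_bag(text, regex, limit=0):
    """Rough Python stand-in for STRSPLITTOBAG (hypothetical helper, not Pig code)."""
    if limit == 1:
        # A limit of 1 allows no splits at all.
        parts = [text]
    elif limit > 1:
        # At most `limit` pieces, so the pattern is applied at most limit - 1 times.
        parts = re.split(regex, text, maxsplit=limit - 1)
    else:
        # Zero or negative limit: split on every match.
        parts = re.split(regex, text)
        if limit == 0:
            # Assumed convention: a zero limit discards trailing empty strings.
            while parts and parts[-1] == "":
                parts.pop()
    # Model the bag as a list of one-field tuples.
    return [(part,) for part in parts]

print(strsplit_to_bag("Aadav_Ajay", "_", 2))  # [('Aadav',), ('Ajay',)]
print(strsplit_to_bag("a_b_c", "_", 2))       # [('a',), ('b_c',)]
```

With a limit of 2, a value containing more than one delimiter still yields only two pieces, and the second piece keeps the rest of the string.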

Example:

  • Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/, as shown below. This file contains employee details: id, name, age, and city.

wikitechy_emp.txt

001,Aadav_Ajay,32,Tokyo
002,Aadhi_Amla,33,Kolkata
003,Charu_Anu,23,London
004,Daya_Avika,35,London
005,Hansa_Dhanu,22,Bhuwaneshwar
006,Hena_Ela,21,Chennai
007,Robert_Garv,24,Bhuwaneshwar
008,Kali_Heena,20,Kolkata
009,Leena_Hiya,22,Chennai
010,Mahi_Jay,22,newyork
011,Priya_Jaya,23,Tokyo
012,Rahul_Kanak,20,newyork
  • We have loaded this file into Pig as a relation named wikitechy_emp_data, as shown below.
grunt> wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, city:chararray);
  • Following is an example of the STRSPLITTOBAG() function. In the name column of the wikitechy_emp.txt file, the name and surname of each employee are separated by the delimiter "_".
  • In this example, we split the name and surname of each employee and get the result in a bag using the STRSPLITTOBAG() function.
grunt> strsplittobag_data = FOREACH wikitechy_emp_data GENERATE (id,name), STRSPLITTOBAG (name,'_',2);
  • The result of the statement is stored in the relation named strsplittobag_data. Verify the content of the relation using the Dump operator, as shown below.
grunt> Dump strsplittobag_data;
((1,Aadav_Ajay),{(Aadav),(Ajay)})
((2,Aadhi_Amla),{(Aadhi),(Amla)})
((3,Charu_Anu),{(Charu),(Anu)})
((4,Daya_Avika),{(Daya),(Avika)})
((5,Hansa_Dhanu),{(Hansa),(Dhanu)})
((6,Hena_Ela),{(Hena),(Ela)})
((7,Robert_Garv),{(Robert),(Garv)})
((8,Kali_Heena),{(Kali),(Heena)})
((9,Leena_Hiya),{(Leena),(Hiya)})
((10,Mahi_Jay),{(Mahi),(Jay)})
((11,Priya_Jaya),{(Priya),(Jaya)})
((12,Rahul_Kanak),{(Rahul),(Kanak)})
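
To see how each output line is assembled, the short Python sketch below rebuilds the first three lines from hand-copied rows. It is only an illustration of the output structure, not Pig code: format_row is a hypothetical helper, and the underscore split with maxsplit=1 is assumed to correspond to the limit of 2 used in the FOREACH statement.

```python
import re

# Three rows in the shape of the example's (id, name, age, city) relation.
rows = [
    (1, "Aadav_Ajay", 32, "Tokyo"),
    (2, "Aadhi_Amla", 33, "Kolkata"),
    (3, "Charu_Anu", 23, "London"),
]

def format_row(emp_id, name):
    """Hypothetical helper: render one row the way the Dump output above looks."""
    # A limit of 2 in the Pig call means at most one split here.
    pieces = re.split("_", name, maxsplit=1)
    # Each piece becomes its own one-field tuple inside the bag.
    bag_text = ",".join(f"({piece})" for piece in pieces)
    return f"(({emp_id},{name}),{{{bag_text}}})"

lines = [format_row(emp_id, name) for emp_id, name, _age, _city in rows]
for line in lines:
    print(line)
```

Each line pairs the (id,name) tuple produced by the first GENERATE expression with the bag produced by the second.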
