Apache Pig STRSPLITTOBAG()




What is STRSPLITTOBAG()?

  • This function is similar to the STRSPLIT() function: it splits a string around a given delimiter, but returns the resulting substrings in a bag rather than in a tuple.

Syntax:

  • The syntax of STRSPLITTOBAG() is given below.
  • This function accepts the string to be split, a regular expression, and an integer limit (the maximum number of substrings the string should be split into).
  • The function scans the string and splits it wherever the regular expression matches, producing at most n substrings, where n is the value passed as limit.
grunt> STRSPLITTOBAG(string, regex, limit)
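The effect of the limit argument can be tried outside Pig with a small model. The following is only a sketch in Python, not Pig's implementation: the helper name strsplit_to_bag is hypothetical, the sample string is made up, and it assumes that a positive limit behaves like the limit of Java's String.split (at most that many substrings) and that the pattern has no capture groups. Limits below 1 are deliberately not modelled.

```python
import re

def strsplit_to_bag(text, regex, limit):
    """Approximate STRSPLITTOBAG(text, regex, limit) for a positive limit.

    Hypothetical helper: the bag is modelled as a list of one-field tuples.
    """
    if limit < 1:
        raise ValueError("this sketch only models positive limits")
    if limit == 1:
        # At most one substring: nothing is split.
        parts = [text]
    else:
        # At most `limit` substrings means at most `limit - 1` splits.
        parts = re.split(regex, text, maxsplit=limit - 1)
    return [(part,) for part in parts]

# With limit 2 the remainder after the first delimiter stays in one piece.
print(strsplit_to_bag("Aadav_Ajay_Kumar", "_", 2))
# With limit 3 every underscore in this string is used as a split point.
print(strsplit_to_bag("Aadav_Ajay_Kumar", "_", 3))
```

The first call yields two one-field tuples and the second yields three, which is why a limit of 2 is enough for the name_surname values used in the example.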

Example:

  • Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/, as shown below. This file contains employee details such as id, name, age, and city.

wikitechy_emp.txt

001,Aadav_Ajay,32,Tokyo
002,Aadhi_Amla,33,Kolkata
003,Charu_Anu,23,London
004,Daya_Avika,35,London
005,Hansa_Dhanu,22,Bhuwaneshwar
006,Hena_Ela,21,Chennai
007,Robert_Garv,24,Bhuwaneshwar
008,Kali_Heena,20,Kolkata
009,Leena_Hiya,22,Chennai
010,Mahi_Jay,22,newyork
011,Priya_Jaya,23,Tokyo
012,Rahul_Kanak,20,newyork
  • And, we have loaded this file into Pig with a relation named wikitechy_emp_data as shown below.
grunt> wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, city:chararray);
  • Following is an example of the STRSPLITTOBAG() function. In the wikitechy_emp.txt file, the name column holds the name and surname of each employee, separated by the delimiter "_".
  • In this example, we split the name and surname of each employee and get the result in a bag using the STRSPLITTOBAG() function.
grunt> strsplittobag_data = FOREACH wikitechy_emp_data GENERATE (id,name), STRSPLITTOBAG (name,'_',2);
  • The result of the statement is stored in the relation named strsplittobag_data. Verify the content of the relation strsplittobag_data using the Dump operator, as shown below.
grunt> Dump strsplittobag_data;
((1,Aadav_Ajay),{(Aadav),(Ajay)})
((2,Aadhi_Amla),{(Aadhi),(Amla)})
((3,Charu_Anu),{(Charu),(Anu)})
((4,Daya_Avika),{(Daya),(Avika)})
((5,Hansa_Dhanu),{(Hansa),(Dhanu)})
((6,Hena_Ela),{(Hena),(Ela)})
((7,Robert_Garv),{(Robert),(Garv)})
((8,Kali_Heena),{(Kali),(Heena)})
((9,Leena_Hiya),{(Leena),(Hiya)})
((10,Mahi_Jay),{(Mahi),(Jay)})
((11,Priya_Jaya),{(Priya),(Jaya)})
((12,Rahul_Kanak),{(Rahul),(Kanak)})
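
The shape of each output row (a tuple of id and name, followed by a bag of one-field tuples) can be reproduced with a short Python sketch. This is an illustration under assumptions, not Pig itself: load_record and dump_row are hypothetical helpers, the three records are illustrative lines built from the names in the output above, and the limit is assumed to cap the number of substrings as in Java's String.split.

```python
import re

# Illustrative records (assumed), using names that appear in the output above.
RAW_LINES = [
    "001,Aadav_Ajay,32,Tokyo",
    "002,Aadhi_Amla,33,Kolkata",
    "003,Charu_Anu,23,London",
]

def load_record(line):
    """Mimic loading with PigStorage(',') and the schema
    (id:int, name:chararray, age:int, city:chararray)."""
    emp_id, name, age, city = line.split(",")
    # Casting to int is why the id 001 is shown as 1.
    return int(emp_id), name, int(age), city

def dump_row(record, regex="_", limit=2):
    """Render one row roughly the way Dump shows a (tuple, bag) pair."""
    emp_id, name, _age, _city = record
    if limit == 1:
        parts = [name]
    else:
        # At most `limit` substrings means at most `limit - 1` splits.
        parts = re.split(regex, name, maxsplit=limit - 1)
    bag = ",".join(f"({part})" for part in parts)
    return f"(({emp_id},{name}),{{{bag}}})"

rows = [dump_row(load_record(line)) for line in RAW_LINES]
for row in rows:
    print(row)
```

Each substring ends up in its own one-field tuple inside the braces, which is what distinguishes the bag returned here from the single tuple that STRSPLIT() returns.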
