Apache Pig BagToString() Function



What is the BagToString() Function in Apache Pig?

  • The BagToString() function concatenates the elements of a bag into a single string.
  • An optional delimiter can be placed between the values; if none is given, an underscore (_) is used.
  • The result is similar to what SQL's GROUP_CONCAT function produces.
  • Bags can be of arbitrary size, but strings in Java cannot: concatenating a very large bag will either exhaust available memory or exceed the maximum number of characters a string can hold.
  • Bags are unordered. To get a predictable order in the resulting string, sort the bag with the ORDER BY operator before passing it to BagToString().
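
The last point can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes the input file and schema used in the Example section of this tutorial, and the alias names ordered and dob_sorted are made up for the sketch.

```pig
-- Sketch only: path and schema are the ones used in the Example section.
dob = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_dateofbirth.txt' USING PigStorage(',')
   AS (day:int, month:int, year:int);

-- Sort the bag inside a nested FOREACH so the concatenation order is predictable.
-- 'ordered' and 'dob_sorted' are illustrative alias names.
dob_sorted = FOREACH (GROUP dob ALL) {
    ordered = ORDER dob BY year, month, day;
    GENERATE BagToString(ordered);
};

DUMP dob_sorted;
```

Sorting inside the nested block matters because the ordering is applied to the bag itself, immediately before it is concatenated.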

Syntax

BagToString(vals:bag [, delimiter:chararray])

Example

wikitechy_dateofbirth.txt

22,3,1990
23,11,1989
1,3,1998
2,6,1980
26,9,1989
  • The file wikitechy_dateofbirth.txt is loaded into Pig as the relation dob:
grunt> dob = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_dateofbirth.txt' USING PigStorage(',')
   as (day:int, month:int, year:int);

Converting Bag to String

  • The BagToString() function converts the contents of a bag to a string.
  • BagToString() needs a bag as input, so the dob relation must be grouped first. The group operation produces a bag that contains all the tuples of the relation.
  • Group the relation dob with the Group All operator and store the result in the relation group_dob:
grunt> group_dob = Group dob All;
  • Dumping group_dob shows the grouped relation:
grunt> Dump group_dob; 
(all,{(26,9,1989),(2,6,1980),(1,3,1998),(23,11,1989),(22,3,1990)})
  • The result is a single tuple whose second field is a bag holding every date of birth as a tuple.
  • Now convert the bag to a string with the BagToString() function. No delimiter is passed, so the default underscore is used:
grunt> dob_string = foreach group_dob Generate BagToString(dob);

Verification

grunt> Dump dob_string;

Output:

(26_9_1989_2_6_1980_1_3_1998_23_11_1989_22_3_1990)
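
For comparison, the optional second argument from the Syntax section replaces the default underscore. The following is a hedged sketch that assumes the group_dob relation created above; the aliases dob_csv, dates and dob_dates are hypothetical names chosen for illustration, and the nested CONCAT calls are just one way to format each date.

```pig
-- Assumes group_dob was created as shown above (Group dob All).

-- Pass ',' as the delimiter instead of relying on the default '_'.
dob_csv = FOREACH group_dob GENERATE BagToString(dob, ',');
DUMP dob_csv;

-- Format each tuple as a single day-month-year string first, then join the
-- formatted strings with ';' so the individual dates stay distinguishable.
dob_dates = FOREACH group_dob {
    dates = FOREACH dob GENERATE
        CONCAT(CONCAT((chararray)day, '-'),
               CONCAT((chararray)month, CONCAT('-', (chararray)year)));
    GENERATE BagToString(dates, ';');
};
DUMP dob_dates;
```

Neither statement sorts the bag, so the order of the values still depends on how the bag happens to be arranged.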

