Apache Pig TOKENIZE() Function
What is the TOKENIZE() function in Apache Pig?
- The TOKENIZE() function in Apache Pig splits a string held in a single tuple and returns a bag containing the results of the split.
- The input string is broken into tokens at delimiter characters rather than at a regular expression match. By default the delimiters are space, double quote, comma, parentheses, and star (*); an optional second argument supplies a custom delimiter.
- Each token is placed in the output bag as its own single-field tuple.
- If the input string contains no delimiter, TOKENIZE() returns a bag with one token, which contains the whole input string.
- Each substring found between delimiter matches becomes a separate token in the bag.
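The two call forms described above can be sketched as follows. This is a minimal illustration only: the file name students.txt, its tab-separated layout, the field names, and the relation names are all assumptions made for the example and do not come from the tutorial's data. The second form also assumes a Pig version that accepts the optional delimiter argument.

```pig
-- Hypothetical input file 'students.txt' (tab-separated): id, name
students = LOAD 'students.txt' USING PigStorage('\t') AS (id:int, name:chararray);

-- Default delimiters: space, double quote, comma, parentheses, star
by_default = FOREACH students GENERATE id, TOKENIZE(name);

-- Optional second argument: use a hyphen as the delimiter instead
by_hyphen = FOREACH students GENERATE id, TOKENIZE(name, '-');

-- Each output row holds the id and a bag of one-field tuples
DUMP by_default;
DUMP by_hyphen;
```

For a name such as John Smith, the default form would yield the bag {(John),(Smith)}, while the hyphen form would leave the name as a single token because it contains no hyphen.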
We have loaded the file into Pig with the relation name wikitechy_student_details which is given below:
Tokenizing a String
We can use the TOKENIZE() function to split a string into tokens.
grunt> student_name_tokenize = FOREACH wikitechy_student_details GENERATE TOKENIZE(name);
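To inspect the result, something like the following sketch may help. It assumes the relation wikitechy_student_details is already loaded in the current session and that name is a chararray field; the relation student_name_parts and the alias name_part are hypothetical names introduced here for illustration.

```pig
-- Assumes wikitechy_student_details is already loaded with a chararray field 'name'
student_name_tokenize = FOREACH wikitechy_student_details GENERATE TOKENIZE(name);

-- Show the schema: one field holding a bag of single-field tuples
DESCRIBE student_name_tokenize;

-- Print one bag of tokens per input row
DUMP student_name_tokenize;

-- FLATTEN un-nests the bag so that each token becomes its own row
-- ('student_name_parts' and 'name_part' are hypothetical names)
student_name_parts = FOREACH wikitechy_student_details GENERATE FLATTEN(TOKENIZE(name)) AS name_part;
DUMP student_name_parts;
```

Combining FLATTEN with TOKENIZE() is the usual way to move from one bag per row to one token per row, for example before grouping and counting tokens.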