Apache Pig tutorial - Apache Pig SUBSTRING()



What is a substring in Apache Pig?

  • A substring of a string is a contiguous sequence of characters that occurs within it. For example, "the best of" is a substring of "It was the best of times".
  • This is not to be confused with subsequence, which is a generalization of substring. For example, "Itwastimes" is a subsequence of "It was the best of times", but not a substring.
  • The SUBSTRING() function in Pig returns a substring of the given string.
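
The difference between a substring and a subsequence can be checked with a few lines of code. The following is a minimal Python sketch, not Pig code; the helper names is_substring and is_subsequence are hypothetical and exist only for this illustration.

```python
def is_substring(part: str, whole: str) -> bool:
    """True if `part` occurs as one contiguous run inside `whole`."""
    return part in whole


def is_subsequence(part: str, whole: str) -> bool:
    """True if the characters of `part` appear in `whole` in order, gaps allowed."""
    remaining = iter(whole)
    # `ch in remaining` advances the iterator until it finds ch (or runs out),
    # so each character must be found after the previous match.
    return all(ch in remaining for ch in part)


sentence = "It was the best of times"
print(is_substring("the best of", sentence))   # True: contiguous
print(is_subsequence("Itwastimes", sentence))  # True: in order, with gaps
print(is_substring("Itwastimes", sentence))    # False: not contiguous
```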

Syntax:

  • Given below is the syntax of the SUBSTRING() function.
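  • The function accepts three parameters: the string to extract from (typically a column name), followed by the start and stop indexes of the required substring.
  • Indexes are zero-based. startIndex is the index of the first character of the substring, and stopIndex is the index of the character following its last character, so the character at stopIndex is not included.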
grunt> SUBSTRING(string, startIndex, stopIndex) 
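
To make the index convention concrete, here is a small Python sketch that mimics the call shape of SUBSTRING(). It is an illustration only: the function substring is a hypothetical stand-in, the null handling is an assumption, and it relies on Python slices being zero-based with an exclusive stop index. Non-negative indexes are assumed.

```python
def substring(s, start_index, stop_index):
    """Hypothetical stand-in for Pig's SUBSTRING(); not Pig code.

    start_index: zero-based index of the first character to keep.
    stop_index:  index of the character just after the last one kept.
    """
    if s is None:
        return None  # assumption: a null input yields a null result
    return s[start_index:stop_index]


# "BCD" occupies indexes 1, 2, 3, so the stop index is 4.
print(substring("ABCDEF", 1, 4))                     # BCD
# "the best of" starts at index 7 and its last character is at index 17.
print(substring("It was the best of times", 7, 18))  # the best of
```

The length of the result is always stopIndex minus startIndex, which is a quick way to sanity-check the arguments.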

Example:

  • Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/, as shown below. This file contains employee details such as id, name, age, and city.

wikitechy_emp.txt

001,Aadav,32,Tokyo
002,Aadhi,33,Kolkata
003,Charu,23,London
  • We have loaded this file into Pig as a relation named wikitechy_emp_data, as shown below.
grunt> wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, city:chararray);
  • Following is an example of the SUBSTRING() function. It fetches the substring of each employee name that starts at index 0 and ends at index 2, that is, the first three characters. Because stopIndex is the index of the character after the last one returned, it is set to 3.
grunt> substring_data = FOREACH wikitechy_emp_data GENERATE (id,name), SUBSTRING(name, 0, 3);
  • The above statement fetches the required substrings from the names of the employees. The result of the statement will be stored in the relation named substring_data.
  • Verify the content of the relation substring_data using the Dump operator, as shown below.
grunt> Dump substring_data;

((1,Aadav),Aad)
((2,Aadhi),Aad)
((3,Charu),Cha) 
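
As a cross-check of the output above, the following Python sketch mimics the FOREACH ... GENERATE step on the three sample rows. It is an illustration rather than Pig code: the rows are typed in by hand instead of being loaded from HDFS, and the variable names are hypothetical.

```python
rows = [
    "001,Aadav,32,Tokyo",
    "002,Aadhi,33,Kolkata",
    "003,Charu,23,London",
]

result = []
for row in rows:
    emp_id, name, age, city = row.split(",")
    # int("001") == 1, matching the id:int column in the Pig schema.
    # The slice 0:3 is end-exclusive, so it keeps indexes 0, 1, and 2.
    result.append(((int(emp_id), name), name[0:3]))

for record in result:
    print(record)
```

Each record pairs the (id, name) tuple with the three-character prefix of the name, which is the same nesting that Dump prints.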
