


What is INDEXOF() in Apache Pig?

  • The INDEXOF() function returns the position of the first occurrence of a specified character in a string.
  • It returns -1 if the character never occurs. Note: INDEXOF() is case-sensitive, so 'r' does not match 'R'. Tip: Pig also provides LAST_INDEX_OF(), which returns the position of the last occurrence.
  • INDEXOF() accepts a string value, a character and a start index (integer). It returns the zero-based index of the first occurrence of the given character in the string, searching forward from the start index.
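  • The forward, case-sensitive search described above can be sketched in Java. The helper below is hypothetical and is not Pig's own source. It assumes INDEXOF behaves like a character-by-character scan that starts at startIndex. For a one-character search, that should agree with Java's String.indexOf(String, int).

```java
// Hypothetical sketch of INDEXOF(string, 'character', startIndex) semantics.
// Not Pig's actual source; it models a forward, case-sensitive search.
final class IndexOfScanSketch {

    static int indexOf(String str, char target, int startIndex) {
        // Begin at startIndex; a negative start is treated as 0,
        // which is also what Java's String.indexOf does.
        for (int i = Math.max(startIndex, 0); i < str.length(); i++) {
            // Exact char comparison, so 'R' does not match 'r'.
            if (str.charAt(i) == target) {
                return i; // zero-based position of the first match
            }
        }
        return -1; // the character never occurs at or after startIndex
    }

    public static void main(String[] args) {
        System.out.println(indexOf("Charu", 'r', 0));  // 3
        System.out.println(indexOf("Robert", 'r', 0)); // 4 (the leading 'R' is skipped)
        System.out.println(indexOf("Rahul", 'r', 0));  // -1
        // Cross-check against the standard library for the same input.
        System.out.println("Robert".indexOf("r", 0));  // 4
    }
}
```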

Syntax:

  • Given below is the syntax of the INDEXOF() function.
grunt> INDEXOF(string, 'character', startIndex)
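  • The startIndex argument matters when a character occurs more than once. The Java sketch below is a hypothetical stand-in; it assumes INDEXOF matches Java's String.indexOf. It repeats the search, each time restarting one position past the previous hit, to list every occurrence of 'e' in "Leena". Under that assumption, the Pig equivalents would be INDEXOF(name, 'e', 0) returning 1, INDEXOF(name, 'e', 2) returning 2, and INDEXOF(name, 'e', 3) returning -1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: find every occurrence of a character by moving
// startIndex forward, assuming INDEXOF behaves like String.indexOf.
final class StartIndexSketch {

    static List<Integer> allIndexes(String str, char target) {
        List<Integer> positions = new ArrayList<>();
        int from = 0;
        while (true) {
            int pos = str.indexOf(target, from); // like INDEXOF(str, target, from)
            if (pos == -1) {
                break; // no more matches at or after 'from'
            }
            positions.add(pos);
            from = pos + 1; // resume the search just past this match
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(allIndexes("Leena", 'e')); // [1, 2]
    }
}
```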

Example:

  • Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/, as shown below. This file contains employee details: id, name, age, and city.

wikitechy_emp.txt

001,Aadav,32,Tokyo
002,Aadhi,33,Kolkata
003,Charu,23,London
004,Daya,35,London
005,Hansa,22,Bhuwaneshwar
006,Hena,21,Chennai
007,Robert,24,Bhuwaneshwar
008,Kali,20,Kolkata
009,Leena,22,Chennai
010,Mahi,22,newyork
011,Priya,23,Tokyo
012,Rahul,20,newyork
  • We have loaded this file into Pig as a relation named wikitechy_emp_data, as shown below.
grunt> wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt' USING PigStorage(',')
   as (id:int, name:chararray, age:int, city:chararray);
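  • Roughly, PigStorage(',') splits each line on commas, and the AS clause casts the fields to the declared types. That cast is why the id 001 appears as 1 in the output later. The Java sketch below models this for a single well-formed line. It is a simplified, hypothetical stand-in, and its class and method names are invented for illustration. Real PigStorage also has to deal with missing fields and bad values. This sketch simply throws in those cases.

```java
// Hypothetical, simplified model of:
//   LOAD '...' USING PigStorage(',')
//     AS (id:int, name:chararray, age:int, city:chararray);
// Real PigStorage handles missing fields and bad values; this sketch throws instead.
final class EmpLoadSketch {

    // One tuple of the wikitechy_emp_data relation.
    static final class Emp {
        final int id;
        final String name;
        final int age;
        final String city;

        Emp(int id, String name, int age, String city) {
            this.id = id;
            this.name = name;
            this.age = age;
            this.city = city;
        }

        @Override
        public String toString() {
            return "(" + id + "," + name + "," + age + "," + city + ")";
        }
    }

    static Emp parse(String line) {
        String[] fields = line.split(",", -1); // split on the ',' delimiter
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        // Casting "001" to int yields 1, so leading zeros disappear.
        return new Emp(Integer.parseInt(fields[0]), fields[1],
                       Integer.parseInt(fields[2]), fields[3]);
    }

    public static void main(String[] args) {
        System.out.println(parse("001,Aadav,32,Tokyo")); // (1,Aadav,32,Tokyo)
    }
}
```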
  • Given below is an example of the INDEXOF() function. In this example, we find the first occurrence of the letter 'r' in each employee's name.
grunt> indexof_data = FOREACH wikitechy_emp_data GENERATE (id,name), INDEXOF(name, 'r', 0);
  • The above statement scans the name of each employee, starting at index 0, and returns the index at which the letter 'r' first occurs. If the name does not contain the letter 'r', it returns -1. Because the search is case-sensitive, the uppercase 'R' in Robert and Rahul is not matched.
  • The result of the statement will be stored in the relation named indexof_data. Verify the content of the relation indexof_data, using the Dump operator as shown below.
grunt> Dump indexof_data;
((1,Aadav),-1)     
((2,Aadhi),-1)
((3,Charu),3)
((4,Daya),-1)
((5,Hansa),-1)
((6,Hena),-1)
((7,Robert),4)
((8,Kali),-1)
((9,Leena),-1)
((10,Mahi),-1)
((11,Priya),1)
((12,Rahul),-1) 
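  • To check the Dump output by hand, the Java sketch below replays the FOREACH ... GENERATE statement over the twelve records. It uses String.indexOf(String, int) as an assumed stand-in for INDEXOF. The class name is hypothetical, and the records are hard-coded rather than read from HDFS. It prints the same twelve lines shown above. That includes -1 for Rahul, because only a lowercase 'r' is searched for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replay of:
//   indexof_data = FOREACH wikitechy_emp_data GENERATE (id,name), INDEXOF(name, 'r', 0);
// assuming INDEXOF behaves like Java's String.indexOf(String, int).
final class IndexOfDumpSketch {

    static final String[] LINES = {
        "001,Aadav,32,Tokyo", "002,Aadhi,33,Kolkata", "003,Charu,23,London",
        "004,Daya,35,London", "005,Hansa,22,Bhuwaneshwar", "006,Hena,21,Chennai",
        "007,Robert,24,Bhuwaneshwar", "008,Kali,20,Kolkata", "009,Leena,22,Chennai",
        "010,Mahi,22,newyork", "011,Priya,23,Tokyo", "012,Rahul,20,newyork"
    };

    static List<String> run() {
        List<String> rows = new ArrayList<>();
        for (String line : LINES) {
            String[] f = line.split(",");
            int id = Integer.parseInt(f[0]);  // id:int drops leading zeros
            String name = f[1];               // name:chararray
            int pos = name.indexOf("r", 0);   // stand-in for INDEXOF(name, 'r', 0)
            rows.add("((" + id + "," + name + ")," + pos + ")"); // Dump-style text
        }
        return rows;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```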
