Apache Pig tutorial - Apache Pig STARTSWITH()



What is the STARTSWITH() function in Apache Pig?

  • The STARTSWITH() function tests whether a string begins with a specified substring.
  • It returns true if the string begins with that substring and false otherwise. Note: the comparison is case sensitive.
  • The function accepts two string (chararray) parameters and checks whether the first string starts with the second.

Syntax:

  • Given below is the syntax of the STARTSWITH() function.
STARTSWITH(string, substring)

Example:

  • Assume that there is a file named wikitechy_emp.txt in the HDFS directory /pig_data/, as shown below. This file contains employee details: id, name, age, and city.

wikitechy_emp.txt

001,Aadav,32,Tokyo
002,Aadhi,33,Kolkata
003,Charu,23,London
004,Daya,35,London
005,Hansa,22,Bhuwaneshwar
006,Hena,21,Chennai
007,Robert,24,Bhuwaneshwar
008,Kali,20,Kolkata
009,Leena,22,Chennai
010,Mahi,22,newyork
011,Priya,23,Tokyo
012,Rahul,20,newyork
  • We have loaded this file into Pig as a relation named wikitechy_emp_data, as shown below.
grunt> wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt' USING PigStorage(',')
   AS (id:int, name:chararray, age:int, city:chararray);
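
If the input file has stray spaces after the commas, PigStorage(',') keeps them as part of the chararray fields, so a name stored as ' Hansa' would not start with 'Ha'. The following is a minimal sketch, assuming such spaces may be present, that uses Pig's built-in TRIM() function to strip them before any prefix test. The relation name wikitechy_emp_clean is made up for illustration.

```pig
-- Load the employee file as in the tutorial (path and schema taken from the LOAD statement above).
wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt' USING PigStorage(',')
   AS (id:int, name:chararray, age:int, city:chararray);

-- wikitechy_emp_clean is a hypothetical relation name used only in this sketch.
-- TRIM() removes leading and trailing whitespace from a chararray.
wikitechy_emp_clean = FOREACH wikitechy_emp_data
   GENERATE id, TRIM(name) AS name, age, TRIM(city) AS city;
```

A later STARTSWITH() call could then read from wikitechy_emp_clean instead of wikitechy_emp_data.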

Example:

  • Following is an example of the STARTSWITH() function. In this example, we check whether the name of each employee starts with the substring “Aa”.
grunt> startswith_data = FOREACH wikitechy_emp_data GENERATE (id,name), STARTSWITH(name,'Aa');
  • The above statement tests each employee's name to see whether it starts with the substring ‘Aa’.
  • Since the names ‘Aadav’ and ‘Aadhi’ start with the substring ‘Aa’, the STARTSWITH() function returns the Boolean value ‘true’ for these two tuples and ‘false’ for the remaining tuples.
  • The result of the statement is stored in the relation named startswith_data. Verify the content of the relation using the Dump operator, as shown below.
grunt> Dump startswith_data;
  
((1,Aadav),true)
((2,Aadhi),true)
((3,Charu),false)
((4,Daya),false)
((5,Hansa),false)
((6,Hena),false)
((7,Robert),false)
((8,Kali),false)
((9,Leena),false)
((10,Mahi),false)
((11,Priya),false)
((12,Rahul),false)
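
Because STARTSWITH() returns a Boolean, it can also serve as a FILTER condition, and wrapping the first argument in LOWER() is one way to work around the case sensitivity noted earlier. The following is a minimal sketch under those assumptions; the relation names aa_employees and aa_any_case are hypothetical.

```pig
-- Load the employee file as in the tutorial (path and schema taken from the LOAD statement above).
wikitechy_emp_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_emp.txt' USING PigStorage(',')
   AS (id:int, name:chararray, age:int, city:chararray);

-- Keep only the rows whose name starts with 'Aa' (case-sensitive match).
aa_employees = FILTER wikitechy_emp_data BY STARTSWITH(name, 'Aa');

-- Lowercase the name first so that 'Aa', 'aa', 'AA' and 'aA' would all match.
aa_any_case = FILTER wikitechy_emp_data BY STARTSWITH(LOWER(name), 'aa');

DUMP aa_employees;
```

With the sample file, only the tuples for Aadav and Aadhi should pass either filter.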
