Apache Pig Split Operator




What is the Split Operator in Apache Pig?

  • The SPLIT operator is used to split a relation into two or more relations.
  • The SPLIT operator takes a single input relation and leaves that relation unchanged.
  • The SPLIT operator produces two or more output relations. Depending on which conditions it satisfies, a tuple can be assigned to more than one output relation, or to none.
  • SPLIT instruction:
    • Splits a relation into multiple relations based on conditions
  • Splitting data into training and testing datasets
  • SPLIT examples:
    • SPLIT users INTO kids IF age < 18, adults IF age >= 18 AND age < 65, seniors OTHERWISE;
    • SPLIT data INTO testing IF RANDOM() <= 0.10, training OTHERWISE;
    • The SPLIT operator cannot handle non-deterministic functions (such as RANDOM).
  • Thus the second command above won't work and will raise an error. A workaround is to generate the random value as a field first and then split on that field, as the following macro does:
    -- Attach a random number to every row, split on it, then drop the helper field.
    DEFINE split_into_training_testing(inputData, split_percentage)
    RETURNS training, testing {
        data = FOREACH $inputData GENERATE RANDOM() AS random_assignment, *;
        SPLIT data INTO testing_data IF random_assignment <= $split_percentage, training_data OTHERWISE;
        -- $1.. projects every field except the first one (random_assignment).
        $training = FOREACH training_data GENERATE $1..;
        $testing = FOREACH testing_data GENERATE $1..;
    };
    inData = LOAD 'some_files.txt' USING PigStorage('\t');
    training, testing = split_into_training_testing(inData, 0.1);
    
    Syntax for macro definition:

    DEFINE macro_name (param [, param ...]) RETURNS {void | alias [, alias ...]} { pig_latin_fragment };

    Syntax for macro expansion:

    alias [, alias ...] = macro_name (param [, param ...]) ;
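
As a minimal sketch of the definition and expansion syntax above, the following hypothetical macro returns the rows of a relation whose age field is below a given limit. The macro name younger_than, its parameters, the file name employees.txt, and the schema are all assumptions made for illustration; they do not come from the original example.

```pig
-- Hypothetical macro: keep only rows whose age is below max_age.
-- Parameters and the return alias are referenced with a leading $ inside the body.
DEFINE younger_than(rel, max_age) RETURNS result {
    $result = FILTER $rel BY age < $max_age;
};

-- Assumed input file and schema, for illustration only.
employees = LOAD 'employees.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- Macro expansion: the alias returned by the macro is bound to 'young'.
young = younger_than(employees, 23);
DUMP young;
```

The relation passed as rel must expose a field named age, because the macro body refers to that field by name.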
    
    

    Syntax

    grunt> SPLIT Relation1_name INTO Relation2_name IF (condition1), Relation3_name IF (condition2);
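
To illustrate the syntax, here is a small sketch built on the kids/adults/seniors statement shown earlier. The file name users.txt and the (name, age) schema are assumptions, and the OTHERWISE branch is only available in Pig releases that support it.

```pig
-- Assumed input: a comma-separated file of (name, age) records.
users = LOAD 'users.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- Each tuple is tested against every IF condition.
-- seniors receives the remaining tuples (here, age 65 and above).
SPLIT users INTO
    kids    IF age < 18,
    adults  IF (age >= 18 AND age < 65),
    seniors OTHERWISE;

DUMP kids;
DUMP adults;
DUMP seniors;
```

Because the conditions here do not overlap, every tuple lands in exactly one relation. With overlapping conditions, the same tuple would appear in several output relations.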
    

    Example

    Ensure that the file wikitechy_employee_details.txt exists in the HDFS directory /pig_data/ with the contents shown below.

    111,Anu,Shankar,23,9876543210,Chennai
    112,Barvathi,Nambiayar,24,9876543211,Chennai
    113,Kajal,Nayak,24,9876543212,Trivendram
    114,Preethi,Antony,21,9876543213,Pune
    115,Raj,Gopal,21,9876543214,Hyderabad
    116,Yashika,Kannan,22,9876543215,Delhi
    117,siddu,Narayanan,22,9876543216,Kolkata
    118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar
    
    • Load this file into Pig under the relation name wikitechy_employee_details, as shown below.
    wikitechy_employee_details = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee_details.txt' USING PigStorage(',')
       AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
    
    • Now split the relation into two: one listing the employees younger than 23, and the other listing the employees older than 22 and younger than 25.
    SPLIT wikitechy_employee_details INTO wikitechy_employee_details1 IF age<23, wikitechy_employee_details2 IF (22<age AND age<25);
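
For comparison, the same two relations can be sketched with one FILTER statement per condition, since each SPLIT branch keeps the tuples that satisfy its condition. The alias names under_23 and between_22_and_25 are hypothetical and chosen only for this illustration.

```pig
-- One FILTER per SPLIT branch (alias names are illustrative).
under_23 = FILTER wikitechy_employee_details BY age < 23;
between_22_and_25 = FILTER wikitechy_employee_details BY (age > 22 AND age < 25);
```

An employee who matches neither condition (for example, someone aged 25 or older) ends up in neither relation, and the same holds for the SPLIT statement.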
    

    Verification

    Now verify the relations wikitechy_employee_details1 and wikitechy_employee_details2 using the DUMP operator as shown below.

    grunt> Dump wikitechy_employee_details1;

    grunt> Dump wikitechy_employee_details2;
    

    Output

    • The following output displays the contents of the relations wikitechy_employee_details1 and wikitechy_employee_details2 respectively.
    grunt> Dump wikitechy_employee_details1;
    (114,Preethi,Antony,21,9876543213,Pune)
    (115,Raj,Gopal,21,9876543214,Hyderabad)
    (116,Yashika,Kannan,22,9876543215,Delhi)
    (117,siddu,Narayanan,22,9876543216,Kolkata)

    grunt> Dump wikitechy_employee_details2;
    (111,Anu,Shankar,23,9876543210,Chennai)
    (112,Barvathi,Nambiayar,24,9876543211,Chennai)
    (113,Kajal,Nayak,24,9876543212,Trivendram)
    (118,Timple,Mohanthy,23,9876543217,Bhuwaneshwar)
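
As an optional extra check, the sketch below counts the tuples in each output relation. With the eight sample rows listed above, each relation should contain four tuples. The aliases grp1, cnt1, grp2, and cnt2 are hypothetical names introduced for this illustration.

```pig
-- Count the tuples in the first output relation.
grp1 = GROUP wikitechy_employee_details1 ALL;
cnt1 = FOREACH grp1 GENERATE COUNT(wikitechy_employee_details1);
DUMP cnt1;

-- Count the tuples in the second output relation.
grp2 = GROUP wikitechy_employee_details2 ALL;
cnt2 = FOREACH grp2 GENERATE COUNT(wikitechy_employee_details2);
DUMP cnt2;
```

If the two counts add up to less than the size of the input relation, some tuples matched none of the SPLIT conditions.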
    
