[Solved-2 Solutions] In Pig, check if an element is present in a bag?



Problem:

How can we check in Pig Latin whether a bag contains a given element?

Solution 1:

Use FOREACH

  • The FOREACH operator is used to generate specified data transformations based on the column data.

Syntax

  • The syntax of the FOREACH operator:
grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required data);
  • In Apache Pig, statements can be nested inside FOREACH. Here is an example, where A is a bag in B:
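X = FOREACH B {
        -- FILTER needs a boolean condition: keep tuples of A whose first field is 'xyz'
        S = FILTER A BY $0 == 'xyz';
        -- A count greater than 0 means 'xyz' is present in the bag
        GENERATE COUNT(S.$0);
}
  • Instead of COUNT, we can also use IsEmpty together with the ?: (bincond) operator:
X = FOREACH B {
        S = FILTER A BY $0 == 'xyz';
        -- Emit the flag together with the original bag A
        GENERATE (IsEmpty(S.$0) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present, A;
}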
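
As a rough end-to-end sketch of the nested FOREACH approach above, the script below builds the bag with GROUP and then flags each group. The file name visits.txt, the relation names and the field names user and page are made up for illustration; only the FILTER / IsEmpty pattern comes from the solution itself.

```pig
-- Hypothetical tab-separated input 'visits.txt' with two columns: user, page
visits = LOAD 'visits.txt' AS (user:chararray, page:chararray);

-- GROUP yields one row per user; the grouped tuples sit in a bag named "visits"
by_user = GROUP visits BY user;

flagged = FOREACH by_user {
        -- Keep only the tuples in this user's bag whose page is 'xyz'
        hits = FILTER visits BY page == 'xyz';
        -- An empty filtered bag means 'xyz' never appeared for this user
        GENERATE group AS user,
                 (IsEmpty(hits) ? 'xyz NOT PRESENT' : 'xyz PRESENT') AS present;
};

DUMP flagged;
```

Using IsEmpty here avoids counting the matches when only a yes/no answer is needed.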

Solution 2:

One way to do it without any custom UDF code is:

  • Assume A has the schema my_bag:{(f1, f2, f3)}.
B = FOREACH A {
        X = FILTER my_bag BY f1 == 'my_element';
        -- Now COUNT(X) will tell you if my_element is present in my_bag.
        -- Example use below.
        GENERATE my_bag, COUNT(X) AS my_flag;
};

C = FILTER B BY my_flag > 0;
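
For anyone who wants to try Solution 2 from scratch, here is a minimal sketch that also loads the data. It assumes a hypothetical file bags.txt holding one bag per line in Pig's text notation (for example {(my_element,1,2),(other,3,4)}), and the field types are guesses, since the original only names f1, f2 and f3.

```pig
-- Hypothetical input 'bags.txt': each line is a single bag written in Pig's text form
A = LOAD 'bags.txt' AS (my_bag:bag{t:tuple(f1:chararray, f2:int, f3:int)});

B = FOREACH A {
        -- Tuples of my_bag whose f1 matches the element we are looking for
        X = FILTER my_bag BY f1 == 'my_element';
        GENERATE my_bag, COUNT(X) AS my_flag;
};

-- Keep only the rows whose bag contains at least one matching tuple
C = FILTER B BY my_flag > 0;

DUMP C;
```

Note that COUNT skips tuples whose first field is null, which does not matter here because every tuple in X has f1 equal to 'my_element'.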
