Apache Pig - SIZE() Function



What is the SIZE() Function in Apache Pig?

    • The SIZE() function in Apache Pig computes the number of elements in a value; what counts as an element depends on the value's Pig data type.
    • The SIZE() function includes NULL values in the size computation.

    Syntax

      grunt> SIZE(expression)
      
      • The table below lists the value returned by SIZE() for each Apache Pig data type.

      Data type                   Value
      int, long, float, double    For all these types, the SIZE() function returns 1.
      chararray                   For a chararray, the SIZE() function returns the number of characters in the array.
      bytearray                   For a bytearray, the SIZE() function returns the number of bytes in the array.
      tuple                       For a tuple, the SIZE() function returns the number of fields in the tuple.
      bag                         For a bag, the SIZE() function returns the number of tuples in the bag.
      map                         For a map, the SIZE() function returns the number of key/value pairs in the map.
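The return values listed above can be tried out with a short script. This is only a sketch: the file name sample_data.txt, the tab delimiter, and the fields id, name, pair and props are hypothetical and are not part of the example used later in this tutorial.

```pig
-- Hypothetical input file and schema, used only for illustration.
records = LOAD 'sample_data.txt' USING PigStorage('\t')
    AS (id:int, name:chararray, pair:tuple(x:int, y:int), props:map[]);

-- SIZE(id)    : 1 for a scalar numeric value
-- SIZE(name)  : number of characters in the chararray
-- SIZE(pair)  : number of fields in the tuple
-- SIZE(props) : number of key/value pairs in the map
sizes = FOREACH records GENERATE SIZE(id), SIZE(name), SIZE(pair), SIZE(props);

DUMP sizes;
```

Each statement can also be entered one at a time at the grunt> prompt.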

      Example

        wikitechy_employee.txt
        1,Joseph,2007-01-24,250
        2,John,2007-05-27,220  
        3,Patrick,2007-05-06,170  
        3,Patrick,2007-04-06,100  
        4,Mill,2007-04-06,220  
        5,Sarah,2007-06-06,300  
        5,Sarah,2007-02-06,350 
        

        We load this file into Pig under the relation name employee_data, as shown below.

        grunt> employee_data = LOAD 'hdfs://localhost:9000/pig_data/wikitechy_employee.txt' USING PigStorage(',')
           as (id:int, name:chararray, workdate:chararray, daily_typing_pages:int);
        

        Calculating the Size of a Field

          Now we calculate the size of the name field. Because name is a chararray, SIZE() returns the number of characters in each name:

          grunt> size = FOREACH employee_data GENERATE SIZE(name);
          

          Verification.

            grunt> Dump size;
            

            Output

              (6)
              (4)
              (7)
              (7)
              (4)
              (5)
              (5)
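SIZE() can also be applied to a bag. As a minimal sketch that reuses the employee_data relation loaded above, the rows can be grouped by name so that each group carries a bag of that employee's rows; the aliases by_name, rows_per_name and num_rows are illustrative names chosen here.

```pig
-- Group the rows by employee name; each group holds a bag named employee_data.
by_name = GROUP employee_data BY name;

-- SIZE() on a bag returns the number of tuples it contains.
rows_per_name = FOREACH by_name GENERATE group AS name, SIZE(employee_data) AS num_rows;

DUMP rows_per_name;
```

Patrick and Sarah each appear twice in the sample file, so their bags should contain two tuples and the other names one each. The order of the dumped rows is not guaranteed.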
              
