Understanding map syntax in Pig



What is TOMAP() ?

  • The TOMAP() function of Pig Latin is used to convert the key-value pairs into a Map.

Syntax

TOMAP(key-expression, value-expression [, key-expression, value-expression ...])
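
As a minimal sketch of how TOMAP() might be used, assume a hypothetical comma-separated file data/pairs.txt with one key and one value per line. The file name, the aliases and the field names k and v are illustrative only and are not part of the original question.

```pig
-- Hypothetical input: lines such as "open,apache" and "apache,hadoop"
pairs = LOAD 'data/pairs.txt' USING PigStorage(',') AS (k:chararray, v:chararray);

-- Build a single-entry map per row from the key field and the value field
maps = FOREACH pairs GENERATE TOMAP(k, v) AS M;

DUMP maps;
```

Each output row should then hold one map, written in the same [key#value] notation as the input file shown in the problem.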

Problem :

We created a file containing the following text:

[open#apache]
[apache#hadoop]

We are able to load that file without errors:

A = LOAD 'data/file_name.txt' AS (M:map[]);

Now, how do we get the list of all the "values"? I.e.:

(apache)
(hadoop) 

Solution 1:

  • In pure Pig Latin, the basic way to interact with a map is the # operator, which looks up the value stored under a given key. For more functionality, define some UDFs. So a map is typically used like this:
B = FOREACH A GENERATE M#'open' ;

Which produces this as output:

(apache)
()
  • Note that the value after the # is a quoted string: it cannot change and must be set before we run the job.
  • Also, notice that it creates a NULL for the second line, because that map does not contain the key 'open'.
  • This is slightly different from using FILTER on a schema of two chararrays, key and value:
B = FILTER A BY key=='open' ;

Which produces the output:

(open,apache)

If only the value is desired, then it can be done simply by:

B = FOREACH (FILTER A BY key=='open') GENERATE value ;

Which produces:

(apache)
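
The FILTER statements above assume that A was loaded with two chararray fields rather than a single map. A minimal sketch of that setup, assuming a hypothetical tab-separated file data/pairs.txt (tab is the default delimiter of PigStorage); the file name and the intermediate alias opens are illustrative:

```pig
-- Hypothetical input: one "key<TAB>value" pair per line
A = LOAD 'data/pairs.txt' AS (key:chararray, value:chararray);

-- Keep only the rows whose key is 'open'
opens = FILTER A BY key == 'open';

-- Project just the value column from the surviving rows
B = FOREACH opens GENERATE value;

DUMP B;
```

Unlike the map lookup, rows that do not match are dropped here instead of being turned into NULLs.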

If keeping the NULLs is important, they can also be generated by using a bincond:

B = FOREACH A GENERATE (key=='open'?value:NULL) ;

Which produces the same output as M#'open'.
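
None of the statements above lists every value without naming a key in advance. Recent Pig releases appear to ship a built-in VALUELIST function that returns the values of a map as a bag. Treat its availability as an assumption and check it against your Pig version; the sketch below only works if the function is present.

```pig
A = LOAD 'data/file_name.txt' AS (M:map[]);

-- VALUELIST (assumed to be available as a built-in) returns a bag of the map's values;
-- FLATTEN turns that bag into one output row per value
B = FOREACH A GENERATE FLATTEN(VALUELIST(M));

DUMP B;
```

If the function is available, this should give the (apache) and (hadoop) listing asked for in the problem. On versions without it, a custom UDF remains the fallback.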

