What is the purpose of the @outputSchema decorator in a Python UDF used with Apache Pig?

Answer: A UDF has input and output. The @outputSchema decorator tells Pig the names and types of the fields a Python UDF returns. The different ways of specifying that output format are shown below.

Specifying the UDF output schema

Sample Code:

```python
# The original UDF.
# It returns a single chararray (that's Pig Latin for string).
@outputSchema('word:chararray')
def hi_world():
    return "hello world"

# This one returns a Python tuple. Pig recognises the first element
# of the tuple as a chararray like before, and the next one as a
# long (a kind of integer).
@outputSchema("word:chararray,number:long")
def hi_everyone():
    return "hi there", 15

# We can use outputSchema to define nested schemas too; here is a bag of tuples.
@outputSchema('some_bag:bag{t:(field_1:chararray, field_2:int)}')
def bag_udf():
    return [
        ('hi', 1000),
        ('there', 2000),
        ('bill', 0)
    ]

# And here is a map.
@outputSchema('something_nice:map[]')
def my_map_maker():
    return {"a": "b", "c": "d", "e": "f"}
```
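When Pig runs a Jython UDF script it makes the outputSchema decorator available, so the sample above does not import it. To try such a file in a plain Python interpreter, one option is a small local stand-in. The sketch below is an assumption for local testing only: the fallback decorator is hypothetical and simply records the schema string on the function. It is not Pig's own implementation.

```python
# Hypothetical stand-in for local testing only. Inside Pig the real
# outputSchema decorator is expected to exist already, so this fallback
# is defined only when the name is missing.
try:
    outputSchema
except NameError:
    def outputSchema(schema_str):
        def wrap(func):
            # Record the declared schema on the function object.
            func.outputSchema = schema_str
            return func
        return wrap


@outputSchema("word:chararray,number:long")
def hi_everyone():
    # A Python tuple whose fields match the declared schema.
    return "hi there", 15


print(hi_everyone())
print(hi_everyone.outputSchema)
```

In a Pig script, a file like this is typically registered with a statement such as `REGISTER 'udfs.py' USING jython AS myfuncs;` and then called as `myfuncs.hi_everyone()`. The file name and alias here are placeholders.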

outputSchema can declare that a function returns one basic type or a combination of them. The basic types are:

  • chararray: a character string (UTF-8)
  • bytearray: a raw sequence of bytes; like a string, but not human-readable
  • long: 64-bit signed integer
  • int: 32-bit signed integer
  • double: 64-bit floating-point number
  • datetime
  • boolean

If no schema is specified, Pig assumes that the UDF outputs a bytearray.
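To make the shape of a flat schema string concrete, here is a minimal sketch that splits a string such as `word:chararray,number:long` into (name, type) pairs and checks each type against the basic types listed above. `parse_flat_schema` and `BASIC_TYPES` are hypothetical names introduced for illustration. This is not Pig's parser, and it does not handle nested bag, tuple, or map schemas.

```python
# Basic Pig types named in the list above.
BASIC_TYPES = {
    "chararray", "bytearray", "long", "int",
    "double", "datetime", "boolean",
}


def parse_flat_schema(schema_str=None):
    """Hypothetical helper: split 'name:type,name:type' into pairs.

    Flat schemas only; nested bags, tuples, and maps are not handled.
    """
    if not schema_str:
        # Mirrors the fallback described above: no schema means bytearray.
        return [(None, "bytearray")]
    fields = []
    for part in schema_str.split(","):
        name, _, type_name = part.strip().partition(":")
        if type_name not in BASIC_TYPES:
            raise ValueError("not a basic Pig type: %r" % type_name)
        fields.append((name, type_name))
    return fields


print(parse_flat_schema("word:chararray,number:long"))
print(parse_flat_schema())
```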