Specifying the UDF output schema
- A UDF has inputs and an output. Here are the different ways you can specify the output format of a Python UDF using the outputSchema decorator.
Sample Code:
```python
# Under Jython, Pig supplies the outputSchema decorator when it registers the script.
# With the streaming_python engine, import it first: from pig_util import outputSchema

# the original udf
# it returns a single chararray (that's Pig Latin for String)
@outputSchema('word:chararray')
def hi_world():
    return "hello world"

# this one returns a Python tuple. Pig recognises the first element
# of the tuple as a chararray like before, and the next one as a
# long (a kind of integer)
@outputSchema("word:chararray,number:long")
def hi_everyone():
    return "hi there", 15

# we can use outputSchema to define nested schemas too; here is a bag of tuples
@outputSchema('some_bag:bag{t:(field_1:chararray, field_2:int)}')
def bag_udf():
    return [
        ('hi', 1000),
        ('there', 2000),
        ('bill', 0)
    ]

# and here is a map
@outputSchema('something_nice:map[]')
def my_map_maker():
    return {"a": "b", "c": "d", "e": "f"}
```
outputSchema can declare that a function returns a single basic type or a combination of them. Those types are:
- chararray: like a string
- bytearray: a raw sequence of bytes; like a string, but not as human-friendly
- long: 64-bit signed integer
- int: 32-bit signed integer
- double: 64-bit floating-point number
- datetime
- boolean
- If no schema is specified, Pig assumes that the UDF outputs a bytearray.
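The sample code above only uses chararray, long and int, so here is a minimal sketch for two more of the listed types, plus a UDF with no schema at all. The `outputSchema` function defined at the top is a hypothetical stand-in, not Pig's own decorator. It only records the schema string so the file can be tried in a plain Python interpreter, and it should be removed when the script runs under Pig. The function names `ratio`, `is_long_word` and `shout` are made up for illustration.

```python
# Hypothetical local stand-in for Pig's outputSchema decorator (an assumption,
# for testing outside Pig only). It just stores the schema string on the function.
def outputSchema(schema):
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate

# a double: the Python float returned here is declared as a Pig double
@outputSchema('ratio:double')
def ratio(numerator, denominator):
    if not denominator:
        return None  # avoid dividing by zero
    return float(numerator) / denominator

# a boolean
@outputSchema('is_long_word:boolean')
def is_long_word(word):
    return word is not None and len(word) > 5

# no decorator: as noted above, Pig would treat this output as a bytearray
def shout(word):
    return word.upper()
```

Outside Pig, calling `ratio(1, 4)` returns `0.25` and `ratio.outputSchema` holds the declared schema string. This assumes the usual Jython mapping in which a Python float becomes a Pig double and a bool becomes a boolean. Inside a Pig script, the result of `shout` would typically need an explicit cast, such as `(chararray)`, before it can be used as a string.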

