Specifying the UDF output schema

  • A UDF takes input and produces output. Here are the different ways you can specify the output format of a Python UDF using the outputSchema decorator.

Sample Code:

# Pig supplies the outputSchema decorator when this file is
# registered as a Jython UDF script, so it is not imported here.

# the original udf
# it returns a single chararray (that's Pig Latin for String)
@outputSchema('word:chararray')
def hi_world():
    return "hello world"

# this one returns a Python tuple. Pig recognises the first element
# of the tuple as a chararray like before, and the next one as a
# long (a 64-bit integer)
@outputSchema("word:chararray,number:long")
def hi_everyone():
    return "hi there", 15

# we can use outputSchema to define nested schemas too; here is a bag of tuples
@outputSchema('some_bag:bag{t:(field_1:chararray, field_2:int)}')
def bag_udf():
    return [
        ('hi', 1000),
        ('there', 2000),
        ('bill', 0)
    ]

# and here is a map (a Python dict)
@outputSchema('something_nice:map[]')
def my_map_maker():
    return {"a": "b", "c": "d", "e": "f"}
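The UDFs above depend on Pig to provide outputSchema, so they cannot be called from a plain Python interpreter as written. A minimal sketch for trying them out locally, assuming you only want to check return values: define a stand-in decorator that records the schema string and otherwise leaves the function unchanged. The stand-in and its output_schema attribute are hypothetical; they are not Pig's implementation, and Pig ignores them.

```python
# Hypothetical stand-in for Pig's outputSchema decorator, for local testing
# only. It stores the schema string on the function (as the made-up
# attribute output_schema) and returns the function unchanged.
def outputSchema(schema):
    def decorate(func):
        func.output_schema = schema
        return func
    return decorate


@outputSchema("word:chararray,number:long")
def hi_everyone():
    # Two values become a Python tuple, matching the two-field schema.
    return "hi there", 15


@outputSchema("something_nice:map[]")
def my_map_maker():
    # A Python dict corresponds to a Pig map.
    return {"a": "b", "c": "d", "e": "f"}


if __name__ == "__main__":
    print(hi_everyone.output_schema)  # word:chararray,number:long
    print(hi_everyone())              # ('hi there', 15)
    print(my_map_maker())
```

Because the stand-in returns the original function, the UDF bodies run exactly as Pig would call them; only the schema handling is faked.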

outputSchema can be used to declare that a function outputs one of the basic types, or a combination of them. Those types are:

  • chararray: a string
  • bytearray: a raw sequence of bytes; like a string, but not human-readable
  • long: 64-bit integer
  • int: 32-bit integer
  • double: double-precision floating-point number
  • datetime: a date and time
  • boolean: true or false

If no schema is specified, Pig assumes that the UDF outputs a bytearray.
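To make the link between a flat schema string and the values a UDF returns concrete, here is a rough sketch that parses a schema such as "word:chararray,number:long" and checks a result against it. The function names and the Pig-to-Python type table are assumptions for illustration (Python 3 types, flat schemas of basic types only); Pig's actual conversion rules depend on the Pig version and on whether the UDF runs under Jython or CPython.

```python
import datetime

# Assumed Python 3 types for each basic Pig type (illustrative only).
PIG_TO_PYTHON = {
    "chararray": str,
    "bytearray": (bytes, bytearray),
    "long": int,
    "int": int,
    "double": float,
    "datetime": datetime.datetime,
    "boolean": bool,
}


def parse_flat_schema(schema):
    """Split 'name:type,name:type' into [(name, type), ...].

    Nested bags, tuples and maps are out of scope for this sketch.
    """
    fields = []
    for part in schema.split(","):
        name, _, pig_type = part.strip().partition(":")
        # A field written without a type is treated as bytearray.
        fields.append((name, pig_type or "bytearray"))
    return fields


def matches_schema(value, schema):
    """Return True if a UDF result is compatible with a flat schema."""
    fields = parse_flat_schema(schema)
    # One field describes a bare value; several fields describe a tuple.
    if len(fields) == 1:
        values = (value,)
    elif isinstance(value, tuple):
        values = value
    else:
        return False
    if len(values) != len(fields):
        return False
    for v, (_, pig_type) in zip(values, fields):
        # bool is a subclass of int in Python, so reject it for other types.
        if isinstance(v, bool) and pig_type != "boolean":
            return False
        if not isinstance(v, PIG_TO_PYTHON[pig_type]):
            return False
    return True
```

For example, matches_schema(("hi there", 15), "word:chararray,number:long") returns True, while swapping 15 for the string "15" makes it return False.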