[Solved-2 Solutions] Is there a common place to store data schemas in Hadoop?
Problem:

Is there a common place to store data schemas in Hadoop?

Solution 1:

Why Apache Avro?

  • Avro is a remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
  • Apache Spark SQL can access Avro as a data source.
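To illustrate the JSON-based schema definition, here is a minimal Avro record schema (an .avsc file). The record name, namespace, and fields are illustrative, not from any particular dataset:

{
  "type": "record",
  "name": "Employee",
  "namespace": "com.example",
  "fields": [
    {"name": "id",   "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "dept", "type": ["null", "string"], "default": null}
  ]
}

The union type ["null", "string"] with a default of null is the usual Avro idiom for an optional field, which is also what makes adding new fields during schema evolution safe for old readers.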

With Avro, the schema is embedded in the data file itself, so readers can consume the data without tracking schemas separately, and schema evolution becomes much easier.

The great thing about Avro is that it is fully integrated into Hadoop, and you can use it with many Hadoop sub-projects such as Pig and Hive.

For example with Pig you could do:

EMP = LOAD 'myfile.avro' USING AvroStorage();
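Hive can read the same Avro files. A sketch of the table declaration, assuming Hive 0.14 or later (where STORED AS AVRO maps to the built-in AvroSerDe); the table name and HDFS path are illustrative:

CREATE EXTERNAL TABLE emp
STORED AS AVRO
LOCATION '/data/emp';

Because the schema travels with the Avro files, Hive infers the column names and types from the data rather than requiring them in the DDL.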

Solution 2:

  • "Apache HCatalog is a table and storage management service for data created using Apache Hadoop.
  • This includes the following:
    • Providing a shared schema and data type mechanism.
    • Providing a table abstraction so that users need not be concerned with where or how their data is stored.
    • Providing interoperability across data processing tools such as Pig, MapReduce, and Hive."
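For example, a Pig script can load a table registered in HCatalog by name, without knowing its storage format or location. A sketch using the HCatLoader class that ships with Hive's HCatalog module (the table name is illustrative):

EMP = LOAD 'default.employees' USING org.apache.hive.hcatalog.pig.HCatLoader();

The schema attached to EMP then comes from the HCatalog metastore, so the same table definition is shared across Pig, Hive, and MapReduce jobs.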
