[Solved-2 Solutions] How to use Cassandra's MapReduce with or without Pig?

Problem:

How to use Cassandra's MapReduce with or without Pig?

Solution 1:

Here is how a developer writes a MapReduce program that uses Cassandra as its data source.

What is a Custom InputFormat?

The data processed by Hadoop is usually stored on a distributed file system, e.g. HDFS (Hadoop Distributed File System).
To read that data, Hadoop provides the InputFormat abstraction, which has the following responsibilities:
(1) compute the input splits of the data;
(2) provide the logic to read each input split.
We write a regular MapReduce program, and the Cassandra jars provide a custom InputFormat that lets Cassandra, rather than HDFS, serve as the input source.
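The two InputFormat responsibilities can be modeled without Hadoop on the classpath. The following is a minimal sketch, assuming a hypothetical `SimpleInputFormat` interface (a simplified stand-in, not the real signatures of Hadoop's `org.apache.hadoop.mapreduce.InputFormat`) and a toy table whose rows are addressed by integer tokens `0..rowCount-1`. `getSplits` divides that token range into contiguous splits, and `createRecordReader` iterates over the rows of one split. `TokenRange`, `TokenRangeInputFormat` and `InputFormatDemo` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified stand-in for Hadoop's InputFormat
// (the real org.apache.hadoop.mapreduce.InputFormat has different signatures).
interface SimpleInputFormat<S, R> {
    // Responsibility 1: compute the input splits (one map task per split).
    List<S> getSplits(int numSplits);

    // Responsibility 2: provide a reader for the records of one split.
    Iterator<R> createRecordReader(S split);
}

// A split covering the half-open token range [start, end).
final class TokenRange {
    final long start;
    final long end;

    TokenRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + ")";
    }
}

// Toy "table" whose rows are addressed by tokens 0..rowCount-1 (an assumption for illustration).
class TokenRangeInputFormat implements SimpleInputFormat<TokenRange, String> {
    private final long rowCount;

    TokenRangeInputFormat(long rowCount) {
        this.rowCount = rowCount;
    }

    @Override
    public List<TokenRange> getSplits(int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        List<TokenRange> splits = new ArrayList<>();
        long base = rowCount / numSplits;
        long extra = rowCount % numSplits; // the first `extra` splits get one extra row
        long start = 0;
        for (int i = 0; i < numSplits; i++) {
            long size = base + (i < extra ? 1 : 0);
            splits.add(new TokenRange(start, start + size));
            start += size;
        }
        return splits;
    }

    @Override
    public Iterator<String> createRecordReader(TokenRange split) {
        // Lazily yields one record per token in the split.
        return new Iterator<String>() {
            private long next = split.start;

            @Override
            public boolean hasNext() {
                return next < split.end;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return "row-" + next++;
            }
        };
    }
}

class InputFormatDemo {
    public static void main(String[] args) {
        TokenRangeInputFormat format = new TokenRangeInputFormat(10);
        for (TokenRange split : format.getSplits(3)) {
            List<String> rows = new ArrayList<>();
            Iterator<String> reader = format.createRecordReader(split);
            while (reader.hasNext()) {
                rows.add(reader.next());
            }
            System.out.println(split + " -> " + rows);
        }
    }
}
```

Running `InputFormatDemo` prints three splits, `[0, 4)`, `[4, 7)` and `[7, 10)`, each followed by its rows. In a real Hadoop job the framework calls these two methods for you and hands each split to a separate map task.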

If we are using Pycassa (a Python client for Cassandra), we are out of luck until either:

(1) the maintainer of that project adds support for MapReduce, or
(2) we throw together some Python functions that generate a Java MapReduce program and run it.

Solution 2:

Cassandra provides an implementation of InputFormat. In case you are new to Hadoop, the InputFormat is (roughly) what loads your data into the mapper.
Cassandra's subclass connects your mapper to Cassandra so that it pulls its input from there. Helpfully, the Cassandra developers have also implemented this integration in the classic “Word Count” example.
Cassandra rows or row fragments (that is, pairs of key + SortedMap of columns) are fed to the Map tasks of your job; a SlicePredicate specifies which columns to fetch from each row. Here’s how this looks in the word_count example, which selects just one configurable columnName from each row:

// Read rows from the given keyspace and column family
ConfigHelper.setColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);

// Fetch only the single configured column from each row
SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(columnName.getBytes()));

ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);
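The word_count snippet above depends on Cassandra's Hadoop classes and a running cluster. To show the data flow on its own, here is a minimal in-memory sketch, assuming each input record is a (row key, SortedMap of column name to string value) pair. The helpers `applySlice`, `map`, `reduce` and `run` are hypothetical and are not part of Cassandra's or Hadoop's API: `applySlice` plays the role of the SlicePredicate, and a `TreeMap` stands in for Hadoop's shuffle/sort step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory sketch of word count over Cassandra-style rows.
// No Cassandra or Hadoop classes are used; all names are hypothetical.
class CassandraWordCountSketch {

    // Stand-in for a SlicePredicate: keep only the requested column names.
    static SortedMap<String, String> applySlice(SortedMap<String, String> columns, Set<String> columnNames) {
        SortedMap<String, String> sliced = new TreeMap<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (columnNames.contains(e.getKey())) {
                sliced.put(e.getKey(), e.getValue());
            }
        }
        return sliced;
    }

    // Map phase: emit (word, 1) for every token in every selected column value.
    // The row key is unused here, as in word count.
    static void map(String rowKey, SortedMap<String, String> columns, Map<String, List<Integer>> shuffle) {
        for (String value : columns.values()) {
            StringTokenizer tokens = new StringTokenizer(value);
            while (tokens.hasMoreTokens()) {
                shuffle.computeIfAbsent(tokens.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }
    }

    // Reduce phase: sum the ones emitted for a word.
    static int reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        return sum;
    }

    static SortedMap<String, Integer> run(Map<String, SortedMap<String, String>> rows, Set<String> columnNames) {
        // A TreeMap groups emitted values by word, imitating the shuffle/sort step.
        Map<String, List<Integer>> shuffle = new TreeMap<>();
        for (Map.Entry<String, SortedMap<String, String>> row : rows.entrySet()) {
            map(row.getKey(), applySlice(row.getValue(), columnNames), shuffle);
        }
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, SortedMap<String, String>> rows = new TreeMap<>();
        SortedMap<String, String> r1 = new TreeMap<>();
        r1.put("text", "the quick fox");
        r1.put("author", "alice");
        SortedMap<String, String> r2 = new TreeMap<>();
        r2.put("text", "the lazy dog");
        rows.put("key1", r1);
        rows.put("key2", r2);
        // Only the "text" column is selected, so "alice" is never counted.
        System.out.println(run(rows, Set.of("text"))); // {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

In a real job the map and reduce steps would live in subclasses of Hadoop's Mapper and Reducer, and the framework, not a TreeMap, would perform the shuffle.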

Cassandra also provides a Pig LoadFunc, so you can run jobs written in the Pig DSL instead of writing Java code by hand.
