What is the difference between Cloudera Oryx and Apache Mahout?

Differences between Cloudera Oryx and Apache Mahout

There are 3 broad things an operational ML system eventually needs to do:
- Build models at scale, offline
- Update models in near real time
- Query models in real time
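To make these three responsibilities concrete, here is a minimal Python sketch of a toy model that supports all of them. The class `ItemPopularityModel` and its mean-rating logic are hypothetical and chosen only for illustration. They are not taken from Oryx, Mahout or MLlib.

```python
from collections import defaultdict


class ItemPopularityModel:
    """Hypothetical toy model: the mean rating of each item."""

    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def build_offline(self, ratings):
        """Batch step: rebuild the whole model from all historical (item, rating) pairs."""
        self.sums.clear()
        self.counts.clear()
        for item, rating in ratings:
            self.sums[item] += rating
            self.counts[item] += 1

    def update(self, item, rating):
        """Near-real-time step: fold one new event into the existing model without a rebuild."""
        self.sums[item] += rating
        self.counts[item] += 1

    def query(self, item):
        """Real-time step: answer from in-memory state; returns None for unseen items."""
        n = self.counts.get(item, 0)
        return self.sums[item] / n if n else None
```

A tool that only builds models at scale covers just `build_offline`. The other two methods are what an operational system also needs around it.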
Most tools, such as Mahout or MLlib, only build models at scale.

Oryx tries to do all 3, and is not primarily about model building.
It is really intended as a complement to any Hadoop-based model-building system.
The current version is MapReduce based for model building, and it implements its own algorithms rather than using Mahout's, in order to improve on perceived problems with them.
The project, which is open source, is designed more as 3 complete apps than as a platform for extension.
It implements only:
- ALS for recommendation
- k-means for clustering
- Random decision forests for classification and regression
The major difference is fewer algorithms, but complete apps that include incremental updates and serving. The algorithms themselves are not really the difference, since Oryx is not a new library.
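The idea of a complete app, rather than just an algorithm, can be illustrated with k-means. The sketch below is a hypothetical, stdlib-only Python example and not Oryx's code: `build_kmeans` runs ordinary Lloyd iterations over a batch offline, `update_kmeans` folds in one new point by moving only its nearest centroid, and `nearest` serves queries.

```python
import math
import random


def nearest(centroids, point):
    """Serving: index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))


def build_kmeans(points, k, iterations=20, seed=0):
    """Offline build: plain Lloyd's k-means over the full batch of points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if its cluster goes empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    # Record how many points each final centroid represents, for later updates.
    counts = [0] * k
    for p in points:
        counts[nearest(centroids, p)] += 1
    return centroids, counts


def update_kmeans(centroids, counts, point):
    """Incremental update: move the nearest centroid toward the new point by 1/n."""
    i = nearest(centroids, point)
    counts[i] += 1
    step = 1.0 / counts[i]
    centroids[i] = [c + step * (x - c) for c, x in zip(centroids[i], point)]
    return i
```

The 1/n step is a common online k-means rule: it keeps each centroid close to the running mean of the points assigned to it, so new data can be reflected immediately while a full rebuild still runs periodically.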
The next version is built on Spark and Kafka, and becomes more of a generic lambda architecture for ML that happens to include entire apps too.
It is something like Summingbird for ML on Spark. It has no algorithm implementations of its own at all, at least not for now, so it is even more different from Mahout or MLlib.
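A lambda architecture that has no algorithm implementations of its own can be pictured as a framework that only moves data around and calls model logic supplied by the user. The following Python sketch assumes that shape. `LambdaPipeline`, its methods, and the in-memory list standing in for a Kafka topic are all hypothetical and do not reflect Oryx's actual API.

```python
from typing import Any, Callable, List


class LambdaPipeline:
    """Hypothetical minimal lambda-architecture skeleton. It contains no ML
    algorithm itself; the caller plugs in the model logic as three functions."""

    def __init__(self,
                 build: Callable[[List[Any]], Any],
                 update: Callable[[Any, Any], Any],
                 query: Callable[[Any, Any], Any]):
        self._build = build
        self._update = update
        self._query = query
        self.log: List[Any] = []  # stand-in for a durable input topic such as a Kafka topic
        self.model: Any = None

    def ingest(self, event: Any) -> None:
        """Append the event to the log and, once a model exists, apply it (speed layer)."""
        self.log.append(event)
        if self.model is not None:
            self.model = self._update(self.model, event)

    def run_batch(self) -> None:
        """Batch layer: rebuild the model from the full event log."""
        self.model = self._build(list(self.log))

    def query(self, request: Any) -> Any:
        """Serving layer: answer a request from the current model."""
        return self._query(self.model, request)
```

For example, plugging in `build=lambda ev: (sum(ev), len(ev))`, `update=lambda m, e: (m[0] + e, m[1] + 1)` and `query=lambda m, _: m[0] / m[1]` turns the pipeline into a running-mean "app", while the pipeline class itself stays algorithm-free.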
