Differences between Cloudera Oryx and Apache Mahout

  • There are 3 broad things an operational ML system eventually needs to do:
    • Build models at scale, offline
    • Update models in near real time
    • Query models in real time
  • Most tools, such as Mahout or MLlib, only build models at scale.
  • Oryx tries to do all 3, but building models at scale is not its main focus.
  • It is therefore really intended as a complement to any Hadoop-based model-building system.
  • That said, it is MapReduce-based for model building and implements its own algorithms instead of using Mahout's, in order to improve on perceived problems.
  • The project, which is open source, is designed more as 3 complete apps than as a platform for extension.
  • It implements only:
    • ALS for recommendation
    • k-means for clustering
    • Random decision forests for classification and regression
  • The major difference is fewer algorithms but complete apps, including incremental update and serving. The algorithms are not really the difference, since Oryx is not a new library.
  • The next version is built on Spark and Kafka, and becomes more of a generic lambda architecture for ML that happens to include entire apps too.
  • It is a kind of Summingbird for ML on Spark. It currently has no algorithm implementations at all, which makes it even more different from Mahout or MLlib.
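
The three responsibilities listed at the top (build offline, update in near real time, query in real time) can be illustrated with a small sketch. This is not Oryx code and does not use any Oryx, Mahout, or MLlib API. The names PopularityModel, build_batch_model, SpeedLayer and ServingLayer are hypothetical, and a simple item-popularity count stands in for a real algorithm such as ALS.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # (user_id, item_id); a hypothetical event shape


class PopularityModel:
    """Toy stand-in for a real model: ranks items by interaction count."""

    def __init__(self, counts: Counter) -> None:
        self.counts = counts

    def top(self, n: int) -> List[str]:
        # Items with the highest counts first.
        return [item for item, _ in self.counts.most_common(n)]


def build_batch_model(history: Iterable[Event]) -> PopularityModel:
    """Batch layer: rebuild the model offline from the full event history."""
    return PopularityModel(Counter(item for _user, item in history))


class SpeedLayer:
    """Speed layer: folds new events into the model between batch builds."""

    def __init__(self, model: PopularityModel) -> None:
        self.model = model

    def update(self, event: Event) -> None:
        _user, item = event
        self.model.counts[item] += 1


class ServingLayer:
    """Serving layer: answers queries from the current in-memory model."""

    def __init__(self, model: PopularityModel) -> None:
        self.model = model

    def recommend(self, n: int) -> List[str]:
        return self.model.top(n)


# Build offline, then serve and update against the same model.
history = [("u1", "a"), ("u2", "a"), ("u3", "b")]
model = build_batch_model(history)
speed, serving = SpeedLayer(model), ServingLayer(model)

before = serving.recommend(1)   # "a" leads: 2 interactions vs 1
speed.update(("u4", "b"))
speed.update(("u5", "b"))
after = serving.recommend(1)    # "b" leads: 3 interactions vs 2
print(before, after)
```

In a real deployment the three layers would run as separate processes that exchange data and model updates through a message queue; here they share one in-memory object purely to keep the sketch short.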
