Thursday, June 9, 2011

What are the most widely deployed machine learning algorithms?

An interesting question is: "what are the most useful machine learning algorithms?"
I ran a small survey by counting keyword occurrences in the Mahout user mailing list. The results are shown in the plot above.
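
For anyone who wants to try something similar, here is a rough sketch of that kind of keyword count. The keyword patterns and sample messages below are made up for illustration; they are not the exact terms or data behind the plot, and loading a real mailing list archive is left out.

```python
import re
from collections import Counter

# Hypothetical keyword patterns (assumption: the post does not list the exact terms).
KEYWORDS = {
    "SVD": r"\bsvd\b",
    "K-means": r"\bk-?means\b",
    "SGD": r"\bsgd\b",
}


def count_keyword_occurrences(messages, keywords=KEYWORDS):
    """Count the total number of case-insensitive matches of each keyword."""
    counts = Counter({name: 0 for name in keywords})
    for text in messages:
        for name, pattern in keywords.items():
            counts[name] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Made-up messages, for illustration only.
sample_messages = [
    "How do I run SVD on a sparse matrix?",
    "kmeans clustering is slow, and SVD fails too",
    "Question about K-Means initial centroids",
]

counts = count_keyword_occurrences(sample_messages)
for name, n in counts.most_common():
    print(name, n)
```

A count like this measures how often an algorithm is discussed, which is not necessarily the same as how often it is used.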


It seems that matrix factorization (SVD) is the most widely used algorithm, followed by K-means. We have just implemented SVD as part of the GraphLab Collaborative Filtering library. Anyone who wants to beta test it is welcome!

5 comments:

  1. Nice :) But maybe this is a plot of inverse quality of the documentation for each algorithm?

  2. I think you are wrong - for Mahout's SVD we would get division by zero.. :-)

  3. I think that this is definitely not a measure of usage. From my experience, the order in descending frequency would be recommendations, K-means, SGD, SVD and frequent itemsets. The first two or three dominate. The order of the later ones is uncertain.

  4. nice, added to the list http://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms
