September 13, 2013

Rapaio - statistical, data mining and machine learning toolbox

A year ago I participated in a machine learning contest. The contest was about predicting which movies each user would like. The results of the contest are not important. But because the contest ran for a full year, I had the chance to try a lot of ideas along the way. I knew nothing about machine learning or statistics, and I wanted to see what an engineer can do without prior specific knowledge.

What I gained was the discovery of a whole new world where many precise tools are available, and almost all of them are useful only if their results are properly interpreted. I discovered a world where, at least that's what I think now, there is no chance of doing anything without proper human judgment. And what I found astonishing is that all those available tools are no more than precise common sense. I loved it. I feel captured by this new world and, at the same time, completely free to explore, find and understand how the real world works. I started to buy books and study them line by line.

As usual, when I find something worth learning, I choose the hard way. For me, the hard way in this case means trying to implement all those tools in order to better understand their limitations, their reasoning and their power. The result of these initial efforts has now materialized into an open source project.

The Rapaio library is a statistics, data mining and machine learning toolbox. It is currently a work in progress and will most probably stay that way for a long time. That does not mean it will not be usable. It means that a myriad of tools and knowledge lies ahead, waiting to be implemented and used.

The source code is available on GitHub.
https://github.com/padreati/rapaio


To give a preview of what this library does, I started to create some tutorials on various topics, where I present how the library can help you. Those tutorials are strongly connected to the source code itself. The code which generates those tutorials is part of the library itself, which has, among other benefits, the advantage that the tutorials will stay up to date and ready to use in practice.

Rapaio tutorials are also available on GitHub.
http://padreati.github.io/rapaio/
