Some months ago I understood something I will never forget: the work of a data scientist resembles the work of a carpenter. When a carpenter needs to solve a problem, he opens his toolbox and uses a specific tool for each task. Likewise, the data scientist needs to have in his toolbox of data analysis and data mining all the techniques and tools necessary for extracting useful information, because it is impossible to tackle a data science problem with only statistical techniques, or with only five or six data mining methods. “The more tools in the box, the better the solution.”
Data scientist's toolbox

I like it.
In operations improvement, tools (e.g., six sigma, lean, metrics) too often become an end in themselves (i.e., now that I have a hammer, everything looks like a nail). How do data scientists avoid this dysfunction?
Hello
In operations improvement, data scientists should be careful when solving problems with data. I do not think there is an obvious mechanism to prevent the dysfunction you raise; it depends on the data scientist's experience and expertise in using the “tools”. It is true that some tools become useless at certain stages of the process, so I always warn that no two problems are alike, even when they have the same data. Sometimes the time factor makes the difference, and the same tool that you use today will be useless tomorrow.
Thank you very much for commenting.
Best regards
More tools ≠ better solution, sorry.
Most carpenters will carry a reasonable selection, but work with a favoured few; adding another power drill, circular saw or router doesn’t produce a better result. Knowing how to use the tools at hand, and their specialities, subtleties and limitations, is more useful than having a large collection.
Yes, it’s important to include a selection, but only one or two from each of the major areas:
* Data munging
* Discovery / DQ assessment
* Visualisation
* Statistical profiling
etc.
Don’t be seduced by the idea that more tools = better tradesmen. It’s just not true.
Hello,
I agree with your comment, but it does not contradict what I said. One of the major problems for data scientists is not being sure which method of data analysis or data mining is best to use once they have the data in hand. Of course, trying to apply too many “tools” to the same problem could be a disaster. When I say that the more tools the carpenter has in his box, the better the solution, it follows logically that the carpenter should know how to use each tool; otherwise he would not be a good carpenter. Combining several tools, taking some from each area according to the problem, is beneficial when facing a data mining or data analysis problem. The more tools you have and know how to use, the better the solution.
Thank you very much for commenting.
Best regards