Why you don't need to understand machine learning algorithms

Some people think I am a machine learning expert. And in some ways, I am, which is great, because it is suddenly the flavor of the week, and I have lots of companies asking me for help solving their marketing problems with this new secret sauce. What I know is less how machine learning works and more how to tell whether machine learning will help you with your marketing problem. What I don’t know is how all of the lovely algorithms in the picture above actually work. And you don’t need to, either. Let me explain why.

Many of you know that I am a former IBM Distinguished Engineer. Distinguished is a nice word for “old.” I have been at this for a long time. When I first started out as a programmer (we were all programmers then, not developers or, God help us, software engineers), we were just starting to use databases. And the first databases were designed by the programmers writing the application programs. They put records of data into files. They decided how to split up the files. They decided how they would quickly find a record in the middle of the file. And they did all of this so that their particular program could get at the data as quickly as possible.

And that worked very well. Right up until you wanted to have a second program that used the same database. That second program would usually run incredibly slowly. I am talking weeks. So slow that by the time it answered the question you had, you didn’t care anymore. The reason this happened was that the database was optimized for the way the first program needed to use it, and the second one needed to use the data differently, so it didn’t work well at all. And if you wanted to change the database to accommodate the second program, you needed to rewrite the first program.

Enter the relational database, a wonder that removed all the knowledge of the physical layout of the data from the programmer. Now, a database administrator (DBA) was responsible for optimizing the database, not the programmer. All the programs using the database now ran relatively quickly because the DBA worked at making the best optimization choices for the whole set of programs. And the DBA could change the structure and optimization choices for the database every day of the week to make it better. No matter what the change was, you never needed to rewrite any of the programs. Nowadays, the database optimizes itself based on usage. So, all the techniques that the original database programmer needed to know are now known only by the few people who actually work on database products.
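If you want to see that separation for yourself, here is a minimal sketch using Python's built-in sqlite3 module. The table, the names in it, and the index are all made up for illustration. The point is only that the "program" (the query function) stays exactly the same while the "DBA" changes how the data is physically organized.

```python
import sqlite3

# Hypothetical table and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ana", "Austin"), ("Bo", "Boston"), ("Cy", "Austin")],
)


def customers_in(city):
    # The "program": it says what it wants, not how the data is stored.
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]


before = customers_in("Austin")

# The "DBA" changes the physical layout by adding an index.
# Nothing in customers_in() has to change.
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")

after = customers_in("Austin")
print(before, after)
```

On three rows the index makes no visible difference in speed, of course. What matters is that the same query gives the same answer before and after the change, with no rewrite.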

Why am I telling you all of this ancient history? Because it helps you understand what you need to know about machine learning.

When I first worked with machine learning, in the 1990s, it worked the same way as when programmers were designing the databases. You wrote your program using one of the algorithms, just like the ones shown in the graphic above. And that worked very well, as long as you had chosen the best algorithm. If you wanted to choose a different algorithm to see if it would give a better answer, you needed to rewrite your program.

In more recent times, machine learning platforms began to emerge. These platforms, such as the popular scikit-learn, now allow you to create your machine learning model (we in machine learning are too fancy to call it a program) using any algorithm you want. You can test it and see how it works and then run it again with a different algorithm and see if that works better. To do that, you need a machine learning specialist, the equivalent of the old DBA. It’s helpful for that person to understand how the algorithms work because it will take fewer tests, but it isn’t as crucial a decision as it once was, because it is easy to change if you get it wrong.
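Here is roughly what that swap looks like in scikit-learn. Treat it as a sketch under assumptions of my own: the built-in iris sample data stands in for your marketing data, and the two algorithms, the 30% test split, and the evaluate helper are arbitrary choices for illustration. Notice that trying a different algorithm means changing one line, not rewriting the program.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: the iris sample set that ships with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def evaluate(model):
    # Every scikit-learn classifier exposes the same fit/score methods,
    # so this helper does not care which algorithm it is given.
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)  # accuracy on the held-out data


scores = {
    "decision tree": evaluate(DecisionTreeClassifier(random_state=0)),
    "logistic regression": evaluate(LogisticRegression(max_iter=1000)),
}

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```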

So, you, the marketer, can safely ignore all the talk about algorithms even today. Soon, you won’t need an expensive machine learning specialist either, because the platforms, just like relational databases, will start to optimize themselves. They can run all the algorithms and choose the best one. Automatically.
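To make "run all the algorithms and choose the best one" concrete, here is a toy version of the idea, again using scikit-learn and its iris sample data as a stand-in. The three candidates and the pick_best helper are my own illustrative choices, not how any particular platform works. Real automated tools do a good deal more, such as tuning each algorithm's settings, but the core loop is about this simple.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An arbitrary handful of candidate algorithms, chosen for illustration.
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "nearest neighbors": KNeighborsClassifier(),
}


def pick_best(candidates, X, y):
    # Score every candidate with 5-fold cross-validation and keep the
    # name of the one with the highest average accuracy.
    mean_scores = {
        name: cross_val_score(model, X, y, cv=5).mean()
        for name, model in candidates.items()
    }
    best_name = max(mean_scores, key=mean_scores.get)
    return best_name, mean_scores


best_name, mean_scores = pick_best(candidates, X, y)
print("best:", best_name)
```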

This is good news, because I am seeing a run on machine learning talent, and my clients are worried that they can’t compete with Google and Amazon, who are snapping up all of these folks. What I am trying to tell you is that it is okay. Soon, you won’t need such gurus.

Now, you will still need data scientists who understand the data that you have and you’ll need experts to do feature engineering—deciding what aspects of the data should be examined by the machine learning component. But your organization won’t really need to understand the nuts and bolts of machine learning. So, if you are just getting started in machine learning, focus your talents on the data, not the machine learning—that is the skill you need. Renting the specialists will work just fine for now, and in the future, you won’t need to know those people at all.

You’re welcome.