What does it take to become a data scientist?


“You don’t need a Ph.D. in Machine Learning or Data Science to become a data scientist, but you do need a strong background in math and statistics to become a good one.”

Petr Tsatsin is our Head of Engineering and has been building the engineering team since the company was founded. Because we are growing quickly, I sat down with Petr to ask what he looks for in candidates who will help build the next generation of AI at Ople.

“I learned that the strong candidates are the ones with a robust statistics background.”

Petr went on to explain that while the technical barrier to becoming a data scientist has been lowered, one must still understand the statistical principles underlying machine learning to thoroughly understand and tune models.

One question Petr is particularly fond of asking is very open-ended: “Can you explain linear regression to me?” Petr likes this question because it touches on many of the statistical concepts that form the foundations of machine learning.

Starting from linear regression, he can dig deeper and test a candidate's grasp of related concepts such as different types of distributions, overfitting, and more. His favorite follow-up question when discussing the least-squares loss function is, “Why do you square the errors?” It may sound basic, but so far few candidates have been able to clearly explain why squaring is needed (beyond the obvious reason of keeping the values non-negative).
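For readers curious about the least-squares question, here is a minimal Python sketch of one common line of reasoning. It is our own illustration, not necessarily the answer Petr is looking for. Squaring the errors gives a smooth, differentiable objective whose minimizer has a closed form (the slope and intercept computed below), and the best constant prediction under squared error is the mean, whereas under absolute error it is the median. The function names fit_simple_ols and best_constant, the toy data, and the grid search are hypothetical choices made only for illustration.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Setting the derivatives of the sum of squared errors to zero
    # yields these closed-form estimates.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def best_constant(ys, loss, candidates):
    """Grid search for the constant c that minimizes the total loss over ys."""
    return min(candidates, key=lambda c: sum(loss(y - c) for y in ys))


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 9]  # exactly y = 1 + 2x
    print(fit_simple_ols(xs, ys))  # (1.0, 2.0)

    data = [1, 2, 3, 4, 100]  # one large outlier
    grid = [i / 10 for i in range(0, 1001)]  # candidates 0.0, 0.1, ..., 100.0
    print(best_constant(data, lambda e: e * e, grid))  # 22.0, the mean
    print(best_constant(data, abs, grid))  # 3.0, the median
```

The single outlier pulls the squared-error optimum to 22.0 while the absolute-error optimum stays at 3.0. This is one way to see that the choice of loss encodes an assumption about the noise: squared error is the maximum-likelihood choice when the errors are Gaussian.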

Petr asks questions that probe the fundamentals of statistics because a data scientist needs to understand not just how to build a model but how to fix one. A robust understanding helps a data scientist make educated assumptions about the data and its distribution, and apply a well-thought-out statistical process to model selection.

Petr also frequently asks candidates to walk him through a particular experiment. This exercise digs into data preparation, another essential part of being a great data scientist: “Do you know what data you need to get? How would you get it?” In other words, can the candidate foresee potential discrepancies between the training data distribution and the real data distribution once the model goes into production?
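One way to make the training-versus-production question concrete is to compare the distribution of a feature in the two settings. The sketch below assumes you have a sample of one numeric feature from training data and one from production, and computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) in plain Python. The function names and the 0.2 threshold are hypothetical; a real check would cover many features and use a proper significance test rather than a fixed cutoff.

```python
from bisect import bisect_right


def ks_statistic(train, prod):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples."""
    if not train or not prod:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(train), sorted(prod)
    gap = 0.0
    # Empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the largest gap.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of train <= x
        fb = bisect_right(b, x) / len(b)  # fraction of prod <= x
        gap = max(gap, abs(fa - fb))
    return gap


def looks_shifted(train, prod, threshold=0.2):
    """Flag a possible shift; threshold=0.2 is an arbitrary, hypothetical cutoff."""
    return ks_statistic(train, prod) > threshold


if __name__ == "__main__":
    train = [1, 2, 3, 4]
    prod = [3, 4, 5, 6]  # production values drifted upward
    print(ks_statistic(train, prod))  # 0.5
    print(looks_shifted(train, prod))  # True
```

A statistic near 0 means the two samples look alike, while a value near 1 means they barely overlap, which is exactly the kind of gap a model trained on one and deployed on the other would struggle with.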

In summary, Petr believes that strong data science candidates should demonstrate a firm understanding of statistics. If you are an enthusiastic data scientist ready to answer the questions above, please contact us: we are growing fast and looking for great people to join our team!

(Featured Image Designed by Freepik)