Exploring the Data Science Industry through Popular Job Titles

By Kyle Lee (Project Lead), Megan Ma, Amelie Ionescu, Christine Kim

DataRes at UCLA
6 min read · Dec 17, 2021

Introduction

The recent rise in interest in data science and machine learning has been accompanied by an increase in job descriptions that are murky at best. It seems as if every company is looking for a data scientist to breathe new life into its business, but what skill sets are these companies actually looking for? Because the field has only become popular in the past five to ten years, there are currently no standardized industry requirements for data professionals the way there are for software engineers. As a result, new grads and others looking to enter the industry may struggle to pinpoint which qualifications are necessary for which job titles. What is the difference between a data scientist and a data analyst? Or between a machine learning engineer and a research scientist? Our goal in this project is to clarify how the recommended qualifications differ between popular data job titles, and to use this information to give advice to current data job seekers.

Dataset

The dataset we used for this analysis comes from the Kaggle Machine Learning & Data Science Survey. Every year, Kaggle conducts an industry-wide survey to assess the current state of machine learning and data science. This year (2021), the survey had 25,973 responses, with questions covering education level, job title, programming languages, machine learning frameworks, and more. We chose to focus on eight of the job title categories with the most responses (excluding respondents who are not currently employed). Seven of them are data-focused: Business Analyst, Data Analyst, Data Engineer, Data Scientist, Machine Learning Engineer, Research Scientist, and Statistician. Although Software Engineer is not primarily a data-focused job, we included it as the eighth category for reference.
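As a rough illustration of the filtering step described above, here is a minimal sketch in pandas. It is not the code we ran: the tiny table is made-up toy data, and the column name `job_title` is a hypothetical stand-in for whatever the survey file actually calls that question.

```python
import pandas as pd

# Toy stand-in for the survey responses. The real survey file and its
# column names are not shown here, so "job_title" is a hypothetical name.
responses = pd.DataFrame({
    "job_title": [
        "Data Scientist", "Student", "Software Engineer",
        "Currently not employed", "Data Analyst", "Statistician",
    ],
})

# The eight job titles kept for the analysis, as listed above.
TITLES = [
    "Business Analyst", "Data Analyst", "Data Engineer", "Data Scientist",
    "Machine Learning Engineer", "Research Scientist", "Statistician",
    "Software Engineer",
]

# Keep only respondents whose title is one of the eight categories.
employed = responses[responses["job_title"].isin(TITLES)].reset_index(drop=True)
print(employed["job_title"].value_counts())
```

Filtering against an explicit list of titles, rather than dropping unwanted values one by one, keeps the selection easy to audit if the set of categories changes.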

Common Degrees Held

The graph above is a side-by-side bar chart of the percentage of each job title that holds a given degree, with degrees separated into bachelor’s, master’s, doctoral, and professional degrees. A professional degree is one that qualifies its holder to practice a particular profession. This question measures a respondent’s highest level of education; for example, a respondent with both a bachelor’s and a master’s degree contributes only to the count of master’s degrees.

As the graph shows, most data professionals hold bachelor’s or master’s degrees, while the share with doctoral or professional degrees is significantly smaller. Research scientists are the exception: the most common degree among them is a doctoral degree. As for the other occupations, most software engineers hold either a bachelor’s or a master’s degree. Contrary to the popular belief that software engineering only requires a bachelor’s degree, our data shows that many software engineers hold master’s degrees as well. For business analysts, data analysts, data engineers, and machine learning engineers, the percentage with bachelor’s degrees is roughly the same as the percentage with master’s degrees. Among statisticians, significantly more professionals hold master’s degrees than bachelor’s degrees.
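The percentages behind a chart like this can be computed with a row-normalized cross-tabulation. The sketch below uses made-up toy rows and hypothetical column names (`job_title`, `degree`), so the numbers it prints say nothing about the real survey.

```python
import pandas as pd

# Toy data only; the column names and values are illustrative assumptions.
df = pd.DataFrame({
    "job_title": [
        "Data Scientist", "Data Scientist",
        "Research Scientist", "Research Scientist",
        "Data Analyst", "Data Analyst",
    ],
    "degree": [
        "Master's", "Bachelor's",
        "Doctoral", "Doctoral",
        "Bachelor's", "Master's",
    ],
})

# normalize="index" divides each row by its total, so every job title
# sums to 100% regardless of how many respondents hold that title.
degree_pct = pd.crosstab(df["job_title"], df["degree"], normalize="index") * 100
print(degree_pct.round(1))
```

Normalizing within each job title matters because the titles have very different response counts; raw counts would make the largest group dominate the chart. With matplotlib installed, `degree_pct.plot(kind="bar")` draws the grouped bars.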

Programming Languages

This graph shows the counts of the most commonly used programming languages for each job title. We observe some interesting trends: a significant proportion of software engineers use JavaScript and C++, while professionals with other data job titles rarely use these languages. For job titles more closely associated with academic fields, such as Statistician, R is more commonly used than Python. For all other titles, the most popular programming language is Python. SQL is also consistently popular across the vast majority of job titles.
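Survey questions like this one are usually "select all that apply", which typically means one column per language rather than a single column. Assuming that layout, language counts per job title could be sketched as follows. The `lang_` column prefix and the toy rows are hypothetical, not the survey's actual schema.

```python
import pandas as pd

# Toy multi-select data: each language has its own column, filled with the
# language name if the respondent uses it and left empty otherwise.
df = pd.DataFrame({
    "job_title": ["Statistician", "Statistician", "Software Engineer", "Data Scientist"],
    "lang_Python": ["Python", None, "Python", "Python"],
    "lang_R": ["R", "R", None, None],
    "lang_SQL": [None, "SQL", "SQL", "SQL"],
    "lang_JavaScript": [None, None, "JavaScript", None],
})

lang_cols = [c for c in df.columns if c.startswith("lang_")]

# count() tallies the non-missing cells, i.e. how many respondents with
# each job title ticked each language.
lang_counts = df.groupby("job_title")[lang_cols].count()
lang_counts = lang_counts.rename(columns=lambda c: c[len("lang_"):])
print(lang_counts)
```

Because respondents can select several languages, the counts in a row can add up to more than the number of respondents with that title.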

This heat map shows which programming language each type of professional recommends learning first. This is relevant to job seekers who are just getting into the data field and need to decide which languages would be most useful to learn. As you can see, Python was the most recommended language among every type of professional. However, many statisticians also recommended learning R. This is consistent with our previous visualization, which showed that professionals in academia-related roles use R more frequently than Python, so it makes sense that statisticians view R as more important to learn than other data professionals do.

Programming Experience

As technology has become a basic necessity in education and industry, computer programming has become one of the most essential technical skills to learn. That’s why we decided to see which roles have the least and the most programming experience. The heat map above displays the distribution of programming experience, as a percentage of each job title. For Business Analysts, Data Analysts, and Statisticians, the majority of respondents fall in the range from less than 1 year to 3 years of programming experience. This is likely because their main responsibilities are to analyze data or business systems in order to draw out findings and make business decisions, work that relies more on their own judgment than on writing code.

On the other hand, the majority of Research Scientists and Software Engineers fall in the 3 to 20 years range. While it may be obvious why Software Engineers generally have more programming experience, one possible reason Research Scientists span such a wide range is that they can use programming to automate experimental processes that would otherwise be tedious to repeat by hand. Because they carry the experimental process from start to finish, modern researchers automate as much of their experiments as they can, whether that means extracting certain data or handling errors within their data.
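To make statements like "the majority have under 3 years of experience" concrete, the experience brackets can be put in order and then summed across ranges. This is a minimal sketch on made-up rows; the bracket labels and column names are assumptions for illustration and may not match the survey's exact wording.

```python
import pandas as pd

# Assumed bracket labels, listed from least to most experience.
BUCKETS = ["< 1 years", "1-3 years", "3-5 years",
           "5-10 years", "10-20 years", "20+ years"]

# Toy data only; these rows are not taken from the survey.
df = pd.DataFrame({
    "job_title": ["Data Analyst"] * 4 + ["Software Engineer"] * 4,
    "experience": [
        "< 1 years", "1-3 years", "1-3 years", "5-10 years",
        "3-5 years", "5-10 years", "10-20 years", "1-3 years",
    ],
})

# Percentage of each job title in each bracket. reindex() puts the columns
# in logical order and adds empty brackets as 0 instead of dropping them.
exp_pct = (
    pd.crosstab(df["job_title"], df["experience"], normalize="index") * 100
).reindex(columns=BUCKETS, fill_value=0.0)

# Share of each title with under 3 years, and with 3 to 20 years.
under_3 = exp_pct[["< 1 years", "1-3 years"]].sum(axis=1)
mid_range = exp_pct[["3-5 years", "5-10 years", "10-20 years"]].sum(axis=1)
print(pd.DataFrame({"under 3 years": under_3, "3 to 20 years": mid_range}))
```

Without the explicit ordering, the brackets would sort alphabetically as text, which scrambles a heat map's axis.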

Conclusion

The requirements of popular data-related jobs do differ, although the differences are subtle. With some exceptions, the programming languages used by professionals in these jobs are very similar. However, the level of education differs among these professionals. Here is a flowchart that summarizes our recommendations, based on our findings, for people looking for jobs in the data industry:

Note: This flowchart is solely based on our analysis of a dataset that is relatively small compared to the total number of people working in the data industry, and this dataset only consists of Kaggle users. Please view it as a visual conclusion of our analysis, but know that it does not necessarily represent all data professionals.

Caveats

This article looks at three different categories (level of education, programming languages, and level of experience), each of which was analyzed separately. However, these categories could influence one another, and we did not factor this into our analysis. Additionally, we only explored a few categories in this article. There are many more factors, both inside and outside the Kaggle data, that could also influence a job seeker’s success in the data field. We have covered what we believe to be the most prominent ones, but the list is not comprehensive. For future analysis, it might be interesting to measure how much influence each factor has on a job seeker’s success, and then rank the factors by that influence.

Github Link: here
