So you’re building a data science team. That’s great news! As a business leader, you know that finding qualified data scientists is critical to your company’s ability to harness big data and machine learning technologies, which can be a competitive advantage. But it’s also a process fraught with difficulty and pitfalls. We reached out to data science leaders to get their thoughts on the matter.
One of the most important steps in building a successful data science team is hiring a senior data scientist who can lead the team’s further development, says Seth Dobrin, who heads up IBM’s Data Science Elite Team.
“Until you get a credible senior person in your organization that’s a data scientist, it’s hard to get others to come on board,” says Dobrin, a PhD with more than 20 years of experience in data science fields. “There are some clients that just can’t find talent.”
Dobrin was hired by IBM two years ago to build out the Data Science Elite Team, a new endeavor in which IBM data scientists work with organizations in six- to 12-week engagements to collaborate on data science and AI projects. The service is free to customers, although there are some requirements (like your willingness to serve as a public reference).
After travelling the world to meet with IBM clients for a year, Dobrin set out to assemble the team, which currently consists of 60 data scientists, machine learning experts, and others with related expertise. Dobrin is currently looking to hire 30 more data scientists this year – which means he might be competing with you.
In Dobrin’s view, hiring a senior data scientist signals to other prospective data scientists that the company is serious about data science and AI, and isn’t just jumping on the big data bandwagon. Newly hired leaders can also tap their own professional networks to fill out the data science team, much as Dobrin has done himself.
“The hardest part is getting that first person in who has that deep network, who can bring in additional talent,” he says. “We all work for people. We don’t work for companies. We go change jobs to go work with someone, not necessarily to work for a specific company.”
In some situations, the company will rely on the senior data scientist to set its data science and AI strategy. Ideally, however, the company will already have ideas about where it wants to apply data science and AI technologies and techniques, and the senior data scientist is brought in to execute those ideas with process and rigor.
“Ideally it comes from the top,” Dobrin says of the idea-creation process. “In an ideal situation, it’s the CEO. That’s a rare situation though. Usually it’s one or two people who get it. It’s the CIO or CFO or CMO who gets it, that starts pushing us and starts driving it within the company and getting the resources.”
If a company is struggling to come up with the big idea, there are plenty of consultancies that can help. Firms like Deloitte, KPMG, E&Y, and PwC all have large staffs of data scientists and other experts who analyze business models and figure out where data can give a company a boost.
Headhunters can also be hired to bring in an experienced data scientist to get things started. “It is a little bit of a chicken and egg problem if you don’t know how to build that value proposition,” Dobrin acknowledges.
Dobrin used familiar tools and channels to work his network, including phone calls, email, and LinkedIn. Getting the job description right on job boards is critical to clearly communicating the role, and acting quickly on nibbles is just as important to hooking the big fish.
“If you take six weeks to go through an interview process from first contact to offer, you’re going to lose people,” he says. “My goal is 10 days.”
When it comes to specific technical skills, a few are absolutely critical. Python is one: if you don’t know Python in this day and age, you had better have a hard-to-find talent in some other area. Apache Spark has become a critical tool in many data scientists’ toolboxes, so familiarity with it is important. R also remains a popular language for data science.
On IBM’s Data Science Elite Team, XGBoost has become the go-to algorithm for traditional machine learning problems, thanks to its power, its tunability, and its resistance to overfitting, according to Dobrin. “There’s a constant barrage of new tools, methodologies, and packages that are out there that people just need to be up to date on,” he says.
Graduating from a data science bootcamp is a good start, but it’s not enough to consider yourself a full-fledged data scientist, says Pedro Alves Nogueira, who heads up the data science and AI business at Toptal.
“There are not a lot of people in the market with proven experience,” says Nogueira, who has a PhD in AI, human-computer interaction, and affective computing from the University of Porto in Portugal. “Doing a bootcamp on AI and data science is probably not going to be enough for you to be a data scientist. It’s good enough for you to learn a skill…but it’s not going to give you the basic mathematical knowledge.”
Toptal prides itself on having the top 3% of talent – hence “Toptal” – in a given field of development. The company started by offering developers on demand for general application and Web development. As more clients looked to Toptal for data science and AI expertise, the company decided to formalize its data science and AI business by creating a dedicated department.
While data science is becoming more automated by the day, it’s still critical that a data scientist understand how machine learning models work at a deep level and be able to build them by hand if necessary, Nogueira says.
“We allow developers to use whatever technology they want,” Nogueira says. “What we’re most interested in is having the fundamental ability to understand the models and to implement them from scratch.”
Becoming one of Toptal’s data scientists or AI experts is a rigorous process, Nogueira says. The first step is ensuring that prospective data scientists are proficient in English, which is important considering that the company recruits heavily from eastern and southern European universities. Next, they must prove their mathematical chops by solving a series of ML and AI problems.
“Then somebody on [our] existing screening team, which is himself or herself a senior AI or data science developer who has been working with us for two years, [takes the prospect] through a live coding session,” he says. “Then you have to do a two-week sample project that you have to present to us as if we’re the client. We spend a lot of energy and time to make sure they really know what they’re talking about.”
Ultimately, pairing a Toptal data scientist with a specific client takes careful analysis of the business outcomes the client is seeking and of the worker’s ability to meet the technical requirements.
“It’s not just about building models,” Nogueira says. “It’s about knowing what you’re building and making sure what you’re building is intelligible to people who are going to be using it, and that it is solid and useful for the business itself.”
Once you find a good data scientist, retaining them is also important. Providing good data science problems that affect the bottom line is arguably the best way to keep them around. Giving them the freedom to learn new technologies and techniques matters too. And, of course, offering a competitive salary and good benefits is critical.