Monday, March 30, 2015

How to Interview Data Scientists

Daniel Tunkelang from LinkedIn on hiring data scientists

 

Great talk by Daniel Tunkelang at the Strata conference last month

https://www.youtube.com/watch?v=gUTuESHKbXI

clip_image001

clip_image002

clip_image003

Look for a threshold level of knowledge in these skills, not perfect or near-perfect mastery, because those people are too expensive.

clip_image004

These people are hard to find and hard to assess! It seems straightforward but it is subtly hard.

clip_image005

clip_image006

clip_image007

You cannot normally test what people will do in their roles in 30-60 minutes. That is ridiculous! Cognitive biases prevent us from measuring well. We over-estimate our ability to judge people at interviews. We have many unconscious biases that prevent good judgments (cites Dan Ariely, Daniel Kahneman, others). Do not interview!

clip_image008

Hire only people with whom you have studied, worked, or socialized! That's one alternative.

clip_image009

You could also hire only interns. You make sushi together (at LinkedIn). Convert to full-time.

clip_image010

All of these alternatives have big problems. The pool of people we know is too small. Only early-stage start-ups can scale this way. This method, of course, creates an unhealthy mono-culture. You will not hire people who break you out of what you know already.

clip_image011

Interns are a huge investment. The program is a huge investment. Sourcing is fiercely competitive! Supervision is expensive. We look for net-neutral productivity from an intern. It is not cheap labor. It is often not successful. It is a long-term investment and does not solve problems quickly (it is part of the pipeline).

Try before you buy is ridiculous! "I know you have marriage proposals from everyone else but I would like to date for a few months!"

clip_image012

That does not go over well. You end up with weirdos who do not meet your needs.

clip_image013

Some start-ups do this. Nice idea.

clip_image014

A day, a week-end, or as long as they want, then they deliver and present. You can pay them for this type of interview.

clip_image015

Mitch does this now with candidates (reads candidates' code on public GitHub). Yohannes competes in Kaggle. You can see his code there. The code (e.g. code on public GitHub, papers in Google Scholar) is work done under real-world conditions -- work done in non-interview conditions.

clip_image016

Arsenic is 100% natural. People have never seen our data before. Data cleansing? Not natural. Should they learn our environment in an hour or a day? Same pressures as an interview but needs more investment. Take-home assignments are too much effort for the candidate. Take-home assignments -- students ask others for help and cheat.

So in the end you are interviewing people!

clip_image017

The only things to remember from this talk.

clip_image018

Snide comments about famous Google interview questions and how absurd they are at predicting performance.

clip_image019

Hacking skill is a requirement. Testing coding in interview conditions is hard. Code written under interview conditions is not representative of the code you actually get from an employee! But you can still test for basic coding ability.

Print the numbers from 1 to 500. If the number is a multiple of 3, print fizz; if it is a multiple of 5, print buzz; if it is a multiple of 15, print fizzbuzz.
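For reference, a minimal Python sketch of the problem as stated above. It assumes the usual convention that the word replaces the number (the notes do not say whether the number is printed as well), and the function names `fizzbuzz` and `fizzbuzz_lines` are just illustrative.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz token for a single number."""
    # Check 15 first: a multiple of 15 is also a multiple of 3 and of 5.
    if n % 15 == 0:
        return "fizzbuzz"
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return str(n)


def fizzbuzz_lines(limit: int = 500) -> list:
    """Tokens for 1..limit inclusive (500 is the upper bound used in the notes)."""
    return [fizzbuzz(n) for n in range(1, limit + 1)]


if __name__ == "__main__":
    print("\n".join(fizzbuzz_lines()))
```

The only real trap is the order of the checks: testing for 3 or 5 before 15 means fizzbuzz never gets printed.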

Joel Spolsky blogged about it (citing someone else). Most people who claim they can code cannot code this easy problem. If they cannot code this, walk them to the door. Mitch uses this method with his "make change" coding question in interviews.
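The notes do not spell out Mitch's "make change" question, so the following is only a guess at one common form of it: find the fewest coins that add up to a given amount. The function name `min_coins` and the default US-style denominations are assumptions for illustration, not his actual question.

```python
from typing import Optional, Sequence


def min_coins(amount: int, coins: Sequence[int] = (1, 5, 10, 25)) -> Optional[int]:
    """Fewest coins summing to `amount`, or None if it cannot be made.

    Assumes every denomination is a positive integer.
    """
    if amount < 0:
        raise ValueError("amount must be non-negative")
    unreachable = amount + 1  # more coins than any valid answer could need
    best = [0] + [unreachable] * amount  # best[v] = fewest coins for value v
    for value in range(1, amount + 1):
        for coin in coins:
            if coin <= value and best[value - coin] + 1 < best[value]:
                best[value] = best[value - coin] + 1
    return None if best[amount] == unreachable else best[amount]


if __name__ == "__main__":
    print(min_coins(63))
    print(min_coins(6, (1, 3, 4)))
```

The second call is the kind of follow-up an interviewer can use: with denominations 1, 3, and 4, always taking the largest coin first gives three coins for 6, while the table-based approach finds two.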

clip_image020

Code during phone screens -- Mitch does this too. Have them work in their own dev environment. Mitch uses Skype or Google Hangouts to watch them code in their own environment. Mitch encourages them to search Google for ideas to solve the problem.

clip_image021

Really stupid idea: checking whether the candidate remembers from class how to code a rehash algorithm. No one will implement basic stuff like this on the job. Use real problems!

clip_image022

String segmentation -- used at LinkedIn until it was outed on Glassdoor, after which he blogged about it (http://thenoisychannel.com/). Given a string and a dictionary, find a sequence of dictionary words into which the string can be segmented. This is the "did you mean?" problem of tokens typed without spaces. The problem has nice features. The FizzBuzz-level version: break the string into only 2 words. The general version leads to recursive backtracking, dynamic programming, and memoization. This problem is real. It tests basic principles of algorithms, applied in the way the candidate will need to apply them on the job!
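A rough Python sketch of the segmentation problem described above, covering both the two-word warm-up and the general case via memoized backtracking. This is one possible solution under my own assumptions (it returns the first segmentation found, and the names `segment_two` and `segment` are made up here); it is not taken from the talk or from the blog post.

```python
from functools import lru_cache
from typing import List, Optional, Set, Tuple


def segment_two(text: str, dictionary: Set[str]) -> Optional[Tuple[str, str]]:
    """Warm-up: split `text` into exactly two dictionary words, if possible."""
    for i in range(1, len(text)):
        if text[:i] in dictionary and text[i:] in dictionary:
            return text[:i], text[i:]
    return None


def segment(text: str, dictionary: Set[str]) -> Optional[List[str]]:
    """General case: split `text` into any number of dictionary words.

    Returns the first segmentation found, or None if there is none.
    """

    @lru_cache(maxsize=None)  # memoize: each start index is solved only once
    def solve(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(text):
            return ()  # nothing left to segment
        for end in range(start + 1, len(text) + 1):
            prefix = text[start:end]
            if prefix in dictionary:
                rest = solve(end)
                if rest is not None:
                    return (prefix,) + rest
        return None  # no dictionary word leads to a full segmentation

    result = solve(0)
    return None if result is None else list(result)


if __name__ == "__main__":
    words = {"pea", "nut", "peanut", "butter"}
    print(segment_two("peanutbutter", words))
    print(segment("peanutbutter", words))
```

Without the cache, an input like "aaaaab" against the dictionary {"a", "aa"} re-solves the same suffixes over and over; with it, each start position is explored once. Very long inputs would hit Python's recursion limit, which is one reason a bottom-up dynamic programming version is a natural follow-up discussion.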

Or, take problems from your own products (e.g. LinkedIn):

clip_image023

Product design question. You know something about how your own products are designed.

clip_image024

What is cool about your problems? e.g. People You May Know, Skills. Talk about your space. Sell the job to the candidate. Testing for generic worthiness is stupid. It should be implicit to the candidate that you are not hiring generically.

clip_image025

clip_image026

Success should not be about a single insight. If they have seen the problem before they'll jump to it. Your ability to calibrate their struggle is poor. Partial credit, hints. Variety of ways to solve any problem. Nudge them. Otherwise you are testing recall and luck. Do not test what they won't be doing. Test smarts, general skills (Google smart creative). Should be a no-brainer but people like to quiz others.

Don't be an ass hat. It's an interview, not a first date. Be firm, hard but fair. Do not pressure candidates until they cry and measure how much pressure it took, as if you were titrating them. You communicate your values when you interview. No boot camp rituals.

It should be fun; you are solving problems together. People should wish they had gotten in, and should like you even when you reject them.

clip_image027

Most important point. Maybe = No. You must commit to binary interview outcomes! Ken Moss was a big stickler for this one.

clip_image028

Similar to, but much softer than, Eric Schmidt's Google book. No easy way out. Compromise by rating each candidate weak or strong hire / no-hire. Two "no's" is a rejection. You should have candidates who get all "hires" and are still rejected because you discover they were all weak yeses. The hiring process is taking on too much of the evaluation process.

Firing for performance is very difficult. We must live with this problem. Be ruthlessly cautious and conservative unless you fire people after 90 days routinely and it is part of your company.

Phone screen -- weak hires cause much waste and churn when the candidate is decimated at the on-site. Think about what it does to your interviewers. Make your phone screens hard. Consider the impact of a stream of bad candidates on interviewers! If 90% of prior candidates were rejected, interviewers will want to get back to their day jobs and stop wasting time. That creates a huge bias.

It is better to risk the danger of not hiring good people who flunk the phone screens than to risk a stream of bad candidates on site.

Shoot for a 50% hire rate from on-site.

clip_image029

Get feedback about all three from every interview. Make sure the whole team does interviews.

clip_image030

Trust your team! If 1-2 say no, wait. You will find a good candidate.

Questions:

Q1: How are these interviews different from product development?

Coding is the same.

Data Science problems ask candidates to work through a recommendation. Where would you find the data? Labels? Objective functions.

You want to ask people to solve your product problems and apply engineering tasks / skills you need. You can take your own problems and simplify them.

Q2: How to find weaknesses? Where are talents? Longer-term passion?

Three C's. You cannot explicitly ask. Passion is something that everyone on the team has, and they can gauge it non-verbally. Passionate people see hard problems as exciting. It is fuzzy, but you can tell whether the candidate has the trait by whether the team agrees.

Q3: Design Questions: What if someone gives you free intellectual property during the interview?

If somebody under interview conditions provides a killer feature for your product, either you as a company are not trying hard enough or the candidate really wants to work for you. I have never heard of a situation where a candidate sued the interviewer over a 30-minute interview.
