AWS AI & Machine Learning Podcast

Episode 8

February 03, 2020 Julien Simon Season 1 Episode 8

In this episode, I have a chat with Ségolène Dessertine-Panhard, a Data Scientist with the AWS Machine Learning Solutions Lab (https://aws.amazon.com/ml-solutions-lab/). As she spends her time working on customer projects, we talk about ML in the trenches: framing problems, building datasets, picking algos, deploying to production, and more.

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️

This podcast is also available in video at https://youtu.be/74LwQP9oKU4

For more content, follow me on:
- Medium: https://medium.com/@julsimon
- Twitter: https://twitter.com/julsimon

spk_0:   0:05
Hi, this is Julien from AWS. Welcome to episode eight of my podcast. Don't forget to subscribe to be notified of future episodes. In this episode, I talk to my colleague Ségolène. She's a data scientist who works for the ML Solutions Lab inside AWS, and we talk about machine learning in the trenches. So what does it take to have successful ML projects, going from framing the problem to building datasets to picking algos to deploying in production? Lots of really good tips in there. Let's not wait and jump to the discussion. Ségolène, thank you very much for taking the time to talk to me today. Can you tell us a little bit about your role?

spk_1:   0:47
So thank you again for inviting me, and good morning, everyone. So I'm a data scientist in the Machine Learning Solutions Lab at AWS. My team is dedicated to helping customers do machine learning on AWS.

spk_0:   1:07
Okay, that's a big topic.

spk_1:   1:09
That's a big topic. A big one. It's honestly challenging. Yeah.

spk_0:   1:13
Good, good. And is your team hiring? What's the first thing customers need to focus on when they start a machine learning project?

spk_1:   1:26
I think the first thing is to focus on the business problem they want to solve and how to frame it.

spk_0:   1:36
Okay. And I keep saying the business problem should be something you can write on the whiteboard, something simple. Is this something you see as well?

spk_1:   1:45
Exactly, that's really it: keep it simple. And this is one thing I tell customers at the very first meeting: KISS, keep it simple and stupid. It's very important to be able to formulate in a clear manner what you want to solve. Otherwise, the rest of the project will be very hard to solve.

spk_0:   2:08
Okay. And you recommend identifying key business metrics that can be improved by the predictions?

spk_1:   2:18
This is exactly what we do when we meet the customer for the first time: we try to understand what business problem they want to solve and, after that, what the KPIs are. That's really key for us: the KPIs which can be improved.

spk_0:   2:37
Okay, so write something simple on the whiteboard, with metrics. And then what's the next step?

spk_1:   2:43
The next step is to get your hands dirty and check whether you have the data to solve this problem. Sometimes we see customers who have a big problem, but most of the time they don't have the data, or they don't know where the data is, et cetera. So the second step is to understand and to know your data.

spk_0:   3:04
Okay. And in large companies, I suppose this means lots of investigation and discussion with the different teams involved. So who would you need to sit around the table to be successful here?

spk_1:   3:22
In this case, always some business people: the stakeholders of the business, who know the business, et cetera. And then some IT guys, who know the data and the infrastructure. So it's two kinds of people, but these two kinds of people are super important when you try to do some...

spk_0:   3:41
how important is it to understand the baseline? What? I mean, here is some machinery projects are completely new, right? But a lot of them actually try to replace a manual process or a traditional IittIe system. And I guess you need to understand that as well, right?

spk_1:   4:03
Exactly. And sometimes you have customers who say, "Okay, we want to automate this task, et cetera, because we know that we've got a lot of human errors." But one of the key questions I ask them every time is: who is in charge of this manual process? Because afterwards you need to talk with the people in charge of the manual process: "Okay, how do you do that?", et cetera. And this is how you can improve the task: when you understand who is doing what, how they are doing it, and why.

spk_0:   4:38
Why they're making mistakes, yeah. I think it's a good point: if you don't understand exactly how a human does it, how could you build a model that does?

spk_1:   4:48
Okay. And I see that even with very big customers with a lot of data. Sometimes it's, "Yes, we've got, like, four people in charge." Doing exactly what? Okay. So before doing some crazy deep learning things on the data, you really need to understand the human behavior.

spk_0:   5:10
When you're looking at KPIs that are going to be improved by the model, what would be the best practice in trying to improve on those KPIs? Are you looking at something reasonable, like "let's improve this by a few percent and iterate"? Or should you be more aggressive and go, "Come on, let's go for 50%"?

spk_1:   5:33
From my experience, it's better, again, to keep it simple at the beginning. Once you start to see some improvement, that means you understand the process, you understand how to do it, and afterwards it will be easier to improve faster. From my point of view, even if some customers want to go fast and say, "Okay, I'm going to use machine learning everywhere," I think it's good to understand the baseline, as you say. Afterwards, it is easier to add some complexity.

spk_0:   6:10
So set expectations with the business stakeholders and explain that you will iterate. And I think you have to iterate anyway, because data discovery is not something you're going to complete in one go. You always find new sources of...

spk_1:   6:28
Exactly. And as you go along in your ML project, you have new ideas and new hypotheses that you want to test. This is the reason why, at the beginning, it's very good practice to start with a very simple model. Afterwards, it comes almost automatically: you get new ideas, new data, new people who want to be involved in the project, and it's kind of...

spk_0:   6:55
Iteration after iteration, basically. Okay, good. Now let's talk about datasets, okay? Because they're the blood of machine learning models. What best practices would you recommend to customers? Or what mistakes do you see customers making when building and curating datasets?

spk_1:   7:17
Most of the time, they don't know their data. One of my first recommendations would be to spend time understanding where the data comes from, and where you have missing values, bad quality, or errors. I think that exploratory data analysis is such an important step in any kind of project. And again, people want to rush into the ML things and say, "Yeah, yeah, yeah, we're going to do this." But take your time at this stage; it's super important. You know, we data scientists like to say "garbage in, garbage out," and it's so true: if you don't know your data, you shouldn't use it.

spk_0:   8:02
It was true in the seventies.

spk_1:   8:04
I think

spk_0:   8:05
it's true in the twenties, and it will be true in the

spk_1:   8:08
Exactly. And it's the same issue in computer vision: if you have a lot of noise in your labels, the results afterwards won't be good, and it's important to understand that before doing any kind of transformation.

spk_0:   8:24
With some of my previous guests, we discussed the role of the data engineer and how difficult it is for data scientists to access data. So, do you see this as a well-defined role with customers at the moment, or is it still emerging? And what's your opinion on how to be a successful data engineer?

spk_1:   8:47
So, for instance, in my team, the Amazon Machine Learning Solutions Lab, we have different types of profiles. We've got data scientists, like me. We've got some deep learning architects, who create the pipeline, the fully automated pipeline, for any kind of ML project. And we've got some data engineering positions. So, as you see, we've got three types of ML practitioners on the team. I think these three roles really need each other, and data engineering is really super important and crucial in this respect. As a data scientist, if I don't have access to the data, I can't do anything, so I really need them. And then, once they have the data, they need someone to do the job of making sense of it. So it's important to have this kind of profile. Sometimes, at some customers, there is only one guy in charge of everything, data scientist and data engineer at once, and so we provide some tools and services to help in this area.

spk_0:   10:03
Building a data set is never something that gets done. It's never over.

spk_1:   10:08
No, it's never over, because again, you're going to have some new ideas to test, and even some new data. And along the way you will be able to add those. So it's a very iterative process, very dynamic, and that's very nice.

spk_0:   10:27
So it's an ongoing thing, and you plan for it. Okay, now it's time to try out algos, and there's an amazing variety, right? From statistical machine learning to deep learning to everything else. So how do you get started here?

spk_1:   10:46
So, and that's one of my most important recommendations every time: the baseline. At the beginning, it's super important. You take just a very simple algo and you see what happens. Afterwards, you introduce more complex stuff, some more data, and you compare with this baseline.

spk_0:   11:08
But would you even start with a subset of the data and try statistical algos, you know, logistic regression or something like that, just to see: can I get to 80% accuracy?

spk_1:   11:19
Yeah, it's fine at the start to just play with the data, doing exploratory data analysis. In the case of machine learning, you'd run something like a random forest, look at the distribution of the data, try logistic regression, et cetera.

spk_0:   11:33
And then add more features, try out...

spk_1:   11:38
Different models, and again, compare with the baseline you have. You then see where the improvement is, what the performance is, what the speed is, et cetera, against the very first one. For instance, Amazon Personalize and Amazon Forecast work like this: you start with a simple algo, which is going to be your baseline, and afterwards you try different algorithms. For instance, in Forecast, you can try to do some HPO with DeepAR+, et cetera. But the very important thing is to have a baseline.

spk_0:   12:08
Okay, so again, you convince yourself you have something interesting, and then you can unleash a collection of algos and do hyperparameter tuning. And of course tooling is important here, because you don't want to do that stuff manually.

spk_1:   12:22
Yeah, but sometimes at the beginning, it's worth taking the time to do some stuff manually, and...

spk_0:   12:27
Manual exploration.

spk_1:   12:29
Exactly. That's

spk_0:   12:30
Okay, agreed. And so when do you know it's time to stop?

spk_1:   12:36
When you dream about it during the night, and you say, "Okay, now I'm done, it's driving me crazy." No, I think that's the difference between pure research and business. You know, in pure research, I could spend months just to improve by 1 or 2 percent. In business, that doesn't make sense. I mean, if you are at 60 or 70% accuracy, you won't spend another eight weeks just to get to 72% accuracy. So I think the human side is important in a machine learning or deep learning project: at one point you say, "Okay, I agree, it could be perfect and it is not, but it works. We will retrain at regular intervals and see how it works in the future."

spk_0:   13:27
And also, for some workflows, you can have a human in the loop, right? So you could say, "Well, this is only 73% accurate, but there's a human in the loop to monitor and, you know, catch..."

spk_1:   13:44
It's super important to have a human in the loop. Yeah, because after putting it into production, some customers say, "Okay, now it's done, the job is done." And no, the job is not done. Please keep someone in the loop, because otherwise you don't know if there's a performance decrease. It's super important to have someone taking care of the accuracy of the predictions.

spk_0:   14:06
So once you reach the accuracy level that is good enough for your business, you have to deploy to production, and I keep saying this is actually the hardest part and the most dangerous part. Do you agree? So tell us about customer stories and horror stories.

spk_1:   14:28
I think that if you put a model into production without having done some A/B testing before, you have some customers using it, you find out what's happening, and you're like, "Oh my God!" So it's very tricky, and you need, again, to be very, very picky about each step of the productionization, for sure. You need to be careful about how the data is going to be ingested, and what kind of metrics you have to follow to understand whether your pipeline is ingesting the data well. And then, of course, we've got some services like SageMaker, which help you to deploy and put into production in one click. But afterwards you need to take care. It's very manual stuff, to be sure that your code works, your step function works, et cetera. So, yeah.

spk_0:   15:26
I asked the question to, again, one of my previous guests: how soon can we one-click deploy and forget about it? Well, no, not right now.

spk_1:   15:41
And I think it wouldn't be a good idea, because it's important to keep control of this. Otherwise, again, the data can evolve in a very short time, and it's super important to keep an eye on it and to keep in touch.

spk_0:   15:57
Data drift, missing features, something upstream that corrupts the data: there are a million things that can go wrong, and SageMaker Model Monitor is how we try to solve that problem. So, yes, production is the hardest part.

spk_1:   16:17
That's super critical. And once it is in production, it's good, because you see the work done: the job you have been doing for so many months, et cetera. Now it's really live, and...

spk_0:   16:29
You're like, well, you can look at your KPIs and show that the improvement is real. Because in my experience, the data science sandbox is one thing: you're running your A/B tests, you try to do it well, and then you say, "Well, it works," and everybody's very excited. Then you put it in production, and it doesn't work the way you expected it to, because the data is a little different, or latency is a little bit different, whatever. I think the truth lies in production, and if you can replicate in production the results you have in the sandbox, then congratulations.

spk_1:   17:09
Exactly. That's when the job done by the data engineers and the data scientists becomes real, because people can see it. People who are not experienced in the area, like business people, can touch it and say, "Okay, we can use this kind of app, and we can see this improvement." The business understands what I did, and that makes you super proud.

spk_0:   17:37
I meet a lot of customers who say, "Well, we built a PoC that works, and we stopped there." And I keep saying, "Well, why don't you go and push that PoC to production, even if it's only limited production?" Because, exactly as you said, until you show the business guys that, hey, this metric is improving, it'll be just, "Okay, nice. So what?" Right? So to me, a PoC needs to go into production at some point, under full control, et cetera. If it doesn't end up in production, I think you've only done half the job.

spk_1:   18:20
Yeah, exactly. It's like a lot of research. This is one sentiment I see, a feeling I share with a lot of data scientists, who say, "Okay, we are working away somewhere, we write some code, and nobody looks at the code, and nobody understands what we do." And yes, I think when you do a PoC and you put it into production, you put it live, people understand what you were working on.

spk_0:   18:47
All right, we could go on for three hours, but we're almost out of time. So let's play the top three game. Top three things that are important for a machine learning project to succeed?

spk_1:   19:03
Data, people, business problem.

spk_0:   19:10
Okay, great. Now, top three things that kill ML projects?

spk_1:   19:14
Same thing. I mean, if you don't have a good business problem, well defined, people won't be motivated and won't try to find the good data. On the other side, if you have a super business problem, with some super motivated people, with business people involved, plus good data directly and indirectly related to the project...

spk_0:   19:42
And if you need help, the ML Solutions Lab, you know, Ségolène and the rest of the team, which is amazing, are here to help. So get in touch. Thank you very, very much for sharing the real-life stories and the real-life advice for customers and everybody else. Much appreciated. And, you know, I wish you many successful projects. Thank you very much.

spk_1:   20:04
Thank you.