AWS AI & Machine Learning Podcast

Episode 7

January 27, 2020 Julien Simon Season 1 Episode 7

In this episode, I have a chat with Francesco Pochetti, a Data Scientist who worked on Amazon Kindle for 4 years. He's also an AWS Machine Learning Hero. We talk about real life ML, how to deal with business stakeholders, why model explainability is so important, and more.

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️

Francesco's Twitter: @Fra_Pochetti
Francesco's blog:

This podcast is also available in video at
For more content, follow me on:
* Medium:
* Twitter:

spk_0:   0:05
Hi everyone. This is Julien from AWS. Welcome to episode seven of my podcast. Don't forget to subscribe to be notified of future episodes. In this episode, I'm talking to my friend Francesco, an experienced data scientist who has worked for a number of different companies, including Amazon Kindle. He's also actively blogging about data science. We talk about getting started with machine learning, running your machine learning projects, interacting with business stakeholders, and a whole bunch of different things. So I'm sure you'll enjoy this conversation and you will learn a few things. So let's not wait, and let's listen to Francesco. Francesco, thank you very much for taking the time to speak to us today. So let's start with a quick intro. Tell us about how you got started with data science and machine learning.

spk_1:   0:56
Hello, Julien. Yes. So it's been almost by chance for me. I started at the end of 2013, and at that time I was looking mainly into Python, so I wanted to get started with Python. And absolutely by chance, I stumbled upon the Andrew Ng course on Coursera. I remember the front cover, there was this robot. And everybody was talking about this thing called machine learning, which I knew absolutely nothing about. So I said, you know, let's get started. And funnily enough, the homework was in MATLAB, so it was not even Python. But the instructor was so awesome that I just stuck to it, and it lasted for 12 weeks. It was my first introduction to machine learning. After that, I thought that the best way to get started was Kaggle, so that was my next step.

Go with Andrew Ng. I think that's definitely the first stop. And do the homework in Python; don't do it in MATLAB. That's going to take you around, I would say, 12 weeks, so up to four months probably. As soon as you get to the end of the course, I would start with The Elements of Statistical Learning. That's kind of the Bible of machine learning. It's a huge thing. There is also the version for dummies, I think it's An Introduction to Statistical Learning with Applications in R, something like that, from the same folks. So I would stick with that one. And then Kaggle, that's for sure. You don't necessarily have to compete. Of course, competing is awesome, because you get an immediate benchmark of what you're up to.
That is super important, because I feel that the mistake, which I made at the beginning, is getting my notebook, like a Jupyter notebook, up and running and getting out a score, say 70% or 80% accuracy or AUC, and you get into this comfort zone, feeling absolutely enthusiastic about your results and super happy about what you achieved. But the truth is that you don't know if that is good or not. So it is super important to talk to people and compare your results with others. Something else which I think is very often overlooked by practitioners is SQL. I very rarely find this skill set in blog posts or whenever we talk about machine learning. We very often end up, I would say, with deep learning, neural networks, ensembles, which are really cool, and they actually work. But the truth is that in your day-to-day job, nobody's going to get your data for you, right?
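
The point about not knowing whether a 70% or 80% score is good can be made concrete with a majority-class baseline. The sketch below is only an illustration: the labels and predictions are made up, and `accuracy` and `majority_baseline` are hypothetical helper names, not functions from any library.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_true):
    """Predict the most frequent label for every example."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return [most_common_label] * len(y_true)

# Made-up, imbalanced labels: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Made-up model output: one false alarm and one missed positive.
y_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

model_acc = accuracy(y_true, y_model)
baseline_acc = accuracy(y_true, majority_baseline(y_true))
print(f"model: {model_acc:.0%}, baseline: {baseline_acc:.0%}")
```

With these made-up numbers both scores come out at 80%, so the model is no better than always predicting the majority class. A shared benchmark, such as a Kaggle leaderboard, plays the same role at a larger scale.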

spk_0:   4:20
You should put this on a T-shirt.

spk_1:   4:25
Yes, that is incredibly true. And as soon as you face it, you realize that people will tell you, "Well, of course, I mean, the data is in that database," and then it's probably 100 different tables which are undocumented. So good luck with that. You can be a machine learning wizard, but without good ETL or good SQL skills, you will go absolutely nowhere.
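
As a small illustration of the kind of SQL this refers to, here is a sketch that joins and aggregates two tables with Python's built-in `sqlite3` module. The `books` and `sales` tables, their columns, and all the rows are hypothetical; real schemas will be larger and, as noted above, often undocumented.

```python
import sqlite3

# In-memory database with two made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT, format TEXT);
    CREATE TABLE sales (book_id INTEGER, units INTEGER);
    INSERT INTO books VALUES
        (1, 'Book A', 'ebook'), (2, 'Book A', 'print'), (3, 'Book B', 'ebook');
    INSERT INTO sales VALUES (1, 10), (1, 5), (2, 7), (3, 2);
""")

# Join sales to book metadata and total the units per title and format.
query = """
    SELECT b.title, b.format, SUM(s.units) AS total_units
    FROM books AS b
    JOIN sales AS s ON s.book_id = b.book_id
    GROUP BY b.title, b.format
    ORDER BY b.title, b.format
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Even this toy query puts ebook and print sales of the same title side by side, which is the kind of view a modeling dataset needs before any algorithm is involved.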

spk_0:   5:01
So this is a really good transition to my next question, which is: when you start working on a new project, what are the first few steps you take? And is there a general way of addressing a new machine learning project, or is it a custom thing every time?

spk_1:   5:18
The very first step, which to me is the hardest part of any machine learning project, is understanding the business context. When I was working for Amazon Kindle, we were asked to predict whether an ebook would sell well or not once it was available online, on the Amazon website. And then again, getting back to what I was saying before: what could possibly go wrong? It's easy, right? You've got an ebook, and you want to understand how much revenue this guy's going to make. So, listen. Now the problem is: is the ebook the only part of the business we should be focusing on? What about the paper side? What about the fact that the book is already available in print? Is my ebook going to cannibalize sales from the paper side? That's something which is definitely not told to you. It's completely out of scope. But guess what: it is instead a super critical part of your problem. Because if you oversell your ebook, then your publisher is going to be unhappy with you, because the global volume of sales is going to go down. And that is something which is not incorporated in your loss function. That's why it is super important to frame your business problem correctly. If you just perform a regression on sales of the ebook, you will optimize that number.

spk_0:   7:07
Yeah, be careful what you wish for, because that's what you're gonna get, right?

spk_1:   7:10
Exactly. You will optimize just a number. And guess what: the machine learning algorithm is just a mathematical formula. It doesn't know anything about cannibalization of the paper side. So you have to take care of incorporating that knowledge in your dataset, or tweaking your loss function, in order for it to understand that it should not penalize the paper book, if present.
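
One hedged way to picture the difference is to compare an objective that only sees ebook revenue with one that also counts the paper side. This is not the method used at Amazon Kindle; it is a toy sketch in which every number, the `scenarios` list, and the fixed `PRINT_PRICE` are assumptions made up for illustration.

```python
# Hypothetical scenarios: for each candidate ebook price, made-up forecasts of
# ebook units sold and of print units that still sell alongside it.
scenarios = [
    {"ebook_price": 2.99, "ebook_units": 1000, "print_units": 200},
    {"ebook_price": 5.99, "ebook_units": 600, "print_units": 500},
    {"ebook_price": 9.99, "ebook_units": 250, "print_units": 650},
]
PRINT_PRICE = 15.00  # assumed fixed price of the print edition

def ebook_revenue(scenario):
    """Objective that ignores the paper side entirely."""
    return scenario["ebook_price"] * scenario["ebook_units"]

def total_revenue(scenario):
    """Objective that also counts print sales, so cannibalization shows up."""
    return ebook_revenue(scenario) + PRINT_PRICE * scenario["print_units"]

best_ebook_only = max(scenarios, key=ebook_revenue)
best_overall = max(scenarios, key=total_revenue)
print("ebook-only objective picks:", best_ebook_only["ebook_price"])
print("total-revenue objective picks:", best_overall["ebook_price"])
```

With these made-up figures the two objectives disagree: the ebook-only objective prefers the 5.99 price, while the total-revenue objective prefers 9.99 because fewer print sales are lost. The optimizer only protects what the objective measures.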

spk_0:   7:35
And that knowledge you can only get from business people, right?

spk_1:   7:39
Exactly. That is knowledge which you can only get from business people. And that is knowledge which nobody will tell you about until you get too far into your project. That's an absolute killer, because you get into your meeting room with your stakeholders, super proud of your first MVP, and you got an accuracy of 90%, and you really want to share it with your stakeholders. And then the very first question, two minutes after you started presenting, is "Have you considered the paper side?" And you're there feeling super dumb, because actually you haven't considered that. Which basically means that you have to restart your project from scratch. So make sure you talk to the business people and understand why you're doing what you're doing. That is of paramount importance for any machine learning project. A machine learning solution is something which solves a business problem. No machine learning solution is out there just because it's fun. Machine learning solutions solve business problems, and business problems are complicated. So you need to understand what is going on in your solution, to make sure that you are solving their problem and solving it to the best of your possibilities. Something else which always happens is product managers or business people getting back to you one random day and telling you, "Hey, I don't understand why XYZ is turning out to be A instead of B." And you would look really dumb if you told them, "I'm not supposed to know what the random forest is doing." That is not true.

spk_0:   9:38
So it's not just explaining the model. It's explaining the model to business stakeholders, which is a different thing. Because if you need to go to your C-level people, you can't just print out your XGBoost trees: "Oh, you see, this is how it works, and this is why it decided this and that." And neural networks are even worse. Show up with your gradients and whatever, and they'll kick you out of the room. So that's the next level: making it understandable to non-technical people and justifying that yes, it's working that way, and it's correct.

spk_1:   10:19
Yeah, totally. And I tell you what, it's an incredible amount of insight which you can get out of those exercises. It was an eye-opener for me. So I highly recommend doing that, for everyone. As soon as you get closer to having a model in place and it's performing okay, try to understand what the relationship is between your features and your dependent variable. Are you able to answer questions like: what happens to the price of a house when the volume of the house increases? Is it linearly correlated? Does it go down? Does it depend on other variables? What kind of relationship is this? It is super important, and it will help you as a machine learning practitioner to deeply understand what the problem is and to get back to the business with suggestions. Why is your model performing the way it is performing? Why did my loan request get rejected? Why is that book performing better than this other book? So interpretability, interpretable machine learning, I think is another huge thing which the community has made a lot of progress on. That's clear. But it's also something which is very frequently overlooked, especially by machine learning engineers. We're really focusing on making sure that the model is resilient, that it's working and it's fine, and we got 90% accuracy and it's good. But it's not. It's not, because you really want to understand what is going on. And as humans, to make sure that we trust those models, we need to understand them. As machine learning practitioners, we're not simply solving a business problem; we're also getting back to the business with advice on how to make things better, right?
And you can get there only if you really deeply understand what your model is doing and which kind of hidden insights you're getting out of it.

spk_0:   12:33
Now, that's a very interesting point. I think it's the first time I've heard it mentioned: the fact that modeling the problem actually teaches you stuff about the data itself and the business problem itself. So it's not just predicting. It's not just one way: here's data, give me the answer. It's: here's data, build the model, give me answers, and understand the hidden relationships in the data so that maybe we can come up with business decisions. That's a really, really good point.

spk_1:   13:06
I think that is probably what I'm most excited about in this era of machine learning: the fact that we are democratizing machine learning. It is so important, because it should be used by business analysts as well. People who are data analysts, people who are not necessarily in touch with an algorithmic way of solving problems. But if your PM is asking you, "Hey, what is the relationship between the price of the house and its surface?", and you answer that question by plotting a plain univariate relationship, price of the house versus surface, you're doing it all wrong. Because what you're doing is implying that there is no relationship with all the other features. What you should answer instead is: this is the relationship between house price and surface of the house, all other things being equal.

spk_0:   14:06
It is just a projection, right? It's just a projection.

spk_1:   14:09
Exactly. And you can answer that only if you fit a model. So I really hope it gets into the day-to-day routine of everybody, even people who are not necessarily in touch with machine learning, to say, "You know what, let me fit an XGBoost real quick. Then let me plot the partial dependence plot, or let me find Shapley values over here, and try to understand what is going on." So I think this is super exciting. I think it's even more exciting than all the algorithmic development and getting state-of-the-art results. That is really cool, and it's important, but it's even more important to me to really let this knowledge sink in across everybody. That's why education is important at all levels. And get started. Just do it. Just make sure that you know what is going on, and use those models as much as you can, because they are extremely powerful.
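
To make "all other things being equal" concrete, here is a minimal hand-rolled sketch of a partial dependence curve. Everything in it is an assumption for illustration: `model` is a stand-in with made-up coefficients rather than a fitted XGBoost model, the three houses are invented, and `partial_dependence` is a hypothetical helper, not a library function.

```python
def model(surface_m2, location_score):
    """Stand-in for a fitted model (hypothetical coefficients)."""
    return 2000 * surface_m2 + 100_000 * location_score

# Made-up houses as (surface in m2, location score): the larger ones
# happen to sit in lower-scored locations.
houses = [(50, 3), (100, 2), (150, 1)]

def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value
    while every other feature keeps its observed value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            predictions.append(predict(*modified))
        curve.append(sum(predictions) / len(predictions))
    return curve

surfaces = [surface for surface, _ in houses]
# Univariate view: price of each house against its own surface.
raw_prices = [model(surface, location) for surface, location in houses]
# Partial dependence of price on surface (feature index 0).
pd_curve = partial_dependence(model, houses, 0, surfaces)
print("price vs. surface, raw:", raw_prices)
print("partial dependence on surface:", pd_curve)
```

In this toy data the raw prices are identical for all three houses, so a univariate plot would suggest surface does not matter. The partial dependence curve rises with surface, because the location effect is averaged out instead of being mixed in. With a real fitted model, the same idea is usually available through existing tooling rather than hand-written loops.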

spk_0:   15:13
Any last thoughts? Any question I forgot to ask? Anything you want to say?

spk_1:   15:21
There is definitely something I want to say, and it's something I cannot stress enough whenever I get in touch with new people who want to know what they can do in this domain, which is extremely overwhelming. "Do I need to get a PhD in statistics? Do I need to get a PhD in computer science? I don't know what to do. I have bought 10 different volumes talking about all sorts of different things. Where do I need to start?" To me, if I have to summarize everything, it is: create a blog of your own. It is incredibly important. It's going to be a huge satisfaction. So start from what you love. This is probably rather obvious, but I don't think it is, because new machine learning practitioners tend to read online about what the business wants, or what is important to know because it will get you a job. Of course, this is relevant, and you should follow what people around you are saying about this domain. But if you really want to get started, start from what you like. Are you interested in computer vision? Just train a cats-versus-dogs classifier and write a blog post about it. Just do it. It's an incredible experience and, most importantly, you can talk about it. You can even talk about a solution which you have implemented from zero to the very end, up to an AWS-based web application. How many people can tell this kind of story? There are not that many. Because there are a lot of people focusing on learning every single possible statistical model or method in the world before writing a line of code. Just do it the other way around. So start coding now.

spk_0:   17:22
Well, that's a great conclusion. So we'll leave it at that, and I want to thank you very much for taking the time and sharing the knowledge. This is really invaluable. Thank you very, very much, and I hope to see you. Thank you. That's it for this episode. I hope you enjoyed it. Don't forget to subscribe to my channel, and I'll see you soon with more conversations and more content. Until then, keep rocking.