AWS AI & Machine Learning Podcast

Episode 9

February 11, 2020 Julien Simon Season 1 Episode 9

In this episode, I have a chat with Leo Souquet, a Data Scientist and the co-founder of the Data Science Tech Institute (DSTI). DSTI trains Data Scientists and Data Engineers, and we chat about those two roles, their respective skills, how they’re related, and why you need both on your team to build successful projects. Much to my sorrow, Leo also convincingly explains why you have to learn math to build the best models.

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️

This podcast is also available in video at

For more content, follow me on:

- Medium:

- Twitter:

spk_1:   0:05
Hello, everyone. This is Julien from AWS, and I'm not in my usual setting because I am in Ireland today, getting ready for the AWS User Group in Dublin tonight. But I wanted to get this new episode out, and it's episode nine already. I had a great conversation with my friend Leo Souquet. Leo is a data scientist who is actually finishing his PhD in Paris right now, and, believe it or not, he also co-founded a school a few years ago. It's called the Data Science Tech Institute, and they train data scientists and data engineers with a lot of emphasis on math and theory. My favorite subject. Anyway, Leo has a great perspective on data engineers, data scientists, and what it takes to be successful: what those jobs are, how important they are, and how they relate to each other. So let's not wait. Let's listen to this conversation. And while you're doing this, I'll have more coffee for the meetup. Not beer yet.

spk_0:   1:07
Leo, thank you very much for taking the time to talk to me today. I've got a few questions, and the first one is: can you tell us a little bit about DSTI?

Thank you for having me today. Yes, DSTI is a postgraduate school that aims to train skilled data scientists and data engineers.

Okay. And when did you start?

So we started in October 2015, and we now have two cohorts a year, so two intakes per year: one in spring and one in autumn.

spk_1:   1:41
Okay. And you have online training and in-class training, right?

spk_0:   1:45
We have three different modes, exactly. One is regular on-campus training: you just come for classes on campus. We have the equivalent online, so you follow the classes live online from home, or wherever you are around the world. And then we have a third mode, which we call the SPOC mode. No pun intended, it just stands for self-paced online courses. It is designed for people who work, and they can follow the classes over the course of three years at their own pace.

Okay. So you mentioned you're training data engineers and data scientists. Tell us a little bit about those two roles.

So it's an interesting question, because the data scientist has lately been under the spotlight. It's the job of the 21st century, the sexiest job, et cetera. But data engineers are there as well. For us, the data scientist is more the math side of the Force, whereas the data engineer is more related to the IT side. And actually, you have many different studies and all that kind of thing which say that in a data science team, you would need at least two or three data engineers for one data scientist. So it's an interesting point of view. And in our opinion, the data engineer suffers a bit from the shadow of the fame of the data scientist. But they are equally crucial, or, you know, could even be more crucial.

Okay, interesting. I'm now wondering what happens when you have a team of data scientists with zero data engineers.

This is a team that has been wrongly sized, or that has just been hired out of, you know, "we need to do AI". Those teams usually struggle, and in every such team you have one person who will actually take on the data engineering role.

Okay, so let's zoom in on the two. Let's start with the data engineer, all right?

spk_1:   3:58
I suppose they sit, you know, upstream of the data scientists. So, what does a data engineer do? What kind of unique skills do they need to have? What are their daily tools?

spk_0:   4:11
What? Us? What does that, Florence First I would like to, especially by a point that the do sits upstream, but also down straight. Okay. What I mean is, they are here. So set of skills to master infrastructure. Um, ideally, big that infrastructure, but in frustrate us as general with only no network related issue system related issues storage. So they're always made to collect. Sometimes, you know, pre processing clean a bit, the data, store them and make them available for the Dallas and tissue. That's to play with. But they also in some companies downstream, because once did that a scientist has developed morals and everything. There will be the one industrializing the models. Okay, Sitting in production. What has been a development that scientists off course, it's all under the big hood off that engineers, but many different skill. Of course. It can be split up in different skills in different area, but already too strong. I 80 Bagram. So they need to be strong with i t that need to be strong with development on dhe. They also need to be above trends. What I mean is, they don't have to be a what? Okay, that engineer specialising the specific language of the specific tool it's more about It could be like disruption. It's a instead of mine. It's a philosophy, of course, instead of tool in technologies. But so so which optimized to automate your process. They also need to be master of sequel. Well, yeah, like, yeah, we had this conversation

spk_1:   5:49
well last week, actually with Francesco, Andi said the top skill you need machine learning is

spk_0:   5:57
no one is gonna get your data for Exactly. And that's that's a great idea. They need to be mashed up. Single. Then it should be. Yeah, also, you know, aware of the devil apps are always kind of like data ups, always automated to C I c. Always kind of lines by plans, integration and, of course, cloud infrastructure released. Their allow you to have the power. So you have a strong IittIe background and strong i t mindset.

spk_1:   6:27
And I suppose they need to be knowledgeable about the domain.

spk_0:   6:31
Exactly.

spk_1:   6:35
Because otherwise you have a bunch of SQL tables or logs, or whatever the data looks like, and what do you do with that, right? So it's not just programming Hadoop or Spark.

spk_0:   6:43
Exactly. Especially as they are the ones who will scrape data or gather external sources of data, they need to understand what they're doing it for. But also, in our opinion, and I know people would disagree, they need to know a little bit of math to be able to discuss and collaborate with the data scientists, who are usually math people. They don't have to go down into the equations, but they should at least know, you know, what a linear regression is.

Just enough to communicate.

Exactly.

So the data engineer is a unicorn too, right?

Actually, yes. There aren't so many of them.

spk_1:   7:26
And after rushing to, you know, hire data scientists, companies now realize: man, we need data engineers.

spk_0:   7:32
We need the IT guy. You know, it's a good career. It's a very good career path. We were recently in a big French group who said: "I struggle to hire them, and I would pay them 1.5 times the salary of a data scientist."

Did you hear that? So get in touch if you need contacts.

Exactly. Everybody wants to become a data scientist. We try to get these, you know, IT guys out of their development jobs and tell them they can actually go further.

Okay. So if you're bored with development, you think you can probably transition easily to data engineering?

Very, very well. With a lot of work, of course, but it's totally open.

Okay, that's good to know. Now, what about data scientists? Of course, I know the way DSTI trains data scientists is probably different from a lot of other organizations. So who are those people?

So for us, as I said, they are math geeks. They're good in math. They are also, and this is our vision, good in IT, because they need to prove that the idea works. When they want to train a job in a notebook, in the cloud, they need to be able to do that themselves. You know, launching a machine is nothing complicated, and you shouldn't be dependent on someone else for it. But, as you say, our little difference is that DSTI's curriculum puts the emphasis on math. The trend nowadays is, you know, to use out-of-the-box, black-box tools, off-the-shelf models, which usually work well. But our concern is that if you are going to use them, you need to know the math behind them. You're not going to be crunching equations every day, but the better you understand the algorithm you're using, the better you will be able to get value out of it.

And we had feedback from a student a few cohorts back, who said: in predictions, in machine learning, going from 0 to 90% is rather easy with modern tools; at DSTI, we learn how to go from 90 to 95%. This is the difficult part, because how do you optimize the model? How do you clean your data? All of these are math-related issues. And also, you need to prove what you're doing. It's not just "ah, here, it works". Tomorrow, if you were to put an algorithm in a plane, you would want to know how and why the algorithm actually works.

spk_1:   10:20
Interesting. It's a good way of putting it. I ask myself the question: where would I go next? What would I do to understand how to get to 95 or 97%, right? And that's what we're talking about: looking at those missed predictions and understanding what happens.
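One simple way to start "looking at those missed predictions" is to count which (actual, predicted) pairs account for most of the errors. The sketch below is illustrative only: the toy labels and the helper names `accuracy` and `top_confusions` are hypothetical and were not discussed in the episode.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(actual == predicted for actual, predicted in zip(y_true, y_pred))
    return correct / len(y_true)


def top_confusions(y_true, y_pred, n=3):
    """Return the n most frequent (actual, predicted) pairs among the errors."""
    errors = Counter(
        (actual, predicted)
        for actual, predicted in zip(y_true, y_pred)
        if actual != predicted
    )
    return errors.most_common(n)


# Toy labels, made up for illustration.
y_true = ["cat", "cat", "dog", "dog", "dog", "bird", "bird", "cat", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "cat", "cat", "bird", "cat", "cat", "dog", "bird"]

print(accuracy(y_true, y_pred))             # 0.6
print(top_confusions(y_true, y_pred, n=1))  # [(('dog', 'cat'), 2)]
```

Here the most common mistake is a dog predicted as a cat, which suggests where to look first in the data.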

spk_0:   10:33
Exactly. And how do you go further? How do you explain what you've done? How do you prove it? These are the kinds of questions that we try to answer by understanding the math behind it and going beyond just out-of-the-box tools. There is also, in my experience, you know, the case where you take R or Python, for example, and use a bunch of libraries. What if one of those calculations is actually wrong? You shouldn't rely entirely on the result. You should understand whether the output actually makes sense, whether there's a logic behind it. And, as you say, going further means understanding the outliers, understanding the residuals, all that kind of thing, and for that you need to understand the math behind it.

So let's dig in. Tell us about two or three math domains that help you do that, all right? Because when you look at all those great MOOCs, you know, Andrew Ng or fast.ai, et cetera, there is some element of math, of course, but not as deep as what you cover. So tell us about a couple of those courses and how they help.

For instance, statistics. You know, the famous linear regression. But what about multivariate linear regression and logistic regression, and in which contexts can you use them? Is your data set properly designed? Is the distribution of your data suitable, so that you can use a regression? When do you use a Kolmogorov-Smirnov test, and in which contexts?

Another one that's difficult to pronounce.

Yeah. All these kinds of statistical concepts. For instance, we also have, you know, distributed environments nowadays. Are all statistical algorithms distributable? Are the results correct when you distribute? Because you have, you know, concurrent calculations that you sum together. So all these challenges come down to understanding the math behind it. Also, you mentioned deep learning, for instance. It's very easy to do, especially with Keras: a few layers and, boom, trained, 92%. That's what Keras does best.
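The question of whether results stay correct once a calculation is distributed can be made concrete with the variance. The sketch below is an illustration under stated assumptions, not something from the episode: the partitions are made-up data, `summarize` and `merge` are hypothetical helper names, and the merge step uses the standard pairwise update for combining counts, means and sums of squared deviations. It contrasts that with the tempting but wrong shortcut of averaging per-partition variances.

```python
from functools import reduce
from statistics import mean, pvariance


def summarize(chunk):
    """Per-partition summary: (count, mean, sum of squared deviations)."""
    chunk_mean = mean(chunk)
    return len(chunk), chunk_mean, sum((x - chunk_mean) ** 2 for x in chunk)


def merge(a, b):
    """Combine two partition summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    merged_mean = mean_a + delta * n_b / n
    # The cross term accounts for the distance between the two partition means.
    merged_m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, merged_mean, merged_m2


# Made-up data split across three "workers".
partitions = [[1.0, 2.0, 3.0], [10.0, 20.0], [4.0, 4.0, 4.0, 4.0]]
all_values = [x for chunk in partitions for x in chunk]

n, merged_mean, m2 = reduce(merge, map(summarize, partitions))
merged_variance = m2 / n

# Wrong shortcut: averaging the variances of the partitions.
naive_variance = mean(pvariance(chunk) for chunk in partitions)

print(merged_variance, pvariance(all_values))  # these two agree
print(naive_variance)                          # this one does not
```

The naive average ignores how far apart the partition means are, which is exactly the kind of error that understanding the math helps catch.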
But what's behind it? Because it's the whole field of optimization. You know, how does stochastic gradient descent actually work? When you look at your validation accuracy curve or your loss curve, how do you know you're on the right path? How do you know what would lead to an improvement? It's about understanding the field of optimization, which is the foundation of neural networks.

You're going very, very deep.

We do. We do. And this is the point of DSTI. There is also a very useful topic, which is time series, where regular algorithms cannot help and deep learning is not the best. So how do you handle time series? Once again, if you just use off-the-shelf algorithms, you might actually end up making the wrong decision. So understanding the math behind it will always help you make the right prediction, so to speak. And also, if you were to present in front of your CEO or your board, you could say: okay, this is my prediction, but this is my confidence interval, and this is what it means. Or, you know, the algorithm is not good. This is also a huge concern: the tools will always give you a result, and it's easy to just, you know, over-interpret it as making sense, especially for those who don't understand the tools.

You see what you want to see.

Exactly. These are the kinds of things you need to know: okay, that model is not good, this result is not the one I should have. So these are examples where understanding the math would significantly help you be a better data scientist.

Okay. How far can we actually go in explaining?

I would say the limit tends to be neural networks and deep learning. Everything from linear regression up to random forests, those algorithms have been somehow mathematically proven. You need to be very rigorous, but it's doable.
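As a rough illustration of the question "how does stochastic gradient descent actually work?", here is a minimal sketch that fits a straight line with per-sample updates and records the loss after each epoch. Everything in it is an assumption made for illustration: the function name `sgd_linear_regression`, the learning rate, the number of epochs and the noise-free toy data are not from the episode.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=100, seed=0):
    """Fit y = w * x + b with plain stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    losses = []
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
        # Mean squared error over the whole data set: one point of the loss curve.
        losses.append(sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
    return w, b, losses


# Noise-free toy data on the line y = 2x + 1.
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]

w, b, losses = sgd_linear_regression(xs, ys)
print(w, b)                   # close to 2 and 1
print(losses[0], losses[-1])  # the loss curve goes down
```

Watching `losses` is the small-scale version of reading a loss curve: if it stalls or climbs, the learning rate or the data preparation is the first thing to question.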
You can prove that the algorithm actually works. But with deep learning, there's a lot of work on, you know, explaining how the algorithm came to that conclusion, and on how to certify that the algorithm will surely work in a specific context. And this is the limit; this is where I draw the line with the latest advances in deep learning. Because the model works, without doubt, it works. But explaining why? Just ask yourself: would you go on a plane driven or flown by a deep learning model? I don't know. I know specific industries, like aeronautics or defense, that kind of company, are working on it and experimenting with those kinds of things, but production is another story. So that's why the good old linear regression still works well. So this is where I draw the line, but it doesn't mean that they're not interesting. But once again, you need to be aware of those limits and of the extent to which you can actually use them.

So what's your opinion on our data, big data and machine learning services?

I would say one of my favorites is SageMaker, and I know you've been doing a lot of videos about it. As a teacher, when we set up Jupyter and you have to do it by hand, installing Jupyter and everything, some data scientists don't want to do too much system IT and are more than happy when you just click a button and here is your Jupyter, help yourself. So SageMaker is probably my favorite, because it makes my life so much easier, and there's the fact that, you know, you can just launch and get your training on all the instances, all that kind of thing. We're still waiting for the DeepRacer car to be launched in France to play around with at school.

So, yeah, I know it's not there yet. DeepRacer in France: everyone needs to be able to buy it.

I'm just going to buy it elsewhere.
So, yeah. And the one I use the most, I would say, is Rekognition, to be able to quickly show something. Okay, you want some AI proof of concept? You don't have to implement the whole deep learning model and train it for weeks. Somebody already did it for you. And if you want to prove that, you know, facial recognition works or object detection works, here it's super easy. You can run with it, and in a few hours you can make a very good proof of concept that will prove to your managers and users that this is what you will actually get.

Next, the business case. There is a business case, and this is what data scientists tend to forget: the actual business case. They tend to focus on: how can I get 99%? But do you need 99%?

spk_1:   18:06
Should I use this or should I use that?

spk_0:   18:07
[unintelligible]

spk_1:   18:16
Leo, this is great. We could continue for hours, but we are almost out of time. So I guess my last question is: how do you start as a data engineer? If you're a developer and you're interested in that role, where do you get started?

spk_0:   18:31
My best advice: find one of the certifications from one of the providers you have online and try to follow the courses, because those professional certifications usually have a well-structured path towards them. And then look into the industry you're interested in: what kind of technology they use, what kind of challenges they encounter. You know, there's no need to go into DevOps if you're in an industry that is actually not mature enough for that kind of thing. If you're in a company that is not mature, you will probably end up handling data wrangling first, and the ops side will come later, so that might be a bit further down the road. So it depends on which kind of industry you're in. But do use online courses to really get started with high-quality material. So this is where I would start.

spk_1:   19:26
Okay. And I'm afraid of the answer, but then, if you want to be a data scientist, where do you start?

spk_0:   19:31
I would say the Andrew Ng machine learning course would be the start. Andrew, everyone loves you. I think everyone says that. But yeah, Andrew Ng is nice, because for me it's the perfect balance between IT and math. So I would start there. And then, once again, you can't be a full data scientist who understands every single aspect of every single algorithm. So it's more about which kind of application you want to do. If you want to do, you know, pure industrial applications, you're probably just going to need the basics: linear regression, logistic regression, and so on. If you're interested in advanced AI, to use the term, or in computer vision or NLP, then, you know, taking those frameworks out of the box can help you follow along and get something great. And then specialize in one of those areas to develop real expertise.

Your experience.

Exactly. Because if you want to be a full data scientist, you're going to need to spend time in class, obviously. But getting into your job, that would be good. And of course, for every single data scientist: don't be afraid of IT, because you will do a lot of IT.

spk_1:   21:11
Right. See, it's not just me saying it. All right, learn about the cloud, guys. Okay, Leo, this was really, really great. Thanks again. Thank you very much for joining me today, and I'll put all the details about DSTI in the video description.

spk_0:   21:12
Check them out. Get in touch with Leo.