In this episode, I have a chat with Cosmin Catalin Sanda, a Data Science Engineer for AudienceProject (https://www.audienceproject.com). We talk about real life ML, what it takes for ML projects to be successful (experimentation, agility, intuition, etc.), how SageMaker helps, and more.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️
Cosmin's Twitter: https://twitter.com/cosminsanda
Cosmin's blog: https://cosminsanda.com/
Apache MXNet User Group in Copenhagen: https://www.meetup.com/meetup-group-bdEUVQHL/
This podcast is also available in video at https://youtu.be/YqRd4Gh9a98
For more content, follow me on:
* Medium: https://medium.com/@julsimon
* Twitter: https://twitter.com/julsimon
Hi, everybody. This is Julien from AWS, and welcome to episode six of my podcast. Don't forget to subscribe to be notified of future episodes. In this podcast, I'm talking to my friend Cosmin from Denmark. Cosmin is a data scientist, he's a blogger, and he also runs the Apache MXNet meetup in Copenhagen. We talk about getting started with ML, running your machine learning projects right, best practices, and a whole bunch of different things. So I'm sure you will enjoy that conversation and you will learn a few things. Let's not wait, let's just listen to Cosmin. Cosmin, thank you very much for taking the time to speak to me today. I guess we need to start with an introduction. So tell us a little bit about you, what you're doing today, and how you got started with machine learning.
Right. So I started with data in general about ten years ago. I was very interested in doing reporting and fiddling with database management. Then I moved into doing more and more data engineering, and at the same time I was lucky enough to work for companies that spearheaded machine learning efforts in Denmark. So I naturally became interested in the data science domain.
Okay, so can you tell us a little bit about your company and the kind of projects you work on on a daily basis?
Right. Just to give a little bit of an overview before I dive into details: AudienceProject helps brands, agencies and publishers to plan, optimize and validate digital online campaigns. At the same time, we're also helping our customers to grow audience segments that are high-value. To achieve that, we use data science and machine learning at several levels in our organization, from the actual projects through to operations. An example is our solution AudienceHub, which helps our customers, publishers for example, to grow audience segments from their deterministic data. We use extrapolation driven by machine learning models to grow these audience segments. Another example is our frequency graph, which we use to understand how many times, on average, a person has been exposed to an online campaign. For that we don't necessarily use machine learning, but we use graph algorithms. And one last example that you might relate to is where we have used machine learning to understand which availability zones from Amazon are best to bid in for EC2 spot instances.
Interesting, so that way we
can have stability over time and also low prices. This is an example of how we have used machine learning and data science at different levels in the organization to deliver value.
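As a toy sketch of that last idea, picking an availability zone for spot instances based on price history: the price data and the scoring function below are invented for illustration only, and the real approach presumably involves trained models on historical pricing rather than a simple heuristic.

```python
from statistics import mean, stdev

# Toy sketch: rank availability zones by historical spot price,
# preferring both a low average price and low volatility (stable capacity).
# The AZ names and price histories below are made up for illustration.
price_history = {
    "eu-west-1a": [0.12, 0.13, 0.12, 0.14],
    "eu-west-1b": [0.10, 0.25, 0.09, 0.30],  # cheap at times, but volatile
    "eu-west-1c": [0.11, 0.11, 0.12, 0.11],
}

def score(prices):
    # Lower is better: penalize both the cost and the instability.
    return mean(prices) + stdev(prices)

best_az = min(price_history, key=lambda az: score(price_history[az]))
print(best_az)  # → eu-west-1c: cheap AND stable
```

In this made-up data, eu-west-1b has the lowest single price but wild swings, so the stability penalty steers the choice to eu-west-1c, matching the "stability over time and also low prices" goal described above.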
Okay, that's pretty cool. So tell us about that, I mean, a typical project. How do you get started? You know, people tend to focus on algorithms and the technicalities.
Of course, it depends very much on the problem at hand. The way I usually approach an ML project is that I try to use my previous knowledge or experience, and then I do some research online, and I essentially trust the community, the crowd wisdom. I try to find example projects that are similar to what I'm trying to do, and I try to fit that into a solution for my problem. Also, the point of that is to become familiar with the problem so that I'm confident enough to discuss it. Then I would probably move towards doing some exploratory data analysis. I would understand the data, do cleaning. I also approach my colleagues and ask for opinions and validation. Essentially, I'm fortunate enough to be surrounded by smart people, so that's really helpful; there's always good feedback. And then I guess I would go towards implementation. I try to productionize, or to have a working prototype, as soon as possible. That allows me to have a framework for doing multiple iterations towards better results.
Yeah, I think that's a very important point, because one of my beliefs is that machine learning is software engineering: you need tooling, you need agile techniques, and you need iterations. And, you know, sometimes I meet people who tell me, "Well, I just spent six months researching the thing, and I'll tell you in six months if I can build a model or not." So if you're doing pure research, that's okay, but I mean, you're working for a private company, right?
Exactly, I have business constraints. I need to be pragmatic, we need to be pragmatic. We need to live within the constraints of doing something that is very good and doing it in a fixed amount of time and within a certain budget. So we need to be pragmatic. I also wanted to add one more thing that I believe is important: when we start building the solution, we architect towards change. It's important to assume that things will change, especially for an ML project: the model might change, the data might change, assumptions that you had at first might not be realistic in a production scenario. So assume change, and engineer towards that.
So it's just agility: validate assumptions all the time and accept change. Any advice you would give to a young ML engineer to get started? What should they focus on in the early steps of their projects and careers?
Learn to see the pros and cons of different approaches. I mean, you might say that neural networks are very powerful and can solve some problems well, but how about explainability? That's a tradeoff. It might work in some cases, it might not work in other cases. You need to understand your context very well, not to have a hammer and then look for nails.
Yeah, exactly. Yeah.
Or you know a library very well, and then you try to use it for everything. Again, this goes hand in hand with the pros and cons advice: if it doesn't work, maybe it's time to look at something else, not try to force your problem into a certain box. Also, create a project for yourself; that's what I like doing, and then experiment with tools. I create artificial data sets, or I derive data sets from existing ones, and then I try to solve a problem. This is what I do on my blog, for example: I artificially create a problem and I try to solve it. So this is one way to gain experience, and I think experience is the most important trait to have. Experience gives you intuition, and intuition is extremely important in data science: it allows you to choose one model over the other. Ideally, you're able to explore scientifically all reasonable paths, but in practice it might not be feasible to do that. So at that point, you have to use your experience to narrow down where you need to look, what the limited set of possible solutions to your problem is. The experience of the team is also important: the same project might be handled in a different way by a different team with different skill sets, and arrive at the same product with the same quality. Where I work, we have a certain experience and we work with the tools we find most comfortable to work with. Another team might find a different combination of tools to be comfortable.
Yeah, so I guess the moral is: use what you know, and use the best tool for the job at any given point.
I think it's a combination of "use the right tool for the job" and "use what you know." If you only use what you know, you might be missing out, right? There is this tendency these days... I mean, for example, in the past XGBoost was very popular, and it's one of my go-to tools. But if I try to use XGBoost for everything...
Yeah, I see what you mean.
I might not get the appropriate results. There's a certain type of problem that I would apply XGBoost to, and perhaps elsewhere other solutions would be better. But if the problem fits, I will use XGBoost; otherwise I would have to look for a different library.
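For readers who haven't used it, the kind of tabular classification problem where gradient-boosted trees like XGBoost shine can be sketched in a few lines. Since XGBoost itself may not be installed everywhere, this sketch uses scikit-learn's built-in gradient boosting, which implements the same family of technique; the dataset is synthetic.

```python
# Sketch of gradient-boosted trees on a tabular classification problem,
# the typical "fit" for XGBoost-style tools. Uses scikit-learn's
# GradientBoostingClassifier as a stand-in for the same technique.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data: 1000 rows, 20 numeric features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GradientBoostingClassifier(n_estimators=100, max_depth=3)
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```

On structured, tabular data like this, boosted trees are often a strong default; on images or text, other model families usually fit better, which is exactly the "problem fits the tool" point being made above.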
So curiosity is important, right? Trying out new algorithms and the latest tools. So, you mentioned you were using SageMaker. Can you tell us a little bit about that? What do you like about it, and what do you think the really strong areas are in SageMaker?
Right. So, there are several, but I think the most important thing for us is that it gives us resources that are already provisioned with the libraries that we need, and we don't need to maintain that. We're using Gluon and MXNet; I'm very happy to say that I'm a great fan of Apache MXNet, and that comes directly provisioned in the SageMaker notebooks, which makes our life very easy. We start the notebook instance, and in a few minutes we're ready to dive into the fun stuff.
Do science first, no fuss, no setup.
Exactly. Another thing that is great, and I don't think I've heard it mentioned many times, but I think it's important, is that SageMaker gracefully encourages best practices. It's like a framework for doing data science and machine learning that doesn't force you into a certain way, but it certainly encourages best practices. You can easily provision machines dedicated to training and to validation, and you can provision endpoints, so you have some separation of concerns. The documentation is also very well targeted towards best practices.
That's good to hear
The last point would be efficiency, the way the resources are being provisioned. In the past, I would maybe launch a GPU cluster in EMR just to have MXNet running on it, and that would work just fine, but there would be a lot of wasted dollars. In SageMaker, I can do my data engineering on a cheaper machine that works just fine, and then I can launch my training on a very powerful GPU machine for just a few minutes. Every dollar spent is spent towards actual training.
There are also some recent developments in SageMaker, like the Experiments library. I was very interested in that, in storing trials in a consistent manner. Before I knew that SageMaker Experiments would appear, I had made a commit to the MLflow project from Databricks, essentially for a plugin for Apache MXNet Gluon. But now we have SageMaker Experiments, so I'm going to see which one I'll use.
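The concept behind both tools, recording each trial's parameters and metrics so runs can be compared later, can be illustrated with a minimal local tracker. The class and method names below are invented for this sketch and are not the SageMaker Experiments or MLflow APIs.

```python
import json
from pathlib import Path

# Minimal illustrative experiment tracker: persists the parameters and
# metrics of each trial as JSON so runs can be compared afterwards.
# This mimics the *concept* behind SageMaker Experiments / MLflow only;
# TrialLog and its methods are hypothetical names, not either library's API.
class TrialLog:
    def __init__(self, store_dir="trials"):
        self.store = Path(store_dir)
        self.store.mkdir(exist_ok=True)

    def log(self, name, params, metrics):
        record = {"trial": name, "params": params, "metrics": metrics}
        (self.store / f"{name}.json").write_text(json.dumps(record, indent=2))
        return record

    def best(self, metric):
        trials = [json.loads(p.read_text()) for p in self.store.glob("*.json")]
        return max(trials, key=lambda t: t["metrics"][metric])

log = TrialLog()
log.log("lr-0.1", {"learning_rate": 0.1}, {"accuracy": 0.87})
log.log("lr-0.01", {"learning_rate": 0.01}, {"accuracy": 0.91})
print(log.best("accuracy")["trial"])  # → lr-0.01
```

The managed services add what this sketch omits: linking each trial to the training job, code version, and artifacts that produced it, which is what makes results reproducible across a team.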
Well, you know, just try them out. Try both, and let us know if we're missing anything. So, Cosmin, we're almost done. Any last words?
Just thank you for inviting me. It's always a pleasure to talk to you, Julien.
Well, thank you very much. Thanks for your time, and thanks for sharing the knowledge. I'm sure this will be much appreciated by the listeners. That's it for this episode. I hope you enjoyed it. Don't forget to subscribe to my channel, and I'll see you soon with more conversations and more content. Until then, keep rocking.