AWS AI & Machine Learning Podcast

Episode 1

December 23, 2019 Julien Simon Season 1 Episode 1

In this episode, I talk about new features on Amazon Lex (chatbots), Amazon Textract (text extraction), Amazon Personalize (recommendation & personalization), and Amazon Transcribe (speech to text). I also demo profanity filtering with Transcribe (run for cover!).

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️

Additional resources mentioned in the podcast:
* Amazon SageMaker Ground Truth demo:
* Amazon Textract demo:
* Introduction to Graph Convolutional Networks:

This podcast is also available in video:

For more content, follow me at and at

speaker 0:   0:00
Hi, this is Julien from AWS, and I'd like to welcome you to the first episode of my podcast. What took me so long? You know, I guess 2020 is just around the corner, so this is my New Year's resolution. Anyway, in this podcast I'll talk about the new announcements from AWS on AI and machine learning. I'll do some quick demos, and I guess in the next episodes I'll have some guests who will explain what they build on AWS. And by the way, if you're building cool stuff on AWS with machine learning, please get in touch. I'd love to have you on the podcast.

Okay, so let's get started. Let's do the news we have this week. So this week we have Lex, we have Transcribe, we have Textract, and we have Personalize. Okay, so just high-level services this week.

Okay, let's start with Lex. Lex is our chatbot service, as you know, and this is actually a big deal. You can now save conversation logs for Lex, whether you're interacting with the chatbot in text mode or in voice mode. Okay, so you can have text transcripts and you can have audio transcripts. And this is probably something that people have been asking for since the service came out. So, yeah, I can hear a lot of people rejoicing in the background. Yes, it's here. So how do you set it up? It's super easy. You just go to the Lex console, and in the settings for your chatbot you're going to see this new entry called conversation logs, okay? And this is where you tick the box that says text logs or audio logs or both. And you can pass a CloudWatch log group name for text logs as well as an S3 bucket name for audio logs. And, of course, don't forget to set the IAM role that gives Lex permission to write stuff to CloudWatch Logs and S3. Okay, and that's all it takes. Another benefit of this is, of course, you'll be able to see missed utterances, okay, sentences that the users said and that didn't trigger anything in the bot.
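As a rough sketch of the console setup described above, the same conversation-log settings can be expressed as the `conversationLogs` structure accepted by the Lex model-building API (`put_bot_alias` in boto3). The helper name and every ARN below are hypothetical placeholders, and nothing here calls AWS.

```python
import json


def build_conversation_logs(log_group_arn, bucket_arn, role_arn):
    """Build a conversationLogs structure for a Lex bot alias.

    Text logs go to a CloudWatch Logs log group, audio logs go to an
    S3 bucket, and the IAM role lets Lex write to both destinations.
    """
    return {
        "iamRoleArn": role_arn,
        "logSettings": [
            {
                "logType": "TEXT",
                "destination": "CLOUDWATCH_LOGS",
                "resourceArn": log_group_arn,
            },
            {
                "logType": "AUDIO",
                "destination": "S3",
                "resourceArn": bucket_arn,
            },
        ],
    }


# Placeholder ARNs, assumptions for illustration only.
logs = build_conversation_logs(
    log_group_arn="arn:aws:logs:us-east-1:123456789012:log-group:my-bot-text-logs",
    bucket_arn="arn:aws:s3:::my-bot-audio-logs",
    role_arn="arn:aws:iam::123456789012:role/my-lex-logging-role",
)
print(json.dumps(logs, indent=2))

# With boto3 installed and credentials configured, this would be passed as:
#   boto3.client("lex-models").put_bot_alias(
#       name="prod", botName="MyBot", botVersion="1", conversationLogs=logs)
```

Either entry in `logSettings` can be dropped to enable only text logs or only audio logs, which mirrors the two tick boxes in the console.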
So, of course, it's all logged now, so you can run all kinds of analytics. Again, you know, I hear you rejoicing. Enjoy this new feature.

Okay, let's move on to Textract. Textract is our text extraction service for images and documents, and it's PCI DSS certified now. So if you're building retail or e-commerce applications, Textract is in the picture. And more importantly, I think, we released a whole bunch of quality improvements for Textract. So it's even better now at extracting data from tables and forms, right? And these can be quite complicated. So I won't spend too much time here, because I actually recorded a video demo (I'll put the link to the YouTube video in the description), trying Textract with different types of documents. And it did pretty well, I have to say. So if you've never tried Textract, this could be a good time to do it.

Okay, let's move on to Personalize, one of my favorite services. So Personalize lets you build recommendation and personalization models from CSV datasets, and now, in those datasets, you can pass contextual information. Contextual information means device information, time of day information, etc. And these are important to provide the best possible recommendations. You're probably not looking at the same content on your mobile phone and on the web, right? So you want that data to be factored in. So what it really means, zooming in a bit here, is that in the dataset that you pass to Personalize you can now use new keywords like location or device to pass that extra information. And some of the algos available in Personalize will actually use that data. Okay, so HRNN will definitely use it. And so this allows you to improve the quality of recommendations and just, you know, build better models. And this is really easy to try. Just use those new keywords.

Amazon Transcribe was also updated this week with two new features.
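Going back to the contextual information described for Amazon Personalize, here is a minimal sketch of what an interactions dataset with extra context columns might look like. It assumes the standard Avro-style schema layout that Personalize uses for interactions; the `DEVICE` and `LOCATION` field names follow the keywords mentioned in the episode, and the sample rows are invented purely for illustration.

```python
import csv
import io
import json


def interactions_schema(context_fields):
    """Return an interactions schema extended with contextual string fields."""
    fields = [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ]
    # Each contextual attribute becomes one extra string column.
    for name in context_fields:
        fields.append({"name": name, "type": "string"})
    return {
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": fields,
        "version": "1.0",
    }


schema = interactions_schema(["DEVICE", "LOCATION"])
columns = [field["name"] for field in schema["fields"]]

# Made-up sample rows, only to show the matching CSV layout.
rows = [
    {"USER_ID": "u1", "ITEM_ID": "i42", "TIMESTAMP": 1577059200,
     "DEVICE": "mobile", "LOCATION": "FR"},
    {"USER_ID": "u1", "ITEM_ID": "i7", "TIMESTAMP": 1577062800,
     "DEVICE": "desktop", "LOCATION": "FR"},
]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(json.dumps(schema, indent=2))
print(csv_text)
```

The CSV header has to match the schema field names, so generating both from the same list keeps them in sync.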
The first one is job queuing, which lets you now submit up to 10,000 jobs concurrently. And you could think, wow, that's a crazy high number, who would need that? Well, call centers are heavy users of Transcribe, and of course they could have thousands and thousands of calls that they want to transcribe per day. And the prior limit was 100. So you could only submit 100 jobs, and then you would have to wait for those jobs to complete before you could submit others. Okay? And that's a low number for a call center. So now we've bumped this number to 10,000, and we have a FIFO queue to process them. Okay, so pretty useful if you want to transcribe at scale.

And the second feature that Amazon Transcribe received this week is vocabulary filtering. And this is the one I'm going to demo. Vocabulary filtering is exactly what you'd expect. The purpose here is to remove unwanted words from the transcription output. So first we list those words in a text file, okay, one word per line. And we upload that file to Transcribe. Okay. And here I built a file containing profanity because, of course, that's my use case here. And once you've done that, you can create a transcription job with that filter. But I guess we need a sample, right? So here's my sample. This is a very happy customer, apparently.

"Hey, I'm calling because I bought this piece of shit product from your company and, man, it's complete crap. The motherfucker who sold this to me just told me a whole lot of bullshit. So I just want my money back."

All right, I've heard this guy before. Okay? So you don't want that stuff in your transcription output. I think even a very basic sentiment analysis algo would understand this guy is mad, literally, and we don't need all those ugly words. So let's create a job and let's give it a name. The language is English. I'm going to pass the location of that file in S3.
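The console steps in this demo correspond roughly to two Amazon Transcribe API requests: one that creates the vocabulary filter from the word file, and one that starts a transcription job using that filter. The sketch below only builds the request parameters. The helper names, filter name, job name, and S3 locations are hypothetical, and the two filter methods shown are the ones discussed in the demo.

```python
def vocabulary_filter_request(filter_name, words_file_uri, language_code="en-US"):
    """Parameters for creating a vocabulary filter from a one-word-per-line file."""
    return {
        "VocabularyFilterName": filter_name,
        "LanguageCode": language_code,
        "VocabularyFilterFileUri": words_file_uri,
    }


def transcription_job_request(job_name, media_uri, filter_name,
                              method="mask", language_code="en-US"):
    """Parameters for a transcription job that applies a vocabulary filter.

    method="mask" replaces filtered words with "***";
    method="remove" deletes them from the transcript.
    """
    if method not in ("mask", "remove"):
        raise ValueError("method must be 'mask' or 'remove'")
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
        "Settings": {
            "VocabularyFilterName": filter_name,
            "VocabularyFilterMethod": method,
        },
    }


# Placeholder names and S3 locations, assumptions for illustration only.
filter_params = vocabulary_filter_request(
    "profanity-filter", "s3://my-bucket/profanity.txt")
job_params = transcription_job_request(
    "angry-customer-call", "s3://my-bucket/angry-customer.mp3", "profanity-filter")
print(filter_params)
print(job_params)

# With boto3 installed and credentials configured:
#   transcribe = boto3.client("transcribe")
#   transcribe.create_vocabulary_filter(**filter_params)
#   transcribe.start_transcription_job(**job_params)
```

Keeping the filter name in one place matters because the job refers to the filter by name, not by the location of the word file.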
Of course, I copied it before. Then I just click Next and say, hey, please use my profanity filter. And I've got two options here, which are either to completely remove those words or to replace them with that triple-star token. And I'll go for that. Okay. And then I click on Create. Okay, so let's just wait a minute for this job to complete, and then hopefully we can see a clean and family-friendly output. Okay, after a couple of minutes, the job's complete. And if I open it and look at the transcription, well, I can see that all those nasty words have been removed.

"Hey, I'm calling because I bought this piece of bleep product from your company. And man, it's complete bleep. The bleep who sold this to me just told me a whole lot of bleep, and I want my money back."

Okay, so, well, that's the funniest feature I've tested in a while, and I'm sure it's gonna be a popular one on stage. So that was it for the demo of the week. Now, let's talk about a few extra resources that you may like.

Okay, let me share a few additional resources that I built this week. The first one is a multi-part video on SageMaker Ground Truth, where I'm showing you an end-to-end demo of annotating images. I'm actually using semantic segmentation, because we have a new automatic tool for this. That's a pretty cool demo. If you're interested in labeling image datasets, you're gonna like that. The second thing I want to talk about is my Textract demo that I mentioned earlier. We improved the quality of Textract for tables and forms and, again, I recorded a video showing you how to do this. And the last one I want to mention is, of course, my Deep Graph Library video, introducing you to graph convolutional networks. Okay, so this one is definitely more technical and code-level. But if you're curious about graph networks, I guess this is a good place to start, right? And of course, you can run the code yourself.

Well, this is it for this first episode, going down in history.
Maybe, maybe not. We'll see. I hope this was fun. I would love to hear your feedback. Again, if you're building cool stuff on AWS, please, please, please get in touch, and you could be my guest on the podcast. I'll see you later, and keep rocking.

The news
The demo: vocabulary filtering in Amazon Transcribe
Extra resources