AWS AI & Machine Learning Podcast

Episode 13: Amazon Kendra special

March 11, 2020 Julien Simon

In this episode, I focus on Amazon Kendra, an enterprise search service powered by machine learning... but you don't need any ML skills to set it up and use it! I show you how to create an index, add data sources, and then I run queries using the AWS console and the AWS CLI.

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️

This podcast is also available in video at

For more content, follow me on:
* Medium
* Twitter

speaker 0:   0:00
Hey, good morning, everybody. This is Julien from AWS. Welcome to episode 13 of my podcast. Don't forget to subscribe to my channel to be notified of future videos. In this episode, we're going to move away from machine learning for a second and focus on a new service that was announced at re:Invent a few months ago. This service is Kendra. Kendra is a search engine that makes it really easy to create an index from data located in different sources and then query that index using natural language. Really, really cool service. And of course, it is powered by deep learning under the hood, but you don't need to know the first thing about this. Let's not wait; let me show you how to use it. Let's take a look at the console first. This is the Kendra console, and the first step is to create an index. I created one already because this operation takes a bit of time, but let me show you how to create a new one. Basically, you simply click on "Create index", give it a name and a description, and select a role, because, as you can expect, Kendra needs permission to fetch data from the different data sources. We're going to look at those in a second, but basically S3, RDS, etc. So you need to have permissions: you can create a role here, or you can use an existing one. Just make sure it has permission to access your buckets, etc. Okay, click on "Create", and that's it. It's going to run for a while, generating that new index, and then you can start adding data to it. So my index has already been created. It's active, blah, blah, blah. And now you need to add data sources. Data sources can be either an S3 bucket, an RDS database, or SharePoint Online. Here I simply added an S3 bucket. And again, this is exactly what you would think: give it a name and pass the location of your bucket. You can also pass metadata.
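The console steps above (a name, a description, an IAM role, then waiting for the index to become active) correspond to the Kendra CreateIndex and DescribeIndex APIs. Here is a minimal Python sketch, assuming a boto3 Kendra client is passed in; the helper names, the polling interval, and the timeout are hypothetical choices, not something shown in the episode.

```python
import time


def create_index_request(name, role_arn, description=""):
    """Build keyword arguments for the Kendra CreateIndex API.

    Name and RoleArn are required: the IAM role is what lets Kendra
    read from your data sources (S3 buckets, RDS, ...).
    """
    request = {"Name": name, "RoleArn": role_arn}
    if description:
        request["Description"] = description
    return request


def wait_until_active(client, index_id, poll_seconds=60, max_polls=60):
    """Poll DescribeIndex until the index reports ACTIVE.

    `client` is assumed to be a boto3 Kendra client, or any object with a
    compatible describe_index(Id=...) method. Index creation takes a while,
    hence the generous defaults.
    """
    for _ in range(max_polls):
        status = client.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            return status
        if status == "FAILED":
            raise RuntimeError(f"index {index_id} failed to create")
        time.sleep(poll_seconds)  # still creating or updating: check again later
    raise TimeoutError(f"index {index_id} is still not active")
```

With real credentials, usage would look like `kendra = boto3.client("kendra")`, then `index_id = kendra.create_index(**create_index_request("demo-index", role_arn))["Id"]` and `wait_until_active(kendra, index_id)`, where `role_arn` is a role you created for Kendra.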
So if you have extra information on those files, you can have a separate file in that bucket with that extra information. That's not what I'm doing here. I'm just passing the bucket, the IAM role, and a sync schedule, so you can go from on demand to hourly, daily, weekly, etc. Click "Next", and that's it. Super easy, and the same for RDS, right? You can pick the engine type and, of course, the connection information to go and fetch data, and the run schedule. So, nothing weird here. Let's take a look at my S3 bucket. This is what I have in there: a bunch of PDF files and a couple of .doc files. These are just plain text: this is the 20 Newsgroups dataset that you may know, so basically a collection of text messages from newsgroups. And what else? I have some slides. Yes, some of my PowerPoint decks are in there. And I have a ton of Wikimedia files, about 50,000 files or something, so that's a very small fraction of Wikipedia. If we look at the documentation for Kendra, we can see the types of documents that you can index: HTML, PowerPoint, Word, plain text, and PDF. So I have a bit of each. And as you'll see, you can also add questions and answers, so structured information, if you want to have really precise, predefined answers to common questions. You can do that as well. So all these are supported at the moment. As we saw, we have a run schedule for sources. Here I'm running on demand, so if I clicked on "Sync now", the index would be refreshed with the data in that bucket. I ran this a couple of times, and I can see the number of documents that have been added or updated. So that's my total number of documents, over 57,000 here. Super simple. All right, maybe FAQs. FAQs are easy as well.
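Adding the S3 data source, picking a sync schedule, and clicking "Sync now" correspond to the CreateDataSource and StartDataSourceSyncJob APIs. The sketch below builds the CreateDataSource arguments and takes the client as a parameter. The cron strings standing in for the console's hourly/daily/weekly choices are assumptions (schedules are expressed as cron expressions), as are the helper names; leaving `Schedule` out keeps the source on demand.

```python
# Hypothetical cron expressions standing in for the console's schedule
# choices (fields: minute hour day-of-month month day-of-week year).
SCHEDULES = {
    "hourly": "cron(0 * * * ? *)",
    "daily": "cron(0 0 * * ? *)",
    "weekly": "cron(0 0 ? * SUN *)",
}


def s3_data_source_request(index_id, name, bucket, role_arn,
                           frequency="on-demand", metadata_prefix=None):
    """Build keyword arguments for CreateDataSource with an S3 bucket."""
    s3_config = {"BucketName": bucket}
    if metadata_prefix:
        # Optional: per-document metadata files stored under this prefix.
        s3_config["DocumentsMetadataConfiguration"] = {"S3Prefix": metadata_prefix}
    request = {
        "Name": name,
        "IndexId": index_id,
        "Type": "S3",
        "Configuration": {"S3Configuration": s3_config},
        "RoleArn": role_arn,
    }
    if frequency != "on-demand":
        # No Schedule key means the source only syncs when asked to.
        request["Schedule"] = SCHEDULES[frequency]
    return request


def sync_now(client, index_id, data_source_id):
    """Scripted equivalent of clicking "Sync now"; returns the execution id."""
    response = client.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return response["ExecutionId"]
```

With a boto3 client, the request would be passed as `kendra.create_data_source(**s3_data_source_request(...))`, and the returned `Id` is the data source id that `sync_now` expects.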
All you have to do is upload a CSV file with a column for the question, a column for the answer, and a column for a URL, if you want to add extra information to the answer. Just put that file in an S3 bucket and upload it, and this becomes an FAQ that your index will point to when user requests hit those questions. Before we query, you can also use facets. Basically, facets are fields that you can use to filter your content. Here I have some predefined fields like document title, last updated at, view count, etc., so I could make those visible in my search UI, and of course users could filter their requests based on them. You can add custom fields as well. This is a little too involved for this short demo, but please take a look at the documentation; it's really not difficult. So once you've indexed your data sources, of course, you want to query them. Let's take a look at that. We have a built-in search console here, so let's try to run some queries. I'm going to use natural language, because that's what we want, right? We don't want to use keywords and complex query languages; we just want to use natural language. So: "What is Amazon SageMaker?" And I see this "Amazon Kendra suggested answer", which is really the best answer that Kendra could find. So what did it find? It found text in one of my PowerPoint decks, and this is actually in the speaker notes. And as you can see, it's highlighting the text that it thinks is the best answer. And this is actually a very good answer. This is proper language, not just a hit on a document telling you, "Well, yeah, I found keywords in there," because is that really what you want? The next results are more like, you know, keywords that are matched. So I could say, "Well, this is a good one, thanks": I liked the answer. Let's try another one.
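The FAQ and facet steps can be scripted too. This Python sketch writes a headerless CSV with question, answer, and optional URL columns, builds CreateFaq arguments pointing at the uploaded file, and builds one UpdateIndex entry that marks a field as facetable. The helper names are hypothetical, and the exact CSV layout and field options are assumptions worth checking against the Kendra documentation before relying on them.

```python
import csv
import io


def faq_csv(rows):
    """Serialize (question, answer, url) tuples as a headerless CSV string.

    The URL column is optional: pass None or "" to leave it out.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes any field that contains a comma
    for question, answer, url in rows:
        writer.writerow([question, answer, url] if url else [question, answer])
    return buffer.getvalue()


def create_faq_request(index_id, name, bucket, key, role_arn):
    """Keyword arguments for CreateFaq, once the CSV is uploaded to S3."""
    return {
        "IndexId": index_id,
        "Name": name,
        "S3Path": {"Bucket": bucket, "Key": key},
        "RoleArn": role_arn,
        "FileFormat": "CSV",  # plain CSV, no header row
    }


def facet_update(field_name, value_type="STRING_VALUE"):
    """One DocumentMetadataConfigurationUpdates entry for UpdateIndex.

    Marks the field as facetable (usable as a filter in a search UI),
    searchable, and displayable in results.
    """
    return {
        "Name": field_name,
        "Type": value_type,
        "Search": {"Facetable": True, "Searchable": True, "Displayable": True},
    }
```

A facet would then be applied with something like `kendra.update_index(Id=index_id, DocumentMetadataConfigurationUpdates=[facet_update("_file_type")])` and requested at query time with `Facets=[{"DocumentAttributeKey": "_file_type"}]`.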
There's a SageMaker feature called Pipe Mode, and I'm sure it's in those documents. Let's take a look. All right, so here again, I have a suggested answer: really the top answer, with a proper piece of text. And this is really the definition of Pipe Mode, right? "Pipe Mode is a feature that streams data from Amazon S3 to training instances." And this is in one of my Word documents. That's pretty cool. And then the rest is mostly keywords. So, as you can see, Kendra is really able to highlight the top answer, the one that really contains natural language that answers the question, and not just matching keywords. Now let's try to find information in those 50,000 files. For the record, I indexed articles starting with "Th", so that's why you're going to see a lot of those. It's really, again, just a small subset of Wikipedia. So let's try that one: "Who is Thad Jones?" Do you know? So, Thad Jones is an American jazz trumpeter, composer, and bandleader. Again, this is a really good answer, because this is exactly what I was looking for. And the fact that the title, the file name, and everything else contain "Thad Jones" obviously helps Kendra find the right answer for me. Let's try a few more. Maybe I want to know: "What instrument does Thad Jones play?" Let's see what we have here. Again, this is a really good answer, because it does pull one of the Thad Jones articles, and we can see it's highlighting the meaningful words: "Thad Jones" is in there, because obviously it's in my query, and "trumpet" is highlighted. So there's definitely an association here between Thad Jones and the trumpet, which is exactly what my query was about. Once again, there was nothing in my query that said "trumpet". Kendra was able to understand the context of my query, find the right answer, and highlight the right bit of text inside the answer. Maybe a last one: "Where was Thad Jones born?"
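The highlighting shown in the console comes back in the query response as character offsets into an excerpt. Here is a small Python sketch, assuming the response dict shape returned by boto3's `query` call (`ResultItems`, each with a `Type` and a `DocumentExcerpt` holding `Text` and `Highlights` with `BeginOffset`/`EndOffset`). The `**` marker and the helper names are hypothetical choices for a plain-text display, and highlights are assumed not to overlap.

```python
def render_highlights(text, highlights, marker="**"):
    """Wrap each highlighted span of `text` in `marker`.

    `highlights` is a list of dicts with BeginOffset / EndOffset character
    offsets, assumed to be non-overlapping.
    """
    pieces, cursor = [], 0
    for h in sorted(highlights, key=lambda h: h["BeginOffset"]):
        begin, end = h["BeginOffset"], h["EndOffset"]
        pieces.append(text[cursor:begin])                  # plain text before the span
        pieces.append(f"{marker}{text[begin:end]}{marker}")  # the highlighted span
        cursor = end
    pieces.append(text[cursor:])                           # trailing plain text
    return "".join(pieces)


def suggested_answer(response):
    """Highlighted excerpt of the first ANSWER-type result, or None."""
    for item in response.get("ResultItems", []):
        if item.get("Type") == "ANSWER":
            excerpt = item["DocumentExcerpt"]
            return render_highlights(excerpt["Text"], excerpt.get("Highlights", []))
    return None
```

Document-type results (the keyword matches further down the page) are skipped here; they carry the same `DocumentExcerpt` shape, so `render_highlights` works on them too.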
Let's see what we have. And here we get the actual answer, right? "Pontiac, Michigan" is highlighted. And this is really nice, because obviously it is in the Wikipedia article on Thad Jones, but here Kendra can extract that information and promote it, right? It's really extracted from the article. Again, we see natural language processing at work: we see context being extracted from the query, and text strings extracted from the top-ranking article, pointing out, "Yes, this is exactly your answer." I used the console here, but of course we could use the CLI, so let's take a look. We have a bunch of Kendra APIs. Here they are. Let's list indexes. And let's not argue whether "indexes" or "indices" is the right word; that's one for the scholars. I see we have one index here. Let's see if we can query it: aws kendra query, with the index ID and the query text. So: "Where was Thad Jones born?" And I get a JSON answer, which is just the JSON representation of what we saw in the console, right? A list of answers with URLs to the documents, offsets to the relevant highlights, etc. So there you go. This is a 10- or 15-minute demo of Kendra. Super, super easy to use, with pretty powerful deep learning under the hood, as you can see, but just a couple of clicks or a couple of API calls, and you can start indexing your data: S3, RDS, SharePoint Online, and more connectors in the future, I'm sure. This is the test console. What would you do next? Well, I guess the next step would be to start integrating those different widgets in your own application. We have a bunch of documentation here that shows you how to do that. I'm not going to go deep on that, because that would make the video too long, and it involves front-end skills, which I definitely don't have. So, sorry about that.
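The two CLI calls in the demo are `aws kendra list-indices` and `aws kendra query --index-id <index-id> --query-text "Where was Thad Jones born?"`, and both print JSON. Here is a Python sketch that summarizes that JSON, assuming you captured the CLI output as strings; the field names follow the ListIndices and Query responses, while the helper names are hypothetical.

```python
import json


def list_indices_summary(raw_json):
    """(name, id, status) for each index in `aws kendra list-indices` output."""
    data = json.loads(raw_json)
    return [
        (item["Name"], item["Id"], item["Status"])
        for item in data.get("IndexConfigurationSummaryItems", [])
    ]


def query_summary(raw_json):
    """Type, title, document URI, and highlight offsets per query result."""
    data = json.loads(raw_json)
    rows = []
    for item in data.get("ResultItems", []):
        excerpt = item.get("DocumentExcerpt", {})
        rows.append({
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
            "highlights": [
                (h["BeginOffset"], h["EndOffset"])
                for h in excerpt.get("Highlights", [])
            ],
        })
    return rows
```

The offsets are character positions into each result's excerpt text, which is how a front end knows which words to highlight.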
But basically, you can go and fetch those different components and integrate them. We have a sample app that shows you how to do that, so if you're a front-end person, you'll get it in no time. For me, that's a challenge for sure. The last thing I want to say is: please check out the service page. You can find more information on features and pricing, which is important, because Kendra is probably more expensive than the services you're used to. So make sure you understand pricing before you start indexing tons of documents. And of course, you'll find customer stories there as well. All right, this is it for this episode. I hope you liked it and learned a few things. Again, don't forget to subscribe to my channel, and I'll see you soon with more videos. Until then, fuck the virus and keep rocking.