First impressions of Amazon Kendra, AWS’s new Enterprise Search Engine

I did a quick hackathon proof of concept using Amazon Kendra, AWS’s new service launched in May – an enterprise search engine (more on that later) that uses natural language.

We use a Confluence wiki for internal documentation. People are encouraged to contribute, and we have ended up with thousands of pages. When it comes to finding information, the search is… well…

Google auto-complete says it's bad.

My goal for the three-day hackathon was to see if Amazon Kendra can beat the Confluence search.

The good

It works, it finds quality results, and it’s quick to set up. Kendra uses natural language processing both to extract information from documents and to parse the search query and find results. I was impressed that it knew how to surface the high-quality pages out of the 7,500 pages it indexed, and in many cases it could highlight exact answers inside these documents. Kendra can answer questions like “what offices do we have in tokyo?” and “what is mongodb?”, and can handle jargon like “define {thing}” or even “what team owns {internal-tool}?” – all of this was picked up from our documentation.
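The same natural-language questions can also be sent through Kendra’s `Query` API instead of the console. Here is a minimal sketch with boto3, assuming an index already exists; `ask_kendra` is a hypothetical helper, the index ID is a placeholder, and the client is passed in so the function can be tried against a stub without AWS credentials.

```python
from __future__ import annotations

from typing import Any


def ask_kendra(client: Any, index_id: str, question: str) -> list[tuple[str, str, str]]:
    """Run a natural-language query and return (type, title, excerpt) per result.

    `client` is expected to be a boto3 Kendra client (boto3.client("kendra"));
    it is injected so the function can be exercised with a stub.
    """
    response = client.query(IndexId=index_id, QueryText=question)
    hits = []
    for item in response.get("ResultItems", []):
        # "ANSWER" items carry an extracted answer, "DOCUMENT" items a plain match,
        # and "QUESTION_ANSWER" items come from FAQ entries.
        title = item.get("DocumentTitle", {}).get("Text", "")
        excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
        hits.append((item.get("Type", ""), title, excerpt))
    return hits


# Example (needs AWS credentials and an existing index; the ID is a placeholder):
#   import boto3
#   for kind, title, excerpt in ask_kendra(boto3.client("kendra"), "<index-id>",
#                                          "what offices do we have in tokyo?"):
#       print(kind, title, excerpt, sep=" | ")
```

Keeping only the type, title and excerpt is a simplification; the real response carries more fields, such as highlights and the document URI.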

I used the S3 interface for loading documents into Kendra. Each PDF file can be paired with an optional JSON metadata file, which is useful for setting the category and the source URL.
It took me a little over a day to export 7,500 PDFs from Confluence – the slowest part of my project. A couple more hours went into generating the metadata files. Uploading all the files to S3 was quick; after that it took about 30 minutes to create the Amazon Kendra index and configure an S3 data source, and then 4 minutes to load the data. Once loaded, the data was ready for use. I used the AWS Console to configure Amazon Kendra; it also includes an example search form, which was more than enough to evaluate the results and demonstrate Kendra’s capabilities (they even supply React components).
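The metadata files are small JSON documents stored next to each PDF, named after the document with `.metadata.json` appended. Below is a minimal sketch of generating one, assuming the exported page’s title, category (for example, its Confluence space) and URL are already known. `write_kendra_metadata` is a hypothetical helper, and the field names (`Title`, `ContentType`, and the reserved `_category` and `_source_uri` attributes) follow Kendra’s S3 metadata format as I understand it, so verify them against the current docs. If the data source is configured with a separate metadata prefix, the file belongs there instead.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_kendra_metadata(pdf_path: Path, page_title: str, category: str, source_url: str) -> Path:
    """Write `<name>.pdf.metadata.json` next to an exported PDF.

    Field names are assumed from Kendra's S3 metadata format; _category and
    _source_uri are reserved attribute names there.
    """
    metadata = {
        "Title": page_title,
        "ContentType": "PDF",
        "Attributes": {
            "_category": category,      # e.g. the Confluence space
            "_source_uri": source_url,  # lets search results link back to the wiki page
        },
    }
    meta_path = pdf_path.with_name(pdf_path.name + ".metadata.json")
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return meta_path
```

Run it once per exported page, then upload each PDF together with its metadata file.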

Who’s the target audience? What’s special about enterprise search?

Large organizations have multiple knowledge management and sharing tools – a wiki for technical documentation, internal blogs, training, chats (like Slack or Teams), documents (like Google Drive), and more. Each tool comes with its own search engine. A common problem is that employees don’t know where the data they need lives, so a unified search engine makes a lot of sense.

Kendra has another feature that a company search engine requires and that you will never find in a public search engine like Google or DuckDuckGo – per-document permissions. Each user is expected to be authenticated and to find only the documents they are allowed to see. (At my first job I took part in a project that introduced a company search engine only to shut it down on the same day – suddenly many badly kept secrets were readily available. The search worked; it turned out the permissions had always been broken…)
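Per-document permissions boil down to checking every hit against an access list before the user sees it. The sketch below is a toy model, not Kendra’s implementation: it borrows the USER/GROUP and ALLOW/DENY vocabulary of the access control lists Kendra accepts in document metadata, but the precedence rule (an explicit DENY wins, and a document with no matching entry stays hidden) is my assumption. `AclEntry`, `Doc`, `visible_to` and `filter_results` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AclEntry:
    name: str    # user or group name
    kind: str    # "USER" or "GROUP"
    access: str  # "ALLOW" or "DENY"


@dataclass
class Doc:
    title: str
    acl: list[AclEntry] = field(default_factory=list)


def visible_to(doc: Doc, user: str, groups: set[str]) -> bool:
    """Toy rule: an explicit DENY wins; otherwise any matching ALLOW grants access."""
    def matches(entry: AclEntry) -> bool:
        return (entry.kind == "USER" and entry.name == user) or (
            entry.kind == "GROUP" and entry.name in groups
        )

    relevant = [e for e in doc.acl if matches(e)]
    if any(e.access == "DENY" for e in relevant):
        return False
    return any(e.access == "ALLOW" for e in relevant)  # no match means hidden


def filter_results(results: list[Doc], user: str, groups: set[str]) -> list[Doc]:
    # Drop every hit the user may not see before anything is returned.
    return [d for d in results if visible_to(d, user, groups)]
```

The anecdote above is the failure mode this guards against: the filter is only as good as the access lists it is fed.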

Finally, the pricing and quotas – with an enterprise edition starting at $5,000 a month and limited to 40,000 daily searches – only make sense for large organizations and a predictable number of internal users.
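To see why, spread the flat fee over the number of searches. A back-of-the-envelope sketch, assuming a 30-day month and the $5,000 starting price; `cost_per_search` is a hypothetical helper and the 1,000-searches-a-day team is purely illustrative.

```python
def cost_per_search(monthly_price: float, searches_per_day: float, days: int = 30) -> float:
    """Effective price of one search when a flat monthly fee is spread over usage."""
    if searches_per_day <= 0 or days <= 0:
        raise ValueError("usage must be positive")
    return monthly_price / (searches_per_day * days)


# Using the whole 40,000-searches-a-day quota:
print(f"${cost_per_search(5_000, 40_000):.4f} per search")  # $0.0042 per search
# A hypothetical team averaging 1,000 searches a day:
print(f"${cost_per_search(5_000, 1_000):.3f} per search")   # $0.167 per search
```

At full quota a search costs well under a cent, but the fee is fixed, so a small or unpredictable user base pays far more per search.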

What I’d like to see next

I worked with Kendra for a short while and didn’t dive in too deeply, but there are some features that would be more than nice to have:

  • Integration with industry-standard tools: For now, Kendra has connectors for “S3, SharePoint, Salesforce, ServiceNow, RDS databases, One Drive, and many more coming later this year”. Missing here are tools like Confluence or Google’s G Suite. This list is biased toward Microsoft services – no doubt targeting a certain kind of customer. It is also not clear how these connectors work when the data is on-premises, although a PrivateLink interface is provided if your data lives in AWS.
  • Support for comments, context, and hierarchy: Kendra can ingest whole documents and parses them well, but not all data is equal. It is missing support for any kind of links between documents. A comment that doesn’t repeat information from the page is meaningless on its own, and chat messages rely on the discussion around them. There is currently no way of modeling this in Kendra, and the pricing is not friendly to this use case either – a short comment counts the same as a full document. You can sort of get around it by including comments as part of the page (losing direct links to individual comments), but I doubt this would work for a tool like Slack. For comparison, Elasticsearch can model relations between documents.
  • Visibility into accuracy: Looking at query results, there is no indication of how confident Kendra is in them. Was the query well understood? Did we find good answers or only poor matches? This data would enrich the results and enable more uses (for example, a Slack bot that answers questions only when the confidence is high – like AWS did for Lex). The closest thing here is the TopAnswer attribute.
  • Better fine-tuning: I was relieved I didn’t have to tweak any settings or define a taxonomy and stop words – steps that are not always easy or clear. Kendra does have settings for boosting documents based on fields, but if you need finer control, it currently isn’t there.
  • Planned features: autocomplete, suggestions for related searches or corrections, and incremental learning from user feedback.
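The comments workaround mentioned in the list can be as simple as flattening each page and its comments into one text document before upload. A minimal sketch, assuming comments are available as plain text at export time; `Comment` and `page_with_comments` are hypothetical names, and the output would be indexed in place of (or alongside) the exported PDF.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str


def page_with_comments(title: str, body: str, comments: list[Comment]) -> str:
    """Flatten a wiki page and its comments into one text document for indexing.

    Comments keep their context because they are indexed with the page,
    but search results can no longer point at an individual comment.
    """
    parts = [title, "", body]
    if comments:
        parts += ["", "Comments:"]
        parts += [f"- {c.author}: {c.text}" for c in comments]
    return "\n".join(parts)
```

A side effect is that a page with fifty comments still counts as one document, which helps with the per-document pricing, while the loss of deep links is exactly the trade-off described above.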

Conclusion

It’s good! Results look promising, and it will be even better with multiple data sources. I’m going for it.


Google Chrome, First Impression

Like many of us, I spent a lot of time yesterday evaluating Google Chrome. Here are some of my thoughts about it:

While many features are missing from Chrome, I should note its exceptional design and attention to detail. Chrome is so usable that you can easily miss its features. For example, open many tabs, and then start closing them by middle-clicking or with the close button. While you close tabs, Chrome does not resize them until you move the mouse away, so after one tab is closed, the next one appears in the same spot. And the link-destination status bar moves out of the window gracefully when the mouse gets near it.
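To make the tab-closing detail concrete, here is a toy model. `TabStrip` and its methods are hypothetical, the pixel widths are made up, and it only mimics the visible behavior (widths stay frozen until the pointer leaves the tab strip), not how Chrome implements it.

```python
from __future__ import annotations


class TabStrip:
    """Toy model of 'don't resize tabs until the mouse leaves' (not Chrome's code)."""

    def __init__(self, tab_count: int, strip_width: float, max_tab_width: float = 200.0):
        self.tabs = list(range(tab_count))       # tab ids, left to right
        self.strip_width = strip_width
        self.max_tab_width = max_tab_width
        self.frozen_width: float | None = None   # set while the pointer stays on the strip

    def tab_width(self) -> float:
        if self.frozen_width is not None:
            return self.frozen_width
        # Normally tabs share the strip evenly, capped at a maximum width.
        return min(self.max_tab_width, self.strip_width / max(len(self.tabs), 1))

    def tab_at(self, x: float) -> int | None:
        index = int(x // self.tab_width())
        return self.tabs[index] if 0 <= index < len(self.tabs) else None

    def close_at(self, x: float) -> None:
        # Freeze the current width first, so the remaining tabs keep their size
        # and the next tab slides under the pointer.
        self.frozen_width = self.tab_width()
        tab = self.tab_at(x)
        if tab is not None:
            self.tabs.remove(tab)

    def mouse_left_strip(self) -> None:
        # Only now do the remaining tabs grow to fill the strip.
        self.frozen_width = None
```

With ten 100-pixel tabs, closing at x=750 removes tab 7 and leaves tab 8 under the pointer; once the pointer leaves and the tabs widen, that spot belongs to tab 6.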

Google Chrome Screenshot

“Find in page” highlights the matches automatically and also marks their positions on the scrollbar. Nice one. These are small details, of course, but I think they show how well thought out the interface is.
I was fairly impressed by the so-called omnibar, Chrome’s URL bar, though I didn’t find it much better than Firefox’s. Also, a lot has been said about Chrome’s speed and memory footprint. I’ll admit, I wish Firefox loaded pages this fast…

Some obvious things I miss on Chrome:

  1. List of open tabs – I could not find a list of all open tabs. Shift+Esc shows a list of all processes (the task manager), but I cannot select tabs from there.
  2. Multiple tabs – I can move one tab to a new window, but what about a group of tabs, or merging windows? I’d expect to select tabs with Shift or Ctrl, but no.
  3. Non-standard interface – Google redraws the interface, and they obviously didn’t want to include a status bar at the bottom of their window. One thing they forgot: that little gripper in the corner that resizes the window. It isn’t shown even when both scrollbars are visible.
  4. Popup blocker – Blocked windows look like the title bar of a window at the bottom of the current tab, hiding the bottom of the page and the horizontal scrollbar. I think the Firefox / Explorer way is more comfortable, although they could move the information bar to the bottom if they like. Chrome’s way is distracting and gets in the way, almost like a popup.
  5. Default language – I’m Israeli, that’s true. But I use an English system. Chrome’s default language was Hebrew, without any warning, even though both the setup and the download site were in English. Even after I changed the interface to English, I had to remove the Israeli search engines and manually switch the omnibar’s search to English.
  6. Plugins! – The web is a different place with Adblock. I also miss switching tabs with the mouse wheel.
  7. RSS support – When I click on an RSS file, I see its content as if it were HTML. OK, I don’t think a browser needs an RSS reader. I don’t even think browsers need bookmarks. But at least ask me what I want to do with it, or display the XML nicely. Well, I’m sure this feature will come.

And lastly, some (two) links:

Chrome security hole
About pages – the comments list some bugs, including a sure way to crash Chrome. It’s OK. It’s new.