TechForum 2019: Artificial Intelligence and Discoverability

This year’s TechForum conference featured many presentations on the latest research and developments connected to publishing. In today’s post, we provide a recap of the presentation “Project TAMIS: Using AI to Address Discoverability,” given by Christian Roy, president of Brix Labs.

Project TAMIS was developed by Éditions Septentrion in partnership with Brix Labs. The project sought to demystify the issue of discoverability in the wake of modern developments such as search engines and recommendation algorithms. Making sure that readers can discover titles is a perennial problem in publishing, as publishers strive to help titles stand out in an increasingly crowded cultural landscape. As Roy pointed out, the way that customers look for titles has changed substantially over the years from searching through bookstores to searching using technology such as search engines and suggestion algorithms. However, publishing discoverability practices have not necessarily adapted to match these changes. At the same time, consumers are coming to expect personalized experiences and help finding new content. These technological changes have altered not only how audiences search for content, but also how they see results, as well as their expectations of service.

Customer service is moving away from product-centered systems of content discovery, in which the burden of discovery lies on the customer. Instead, Roy explained, we are seeing a rise in consumer-focused systems of recommendation. Consumers no longer need to search for titles they might enjoy: algorithms examine past purchases and offer up curated options. This shift means that consumer data gains even greater importance. Together, these changes point to a shifting paradigm of customer service, from search to recommendation. Instead of wandering through bookstores, book buyers find titles through search engines. On a desktop, results are presented in list format. However, 50% of searches are now done on mobile devices, whose smaller screens show fewer results at first glance. Additionally, many search engines supply informational cards that highlight what the engine believes are the most relevant search results for mobile users. Roy pointed out that while everything from customer search methods to retail practices has changed, the way books exist in the system remains the same. The goal of Project TAMIS is to change that.

The hypothesis behind Project TAMIS is simple: books require metadata in order to be searchable by the new systems, but publishers often lack the time or resources to create adequate keyword lists for all of their titles. Artificial intelligence, in the form of learning systems that specialize in pattern recognition and application replication, can work on a large scale, and books have a rich pool of content from which to extract data. Therefore, it should be possible to use artificial intelligence to generate metadata. To test this theory, TAMIS evaluated the ability of current AI programs to produce keywords on three fronts: image description, keyword extraction, and category generation.

Image description AIs generate metadata based on book covers. To learn more about the effectiveness of these programs, Project TAMIS tested AWS Rekognition, Microsoft Azure, IBM Watson, Cloudsight and Clarifai. Overall, the AIs all generated keywords that reflected the colour and content of the cover image. The AI provided by Clarifai also suggested abstract themes. For example, a cover featuring an arctic landscape across the board generated keywords such as glacier, ice, mountain, snow and winter, with Clarifai also generating more abstract keywords such as travel and adventure.

The usefulness of this kind of AI turned out to depend heavily on the type of book cover. TAMIS revealed that heavily stylized or cartoon images perform poorly, while photographic covers have a 50% chance of delivering useful keywords. The best keywords resulted from analyses of covers with realistic illustrations. The Project TAMIS researchers concluded that publishers should weigh the type of cover image before relying on image-based, AI-generated keywords.

Artificial intelligence vendors also offer the possibility of generating keywords by extracting them from the actual text. The AI identifies keywords, or entities, in the text using broad categories such as locations, events, people, and organizations. For example, a book about hockey might generate keywords such as NHL, Wayne Gretzky, and Stanley Cup. For this test, Project TAMIS analyzed results from Rosette, Google Cloud, TextRazor, and IBM Watson. The AIs tested provided an abundance of original keywords. However, the keywords were not necessarily relevant. Moreover, researchers found that the sheer number of keywords generated could be problematic, especially since some retailers limit the amount of metadata publishers can provide. For example, Amazon limits keywords to 250 bytes. Based on these findings, the study recommended keyword extraction AIs to publishers who have extensive backlists that require metadata.
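To illustrate the retailer limit mentioned above, here is a minimal Python sketch of how an over-long list of extracted keywords might be trimmed to a byte budget. The function name fit_keywords, the space separator, and the rule of skipping keywords that do not fit are assumptions made for illustration; they are not part of Project TAMIS or of any retailer's specification. Only the 250-byte figure comes from the presentation.

```python
def fit_keywords(keywords, max_bytes=250, separator=" "):
    """Keep keywords, in their given order, while the joined string
    stays within max_bytes when encoded as UTF-8.

    Hypothetical helper: the separator and the skipping rule are
    assumptions, not a retailer's documented format.
    """
    kept = []
    seen = set()
    used = 0
    sep_cost = len(separator.encode("utf-8"))
    for keyword in keywords:
        keyword = keyword.strip()
        key = keyword.casefold()
        if not keyword or key in seen:
            continue  # ignore blanks and case-insensitive duplicates
        seen.add(key)
        # Accented characters take more than one byte in UTF-8,
        # so measure encoded bytes rather than characters.
        cost = len(keyword.encode("utf-8"))
        if kept:
            cost += sep_cost
        if used + cost > max_bytes:
            continue  # too long; a shorter keyword later may still fit
        kept.append(keyword)
        used += cost
    return separator.join(kept)


extracted = ["NHL", "Wayne Gretzky", "Stanley Cup", "nhl"]
print(fit_keywords(extracted, max_bytes=20))
```

Because keywords are kept in the order given, a human editor (or a relevance score) should rank the list first so that the most useful terms survive the cut.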

Finally, it is possible for artificial intelligence to provide publishers not just with keywords, but also with BISAC or Thema codes for titles. No vendor currently provides this service, so TAMIS had to train AIs to classify titles. Training artificial intelligence requires exposing the system to extensive data and teaching it the correct way to interpret the data. During the course of the research, TAMIS trained Amazon Comprehend to produce BISAC codes by analyzing text content and selecting an appropriate code. TextRazor was trained to produce Thema codes by analyzing the subject headings found in a text. Results of the experiment were mixed. Amazon Comprehend only suggested FIC000000 or JUV000000, revealing a training issue that might resolve itself with more training. TextRazor proved more successful, providing consistently relevant Thema codes for non-fiction titles. However, TextRazor was not useful for fiction titles, which it struggled to differentiate from non-fiction. The research concluded that TextRazor is a good resource for publishers with a large non-fiction list, but overall, AIs still require more training before becoming a dependable resource for title categorization.

Based on these tests, Project TAMIS determined that AI most certainly has a place in publishing’s present and future. For now, artificial intelligence is a good tool for generating metadata tags. Studies have shown that humans are better at generating keywords, but AI can provide a good jumping-off point from which a human can work, especially in the case of extensive backlist needs. Project TAMIS is especially eager to develop AI’s position in the future. Artificial intelligence systems create the possibility of increasing discoverability outside of retailers: AI could be used to produce structured data for search engines, vendor chatbots, and content-based recommendation engines, all of which depend on keywords to function. Content-based recommendation engines contrast with the current recommendation engines based on consumer behaviour. Content-based engines rely on keywords to match products to similar products, whereas behaviour-based engines require access to consumer data in order to suggest purchases.
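As a rough illustration of the content-based approach described above, the following Python sketch ranks titles by how many keywords they share, using Jaccard similarity (shared keywords divided by all distinct keywords). The catalogue, the book titles, and the function names are hypothetical, and the presentation did not specify a similarity measure; Jaccard is simply one common choice. Note that no consumer data is involved.

```python
def jaccard(a, b):
    """Share of keywords two titles have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(title, catalogue, top_n=3):
    """Return up to top_n other titles, most similar keywords first.

    Titles with no keyword in common are left out entirely.
    """
    target = catalogue[title]
    scored = []
    for other, keywords in catalogue.items():
        if other == title:
            continue
        score = jaccard(target, keywords)
        if score > 0:
            scored.append((score, other))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


# Hypothetical catalogue: the titles are invented, and the keywords
# echo the examples given in the presentation recap.
catalogue = {
    "Arctic Journeys": {"glacier", "ice", "snow", "travel", "adventure"},
    "Winter Peaks": {"mountain", "snow", "ice", "winter"},
    "Hockey Legends": {"NHL", "Wayne Gretzky", "Stanley Cup"},
}

print(recommend("Arctic Journeys", catalogue))
```

The quality of such matches depends entirely on the quality of the keywords, which is why accurate metadata matters for this kind of engine.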

Finally, the rise of voice assistants presents a new method of searching to consider. How do these assistants present search results? How do consumers use this technology to search for books? While Roy’s presentation did not focus on voice assistants, he did raise the point that the changes they bring are not insignificant.

For now, the Project TAMIS researchers are continuing their experiments and publishing the results. The team also plans to build an interface for publishers that will assist in curating and downloading results from AI systems. To learn more about their projects and experiments, visit the TAMIS website: tamis.ca.

08/19/2019 | Book Fairs, Digital