2023 was the year of the LLM. The ubiquity of large language models across information and content-related industries caused both excitement and trepidation.

For taxonomists, ontologists, content strategists and the like, initiatives focused on leveraging these powerful models for the categorization, classification, and creation of content presented unprecedented challenges.

Here are three areas where LLMs, aka “the robots,” may benefit from the expertise of actual humans-in-the-loop:

Accuracy

LLMs produce information based largely on existing content. As with metadata quality, the output is only as good as the input. Reliance on incomplete or contradictory data, combined with the large language model’s propensity to guess based on patterns rather than facts, can contribute to inaccurate results.

Let’s take this image as an example. This red squirrel appears in an image search for the word ‘fox.’ Automated or inaccurate image tagging caused by human error or poor quality image recognition begins a cycle of inaccuracy.

Photo by Man Dy on Pexels.com

Using an LLM-powered process to generate content about this animal is flawed from the start because of the bad input, and the problem can be compounded further if the LLM hallucinates from conflicting data.

For instance, if the model describes this ‘fox’ by its dictionary definition as ‘a carnivorous mammal of the dog family,’ or even shortens that to ‘a carnivorous dog,’ confusion will ensue.

In this case, building quality checks into the process at various points, such as before machine-generated tags feed into content generation, can prevent an AI application from going down the wrong path, as in the sketch below.
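Here is a minimal sketch, in Python, of what such a human-in-the-loop quality gate might look like. The tag format, confidence threshold, and review queue are purely illustrative assumptions, not any particular product’s API:

```python
# Minimal sketch of a quality gate for auto-generated image tags.
# The confidence threshold and review routing are illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.85  # tags below this go to a human reviewer

def route_tags(image_id: str, auto_tags: list) -> dict:
    """Split machine-generated tags into accepted vs. needs-human-review."""
    accepted, needs_review = [], []
    for tag in auto_tags:
        if tag["confidence"] >= CONFIDENCE_THRESHOLD:
            accepted.append(tag["label"])
        else:
            needs_review.append(tag["label"])
    return {"image": image_id, "accepted": accepted, "review": needs_review}

# Example: the low-confidence 'fox' tag is flagged for a human
# instead of silently flowing into downstream content generation.
print(route_tags("squirrel.jpg", [
    {"label": "red squirrel", "confidence": 0.93},
    {"label": "fox", "confidence": 0.41},
]))
```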

Ambiguity

As if that squirrel fox wasn’t ambiguous enough, LLM-generated content can suffer from ambiguity, as the robot overlords are not as adept as humans at distinguishing between similar concepts.

Let’s take every taxonomist’s favorite example: the word ‘orange.’ While ChatGPT knows that orange is both a color and a fruit, the information provided is generalized, includes both options, and is not targeted.

Photo by Dominika Roseclay on Pexels.com

Again, a human needs either to specify color or fruit in the prompt or to evaluate and edit the final content. Having a taxonomy with separate nodes for the color and the fruit could help immensely with autogenerating prompts at scale, resulting in less ambiguity and higher accuracy (see the sketch below).
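As a rough illustration (not any particular tool’s API), here is a sketch of how separate taxonomy nodes might drive disambiguated prompts at scale. The node structure and prompt template are assumptions for the example:

```python
# Minimal sketch of taxonomy-driven prompt generation.
# The node labels, fields, and template are illustrative assumptions.

TAXONOMY = {
    "orange (color)": {"broader": "colors",
                       "definition": "a color between red and yellow"},
    "orange (fruit)": {"broader": "citrus fruits",
                       "definition": "the fruit of the orange tree"},
}

def build_prompt(node_label: str) -> str:
    """Turn a taxonomy node into a targeted, unambiguous prompt."""
    node = TAXONOMY[node_label]
    return (
        f"Write a short description of {node_label}, "
        f"a member of the '{node['broader']}' category ({node['definition']}). "
        f"Do not discuss other senses of the word."
    )

# Generating one prompt per node keeps the color and the fruit apart.
for label in TAXONOMY:
    print(build_prompt(label))
```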

Appropriateness

LLMs don’t really care about your feelings. When autogenerating content using these models, beware of inadvertently exposing information to end users that just isn’t nice.

For instance, if you are trying to generate interesting blurbs about popular celebrities, an LLM can’t really differentiate between a potentially salacious fact and something that is more appropriate.

It is perfectly appropriate for an LLM to generate factual content about Taylor Swift’s birthday, such as ‘Taylor Swift was born on December 13th, 1989.’

However, if the LLM decided to quantify Swift’s relationships and craft a fact such as ‘Taylor Swift has dated over a dozen other celebrities,’ that could be less appropriate depending on the context.
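One way to picture a guardrail here is a simple appropriateness gate that routes sensitive topics to a human editor. The blocked-topic list and review flag below are illustrative assumptions, not a real moderation API:

```python
# Minimal sketch of an appropriateness gate on generated blurbs.
# The blocked-topic list is an illustrative assumption; a real system
# would pair this with editorial policy and human review.

BLOCKED_TOPICS = {"dating history", "relationships", "medical", "legal"}

def screen_blurb(blurb: str, detected_topics: set) -> dict:
    """Flag blurbs that touch sensitive topics for human review."""
    flagged = detected_topics & BLOCKED_TOPICS
    return {
        "blurb": blurb,
        "publish": not flagged,
        "needs_human_review": sorted(flagged),
    }

print(screen_blurb("Taylor Swift was born on December 13th, 1989.",
                   {"biography"}))
print(screen_blurb("Taylor Swift has dated over a dozen celebrities.",
                   {"dating history"}))
```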

Summary

The robots probably shouldn’t just run amok creating content independent of human classification and evaluation. At least not quite yet…

Using LLMs in conjunction with taxonomies specifically tailored to direct prompts and to classify content for distribution can prevent accuracy errors, reduce ambiguity, and also ensure that there are checks and balances on safety and appropriateness.

In addition, human evaluation of content quality is still necessary and likely will be needed for quite some time.

Here’s to 2024 – the year the taxonomists and the robots team up! 🎉
