Teneo x Generative AI

Concept

Generative AI is an extremely powerful tool that allows you to create varied, high-quality data at development time (for co-pilot scenarios) or at runtime. This makes it a great tool to use when building chatbots, as it can enhance the flexibility and scope of a conversational AI solution.

However, relying solely on Generative AI can be problematic for the bot-building process in a number of ways. Generative AI systems have limitations in reliability and understanding, and their responses can be unpredictable without human supervision. Combining Generative AI with the functionality of Teneo brings together the best of both worlds and allows developers to:

  • Safeguard the Gen AI responses through multiple levels of control over what is said by the bot and from which source it originates
    • mitigating the risk of unwanted outputs or hallucinations from Large Language Models (LLMs)
    • controlling the costs of unnecessary LLM usage
    • preventing the solution from giving unwanted responses (and therefore protecting the company / brand)
  • Monitor and maintain Generative AI usage within the solution
    • use Teneo's data tools to monitor and validate the runtime behavior of deployed bots
    • listen to users to determine where Gen AI is working well
  • Maintain and develop a hybrid approach as a team and over many iterations of the deployed solution
  • Teneo works with the developer to expose the power of Gen AI - but control still lies in the developer's hands

Combining Generative AI with Teneo boosts the performance of both and is easy and efficient, whether you use Teneo's Gen AI Management features, Teneo Copilot, or another integration route. Teneo supports different types of LLMs - all together in one hybrid solution - and thanks to Teneo's pragmatic, modular platform, integrating emerging technologies can be done seamlessly.

When to use Generative AI

Deciding which use cases to cover with Generative AI is a balance of many factors, almost as many as there are use cases. Some key factors to consider:

  • Risk - Generative AI that creates answers for end users at runtime is speaking with the company's voice
    • Lower-severity use cases can be covered with Gen AI
    • Higher-severity use cases (such as quoting, cancellation or complaint handling) are better kept under direct control
  • Cost
    • Generative AI services have a per-call cost, which can increase the running cost of the solution
    • However, they can simplify some development, reducing the development cost
  • Applicability - Generative AI is able to do many things
    • The real win comes from choosing the things that it is good at
    • Things where the overall solution quality is improved by the addition of Gen AI

Generative AI technologies allow users to perform a variety of tasks that can be controlled and monitored in Teneo, reducing the risk of hallucinations. Some common use cases include:

  • Dialogue summarization: summarize a dialog pulled from Teneo's built-in runtime dialog management - this summary can then be passed to another system at handover, recorded for future reference, emailed to the end user for their records, etc.
  • Rephrasing output: personalize outputs sent to the end-user based on the dialog context and/or their available information
  • Sentiment analysis: generate the sentiment of the input or dialog and annotate for future use in matching or intent processing
  • Intent classification: classify the intent of a user's input on the fly from a small set of intents (e.g. agree/disagree) and annotate for future use in matching or intent processing (see the sketch after this list)
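
As an illustration of the last use case, the sketch below shows on-the-fly intent classification against a small, closed set of intents. The `call_llm` helper is a hypothetical stand-in for whatever LLM endpoint the solution integrates with; the point is the constrained prompt and the validation of the model's answer.

```python
# Minimal sketch: classify a user input against a small, closed set of
# intents. `call_llm` is a hypothetical stand-in for the LLM integration
# used by the solution (any chat-completion style endpoint works).

ALLOWED_INTENTS = ["agree", "disagree", "unsure"]

def call_llm(prompt: str) -> str:
    """Hypothetical helper that sends `prompt` to an LLM and returns its reply."""
    raise NotImplementedError("wire this up to your LLM provider")

def classify_intent(user_input: str) -> str:
    prompt = (
        "Classify the user's message into exactly one of these intents: "
        + ", ".join(ALLOWED_INTENTS) + ".\n"
        "Answer with the intent name only.\n\n"
        f"Message: {user_input}"
    )
    answer = call_llm(prompt).strip().lower()
    # Validate the model's answer; fall back to a safe default so the
    # annotation is always one of the expected values.
    return answer if answer in ALLOWED_INTENTS else "unsure"
```

The returned value can then be stored as an annotation for later use in matching or intent processing.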

Please see Gen AI Management for guides on how to implement some common use cases.

Input Handling

Pre-processing user inputs before sending them to the LLM is crucial for obtaining good results while maintaining privacy. The tokens that are sent into the LLM define the response that is generated, which is why it is important to ensure the quality of the user input.

Input filtering

Input filtering controls which inputs are sent to Gen AI and which are handled by other systems within Teneo (a minimal routing sketch follows this list):

  • define which inputs should be fully handled by Teneo - and where a Gen AI should be brought in to provide additional information.
  • minimize latency by not calling out to an external and potentially slow Gen AI model.
  • build a robust system by ensuring valueless or dangerous inputs such as insults, nonsense and abuse never reach the model.
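
Here is a minimal sketch of that filtering decision in illustrative Python rather than Teneo Flows; the keyword list and length threshold are assumptions for the example.

```python
# Minimal sketch of an input filter: decide whether an input should be
# answered locally (Teneo-side handling) or passed on to a Gen AI model.
# The marker list and threshold below are illustrative only.

BLOCKED_MARKERS = {"idiot", "stupid"}   # abuse / insults: never forwarded
MIN_MEANINGFUL_LENGTH = 3               # very short inputs are likely noise

def should_call_gen_ai(user_input: str) -> bool:
    words = user_input.lower().split()
    if not words or len(user_input) < MIN_MEANINGFUL_LENGTH:
        return False                    # nonsense / empty: handle locally
    if any(word in BLOCKED_MARKERS for word in words):
        return False                    # abusive input: handle locally
    return True                         # everything else may use Gen AI
```

In a real solution the equivalent decision is made by the order and conditions of Flows and triggers; the sketch only shows the shape of the decision.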

Data anonymization

Data anonymization ensures no personal or sensitive information is sent to the LLM.

Keeping track of what personal data may be captured by the bot is part of the normal development of bots. In practice, this means removing or encrypting sensitive data in the user input.

In Teneo, the solution developer is in complete control of which parts of the input are sent to Gen AI and in which form, allowing full control over the anonymization of the user input. Supporting anonymization both before the call (on the input) and after it (on the response) means the response can still feel personal without personal data ever being sent to the model.
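
The pattern can be sketched as follows: mask sensitive values before the prompt is sent, keep a mapping, and restore the originals in the model's response. The email regex is illustrative only, not a complete PII catalogue.

```python
import re

# Minimal sketch of anonymization around an LLM call: mask sensitive
# values before sending the prompt, then restore them in the response.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace emails with placeholders; return masked text and the mapping."""
    mapping: dict[str, str] = {}
    def _mask(match: re.Match) -> str:
        placeholder = f"<EMAIL_{len(mapping)}>"
        mapping[placeholder] = match.group(0)
        return placeholder
    return EMAIL.sub(_mask, text), mapping

def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the model's response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

masked, mapping = anonymize("Send the invoice to jane.doe@example.com")
# ... send `masked` to the LLM, receive `response` ...
# response = deanonymize(response, mapping)
```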

Data normalization

Data normalization boosts accuracy, since the LLM does not have to work around errors in the input, and minimizes cost by sending fewer tokens to the LLM.

Normalization is an embedded part of Teneo input handling and cleans the user input out of the box with a variety of language-specific processes.

In Teneo, standard data normalization is handled by the Input Processor, and can be extended with Global Scripts.

When sending a user input to a Gen AI model, it is important to consider which form of the input (original, Teneo-normalized, custom) to send to get the best results.
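
As a rough illustration of what normalization involves (in Teneo the real work is done by the Input Processor), a couple of typical clean-up steps might look like this:

```python
import unicodedata

# Minimal sketch of input normalization before an LLM call: fewer,
# cleaner tokens mean lower cost and less noise for the model to fight.

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # unify Unicode forms
    text = " ".join(text.split())               # collapse whitespace
    return text.strip()

assert normalize("  What   is\tmy   balance? ") == "What is my balance?"
```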

Response Generation

Generative AI can be used to generate a response and provide this response directly to the user.

This use of Generative AI covers use cases such as personalization and rephrasing, as well as responses to questions that fall within the model's knowledge domain; for most standard Gen AI models this means publicly available information up to the time the model was trained.

One way of using Generative AI to generate responses is Retrieval-Augmented Generation (RAG). The RAG architecture has become the most widespread solution for grounding LLMs on data. This architecture approaches the problem of answering questions outside the model's pre-trained domain in two steps (a minimal sketch follows the steps below):

  1. In a retrieval step, the knowledge base is searched based on the question. This can be done using exact words in the question (keyword search), using an embedding vector that is computed from the question (vector search), or a combination of both (hybrid search). The retrieval step returns text snippets from the knowledge base that match the question. Using semantic ranking further boosts the quality of search results by finding the most semantically relevant information.
  2. A prompt is sent to the LLM, asking the model to generate an answer for the question based on the text snippets found in the first step.
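
A minimal sketch of the two steps, using naive keyword-overlap retrieval in place of vector or hybrid search; the knowledge base snippets and the `call_llm` helper are assumptions for the example.

```python
# Minimal RAG sketch: a naive keyword-overlap retrieval step followed
# by a grounded prompt. Real systems use vector or hybrid search plus
# semantic ranking.

KNOWLEDGE_BASE = [
    "Orders can be cancelled free of charge within 24 hours.",
    "Standard delivery takes 3-5 business days.",
]

def call_llm(prompt: str) -> str:
    """Hypothetical helper that sends `prompt` to an LLM and returns its reply."""
    raise NotImplementedError("wire this up to your LLM provider")

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Step 1: return the snippets sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda snippet: len(q_words & set(snippet.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def answer(question: str) -> str:
    """Step 2: ask the LLM to answer based only on the retrieved snippets."""
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```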

Gen AI in Teneo

It is easy to integrate and maintain Gen AI within a solution with the new Gen AI Management features. These provide control, separation of responsibilities, collaboration features and everything else you would expect from Teneo, and they are the recommended way to integrate Generative AI in a Teneo solution. They allow users to define AI agents with different personalities and assign them different tasks, such as rephrasing outputs, extracting entities or creating a safety net. Their reusable modules ensure a well-structured AI system, which can be updated or expanded easily based on the user's needs. This AI structure can also be visualized with the Gen AI Overview, which shows all Gen AI usages within a solution and how they relate to each other.

Tips and Tricks

Static Answers

You can define "Filter Flows" in the solution to ensure that inputs are handled in the most appropriate (read: effective) place. It is recommended to avoid sending queries to the LLM where a static answer can easily be provided instead and is just as effective as a generated one. "Low value" inputs such as greetings, insults and repetitive questions are examples of this, as answers for them can easily - and in these cases more safely - be defined in Teneo. Not only does this give you full control of the answers, but it also cuts down the cost and latency that the use of an LLM implies. A minimal lookup sketch follows.
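
Here is the lookup pattern, sketched with an illustrative phrase table and the hypothetical `call_llm` helper from the earlier sketches; in Teneo the equivalent is Filter Flows with full matching power.

```python
# Minimal sketch of the "static answer first" pattern: low-value inputs
# get a predefined response and never reach the LLM.

STATIC_ANSWERS = {
    "hello": "Hi! How can I help you today?",
    "thanks": "You're welcome!",
}

def call_llm(prompt: str) -> str:
    """Hypothetical helper that sends `prompt` to an LLM and returns its reply."""
    raise NotImplementedError("wire this up to your LLM provider")

def respond(user_input: str) -> str:
    key = user_input.strip().lower().rstrip("!?.")
    if key in STATIC_ANSWERS:
        return STATIC_ANSWERS[key]  # no LLM call: zero cost, no latency
    return call_llm(user_input)     # fall through to generation
```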

Answer Analysis

Teneo Inquire allows you to analyze the logs and build additional Flows based on those answers and the users' feedback on them.

User interactions with a published bot are stored in the logs. You can analyze these conversations using Teneo Query Language (TQL) and use findings to further improve your bot. For example, if you spot certain commonly recurring topics in user interactions, it may be beneficial to add Flows related to those topics to handle them within Teneo, rather than sending them on to the LLM.
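
As a rough illustration, topic counting over exported conversation data might look like the sketch below; the record format is hypothetical, and in practice the data would come from TQL queries against the logs.

```python
from collections import Counter

# Minimal sketch of log analysis: count recurring topics in exported
# conversation data to find candidates for dedicated Flows.
# The record format here is a hypothetical example.

logs = [
    {"user_input": "where is my order", "topic": "order_status"},
    {"user_input": "track my parcel", "topic": "order_status"},
    {"user_input": "cancel my order", "topic": "cancellation"},
]

topic_counts = Counter(record["topic"] for record in logs)
for topic, count in topic_counts.most_common():
    print(f"{topic}: {count} conversations")
# Frequent topics are good candidates for Teneo-side Flows instead of LLM calls.
```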

Optimize Cost and Latency

Deferred annotations are useful for controlling the use of LLMs and ensuring processes related to cost and latency are optimized. You can, for example, ensure that a slow process is deferred and only run if it is actually needed, or that an expensive process is called at most once per transaction. The sketch below illustrates the underlying pattern.
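
The underlying idea is plain lazy, memoized evaluation; this is a general-pattern sketch, not Teneo's annotation API, and `call_llm` is again a hypothetical helper.

```python
from functools import lru_cache

# Minimal sketch of the deferral pattern: an expensive call is wrapped
# so it runs only if something actually asks for its result, and at
# most once per distinct input while it stays cached.

def call_llm(prompt: str) -> str:
    """Hypothetical helper that sends `prompt` to an LLM and returns its reply."""
    raise NotImplementedError("wire this up to your LLM provider")

@lru_cache(maxsize=None)
def expensive_sentiment(user_input: str) -> str:
    print("calling the LLM ...")  # runs at most once per distinct input
    return call_llm(f"Classify the sentiment of: {user_input}")

# Nothing runs until a condition actually needs the value:
# if flow_needs_sentiment:
#     sentiment = expensive_sentiment(current_input)
```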