The first question one may ask is: why not just use the ChatGPT interface and ask questions directly? It has been trained on a humongous volume of Internet data generated up to 2021, so a text corpus like the Mahabharata is known to it.
That was my first approach. I asked ChatGPT several questions about the Mahabharata and got good answers to some of them. However, most lacked rigour, and that is expected. GPT is trained on general datasets. It can understand and interpret natural language very well, and it can reason reasonably well. However, it is not an expert in any specific domain. So, while it may have some knowledge of the Mahabharata, it may not respond with deeply researched answers. At times, it may not have an answer at all. In those cases, it either humbly refuses to answer the question or confidently makes one up (hallucination).
The next obvious way to achieve KBQA is to use a Retrieval QA prompt. This is where LangChain starts being extremely useful.
Retrieval QA
For those unfamiliar with the LangChain library, it is one of the best ways to use LLMs like GPT in your code. Here is an implementation of KBQA using LangChain.
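Below is a minimal sketch of such an implementation, assuming the classic LangChain API, an OpenAI key in the environment, and a plain-text corpus file; the file name, chunk sizes, and model name are placeholders, not the exact code from this project.

```python
# A minimal Retrieval QA sketch with classic LangChain (placeholder paths and parameters).
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load the corpus and split it into text chunks.
docs = TextLoader("mahabharata.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Create embeddings for each chunk and save them to a vector database
#    (a one-time effort if the data is static).
vectordb = Chroma.from_documents(chunks, OpenAIEmbeddings(), persist_directory="./db")

# 3 & 4. Retrieve relevant chunks for the user's query and ask the LLM to answer from them.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("Who was Arjuna?"))
```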
To summarise, here are the steps to achieve KBQA on any body of documents:
- Split the knowledge base into text chunks.
- Create a numerical representation (Embeddings) for each chunk and save them to a vector database.
- Run a semantic search using the user’s query on this database and fetch the relevant text chunks.
- Send these text chunks to the LLM along with the user’s question and ask it to answer.

If your data is static, the first two steps are one-time efforts.
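To make the last two steps explicit, here is roughly what the retrieval chain does under the hood; this is a sketch, with the prompt wording and the number of chunks (k) chosen for illustration, reusing the vectordb built above.

```python
# Steps 3 and 4 done "by hand": semantic search over the vector DB, then stuff the
# retrieved chunks into the prompt and ask the LLM to answer from that context.
from langchain.chat_models import ChatOpenAI

question = "Why did the Mahabharata war happen?"
relevant_chunks = vectordb.similarity_search(question, k=4)  # vectordb from the snippet above

context = "\n\n".join(chunk.page_content for chunk in relevant_chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0).predict(prompt))
```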
Here is a graphical representation of this process.
So why go any further? It seems like a solved problem!
Not quite 🙁
This approach works well for simple questions on a simple, factual knowledge base. However, it does not work for a more complex knowledge base or for more complicated questions that require deeper, multi-hop reasoning. Multi-hop reasoning is a process in which multiple steps of logical or contextual inference are needed to arrive at an answer; for example, answering a "why" question about a war requires first establishing who the parties were, how they were related, and which events led to the conflict.
Moreover, LLMs are limited in the length of text they can chew on in one prompt. You can, of course, send the documents one at a time and then ‘refine’ or ‘reduce’ the answer with every call. However, this approach does not allow for complex, multi-hop reasoning. In some cases, the results from the ‘refine’ or ‘reduce’ approach are better than simply stuffing all the documents into a single prompt, but not by much.
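For completeness, here is a sketch of how the same question could be answered with a ‘refine’ chain instead of stuffing everything into one prompt. The chain type strings are standard LangChain; the surrounding variables (relevant_chunks, the question) are carried over from the earlier sketches.

```python
# A 'refine' chain calls the LLM once per document, refining the running answer each
# time; switch chain_type to "map_reduce" for the reduce-style variant.
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI

chain = load_qa_chain(
    ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    chain_type="refine",
)
answer = chain.run(
    input_documents=relevant_chunks,  # chunks from the semantic search above
    question="Why did the Mahabharata war happen?",
)
print(answer)
```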
For a complex knowledge base, the user’s question by itself may not be enough to find all the relevant documents that can help the LLM arrive at an accurate answer.
For example:
Who was Arjuna?
This is a simple question and can be answered with limited context. However, the following question:
Why did the Mahabharata war happen?
is a question whose context is spread all across the text corpus. The question itself carries little information about that context, so finding the relevant chunks of text and then reasoning over them may not work.
So what next?
AI Agents
This is one of the coolest concepts to have emerged with the advent of AI. If you don’t know the concept of an AI Agent, I can’t wait to explain it to you, though I may still fail to convey its awesomeness. Let me use ChatGPT to explain it first.
An AI agent, also known simply as an “agent,” refers to a software program or system that can autonomously perceive its environment, make decisions, and take actions to achieve specific goals. AI agents are designed to mimic human-like behaviour in problem-solving and decision-making tasks. They operate within a defined environment and interact with that environment to achieve desired outcomes.
Simply speaking, an Agent is a program that takes a problem, decides how to solve it, and then solves it. The Agent is provided with a set of tools: functions, methods, API calls, and so on. It can use any of them, in any sequence it deems fit, if it chooses to do so. Contrast this with conventional software, where the sequence of steps needed to solve the problem is pre-programmed. This is, of course, a very loose definition, but you probably get the idea by now.
Here are the two different agents I tried for our KBQA use case.
ReAct
This Agent uses the ‘ReAct’ (Reason and Act) style of reasoning to decide which tool to use for the given problem.
A LangChain implementation of a ReAct Agent is sketched after the list of tools below.
I provided the Agent with the following tools to choose from:
- Retrieval QA chain with a document store.
- The Character Glossary search (I created a glossary with Named Entity Recognition using a pre-trained model)
- Wikipedia search.
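Here is a hedged sketch of that setup with the classic LangChain agent API. The qa chain is the RetrievalQA chain from the earlier snippet, WikipediaAPIWrapper is a real LangChain utility, and the glossary lookup is a hypothetical stand-in for the NER-built glossary, not the actual implementation.

```python
# A ReAct-style agent that can pick between the retrieval chain, a character
# glossary lookup, and Wikipedia search (classic LangChain API; sketch only).
from langchain.agents import initialize_agent, AgentType, Tool
from langchain.chat_models import ChatOpenAI
from langchain.utilities import WikipediaAPIWrapper

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
wikipedia = WikipediaAPIWrapper()

# Hypothetical glossary; the real one was built with NER over the corpus.
glossary = {"Arjuna": "Third of the Pandava brothers, a great archer."}

tools = [
    Tool(name="Mahabharata QA",
         func=qa.run,  # the RetrievalQA chain built earlier
         description="Answers questions from the Mahabharata text corpus."),
    Tool(name="Character Glossary",
         func=lambda name: glossary.get(name, "Not found"),
         description="Looks up a character in the glossary."),
    Tool(name="Wikipedia",
         func=wikipedia.run,
         description="Searches Wikipedia for general background."),
]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Why did the Mahabharata war happen?")
```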
The ReAct agent did not give me good results and failed to converge to an answer most of the time. It does not work well with GPT-3.5. It may work better with GPT-4, but that is 20–30 times more expensive than GPT-3.5, so it may not be an option yet.
Even when it converged, I could not get good results. Someone more experienced in crafting ‘ReAct’ prompts would probably have done better.
Self-Ask Agent
This agent asks follow-up questions based on the original question and then tries to find intermediate answers. Using these intermediate answers, it arrives at a final answer. Here is an article explaining the Self-Ask Agent.
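Here is a minimal sketch of a self-ask agent in classic LangChain. The self-ask-with-search agent type expects exactly one tool named "Intermediate Answer"; wiring it to the earlier RetrievalQA chain is my assumption, not necessarily the exact setup used in this project.

```python
# A self-ask agent: the LLM decomposes the question into follow-ups, answers each
# with the single "Intermediate Answer" tool, then composes the final answer.
from langchain.agents import initialize_agent, AgentType, Tool
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

tools = [
    Tool(name="Intermediate Answer",
         func=qa.run,  # the RetrievalQA chain built earlier
         description="Answers intermediate factual questions about the Mahabharata."),
]

self_ask_agent = initialize_agent(tools, llm, agent=AgentType.SELF_ASK_WITH_SEARCH, verbose=True)
self_ask_agent.run("Who killed Karna, and why?")
```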
This approach gave me some good results. It works well for single-hop reasoning, but even this fails for questions that require multiple hops.
For example, the question:
Who killed Karna, and why?
is relatively easy to answer with this approach.
The question:
Why did Arjuna kill Karna, his half-brother?
is much more difficult to answer. It requires the LLM to know that Arjuna did not know that Karna was his half-brother. The LLM cannot work out that it needs this fact, either from understanding the question or by asking further questions based on the original question.