Part Two: How to chat with your documents in a Local Chatbot using OpenAI API

In Part one of the series, I showed how to create a local chatbot using python
In this article I will show step by step how to create a local chatbot which can take multiple files as input. Our chatbot would be able to answer our questions using information contained in files.
To achieve our goal I will modify my code in part one in following four steps
a) In part one, I used gradio chatinterface. In this step, I will use gradio block interface to enable more flexibility in our code
b) In the second step, I will add streaming support to our chatbot. Instead of waiting for the entire output of OpenAi API. Our chatbot will start replying as early as possible
c) In the third step, I will add support for uploading a single text file as input.
d) In the final step , I will allow adding multiple files as input where each file can be either Word doc, a PDF or a text file.
STEP ONE: Gradio Chatbot using Blocks Interface
This code uses gr.Blocks interface. This allows us to have flexibility in function which responds to chatbot. In gradio chatintterface used in part one, the response function to chatbot must have two inputs and two outputs. Here we will use the flexibility of blocks interface to use file data as input in last two steps.
In lines 7 to 9 , we define the three components of our chatbot : chat window, clear button and text box. In the response function, we define a function that takes the user message and history as input and returns api response along with the history. In line 32, we connect the textbox submit button with the response function and chat window.
STEP TWO: Gradio Chatbot using Blocks Interface with streaming output
The difference with the code in STEP ONE is given below

Here we add streaming support to the chatbot. The code passes stream = true argument to OpenAi API. Instead of returning the entire answer to the question at once, we get the response in chunks from API. We update the output of the chatbot as soon as each chunk is received. In lines 33 to 38, we iterate over the response object and also update the output of the chatbot by modifying history.
STEP THREE:Gradio Chatbot using Blocks Interface with text file as input
The difference with step two is given below

Here we add a gradio file component to our chatbot. We can select a single file using our component. We pass the content of the text file as context to API. So we can ask questions on the information contained in our text file. We have to take care that our total tokens in the query do not exceed the token limit of our model.
In line 10 we defined the file component of gradio. We read the file and pass its content as context in lines 17 to 20.
STEP FOUR: Gradio Chatbot using Blocks Interface with multiple files as input
The difference with STEP THREE is given below

In the final step, I added support for adding multiple files as context. This I enable by allowing passing filecount argument to gradio file component in line 31.
In lines 9 to 22, I added support for reading the contents of a text file, pdf file or a word document. In lines 38 to 40 we iterate over each of documents selected by the user in gradio file component and add its content to the context.
TESTING
I tested OpenAi API on SEBI regulations. I asked the following question to GPT 4.
Q Circular Trading is a violation of which Sebi PFUTP regulations
First I asked this question without giving additional context. This was the GPT 4 model reply
Circular trading refers to a fraudulent practice in which a group of traders create artificial trading activity in a stock by repeatedly buying and selling shares among themselves, without any genuine change in ownership or economic interest. This is done to manipulate the price or create a misleading appearance of active trading volume.
Circular trading is a violation of the regulations of the Securities and Exchange Board of India (SEBI), specifically under the Prohibition of Fraudulent and Unfair Trade Practices (PFUTP) Regulations. These regulations are designed to prevent trading practices that are abusive, unfair, or deceptive and that can undermine the integrity of the securities market.
Under the SEBI PFUTP Regulations, circular trading would fall under a broader category of manipulative and deceptive practices, including:
Fraudulent and unfair trade practices relating to securities.
Market manipulation.
Creation of false market.
Price manipulation.
Delivery of manipulated and misleading information to market participants.
SEBI can take action against those found to be engaging in circular trading, which can include monetary penalties, bans from trading, and even criminal prosecution. The specific rules and clauses under which circular trading would be an offense can be found in the SEBI (Prohibition of Fraudulent and Unfair Trade Practices relating to Securities Market) Regulations, which are updated from time to time to address various forms of market manipulation and unfair practices.
Then I uploaded SEBI PFUTP regulations and repeated this question. This was GPT 4 model reply
Circular trading refers to a fraudulent scheme where sell orders are entered by a broker who knows that offsetting buy orders for the exact number of shares at the same time and at the same price either have been or will be entered. Such activity is used to manipulate stock prices and create artificial trading volumes, giving the impression of legitimate trading activity.
This kind of activity is a violation of the Securities and Exchange Board of India (SEBI) (Prohibition of Fraudulent and Unfair Trade Practices relating to Securities Market) Regulations, 2003. Specifically, it falls under:
Regulation 4(2), which states the following as deemed to be manipulative, fraudulent, or an unfair trade practice if it involves:
(a) knowingly indulging in an act which creates false or misleading appearance of trading in the securities market;
(b) dealing in a security not intended to effect transfer of beneficial ownership but intended to operate only as a device to inflate, depress, or cause fluctuations in the price of such security for wrongful gain or avoidance of loss;
(n) circular transactions in respect of a security entered into between persons including intermediaries to artificially provide a false appearance of trading in such security or to inflate, depress, or cause fluctuations in the price of such security.Circular trading is typically involved in creating a misleading appearance of trading (point a) and artificially influencing the price of the securities (point n) as described above in Regulation 4(2).
SEBI has the authority to investigate and impose sanctions on entities and individuals involved in such manipulative activities, as outlined in Chapter III of the SEBI (Prohibition of Fraudulent and Unfair Trade Practices relating to Securities Market) Regulations, 2003. The sanctions can include suspension or cancellation of registration, prohibition from securities markets, impounding and retention of proceeds, and other actions as deemed necessary by SEBI to protect the integrity of the securities market and the interests of investors.
We can see with additional context, GPT 4’s reply to the question is much better. It correctly quoted sections of regulations.
It has been my observation that GPT 4 has not memorised the sections of regulations. If we want to make an application for legal use then we need to provide relevant laws and regulations to GPT.
However, our approach has a limitation. If the file size is large then we cannot pass the entire context to OpenAI API call. Further passing large context in each API call can be very costly. In the next part of the series, I will explore how to pass only relevant chunks of document in an API call.
If you liked my article please clap and subscribe.