ChatGPT, Gemini and Claude Compared on CBSE Class 12 Question Papers
On March 04, 2024, Anthropic released the Claude 3 family of large language models. It is the second model family, after Gemini Ultra, to reach GPT 4 level performance.
According to Anthropic, it beats GPT 4 on several benchmarks. But results from real-world testing seem mixed. In multiple independent evaluations of its coding ability, it fails to beat GPT 4.
Some people, however, have reported very good performance on queries with long context.
On the LLM Arena Leaderboard [3], both Claude 3 Opus and Claude 3 Sonnet are very close to GPT 4 in performance.
Anthropic's midsize model, Claude 3 Sonnet, is available at claude.ai with a free subscription. Claude 3 Opus is available only with a paid subscription. Like the Gemini Pro model, all sizes of Claude 3 accept image input. This feature is not available in the free version of ChatGPT.
On image input benchmarks, Claude 3 Sonnet's performance is not much lower than GPT 4's. On math and science in particular, it is very close to GPT 4. So I decided to test the performance of these LLMs on CBSE Class 12 question papers.
I took the CBSE Class 12 question papers and answer keys from this website.
The main question that I wanted to answer with this exercise was whether these LLMs can replace a school teacher.
I tested Claude Sonnet, Gemini Pro, and GPT 4 on five randomly selected questions from the question papers of six CBSE Class 12 subject exams. Each model was tested on the same questions.
While testing these models, the main question I asked myself was what precautions I would ask a child to take before using these models as a tutor.
GPT 4 only narrowly edged out Claude Sonnet and Gemini Pro, both of which are available with a free subscription. I used GPT 4 because free ChatGPT does not have image input capability. I did not use any prompting techniques, as I wanted to see how these models perform when used by the general public. I gave each question as input in the form of a screenshot.
The models performed better on non-scientific questions. GPT 4 really struggled with mathematical notation in images. Gemini was really close to GPT 4 in performance and only fell behind after flunking the accountancy paper. Claude Sonnet is also available with a free subscription, but it has usage caps: you can only send a certain number of queries every day.
So for a student trying to learn these subjects, the Gemini model from Google can be the best teacher, unless you take a paid subscription to ChatGPT or Claude Opus. Gemini is close to GPT 4 in performance, as can be seen from the rankings on the LLM leaderboard.
There is also a very simple trick to improve the performance of these models. Just add “Let's think step by step” at the end of your prompt.
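For anyone sending questions to a model from a script rather than a chat window, the trick can be sketched in a few lines of Python. This is only an illustration: the helper name with_step_by_step and the constant COT_SUFFIX are hypothetical names chosen here, and the sketch only builds the prompt text without calling any model.

```python
# Hypothetical helper: appends the step-by-step cue to a prompt.
COT_SUFFIX = "Let's think step by step."


def with_step_by_step(prompt: str, suffix: str = COT_SUFFIX) -> str:
    """Return the prompt with the cue appended, without adding it twice."""
    cleaned = prompt.rstrip()
    # Skip the suffix if the prompt already ends with it (case-insensitive).
    if cleaned.lower().endswith(suffix.lower()):
        return cleaned
    return f"{cleaned}\n\n{suffix}"


if __name__ == "__main__":
    question = "Solve the question shown in the attached screenshot."
    print(with_step_by_step(question))
```

The cue is placed on its own line at the end so that it reads as an instruction and not as part of the question.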
Despite all this, these LLMs can sometimes provide wrong answers. On many of my questions, even GPT 4 gave wrong answers with full confidence. These models do not know when they are wrong and when they are right, so you have to be careful every time you use their answers.
Once a solution is found to their hallucination problem, these LLMs can become wonderful teachers.
Source: