Part 1: Optical Character Recognition in Python

Rohit Raj
Oct 27, 2024

PaddleOCR is one of the best open-source libraries for optical character recognition (OCR). But if you want the output in a structured format, you can use an existing multimodal LLM instead. In this article I will show how to do OCR with both PaddleOCR and Claude Sonnet.

1. PaddleOCR

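First, install the libraries. The code below uses the PaddleOCR 2.x API, and later major releases changed the interface, so pin the major version:

pip install "paddleocr<3"
pip install paddlepaddle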

Then run OCR with the following code:

from paddleocr import PaddleOCR, draw_ocr
import cv2

# Initialize the OCR model; use_angle_cls=True enables the text-direction classifier
ocr = PaddleOCR(use_angle_cls=True, lang='en')  # Set 'lang' based on the language you want

# Run inference on an image
img_path = 'path_to_your_image.jpg'
result = ocr.ocr(img_path, cls=True)

# result has one entry per page; for a single image, result[0] is a list of
# [box, (text, confidence)] entries, or None if no text was detected
page = result[0] or []

# Display the result
for line in page:
    print(line)

# Optional: draw the detected text on the image
image = cv2.imread(img_path)
boxes = [line[0] for line in page]
texts = [line[1][0] for line in page]
scores = [line[1][1] for line in page]

# draw_ocr renders the recognized text with a TrueType font; its default
# font_path is relative to the PaddleOCR repository, so point it at a .ttf
# file on your machine (placeholder path below)
image_with_boxes = draw_ocr(image, boxes, texts, scores, font_path='path_to_a_font.ttf')
cv2.imshow("OCR Result", image_with_boxes)
cv2.waitKey(0)
cv2.destroyAllWindows()
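If you just want the recognized text as one string, a small helper can filter the lines by confidence and join them. This is a minimal sketch, assuming the PaddleOCR 2.x result shape used above (result[0] is a list of [box, (text, score)] entries, each box being four [x, y] corners). The function name `result_to_text` and the 0.5 default threshold are illustrative choices, not part of PaddleOCR.

```python
def result_to_text(result, min_score=0.5):
    """Join PaddleOCR 2.x output for a single image into plain text.

    Assumes result[0] is a list of [box, (text, score)] entries (or None),
    where box is four [x, y] corner points.
    """
    # result[0] is None when nothing was detected, so fall back to an empty list
    page = result[0] if result and result[0] else []
    # Keep only lines whose recognition confidence reaches the threshold
    kept = [line for line in page if line[1][1] >= min_score]
    # Order explicitly by the top-left corner: y (row) first, then x (column)
    kept.sort(key=lambda line: (line[0][0][1], line[0][0][0]))
    return "\n".join(line[1][0] for line in kept)


# Usage with a hand-made result in the same shape as ocr.ocr() output
sample = [[
    [[[10, 40], [120, 40], [120, 60], [10, 60]], ("second line", 0.97)],
    [[[10, 5], [100, 5], [100, 25], [10, 25]], ("first line", 0.99)],
    [[[10, 70], [80, 70], [80, 90], [10, 90]], ("noise", 0.20)],
]]
print(result_to_text(sample))
```

Sorting on the exact y coordinate is a simplification: words on the same visual row whose boxes start at slightly different heights can come out of order, and grouping lines within a y tolerance would handle that.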

2. Claude Sonnet

First, install the library:

pip install anthropic

Then obtain an API key from the Anthropic Console and initialize the Anthropic client:

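import os
import anthropic

# Read the key from the ANTHROPIC_API_KEY environment variable
# (anthropic.Anthropic() with no arguments reads the same variable)
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])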

Then run OCR with the following code:

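import os
import base64
import mimetypes
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Read the image file and encode it as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

img_path = 'path_to_your_image.jpg'

base64_image = encode_image(img_path)

# Derive the MIME type from the file extension (e.g. '.jpg' -> 'image/jpeg');
# the API accepts image/jpeg, image/png, image/gif and image/webp, not 'image/jpg'
media_type, _ = mimetypes.guess_type(img_path)
if media_type not in ("image/jpeg", "image/png", "image/gif", "image/webp"):
    raise ValueError(f"Unsupported image type: {media_type}")

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # model ID at the time of writing; check for a current one
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": base64_image,
                    },
                },
                {
                    "type": "text",
                    "text": "Return text in this image"
                }
            ],
        }
    ],
)
print(message.content[0].text)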

If you only need plain text, PaddleOCR is the better choice: it runs locally and costs nothing per request. For tables or other structured data, Claude Sonnet or a comparable multimodal LLM will usually give better results.
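The structured-data case can be sketched by asking Claude for JSON and parsing the reply. This is a minimal sketch, not an Anthropic feature: the helper names `build_table_request` and `parse_json_reply`, the prompt wording, and the one-object-per-row layout are illustrative assumptions. The model is only instructed to return JSON, so `json.loads` can still raise on a malformed reply.

```python
import json
import re


def build_table_request(base64_image, media_type, model="claude-3-5-sonnet-20240620"):
    """Build keyword arguments for client.messages.create() asking for a table as JSON.

    The default model is the one used in this article; swap in a current ID if needed.
    """
    prompt = (
        "Extract the table in this image. Reply with only a JSON array of objects, "
        "one object per row, using the column headers as keys."
    )
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": media_type, "data": base64_image}},
                {"type": "text", "text": prompt},
            ],
        }],
    }


def parse_json_reply(text):
    """Parse the model's reply, tolerating an optional Markdown code fence around the JSON."""
    # `{3} matches three backticks; the lazy group captures the fenced body
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", text, re.DOTALL)
    return json.loads(match.group(1) if match else text)


# Offline usage: parse a reply shaped like the one the prompt asks for
print(parse_json_reply('[{"Item": "Coffee", "Price": "3.50"}]'))
```

With the client and `base64_image` from the Claude example, you would call `message = client.messages.create(**build_table_request(base64_image, "image/jpeg"))` and then `rows = parse_json_reply(message.content[0].text)`.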
