PaddleOCR is the best open-source library for Optical character recognition. But if you want output in structured format you can use existing LLMs. In this article I will show how to do Optical Character Recognition using both PaddleOCR and Claude Sonnet.
- PaddleOCR
First install libraries
pip install paddleocr
pip install paddlepaddle
Then you can do OCR using following code
from paddleocr import PaddleOCR, draw_ocr
import cv2
# Initialize the OCR model with the desired language
ocr = PaddleOCR(use_angle_cls=True, lang='en') # Set 'lang' based on the language you want
# Run inference on an image
img_path = 'path_to_your_image.jpg'
result = ocr.ocr(img_path, cls=True)
# Display the result
for line in result:
print(line)
# Optional: Draw the detected text on the image
image = cv2.imread(img_path)
boxes = [line[0] for line in result[0]]
texts = [line[1][0] for line in result[0]]
scores = [line[1][1] for line in result[0]]
# Draw and display the image with detected text
from paddleocr import draw_ocr
image_with_boxes = draw_ocr(image, boxes, texts, scores)
cv2.imshow("OCR Result", image_with_boxes)
cv2.waitKey(0)
2. Claude Sonnet
First install libraries
pip install anthropic
Then obtain API key from claude website and initialise anthropic client
import anthropic
client = anthropic.Anthropic(api_key= ANTHROPIC_API_KEY)
Then you can OCR using following code
import os
import base64
import anthropic
client = anthropic.Anthropic(api_key= ANTHROPIC_API_KEY)
# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
img_path = 'path_to_your_image.jpg'
base64_image = encode_image(img_path)
message = client.messages.create(
model="claude-3-5-sonnet-20240620",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": f"image/{img_path.split('.')[-1]}",
"data": base64_image,
},
},
{
"type": "text",
"text": "Return text in this image"
}
],
}
],
)
print(message.content[0].text)
If you are only extracting text then paddleOCR is better. But for extracting tables or structured data, Claude Sonnet or any other equivalent LLM will be better.