How to Extract Text from Images with Python (OpenCV, Pytesseract)

In this post, we will use Python to extract and analyze data from specific regions of a captured image, such as a screenshot or a scanned form. With OpenCV, pytesseract, and Python's built-in csv module, we can process images and pull structured information out of forms, reports, and technical documents.

Objective

The goal of this script is to capture predefined regions of interest (ROIs) from a screen capture, extract text using OCR (Optical Character Recognition) via pytesseract, and store the results in a structured CSV format for easy analysis.

Prerequisites

Before diving into the code, ensure you have the following libraries installed:

pip install opencv-python numpy pytesseract

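The csv module ships with Python, so it needs no installation. You'll also need Tesseract-OCR itself installed on your system, because pytesseract is only a wrapper around the Tesseract executable. You can download it from the Tesseract project's site.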
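
Before running any OCR, it can help to confirm that the Tesseract executable is actually reachable. The sketch below uses only the standard library. The helper name `find_tesseract` and the list of fallback paths are illustrative assumptions, not part of pytesseract.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical fallback locations; adjust to wherever Tesseract lives on your machine.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return a path to the tesseract executable, or None if it cannot be found."""
    # First ask the operating system to search PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate paths.
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


path = find_tesseract()
print(path if path else "Tesseract not found; install it or set the path manually.")
```

If a path is found, it can be assigned to `pytesseract.pytesseract.tesseract_cmd`, which is what the complete script does with a hard-coded Windows path.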

Defining Regions of Interest (ROIs)

In the script, we define multiple regions of interest (ROIs) based on the coordinates of the areas we want to capture. Each ROI consists of:

  • Top-left coordinates
  • Bottom-right coordinates
  • Type of data (text or box)
  • Field name

Here’s an example of how the ROIs are defined:

roi = [
    [(642, 43), (1236, 140), 'text', 'FormTitle'],
    [(33, 199), (141, 242), 'text', 'Date'],
    [(144, 198), (265, 242), 'text', 'MachineNo'],
    # More regions can be added here...
]

Each entry in this list corresponds to a specific region of the screen where text or numbers are located. The goal is to extract the relevant data based on these coordinates.
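
Because NumPy slicing silently clips out-of-range coordinates, a misplaced ROI produces a smaller or empty crop rather than an error. A quick sanity check on the list can catch this early. The following is a minimal sketch; the `Region` tuple, the `check_regions` helper, and the 1200x800 image size are assumptions made for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Region(NamedTuple):
    """One region of interest, in pixel coordinates of the template image."""
    top_left: Tuple[int, int]
    bottom_right: Tuple[int, int]
    kind: str  # 'text' or 'box'
    name: str


def check_regions(rois: Sequence[Sequence], width: int, height: int) -> List[str]:
    """Return a readable problem description for every invalid ROI."""
    problems = []
    for raw in rois:
        region = Region(*raw)
        (x1, y1), (x2, y2) = region.top_left, region.bottom_right
        if region.kind not in ("text", "box"):
            problems.append(f"{region.name}: unknown type {region.kind!r}")
        if x1 >= x2 or y1 >= y2:
            problems.append(f"{region.name}: corners are not top-left/bottom-right")
        if x1 < 0 or y1 < 0 or x2 > width or y2 > height:
            problems.append(f"{region.name}: outside the {width}x{height} image")
    return problems


roi = [
    [(642, 43), (1236, 140), 'text', 'FormTitle'],
    [(33, 199), (141, 242), 'text', 'Date'],
    [(144, 198), (265, 242), 'text', 'MachineNo'],
]

# A 1200-pixel-wide image is too narrow for FormTitle, whose right edge is at x=1236.
print(check_regions(roi, width=1200, height=800))
```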

Setting up the Screen Dimensions

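The screen dimensions matter for display rather than extraction: the complete script uses them only to scale the annotated preview so that it fits on your monitor. Adjust them to the resolution of the screen you're working with:

screen_width = 1024  # Maximum preview width in pixels (Full HD would be 1920)
screen_height = 786  # Maximum preview height in pixels (Full HD would be 1080)

The ROI coordinates themselves are measured on the template image, not on the screen, so these values do not change what is extracted.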
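
The scaling that these two values drive can be sketched as a small pure function. This is not the code from the complete script, which uses a longer chain of if/else branches. It is a compact version of the same idea, and the name `fit_to_screen` is an assumption. Integer arithmetic is used so the result does not depend on floating-point rounding.

```python
from typing import Tuple


def fit_to_screen(width: int, height: int,
                  max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside the limits, keeping aspect ratio."""
    if width <= max_width and height <= max_height:
        return width, height  # Already fits; never upscale.
    if width * max_height >= height * max_width:
        # Width is the limiting dimension.
        new_width = max_width
        new_height = height * max_width // width
    else:
        # Height is the limiting dimension.
        new_height = max_height
        new_width = width * max_height // height
    # max(1, ...) guards against a zero-sized side for extreme aspect ratios.
    return max(1, new_width), max(1, new_height)


print(fit_to_screen(2000, 1000, 1024, 786))  # (1024, 512)
print(fit_to_screen(1000, 2000, 1024, 786))  # (393, 786)
print(fit_to_screen(800, 600, 1024, 786))    # (800, 600)
```

The resulting pair can be passed directly as the size argument of `cv2.resize`, which expects (width, height).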

Image Processing with OpenCV

Now, let’s break down the image processing part. The script performs several tasks:

  1. Load Image: The script reads a saved screenshot or scanned form from disk with cv2.imread.
  2. Extract Data: Using pytesseract, it will extract text from the defined ROIs.
  3. Save Results: Finally, the results are written to a CSV file for further processing.

Here’s a simplified process using OpenCV to capture a region and extract text:

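import cv2
import pytesseract

# Load a saved screen capture from disk (cv2.imread returns None on failure)
screenshot = cv2.imread('screenshot.png')
if screenshot is None:
    raise FileNotFoundError("screenshot.png could not be read")

# Take the corners of the first ROI from the list defined earlier
(x1, y1), (x2, y2) = roi[0][0], roi[0][1]
# OpenCV images are NumPy arrays indexed [row, column], i.e. [y, x]
roi_image = screenshot[y1:y2, x1:x2]

# Use pytesseract to extract text from the cropped region
extracted_text = pytesseract.image_to_string(roi_image).strip()

print(f"Extracted Text: {extracted_text}")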
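
Raw OCR output usually carries line breaks and trailing whitespace, and depending on the Tesseract version it may end with a form-feed character. A small normalization step keeps such characters out of the CSV. The helper name `clean_ocr_text` is an assumption, and the sample string is made up for illustration.

```python
def clean_ocr_text(raw: str) -> str:
    """Collapse all whitespace runs (newlines, tabs, form feeds) into single spaces."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(raw.split())


print(repr(clean_ocr_text("Technical\nReport \n\x0c")))  # 'Technical Report'
```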

Processing Text and Boxes

In the ROIs, some fields contain text to be read, while others are checkboxes that are either marked or empty. The script distinguishes the two by the ROI's type and processes them accordingly.

For text fields, pytesseract is used to extract the content. For box-type fields, the script thresholds the crop and counts the dark pixels: if the count exceeds pixelThreshold, the box is recorded as 1 (marked), otherwise as 0.
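
The checkbox logic can be illustrated without OpenCV. The sketch below counts pixels at or below a darkness threshold, which mirrors an inverted binary threshold followed by a non-zero count. The function name `is_box_marked` is an assumption; the defaults 170 and 70 are the values used in the complete script. It accepts any 2D sequence of grayscale values, so nested lists and NumPy arrays both work.

```python
from typing import Iterable, Sequence


def is_box_marked(gray_rows: Iterable[Sequence[int]],
                  dark_threshold: int = 170,
                  pixel_threshold: int = 70) -> int:
    """Return 1 if enough dark pixels are present in a grayscale crop, else 0."""
    dark_pixels = sum(
        1 for row in gray_rows for value in row if value <= dark_threshold
    )
    return 1 if dark_pixels > pixel_threshold else 0


# A 17x18 white crop (value 255) stands in for an empty checkbox.
empty_box = [[255] * 18 for _ in range(17)]

# The same crop with a 10x10 dark patch (100 dark pixels) stands in for a tick.
ticked_box = [row[:] for row in empty_box]
for r in range(3, 13):
    for c in range(4, 14):
        ticked_box[r][c] = 0

print(is_box_marked(empty_box), is_box_marked(ticked_box))  # 0 1
```

The pixel threshold has to be tuned to the box size and to how thick the printed border is, since border pixels inside the crop also count as dark.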

Storing Results in a CSV

After extracting the text from each ROI, the results are stored in a CSV file. Here’s how to write the data to a CSV file:

import csv

fields = ["Field Name", "Extracted Value"]
data = [
    ['FormTitle', extracted_text],
    # Add more fields here...
]

# Write data to CSV
with open("extracted_data.csv", mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(fields)
    writer.writerows(data)
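
When many forms are processed, it is often more convenient to store one row per form, with the field names as column headers, and to append to the same file across runs. Here is a minimal sketch of that layout, assuming a hypothetical helper `append_row`. The second record's values are made up for the demonstration.

```python
import csv
import os
import tempfile
from typing import Dict, Sequence


def append_row(csv_path: str, field_names: Sequence[str],
               row: Dict[str, object]) -> None:
    """Append one record, writing the header only when the file is new or empty."""
    needs_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, mode="a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=field_names)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)


# Demonstration in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "extracted_data.csv")
    names = ["FormTitle", "Date", "MachineNo"]
    append_row(out, names, {"FormTitle": "Technical Report",
                            "Date": "2024-12-19", "MachineNo": "123456"})
    append_row(out, names, {"FormTitle": "Technical Report",
                            "Date": "2024-12-20", "MachineNo": "123457"})
    with open(out, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows)
```

Using `csv.DictWriter` ties each value to its column name, so a field that fails to extract cannot shift the remaining values into the wrong columns.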

Example Output

The script will generate a CSV file containing all the extracted fields and their corresponding values. For example:

Field Name    Extracted Value
FormTitle     Technical Report
Date          2024-12-19
MachineNo     123456
Model         ModelX 3000
Component     Engine
TestNumber    87456
Grade         A

Complete Code

main.py

# main.py
import cv2
import numpy as np
import pytesseract
import os
import csv

# Maximum preview size in pixels (modify based on your screen size)
screen_width = 1024  # Full HD would be 1920
screen_height = 786  # Full HD would be 1080


per = 20
pixelThreshold = 70
roi = [
    [(642, 43), (1236, 140), 'text', 'FormTitle'],

    [(33, 199), (141, 242), 'text', 'Date'],
    [(144, 198), (265, 242), 'text', 'MachineNo'],
    [(269, 199), (489, 242), 'text', 'Model'],
    [(492, 199), (609, 242), 'text', 'Component'],
    [(613, 198), (837, 242), 'text', 'TestNumber'],
    [(840, 199), (1060, 242), 'text', 'Grade'],
    [(1063, 199), (1218, 242), 'text', 'Source'],
    [(1220, 199), (1342, 242), 'text', 'Weight'],
    [(1402, 212), (1420, 229), 'box', 'Shift_MN'],
    [(1473, 212), (1491, 229), 'box', 'Shift_DS'],
    [(1545, 212), (1562, 229), 'box', 'Shift_NS'],
    [(1637, 212), (1653, 229), 'box', 'DocNo_QCM'],
    [(1704, 212), (1722, 230), 'box', 'DocNo_PIR'],
    [(1829, 212), (1847, 229), 'box', 'DocNo_BAPREV2'],

]

# C:\Program Files\Tesseract-OCR
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

imgQ = cv2.imread('01_PIERCING & BLANKING_F_page-0001.jpg')

if imgQ is None:
    print("Error: Query image not found!")
    exit()

h, w, c = imgQ.shape

orb = cv2.ORB_create(50000)
kp1, des1 = orb.detectAndCompute(imgQ, None)


# imgShow = cv2.resize(imgQ, (w // 3, h // 3))
# cv2.imshow(" - Processed", imgQ)

path = 'UserForms'
myPicList = os.listdir(path)

print(f"Found images: {myPicList}")

# Open the output CSV in append mode; write the header only if the file is empty
with open('DataOutput.csv', 'a+', newline='') as f:
    writer = csv.writer(f)
    needs_header = os.path.getsize('DataOutput.csv') == 0
    if needs_header:
        writer.writerow([r[3] for r in roi])

    for j, y in enumerate(myPicList):
        img = cv2.imread(path + "/" + y)

        if img is None:
            print(f"Error: Image {y} not found!")
            continue

        kp2, des2 = orb.detectAndCompute(img, None)
        if des2 is None:
            print(f"Error: No features detected in {y}, skipping.")
            continue

        # Match descriptors and keep the best `per` percent by distance
        bf = cv2.BFMatcher(cv2.NORM_HAMMING)
        matches = bf.match(des2, des1)
        matches = sorted(matches, key=lambda x: x.distance)
        good = matches[:int(len(matches) * (per / 100))]
        imgMatch = cv2.drawMatches(
            img, kp2, imgQ, kp1, good[:400], None, flags=2)
        # cv2.imshow(y, imgMatch)
        if len(good) < 4:
            # findHomography needs at least four point pairs; skip this form
            # instead of reusing imgScan from a previous iteration.
            print(f"Not enough good matches to calculate homography for {y}!")
            continue
        srcPoints = np.float32(
            [kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dstPoints = np.float32(
            [kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        M, _ = cv2.findHomography(srcPoints, dstPoints, cv2.RANSAC, 5.0)
        if M is None:
            print(f"Homography could not be estimated for {y}!")
            continue
        # Warp the filled-in form onto the template so the ROIs line up
        imgScan = cv2.warpPerspective(img, M, (w, h))
        # imgScan = cv2.resize(imgScan, (w // 4, h//4))
        # cv2.imshow("y", imgScan)

        imgShow = imgScan.copy()
        imagMask = np.zeros_like(imgShow)

        myData = []
        for x, r in enumerate(roi):
            # print(r, "dddddddd")
            cv2.rectangle(imagMask, (r[0][0], r[0][1]),
                          (r[1][0], r[1][1]), (0, 255, 0), cv2.FILLED)
            imgShow = cv2.addWeighted(imgShow, 0.99, imagMask, 0.1, 0)
            imgCrop = imgScan[r[0][1]:r[1][1], r[0][0]:r[1][0]]
            # print(imgCrop)
            if imgCrop is not None and imgCrop.size > 0:
                # Show the cropped image for debugging purposes
                # cv2.imshow(str({x}), imgCrop)

                if r[2] == 'text':
                    imgGray = cv2.cvtColor(imgCrop, cv2.COLOR_BGR2GRAY)
                    imgThresh = cv2.threshold(
                        imgGray, 150, 255, cv2.THRESH_BINARY)[1]
                    kernel = np.ones((1, 1), np.uint8)
                    imgCleaned = cv2.morphologyEx(
                        imgThresh, cv2.MORPH_OPEN, kernel)
                    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
                    imgSharpened = cv2.filter2D(imgCleaned, -1, kernel)
                    text = pytesseract.image_to_string(
                        imgSharpened, config='--psm 6').strip()
                    # print(f'{r[3]} : {text}')
                    myData.append(text if text else "")

                elif r[2] == 'box':
                    imgGray = cv2.cvtColor(imgCrop, cv2.COLOR_BGR2GRAY)
                    imgThresh = cv2.threshold(
                        imgGray, 170, 255, cv2.THRESH_BINARY_INV)[1]
                    totalPixels = cv2.countNonZero(imgThresh)
                    # print(totalPixels)
                    totalPixels = 1 if totalPixels > pixelThreshold else 0
                    # print(f'{r[3]} : {totalPixels}')
                    myData.append(totalPixels)
            else:
                print(
                    f"Error: Cropped image for region {x} is empty or invalid!")
                # Append a placeholder so the CSV columns stay aligned with the header
                myData.append("")

        # Write data to CSV using the writer
        writer.writerow(myData)

        # Ensure imgShow is valid before resizing
        if imgShow is not None and imgShow.size > 0:
            # Calculate aspect ratio
            aspect_ratio = imgShow.shape[1] / \
                imgShow.shape[0]  # width / height

            # Calculate new dimensions
            if imgShow.shape[1] > screen_width or imgShow.shape[0] > screen_height:
                if aspect_ratio > 1:  # Wider than tall
                    new_width = screen_width
                    new_height = int(screen_width / aspect_ratio)
                    if new_height > screen_height:  # Adjust if still exceeds
                        new_height = screen_height
                        new_width = int(screen_height * aspect_ratio)
                else:  # Taller than wide
                    new_height = screen_height
                    new_width = int(screen_height * aspect_ratio)
                    if new_width > screen_width:  # Adjust if still exceeds
                        new_width = screen_width
                        new_height = int(screen_width / aspect_ratio)
            else:
                # If the image already fits the screen, keep original dimensions
                new_width = imgShow.shape[1]
                new_height = imgShow.shape[0]

            # Resize the image
            imgResized = cv2.resize(
                imgShow, (new_width, new_height), interpolation=cv2.INTER_AREA)

            # Display the resized image
            cv2.imshow(f"{y} - Processed", imgResized)
        else:
            print(f"Error: Invalid image dimensions for {y}!")


# Keep the preview windows open until a key is pressed, then close them
cv2.waitKey(0)
cv2.destroyAllWindows()
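
The match-filtering step in main.py keeps only the best `per` percent of feature matches before estimating the homography. The sketch below isolates that selection. `Match` is a stand-in for `cv2.DMatch` that keeps only the fields used here, and the distances are made-up sample values.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Stand-in for cv2.DMatch, keeping only the fields used here."""
    queryIdx: int
    trainIdx: int
    distance: float


def keep_best(matches: List[Match], percent: float) -> List[Match]:
    """Return the `percent` percent of matches with the smallest distance."""
    ranked = sorted(matches, key=lambda m: m.distance)
    return ranked[:int(len(ranked) * percent / 100)]


distances = [40.0, 12.0, 75.0, 8.0, 33.0, 90.0, 21.0, 5.0, 60.0, 17.0]
sample = [Match(i, i, d) for i, d in enumerate(distances)]

best = keep_best(sample, percent=20)
print([m.distance for m in best])  # [5.0, 8.0]
print(len(best) >= 4)              # False: too few pairs for a homography
```

A homography needs at least four point pairs, so with few detected features a small percentage can leave too little to work with. Raising `per` trades match quality for quantity.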

Region Selector Tool

RegionSelectorwithoptions.py

#RegionSelectorwithoptions.py
import cv2
import numpy as np
import random
import tkinter as tk
from tkinter import simpledialog, colorchooser, messagebox  # messagebox is used via tk.messagebox

# Globals
scale = 1.0  # Zoom scale
pan_x, pan_y = 0, 0  # Panning offsets
drag_start = None
dragging = False
circles = []  # Points with colors
counter = 0
point1 = []
point2 = []
myPoints = []  # Annotated regions
myColor = (255, 0, 0)  # Default color: blue (OpenCV uses BGR order)

# Function to ask for mandatory input using Tkinter dialog
def ask_input(prompt):
    root = tk.Tk()
    root.withdraw()  # Hide the root window
    result = None
    while not result:  # Keep asking until user provides input
        result = simpledialog.askstring("Input", prompt, parent=root)
        if not result:
            tk.messagebox.showwarning("Input Required", "This field cannot be empty!")
    root.destroy()
    return result

# Function to select color using Tkinter's color chooser
def select_color():
    global myColor
    color = colorchooser.askcolor()[1]  # Get hex color code
    if color:
        myColor = tuple(int(color[i:i+2], 16) for i in (5, 3, 1))  # Convert hex #RRGGBB to OpenCV's BGR order
        print("Selected color:", myColor)

# Function to increase zoom
def zoom_in():
    global scale
    scale *= 1.1
    print("Zoomed In:", scale)

# Function to decrease zoom
def zoom_out():
    global scale
    scale /= 1.1
    print("Zoomed Out:", scale)

# Function to reset the view (zoom and pan)
def reset_view():
    global scale, pan_x, pan_y
    scale = 1.0
    pan_x, pan_y = 0, 0
    print("View Reset")

# Function to create the menu
def create_menu():
    root = tk.Tk()
    root.title("Menu")

    # Select Color button
    select_button = tk.Button(root, text="Select Color", command=select_color)
    select_button.pack(pady=10)

    # Zoom In button
    zoom_in_button = tk.Button(root, text="Zoom In", command=zoom_in)
    zoom_in_button.pack(pady=10)

    # Zoom Out button
    zoom_out_button = tk.Button(root, text="Zoom Out", command=zoom_out)
    zoom_out_button.pack(pady=10)

    # Reset button
    reset_button = tk.Button(root, text="Reset View", command=reset_view)
    reset_button.pack(pady=10)

    # Quit button
    quit_button = tk.Button(root, text="Quit", command=root.quit)
    quit_button.pack(pady=10)

    root.mainloop()

# Mouse callback
def mousePoints(event, x, y, flags, params):
    global counter, point1, point2, circles, myColor, drag_start, dragging, pan_x, pan_y, scale

    if event == cv2.EVENT_RBUTTONDOWN:  # Start dragging
        drag_start = (x, y)
        dragging = True

    elif event == cv2.EVENT_MOUSEMOVE:  # Handle dragging
        if dragging and drag_start:
            dx, dy = x - drag_start[0], y - drag_start[1]
            # Invert the panning offsets to align with drag direction
            pan_x -= dx
            pan_y -= dy
            drag_start = (x, y)

    elif event == cv2.EVENT_RBUTTONUP:  # Stop dragging
        dragging = False
        drag_start = None

    elif event == cv2.EVENT_LBUTTONDOWN and not dragging:
        # Map screen coordinates to original image coordinates
        orig_x = int((x - pan_x) / scale)
        orig_y = int((y - pan_y) / scale)
        if counter == 0:
            point1 = (orig_x, orig_y)
            counter += 1
            myColor = (
                random.randint(0, 2) * 200,
                random.randint(0, 2) * 200,
                random.randint(0, 2) * 200
            )
        elif counter == 1:
            point2 = (orig_x, orig_y)
            roi_type = ask_input('Enter Type: ')  # 'text' or 'box'; avoids shadowing the built-in type()
            name = ask_input('Enter Name: ')
            myPoints.append([point1, point2, roi_type, name])
            counter = 0

        circles.append([orig_x, orig_y, myColor])
       
    elif event == cv2.EVENT_MOUSEWHEEL:  # Zoom in/out
        if flags > 0:  # Scroll up
            scale *= 1.1
        else:  # Scroll down
            scale /= 1.1

# Load and prepare the image
img = cv2.imread('01_PIERCING & BLANKING_F_page-0001.jpg')  # Ensure the path is correct
if img is None:
    raise ValueError("Image not found. Ensure the path is correct.")
original_img = img.copy()

# Run the menu in a separate thread or before the OpenCV window
import threading
menu_thread = threading.Thread(target=create_menu)
menu_thread.start()

while True:
    # Create a blank canvas for display
    canvas = np.zeros_like(original_img)

    # Resize the image for zooming
    resized = cv2.resize(original_img, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    resized_h, resized_w = resized.shape[:2]

    # Calculate placement on canvas
    canvas_h, canvas_w = canvas.shape[:2]

    # Determine the region of the canvas where the image will be placed
    x_start_canvas = pan_x
    y_start_canvas = pan_y
    x_end_canvas = pan_x + resized_w
    y_end_canvas = pan_y + resized_h

    # Compute the area on the canvas to place the image
    x_start_display = max(0, x_start_canvas)
    y_start_display = max(0, y_start_canvas)
    x_end_display = min(canvas_w, x_end_canvas)
    y_end_display = min(canvas_h, y_end_canvas)

    # Compute the area on the resized image to copy
    x_start_resized = max(0, -x_start_canvas)
    y_start_resized = max(0, -y_start_canvas)
    x_end_resized = x_start_resized + (x_end_display - x_start_display)
    y_end_resized = y_start_resized + (y_end_display - y_start_display)

    # Ensure that the indices are within bounds
    if x_start_display < x_end_display and y_start_display < y_end_display:
        # Place the resized image on the canvas
        canvas[y_start_display:y_end_display, x_start_display:x_end_display] = resized[ 
            y_start_resized:y_end_resized, 
            x_start_resized:x_end_resized
        ]

    # Draw circles (annotations) adjusted for zoom and pan
    for orig_x, orig_y, color in circles:
        display_x = int(orig_x * scale + pan_x)
        display_y = int(orig_y * scale + pan_y)
        if 0 <= display_x < canvas_w and 0 <= display_y < canvas_h:
            cv2.circle(canvas, (display_x, display_y), 5, color, cv2.FILLED)

    # Show the image
    cv2.imshow('Image Annotator', canvas)
    cv2.setMouseCallback('Image Annotator', mousePoints)

    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):  # Quit
        print("Annotated Regions:", myPoints)
        break
    elif key == ord('r'):  # Reset view
        reset_view()

cv2.destroyAllWindows()
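
The selector stores every click in original-image coordinates, so the regions it prints can be pasted into the roi list regardless of the zoom and pan in effect when they were drawn. The two mappings involved are small enough to show on their own. The function names below are assumptions; the formulas are the ones used in the mouse callback and in the drawing loop.

```python
from typing import Tuple


def screen_to_image(x: int, y: int, scale: float,
                    pan_x: int, pan_y: int) -> Tuple[int, int]:
    """Map a click in the display window back to original-image pixels."""
    return int((x - pan_x) / scale), int((y - pan_y) / scale)


def image_to_screen(x: int, y: int, scale: float,
                    pan_x: int, pan_y: int) -> Tuple[int, int]:
    """Map an original-image pixel to its position in the display window."""
    return int(x * scale + pan_x), int(y * scale + pan_y)


# With 2x zoom and the image shifted 100 px right and 50 px down,
# a click at (300, 250) lands on image pixel (100, 100).
print(screen_to_image(300, 250, scale=2.0, pan_x=100, pan_y=50))  # (100, 100)
print(image_to_screen(100, 100, scale=2.0, pan_x=100, pan_y=50))  # (300, 250)
```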

Conclusion

This script provides a simple yet effective way to extract valuable information from specific regions of a screen using Python. Whether you’re automating form data extraction or processing technical documents, this approach can be easily customized for various applications.

By combining OpenCV, pytesseract, and Python's csv module, we can automate data collection from images and convert the results into structured formats for analysis, reporting, or further processing.

Stay tuned for more articles on how to optimize this method for different use cases, and feel free to share your thoughts and improvements!
