In today’s blog, we will explore how to use Python to extract and analyze data from specific regions of a screen capture. By combining the powerful OpenCV and pytesseract libraries with Python’s built-in csv module, we can efficiently process images and extract valuable information for various use cases, such as forms, reports, and technical documents.
Objective
The goal of this script is to capture predefined regions of interest (ROIs) from a screen capture, extract text using OCR (Optical Character Recognition) via pytesseract, and store the results in a structured CSV format for easy analysis.
Prerequisites
Before diving into the code, ensure you have the following libraries installed:
pip install opencv-python numpy pytesseract
You’ll also need Tesseract-OCR installed on your system; installers are available from the official Tesseract-OCR project page.
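Once Tesseract is installed, pytesseract needs the path to the executable unless it is already on your PATH. The Windows default location is shown here (adjust for your system):

```python
import pytesseract

# Default Windows install location; adjust (or omit) on Linux/macOS,
# where the tesseract binary is usually found on PATH automatically
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
```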
Defining Regions of Interest (ROIs)
In the script, we define multiple regions of interest (ROIs) based on the coordinates of the areas we want to capture. Each ROI consists of:
- Top-left coordinates
- Bottom-right coordinates
- Type of data (text or box)
- Field name
Here’s an example of how the ROIs are defined:
roi = [
    [(642, 43), (1236, 140), 'text', 'FormTitle'],
    [(33, 199), (141, 242), 'text', 'Date'],
    [(144, 198), (265, 242), 'text', 'MachineNo'],
    # More regions can be added here...
]
Each entry in this list corresponds to a specific region of the screen where text or numbers are located. The goal is to extract the relevant data based on these coordinates.
Setting up the Screen Dimensions
To display results at a sensible size, the script needs your screen dimensions. Adjust these to the resolution of the monitor you’re working with:
screen_width = 1024   # Display width used for preview scaling
screen_height = 786   # Display height used for preview scaling
These values control how the processed image is scaled down for preview, so they should fit within your monitor’s resolution.
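Later in the script, the processed image is shrunk to fit these dimensions while preserving its aspect ratio. The same arithmetic can be sketched as a small standalone helper (the name fit_to_screen is ours, not part of the script, which inlines an equivalent branch structure):

```python
def fit_to_screen(img_w, img_h, screen_w=1024, screen_h=786):
    """Return (new_w, new_h) so the image fits the screen with its aspect ratio kept."""
    if img_w <= screen_w and img_h <= screen_h:
        return img_w, img_h  # Already fits; keep the original size
    # Scale by whichever dimension overflows the most
    scale = min(screen_w / img_w, screen_h / img_h)
    return round(img_w * scale), round(img_h * scale)

# A portrait A4 scan at 300 DPI is far larger than the screen
print(fit_to_screen(2480, 3508))  # -> (556, 786)
```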
Image Processing with OpenCV
Now, let’s break down the image processing part. The script performs several tasks:
- Load the Capture: The script reads a saved screen capture (or scanned form image) from disk.
- Extract Data: Using pytesseract, it will extract text from the defined ROIs.
- Save Results: Finally, the results are written to a CSV file for further processing.
Here’s a simplified process using OpenCV to capture a region and extract text:
import cv2
import pytesseract

# Load a previously saved screen capture
screenshot = cv2.imread('screenshot.png')
# Define a region from the first ROI entry
x1, y1 = roi[0][0]
x2, y2 = roi[0][1]
roi_image = screenshot[y1:y2, x1:x2]
# Use pytesseract to extract text from the cropped region
extracted_text = pytesseract.image_to_string(roi_image)
print(f"Extracted Text: {extracted_text}")
Processing Text and Boxes
In the ROIs, some fields are meant to extract text, while others represent boxes for data entry. The script can differentiate between the two and process them accordingly.
For text fields, pytesseract extracts the content. For box-type fields, the script instead thresholds the crop and counts dark pixels to decide whether a checkbox is ticked.
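Stripped of OpenCV, the box-detection idea is simple pixel counting. Here is a plain-Python sketch operating on a 2-D list of grayscale values (the full script uses cv2.threshold and cv2.countNonZero for the same effect):

```python
def box_is_checked(gray_crop, thresh=170, pixel_threshold=70):
    """Return 1 if the crop holds enough dark pixels to count as a filled box."""
    dark = sum(1 for row in gray_crop for px in row if px < thresh)
    return 1 if dark > pixel_threshold else 0

# Synthetic 18x18 crops: an empty white box vs. one with a 10x10 ink blob
empty = [[255] * 18 for _ in range(18)]
checked = [row[:] for row in empty]
for r in range(4, 14):
    for c in range(4, 14):
        checked[r][c] = 0  # 100 dark pixels > 70

print(box_is_checked(empty), box_is_checked(checked))  # -> 0 1
```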
Storing Results in a CSV
After extracting the text from each ROI, the results are stored in a CSV file. Here’s how to write the data to a CSV file:
import csv

fields = ["Field Name", "Extracted Value"]
data = [
    ['FormTitle', extracted_text],
    # Add more fields here...
]

# Write data to CSV
with open("extracted_data.csv", mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(fields)
    writer.writerows(data)
Example Output
The script will generate a CSV file containing all the extracted fields and their corresponding values. For example:
Field Name | Extracted Value
---------- | ----------------
FormTitle  | Technical Report
Date       | 2024-12-19
MachineNo  | 123456
Model      | ModelX 3000
Component  | Engine
TestNumber | 87456
Grade      | A
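Because the output is plain CSV, the standard library can read it straight back for analysis. A minimal sketch (the sample string stands in for a real DataOutput.csv; the values are illustrative only):

```python
import csv
import io

# Stand-in for the generated CSV file contents
sample = "FormTitle,Date,MachineNo\nTechnical Report,2024-12-19,123456\n"

rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["MachineNo"])  # -> 123456
```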
Complete Code
main.py
# main.py
import cv2
import numpy as np
import pytesseract
import os
import csv

# Display dimensions used to scale the preview window (adjust for your screen)
screen_width = 1024
screen_height = 786

per = 20              # Percentage of best ORB matches to keep
pixelThreshold = 70   # Minimum dark pixels for a checkbox to count as filled

roi = [
    [(642, 43), (1236, 140), 'text', 'FormTitle'],
    [(33, 199), (141, 242), 'text', 'Date'],
    [(144, 198), (265, 242), 'text', 'MachineNo'],
    [(269, 199), (489, 242), 'text', 'Model'],
    [(492, 199), (609, 242), 'text', 'Component'],
    [(613, 198), (837, 242), 'text', 'TestNumber'],
    [(840, 199), (1060, 242), 'text', 'Grade'],
    [(1063, 199), (1218, 242), 'text', 'Source'],
    [(1220, 199), (1342, 242), 'text', 'Weight'],
    [(1402, 212), (1420, 229), 'box', 'Shift_MN'],
    [(1473, 212), (1491, 229), 'box', 'Shift_DS'],
    [(1545, 212), (1562, 229), 'box', 'Shift_NS'],
    [(1637, 212), (1653, 229), 'box', 'DocNo_QCM'],
    [(1704, 212), (1722, 230), 'box', 'DocNo_PIR'],
    [(1829, 212), (1847, 229), 'box', 'DocNo_BAPREV2'],
]

# Path to the Tesseract executable (default Windows install location)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

imgQ = cv2.imread('01_PIERCING & BLANKING_F_page-0001.jpg')
if imgQ is None:
    print("Error: Query image not found!")
    exit()
h, w, c = imgQ.shape

orb = cv2.ORB_create(50000)
kp1, des1 = orb.detectAndCompute(imgQ, None)

path = 'UserForms'
myPicList = os.listdir(path)
print(f"Found images: {myPicList}")

with open('DataOutput.csv', 'a+', newline='') as f:
    writer = csv.writer(f)
    # Write the header row only when the file is empty (first run)
    if os.path.getsize('DataOutput.csv') == 0:
        writer.writerow([r[3] for r in roi])

    for y in myPicList:
        img = cv2.imread(os.path.join(path, y))
        if img is None:
            print(f"Error: Image {y} not found!")
            continue

        # Match ORB features against the query form and keep the best `per` percent
        kp2, des2 = orb.detectAndCompute(img, None)
        bf = cv2.BFMatcher(cv2.NORM_HAMMING)
        matches = bf.match(des2, des1)
        matches = sorted(matches, key=lambda x: x.distance)
        good = matches[:int(len(matches) * (per / 100))]
        imgMatch = cv2.drawMatches(img, kp2, imgQ, kp1, good[:400], None, flags=2)
        # cv2.imshow(y, imgMatch)

        srcPoints = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dstPoints = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        if len(good) >= 4:
            # Align the photographed form to the query image
            M, _ = cv2.findHomography(srcPoints, dstPoints, cv2.RANSAC, 5.0)
            imgScan = cv2.warpPerspective(img, M, (w, h))
        else:
            print("Not enough good matches to calculate homography!")
            continue  # Skip this image; imgScan would be undefined below

        imgShow = imgScan.copy()
        imgMask = np.zeros_like(imgShow)
        myData = []

        for x, r in enumerate(roi):
            cv2.rectangle(imgMask, (r[0][0], r[0][1]), (r[1][0], r[1][1]),
                          (0, 255, 0), cv2.FILLED)
            imgShow = cv2.addWeighted(imgShow, 0.99, imgMask, 0.1, 0)
            imgCrop = imgScan[r[0][1]:r[1][1], r[0][0]:r[1][0]]

            if imgCrop is not None and imgCrop.size > 0:
                if r[2] == 'text':
                    # Grayscale, threshold, denoise, and sharpen before OCR
                    imgGray = cv2.cvtColor(imgCrop, cv2.COLOR_BGR2GRAY)
                    imgThresh = cv2.threshold(imgGray, 150, 255, cv2.THRESH_BINARY)[1]
                    kernel = np.ones((1, 1), np.uint8)
                    imgCleaned = cv2.morphologyEx(imgThresh, cv2.MORPH_OPEN, kernel)
                    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
                    imgSharpened = cv2.filter2D(imgCleaned, -1, kernel)
                    text = pytesseract.image_to_string(
                        imgSharpened, config='--psm 6').strip()
                    myData.append(text if text else "")
                elif r[2] == 'box':
                    # Count dark pixels to decide whether the checkbox is filled
                    imgGray = cv2.cvtColor(imgCrop, cv2.COLOR_BGR2GRAY)
                    imgThresh = cv2.threshold(imgGray, 170, 255,
                                              cv2.THRESH_BINARY_INV)[1]
                    totalPixels = cv2.countNonZero(imgThresh)
                    myData.append(1 if totalPixels > pixelThreshold else 0)
            else:
                print(f"Error: Cropped image for region {x} is empty or invalid!")
                myData.append("")  # Keep the CSV columns aligned

        # Write this form's data as one CSV row
        writer.writerow(myData)

        # Resize the preview so it fits the screen while keeping its aspect ratio
        if imgShow is not None and imgShow.size > 0:
            aspect_ratio = imgShow.shape[1] / imgShow.shape[0]  # width / height
            if imgShow.shape[1] > screen_width or imgShow.shape[0] > screen_height:
                if aspect_ratio > 1:  # Wider than tall
                    new_width = screen_width
                    new_height = int(screen_width / aspect_ratio)
                    if new_height > screen_height:  # Adjust if still too tall
                        new_height = screen_height
                        new_width = int(screen_height * aspect_ratio)
                else:  # Taller than wide
                    new_height = screen_height
                    new_width = int(screen_height * aspect_ratio)
                    if new_width > screen_width:  # Adjust if still too wide
                        new_width = screen_width
                        new_height = int(screen_width / aspect_ratio)
            else:
                # Already fits the screen; keep original dimensions
                new_width = imgShow.shape[1]
                new_height = imgShow.shape[0]
            imgResized = cv2.resize(imgShow, (new_width, new_height),
                                    interpolation=cv2.INTER_AREA)
            cv2.imshow(f"{y} - Processed", imgResized)
        else:
            print(f"Error: Invalid image dimensions for {y}!")

cv2.waitKey(0)
Region Selector Tool
RegionSelectorwithoptions.py
# RegionSelectorwithoptions.py
import cv2
import numpy as np
import random
import threading
import tkinter as tk
from tkinter import simpledialog, colorchooser, messagebox

# Globals
scale = 1.0            # Zoom scale
pan_x, pan_y = 0, 0    # Panning offsets
drag_start = None
dragging = False
circles = []           # Clicked points with their colors
counter = 0
point1 = []
point2 = []
myPoints = []          # Annotated regions
myColor = (255, 0, 0)  # Default draw color


def ask_input(prompt):
    """Ask for mandatory input with a Tkinter dialog; repeat until non-empty."""
    root = tk.Tk()
    root.withdraw()  # Hide the root window
    result = None
    while not result:
        result = simpledialog.askstring("Input", prompt, parent=root)
        if not result:
            messagebox.showwarning("Input Required", "This field cannot be empty!")
    root.destroy()
    return result


def select_color():
    """Pick a draw color with Tkinter's color chooser."""
    global myColor
    color = colorchooser.askcolor()[1]  # Hex color code, e.g. '#ff0000'
    if color:
        # Convert hex RGB to OpenCV's BGR channel order
        myColor = tuple(int(color[i:i + 2], 16) for i in (5, 3, 1))
        print("Selected color:", myColor)


def zoom_in():
    global scale
    scale *= 1.1
    print("Zoomed In:", scale)


def zoom_out():
    global scale
    scale /= 1.1
    print("Zoomed Out:", scale)


def reset_view():
    """Reset zoom and pan to their defaults."""
    global scale, pan_x, pan_y
    scale = 1.0
    pan_x, pan_y = 0, 0
    print("View Reset")


def create_menu():
    """Build the Tkinter control menu."""
    root = tk.Tk()
    root.title("Menu")
    tk.Button(root, text="Select Color", command=select_color).pack(pady=10)
    tk.Button(root, text="Zoom In", command=zoom_in).pack(pady=10)
    tk.Button(root, text="Zoom Out", command=zoom_out).pack(pady=10)
    tk.Button(root, text="Reset View", command=reset_view).pack(pady=10)
    tk.Button(root, text="Quit", command=root.quit).pack(pady=10)
    root.mainloop()


def mousePoints(event, x, y, flags, params):
    """Mouse callback: left-click marks corners, right-drag pans, wheel zooms."""
    global counter, point1, point2, circles, myColor, drag_start, dragging, pan_x, pan_y, scale
    if event == cv2.EVENT_RBUTTONDOWN:    # Start dragging
        drag_start = (x, y)
        dragging = True
    elif event == cv2.EVENT_MOUSEMOVE:    # Handle dragging
        if dragging and drag_start:
            dx, dy = x - drag_start[0], y - drag_start[1]
            # Invert the panning offsets to align with drag direction
            pan_x -= dx
            pan_y -= dy
            drag_start = (x, y)
    elif event == cv2.EVENT_RBUTTONUP:    # Stop dragging
        dragging = False
        drag_start = None
    elif event == cv2.EVENT_LBUTTONDOWN and not dragging:
        # Map screen coordinates back to original image coordinates
        orig_x = int((x - pan_x) / scale)
        orig_y = int((y - pan_y) / scale)
        if counter == 0:
            point1 = (orig_x, orig_y)
            counter += 1
            myColor = (random.randint(0, 2) * 200,
                       random.randint(0, 2) * 200,
                       random.randint(0, 2) * 200)
        elif counter == 1:
            point2 = (orig_x, orig_y)
            region_type = ask_input('Enter Type: ')
            name = ask_input('Enter Name: ')
            myPoints.append([point1, point2, region_type, name])
            counter = 0
        circles.append([orig_x, orig_y, myColor])
    elif event == cv2.EVENT_MOUSEWHEEL:   # Zoom in/out
        if flags > 0:  # Scroll up
            scale *= 1.1
        else:          # Scroll down
            scale /= 1.1


# Load and prepare the image
img = cv2.imread('01_PIERCING & BLANKING_F_page-0001.jpg')  # Ensure the path is correct
if img is None:
    raise ValueError("Image not found. Ensure the path is correct.")
original_img = img.copy()

# Run the Tkinter menu alongside the OpenCV window
menu_thread = threading.Thread(target=create_menu, daemon=True)
menu_thread.start()

cv2.namedWindow('Image Annotator')
cv2.setMouseCallback('Image Annotator', mousePoints)

while True:
    # Blank canvas the size of the original image
    canvas = np.zeros_like(original_img)
    # Resize the image for zooming
    resized = cv2.resize(original_img, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_LINEAR)
    resized_h, resized_w = resized.shape[:2]
    canvas_h, canvas_w = canvas.shape[:2]

    # Region of the canvas where the zoomed image lands after panning
    x_start_canvas, y_start_canvas = pan_x, pan_y
    x_end_canvas = pan_x + resized_w
    y_end_canvas = pan_y + resized_h

    # Clip to the visible area of the canvas
    x_start_display = max(0, x_start_canvas)
    y_start_display = max(0, y_start_canvas)
    x_end_display = min(canvas_w, x_end_canvas)
    y_end_display = min(canvas_h, y_end_canvas)

    # Matching area on the resized image to copy from
    x_start_resized = max(0, -x_start_canvas)
    y_start_resized = max(0, -y_start_canvas)
    x_end_resized = x_start_resized + (x_end_display - x_start_display)
    y_end_resized = y_start_resized + (y_end_display - y_start_display)

    if x_start_display < x_end_display and y_start_display < y_end_display:
        # Place the visible part of the resized image on the canvas
        canvas[y_start_display:y_end_display, x_start_display:x_end_display] = resized[
            y_start_resized:y_end_resized, x_start_resized:x_end_resized]

    # Draw annotation points adjusted for zoom and pan
    for orig_x, orig_y, color in circles:
        display_x = int(orig_x * scale + pan_x)
        display_y = int(orig_y * scale + pan_y)
        if 0 <= display_x < canvas_w and 0 <= display_y < canvas_h:
            cv2.circle(canvas, (display_x, display_y), 5, color, cv2.FILLED)

    cv2.imshow('Image Annotator', canvas)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):    # Quit and dump the regions
        print("Annotated Regions:", myPoints)
        break
    elif key == ord('r'):  # Reset view
        reset_view()

cv2.destroyAllWindows()
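The regions printed on quitting have the same [(x1, y1), (x2, y2), type, name] shape as the roi list in main.py, so they can be pasted straight in. A small helper (our own addition, not part of the tool) can render them as a ready-to-paste literal:

```python
def format_roi(points):
    """Render annotated regions as a Python literal matching main.py's roi list."""
    lines = ["roi = ["]
    for p1, p2, kind, name in points:
        lines.append(f"    [{p1}, {p2}, '{kind}', '{name}'],")
    lines.append("]")
    return "\n".join(lines)

print(format_roi([[(33, 199), (141, 242), 'text', 'Date']]))
```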
Conclusion
This script provides a simple yet effective way to extract valuable information from specific regions of a screen using Python. Whether you’re automating form data extraction or processing technical documents, this approach can be easily customized for various applications.
By combining OpenCV, pytesseract, and the csv module, we can automate data collection from images and convert it into structured formats for analysis, reporting, or further processing.
Stay tuned for more articles on how to optimize this method for different use cases, and feel free to share your thoughts and improvements!