
Tesseract problem

Hello, I uploaded my Python code here; it uses Flask and Tesseract. When I tested it by passing the address of an image on the internet, it raised an error. From what I found, it seems PythonAnywhere doesn't allow access to external sites? My question is: will it work if the image is uploaded directly to the code instead of being passed as a URL? Thank you

Since you are using a free account, you are limited to external sites on our allowlist. Yes, you can upload images directly.
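With an upload, the client sends the file bytes itself in a multipart/form-data POST, so the server never has to fetch anything from an external site. The allowlist only limits requests made from PythonAnywhere, not files your own machine sends to it. Below is a minimal sketch using `requests` (it is in the requirements.txt posted below), assuming the form field is named `image` as in the `/process-image` view below; the URL and function names are hypothetical placeholders:

```python
import mimetypes
import os

import requests

# Hypothetical address: replace with your own web app's URL.
API_URL = "https://yourusername.pythonanywhere.com/process-image"


def build_upload_request(filename: str, payload: bytes, url: str = API_URL) -> requests.PreparedRequest:
    """Build a multipart/form-data POST carrying the file under the form
    field "image", the key the Flask view reads from request.files."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    files = {"image": (filename, payload, content_type)}
    return requests.Request("POST", url, files=files).prepare()


def upload_image(image_path: str, url: str = API_URL):
    """Read a local image, send it, and return (status_code, decoded JSON).
    Assumes the server answers with JSON, as the view below does."""
    with open(image_path, "rb") as fh:
        payload = fh.read()
    request = build_upload_request(os.path.basename(image_path), payload, url)
    with requests.Session() as session:
        response = session.send(request, timeout=30)
    return response.status_code, response.json()
```

For example, `status, body = upload_image("receipt.png")` would return the HTTP status and the JSON dict built by the view (`raw_text`, `tokens`, `token_count`, or `error`).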

Hello, I would like to ask for help. I'm trying to deploy my Python code that uses Tesseract here. Here is my code for app.py:

    from flask import Flask, request, jsonify
    from PIL import Image, ImageEnhance, ImageFilter
    import pytesseract
    import io
    import re
    from flask_cors import CORS

    app = Flask(__name__)
    CORS(app)  # Enable CORS for all routes

    def preprocess_image(image):
        """Apply various image processing techniques to improve OCR accuracy."""
        # Convert to RGB if image is in RGBA mode
        if image.mode == 'RGBA':
            image = image.convert('RGB')

        # Convert to grayscale
        image = image.convert('L')

        # Apply noise reduction
        image = image.filter(ImageFilter.MedianFilter(3))

        # Enhance contrast
        enhancer = ImageEnhance.Contrast(image)
        image = enhancer.enhance(2)

        # Apply sharpening
        image = image.filter(ImageFilter.SHARPEN)

        # Increase image size if too small
        if image.width < 1000 or image.height < 1000:
            ratio = max(1000/image.width, 1000/image.height)
            new_size = (int(image.width * ratio), int(image.height * ratio))
            image = image.resize(new_size, Image.Resampling.LANCZOS)

        return image

    @app.route('/process-image', methods=['POST'])
    def process_image():
        try:
            # Check if image file is in request
            if 'image' not in request.files:
                return jsonify({'error': 'No image file provided'}), 400

            image_file = request.files['image']
            if image_file.filename == '':
                return jsonify({'error': 'No selected file'}), 400

            # Read and process the image
            img = Image.open(io.BytesIO(image_file.read()))

            # Apply image preprocessing
            processed_img = preprocess_image(img)

            # Perform OCR with custom configuration
            # --oem 3: default engine mode (uses LSTM where available)
            # --psm 6: assume a single uniform block of text
            custom_config = r'--oem 3 --psm 6'
            extracted_text = pytesseract.image_to_string(processed_img, config=custom_config)

            # Tokenize text
            # Split into words, convert to lowercase, and remove special characters
            tokens = re.findall(r'\b\w+\b', extracted_text.lower())

            # Remove very short tokens (likely noise)
            tokens = [token for token in tokens if len(token) > 1]

            # Create response with both raw text and processed tokens
            response = {
                'raw_text': extracted_text,
                'tokens': tokens,
                'token_count': len(tokens)
            }

            return jsonify(response), 200

        except Exception as e:
            return jsonify({'error': str(e), 'type': str(type(e))}), 500

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)
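The token step at the end of process_image can be exercised on its own, without Tesseract or Pillow. Below is a minimal sketch that pulls it into a standalone helper (the name `tokenize` and the sample string are hypothetical, not part of the original code). It shows that `\w` keeps digits and underscores, that hyphens and apostrophes split words, and that the length filter also discards genuine one-letter words such as "a" and "I":

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Same rules as process_image: lowercase, keep runs of word characters
    (letters, digits, underscore), then drop tokens of length 1."""
    tokens = re.findall(r"\b\w+\b", text.lower())
    return [token for token in tokens if len(token) > 1]


sample = "Invoice No. 42: PAID a_b I x-ray"
print(tokenize(sample))  # ['invoice', 'no', '42', 'paid', 'a_b', 'ray']
```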

My requirements.txt:

    blinker
    certifi
    charset-normalizer
    click
    colorama
    Flask
    Flask-Cors
    gunicorn
    idna
    itsdangerous
    Jinja2
    MarkupSafe
    packaging
    pillow
    pytesseract
    requests
    urllib3
    Werkzeug

My Dockerfile:

    FROM python:3.9-slim-buster

    RUN apt-get update && \
        apt-get -qq -y install tesseract-ocr && \
        apt-get -qq -y install libtesseract-dev

    RUN apt-get update && apt-get install -y tesseract-ocr

    WORKDIR /app

    COPY requirements.txt requirements.txt
    RUN pip3 install -r requirements.txt

    COPY . .

    CMD ["gunicorn", "app:app"]

Will this code work, or is something missing? I hope you can help me. Thank you
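Note that pytesseract is only a Python wrapper that runs the `tesseract` program. The binary itself comes from the `apt-get install tesseract-ocr` step in the Dockerfile, not from requirements.txt. Without Docker, you could check whether the host already has it with a quick standard-library sketch like this (the helper name is hypothetical):

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `<cmd> --version`, or None when the binary
    cannot be found on PATH."""
    path = shutil.which(cmd)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Depending on the Tesseract release, the version may go to stdout or stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract binary not found on PATH")
```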

Please help T_T Our deadline is next week and we still haven't deployed the Python app; I'm begging for help. Our teacher doesn't help us with anything but still expects big results from us without teaching us.

You can't run Flask in Docker on PythonAnywhere. Take a look at https://help.pythonanywhere.com/pages/Flask
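On PythonAnywhere, a WSGI configuration file imports your Flask object and takes the place of the Dockerfile's gunicorn command. A minimal sketch follows. It assumes the code above is saved as app.py in a hypothetical project folder `/home/yourusername/mysite`, so substitute your own username and path. The Flask object must be exposed under the name `application`:

```python
import sys

# Hypothetical project folder: replace with the directory that holds app.py.
project_home = '/home/yourusername/mysite'
if project_home not in sys.path:
    sys.path.insert(0, project_home)

# Import the Flask object defined in app.py and expose it as "application",
# the name the WSGI server looks for.
from app import app as application  # noqa: E402
```

Because `app.run(...)` in app.py sits inside the `if __name__ == '__main__':` guard, importing the module here does not start Flask's development server.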