1.9 KiB

Raw Blame History

PocketFlow Tool: PDF Vision

A PocketFlow example project demonstrating PDF processing with OpenAI's Vision API for OCR and text extraction.

Features

Convert PDF pages to images while maintaining quality and size limits
Extract text from scanned documents using GPT-4 Vision API
Support for custom extraction prompts
Maintain page order and formatting in extracted text
Batch processing of multiple PDFs from a directory

Installation

Clone the repository
Install dependencies:
```
pip install -r requirements.txt
```
Set your OpenAI API key as an environment variable:
```
export OPENAI_API_KEY=your_api_key_here
```

Usage

Place your PDF files in the pdfs directory
Run the example:
```
python main.py
```
The script will process all PDF files in the pdfs directory and output the extracted text for each one.

Project Structure

pocketflow-tool-pdf-vision/
├── pdfs/           # Directory for PDF files to process
├── tools/
│   ├── pdf.py     # PDF to image conversion
│   └── vision.py  # Vision API integration
├── utils/
│   └── call_llm.py # OpenAI client config
├── nodes.py       # PocketFlow nodes
├── flow.py        # Flow configuration
└── main.py        # Example usage

Flow Description

LoadPDFNode: Loads PDF and converts pages to images
ExtractTextNode: Processes images with Vision API
CombineResultsNode: Combines extracted text from all pages

Customization

You can customize the extraction by modifying the prompt in shared:

shared = {
    "pdf_path": "your_file.pdf",
    "extraction_prompt": "Your custom prompt here"
}

Limitations

Maximum PDF page size: 2000px (configurable in tools/pdf.py)
Vision API token limit: 1000 tokens per response
Image size limit: 20MB per image for Vision API

License

MIT

1.9 KiB Raw Blame History

PocketFlow Tool: PDF Vision

Features

Installation

Usage

Project Structure

Flow Description

Customization

Limitations

License

1.9 KiB

Raw Blame History