PocketFlow Tool: PDF Vision
A PocketFlow example project demonstrating PDF processing with OpenAI's Vision API for OCR and text extraction.
Features
- Convert PDF pages to images while maintaining quality and size limits
- Extract text from scanned documents using GPT-4 Vision API
- Support for custom extraction prompts
- Maintain page order and formatting in extracted text
- Batch processing of multiple PDFs from a directory
Installation
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
- Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY=your_api_key_here
Usage
- Place your PDF files in the pdfs directory
- Run the example:
python main.py
The script will process all PDF files in the pdfs directory and output the extracted text for each one.
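The directory scan behind this step can be sketched as follows. This is a minimal illustration, not the project's actual main.py: the helper name find_pdfs is hypothetical, and it assumes that only files with a .pdf extension (case-insensitive) directly inside the directory are processed.

```python
from pathlib import Path


def find_pdfs(directory="pdfs"):
    """Return PDF files directly inside `directory`, sorted by name.

    Hypothetical helper for illustration; the real main.py may differ.
    """
    root = Path(directory)
    if not root.is_dir():
        # A missing directory simply yields nothing to process.
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    for pdf_path in find_pdfs():
        print(f"Would process: {pdf_path.name}")
```

Sorting the paths keeps the processing order stable between runs, which makes the per-file output easier to compare.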
Project Structure
pocketflow-tool-pdf-vision/
├── pdfs/ # Directory for PDF files to process
├── tools/
│ ├── pdf.py # PDF to image conversion
│ └── vision.py # Vision API integration
├── utils/
│ └── call_llm.py # OpenAI client config
├── nodes.py # PocketFlow nodes
├── flow.py # Flow configuration
└── main.py # Example usage
Flow Description
- LoadPDFNode: Loads PDF and converts pages to images
- ExtractTextNode: Processes images with Vision API
- CombineResultsNode: Combines extracted text from all pages
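The three stages above can be sketched in plain Python to show how data moves through the shared store. This is a simplified illustration rather than the project's nodes.py: it does not use the PocketFlow classes, the function names and the shared keys page_images, page_texts, and final_text are assumptions, and the PDF rendering and Vision API calls are replaced by stand-in callables.

```python
# Assumed fallback prompt; the project's real default is not shown in this README.
FALLBACK_PROMPT = "Extract all text from this image."


def load_pdf(shared, render_pages):
    """Stage 1: convert the PDF at shared['pdf_path'] into page images."""
    shared["page_images"] = render_pages(shared["pdf_path"])


def extract_text(shared, vision_call):
    """Stage 2: run the vision model on every page image, in page order."""
    prompt = shared.get("extraction_prompt", FALLBACK_PROMPT)
    shared["page_texts"] = [
        vision_call(image, prompt) for image in shared["page_images"]
    ]


def combine_results(shared):
    """Stage 3: join per-page text into one document, labelled by page."""
    shared["final_text"] = "\n\n".join(
        f"=== Page {number} ===\n{text}"
        for number, text in enumerate(shared["page_texts"], start=1)
    )


def run_pipeline(shared, render_pages, vision_call):
    """Run the three stages in order and return the combined text."""
    load_pdf(shared, render_pages)
    extract_text(shared, vision_call)
    combine_results(shared)
    return shared["final_text"]


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a PDF or an API key.
    fake_render = lambda path: [f"{path}#page1", f"{path}#page2"]
    fake_vision = lambda image, prompt: f"text from {image}"
    print(run_pipeline({"pdf_path": "sample.pdf"}, fake_render, fake_vision))
```

Passing the renderer and the vision call in as arguments keeps each stage testable without network access.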
Customization
You can customize the extraction by changing the extraction_prompt entry in the shared dictionary:
shared = {
    "pdf_path": "your_file.pdf",
    "extraction_prompt": "Your custom prompt here"
}
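As a hedged example of a task-specific prompt, the sketch below builds the shared dictionary for an invoice and leaves the prompt out when none is given. The helper name build_shared, the file path, and the prompt wording are all hypothetical; only the pdf_path and extraction_prompt keys come from this README.

```python
def build_shared(pdf_path, extraction_prompt=None):
    """Build the shared dictionary, adding a prompt only when one is given.

    Hypothetical helper; omitting the key is assumed to leave any
    project-side default prompt in effect.
    """
    shared = {"pdf_path": pdf_path}
    if extraction_prompt:
        shared["extraction_prompt"] = extraction_prompt
    return shared


if __name__ == "__main__":
    # Illustrative path and prompt, not files shipped with the project.
    shared = build_shared(
        "pdfs/invoice.pdf",
        "Extract only the invoice number, issue date, and total amount.",
    )
    print(shared)
```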
Limitations
- Maximum PDF page size: 2000px (configurable in tools/pdf.py)
- Vision API token limit: 1000 tokens per response
- Image size limit: 20MB per image for Vision API
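The page-size and image-size limits can be illustrated with a little arithmetic. This is a sketch under stated assumptions, not the code in tools/pdf.py: the function names are hypothetical, the 2000px limit is assumed to apply to the longer side of the page, and 20MB is interpreted as 20 * 1024 * 1024 bytes.

```python
MAX_DIMENSION = 2000                 # px, from the limit listed above
MAX_IMAGE_BYTES = 20 * 1024 * 1024   # 20MB read as 20 MiB (assumption)


def fit_within(width, height, max_dimension=MAX_DIMENSION):
    """Scale (width, height) down so the longer side is at most max_dimension.

    The aspect ratio is preserved; sizes already within the limit are unchanged.
    """
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height
    scale = max_dimension / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def within_size_limit(image_bytes, max_bytes=MAX_IMAGE_BYTES):
    """Check an encoded image against the per-image byte limit."""
    return len(image_bytes) <= max_bytes


if __name__ == "__main__":
    print(fit_within(4000, 3000))
    print(within_size_limit(b"\x89PNG"))
```

Scaling by the longer side means portrait and landscape pages are handled by the same rule.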
License
MIT