# PocketFlow Parallel Batch Node Example
This example demonstrates parallel processing using `AsyncParallelBatchNode` to summarize multiple news articles concurrently. It shows how to:
1. Process multiple items in parallel
2. Handle I/O-bound tasks efficiently
3. Manage rate limits with throttling
## What this Example Does
When you run the example, it:
1. Loads multiple news articles from a data directory
2. Processes them in parallel using `AsyncParallelBatchNode`
3. For each article:
   - Extracts key information
   - Generates a summary using an LLM
   - Saves the results
4. Combines all summaries into a final report
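
The first and last steps (loading the articles and combining the summaries) could look roughly like the sketch below. The helper names `load_articles` and `build_report` are hypothetical and not taken from the example's source; the sketch only assumes that articles are `.txt` files in a directory, as in the project structure.

```python
from pathlib import Path


def load_articles(data_dir: str) -> dict[str, str]:
    """Read every .txt file in data_dir, keyed by file name (sorted by name)."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(data_dir).glob("*.txt"))
    }


def build_report(summaries: dict[str, str]) -> str:
    """Join per-article summaries into one numbered, plain-text report."""
    lines = [
        f"{index}. {name}: {summary}"
        for index, (name, summary) in enumerate(summaries.items(), start=1)
    ]
    return "\n".join(lines)
```

The dictionary returned by `load_articles` could then be stored in `shared["articles"]` (or converted to a list) before the flow runs.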
## How it Works
The example uses `AsyncParallelBatchNode` to process articles in parallel:
```python
from pocketflow import AsyncParallelBatchNode
from utils import call_llm_async  # assumed location of the async LLM helper


class ParallelSummarizer(AsyncParallelBatchNode):
    async def prep_async(self, shared):
        # Return the list of articles to process
        return shared["articles"]

    async def exec_async(self, article):
        # Process a single article (called concurrently, once per item)
        summary = await call_llm_async(f"Summarize: {article}")
        return summary

    async def post_async(self, shared, prep_res, summaries):
        # Store all summaries in the shared store
        shared["summaries"] = summaries
        return "default"
```
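
To see why concurrent execution helps with I/O-bound work, the pattern can be illustrated with plain `asyncio`. This is a sketch of the general fan-out technique, not PocketFlow's internals; `fake_llm_call` is a hypothetical stand-in for a real async LLM request.

```python
import asyncio
import time


async def fake_llm_call(article: str) -> str:
    # Hypothetical stand-in for a network-bound LLM request.
    await asyncio.sleep(0.1)
    return f"summary of {article}"


async def summarize_all(articles: list[str]) -> list[str]:
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_llm_call(a) for a in articles))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(summarize_all(["article1", "article2", "article3"]))
    elapsed = time.perf_counter() - start
    print(results)
    # The waits overlap, so the total is roughly one delay rather than three.
    print(f"elapsed: {elapsed:.2f}s")
```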
Key features demonstrated:
- Parallel execution of `exec_async`
- Rate limiting with semaphores
- Error handling for failed requests
- Progress tracking for parallel tasks
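
Rate limiting with a semaphore can be sketched as follows. This is a minimal illustration using `asyncio.Semaphore`, assuming a limit of two concurrent requests; the names `throttled_call`, `summarize_throttled`, and `MAX_CONCURRENT` are hypothetical and do not come from the example's source.

```python
import asyncio

MAX_CONCURRENT = 2  # assumed limit; tune to the API's actual rate limits


async def throttled_call(sem: asyncio.Semaphore, article: str, stats: dict) -> str:
    # Only MAX_CONCURRENT coroutines may be inside this block at once.
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.05)  # stand-in for the real LLM request
        stats["active"] -= 1
        return f"summary of {article}"


async def summarize_throttled(articles, limit: int = MAX_CONCURRENT):
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    summaries = await asyncio.gather(
        *(throttled_call(sem, a, stats) for a in articles)
    )
    # Return the peak concurrency so the throttling can be checked.
    return summaries, stats["peak"]


if __name__ == "__main__":
    summaries, peak = asyncio.run(summarize_throttled(["a", "b", "c", "d", "e"]))
    print(f"{len(summaries)} summaries, peak concurrency {peak}")
```

In a node, the same idea would mean acquiring the semaphore inside `exec_async` around the LLM call.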
## Project Structure
```
pocketflow-parallel-batch-node/
├── README.md
├── requirements.txt
├── data/
│   ├── article1.txt
│   ├── article2.txt
│   └── article3.txt
├── main.py
├── flow.py
├── nodes.py
└── utils.py
```
## Running the Example
```bash
# Install dependencies
pip install -r requirements.txt
# Run the example
python main.py
```
## Sample Output
```
Loading articles...
Found 3 articles to process
Processing in parallel...
[1/3] Processing article1.txt...
[2/3] Processing article2.txt...
[3/3] Processing article3.txt...
Summaries generated:
1. First article summary...
2. Second article summary...
3. Third article summary...
Final report saved to: summaries.txt
```
## Key Concepts
1. **Parallel Processing**
   - Using `AsyncParallelBatchNode` for concurrent execution
   - Managing parallel tasks efficiently
2. **Rate Limiting**
   - Using semaphores to cap the number of concurrent requests
   - Staying within API rate limits
3. **Error Handling**
   - Handling failed requests gracefully
   - Retrying failed tasks
4. **Progress Tracking**
   - Monitoring the progress of parallel tasks
   - Providing user feedback
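
Retries and progress reporting can be combined around each task. The sketch below uses plain `asyncio`; `with_retries`, `run_with_progress`, and `flaky_summarize` are hypothetical helpers written for illustration, and the retry count and wait time are assumed values.

```python
import asyncio


async def with_retries(make_call, max_retries: int = 3, wait: float = 0.01):
    # Try the call up to max_retries times, re-raising the last failure.
    for attempt in range(1, max_retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == max_retries:
                raise
            await asyncio.sleep(wait)


async def run_with_progress(articles, summarize):
    total = len(articles)
    done = 0

    async def tracked(article):
        nonlocal done
        try:
            result = await with_retries(lambda: summarize(article))
        except Exception as exc:
            # Record the failure instead of aborting the whole batch.
            result = f"FAILED: {exc}"
        done += 1
        print(f"[{done}/{total}] finished {article}")
        return result

    return await asyncio.gather(*(tracked(a) for a in articles))


attempts: dict[str, int] = {}


async def flaky_summarize(article: str) -> str:
    # Fails on the first attempt per article to exercise the retry path.
    attempts[article] = attempts.get(article, 0) + 1
    if attempts[article] == 1:
        raise RuntimeError("transient error")
    return f"summary of {article}"


if __name__ == "__main__":
    print(asyncio.run(run_with_progress(["article1.txt", "article2.txt"], flaky_summarize)))
```

Catching the exception per task keeps one failed article from discarding the summaries that did succeed.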