parallel doc

parent b40f5e087a
commit 93a048c86a

@@ -15,6 +15,7 @@ We model the LLM workflow as a **Nested Flow**:
- A Flow can be treated as a Node for **Nested Flows**.
- Both Nodes and Flows can be **Batched** for data-intensive tasks.
- Nodes and Flows can be **Async** for user inputs.
- **Async** Nodes and Flows can be executed in **Parallel**.

<div align="center">
<img src="https://github.com/zachary62/miniLLMFlow/blob/main/assets/minillmflow.jpg?raw=true" width="400"/>

@@ -27,6 +28,7 @@ We model the LLM workflow as a **Nested Flow**:
- [Communication](./communication.md)
- [Batch](./batch.md)
- [Async](./async.md)
- [Parallel](./parallel.md)

## Preparation

@@ -0,0 +1,57 @@
---
layout: default
title: "Parallel"
parent: "Core Abstraction"
nav_order: 6
---

# Parallel

**Parallel** Nodes and Flows let you run multiple tasks **concurrently**, for example summarizing multiple texts at once. Unlike a regular **BatchNode**, which processes items sequentially, **AsyncParallelBatchNode** and **AsyncParallelBatchFlow** can fire off tasks in parallel. This can improve performance, chiefly by overlapping the I/O waits of many LLM calls.

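To make the contrast concrete, here is a minimal sketch of the idea in plain `asyncio` (it illustrates the concept only, not the library's actual internals):

```python
import asyncio

async def work(item):
    await asyncio.sleep(1)  # stand-in for an async LLM call
    return f"done: {item}"

async def sequential(items):
    # BatchNode-style: await items one at a time; total time ~ len(items) seconds.
    return [await work(i) for i in items]

async def parallel(items):
    # AsyncParallelBatchNode-style: start everything at once; total time ~ 1 second.
    return await asyncio.gather(*(work(i) for i in items))
```
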
## AsyncParallelBatchNode
Like **AsyncBatchNode**, but runs `exec_async()` for all items **in parallel**, with `prep_async()` and `post_async()` still called once:

```python
class ParallelSummaries(AsyncParallelBatchNode):
    async def prep_async(self, shared):
        # One list item per parallel task, e.g., multiple texts.
        return shared["texts"]

    async def exec_async(self, text):
        # Called once per text; the calls run concurrently.
        prompt = f"Summarize: {text}"
        return await call_llm_async(prompt)

    async def post_async(self, shared, prep_res, exec_res_list):
        # exec_res_list holds the per-text results.
        shared["summary"] = "\n\n".join(exec_res_list)
        return "default"

node = ParallelSummaries()
flow = AsyncFlow(start=node)
# run with: await flow.run_async(shared)
```

## AsyncParallelBatchFlow
The parallel version of **BatchFlow**: each iteration of the sub-flow runs **concurrently** with a different parameter dict:

```python
class SummarizeMultipleFiles(AsyncParallelBatchFlow):
    async def prep_async(self, shared):
        # One parameter dict per concurrent sub-flow run.
        return [{"filename": f} for f in shared["files"]]

sub_flow = AsyncFlow(start=LoadAndSummarizeFile())
parallel_flow = SummarizeMultipleFiles(start=sub_flow)
await parallel_flow.run_async(shared)
```

## Best Practices
- **Ensure Tasks Are Independent**

  If each item depends on the output of a previous item, **don’t** parallelize. Parallelizing dependent tasks can lead to inconsistencies or race conditions.

- **Beware Rate Limits**

  Parallel calls can **quickly** trigger rate limits on LLM services. You may need a **throttling** mechanism (e.g., semaphores or sleep intervals) to avoid hitting vendor limits; a semaphore sketch follows this list.

- **Consider Single-Node Batch APIs**

  Some LLMs offer a **batch inference** API where you can send multiple prompts in a single call. This is more complex to implement but can be more efficient than launching many parallel requests. Conceptually, it can look similar to an **AsyncBatchNode** or **BatchNode**, but the underlying call bundles multiple items into **one** request; see the second sketch below.

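For the rate-limit point, a minimal throttling sketch using `asyncio.Semaphore` (the limit of 5 and the `call_llm_async` stub are illustrative assumptions):

```python
import asyncio

limiter = asyncio.Semaphore(5)  # at most 5 calls in flight (example value)

async def call_llm_async(prompt):
    # Stand-in for a real async LLM client call.
    await asyncio.sleep(0.1)
    return f"summary of: {prompt}"

async def throttled_call_llm(prompt):
    # Additional calls wait here until one of the 5 slots frees up.
    async with limiter:
        return await call_llm_async(prompt)
```

An `exec_async()` can then await `throttled_call_llm(prompt)` instead of calling the client directly.
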
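For the batch-API point, a sketch of the single-request idea. It assumes the synchronous `Node` base class mirrors the async hooks shown above (`prep`/`exec`/`post`) and a hypothetical `client.batch_complete(prompts)` endpoint; real batch APIs vary by vendor:

```python
class BatchSummaries(Node):
    def prep(self, shared):
        return shared["texts"]

    def exec(self, texts):
        # One request carrying every prompt, instead of one request per item.
        prompts = [f"Summarize: {t}" for t in texts]
        return client.batch_complete(prompts)  # hypothetical vendor batch endpoint

    def post(self, shared, prep_res, summaries):
        shared["summary"] = "\n\n".join(summaries)
        return "default"
```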