---
layout: default
title: "Text-to-Speech"
parent: "Utility Function"
nav_order: 8
---

## Text-to-Speech

| **Service**          | **Free Tier**         | **Pricing Model**                                            | **Docs**                                                            |
|----------------------|-----------------------|--------------------------------------------------------------|---------------------------------------------------------------------|
| **Amazon Polly**     | 5M std + 1M neural    | ~$4 /M (std), ~$16 /M (neural) after free tier               | [Polly Docs](https://aws.amazon.com/polly/)                         |
| **Google Cloud TTS** | 4M std + 1M WaveNet   | ~$4 /M (std), ~$16 /M (WaveNet) pay-as-you-go                | [Cloud TTS Docs](https://cloud.google.com/text-to-speech)           |
| **Azure TTS**        | 500K neural ongoing   | ~$15 /M (neural), discount at higher volumes                 | [Azure TTS Docs](https://azure.microsoft.com/products/cognitive-services/text-to-speech/) |
| **IBM Watson TTS**   | 10K chars Lite plan   | ~$0.02 /1K (i.e. ~$20 /M). Enterprise options available      | [IBM Watson Docs](https://www.ibm.com/cloud/watson-text-to-speech)  |
| **ElevenLabs**       | 10K chars monthly     | From ~$5/mo (30K chars) up to $330/mo (2M chars). Enterprise | [ElevenLabs Docs](https://elevenlabs.io)                            |

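To compare the pay-per-character services for a given workload, the rates in the table can be turned into a rough estimate. The sketch below is illustrative only: the `RATES_PER_MILLION` and `FREE_CHARS` dictionaries and the `estimate_monthly_cost` helper are hypothetical names, the figures are the approximate ones from the table, and it assumes each free allowance resets monthly and is simply subtracted before billing. ElevenLabs is left out because it is priced by subscription tier rather than per character. Check each provider's pricing page before relying on the numbers.

```python
# Approximate USD price per 1M characters, copied from the table above.
RATES_PER_MILLION = {
    "polly_standard": 4.0,
    "polly_neural": 16.0,
    "google_standard": 4.0,
    "google_wavenet": 16.0,
    "azure_neural": 15.0,
    "ibm_watson": 20.0,
}

# Free character allowances from the table (assumed to apply per month).
FREE_CHARS = {
    "polly_standard": 5_000_000,
    "polly_neural": 1_000_000,
    "google_standard": 4_000_000,
    "google_wavenet": 1_000_000,
    "azure_neural": 500_000,
    "ibm_watson": 10_000,
}


def estimate_monthly_cost(service: str, chars: int) -> float:
    """Return a rough monthly cost in USD for `chars` synthesized characters."""
    # Only characters beyond the free allowance are billed.
    billable = max(0, chars - FREE_CHARS[service])
    return billable / 1_000_000 * RATES_PER_MILLION[service]


if __name__ == "__main__":
    # Print the estimate for 3M characters per month on every listed service.
    for name in RATES_PER_MILLION:
        print(f"{name}: ${estimate_monthly_cost(name, 3_000_000):.2f}")
```

Usage under the free allowance returns `0.0`, so the same helper can also be used to check whether a workload fits within a free tier.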
## Example Python Code

### Amazon Polly
```python
import boto3

# Placeholder credentials; boto3 can also read them from the environment
# or from ~/.aws/credentials, which avoids hard-coding secrets.
polly = boto3.client("polly", region_name="us-east-1",
                     aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
                     aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY")

resp = polly.synthesize_speech(
    Text="Hello from Polly!",
    OutputFormat="mp3",
    VoiceId="Joanna"
)

# AudioStream is a streaming body; read it fully and save the MP3 bytes.
with open("polly.mp3", "wb") as f:
    f.write(resp["AudioStream"].read())
```

### Google Cloud TTS
```python
from google.cloud import texttospeech

# The client authenticates with Application Default Credentials
# (for example, via the GOOGLE_APPLICATION_CREDENTIALS environment variable).
client = texttospeech.TextToSpeechClient()
input_text = texttospeech.SynthesisInput(text="Hello from Google Cloud TTS!")
voice = texttospeech.VoiceSelectionParams(language_code="en-US")
audio_cfg = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)

resp = client.synthesize_speech(input=input_text, voice=voice, audio_config=audio_cfg)

# audio_content holds the encoded MP3 bytes.
with open("gcloud_tts.mp3", "wb") as f:
    f.write(resp.audio_content)
```

### Azure TTS
```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="AZURE_KEY", region="AZURE_REGION")
# Synthesis output goes through AudioOutputConfig; AudioConfig is for audio input.
audio_cfg = speechsdk.audio.AudioOutputConfig(filename="azure_tts.wav")

synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=speech_config,
    audio_config=audio_cfg
)

# .get() blocks until synthesis finishes and the WAV file is written.
synthesizer.speak_text_async("Hello from Azure TTS!").get()
```

### IBM Watson TTS
```python
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# The API key and service URL come from the service credentials in IBM Cloud.
auth = IAMAuthenticator("IBM_API_KEY")
service = TextToSpeechV1(authenticator=auth)
service.set_service_url("IBM_SERVICE_URL")

resp = service.synthesize(
    "Hello from IBM Watson!",
    voice="en-US_AllisonV3Voice",
    accept="audio/mp3"
).get_result()

# get_result() returns the HTTP response; its content is the MP3 audio.
with open("ibm_tts.mp3", "wb") as f:
    f.write(resp.content)
```

### ElevenLabs
```python
import requests

api_key = "ELEVENLABS_KEY"
voice_id = "ELEVENLABS_VOICE"
url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
headers = {"xi-api-key": api_key, "Content-Type": "application/json"}

json_data = {
    "text": "Hello from ElevenLabs!",
    "voice_settings": {"stability": 0.75, "similarity_boost": 0.75}
}

resp = requests.post(url, headers=headers, json=json_data)
# Fail loudly on an HTTP error instead of saving an error body as audio.
resp.raise_for_status()

with open("elevenlabs.mp3", "wb") as f:
    f.write(resp.content)
```