Eleven Labs Integration
What the Integration Does
The Eleven Labs integration provides AI-based text-to-speech and sound effect generation, turning text input into audio. It is mainly used to create dynamic audio content and personalized voiceovers, and to add audio to interactive applications.
Example Scenarios
- Convert customer support text into friendly, natural-sounding audio responses to improve user engagement.
- Generate sound effects dynamically based on user input in gaming applications.
- Create personalized voice messages or audio articles from written content.
Capabilities
What the Integration Enables
- List Available Voices
  - Retrieve a detailed list of voice options, including IDs, names, and descriptions.
- Text-to-Speech Conversion
  - Convert text into speech audio using the specified voice settings.
- Generate Sound Effects
  - Create sound effects from a text prompt and an optional duration.
Input/Output Schemas
- Text-to-Speech Conversion
  - Input: text (string), voice_id (string), model_id (string), output_format (string)
  - Output: base64_audio (string)
- Generate Sound Effects
  - Input: prompt (string), duration_seconds (optional float between 0.5 and 22)
  - Output: base64_audio (string)
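Both operations return audio as a base64 string, so a caller usually has to decode it before playing or storing it. The following is a minimal sketch using only the Python standard library; the helper name save_base64_audio and the output path are hypothetical and not part of the integration.

```python
import base64
from pathlib import Path


def save_base64_audio(base64_audio: str, path: str) -> int:
    """Decode a base64 audio string, write it to `path`, and return the byte count.

    Hypothetical helper for illustration; it is not part of the integration.
    """
    # validate=True raises an error on characters outside the base64 alphabet
    # instead of silently discarding them.
    audio_bytes = base64.b64decode(base64_audio, validate=True)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

The file extension should match the output_format requested for the conversion.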
Limitations
- The duration of a generated sound effect must be between 0.5 and 22 seconds.
- Some output formats may require a specific account tier or configuration.
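The duration limit can be checked on the client before a request is sent. The sketch below assumes the 0.5 and 22 second bounds are inclusive, which the text above does not state explicitly; validate_duration is a hypothetical helper name.

```python
from typing import Optional

MIN_DURATION_SECONDS = 0.5
MAX_DURATION_SECONDS = 22.0


def validate_duration(duration_seconds: Optional[float]) -> Optional[float]:
    """Return the duration unchanged if it is acceptable, otherwise raise ValueError.

    Hypothetical helper; bounds are assumed to be inclusive.
    """
    if duration_seconds is None:
        # duration_seconds is optional, so None means "no duration specified".
        return None
    if not MIN_DURATION_SECONDS <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration_seconds must be between {MIN_DURATION_SECONDS} and "
            f"{MAX_DURATION_SECONDS}, got {duration_seconds}"
        )
    return duration_seconds
```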
Setup & Configuration
Prerequisites
- An Eleven Labs account is required.
- Obtain an API key from the Eleven Labs platform.
Authentication
- Set the API key in the ELEVEN_LABS_API_KEY environment variable.
- Example: export ELEVEN_LABS_API_KEY="YOUR_API_KEY"
Step-by-Step Guide
- Acquire your API key by logging into Eleven Labs and navigating to API settings.
- Configure your environment with the API key: export ELEVEN_LABS_API_KEY="YOUR_API_KEY".
- Optionally, set the target directory for storing audio outputs.
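A missing key is easier to diagnose if the application checks for it at startup. This is a minimal sketch of such a check; load_api_key is a hypothetical helper, and only the variable name ELEVEN_LABS_API_KEY comes from the steps above.

```python
import os


def load_api_key(env_var: str = "ELEVEN_LABS_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing.

    Hypothetical helper for illustration; it is not part of the integration.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; export it before starting the agent.")
    return api_key
```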
Testing Connection
- Call the get_voices function; a successful response confirms that the API key is valid and the API is reachable.
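If the toolkit is not yet installed, connectivity can also be checked directly against the REST API. The sketch below does not use get_voices; it assumes the public voices endpoint at https://api.elevenlabs.io/v1/voices and the xi-api-key request header, and the function names are hypothetical.

```python
import urllib.request

# Assumed endpoint of the public REST API; not taken from the integration itself.
VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the voices list."""
    return urllib.request.Request(
        VOICES_URL, headers={"xi-api-key": api_key}, method="GET"
    )


def can_reach_api(api_key: str, timeout: float = 10.0) -> bool:
    """Return True if the voices endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(build_voices_request(api_key), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses,
        # so network failures and rejected keys both end up here.
        return False
```

Returning False on any failure keeps the check simple; a real application might log the HTTP status to tell an invalid key apart from a network problem.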
How to Use in Agents
- Example snippet for adding the toolkit to your agent:
```python
eleven_labs_tools = ElevenLabsTools(voice_id="your_voice_id", api_key="your_api_key")
agent.register_toolkit(eleven_labs_tools)
```
- Use the text_to_speech or generate_sound_effect methods to process text inputs.
Best Practices
- Keep each text input reasonably short to limit processing time and maintain audio quality.
- Choose the audio format and file size to suit your application and the limits of your tier.
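One way to keep inputs short is to split long text at sentence boundaries and convert each piece separately. The following is a sketch of that idea; split_text is a hypothetical helper, and the 500-character default is an arbitrary illustration, not a documented limit.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 500) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper; the default max_chars is illustrative only.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than max_chars is cut at fixed offsets.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to a separate text-to-speech call and the resulting audio segments played or joined in order.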
Reference Section
- See the official Eleven Labs API documentation for full endpoint details.
- The output_format parameter accepts only the audio formats that Eleven Labs defines; check the API documentation for the supported values.