Eleven Labs Integration

What the Integration Does

The Eleven Labs integration provides AI-based text-to-speech and sound effect generation, turning text input into audio. It is mainly used to create dynamic audio content and personalized voiceovers, and to add audio to interactive applications.

Example Scenarios

  • Convert customer support text into friendly, human-like spoken responses to improve user engagement.
  • Generate sound effects dynamically based on user input in gaming applications.
  • Create personalized voice messages or audio articles from written content.

Capabilities

What the Integration Enables

  1. List Available Voices
    • Retrieve a detailed list of voice options including IDs, names, and descriptions.
  2. Text-to-Speech Conversion
    • Convert text into speech audio using specified voice settings.
  3. Generate Sound Effects
    • Create unique sound effects based on provided text prompts and optional duration.

Input/Output Schemas

  • Text-to-Speech Conversion
    • Input: text (string), voice_id (string), model_id (string), output_format (string)
    • Output: base64_audio (string)
  • Generate Sound Effects
    • Input: prompt (string), duration_seconds (optional float between 0.5 and 22)
    • Output: base64_audio (string)
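
Both operations return audio as a base64 string, so a caller typically has to decode it before playing or storing it. A minimal sketch of that step follows; the helper name save_base64_audio and the file path are hypothetical and not part of the integration.

```python
import base64
from pathlib import Path


def save_base64_audio(base64_audio: str, path: str) -> int:
    """Decode a base64 audio string and write the raw bytes to `path`.

    Hypothetical helper for illustration. Returns the number of bytes written.
    """
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    audio_bytes = base64.b64decode(base64_audio, validate=True)
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the target directory if needed
    out.write_bytes(audio_bytes)
    return len(audio_bytes)
```

The file extension should match the output_format that was requested; the helper writes the bytes unchanged and does not convert between formats.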

Limitations

  • Sound effect duration must be between 0.5 and 22 seconds.
  • Some output formats may require a specific Eleven Labs subscription tier or configuration.
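
The duration limit can be checked locally before a request is sent. The sketch below assumes the bounds are inclusive, which the text does not state explicitly; validate_duration is a hypothetical helper name.

```python
from typing import Optional

MIN_DURATION_SECONDS = 0.5
MAX_DURATION_SECONDS = 22.0


def validate_duration(duration_seconds: Optional[float]) -> Optional[float]:
    """Check a sound effect duration against the documented 0.5 to 22 second range.

    None is passed through because duration_seconds is optional.
    Bounds are treated as inclusive (an assumption).
    """
    if duration_seconds is None:
        return None
    if not MIN_DURATION_SECONDS <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration_seconds must be between {MIN_DURATION_SECONDS} "
            f"and {MAX_DURATION_SECONDS}, got {duration_seconds}"
        )
    return float(duration_seconds)
```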

Setup & Configuration

Prerequisites

  • An Eleven Labs account is required.
  • Obtain an API key from the Eleven Labs platform.

Authentication

  • Set the API key in the ELEVEN_LABS_API_KEY environment variable.
    • Example: export ELEVEN_LABS_API_KEY="YOUR_API_KEY"
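
Application code can read the key from the environment and fail early with a clear message when it is missing. This is a minimal sketch; load_api_key is a hypothetical helper, and only the variable name ELEVEN_LABS_API_KEY comes from this guide.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return the Eleven Labs API key from the environment, or raise if it is unset.

    `env` defaults to os.environ and can be replaced with a plain dict in tests.
    """
    api_key = env.get("ELEVEN_LABS_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "ELEVEN_LABS_API_KEY is not set; export it before starting the agent."
        )
    return api_key
```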

Step-by-Step Guide

  1. Acquire your API key by logging into Eleven Labs and navigating to API settings.
  2. Configure your environment with the API key: export ELEVEN_LABS_API_KEY="YOUR_API_KEY".
  3. Optionally, set the target directory for storing audio outputs.

Testing Connection

  • Call the get_voices function; a response that lists voices confirms that the API key is valid and the API is reachable.
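
If the toolkit is not yet wired into an agent, the same check can be approximated by requesting the voice list directly over HTTP. The sketch below does not use the toolkit's get_voices function. It assumes the public Eleven Labs REST endpoint /v1/voices, the xi-api-key header, and a JSON response with a top-level "voices" list; confirm all three against the official API documentation before relying on them.

```python
import json
import urllib.request

# Assumed public endpoint for listing voices; verify against the official docs.
VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build a GET request for the voice list, authenticated by API key header."""
    return urllib.request.Request(
        VOICES_URL,
        headers={"xi-api-key": api_key, "Accept": "application/json"},
    )


def summarize_voices(payload: dict) -> list:
    """Reduce the response to (voice_id, name) pairs; assumes a 'voices' list."""
    return [(v.get("voice_id"), v.get("name")) for v in payload.get("voices", [])]


def check_connection(api_key: str, timeout: float = 10.0) -> list:
    """Perform the request. Raises urllib.error.HTTPError on a rejected key."""
    with urllib.request.urlopen(build_voices_request(api_key), timeout=timeout) as resp:
        return summarize_voices(json.load(resp))
```

Calling check_connection with a valid key should return a non-empty list; an HTTP error usually points to a missing or invalid key.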

How to Use in Agents

  • Example snippet for adding the toolkit to your agent:

```python
# Assumes ElevenLabsTools has been imported and `agent` already exists.
eleven_labs_tools = ElevenLabsTools(voice_id="your_voice_id", api_key="your_api_key")
agent.register_toolkit(eleven_labs_tools)
```

  • Use the text_to_speech or generate_sound_effect methods to turn text inputs into audio.

Best Practices

  • Keep each text input reasonably short; long passages take longer to process and can be split into several smaller requests.
  • Choose the audio format and file size to fit your application's needs and the limits of your subscription tier.
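
One way to keep inputs short is to split long text at sentence boundaries before sending each piece for conversion. The sketch below is illustrative only: split_text is a hypothetical helper, and the 500-character default is an arbitrary choice, not a documented Eleven Labs limit.

```python
import re


def split_text(text: str, max_chars: int = 500) -> list:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut at fixed positions.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the current chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be converted separately and the resulting audio clips played or stored in order.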

Reference Section

  • The official API documentation is provided by Eleven Labs.
  • The supported audio output formats are listed in the Eleven Labs API documentation.