Create realistic voice clones from audio samples and use them to generate natural-sounding speech. Typical uses include content creation, dubbing, personalized assistants, and audiobook production.

Voice Cloning Process

Voice cloning involves two main steps:
  1. Voice Training: Upload audio samples to create a custom voice model
  2. Speech Generation: Use the cloned voice to generate speech from text

Create Voice Clone

Request Body

  • voiceName (string, required): Unique name for the voice clone, used to refer to it later
  • audioSamples (array, required): Audio samples to train on. In the examples below, each entry is an object with either an audioUrl or an audioBase64 field, plus a transcript
  • language (string, default: "en"): Primary language of the voice samples:
    • en - English
    • es - Spanish
    • fr - French
    • de - German
    • it - Italian
    • pt - Portuguese
    • ja - Japanese
    • ko - Korean
    • zh - Chinese
  • voiceType (string, default: "standard"): Voice cloning quality level:
    • instant - Fast cloning from 10-30 seconds of audio
    • standard - Balanced quality, requires 1-3 minutes of audio
    • premium - Highest quality, requires 5-10 minutes of audio
    • professional - Studio quality, requires 15+ minutes of clean audio
  • trainingOptions (object): Advanced training configuration. The examples below set enhanceQuality, removeNoise, and normalizeVolume
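
The voiceType tiers above each state how much audio they need, so it can help to pick a tier from the total duration of your samples before sending a request. The sketch below is one possible client-side helper, not part of the API: the function name and the rule of falling back to the lower tier when a duration sits between two listed ranges are assumptions.

```python
from typing import Optional

# Minimum total audio (in seconds) per tier, taken from the lower bounds
# listed for voiceType. Treating them as hard minimums and falling back to
# the lower tier for in-between durations is an assumption of this sketch.
TIER_MINIMUM_SECONDS = [
    ("professional", 15 * 60),
    ("premium", 5 * 60),
    ("standard", 60),
    ("instant", 10),
]


def pick_voice_type(total_seconds: float) -> Optional[str]:
    """Return the highest tier whose minimum duration is met, or None."""
    for tier, minimum in TIER_MINIMUM_SECONDS:
        if total_seconds >= minimum:
            return tier
    return None


# 180.5 seconds is the totalDuration used in the example response.
print(pick_voice_type(180.5))
```

A return value of None means the samples are shorter than the shortest listed requirement, so more audio is probably needed before training.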

Response

  • voiceId (string): Unique identifier for the created voice clone
  • status (string): Training status: processing, completed, failed
  • voiceName (string): Name of the voice clone
  • trainingProgress (object): Training progress information
  • audioAnalysis (object): Analysis of the provided audio samples

Example: Create Voice Clone

cURL
curl -X POST "https://api.tensorone.ai/v2/ai/voice-cloning" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voiceName": "MyCustomVoice",
    "audioSamples": [
      {
        "audioUrl": "https://example.com/sample1.wav",
        "transcript": "Hello, this is a sample of my voice for cloning purposes."
      },
      {
        "audioUrl": "https://example.com/sample2.wav", 
        "transcript": "I am providing multiple samples to improve the quality of the voice clone."
      }
    ],
    "language": "en",
    "voiceType": "standard",
    "trainingOptions": {
      "enhanceQuality": true,
      "removeNoise": true,
      "normalizeVolume": true
    }
  }'
Python
import requests
import base64
import time

def create_voice_clone(voice_name, audio_files_with_transcripts):
    # Prepare audio samples
    audio_samples = []
    
    for audio_file, transcript in audio_files_with_transcripts:
        # Read and encode audio file
        with open(audio_file, 'rb') as f:
            audio_data = base64.b64encode(f.read()).decode('utf-8')
        
        audio_samples.append({
            "audioBase64": audio_data,
            "transcript": transcript
        })
    
    # Create voice clone
    response = requests.post(
        "https://api.tensorone.ai/v2/ai/voice-cloning",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json"
        },
        json={
            "voiceName": voice_name,
            "audioSamples": audio_samples,
            "language": "en",
            "voiceType": "standard",
            "trainingOptions": {
                "enhanceQuality": True,
                "removeNoise": True,
                "normalizeVolume": True,
                "speakerConsistency": 0.85
            }
        }
    )
    
    # Raise on HTTP errors so a failed request is not parsed as a result
    response.raise_for_status()
    return response.json()

# Create voice clone
audio_files = [
    ("sample1.wav", "Hello, this is my voice speaking clearly."),
    ("sample2.wav", "I'm providing samples for voice cloning."),
    ("sample3.wav", "The more samples I provide, the better the clone will be.")
]

result = create_voice_clone("JohnDoeVoice", audio_files)
voice_id = result['voiceId']

print(f"Voice cloning started: {voice_id}")
print(f"Audio quality: {result['audioAnalysis']['audioQuality']}")
print(f"Total duration: {result['audioAnalysis']['totalDuration']} seconds")

# Monitor training progress
while True:
    status_response = requests.get(
        f"https://api.tensorone.ai/v2/ai/voice-cloning/{voice_id}/status",
        headers={"Authorization": "Bearer YOUR_API_KEY"}
    )
    
    status = status_response.json()
    
    if status['status'] == 'completed':
        print("Voice cloning completed!")
        break
    elif status['status'] == 'failed':
        print(f"Voice cloning failed: {status.get('error')}")
        break
    else:
        progress = status['trainingProgress']
        print(f"Status: {progress['currentStep']} - {progress['percentComplete']:.1f}%")
        time.sleep(30)
JavaScript
const fs = require('fs');

async function createVoiceClone(voiceName, audioFiles) {
  // Prepare audio samples
  const audioSamples = [];
  
  for (const { filePath, transcript } of audioFiles) {
    const audioBuffer = fs.readFileSync(filePath);
    const audioBase64 = audioBuffer.toString('base64');
    
    audioSamples.push({
      audioBase64,
      transcript
    });
  }
  
  const response = await fetch('https://api.tensorone.ai/v2/ai/voice-cloning', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      voiceName,
      audioSamples,
      language: 'en',
      voiceType: 'standard'
    })
  });
  
  return await response.json();
}

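// Poll the status endpoint until training completes or fails
async function monitorTraining(voiceId, intervalMs = 30000) {
  while (true) {
    const response = await fetch(
      `https://api.tensorone.ai/v2/ai/voice-cloning/${voiceId}/status`,
      { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
    );
    const status = await response.json();

    if (status.status === 'completed') return status;
    if (status.status === 'failed') {
      throw new Error(`Voice cloning failed: ${status.error}`);
    }

    const { currentStep, percentComplete } = status.trainingProgress;
    console.log(`Status: ${currentStep} - ${percentComplete.toFixed(1)}%`);
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Usage
const audioFiles = [
  { filePath: 'sample1.wav', transcript: 'Hello, my name is Sarah.' },
  { filePath: 'sample2.wav', transcript: 'I am creating a voice clone.' }
];

createVoiceClone('SarahVoice', audioFiles)
  .then(result => {
    console.log('Voice cloning started:', result.voiceId);
    return monitorTraining(result.voiceId);
  })
  .then(() => {
    console.log('Voice clone ready!');
  })
  .catch(error => {
    console.error(error.message);
  });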
{
  "voiceId": "voice_abc123_def456",
  "status": "processing",
  "voiceName": "MyCustomVoice",
  "trainingProgress": {
    "currentStep": "preprocessing",
    "percentComplete": 15.0,
    "estimatedTimeRemaining": "8-12 minutes"
  },
  "audioAnalysis": {
    "totalDuration": 180.5,
    "audioQuality": "good",
    "speakerConsistency": 0.87,
    "languageDetected": "en"
  },
  "createdAt": "2024-01-16T16:00:00Z"
}
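
The status field in the response moves from processing to either completed or failed, and a polling loop that never gives up can hang if training stalls. The sketch below shows one way to add a deadline. The helper name, the 45-minute default timeout, and the injectable sleep and clock parameters are assumptions made for illustration and testing; only the three status values come from this page.

```python
import time
from typing import Callable, Dict


def wait_for_training(
    fetch_status: Callable[[], Dict],
    poll_interval: float = 30.0,
    timeout: float = 45 * 60,  # assumed upper bound, not an API guarantee
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until status is 'completed' or 'failed'; raise TimeoutError otherwise."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        # Stop if the next poll would happen after the deadline.
        if clock() + poll_interval > deadline:
            raise TimeoutError("voice training did not finish before the timeout")
        sleep(poll_interval)


# Demo with canned responses instead of real HTTP calls.
_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed"},
])
final = wait_for_training(lambda: next(_responses), poll_interval=0.0, timeout=5.0)
print(final["status"])
```

In real use, fetch_status would wrap the GET request to the status endpoint shown in the Python example and return its parsed JSON.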

Generate Speech with Cloned Voice

Once your voice clone is ready, use it to generate speech:

Request Body

  • voiceId (string, required): ID of the voice clone to use. The examples below pass it in the URL path rather than in the JSON body
  • text (string, required): Text to convert to speech (max 5000 characters)
  • speed (number, default: 1.0): Speech speed multiplier (0.5 to 2.0)
  • pitch (number, default: 0.0): Pitch adjustment in semitones (-12 to +12)
  • emotion (string, default: "neutral"): Emotional tone: neutral, happy, sad, angry, excited, calm
  • outputFormat (string, default: "mp3"): Audio output format: mp3, wav, flac, ogg
  • sampleRate (integer, default: 44100): Audio sample rate in Hz: 22050, 44100, 48000
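
The ranges and allowed values listed above can be checked on the client before a request is sent, which avoids a round trip for obviously invalid input. The following is a minimal sketch under that assumption. The function name and the wording of the messages are hypothetical, and the server may apply additional rules that are not documented here. voiceId is not checked because the examples pass it in the URL.

```python
from typing import Any, Dict, List

# Allowed values copied from the parameter descriptions above.
ALLOWED_EMOTIONS = {"neutral", "happy", "sad", "angry", "excited", "calm"}
ALLOWED_FORMATS = {"mp3", "wav", "flac", "ogg"}
ALLOWED_SAMPLE_RATES = {22050, 44100, 48000}
MAX_TEXT_CHARS = 5000


def validate_speech_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    text = body.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    # Missing optional fields fall back to the documented defaults.
    if not 0.5 <= body.get("speed", 1.0) <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")
    if not -12 <= body.get("pitch", 0.0) <= 12:
        problems.append("pitch must be between -12 and 12 semitones")
    if body.get("emotion", "neutral") not in ALLOWED_EMOTIONS:
        problems.append("unknown emotion")
    if body.get("outputFormat", "mp3") not in ALLOWED_FORMATS:
        problems.append("unknown outputFormat")
    if body.get("sampleRate", 44100) not in ALLOWED_SAMPLE_RATES:
        problems.append("unsupported sampleRate")
    return problems


print(validate_speech_request({"text": "Hello!", "speed": 3.0}))
```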

Example: Generate Speech

cURL
curl -X POST "https://api.tensorone.ai/v2/ai/voice-cloning/voice_abc123_def456/speak" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! This is my cloned voice speaking. It sounds just like the original, doesn'\''t it?",
    "speed": 1.0,
    "pitch": 0.0,
    "emotion": "happy",
    "outputFormat": "mp3"
  }'
Python
def generate_speech(voice_id, text, emotion="neutral"):
    response = requests.post(
        f"https://api.tensorone.ai/v2/ai/voice-cloning/{voice_id}/speak",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json"
        },
        json={
            "text": text,
            "speed": 1.0,
            "pitch": 0.0,
            "emotion": emotion,
            "outputFormat": "mp3",
            "sampleRate": 44100
        }
    )
    
    return response.json()

# Generate speech with cloned voice
speech_result = generate_speech(
    "voice_abc123_def456",
    "Welcome to our podcast! Today we'll be discussing the latest in AI technology.",
    emotion="excited"
)

print(f"Audio URL: {speech_result['audioUrl']}")

# Download the generated audio
audio_response = requests.get(speech_result['audioUrl'])
with open('generated_speech.mp3', 'wb') as f:
    f.write(audio_response.content)

print("Speech generated and saved!")
{
  "audioUrl": "https://voices.tensorone.ai/generated/speech_abc123.mp3",
  "duration": 12.5,
  "text": "Hello! This is my cloned voice speaking.",
  "voiceId": "voice_abc123_def456",
  "format": "mp3",
  "sampleRate": 44100,
  "generatedAt": "2024-01-16T16:15:00Z"
}

Voice Management

List Your Voice Clones

cURL
curl -X GET "https://api.tensorone.ai/v2/ai/voice-cloning/voices" \
  -H "Authorization: Bearer YOUR_API_KEY"
Python
# List all voice clones
response = requests.get(
    "https://api.tensorone.ai/v2/ai/voice-cloning/voices",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

voices = response.json()
print(f"Found {len(voices['voices'])} voice clones:")

for voice in voices['voices']:
    print(f"- {voice['voiceName']} ({voice['voiceId']}) - {voice['status']}")
    print(f"  Language: {voice['language']}, Quality: {voice['audioQuality']}")

Delete Voice Clone

cURL
curl -X DELETE "https://api.tensorone.ai/v2/ai/voice-cloning/voice_abc123_def456" \
  -H "Authorization: Bearer YOUR_API_KEY"
Python
# Delete a voice clone
response = requests.delete(
    f"https://api.tensorone.ai/v2/ai/voice-cloning/{voice_id}",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

if response.status_code == 204:
    print("Voice clone deleted successfully")

Advanced Features

Multilingual Voice Cloning

Create voices that can speak multiple languages:
multilingual_voice = requests.post(
    "https://api.tensorone.ai/v2/ai/voice-cloning",
    # Same authentication header as the other requests
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "voiceName": "MultilingualSpeaker",
        "audioSamples": [
            {
                "audioUrl": "https://example.com/english_sample.wav",
                "transcript": "Hello, how are you?",
                "language": "en"
            },
            {
                "audioUrl": "https://example.com/spanish_sample.wav",
                "transcript": "Hola, ¿cómo estás?",
                "language": "es"
            }
        ],
        "voiceType": "premium",
        "multilingualSupport": True
    }
)

Emotional Range Training

Train voices with emotional variety:
emotional_voice = requests.post(
    "https://api.tensorone.ai/v2/ai/voice-cloning",
    # Same authentication header as the other requests
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "voiceName": "EmotionalNarrator",
        "audioSamples": [
            {
                "audioUrl": "https://example.com/happy_sample.wav",
                "transcript": "I'm so excited about this!",
                "emotion": "happy"
            },
            {
                "audioUrl": "https://example.com/sad_sample.wav",
                "transcript": "This is very disappointing.",
                "emotion": "sad"
            },
            {
                "audioUrl": "https://example.com/neutral_sample.wav",
                "transcript": "This is a neutral statement.",
                "emotion": "neutral"
            }
        ],
        "voiceType": "professional",
        "emotionalRange": True
    }
)

Batch Speech Generation

Generate multiple audio files at once:
def batch_generate_speech(voice_id, texts):
    batch_request = {
        "voiceId": voice_id,
        "texts": [
            {"text": text, "filename": f"speech_{i}.mp3"}
            for i, text in enumerate(texts)
        ],
        "outputFormat": "mp3"
    }
    
    response = requests.post(
        "https://api.tensorone.ai/v2/ai/voice-cloning/batch-speak",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json=batch_request
    )
    
    return response.json()

# Generate multiple speeches
texts = [
    "Welcome to chapter one of our audiobook.",
    "In this chapter, we explore the basics.",
    "Let's begin our journey together."
]

batch_result = batch_generate_speech("voice_abc123_def456", texts)
print(f"Batch job started: {batch_result['batchId']}")
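
Long source material such as a book chapter usually has to be split before it can be passed as the texts of a batch. The sketch below splits text at sentence boundaries into chunks of at most 5000 characters, the limit stated for the single-request text field. Applying the same limit to each batch item is an assumption, and split_text is a hypothetical helper name.

```python
import re
from typing import List

MAX_TEXT_CHARS = 5000  # limit stated for the single-request "text" field


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit, so close the current chunk first.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", limit=10))
```

The resulting list can be passed as the texts argument of batch_generate_speech.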

Quality Guidelines

Audio Sample Requirements

  • Duration: Minimum 30 seconds per sample, 5+ minutes total recommended
  • Quality: Clear, noise-free audio (preferably studio quality)
  • Consistency: Same speaker, environment, and recording conditions
  • Format: WAV or FLAC preferred, MP3 acceptable
  • Sample Rate: 44.1kHz or 48kHz recommended
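
For WAV samples, the duration and sample rate requirements above can be checked locally before upload. The sketch below uses Python's standard wave module. The function name and the returned keys are hypothetical, and the check only covers uncompressed WAV files, not FLAC or MP3.

```python
import os
import tempfile
import wave

RECOMMENDED_SAMPLE_RATES = {44100, 48000}  # from the guidelines above
MIN_SAMPLE_SECONDS = 30.0                  # per-sample minimum from the guidelines


def inspect_wav(path: str) -> dict:
    """Read duration and sample rate from a WAV file and compare them to the guidelines."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / float(sample_rate)
    return {
        "duration": duration,
        "sampleRate": sample_rate,
        "longEnough": duration >= MIN_SAMPLE_SECONDS,
        "recommendedRate": sample_rate in RECOMMENDED_SAMPLE_RATES,
    }


# Demo: write one second of 16-bit mono silence at 44.1 kHz, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(b"\x00\x00" * 44100)

info = inspect_wav(demo_path)
print(info)
```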

Best Practices

  • Multiple Samples: Use 3-10 different recordings for better quality
  • Varied Content: Include different speaking styles and emotional tones
  • Clean Audio: Remove background noise, echo, and artifacts
  • Consistent Volume: Normalize audio levels across samples
  • Transcripts: Always provide accurate transcripts for better results
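
Two of the practices above, the number of recordings and the presence of transcripts, are easy to check on the audioSamples list before a request is built. This is a minimal sketch with a hypothetical helper name. It only warns and does not reflect any validation the API itself performs.

```python
from typing import Dict, List


def check_sample_set(samples: List[Dict]) -> List[str]:
    """Return warnings for a list of audioSamples entries; empty means no concerns."""
    warnings = []
    if not 3 <= len(samples) <= 10:
        warnings.append(
            f"{len(samples)} samples provided; 3-10 recordings are recommended"
        )
    for index, sample in enumerate(samples):
        # Each entry is expected to look like the objects in the examples.
        if not str(sample.get("transcript", "")).strip():
            warnings.append(f"sample {index} has no transcript")
        if not (sample.get("audioUrl") or sample.get("audioBase64")):
            warnings.append(f"sample {index} has no audioUrl or audioBase64")
    return warnings


samples = [
    {"audioUrl": "https://example.com/sample1.wav", "transcript": "Hello."},
    {"audioUrl": "https://example.com/sample2.wav", "transcript": ""},
]
for warning in check_sample_set(samples):
    print(warning)
```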

Use Cases

Content Creation

  • Podcasts: Create consistent voice for regular episodes
  • Audiobooks: Narrate books with custom voices
  • YouTube: Generate voiceovers for video content
  • Advertising: Create brand-specific voice for commercials

Personalization

  • Virtual Assistants: Custom voice for AI assistants
  • Gaming: Character voices for video games
  • Apps: Personalized voice notifications
  • Accessibility: Voice restoration for medical conditions

Business Applications

  • Call Centers: Consistent brand voice for automated systems
  • Training: Corporate training materials with specific voices
  • Dubbing: Localize content with cloned voices
  • Presentations: Professional narration for business content

Pricing

  • Voice Training:
    • Instant: $5 per voice clone
    • Standard: $15 per voice clone
    • Premium: $50 per voice clone
    • Professional: $100 per voice clone
  • Speech Generation: $0.02 per minute of generated audio
  • Batch Processing: 20% discount for 100+ audio files
  • Commercial License: Additional licensing fees may apply
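
The prices above combine a one-time training fee with a per-minute generation rate, so a rough estimate is simple arithmetic. The sketch below assumes the 20% batch discount applies only to speech generation and starts at exactly 100 files. Neither point is stated on this page, and commercial licensing fees are left out.

```python
# Prices copied from the list above (USD).
TRAINING_FEES_USD = {
    "instant": 5.0,
    "standard": 15.0,
    "premium": 50.0,
    "professional": 100.0,
}
SPEECH_RATE_USD_PER_MINUTE = 0.02
BATCH_DISCOUNT = 0.20
BATCH_DISCOUNT_MIN_FILES = 100


def estimate_cost(voice_type: str, speech_minutes: float, batch_files: int = 0) -> float:
    """Estimate training plus generation cost in USD, rounded to cents."""
    speech_cost = speech_minutes * SPEECH_RATE_USD_PER_MINUTE
    # Assumption: the batch discount applies to speech generation only.
    if batch_files >= BATCH_DISCOUNT_MIN_FILES:
        speech_cost *= 1 - BATCH_DISCOUNT
    return round(TRAINING_FEES_USD[voice_type] + speech_cost, 2)


# A standard voice plus 10 hours (600 minutes) of generated audio.
print(estimate_cost("standard", 600))
```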

Limitations

  • Processing Time: 5-30 minutes depending on voice type and audio duration
  • Storage: Voice models stored for 90 days by default
  • Usage: Generated content must comply with platform terms of service
  • Languages: Quality varies by language, English has best support
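
Because voice models are kept for 90 days by default, it can be useful to track when a model is due to expire. The sketch below derives an expiry time from the createdAt value returned when the clone is created. Counting the 90 days from createdAt is an assumption, since this page does not say when the retention period starts, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # default retention stated above


def expires_at(created_at: str) -> datetime:
    """Return the time 90 days after a createdAt value such as '2024-01-16T16:00:00Z'."""
    created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return created + timedelta(days=RETENTION_DAYS)


def days_remaining(created_at: str, now: datetime) -> int:
    """Whole days left before the assumed expiry; negative once it has passed."""
    return (expires_at(created_at) - now).days


print(expires_at("2024-01-16T16:00:00Z").isoformat())
```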

Voice clones are stored securely and can only be accessed by your account. Voice models are automatically deleted after 90 days unless renewed.

Voice cloning should only be used with explicit consent from the original speaker. Misuse for impersonation or fraud is strictly prohibited and may result in account termination.