Create realistic voice clones from audio samples and use them to generate natural-sounding speech. Perfect for content creation, dubbing, personalized assistants, and audiobook production.
Voice Cloning Process
Voice cloning involves two main steps:
Voice Training : Upload audio samples to create a custom voice model
Speech Generation : Use the cloned voice to generate speech from text
Create Voice Clone
Request Body
voiceName : Unique name for the voice clone (used for future reference)
audioSamples : Array of audio samples, each given as a URL or as base64-encoded audio data
  audioUrl : URL to audio file (WAV, MP3, or FLAC)
  audioBase64 : Base64-encoded audio data (alternative to audioUrl)
  transcript : Text transcript of the audio (improves cloning quality)
  Audio duration in seconds (auto-detected if not provided)
language : Primary language of the voice samples:
  en - English
  es - Spanish
  fr - French
  de - German
  it - Italian
  pt - Portuguese
  ja - Japanese
  ko - Korean
  zh - Chinese
voiceType : Voice cloning quality level:
  instant - Fast cloning from 10-30 seconds of audio
  standard - Balanced quality, requires 1-3 minutes of audio
  premium - Highest quality, requires 5-10 minutes of audio
  professional - Studio quality, requires 15+ minutes of clean audio
trainingOptions : Advanced training configuration
  enhanceQuality : Apply audio enhancement during training
  removeNoise : Automatically remove background noise
  normalizeVolume : Normalize audio volume levels
  speakerConsistency : Consistency threshold for multiple samples (0.0 to 1.0)
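The request fields above can be assembled and checked locally before anything is sent. The following is a minimal sketch, not part of the API: `suggest_voice_type` and `build_clone_request` are hypothetical helpers, the field names are taken from the examples on this page, and two points are assumptions: that a duration falling between two documented tiers maps to the lower tier, and that each sample carries exactly one of `audioUrl` or `audioBase64`.

```python
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "it", "pt", "ja", "ko", "zh"}

# Minimum total audio (seconds) per tier, read from the lower bounds listed above.
VOICE_TYPE_MIN_SECONDS = [
    ("professional", 15 * 60),
    ("premium", 5 * 60),
    ("standard", 60),
    ("instant", 10),
]


def suggest_voice_type(total_seconds):
    """Return the highest tier whose documented minimum is met, or None."""
    for voice_type, minimum in VOICE_TYPE_MIN_SECONDS:
        if total_seconds >= minimum:
            return voice_type
    return None


def build_clone_request(voice_name, samples, language="en",
                        voice_type="standard", speaker_consistency=None):
    """Validate inputs locally and return the JSON body as a dict."""
    if not voice_name:
        raise ValueError("voiceName must not be empty")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if voice_type not in {name for name, _ in VOICE_TYPE_MIN_SECONDS}:
        raise ValueError(f"unknown voiceType: {voice_type!r}")
    for sample in samples:
        # Assumption: exactly one audio source per sample.
        if ("audioUrl" in sample) == ("audioBase64" in sample):
            raise ValueError("each sample needs audioUrl or audioBase64, not both")
    body = {
        "voiceName": voice_name,
        "audioSamples": list(samples),
        "language": language,
        "voiceType": voice_type,
    }
    if speaker_consistency is not None:
        if not 0.0 <= speaker_consistency <= 1.0:
            raise ValueError("speakerConsistency must be between 0.0 and 1.0")
        body["trainingOptions"] = {"speakerConsistency": speaker_consistency}
    return body
```

The returned dict can be passed as the `json` argument of `requests.post`. Failing early on an out-of-range value avoids uploading large base64 payloads for a request that would likely be rejected.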
Response
voiceId : Unique identifier for the created voice clone
status : Training status: processing, completed, failed
trainingProgress : Training progress information
  currentStep : Current training step: preprocessing, training, validation, completed
  percentComplete : Training completion percentage (0-100)
  estimatedTimeRemaining : Estimated time until completion
audioAnalysis : Analysis of the provided audio samples
  totalDuration : Total duration of all audio samples in seconds
  audioQuality : Overall audio quality: excellent, good, fair, poor
  speakerConsistency : Consistency score across samples (0.0 to 1.0)
  languageDetected : Detected primary language
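The response fields described above can be read into a small typed structure so that calling code does not index raw dictionaries. This is an illustrative sketch only: `CloneStatus` and `parse_clone_response` are hypothetical names, the keys (`voiceId`, `status`, `trainingProgress`, `currentStep`, `percentComplete`) come from the example response on this page, and treating `trainingProgress` as optional is an assumption.

```python
from dataclasses import dataclass

# Statuses after which no further progress is expected.
TERMINAL_STATUSES = {"completed", "failed"}


@dataclass
class CloneStatus:
    voice_id: str
    status: str
    step: str
    percent: float

    @property
    def is_terminal(self):
        """True once training has either completed or failed."""
        return self.status in TERMINAL_STATUSES


def parse_clone_response(payload):
    """Build a CloneStatus from the decoded JSON response."""
    # Assumption: trainingProgress may be absent, so fall back to defaults.
    progress = payload.get("trainingProgress", {})
    return CloneStatus(
        voice_id=payload["voiceId"],
        status=payload["status"],
        step=progress.get("currentStep", "unknown"),
        percent=float(progress.get("percentComplete", 0.0)),
    )
```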
Example: Create Voice Clone
curl -X POST "https://api.tensorone.ai/v2/ai/voice-cloning" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voiceName": "MyCustomVoice",
    "audioSamples": [
      {
        "audioUrl": "https://example.com/sample1.wav",
        "transcript": "Hello, this is a sample of my voice for cloning purposes."
      },
      {
        "audioUrl": "https://example.com/sample2.wav",
        "transcript": "I am providing multiple samples to improve the quality of the voice clone."
      }
    ],
    "language": "en",
    "voiceType": "standard",
    "trainingOptions": {
      "enhanceQuality": true,
      "removeNoise": true,
      "normalizeVolume": true
    }
  }'
import base64
import time

import requests


def create_voice_clone(voice_name, audio_files_with_transcripts):
    # Prepare audio samples
    audio_samples = []
    for audio_file, transcript in audio_files_with_transcripts:
        # Read and encode audio file
        with open(audio_file, 'rb') as f:
            audio_data = base64.b64encode(f.read()).decode('utf-8')
        audio_samples.append({
            "audioBase64": audio_data,
            "transcript": transcript
        })

    # Create voice clone
    response = requests.post(
        "https://api.tensorone.ai/v2/ai/voice-cloning",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json"
        },
        json={
            "voiceName": voice_name,
            "audioSamples": audio_samples,
            "language": "en",
            "voiceType": "standard",
            "trainingOptions": {
                "enhanceQuality": True,
                "removeNoise": True,
                "normalizeVolume": True,
                "speakerConsistency": 0.85
            }
        }
    )
    return response.json()


# Create voice clone
audio_files = [
    ("sample1.wav", "Hello, this is my voice speaking clearly."),
    ("sample2.wav", "I'm providing samples for voice cloning."),
    ("sample3.wav", "The more samples I provide, the better the clone will be.")
]
result = create_voice_clone("JohnDoeVoice", audio_files)
voice_id = result['voiceId']
print(f"Voice cloning started: {voice_id}")
print(f"Audio quality: {result['audioAnalysis']['audioQuality']}")
print(f"Total duration: {result['audioAnalysis']['totalDuration']} seconds")

# Monitor training progress, checking every 30 seconds
while True:
    status_response = requests.get(
        f"https://api.tensorone.ai/v2/ai/voice-cloning/{voice_id}/status",
        headers={"Authorization": "Bearer YOUR_API_KEY"}
    )
    status = status_response.json()
    if status['status'] == 'completed':
        print("Voice cloning completed!")
        break
    elif status['status'] == 'failed':
        print(f"Voice cloning failed: {status.get('error')}")
        break
    else:
        progress = status['trainingProgress']
        print(f"Status: {progress['currentStep']} - {progress['percentComplete']:.1f}%")
        time.sleep(30)
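// Uses the global fetch available in Node.js 18 and later
const fs = require('fs');

async function createVoiceClone(voiceName, audioFiles) {
  // Prepare audio samples
  const audioSamples = [];
  for (const { filePath, transcript } of audioFiles) {
    const audioBuffer = fs.readFileSync(filePath);
    const audioBase64 = audioBuffer.toString('base64');
    audioSamples.push({
      audioBase64,
      transcript
    });
  }

  const response = await fetch('https://api.tensorone.ai/v2/ai/voice-cloning', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      voiceName,
      audioSamples,
      language: 'en',
      voiceType: 'standard'
    })
  });
  return await response.json();
}

// Poll the status endpoint every 30 seconds until training completes or fails
async function monitorTraining(voiceId) {
  while (true) {
    const response = await fetch(
      `https://api.tensorone.ai/v2/ai/voice-cloning/${voiceId}/status`,
      { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
    );
    const status = await response.json();
    if (status.status === 'completed') return status;
    if (status.status === 'failed') {
      throw new Error(`Voice cloning failed: ${status.error}`);
    }
    await new Promise(resolve => setTimeout(resolve, 30000));
  }
}

// Usage
const audioFiles = [
  { filePath: 'sample1.wav', transcript: 'Hello, my name is Sarah.' },
  { filePath: 'sample2.wav', transcript: 'I am creating a voice clone.' }
];
createVoiceClone('SarahVoice', audioFiles)
  .then(result => {
    console.log('Voice cloning started:', result.voiceId);
    return monitorTraining(result.voiceId);
  })
  .then(() => {
    console.log('Voice clone ready!');
  })
  .catch(error => {
    console.error(error.message);
  });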
{
  "voiceId": "voice_abc123_def456",
  "status": "processing",
  "voiceName": "MyCustomVoice",
  "trainingProgress": {
    "currentStep": "preprocessing",
    "percentComplete": 15.0,
    "estimatedTimeRemaining": "8-12 minutes"
  },
  "audioAnalysis": {
    "totalDuration": 180.5,
    "audioQuality": "good",
    "speakerConsistency": 0.87,
    "languageDetected": "en"
  },
  "createdAt": "2024-01-16T16:00:00Z"
}
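Because the create call returns while training is still `processing`, a client has to poll until the status changes. The sketch below shows one way to bound that wait. `wait_for_clone` is a hypothetical helper, not part of the API; the 30-minute default timeout is an assumption based on the processing times listed under Limitations, and the `error` key is assumed from the Python example on this page. The status fetch is passed in as a callable so the loop can be exercised without network access.

```python
import time


def wait_for_clone(fetch_status, timeout_seconds=1800, interval_seconds=30,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until training completes, fails, or times out.

    fetch_status is any zero-argument callable that returns the status
    JSON as a dict, for example a wrapper around requests.get(...).json().
    """
    deadline = clock() + timeout_seconds
    while True:
        payload = fetch_status()
        if payload["status"] == "completed":
            return payload
        if payload["status"] == "failed":
            raise RuntimeError(f"voice cloning failed: {payload.get('error')}")
        # Still processing: give up once the deadline has passed.
        if clock() >= deadline:
            raise TimeoutError("voice cloning did not finish in time")
        sleep(interval_seconds)
```

In real use, `fetch_status` would wrap a GET to the status endpoint for a given voice ID; `sleep` and `clock` only need to be overridden in tests.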
Generate Speech with Cloned Voice
Once your voice clone is ready, use it to generate speech:
Request Body
voiceId : ID of the voice clone to use
text : Text to convert to speech (max 5000 characters)
speed : Speech speed multiplier (0.5 to 2.0)
pitch : Pitch adjustment in semitones (-12 to +12)
emotion : Emotional tone: neutral, happy, sad, angry, excited, calm
outputFormat : Audio output format: mp3, wav, flac, ogg
sampleRate : Audio sample rate in Hz: 22050, 44100, 48000
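The ranges and allowed values listed above lend themselves to a local check before the request is sent. The following is a sketch under stated assumptions: `build_speak_request` is a hypothetical helper, the JSON keys are those used in the examples on this page, and the defaults are simply the values those examples use, since the API's own defaults are not stated here.

```python
EMOTIONS = {"neutral", "happy", "sad", "angry", "excited", "calm"}
OUTPUT_FORMATS = {"mp3", "wav", "flac", "ogg"}
SAMPLE_RATES = {22050, 44100, 48000}
MAX_TEXT_CHARS = 5000


def build_speak_request(text, speed=1.0, pitch=0.0, emotion="neutral",
                        output_format="mp3", sample_rate=44100):
    """Check values against the documented ranges and return the JSON body."""
    errors = []
    if not text:
        errors.append("text must not be empty")
    elif len(text) > MAX_TEXT_CHARS:
        errors.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    if not 0.5 <= speed <= 2.0:
        errors.append("speed must be between 0.5 and 2.0")
    if not -12 <= pitch <= 12:
        errors.append("pitch must be between -12 and +12 semitones")
    if emotion not in EMOTIONS:
        errors.append(f"unknown emotion: {emotion!r}")
    if output_format not in OUTPUT_FORMATS:
        errors.append(f"unknown outputFormat: {output_format!r}")
    if sample_rate not in SAMPLE_RATES:
        errors.append(f"unsupported sampleRate: {sample_rate!r}")
    if errors:
        # Report every problem at once rather than stopping at the first.
        raise ValueError("; ".join(errors))
    return {
        "text": text,
        "speed": speed,
        "pitch": pitch,
        "emotion": emotion,
        "outputFormat": output_format,
        "sampleRate": sample_rate,
    }
```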
Example: Generate Speech
curl -X POST "https://api.tensorone.ai/v2/ai/voice-cloning/voice_abc123_def456/speak" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! This is my cloned voice speaking. It sounds just like the original, doesn'\''t it?",
    "speed": 1.0,
    "pitch": 0.0,
    "emotion": "happy",
    "outputFormat": "mp3"
  }'
import requests


def generate_speech(voice_id, text, emotion="neutral"):
    response = requests.post(
        f"https://api.tensorone.ai/v2/ai/voice-cloning/{voice_id}/speak",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json"
        },
        json={
            "text": text,
            "speed": 1.0,
            "pitch": 0.0,
            "emotion": emotion,
            "outputFormat": "mp3",
            "sampleRate": 44100
        }
    )
    return response.json()


# Generate speech with cloned voice
speech_result = generate_speech(
    "voice_abc123_def456",
    "Welcome to our podcast! Today we'll be discussing the latest in AI technology.",
    emotion="excited"
)
print(f"Audio URL: {speech_result['audioUrl']}")

# Download the generated audio
audio_response = requests.get(speech_result['audioUrl'])
with open('generated_speech.mp3', 'wb') as f:
    f.write(audio_response.content)
print("Speech generated and saved!")
Speech Generation Response
{
  "audioUrl": "https://voices.tensorone.ai/generated/speech_abc123.mp3",
  "duration": 12.5,
  "text": "Hello! This is my cloned voice speaking.",
  "voiceId": "voice_abc123_def456",
  "format": "mp3",
  "sampleRate": 44100,
  "generatedAt": "2024-01-16T16:15:00Z"
}
Voice Management
List Your Voice Clones
curl -X GET "https://api.tensorone.ai/v2/ai/voice-cloning/voices" \
-H "Authorization: Bearer YOUR_API_KEY"
import requests

# List all voice clones
response = requests.get(
    "https://api.tensorone.ai/v2/ai/voice-cloning/voices",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
voices = response.json()
print(f"Found {len(voices['voices'])} voice clones:")
for voice in voices['voices']:
    print(f"- {voice['voiceName']} ({voice['voiceId']}) - {voice['status']}")
    print(f"  Language: {voice['language']}, Quality: {voice['audioQuality']}")
Delete Voice Clone
curl -X DELETE "https://api.tensorone.ai/v2/ai/voice-cloning/voice_abc123_def456" \
-H "Authorization: Bearer YOUR_API_KEY"
import requests

# Delete a voice clone
voice_id = "voice_abc123_def456"
response = requests.delete(
    f"https://api.tensorone.ai/v2/ai/voice-cloning/{voice_id}",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
if response.status_code == 204:
    print("Voice clone deleted successfully")
Advanced Features
Multilingual Voice Cloning
Create voices that can speak multiple languages:
import requests

# One sample per language, each tagged with its language code
multilingual_voice = requests.post(
    "https://api.tensorone.ai/v2/ai/voice-cloning",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "voiceName": "MultilingualSpeaker",
        "audioSamples": [
            {
                "audioUrl": "english_sample.wav",
                "transcript": "Hello, how are you?",
                "language": "en"
            },
            {
                "audioUrl": "spanish_sample.wav",
                "transcript": "Hola, ¿cómo estás?",
                "language": "es"
            }
        ],
        "voiceType": "premium",
        "multilingualSupport": True
    }
)
Emotional Range Training
Train voices with emotional variety:
import requests

# One sample per emotion, each tagged with the emotion it demonstrates
emotional_voice = requests.post(
    "https://api.tensorone.ai/v2/ai/voice-cloning",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "voiceName": "EmotionalNarrator",
        "audioSamples": [
            {
                "audioUrl": "happy_sample.wav",
                "transcript": "I'm so excited about this!",
                "emotion": "happy"
            },
            {
                "audioUrl": "sad_sample.wav",
                "transcript": "This is very disappointing.",
                "emotion": "sad"
            },
            {
                "audioUrl": "neutral_sample.wav",
                "transcript": "This is a neutral statement.",
                "emotion": "neutral"
            }
        ],
        "voiceType": "professional",
        "emotionalRange": True
    }
)
Batch Speech Generation
Generate multiple audio files at once:
import requests


def batch_generate_speech(voice_id, texts):
    # One entry per text, each with its own output filename
    batch_request = {
        "voiceId": voice_id,
        "texts": [
            {"text": text, "filename": f"speech_{i}.mp3"}
            for i, text in enumerate(texts)
        ],
        "outputFormat": "mp3"
    }
    response = requests.post(
        "https://api.tensorone.ai/v2/ai/voice-cloning/batch-speak",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json=batch_request
    )
    return response.json()


# Generate multiple speeches
texts = [
    "Welcome to chapter one of our audiobook.",
    "In this chapter, we explore the basics.",
    "Let's begin our journey together."
]
batch_result = batch_generate_speech("voice_abc123_def456", texts)
print(f"Batch job started: {batch_result['batchId']}")
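Long-form material such as an audiobook chapter will usually exceed the 5000-character limit stated for speech requests, so it has to be split before it is batched. The sketch below is one way to do that. `split_text` and `build_batch_request` are hypothetical helpers; the payload shape mirrors the batch example above, and it is an assumption that the 5000-character limit also applies to each item in a batch.

```python
import re

MAX_TEXT_CHARS = 5000


def split_text(text, limit=MAX_TEXT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def build_batch_request(voice_id, text):
    """Return a batch payload with one entry per chunk of the input text."""
    return {
        "voiceId": voice_id,
        "texts": [
            {"text": chunk, "filename": f"speech_{i}.mp3"}
            for i, chunk in enumerate(split_text(text))
        ],
        "outputFormat": "mp3",
    }
```

Splitting at sentence boundaries keeps each generated file from starting or ending mid-sentence, which matters when the files are later joined.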
Quality Guidelines
Audio Sample Requirements
Duration : Minimum 30 seconds per sample, 5+ minutes total recommended
Quality : Clear, noise-free audio (preferably studio quality)
Consistency : Same speaker, environment, and recording conditions
Format : WAV or FLAC preferred, MP3 acceptable
Sample Rate : 44.1kHz or 48kHz recommended
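The duration and sample-rate recommendations above can be checked on a local file before upload. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only, so FLAC and MP3 samples would need a different tool. `inspect_wav` is a hypothetical helper, and treating the recommendations as warnings rather than hard failures is an assumption.

```python
import wave

RECOMMENDED_SAMPLE_RATES = {44100, 48000}
MIN_SAMPLE_SECONDS = 30


def inspect_wav(path):
    """Return duration, sample rate, and a list of warnings for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / float(sample_rate)
    warnings = []
    if duration < MIN_SAMPLE_SECONDS:
        warnings.append(
            f"sample is {duration:.1f}s, shorter than {MIN_SAMPLE_SECONDS}s"
        )
    if sample_rate not in RECOMMENDED_SAMPLE_RATES:
        warnings.append(f"sample rate {sample_rate} Hz is not 44.1 kHz or 48 kHz")
    return {"duration": duration, "sampleRate": sample_rate, "warnings": warnings}
```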
Best Practices
Multiple Samples : Use 3-10 different recordings for better quality
Varied Content : Include different speaking styles and emotional tones
Clean Audio : Remove background noise, echo, and artifacts
Consistent Volume : Normalize audio levels across samples
Transcripts : Always provide accurate transcripts for better results
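The practices above that concern the sample set as a whole (number of recordings, transcripts, total length) can also be verified before submitting. This is a rough sketch: `check_sample_set` is a hypothetical helper, the `duration` and `transcript` keys describe a local bookkeeping dict rather than an API payload, and the 300-second floor reflects the "5+ minutes total" recommendation on this page.

```python
MIN_RECORDINGS = 3
MAX_RECORDINGS = 10
RECOMMENDED_TOTAL_SECONDS = 300


def check_sample_set(samples):
    """Return a list of issues for a set of samples.

    Each sample is a dict with 'duration' (seconds) and 'transcript'.
    An empty list means the set follows the recommendations checked here.
    """
    issues = []
    if not MIN_RECORDINGS <= len(samples) <= MAX_RECORDINGS:
        issues.append(
            f"{len(samples)} recordings; {MIN_RECORDINGS}-{MAX_RECORDINGS} recommended"
        )
    missing = [i for i, s in enumerate(samples) if not s.get("transcript", "").strip()]
    if missing:
        issues.append(f"missing transcript for samples: {missing}")
    total = sum(s.get("duration", 0.0) for s in samples)
    if total < RECOMMENDED_TOTAL_SECONDS:
        issues.append(f"total audio is {total:.0f}s; 300s or more recommended")
    return issues
```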
Use Cases
Content Creation
Podcasts : Create consistent voice for regular episodes
Audiobooks : Narrate books with custom voices
YouTube : Generate voiceovers for video content
Advertising : Create brand-specific voice for commercials
Personalization
Virtual Assistants : Custom voice for AI assistants
Gaming : Character voices for video games
Apps : Personalized voice notifications
Accessibility : Voice restoration for medical conditions
Business Applications
Call Centers : Consistent brand voice for automated systems
Training : Corporate training materials with specific voices
Dubbing : Localize content with cloned voices
Presentations : Professional narration for business content
Pricing
Voice Training :
Instant: $5 per voice clone
Standard: $15 per voice clone
Premium: $50 per voice clone
Professional: $100 per voice clone
Speech Generation : $0.02 per minute of generated audio
Batch Processing : 20% discount for 100+ audio files
Commercial License : Additional licensing fees may apply
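The prices above combine into a simple estimate. The sketch below is illustrative arithmetic only: `estimate_cost` is a hypothetical helper, it assumes the 20% batch discount applies to the speech-generation charge and not to the training fee, and it ignores any commercial licensing fees.

```python
TRAINING_FEES_USD = {"instant": 5, "standard": 15, "premium": 50, "professional": 100}
SPEECH_USD_PER_MINUTE = 0.02
BATCH_DISCOUNT = 0.20
BATCH_DISCOUNT_MIN_FILES = 100


def estimate_cost(voice_type, audio_minutes, file_count=1):
    """Estimate total USD cost: one training fee plus generated audio."""
    training = TRAINING_FEES_USD[voice_type]
    speech = audio_minutes * SPEECH_USD_PER_MINUTE
    # Assumption: the batch discount reduces the speech charge only.
    if file_count >= BATCH_DISCOUNT_MIN_FILES:
        speech *= 1 - BATCH_DISCOUNT
    return round(training + speech, 2)
```

For example, a standard voice plus one hour of generated audio comes to 15 + 60 × 0.02 = 16.20 USD under these assumptions.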
Limitations
Processing Time : 5-30 minutes depending on voice type and audio duration
Storage : Voice models stored for 90 days by default
Usage : Generated content must comply with platform terms of service
Languages : Quality varies by language, English has best support
Voice clones are stored securely and can only be accessed by your account. Voice models are automatically deleted after 90 days unless renewed.
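Given the 90-day retention period, it can be useful to track when a voice model is due to be deleted. The sketch below computes that date from the `createdAt` timestamp format shown in the example response. `expires_at` and `days_remaining` are hypothetical helpers, and it is an assumption that the retention clock starts at `createdAt`.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90


def expires_at(created_at):
    """Return the expiry time for an ISO 8601 UTC string like '2024-01-16T16:00:00Z'."""
    created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ")
    return created.replace(tzinfo=timezone.utc) + timedelta(days=RETENTION_DAYS)


def days_remaining(created_at, now=None):
    """Return whole days left before expiry (negative once expired)."""
    now = now or datetime.now(timezone.utc)
    return (expires_at(created_at) - now).days
```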
Voice cloning should only be used with explicit consent from the original speaker. Misuse for impersonation or fraud is strictly prohibited and may result in account termination.