Generate intelligent responses using state-of-the-art language models including Llama, Mistral, and custom fine-tuned models available on the TensorOne platform.

Request Body

messages
array
required
Array of message objects representing the conversation history. Each message has a role (system, user, or assistant) and a content string
model
string
default:"llama-3.1-70b"
Model to use for completion. Available models:
  • llama-3.1-70b (Recommended)
  • llama-3.1-8b
  • mistral-7b
  • mixtral-8x7b
  • Custom fine-tuned models
maxTokens
integer
default: 1000
Maximum number of tokens to generate in the response
temperature
number
default: 0.7
Controls randomness: 0.0 is effectively deterministic, while 1.0 produces highly varied, creative output
topP
number
default: 0.9
Controls diversity via nucleus sampling: only tokens within the top topP cumulative probability mass are considered
stream
boolean
default: false
Whether to stream the response as it’s generated
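
Put together, a request body that sets every documented field explicitly looks like this (the values shown are simply the defaults listed above):

{
  "messages": [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
  ],
  "model": "llama-3.1-70b",
  "maxTokens": 1000,
  "temperature": 0.7,
  "topP": 0.9,
  "stream": false
}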

Response

choices
array
Array of completion choices; each choice contains a message (role and content) and a finishReason
usage
object
Token usage statistics: promptTokens, completionTokens, and totalTokens

Example

curl -X POST "https://api.tensorone.ai/v2/ai/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user", 
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "model": "llama-3.1-70b",
    "maxTokens": 500,
    "temperature": 0.7
  }'

This returns:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Quantum computing is like having a super-powered calculator that works in a fundamentally different way than regular computers. While classical computers use bits that are either 0 or 1, quantum computers use quantum bits (qubits) that can be both 0 and 1 simultaneously through a property called superposition.\n\nThis allows quantum computers to explore many possible solutions at once, making them potentially much faster for specific types of problems like breaking encryption, simulating molecules, or optimizing complex systems.\n\nThink of it like this: if a classical computer is like checking each door in a maze one by one, a quantum computer is like a ghost that can walk through walls and check multiple paths simultaneously."
      },
      "finishReason": "stop"
    }
  ],
  "usage": {
    "promptTokens": 42,
    "completionTokens": 138,
    "totalTokens": 180
  }
}
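
The same request in Python, reading the generated text and token counts out of the response shape documented above (a minimal sketch; requests is the only dependency):

import requests

resp = requests.post(
    "https://api.tensorone.ai/v2/ai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms"}
        ],
        "model": "llama-3.1-70b",
        "maxTokens": 500,
        "temperature": 0.7
    },
)
resp.raise_for_status()
data = resp.json()

# The generated text and the token accounting follow the response fields above.
print(data["choices"][0]["message"]["content"])
print("total tokens:", data["usage"]["totalTokens"])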

Streaming

For real-time applications, you can stream responses as they’re generated:
import requests

# Ask for a streamed completion: "stream": True in the request body tells
# the API to send chunks as they are generated, and stream=True on the
# requests call keeps the connection open so they can be read incrementally.
response = requests.post(
    "https://api.tensorone.ai/v2/ai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)

# Print each chunk line as it arrives.
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
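
The wire format of the streamed chunks isn’t specified on this page; if it follows the common server-sent-events convention (each chunk on a "data: ..." line, with a "data: [DONE]" sentinel at the end — an assumption, not confirmed here), the loop can decode the lines like this:

import json

# Continues from the `response` object above. The "data: " prefix and the
# "[DONE]" sentinel are assumptions about the stream framing.
for line in response.iter_lines():
    if not line:
        continue
    text = line.decode("utf-8")
    if not text.startswith("data: "):
        continue
    payload = text[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    print(chunk)  # chunk shape is assumed to mirror the non-streaming response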

Best Practices

  • Use system messages to provide context and instructions
  • Adjust temperature based on your use case (lower for factual, higher for creative)
  • Monitor token usage to optimize costs
  • Implement retry logic for production applications (see the sketch after this list)
  • Use streaming for better user experience in chat applications
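
As a concrete starting point for the retry bullet above, here is a minimal sketch with exponential backoff. The policy — three attempts, retrying only on 429 and 5xx responses — is an assumption, not platform guidance, and chat_with_retry is a hypothetical helper:

import time

import requests

def chat_with_retry(payload, api_key, attempts=3):
    # Hypothetical helper: retries rate limits (429) and server errors (5xx)
    # with exponential backoff; other client errors are raised immediately.
    for attempt in range(attempts):
        resp = requests.post(
            "https://api.tensorone.ai/v2/ai/chat",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
            timeout=60,
        )
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
            continue
        resp.raise_for_status()  # surface any other 4xx immediately
        return resp.json()
    resp.raise_for_status()  # all attempts failed; raise the last error

result = chat_with_retry(
    {"messages": [{"role": "user", "content": "Hello"}], "model": "llama-3.1-70b"},
    "YOUR_API_KEY",
)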