Generate intelligent responses using state-of-the-art language models including Llama, Mistral, and custom fine-tuned models available on the TensorOne platform.

Request Body

messages
array
required
Array of message objects representing the conversation history. Each message has a role (system, user, or assistant) and a content string
model
string
default:"llama-3.1-70b"
Model to use for completion. Available models:
  • llama-3.1-70b (Recommended)
  • llama-3.1-8b
  • mistral-7b
  • mixtral-8x7b
  • Custom fine-tuned models
maxTokens
integer
default: 1000
Maximum number of tokens to generate in the response
temperature
number
default: 0.7
Controls randomness: 0.0 is effectively deterministic, while 1.0 produces highly varied, creative output
topP
number
default: 0.9
Controls diversity via nucleus sampling: only tokens within the top topP cumulative probability mass are considered
stream
boolean
default: false
Whether to stream the response as it’s generated
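
Put together, a request body that sets every documented field explicitly looks like this (the values shown are simply the defaults listed above):

{
  "messages": [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
  ],
  "model": "llama-3.1-70b",
  "maxTokens": 1000,
  "temperature": 0.7,
  "topP": 0.9,
  "stream": false
}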

Response

choices
array
Array of completion choices; each choice contains a message (role and content) and a finishReason
usage
object
Token usage statistics: promptTokens, completionTokens, and totalTokens

Example

curl -X POST "https://api.tensorone.ai/v2/ai/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful AI assistant."
      },
      {
        "role": "user", 
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "model": "llama-3.1-70b",
    "maxTokens": 500,
    "temperature": 0.7
  }'

This returns:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Quantum computing is like having a super-powered calculator that works in a fundamentally different way than regular computers. While classical computers use bits that are either 0 or 1, quantum computers use quantum bits (qubits) that can be both 0 and 1 simultaneously through a property called superposition.\n\nThis allows quantum computers to explore many possible solutions at once, making them potentially much faster for specific types of problems like breaking encryption, simulating molecules, or optimizing complex systems.\n\nThink of it like this: if a classical computer is like checking each door in a maze one by one, a quantum computer is like a ghost that can walk through walls and check multiple paths simultaneously."
      },
      "finishReason": "stop"
    }
  ],
  "usage": {
    "promptTokens": 42,
    "completionTokens": 138,
    "totalTokens": 180
  }
}
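
The same request in Python, reading the generated text and token counts out of the response shape documented above (a minimal sketch; requests is the only dependency):

import requests

resp = requests.post(
    "https://api.tensorone.ai/v2/ai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms"}
        ],
        "model": "llama-3.1-70b",
        "maxTokens": 500,
        "temperature": 0.7
    },
)
resp.raise_for_status()
data = resp.json()

# The generated text and the token accounting follow the response fields above.
print(data["choices"][0]["message"]["content"])
print("total tokens:", data["usage"]["totalTokens"])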

Streaming

For real-time applications, you can stream responses as they’re generated:
import requests

# Ask for a streamed completion: "stream": True in the request body tells
# the API to send chunks as they are generated, and stream=True on the
# requests call keeps the connection open so they can be read incrementally.
response = requests.post(
    "https://api.tensorone.ai/v2/ai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)

# Print each chunk line as it arrives.
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
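
The wire format of the streamed chunks isn’t specified on this page; if it follows the common server-sent-events convention (each chunk on a "data: ..." line, with a "data: [DONE]" sentinel at the end — an assumption, not confirmed here), the loop can decode the lines like this:

import json

# Continues from the `response` object above. The "data: " prefix and the
# "[DONE]" sentinel are assumptions about the stream framing.
for line in response.iter_lines():
    if not line:
        continue
    text = line.decode("utf-8")
    if not text.startswith("data: "):
        continue
    payload = text[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    print(chunk)  # chunk shape is assumed to mirror the non-streaming response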

Best Practices

  • Use system messages to provide context and instructions
  • Adjust temperature based on your use case (lower for factual, higher for creative)
  • Monitor token usage to optimize costs
  • Implement retry logic for production applications (see the sketch after this list)
  • Use streaming for better user experience in chat applications
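
As a concrete starting point for the retry bullet above, here is a minimal sketch with exponential backoff. The policy — three attempts, retrying only on 429 and 5xx responses — is an assumption, not platform guidance, and chat_with_retry is a hypothetical helper:

import time

import requests

def chat_with_retry(payload, api_key, attempts=3):
    # Hypothetical helper: retries rate limits (429) and server errors (5xx)
    # with exponential backoff; other client errors are raised immediately.
    for attempt in range(attempts):
        resp = requests.post(
            "https://api.tensorone.ai/v2/ai/chat",
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
            timeout=60,
        )
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...
            continue
        resp.raise_for_status()  # surface any other 4xx immediately
        return resp.json()
    resp.raise_for_status()  # all attempts failed; raise the last error

result = chat_with_retry(
    {"messages": [{"role": "user", "content": "Hello"}], "model": "llama-3.1-70b"},
    "YOUR_API_KEY",
)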