
Embeddings

Embedding models enable developers to convert text into high-dimensional vector representations. These embeddings capture semantic meaning and can be used for semantic search, clustering, recommendation systems, and other machine learning applications.

Overview

The Embeddings API allows you to generate vector representations of text. The API is OpenAI-compatible and supports single text strings or arrays of text strings. Embeddings are useful for tasks such as semantic search, text similarity, clustering, and classification.

Use Cases

Embeddings enable powerful applications, including:

  • Semantic search: document search, product search, content recommendation, question answering
  • Clustering and classification: content categorization, customer feedback analysis, anomaly detection, topic modeling
  • Recommendation systems: content recommendations, user matching, cross-domain recommendations, personalization
  • Multilingual applications: cross-lingual search, translation quality assessment, multilingual clustering, language-agnostic systems

Quick Start

Endpoint

POST https://api.inference.nebul.io/v1/embeddings

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | The model ID to use (e.g., intfloat/multilingual-e5-large-instruct). |
| input | String or Array | Yes | Input text to embed. Can be a single string or an array of strings. |
| encoding_format | String | No | The format to return the embeddings in. Options: "float" (default) or "base64". |
| dimensions | Integer | No | The number of dimensions the resulting output embeddings should have. Only supported for some models. |
| user | String | No | A unique identifier representing your end-user. |
| normalize | Boolean | No | Whether to normalize the embeddings. Defaults to false. |
| embed_dtype | String | No | The data type for embeddings. Options: "float32" (default) or "float16". |
| add_special_tokens | Boolean | No | Whether to add special tokens. Defaults to true. |
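
The optional parameters can be combined in a single request. A minimal sketch, assuming the model supports the extensions listed above (dimensions in particular is only honored by some models, so it is left commented out here):

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Example sentence to embed",
    "encoding_format": "float",
    "normalize": True,          # return unit-length vectors
    "add_special_tokens": True,
    # "dimensions": 256,        # illustrative only; uncomment for models that support it
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")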

Code Examples

Single text embedding:

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "The quick brown fox jumps over the lazy dog."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding = data["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Multiple text embeddings:

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": [
        "The weather is nice today.",
        "It's a beautiful sunny day.",
        "I love programming in Python."
    ]
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

for i, item in enumerate(data["data"]):
    print(f"Text {i+1}: {len(item['embedding'])} dimensions")

With normalization:

python
import requests
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Semantic search with embeddings",
    "normalize": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding = np.array(data["data"][0]["embedding"])
# Normalized embeddings have unit length
print(f"Embedding norm: {np.linalg.norm(embedding):.6f}")

Tip

Batch Processing: The API supports batch processing of multiple texts in a single request. This is more efficient than making separate requests for each text. However, be mindful of token limits and request size.
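
A minimal sketch of client-side batching, assuming you want to cap each request at a fixed number of texts (the batch size of 64 is an arbitrary illustration, not a documented limit):

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

texts = [f"Document {i}" for i in range(500)]
batch_size = 64  # arbitrary example value, not a documented limit

all_embeddings = []
for start in range(0, len(texts), batch_size):
    batch = texts[start:start + batch_size]
    payload = {
        "model": "intfloat/multilingual-e5-large-instruct",
        "input": batch,
    }
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    # Results within a batch come back in input order.
    all_embeddings.extend(item["embedding"] for item in response.json()["data"])

print(f"Embedded {len(all_embeddings)} texts")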

Tip

Normalization: When using embeddings for similarity calculations, consider setting normalize: true to ensure all embeddings have unit length. This makes cosine similarity calculations more efficient and accurate.
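
If you do not set normalize: true, you can normalize on the client side instead. A small sketch (using toy vectors in place of real embeddings) showing that the dot product of unit-length vectors equals cosine similarity:

python
import numpy as np

# Toy vectors standing in for embeddings returned without `normalize: true`.
a = np.array([0.3, -1.2, 0.7])
b = np.array([0.1, -0.9, 1.1])

# Manual normalization to unit length.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# For unit-length vectors, the dot product is the cosine similarity.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(np.dot(a_unit, b_unit), cosine))  # True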

Response Format

The API returns a JSON object in OpenAI-compatible format:

json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0123, -0.0456, 0.0789, ...],
      "index": 0
    }
  ],
  "model": "intfloat/multilingual-e5-large-instruct",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}

Response Fields

| Field | Type | Description |
|---|---|---|
| object | String | Always "list" for embeddings responses. |
| data | Array | List of embedding objects. |
| data[].object | String | Always "embedding". |
| data[].embedding | Array | The embedding vector as an array of floats. |
| data[].index | Integer | The index of the embedding in the input array. |
| model | String | The model ID used for generating embeddings. |
| usage | Object | Token usage information. |
| usage.prompt_tokens | Integer | Number of tokens in the input. |
| usage.total_tokens | Integer | Total tokens processed. |
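
Because each embedding object carries an index field, results can be re-aligned with the original inputs. A minimal sketch of parsing the response:

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

texts = ["first sentence", "second sentence", "third sentence"]
payload = {"model": "intfloat/multilingual-e5-large-instruct", "input": texts}

data = requests.post(url, headers=headers, json=payload).json()

# Map each input text to its embedding via the `index` field.
embeddings = {item["index"]: item["embedding"] for item in data["data"]}
for i, text in enumerate(texts):
    print(f"{text} -> {len(embeddings[i])} dimensions")

print(f"Tokens used: {data['usage']['total_tokens']}")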

Advanced Usage

Semantic Similarity

Calculate similarity between texts using cosine similarity:

python
import requests
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Get embeddings for two texts
texts = [
    "The cat sat on the mat",
    "A feline rested on the rug"
]
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": texts,
    "normalize": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding1 = np.array(data["data"][0]["embedding"])
embedding2 = np.array(data["data"][1]["embedding"])

# For normalized (unit-length) embeddings, the dot product equals cosine similarity
similarity = np.dot(embedding1, embedding2)
print(f"Semantic similarity: {similarity:.4f}")

Batch Processing

Process multiple texts efficiently:

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Large batch of texts
texts = [
    "Document 1 content...",
    "Document 2 content...",
    # ... many more documents
]
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": texts
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print(f"Processed {len(data['data'])} embeddings")
print(f"Total tokens: {data['usage']['total_tokens']}")

Base64 Encoding

Get embeddings in base64 format for efficient storage:

python
import requests
import base64
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Text to embed",
    "encoding_format": "base64"
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

# Decode base64 embedding
base64_embedding = data["data"][0]["embedding"]
embedding_bytes = base64.b64decode(base64_embedding)
embedding = np.frombuffer(embedding_bytes, dtype=np.float32)
print(f"Decoded embedding shape: {embedding.shape}")

Model Specifications

The following embedding models are available:

  • BAAI/bge-m3 - 567M parameters, 8K context, float16 precision, supports Text
  • sentence-transformers/all-MiniLM-L6-v2 - 22M parameters, float16 precision, supports Text
  • intfloat/multilingual-e5-large-instruct - 0.6B parameters, 8K context, bfloat16 precision, supports Text, Multilingual (100+ languages)
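
Since the API is OpenAI-compatible, available model IDs can typically be discovered programmatically. A sketch assuming the standard /v1/models listing endpoint is exposed:

python
import requests

# Assumes the OpenAI-compatible model listing endpoint is available.
url = "https://api.inference.nebul.io/v1/models"
headers = {"Authorization": "Bearer <YOUR_API_KEY>"}

response = requests.get(url, headers=headers)
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])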

Best Practices

  1. Normalization: Use normalize: true when computing cosine similarity or distance metrics. Normalized embeddings have unit length, making similarity calculations more efficient.

  2. Batch Processing:

    • Process multiple texts in a single request for better efficiency
    • The API can handle arrays of texts efficiently
    • Consider batching when processing large datasets
  3. Embedding Storage:

    • Store embeddings in vector databases (e.g., Pinecone, Weaviate, Qdrant)
    • Use base64 encoding for compact storage if needed
    • Consider dimensionality reduction for very large embeddings
  4. Similarity Metrics:

    • Use cosine similarity for normalized embeddings
    • Use dot product for normalized embeddings (equivalent to cosine similarity)
    • Use Euclidean distance for non-normalized embeddings
  5. Model Selection:

    • Choose models based on your language requirements
    • Multilingual models work well for cross-language applications
    • Consider model size vs. performance trade-offs
  6. Error Handling:

    • Handle rate limits gracefully
    • Implement retry logic for transient failures (see the sketch after this list)
    • Validate input text length (respect context limits)
  7. Performance Optimization:

    • Cache embeddings for frequently accessed texts
    • Batch requests to reduce API calls
    • Use async processing for large datasets
  8. Quality Considerations:

    • Test embedding quality on your specific use case
    • Fine-tune similarity thresholds based on your data
    • Consider domain-specific models if available
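
For the error-handling practice above, a minimal retry sketch with exponential backoff (the retried status codes and attempt count are illustrative assumptions, not documented guarantees):

python
import time
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Text to embed",
}

def embed_with_retries(max_attempts=5):
    # Retry on rate limits (429) and transient server errors (5xx) with exponential backoff.
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        if response.status_code == 429 or response.status_code >= 500:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        response.raise_for_status()
        return response.json()["data"][0]["embedding"]
    raise RuntimeError("Embedding request failed after retries")

embedding = embed_with_retries()
print(f"Embedding dimension: {len(embedding)}")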