
Embeddings

Embedding models enable developers to convert text into high-dimensional vector representations. These embeddings capture semantic meaning and can be used for semantic search, clustering, recommendation systems, and other machine learning applications.

Overview

The Embeddings API allows you to generate vector representations of text. The API is OpenAI-compatible and supports single text strings or arrays of text strings. Embeddings are useful for tasks such as semantic search, text similarity, clustering, and classification.

Use Cases

Embeddings enable powerful applications, including:

  • Semantic search: document search, product search, content recommendation, question answering
  • Clustering and classification: content categorization, customer feedback analysis, anomaly detection, topic modeling
  • Recommendation systems: content recommendations, user matching, cross-domain recommendations, personalization
  • Multilingual applications: cross-lingual search, translation quality assessment, multilingual clustering, language-agnostic systems

Quick Start

Endpoint

POST https://api.inference.nebul.io/v1/embeddings

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | The model ID to use (e.g., intfloat/multilingual-e5-large-instruct). |
| input | String or Array | Yes | Input text to embed. Can be a single string or an array of strings. |
| encoding_format | String | No | The format to return the embeddings in. Options: "float" (default) or "base64". |
| dimensions | Integer | No | The number of dimensions the resulting output embeddings should have. Only supported for some models. |
| user | String | No | A unique identifier representing your end-user. |
| normalize | Boolean | No | Whether to normalize the embeddings. Defaults to false. |
| embed_dtype | String | No | The data type for embeddings. Options: "float32" (default) or "float16". |
| add_special_tokens | Boolean | No | Whether to add special tokens. Defaults to true. |
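
The optional parameters can be combined in a single request. A minimal sketch, assuming the model supports the extensions listed above (dimensions in particular is only honored by some models, so it is left commented out here):

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Example sentence to embed",
    "encoding_format": "float",
    "normalize": True,          # return unit-length vectors
    "add_special_tokens": True,
    # "dimensions": 256,        # illustrative only; uncomment for models that support it
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
embedding = response.json()["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")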

Code Examples

Single text embedding:

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "The quick brown fox jumps over the lazy dog."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding = data["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Multiple text embeddings:

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": [
        "The weather is nice today.",
        "It's a beautiful sunny day.",
        "I love programming in Python."
    ]
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

for i, item in enumerate(data["data"]):
    print(f"Text {i+1}: {len(item['embedding'])} dimensions")

With normalization:

python
import requests
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Semantic search with embeddings",
    "normalize": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding = np.array(data["data"][0]["embedding"])
# Normalized embeddings have unit length
print(f"Embedding norm: {np.linalg.norm(embedding):.6f}")

Tip

Batch Processing: The API supports batch processing of multiple texts in a single request. This is more efficient than making separate requests for each text. However, be mindful of token limits and request size.
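
A minimal sketch of client-side batching, assuming you want to cap each request at a fixed number of texts (the batch size of 64 is an arbitrary illustration, not a documented limit):

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

texts = [f"Document {i}" for i in range(500)]
batch_size = 64  # arbitrary example value, not a documented limit

all_embeddings = []
for start in range(0, len(texts), batch_size):
    batch = texts[start:start + batch_size]
    payload = {
        "model": "intfloat/multilingual-e5-large-instruct",
        "input": batch,
    }
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    # Results within a batch come back in input order.
    all_embeddings.extend(item["embedding"] for item in response.json()["data"])

print(f"Embedded {len(all_embeddings)} texts")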

Tip

Normalization: When using embeddings for similarity calculations, consider setting normalize: true to ensure all embeddings have unit length. This makes cosine similarity calculations more efficient and accurate.
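
If you do not set normalize: true, you can normalize on the client side instead. A small sketch (using toy vectors in place of real embeddings) showing that the dot product of unit-length vectors equals cosine similarity:

python
import numpy as np

# Toy vectors standing in for embeddings returned without `normalize: true`.
a = np.array([0.3, -1.2, 0.7])
b = np.array([0.1, -0.9, 1.1])

# Manual normalization to unit length.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# For unit-length vectors, the dot product is the cosine similarity.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(np.dot(a_unit, b_unit), cosine))  # True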

Response Format

The API returns a JSON object in OpenAI-compatible format:

json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0123, -0.0456, 0.0789, ...],
      "index": 0
    }
  ],
  "model": "intfloat/multilingual-e5-large-instruct",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}

Response Fields

| Field | Type | Description |
|---|---|---|
| object | String | Always "list" for embeddings responses. |
| data | Array | List of embedding objects. |
| data[].object | String | Always "embedding". |
| data[].embedding | Array | The embedding vector as an array of floats. |
| data[].index | Integer | The index of the embedding in the input array. |
| model | String | The model ID used for generating embeddings. |
| usage | Object | Token usage information. |
| usage.prompt_tokens | Integer | Number of tokens in the input. |
| usage.total_tokens | Integer | Total tokens processed. |
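
Because each embedding object carries an index field, results can be re-aligned with the original inputs. A minimal sketch of parsing the response:

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

texts = ["first sentence", "second sentence", "third sentence"]
payload = {"model": "intfloat/multilingual-e5-large-instruct", "input": texts}

data = requests.post(url, headers=headers, json=payload).json()

# Map each input text to its embedding via the `index` field.
embeddings = {item["index"]: item["embedding"] for item in data["data"]}
for i, text in enumerate(texts):
    print(f"{text} -> {len(embeddings[i])} dimensions")

print(f"Tokens used: {data['usage']['total_tokens']}")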

Advanced Usage

Semantic Similarity

Calculate similarity between texts using cosine similarity:

python
import requests
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Get embeddings for two texts
texts = [
    "The cat sat on the mat",
    "A feline rested on the rug"
]
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": texts,
    "normalize": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding1 = np.array(data["data"][0]["embedding"])
embedding2 = np.array(data["data"][1]["embedding"])

# For normalized (unit-length) embeddings, the dot product equals cosine similarity
similarity = np.dot(embedding1, embedding2)
print(f"Semantic similarity: {similarity:.4f}")

Batch Processing

Process multiple texts efficiently:

python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Large batch of texts
texts = [
    "Document 1 content...",
    "Document 2 content...",
    # ... many more documents
]
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": texts
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print(f"Processed {len(data['data'])} embeddings")
print(f"Total tokens: {data['usage']['total_tokens']}")

Base64 Encoding

Get embeddings in base64 format for efficient storage:

python
import requests
import base64
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Text to embed",
    "encoding_format": "base64"
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

# Decode base64 embedding
base64_embedding = data["data"][0]["embedding"]
embedding_bytes = base64.b64decode(base64_embedding)
embedding = np.frombuffer(embedding_bytes, dtype=np.float32)
print(f"Decoded embedding shape: {embedding.shape}")

Model Specifications

The following embedding models are available:

  • BAAI/bge-m3 - 567M parameters, 8K context, float16 precision, supports Text
  • sentence-transformers/all-MiniLM-L6-v2 - 22M parameters, float16 precision, supports Text
  • intfloat/multilingual-e5-large-instruct - 0.6B parameters, 8K context, bfloat16 precision, supports Text, Multilingual (100+ languages)
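
Since the API is OpenAI-compatible, available model IDs can typically be discovered programmatically. A sketch assuming the standard /v1/models listing endpoint is exposed:

python
import requests

# Assumes the OpenAI-compatible model listing endpoint is available.
url = "https://api.inference.nebul.io/v1/models"
headers = {"Authorization": "Bearer <YOUR_API_KEY>"}

response = requests.get(url, headers=headers)
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])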

Best Practices

  1. Normalization: Use normalize: true when computing cosine similarity or distance metrics. Normalized embeddings have unit length, making similarity calculations more efficient.

  2. Batch Processing:

    • Process multiple texts in a single request for better efficiency
    • The API can handle arrays of texts efficiently
    • Consider batching when processing large datasets
  3. Embedding Storage:

    • Store embeddings in vector databases (e.g., Pinecone, Weaviate, Qdrant)
    • Use base64 encoding for compact storage if needed
    • Consider dimensionality reduction for very large embeddings
  4. Similarity Metrics:

    • Use cosine similarity for normalized embeddings
    • Use dot product for normalized embeddings (equivalent to cosine similarity)
    • Use Euclidean distance for non-normalized embeddings
  5. Model Selection:

    • Choose models based on your language requirements
    • Multilingual models work well for cross-language applications
    • Consider model size vs. performance trade-offs
  6. Error Handling:

    • Handle rate limits gracefully
    • Implement retry logic for transient failures (see the sketch after this list)
    • Validate input text length (respect context limits)
  7. Performance Optimization:

    • Cache embeddings for frequently accessed texts
    • Batch requests to reduce API calls
    • Use async processing for large datasets
  8. Quality Considerations:

    • Test embedding quality on your specific use case
    • Fine-tune similarity thresholds based on your data
    • Consider domain-specific models if available
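
For the error-handling practice above, a minimal retry sketch with exponential backoff (the retried status codes and attempt count are illustrative assumptions, not documented guarantees):

python
import time
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Text to embed",
}

def embed_with_retries(max_attempts=5):
    # Retry on rate limits (429) and transient server errors (5xx) with exponential backoff.
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        if response.status_code == 429 or response.status_code >= 500:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        response.raise_for_status()
        return response.json()["data"][0]["embedding"]
    raise RuntimeError("Embedding request failed after retries")

embedding = embed_with_retries()
print(f"Embedding dimension: {len(embedding)}")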