Embeddings
Embedding models enable developers to convert text into high-dimensional vector representations. These embeddings capture semantic meaning and can be used for semantic search, clustering, recommendation systems, and other machine learning applications.
Overview
The Embeddings API allows you to generate vector representations of text. The API is OpenAI-compatible and supports single text strings or arrays of text strings. Embeddings are useful for tasks such as semantic search, text similarity, clustering, and classification.
Use Cases
Embeddings enable powerful applications, including:

- **Semantic search**: document search, product search, content recommendation, question answering
- **Clustering and classification**: content categorization, customer feedback analysis, anomaly detection, topic modeling
- **Recommendation systems**: content recommendations, user matching, cross-domain recommendations, personalization
- **Multilingual applications**: cross-lingual search, translation quality assessment, multilingual clustering, language-agnostic systems
Quick Start
Endpoint
POST https://api.inference.nebul.io/v1/embeddings
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | String | Yes | The model ID to use (e.g., `intfloat/multilingual-e5-large-instruct`). |
| `input` | String or Array | Yes | Input text to embed. Can be a single string or an array of strings. |
| `encoding_format` | String | No | The format to return the embeddings in. Options: `"float"` (default) or `"base64"`. |
| `dimensions` | Integer | No | The number of dimensions the resulting output embeddings should have. Only supported by some models. |
| `user` | String | No | A unique identifier representing your end-user. |
| `normalize` | Boolean | No | Whether to normalize the embeddings. Defaults to `false`. |
| `embed_dtype` | String | No | The data type for embeddings. Options: `"float32"` (default) or `"float16"`. |
| `add_special_tokens` | Boolean | No | Whether to add special tokens. Defaults to `true`. |
Code Examples
- Python
- cURL
Single text embedding:
```python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "The quick brown fox jumps over the lazy dog."
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding = data["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```
Multiple text embeddings:
```python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": [
        "The weather is nice today.",
        "It's a beautiful sunny day.",
        "I love programming in Python."
    ]
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

for i, item in enumerate(data["data"]):
    print(f"Text {i+1}: {len(item['embedding'])} dimensions")
```
With normalization:
```python
import requests
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Semantic search with embeddings",
    "normalize": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding = np.array(data["data"][0]["embedding"])

# Normalized embeddings have unit length
print(f"Embedding norm: {np.linalg.norm(embedding):.6f}")
```
Single text embedding:
```bash
curl -X POST https://api.inference.nebul.io/v1/embeddings \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "The quick brown fox jumps over the lazy dog."
  }'
```
Multiple text embeddings:
```bash
curl -X POST https://api.inference.nebul.io/v1/embeddings \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": [
      "The weather is nice today.",
      "It'\''s a beautiful sunny day.",
      "I love programming in Python."
    ]
  }'
```
With normalization:
```bash
curl -X POST https://api.inference.nebul.io/v1/embeddings \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Semantic search with embeddings",
    "normalize": true
  }'
```
**Batch Processing**: The API supports batch processing of multiple texts in a single request, which is more efficient than making separate requests for each text. However, be mindful of token limits and request size.
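The batching advice above can be sketched as a small helper that splits a corpus into request-sized chunks before calling the endpoint. The batch size of 4 is purely illustrative; tune it to your token and request-size limits:

```python
# Sketch: split a large corpus into batches before calling the embeddings API.
# The batch size here is illustrative, not a provider recommendation.

def batched(texts, batch_size):
    """Yield successive slices of `texts` of at most `batch_size` items."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

corpus = [f"Document {i}" for i in range(10)]
batches = list(batched(corpus, batch_size=4))
print([len(b) for b in batches])  # → [4, 4, 2]
```

Each batch can then be sent as the `input` array of one request, and the results concatenated in order.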
**Normalization**: When using embeddings for similarity calculations, consider setting `normalize: true` so that all embeddings have unit length. For unit-length vectors, cosine similarity reduces to a simple dot product, which makes similarity calculations more efficient.
Response Format
The API returns a JSON object in OpenAI-compatible format:
{"object": "list","data": [{"object": "embedding","embedding": [0.0123, -0.0456, 0.0789, ...],"index": 0}],"model": "intfloat/multilingual-e5-large-instruct","usage": {"prompt_tokens": 10,"total_tokens": 10}}
Response Fields
| Field | Type | Description |
|---|---|---|
| `object` | String | Always `"list"` for embeddings responses. |
| `data` | Array | List of embedding objects. |
| `data[].object` | String | Always `"embedding"`. |
| `data[].embedding` | Array | The embedding vector as an array of floats (or a base64 string when `encoding_format` is `"base64"`). |
| `data[].index` | Integer | The index of the embedding in the input array. |
| `model` | String | The model ID used for generating embeddings. |
| `usage` | Object | Token usage information. |
| `usage.prompt_tokens` | Integer | Number of tokens in the input. |
| `usage.total_tokens` | Integer | Total tokens processed. |
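Because each embedding carries an `index` field, results can always be mapped back to their position in the input array. A minimal sketch, using a mocked response payload in place of a live API call:

```python
# Sketch: use the `index` field to restore input order. The `response` dict
# below is a mocked payload in the documented shape, not a live API result.

response = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.1, 0.2], "index": 1},
        {"object": "embedding", "embedding": [0.3, 0.4], "index": 0},
    ],
    "model": "intfloat/multilingual-e5-large-instruct",
    "usage": {"prompt_tokens": 10, "total_tokens": 10},
}

# Sort by index so embeddings line up with the original input array.
ordered = [item["embedding"]
           for item in sorted(response["data"], key=lambda d: d["index"])]
print(ordered)  # → [[0.3, 0.4], [0.1, 0.2]]
```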
Advanced Usage
Semantic Similarity
Calculate similarity between texts using cosine similarity:
- Python
- cURL
```python
import requests
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Get embeddings for two texts
texts = [
    "The cat sat on the mat",
    "A feline rested on the rug"
]
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": texts,
    "normalize": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

embedding1 = np.array(data["data"][0]["embedding"])
embedding2 = np.array(data["data"][1]["embedding"])

# Cosine similarity (a plain dot product, since the embeddings are normalized)
similarity = np.dot(embedding1, embedding2)
print(f"Semantic similarity: {similarity:.4f}")
```
```bash
curl -X POST https://api.inference.nebul.io/v1/embeddings \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": ["The cat sat on the mat", "A feline rested on the rug"],
    "normalize": true
  }'
```
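The same idea extends to semantic search: with unit-length vectors, ranking documents against a query is just a dot product per document. The vectors and document names below are made up for illustration; in practice they would come from the embeddings endpoint:

```python
import math

# Sketch: rank documents against a query by dot product of unit-length
# vectors (equal to cosine similarity). The 2-D vectors and names below
# are mocked; real embeddings come from the /v1/embeddings endpoint.

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

docs = {"doc_a": [1.0, 0.0], "doc_b": [0.6, 0.8], "doc_c": [0.0, 1.0]}
query = normalize([1.0, 0.2])

scores = {name: dot(normalize(vec), query) for name, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # → ['doc_a', 'doc_b', 'doc_c']
```

With `normalize: true` in the request, the `normalize` step here is already done server-side and only the dot products remain.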
Batch Processing
Process multiple texts efficiently:
- Python
- cURL
```python
import requests

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

# Large batch of texts
texts = [
    "Document 1 content...",
    "Document 2 content...",
    # ... many more documents
]
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": texts
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print(f"Processed {len(data['data'])} embeddings")
print(f"Total tokens: {data['usage']['total_tokens']}")
```
```bash
curl -X POST https://api.inference.nebul.io/v1/embeddings \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": [
      "Document 1 content...",
      "Document 2 content...",
      "Document 3 content..."
    ]
  }'
```
Base64 Encoding
Get embeddings in base64 format for efficient storage:
- Python
- cURL
```python
import requests
import base64
import numpy as np

url = "https://api.inference.nebul.io/v1/embeddings"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}
payload = {
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Text to embed",
    "encoding_format": "base64"
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

# Decode base64 embedding
base64_embedding = data["data"][0]["embedding"]
embedding_bytes = base64.b64decode(base64_embedding)
embedding = np.frombuffer(embedding_bytes, dtype=np.float32)
print(f"Decoded embedding shape: {embedding.shape}")
```
```bash
curl -X POST https://api.inference.nebul.io/v1/embeddings \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "intfloat/multilingual-e5-large-instruct",
    "input": "Text to embed",
    "encoding_format": "base64"
  }'
```
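If NumPy is not available, the standard-library `struct` module can decode the payload too. This sketch assumes the base64 string contains the raw little-endian bytes of the vector in the requested `embed_dtype`; verify that layout against actual API responses before relying on it:

```python
import base64
import struct

# Sketch: decode a base64 embedding with stdlib `struct`, matching the
# decode width to the requested embed_dtype. Assumes (unverified) that the
# payload is raw little-endian floats; "e" = float16, "f" = float32.

def decode_embedding(b64, dtype="float32"):
    raw = base64.b64decode(b64)
    fmt, width = {"float32": ("f", 4), "float16": ("e", 2)}[dtype]
    count = len(raw) // width
    return list(struct.unpack(f"<{count}{fmt}", raw))

# Round-trip a known float32 vector locally to show the decode works.
vec = [0.25, -0.5, 1.0]
b64 = base64.b64encode(struct.pack("<3f", *vec)).decode()
print(decode_embedding(b64))  # → [0.25, -0.5, 1.0]
```

Decoding with the wrong width silently yields a vector of the wrong dimension, so keep the `embed_dtype` you requested alongside any stored base64 embeddings.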
Model Specifications
The following embedding models are available:
- `BAAI/bge-m3`: 567M parameters, 8K context, float16 precision, supports Text
- `sentence-transformers/all-MiniLM-L6-v2`: 22M parameters, float16 precision, supports Text
- `intfloat/multilingual-e5-large-instruct`: 0.6B parameters, 8K context, bfloat16 precision, supports Text, Multilingual (100+ languages)
Best Practices
- **Normalization**: Use `normalize: true` when computing cosine similarity or distance metrics. Normalized embeddings have unit length, making similarity calculations more efficient.
- **Batch Processing**:
  - Process multiple texts in a single request for better efficiency
  - The API can handle arrays of texts efficiently
  - Consider batching when processing large datasets
- **Embedding Storage**:
  - Store embeddings in vector databases (e.g., Pinecone, Weaviate, Qdrant)
  - Use base64 encoding for compact storage if needed
  - Consider dimensionality reduction for very large embeddings
- **Similarity Metrics**:
  - Use cosine similarity or the dot product for normalized embeddings (at unit length the two are equivalent)
  - Use Euclidean distance for non-normalized embeddings
- **Model Selection**:
  - Choose models based on your language requirements
  - Multilingual models work well for cross-language applications
  - Consider model size vs. performance trade-offs
- **Error Handling**:
  - Handle rate limits gracefully
  - Implement retry logic for transient failures
  - Validate input text length (respect context limits)
- **Performance Optimization**:
  - Cache embeddings for frequently accessed texts
  - Batch requests to reduce API calls
  - Use async processing for large datasets
- **Quality Considerations**:
  - Test embedding quality on your specific use case
  - Tune similarity thresholds based on your data
  - Consider domain-specific models if available
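The retry advice above can be sketched as a small wrapper with exponential backoff. The attempt count, delays, and the use of `RuntimeError` as the transient-failure signal are all illustrative, not provider recommendations; in real code you would catch the specific rate-limit and server-error responses your HTTP client raises:

```python
import time

# Sketch: retry a callable with exponential backoff on transient failures
# (e.g. HTTP 429/5xx from the embeddings endpoint). Values are illustrative.

def with_retries(call, max_attempts=3, base_delay=0.01):
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

# Demo: a fake endpoint call that fails twice, then succeeds.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient error")
    return "ok"

result = with_retries(flaky)
print(result)  # → ok
```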