Reranking
This feature and its corresponding model are currently in preview mode. Implementation details may change before the official release. Access to the reranking model is available upon request.
Reranking models enable developers to score and reorder documents based on their relevance to a query. These examples will use the BAAI/bge-reranker-v2-m3 model, which is a state-of-the-art reranking model that supports multilingual text reranking.
Overview
The Reranking API allows you to score a list of documents against a query and return them in order of relevance. This is particularly useful for improving search results, filtering content, and enhancing retrieval-augmented generation (RAG) systems. The API is aligned with the Cohere rerank API specification.
Use Cases
Reranking improves search results (e-commerce product ranking, documentation prioritization, content discovery), enhances RAG systems (two-stage retrieval, context selection, answer quality improvement), and enables content filtering (recommendation systems, content moderation, personalization).
Quick Start
Endpoint
POST https://api.inference.nebul.io/v1/rerank
Alternatively, you can use the direct endpoint:
POST https://api.inference.nebul.io/rerank
Both endpoints behave identically.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | The model ID to use (e.g., BAAI/bge-reranker-v2-m3). |
| query | String | Yes | The search query to rank documents against. |
| documents | Array[String] | Yes | List of documents to rerank. |
| top_n | Integer | No | Number of top results to return. If not specified, all documents are returned sorted by relevance score. |
| return_documents | Boolean | No | Whether to return the original documents in the response. Defaults to false. |
| raw_scores | Boolean | No | Whether to return raw scores instead of normalized scores. Defaults to false. |
Code Examples
- Python
- cURL
Using the requests library:
```python
import requests

url = "https://api.inference.nebul.io/v1/rerank"

headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",
    "Content-Type": "application/json",
}

payload = {
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "What is machine learning?",
    "documents": [
        "Machine learning is a subset of artificial intelligence.",
        "The weather today is sunny and warm.",
        "Deep learning uses neural networks with multiple layers.",
        "Python is a popular programming language."
    ],
    "top_n": 3,
    "return_documents": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

for result in data["results"]:
    print(f"Score: {result['relevance_score']:.4f}")
    print(f"Document: {result['document']}")
    print(f"Index: {result['index']}\n")
```
```bash
curl -X POST https://api.inference.nebul.io/v1/rerank \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of artificial intelligence.",
      "The weather today is sunny and warm.",
      "Deep learning uses neural networks with multiple layers.",
      "Python is a popular programming language."
    ],
    "top_n": 3,
    "return_documents": true
  }'
```
Top N Selection: Use the top_n parameter to limit results to the most relevant documents. This reduces response size and improves performance when you only need the top results. If not specified, all documents are returned sorted by relevance score.
Score Interpretation: Relevance scores are normalized between 0 and 1, with higher scores indicating better relevance. Scores above 0.7 typically indicate strong relevance, while scores below 0.3 suggest weak relevance.
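As an illustration of the thresholds above, you can filter results locally after a rerank call. The response data below is a made-up sample in the shape described under Response Format; the 0.7 cutoff follows the guidance in this section.

```python
# Illustrative rerank response; field names follow the API's response format.
data = {
    "results": [
        {"index": 0, "relevance_score": 0.9876},
        {"index": 2, "relevance_score": 0.8543},
        {"index": 3, "relevance_score": 0.1234},
    ]
}

# Keep only strongly relevant documents (score above 0.7).
strong = [r for r in data["results"] if r["relevance_score"] > 0.7]
print([r["index"] for r in strong])  # → [0, 2]
```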
Response Format
The API returns a JSON object containing the reranked results:
```json
{
  "object": "rerank",
  "id": "rerank-12345",
  "model": "BAAI/bge-reranker-v2-m3",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9876,
      "document": "Machine learning is a subset of artificial intelligence."
    },
    {
      "index": 2,
      "relevance_score": 0.8543,
      "document": "Deep learning uses neural networks with multiple layers."
    },
    {
      "index": 3,
      "relevance_score": 0.1234,
      "document": "Python is a popular programming language."
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "total_tokens": 45
  },
  "created": 1731500000
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| object | String | Always "rerank" for reranking responses. |
| id | String | Unique identifier for the reranking request. |
| model | String | The model ID used for reranking. |
| results | Array | List of reranked documents, sorted by relevance score (highest first). |
| results[].index | Integer | Original index of the document in the input array. |
| results[].relevance_score | Number | Relevance score between the query and document (higher is more relevant). |
| results[].document | String | The document text (only included if return_documents is true). |
| usage | Object | Token usage information. |
| usage.prompt_tokens | Integer | Number of tokens in the input. |
| usage.total_tokens | Integer | Total tokens processed. |
| created | Integer | Unix timestamp of when the request was created. |
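Because each result carries the original index, you can recover the document text locally even when return_documents is false, which keeps responses small. A minimal sketch (the results list is illustrative):

```python
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "The weather today is sunny and warm.",
    "Deep learning uses neural networks with multiple layers.",
]

# Illustrative response with return_documents=false: no "document" field.
results = [
    {"index": 2, "relevance_score": 0.91},
    {"index": 0, "relevance_score": 0.88},
]

# Map each result back to the original document by its index.
ranked_docs = [documents[r["index"]] for r in results]
print(ranked_docs[0])  # → "Deep learning uses neural networks with multiple layers."
```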
Advanced Usage
Limiting Results with top_n
Use top_n to return only the most relevant documents:
- Python
- cURL
```python
payload = {
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "What is artificial intelligence?",
    "documents": [
        "AI is the simulation of human intelligence.",
        "The sky is blue.",
        "Machine learning enables AI systems to learn.",
        "Today is Monday.",
        "Neural networks are used in AI."
    ],
    "top_n": 2,
    "return_documents": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

print(f"Top {len(data['results'])} most relevant documents:")
for result in data["results"]:
    print(f"  - {result['document']} (score: {result['relevance_score']:.4f})")
```
```bash
curl -X POST https://api.inference.nebul.io/v1/rerank \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "What is artificial intelligence?",
    "documents": [
      "AI is the simulation of human intelligence.",
      "The sky is blue.",
      "Machine learning enables AI systems to learn.",
      "Today is Monday.",
      "Neural networks are used in AI."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```
Using Raw Scores
Get raw (unnormalized) scores instead of normalized relevance scores:
- Python
- cURL
```python
payload = {
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "Python programming",
    "documents": [
        "Python is a high-level programming language.",
        "Java is another programming language.",
        "Python supports multiple programming paradigms."
    ],
    "raw_scores": True,
    "return_documents": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

for result in data["results"]:
    print(f"Raw score: {result['relevance_score']}")
    print(f"Document: {result['document']}\n")
```
```bash
curl -X POST https://api.inference.nebul.io/v1/rerank \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "Python programming",
    "documents": [
      "Python is a high-level programming language.",
      "Java is another programming language.",
      "Python supports multiple programming paradigms."
    ],
    "raw_scores": true,
    "return_documents": true
  }'
```
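Cross-encoder rerankers in the bge family typically emit raw scores as unbounded logits. A common convention for normalizing such logits into the 0-1 range is the sigmoid function; note that this is an assumption about the relationship between the two score types, not something the API documentation above specifies.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to the (0, 1) range; a common normalization for reranker scores."""
    return 1.0 / (1.0 + math.exp(-x))

# A raw score of 0.0 maps to 0.5; large positive logits approach 1.
print(sigmoid(0.0))  # → 0.5
print(sigmoid(4.0))
```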
Multilingual Reranking
The model supports multilingual queries and documents:
- Python
- cURL
```python
# Dutch query with mixed-language documents
payload = {
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "Wat is kunstmatige intelligentie?",
    "documents": [
        "Artificial intelligence simulates human intelligence.",
        "Kunstmatige intelligentie simuleert menselijke intelligentie.",
        "L'intelligence artificielle simule l'intelligence humaine.",
        "The weather is nice today."
    ],
    "top_n": 2,
    "return_documents": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

for result in data["results"]:
    print(f"Score: {result['relevance_score']:.4f}")
    print(f"Document: {result['document']}\n")
```
```bash
curl -X POST https://api.inference.nebul.io/v1/rerank \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "Wat is kunstmatige intelligentie?",
    "documents": [
      "Artificial intelligence simulates human intelligence.",
      "Kunstmatige intelligentie simuleert menselijke intelligentie.",
      "L'\''intelligence artificielle simule l'\''intelligence humaine.",
      "The weather is nice today."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```
RAG Integration Example
Reranking is commonly used to improve retrieval-augmented generation (RAG) systems:
- Python
- cURL
```python
# Step 1: Retrieve candidate documents (e.g., from a vector database)
candidate_docs = [
    "Machine learning algorithms can be supervised or unsupervised.",
    "The capital of France is Paris.",
    "Neural networks consist of layers of interconnected nodes.",
    "Python has a large ecosystem of libraries for data science.",
    "Deep learning is a subset of machine learning."
]

# Step 2: Rerank documents based on the user query
user_query = "How do neural networks work?"

payload = {
    "model": "BAAI/bge-reranker-v2-m3",
    "query": user_query,
    "documents": candidate_docs,
    "top_n": 2,
    "return_documents": True
}

response = requests.post(url, headers=headers, json=payload)
data = response.json()

# Step 3: Use top-ranked documents for context
top_documents = [result["document"] for result in data["results"]]
context = "\n\n".join(top_documents)

print("Top relevant documents for context:")
for i, doc in enumerate(top_documents, 1):
    print(f"{i}. {doc}")
```
```bash
curl -X POST https://api.inference.nebul.io/v1/rerank \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "How do neural networks work?",
    "documents": [
      "Machine learning algorithms can be supervised or unsupervised.",
      "The capital of France is Paris.",
      "Neural networks consist of layers of interconnected nodes.",
      "Python has a large ecosystem of libraries for data science.",
      "Deep learning is a subset of machine learning."
    ],
    "top_n": 2,
    "return_documents": true
  }'
```
Model Specifications
The following reranking models are available:
- BAAI/bge-reranker-v2-m3: 568M parameters, 8K context window, float16 precision; supports multilingual text reranking in 100+ languages. (Preview)
Best Practices
- Batch Processing: Rerank multiple queries in separate requests rather than batching queries together.
- Document Length: Keep documents reasonably sized. Very long documents may be truncated or split.
- Top N Selection: Use top_n to limit results when you only need the most relevant documents, which can improve response time.
- Score Interpretation:
  - Normalized scores (default): typically range from 0 to 1, higher is more relevant
  - Raw scores: unnormalized; the range can vary depending on the model
- Multilingual Queries: The model handles multilingual queries well, but matching the query language to the document language can improve results.
- RAG Workflow: Combine reranking with vector search for best results:
  - Use vector search to retrieve a larger candidate set (e.g., 50-100 documents)
  - Rerank the candidates to get the top N most relevant (e.g., top 5-10)
  - Use the reranked results as context for your LLM
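The two-stage RAG workflow above can be sketched as a small function. Here, rerank_fn is a stand-in for a call to the rerank endpoint (as shown in the Quick Start); the stub below is a toy word-overlap scorer used purely for illustration, not a real reranking model.

```python
def two_stage_retrieve(query, candidates, rerank_fn, top_n=5):
    """Stage 2 of a RAG pipeline: rerank a candidate set from vector search.

    rerank_fn(query, documents, top_n) is expected to return results in the
    API's shape: a list of {"index": ..., "relevance_score": ...} dicts,
    sorted by relevance score (highest first).
    """
    results = rerank_fn(query, candidates, top_n)
    # Map result indices back to the candidate documents for LLM context.
    return [candidates[r["index"]] for r in results]

def stub_rerank(query, documents, top_n):
    """Toy scorer (word overlap) standing in for the rerank API call."""
    scored = [
        {"index": i,
         "relevance_score": float(len(set(query.lower().split()) & set(d.lower().split())))}
        for i, d in enumerate(documents)
    ]
    scored.sort(key=lambda r: r["relevance_score"], reverse=True)
    return scored[:top_n]

candidates = [
    "Neural networks consist of layers of interconnected nodes.",
    "The capital of France is Paris.",
]
top_docs = two_stage_retrieve("how do neural networks work", candidates, stub_rerank, top_n=1)
context = "\n\n".join(top_docs)  # feed this to your LLM as context
```

In production, rerank_fn would POST the query and candidates to the rerank endpoint and return the parsed results array; only the scoring stub changes.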