The Search backend gives the LLM two meta-tools: search_tools(query) to find relevant tools by description, and call_tool(tool_name, arguments) to invoke them. The LLM never sees your full tool list.

Setup

from concierge import Concierge, Config, ProviderType

app = Concierge(
    "my-server",
    config=Config(provider_type=ProviderType.SEARCH),
)
Requires sentence-transformers. Install separately: pip install sentence-transformers

How It Works

The key difference: instead of seeing all 100+ tools upfront, the LLM searches for what it needs. This is like a developer searching an API reference instead of reading the entire docs.
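To make the two-step flow concrete, here is a minimal, self-contained sketch of the search-then-call loop. It uses toy keyword matching in place of the real embedding search, and the tool names and descriptions are hypothetical, not part of Concierge's API:

```python
# Toy registry standing in for your registered tools (hypothetical names).
TOOLS = {
    "send_email": "Send an email to a recipient with a subject and body.",
    "list_invoices": "List invoices for a customer account.",
    "create_ticket": "Create a support ticket with a title and priority.",
}

def search_tools(query: str, max_results: int = 5) -> list[str]:
    """Rank tools by overlap between query words and description words.
    (The real backend ranks by embedding similarity, not keywords.)"""
    words = set(query.lower().split())
    scored = [
        (len(words & set(desc.lower().split())), name)
        for name, desc in TOOLS.items()
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > 0][:max_results]

def call_tool(tool_name: str, arguments: dict) -> str:
    """Dispatch to the named tool (stubbed out here)."""
    if tool_name not in TOOLS:
        return f"unknown tool: {tool_name}"
    return f"called {tool_name} with {arguments}"

# The LLM first searches for what it needs, then invokes the match:
matches = search_tools("send an email")          # -> ["send_email"]
result = call_tool(matches[0], {"to": "a@example.com", "subject": "Hi"})
```

The LLM's context only ever contains the two meta-tool definitions; everything else is discovered at runtime through `search_tools`.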

What the LLM Sees

Only two tools ever appear in the tool list:
[
  {
    "name": "search_tools",
    "description": "Search available tools by description.",
    "inputSchema": {
      "properties": {
        "query": {"type": "string", "description": "What you're looking for"}
      }
    }
  },
  {
    "name": "call_tool",
    "description": "Call a tool by name with arguments.",
    "inputSchema": {
      "properties": {
        "tool_name": {"type": "string"},
        "arguments": {"type": "object"}
      }
    }
  }
]
Context cost stays constant no matter how many tools you register: 2 tool definitions instead of 200.

Configuration

Option       Default                  Description
max_results  5                        Number of search results returned per query
model        BAAI/bge-large-en-v1.5   SentenceTransformer model for embeddings
from sentence_transformers import SentenceTransformer

app = Concierge(
    "my-server",
    config=Config(
        provider_type=ProviderType.SEARCH,
        max_results=10,
        model=SentenceTransformer("all-MiniLM-L6-v2"),
    ),
)

When to Use

Use Search when you have a large API (100+ tools) where the LLM only needs a few tools per conversation.
Good fit:
  • Large APIs with 100+ tools
  • Tools with clear, descriptive names and docstrings
  • Exploration-heavy use cases (“what can this server do?”)
Bad fit:
  • Small APIs (Plain is simpler)
  • Strict ordering requirements (use stages)
  • Latency-sensitive apps (embedding adds ~50ms per search)