Skip to content

Search Endpoints

GET /api/v1/search

Search for files by keyword or hybrid BM25+vector semantic relevance.

Query Parameters:

Parameter Type Required Description
q string Yes Search query
type string No Filter by file extension (e.g. pdf, .txt)
limit integer No Maximum results to return
offset integer No Pagination offset
path string No Restrict search to a specific directory
semantic boolean No Use hybrid BM25+vector search (default: false)

Keyword search (semantic=false, default): ranks results by filename and path relevance using scoring tiers (exact match → stem contains → extension match → path contains).

Semantic search (semantic=true): indexes file content using BM25 and TF-IDF vector representations, then fuses both rankings with Reciprocal Rank Fusion (RRF, k=60) to return content-relevant results. The score field contains the RRF score.

Response: 200 OK — array of search result objects.

[
  {
    "filename": "quarterly_budget.txt",
    "path": "/home/user/docs/quarterly_budget.txt",
    "score": 0.030769,
    "type": "txt",
    "size": 4096,
    "created": "2026-01-15T10:30:00Z"
  }
]

Response Schema:

Field Type Description
filename string Base name of the file
path string Absolute path to the file
score number Relevance score (keyword: 0.0-1.0 tier-based; semantic: 0.0-1.0 RRF score)
type string File extension without leading dot
size integer File size in bytes
created string ISO 8601 timestamp of file creation

Semantic Search Setup

Semantic search (semantic=true) requires optional dependencies to be installed:

Installation:

pip install 'file-organizer[search]'

This installs: - rank-bm25>=0.2.0 - BM25 keyword ranking algorithm - scikit-learn>=1.4.0 - TF-IDF vector embeddings

Dependencies Not Installed:

If semantic search is requested (semantic=true) but dependencies are not installed, the API returns: - Status: 503 Service Unavailable - Response body:

{
  "detail": "Semantic search is not available: search dependencies not installed. Install with: pip install 'file-organizer[search]'"
}

Fallback Behavior: Use keyword search (semantic=false, the default) if semantic dependencies are not available.

Error responses: - 400 Bad Requestq parameter missing or empty - 422 Unprocessable Entity — Invalid parameter values (e.g., negative limit or offset) - 500 Internal Server Error — Search index unavailable or query processing failed

Examples:

# Basic keyword search
curl "http://localhost:8000/api/v1/search?q=report&limit=10"

# Semantic search with type filter
curl "http://localhost:8000/api/v1/search?q=quarterly+budget+forecast&semantic=true&type=txt"

# Paginated results
curl "http://localhost:8000/api/v1/search?q=invoice&limit=20&offset=40"

# Path-restricted search
curl "http://localhost:8000/api/v1/search?q=meeting&path=/home/user/documents/2026"

# Combined filters: type, path, and semantic
curl "http://localhost:8000/api/v1/search?q=contract&type=pdf&path=/home/user/legal&semantic=true&limit=5"

# Search with URL-encoded spaces and special characters
curl "http://localhost:8000/api/v1/search?q=Q1%202026%20sales&type=xlsx"

# Multiple file types (using OR logic on client side, separate requests)
curl "http://localhost:8000/api/v1/search?q=presentation&type=pptx"
curl "http://localhost:8000/api/v1/search?q=presentation&type=pdf"

# Pretty-printed JSON response with jq
curl "http://localhost:8000/api/v1/search?q=budget&semantic=true" | jq '.'

Python Client Usage:

import requests

# Basic search
response = requests.get(
    "http://localhost:8000/api/v1/search",
    params={"q": "report", "limit": 10}
)
results = response.json()
for file in results:
    print(f"{file['filename']} - Score: {file['score']}")

# Semantic search with filters
response = requests.get(
    "http://localhost:8000/api/v1/search",
    params={
        "q": "quarterly budget forecast",
        "semantic": True,
        "type": "txt",
        "path": "/home/user/financial"
    }
)
results = response.json()

# Paginated search with error handling
def search_all(query, page_size=50):
    """Fetch all results for a query using pagination."""
    all_results = []
    offset = 0

    while True:
        try:
            response = requests.get(
                "http://localhost:8000/api/v1/search",
                params={"q": query, "limit": page_size, "offset": offset},
                timeout=10
            )
            response.raise_for_status()

            batch = response.json()
            if not batch:
                break

            all_results.extend(batch)
            offset += page_size

            # Stop if we got fewer results than requested (last page)
            if len(batch) < page_size:
                break

        except requests.exceptions.HTTPError as e:
            if response.status_code == 400:
                print(f"Bad request: {response.json()}")
            elif response.status_code == 422:
                print(f"Invalid parameters: {response.json()}")
            else:
                print(f"HTTP error: {e}")
            break
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            break

    return all_results

# Usage
all_invoices = search_all("invoice")
print(f"Found {len(all_invoices)} invoice files")

# Semantic search with result filtering
response = requests.get(
    "http://localhost:8000/api/v1/search",
    params={"q": "machine learning research", "semantic": True, "type": "pdf"}
)

if response.status_code == 200:
    results = response.json()
    # Filter by score threshold on client side
    high_relevance = [r for r in results if r['score'] > 0.02]
    print(f"High relevance results: {len(high_relevance)}")
else:
    print(f"Error {response.status_code}: {response.text}")

Async Usage with httpx:

import httpx
import asyncio

async def search_semantic_async(query: str, file_type: str | None = None):
    """Async semantic search with optional type filter."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "http://localhost:8000/api/v1/search",
            params={"q": query, "semantic": True, "type": file_type},
            timeout=10.0
        )
        response.raise_for_status()
        return response.json()

# Usage
async def main():
    # Single async search
    results = await search_semantic_async("machine learning", file_type="pdf")
    for file in results[:5]:
        print(f"{file['filename']}: {file['score']:.4f}")

    # Concurrent searches with asyncio.gather
    queries = ["budget", "report", "invoice"]
    results_list = await asyncio.gather(
        *[search_semantic_async(q) for q in queries]
    )
    print(f"Total results across {len(queries)} queries: {sum(len(r) for r in results_list)}")

asyncio.run(main())

See API Reference for authentication and other endpoints.