SightSpeak AI Blog


Making Search Smarter with FastAPI and HybridVectorizer

Hey community! You know that feeling when you search for something, and the results make zero sense?
Like typing “similar companies to Apple” and getting “Pineapple Corp” instead of Microsoft? That often happens because many “AI search” systems embed only one kind of data, usually text. But in the real world, your data is messy. It has words, numbers, and categories, all mixed up together.

That’s the exact problem that HybridVectorizer tries to solve. And when you pair it with FastAPI, you can build a super-smart, real-world search system that actually understands your data. Let’s break it down. 

The Real World Isn’t Just Text

Take a simple table of stock data:

Name  | Description                  | Industry   | Market Cap
Apple | Tech giant known for iPhones | Technology | 2.9T

Now, if you asked an AI, “Find me companies similar to Apple,”
you’d want it to look at both the description and the numbers, right?

But most vector search tools only focus on text.
So they might say Apple is similar to… Pineapple Juice.

That’s the problem with text-only embeddings: they can’t handle mixed data.
They capture how words relate really well, but a number like a market cap or a label like an industry? Usually it gets treated as just more text, if it is included at all.

Meet HybridVectorizer: The Common-Sense Encoder

HybridVectorizer (created by Hari Narayanan) fixes this by combining all your data types — text, numbers, and categories — into one single vector that actually makes sense.

Here’s how it works:

  • It looks at each type of feature separately — text, numeric, categorical.
  • It turns each into an embedding (a vector of numbers that capture meaning).
  • It combines them into one big vector.
  • You can even adjust the weights, so you decide what matters more.

Want your search to care more about “Market Cap” than “Description”? Easy.
Want “Category” to count less than text? Just change a number.
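
To make those steps concrete, here’s a toy sketch in plain Python. It is not HybridVectorizer’s actual implementation: the word-count text encoder, the tiny fixed vocabulary, the min-max range, and the helper names (encode_text, encode_category, encode_numeric, hybrid_vector) are all made up for illustration. It only shows the general idea of encoding each block separately, scaling it by a weight, and joining the blocks into one vector.

```python
import math

def encode_text(text, vocab):
    """Toy text block: word counts over a fixed vocabulary, scaled to unit length.
    A real system would use a learned text embedding here instead."""
    words = text.lower().split()
    counts = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts

def encode_category(value, categories):
    """Categorical block: one-hot encoding over the known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def encode_numeric(value, lo, hi):
    """Numeric block: min-max scale the value into [0, 1]."""
    return [(value - lo) / (hi - lo)] if hi > lo else [0.0]

def hybrid_vector(row, vocab, categories, lo, hi, weights):
    """Encode each block, scale it by its weight, and concatenate the blocks."""
    blocks = {
        "text": encode_text(row["desc"], vocab),
        "categorical": encode_category(row["category"], categories),
        "numeric": encode_numeric(row["market_cap"], lo, hi),
    }
    out = []
    for name, block in blocks.items():
        out.extend(weights[name] * x for x in block)
    return out

# Illustrative settings only; none of these values come from the library.
vocab = ["tech", "giant", "cloud", "software"]
categories = ["Technology", "Food"]
weights = {"text": 0.5, "categorical": 0.2, "numeric": 0.3}

apple = {"desc": "Tech giant known for iPhones", "category": "Technology", "market_cap": 2.9}
vec = hybrid_vector(apple, vocab, categories, lo=0.0, hi=3.0, weights=weights)
print(len(vec))  # 4 text dims + 2 category dims + 1 numeric dim
```

Raising a block’s weight stretches that part of the vector, so differences in that block move the similarity score more. That is the intuition behind “just change a number.”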

Quick example. Install the package first:

pip install hybrid-vectorizer

Then in Python:

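from hybrid_vectorizer import HybridVectorizer

# Two sample records. market_cap is in trillions (2.9 means 2.9T, as in the table above).
data = [
    {"name": "Apple", "desc": "Tech giant known for iPhones", "category": "Technology", "market_cap": 2.9},
    {"name": "Microsoft", "desc": "Cloud and software leader", "category": "Technology", "market_cap": 2.7},
]

# Say which fields belong to which block, and how much each block should count.
# The weights here sum to 1.0, with text counting the most.
# (Argument names are as used in this post; check them against the docs
# for the version you install.)
hv = HybridVectorizer(
    text_features=["desc"],
    categorical_features=["category"],
    numeric_features=["market_cap"],
    weights={"text": 0.5, "categorical": 0.2, "numeric": 0.3}
)

# Fit on the data and get back one combined vector per record.
embeddings = hv.fit_transform(data)
print(embeddings)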

Now you’ve got “smart vectors” that understand both the meaning and the numbers.
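
Once you have vectors, “similar” usually means “close together.” Here’s a minimal sketch of ranking by cosine similarity in plain Python. The three-dimensional vectors are invented for illustration (they are not real HybridVectorizer output), and the helper names cosine_similarity and most_similar are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, named_vectors, top_k=3):
    """Return up to top_k (name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in named_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Made-up vectors, for illustration only.
vectors = {
    "Pineapple Corp": [0.1, 0.2, 0.9],
    "Microsoft": [0.9, 0.8, 0.1],
}
apple = [1.0, 0.9, 0.0]

ranked = most_similar(apple, vectors)
print(ranked[0][0])
```

A brute-force loop like this is fine for a few thousand records. Beyond that, a vector database does the same job with an index.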

FastAPI: The Friendly API Builder

Okay, you’ve got your embeddings. Now what? You need a way to use them — to let other apps or people send data, get similar results, and run searches.

That’s where FastAPI comes in. FastAPI is a modern Python web framework that’s fast, simple, and developer-friendly.
It automatically gives you an interactive API UI (so you can play with your endpoints in the browser), validates data for you, and supports async operations.

Here’s a tiny FastAPI example:

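from fastapi import FastAPI
from hybrid_vectorizer import HybridVectorizer

app = FastAPI()

# Records to fit on at startup (the same two companies as in the quick example).
data = [
    {"name": "Apple", "desc": "Tech giant known for iPhones", "category": "Technology", "market_cap": 2.9},
    {"name": "Microsoft", "desc": "Cloud and software leader", "category": "Technology", "market_cap": 2.7},
]

hv = HybridVectorizer(
    text_features=["desc"],
    categorical_features=["category"],
    numeric_features=["market_cap"]
)
# Fit once before serving requests; an unfitted vectorizer has nothing to encode with.
hv.fit_transform(data)

@app.post("/search")
def search(item: dict):
    # Encode the posted record with the already-fitted vectorizer.
    vector = hv.transform([item])
    return {"vector": vector.tolist()}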

Now you can send a POST request with your data — and FastAPI will return the vector you can use for similarity search in a vector database like Qdrant, Milvus, or Pinecone.
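
Here’s a rough sketch of what that POST request could look like from the client side, using only Python’s standard library. The address http://127.0.0.1:8000 and the file name main.py are assumptions about how you run the app locally, and build_search_request and post_search are hypothetical helper names.

```python
import json
import urllib.request

def build_search_request(base_url, item):
    """Build a JSON POST request for the /search endpoint (does not send it)."""
    body = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def post_search(base_url, item):
    """Send the request and decode the JSON reply. Needs the server to be running."""
    with urllib.request.urlopen(build_search_request(base_url, item)) as resp:
        return json.loads(resp.read().decode("utf-8"))

item = {
    "name": "Apple",
    "desc": "Tech giant known for iPhones",
    "category": "Technology",
    "market_cap": 2.9,
}
request = build_search_request("http://127.0.0.1:8000", item)

# With the app running locally (for example: uvicorn main:app), you would call:
# result = post_search("http://127.0.0.1:8000", item)
# and read the vector from result["vector"].
```

Keeping the request-building step separate from the sending step makes it easy to inspect exactly what goes over the wire before a server is even running.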

At the end of the day, search isn’t about finding text — it’s about finding context.
It’s about helping people discover things that actually make sense together.

And that’s what this stack — HybridVectorizer + FastAPI + a vector database — does so well.

So next time you’re building a recommendation engine, similarity search, or data explorer,
don’t just embed your text — embed your whole story.

Because real data is mixed. And smart search should be too.

Thanks for reading! More awesome blogs are on the way with SightSpeak AI, so stay tuned for what’s next!

Published: 2 days ago

By: puja.kumari