SightSpeak AI Blog

Tabsearch: Your New Best Friend for Searching Spreadsheets

Hey community! Have you ever struggled to find similar items in a spreadsheet, such as similar products, businesses, or records, because it's a jumble of words, numbers, and categories? I came across Tabsearch, a really cool, free Python library that makes searching tables feel like magic. It's like bringing your spreadsheet to life so it can find things that are similar, not just exact matches. Let's get into what Tabsearch is, why it's so great, and how you can use it, in simple, friendly language!

What’s Tabsearch All About?

Tabsearch is a free Python tool for searching tables (such as Excel or CSV files) that contain all types of data: product names, prices, or categories such as "Tech" or "Retail." Rather than looking for exact matches, it captures the meaning of your data. For instance, if you search for something like "AI software, $100, Tech," it can surface similar rows such as "Cloud platform, $200, Tech" because they're related.

It was built by Harihara Prabhu, and you can get it on GitHub. It uses clever tricks such as converting words into numbers (embeddings), and it can handle huge tables with an optional speed-up from FAISS, a library for fast similarity search. Don't worry if that sounds technical; it's simple to use!
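If you're curious what "converting words to math" looks like, here's a tiny sketch. The three-number vectors below are made up by hand for illustration (real embeddings come from a language model and are much longer), and cosine similarity is just one common way to compare such vectors. This post doesn't say which measure Tabsearch uses internally, so treat this as the general idea rather than the library's own code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (hypothetical numbers, not produced by Tabsearch).
toy_vectors = {
    "AI software": [0.9, 0.8, 0.1],
    "Cloud platform": [0.8, 0.9, 0.2],
    "Online store": [0.1, 0.3, 0.9],
}

query = "AI software"
scores = {
    name: cosine_similarity(toy_vectors[query], vec)
    for name, vec in toy_vectors.items()
    if name != query
}

# Rank the other rows from most to least similar to the query.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

With these toy numbers, "Cloud platform" lands well ahead of "Online store" because its vector points in nearly the same direction as the query's.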

Why You’ll Love It
  • Works with Any Data: Got text, numbers, or categories? Tabsearch figures it out without you doing extra work.

  • Saves You Time: No need to write tricky code or manually sort through rows.

  • Make It Your Own: You can tell it to focus more on certain columns, like descriptions over prices.

  • Handles Big Tables: From tiny lists to millions of rows, it’s got you covered.

  • Totally Free: It’s open-source, so anyone can use it for school, work, or fun projects.

I gave it a spin, and it’s like having a super-smart assistant for your data. Perfect for finding similar products, companies, or even weird stuff like log files!

How to Get Started (It’s Easier Than You Think!)

Let’s try Tabsearch with a couple of fun examples. You’ll need Python (grab it from python.org if you don’t have it). If coding feels intimidating, you can skip all setup and play with it in your browser using the Google Colab link. It’s like an online playground!

Step 1: Set It Up
  1. Open Your Terminal:

    • On Windows, use Command Prompt; on Mac/Linux, use Terminal.

    • Make a folder for your project: Type mkdir my-tabsearch, then cd my-tabsearch.

  2. Create a Safe Space (Optional):

    • Run python -m venv venv to make a virtual environment (keeps things neat).

    • Activate it: Windows: venv\Scripts\activate; Mac/Linux: source venv/bin/activate.

  3. Install Tabsearch:

    • Type pip install tabsearch and wait for it to finish.

    • Got a huge dataset? Also run pip install faiss-cpu for extra speed.

That’s it! You’re ready to roll.

Step 2: Try a Simple Example

Let’s search a tiny table of products to see how it works. You can do this in a Python file or a Jupyter notebook (install with pip install notebook, then run jupyter notebook to start).

Create a file called try_tabsearch.py or a notebook, and add this code:

import pandas as pd
from tabsearch import HybridVectorizer

# Create a tiny table
data = {
    "id": [1, 2, 3],
    "category": ["Tech", "Retail", "Tech"],
    "price": [100, 50, 200],
    "description": ["AI software", "Online store", "Cloud platform"]
}
table = pd.DataFrame(data)

# Set up Tabsearch
search_tool = HybridVectorizer(index_column="id")
search_tool.fit_transform(table)

# Find stuff similar to the first row
query = table.iloc[0].to_dict()
results = search_tool.similarity_search(query, ignore_exact_matches=True)

# See what you got!
print(results)

Run it (type python try_tabsearch.py or hit Shift+Enter in a notebook). You’ll see a list of rows ranked by how similar they are to “AI software, $100, Tech.” For example, “Cloud platform, $200, Tech” might come first because it shares the Tech category and its description is closer in meaning to software. The “similarity” score (0 to 1) tells you how close the match is: higher is better!

Step 3: Play with a Real-World Example

Tabsearch comes with a demo dataset of roughly 500 S&P 500 companies. Let’s find companies similar to Google based on their sector, price, and business description.

Add this to a new file or notebook:

import pandas as pd
from tabsearch import HybridVectorizer
from tabsearch.datasets import load_sp500_demo

# Load the company data
table = load_sp500_demo()
table = table[["Symbol", "Sector", "Industry", "Currentprice", "Marketcap", 
               "Fulltimeemployees", "Longbusinesssummary"]]

# Set up Tabsearch
search_tool = HybridVectorizer(index_column="Symbol")
search_tool.fit_transform(table)

# Find companies like Google
google = table.loc[table["Symbol"] == "GOOGL"].iloc[0].to_dict()
results = search_tool.similarity_search(google, top_n=5, ignore_exact_matches=True)

# Check out the top matches
print(results[["Symbol", "Sector", "Industry", "similarity"]])

Run it, and you might see companies like Microsoft (MSFT) or Amazon (AMZN) because they’re tech giants with similar business descriptions (like cloud or AI). It’s cool to see how Tabsearch mixes numbers (like market cap) and words (like company summaries) to find matches.

Fun Stuff You Can Do
  • Save for Later: Done with a table? Save it with search_tool.save('my_model.pkl') and load it later with HybridVectorizer.load('my_model.pkl') to skip redoing the work.

  • Big Tables? No Sweat: If you’ve got tons of rows, add use_faiss=True when creating HybridVectorizer for lightning-fast searches.

  • Make It Yours: Want descriptions to count more than prices or categories? Pass block_weights={'text': 1.0, 'numerical': 0.5, 'categorical': 0.5} to similarity_search.

  • Peek Under the Hood: Curious how it sees your data? Run print(search_tool.get_encoding_report()) to check how it handles your columns.
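To get a feel for what the block_weights idea in "Make It Yours" does, here's a small sketch in plain Python. It assumes each block (text, numerical, categorical) gets its own similarity score and that the scores are blended with a weighted average. That blending rule is my assumption for illustration, since this post doesn't spell out Tabsearch's exact formula, and combine_block_scores is a hypothetical helper, not a Tabsearch function.

```python
def combine_block_scores(block_scores, block_weights):
    """Weighted average of per-block similarity scores (illustrative only)."""
    total_weight = sum(block_weights[name] for name in block_scores)
    if total_weight == 0:
        raise ValueError("at least one block weight must be non-zero")
    weighted_sum = sum(
        score * block_weights[name] for name, score in block_scores.items()
    )
    return weighted_sum / total_weight

# Hypothetical per-block similarities for one candidate row.
candidate = {"text": 0.9, "numerical": 0.4, "categorical": 1.0}

equal = combine_block_scores(
    candidate, {"text": 1.0, "numerical": 1.0, "categorical": 1.0}
)
text_heavy = combine_block_scores(
    candidate, {"text": 1.0, "numerical": 0.5, "categorical": 0.5}
)

print(f"equal weights: {equal:.3f}")
print(f"text-heavy weights: {text_heavy:.3f}")
```

Because this made-up row matches best on text, giving text more weight nudges its overall score up. A row that matched mostly on price would move the other way.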

Why I Think It’s Awesome

I tried Tabsearch on a product list, and it found similar items in seconds—no more scrolling through spreadsheets! It’s beginner-friendly but powerful enough for serious data nerds. Whether you’re a student, a small business owner, or just love playing with data, Tabsearch makes it fun and easy.

Quick Tips
  • Start Small: Try the simple example first to get the hang of it.

  • Use Colab: If setting up sounds like a hassle, the Colab link lets you jump right in.

  • Fix Problems: If something doesn’t work, check if Tabsearch is installed (pip show tabsearch) or look at the GitHub issues page.

  • Big Data: For huge tables, FAISS is your friend, and a GPU can make things even faster.
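The "Fix Problems" tip can also be done from inside Python, which is handy in a notebook. This sketch uses only the standard library (importlib.metadata, Python 3.8+). The installed_version helper is a name I made up for this example, and the package names are the ones used in the pip commands earlier.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# Check the two packages mentioned in the setup steps.
for name in ("tabsearch", "faiss-cpu"):
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

If tabsearch shows up as "not installed", you're probably running a different Python than the one you installed it into, so double-check that your virtual environment is activated.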

Wrapping Up

Tabsearch is like a superpower for anyone with a spreadsheet. It’s simple, fast, and makes finding similar stuff a breeze. Try the examples above, or load your own data (like a CSV with pd.read_csv('your_file.csv')) to see what it can do. Got questions or want to try it on your own data? Check out the GitHub page or play in Colab. Have fun exploring, and let Tabsearch make your data adventures way easier!
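If you go the "load your own data" route, here's a minimal sketch of that first step with pandas. The CSV is an in-memory stand-in with made-up rows so the snippet runs as-is, and the column names are hypothetical. Checking for empty cells is general data hygiene on my part: this post doesn't cover how Tabsearch treats missing values, so look at your data before fitting.

```python
import io

import pandas as pd

# Stand-in for your_file.csv so the sketch runs as-is (made-up rows).
csv_text = """id,category,price,description
1,Tech,100,AI software
2,Retail,50,Online store
3,Tech,,Cloud platform
"""

# With a real file you would call pd.read_csv("your_file.csv") instead.
table = pd.read_csv(io.StringIO(csv_text))

# Count the empty cells per column before handing the table to any search tool.
missing = table.isna().sum()
print(missing)

# One simple option: drop incomplete rows (filling them in is another).
clean = table.dropna().reset_index(drop=True)
print(clean)
```

From there, the cleaned table can be passed to HybridVectorizer the same way as in the examples above.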

 

Published: September 29, 2025

By: puja.kumari