Internet Search and the Difficulty of Being Found

Keyword-based search works quite well most of the time, if we ignore for the moment the contamination of results by those who can afford to buy their way to the top or who game the ranking system. In fact, I used to take pride in my ability to formulate queries that yielded good results. In my experience, factual queries mostly return moderately relevant results. This is not an objective measure of relevance, just my own level of satisfaction.

On the other hand, queries that require language understanding or synthesis of results fail miserably. I’m not implying that the search engines have somehow failed. The fact of the matter is that this is a very difficult problem.

For example, consider two queries (and countless variations of those) that I ran recently:

  • “Innovative IoT Startups building Retail Technology in Montreal”
  • “Regulations guidelines for electronics plastics enclosures casings”

The first one was an absolute disaster, with almost no relevant results. The second one fared slightly better, and I managed to get some information about plastic types and grades, fire ratings, etc.

It made me think about how far we still are from the dream of organizing the world’s information and making it accessible in a meaningful way. I’m sure many great minds have already researched this problem and maybe even come up with some solutions. From my perspective, three things have to hold for queries like these to work:

  • The relevant content must exist already
  • The relevant content should be found and indexed and tagged in some way
  • The NLP engine that processes the query must “understand” the query and its context, match the previously stored content to that context, and retrieve it

As I said, these are very difficult problems.

There is a huge amount of information out there that is not online. It is in people’s minds, books, tapes, pictures, and videos. There are still companies out there that don’t have websites or haven’t shared their content. This is being slowly resolved by the digitization of content.

My bigger issue is that existing search engines offer no insight into how the displayed results were retrieved. What context was used? Show me the clusters, categories, or logical components that went into the search, and then give me a way to refine that context so that more relevant results can be retrieved.

This will not only give users better results, but also help in training and improving the models themselves.
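To make the idea concrete, here is a minimal sketch of what "showing the context" could look like. It is a toy, not how any real search engine works: the index, its categories, and the function names `explain_search` and `refine` are all made up for illustration. Each hit is grouped by category and annotated with the query terms it actually matched, and the user can then narrow the results to the categories that fit their intent.

```python
from collections import defaultdict

# Hypothetical, hand-tagged index: (title, category, text).
INDEX = [
    ("Fire rating basics", "fire rating", "flammability rating for plastics used in enclosures"),
    ("Choosing a plastic", "materials", "plastics grades for electronics casings"),
    ("Buy phone cases", "shopping", "cheap casings for phones"),
]

def explain_search(query, index):
    """Group hits by category and record which query terms each hit matched."""
    terms = query.lower().split()
    clusters = defaultdict(list)
    for title, category, text in index:
        words = set(text.lower().split())
        matched = [t for t in terms if t in words]
        if matched:
            clusters[category].append((title, matched))
    return dict(clusters)

def refine(clusters, keep):
    """Keep only the categories the user marked as relevant."""
    return {cat: hits for cat, hits in clusters.items() if cat in keep}

clusters = explain_search("regulations plastics enclosures casings", INDEX)
for category, hits in clusters.items():
    print(category, hits)

# The user sees a "shopping" cluster they never wanted and drops it.
print(refine(clusters, {"fire rating", "materials"}))
```

Even in this toy version, the user can see that no result matched "regulations" at all and that one cluster comes from an unrelated shopping context. Those are exactly the hints I would like a real engine to surface.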

At the same time, I realized how difficult it still is for companies and brands to be found by the people who are looking for them. This is especially painful for small firms and startups that have limited budgets to get in front of their audience. Optimizing for long-tail keyword queries is fine, but how does a company semantically link itself to a bigger idea? Knowledge lies in the abstraction and association of related concepts. So if a maker of foot-traffic counters, store heat-map generators, or smart labels doesn’t get linked to the concept of innovative retail technology, it will never show up in results for queries that don’t use those specific keywords.
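A small sketch shows the gap. Everything here is hypothetical: the company names, the hand-written concept map `CONCEPTS`, and the two search functions. A real system would presumably learn these associations rather than hard-code them, but the contrast between plain keyword matching and concept-aware matching is the same.

```python
# Hypothetical concept map: a broader idea linked to specific product terms.
CONCEPTS = {
    "retail technology": {"foot-traffic counter", "store heat-map", "smart label"},
}

# Hypothetical company descriptions, keyed by a made-up company id.
DOCS = {
    "acme": "Acme builds a foot-traffic counter for small shops",
    "blink": "Blink prints smart label tags for grocery shelves",
    "cargo": "Cargo offers freight insurance",
}

def keyword_search(query, docs):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    return sorted(d for d, text in docs.items()
                  if all(w in text.lower() for w in words))

def concept_search(query, docs, concepts):
    """Return ids of documents that mention the query or any term linked to it."""
    q = query.lower()
    terms = {q} | concepts.get(q, set())
    return sorted(d for d, text in docs.items()
                  if any(t in text.lower() for t in terms))

print(keyword_search("retail technology", DOCS))            # []
print(concept_search("retail technology", DOCS, CONCEPTS))  # ['acme', 'blink']
```

Neither company ever uses the words "retail technology", so keyword matching finds nothing. Only the explicit link from the broader concept to their products makes them discoverable.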

A lot of searches are generic, exploratory queries and get iteratively refined as the user gets more hints and keywords from the initial searches. But if it takes too long, the user will just get frustrated and give up.

In my opinion, Internet search to date has done an excellent job, but now it’s time to move to the next level. Search is ripe for disruption. The first company that manages to crack this code will reap the rewards for decades to come.