“The web is a collection of data, but it’s a mess,” says Will Bryk, co-founder and CEO of Exa. “There’s a Joe Rogan video here, an Atlantic article there. There’s no organization. But the dream is for the web to be like a database.”
Websets is aimed at power users who need to search for things that other search engines aren’t good at finding, such as particular types of people or companies. Ask for “startups making futuristic hardware” and you’ll get a list of specific companies hundreds of entries long, rather than random links to web pages that mention those terms. Google can’t do that, Bryk says: “There are many valuable use cases for investors or recruiters, or really anyone who wants any type of dataset from the web.”
Things have moved quickly since then. In 2021, MIT Technology Review reported that Google researchers were exploring the use of large language models in a new kind of search engine. The idea quickly drew fierce criticism, but tech companies took little heed. Three years later, giants like Google and Microsoft are scrambling with a slew of newcomers, such as Perplexity and OpenAI, which launched ChatGPT Search in October, to grab a piece of this hot new trend.
Exa isn’t trying to outdo any of those companies (yet). Instead, it offers something new. Most other search companies wrap large language models around existing search engines, using the models to analyze a user’s query and then summarize the results. But the search engines themselves haven’t changed much. Perplexity, for example, directs its queries to Google Search or Bing. Think of today’s AI search engines as a sandwich with fresh bread but stale fillings.
More than keywords
Exa still presents users with familiar lists of links, but it uses the technology behind large language models to reinvent how search itself is performed. Here’s the basic idea: Google works by crawling the web and building a vast index of keywords that are then matched against users’ queries. Exa also crawls the web, but instead of indexing keywords it encodes the contents of web pages in a format known as embeddings, which large language models can process.
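To make the contrast concrete, here is a toy sketch of the keyword-index approach described above: a minimal inverted index in Python. It is an illustration, not Google’s actual implementation, and the URLs are made up. Note its blind spot: it can only match literal terms, so a query for “futuristic hardware” misses a page about “sci-fi gadgets” even though the meaning is close.

```python
# Toy inverted index: maps each keyword to the set of pages containing it.
# A drastic simplification of keyword search, for illustration only.
from collections import defaultdict

pages = {
    "example.com/a": "startup building futuristic hardware for space",
    "example.com/b": "bakery serving fresh bread daily",
    "example.com/c": "company making sci-fi gadgets and robots",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

def keyword_search(query):
    # Return pages containing every query term (boolean AND).
    results = None
    for word in query.lower().split():
        matches = index.get(word, set())
        results = matches if results is None else results & matches
    return results or set()

print(keyword_search("futuristic hardware"))  # {'example.com/a'}
# Misses example.com/c entirely: "sci-fi gadgets" shares no keywords with
# the query, even though the meaning is close. That gap is what
# embeddings address.
```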
Embeddings turn words into numbers in such a way that words with similar meanings become numbers with similar values. This allows Exa to capture the meaning of the text on web pages, not just the keywords.
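Here is a minimal sketch of that idea, using the open-source sentence-transformers library and the all-MiniLM-L6-v2 model as a stand-in; Exa has not disclosed which embedding models it uses. Phrases with similar meanings end up as vectors pointing in similar directions, which cosine similarity measures.

```python
# Minimal embedding demo: similar meanings -> similar vectors.
# Uses an open-source model as a stand-in; Exa's own models are not public.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

phrases = [
    "startups making futuristic hardware",
    "companies building sci-fi gadgets",
    "bakery serving fresh bread",
]
vectors = model.encode(phrases)  # one vector (here, 384 numbers) per phrase

def cosine(a, b):
    # Cosine similarity: near 1.0 means similar direction/meaning,
    # near 0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors[0], vectors[1]))  # high: similar meaning, few shared words
print(cosine(vectors[0], vectors[2]))  # low: unrelated meaning
```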
Large language models use embeddings to predict the next word in a sentence. Exa’s search engine does something similar, but with links: type in “startups making futuristic hardware” and the model suggests (real) links that might follow that phrase.
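One simple way to approximate that behavior is nearest-neighbor search over embedded pages: embed the query, then return the links whose page embeddings sit closest to it. The sketch below does exactly that, building on the model above. The URLs and page snippets are invented for illustration, and Exa’s production pipeline is certainly more involved than this.

```python
# Sketch of "next link prediction" as nearest-neighbor search over
# page embeddings. URLs and snippets are invented for illustration;
# Exa's real system is more sophisticated.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Pretend these are crawled pages, embedded once at index time.
corpus = [
    ("example.com/hoverboards", "We build consumer hoverboards and jetpacks"),
    ("example.com/quantumchips", "A startup designing quantum computing chips"),
    ("example.com/sourdough", "Artisanal sourdough recipes and baking tips"),
]
page_vectors = model.encode([text for _, text in corpus])
page_vectors /= np.linalg.norm(page_vectors, axis=1, keepdims=True)

def search(query, k=2):
    # Embed the query, then rank pages by cosine similarity.
    q = model.encode([query])[0]
    q /= np.linalg.norm(q)
    scores = page_vectors @ q
    top = np.argsort(-scores)[:k]
    return [(corpus[i][0], float(scores[i])) for i in top]

print(search("startups making futuristic hardware"))
# The hardware-related pages rank first; the baking page ranks last.
```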
Exa’s approach comes at a cost, however. Encoding pages rather than indexing keywords is slow and expensive. Exa has encoded several billion web pages so far, Bryk says. That’s tiny next to Google, which has indexed about a trillion. But Bryk doesn’t see that as a problem: “You don’t have to embed the entire web to be useful,” he says. (Fun fact: “exa” denotes a 1 followed by 18 zeros; “googol” is a 1 followed by 100 zeros.)