How to Build an Egalitarian, Decentralized Search Engine Part 1: The Principles

Search is dead. So how do we revive it, without billions of dollars in funding and massive computing resources? We leverage the crowd.

We need a functioning search engine if the open web is to prevail. Google and its competitors do not care to make a decent product. It runs counter to their business goals. So we have to build it ourselves.

It’s a terrible idea to vest this much power with one company, even one as fun, user-centered and technologically excellent as Google. It’s too much power for a handful of companies to wield.

The question of what we can and can’t see when we go hunting for answers demands a transparent, participatory solution. There’s no dictator benevolent enough to entrust with the power to determine our political, commercial, social and ideological agenda. This is one for The People.

— Cory Doctorow, Search is too important to leave to one company – even Google

Doctorow wrote this in 2009. His view of Google has since cooled, along with the rest of the zeitgeist. Google’s decision to present answers instead of indexing knowledge has had a catastrophic effect on the web, or at least on the “looking up useful information” part of the web. We need to reverse course by creating a more egalitarian system, one free of commercial incentives and governmental interventions, as much as that may be possible. We can build a superior product, without it being a product per se.

Note: This is just a hypothetical model of how a service such as this should work. I will mostly avoid any technical details for now.

The Principles

A completely Peer-to-Peer network purporting to have egalitarian goals can only survive if it adheres to its principles. These principles must be followed not only to the letter but also in their spirit. Otherwise it will be yet another me-too trying to challenge Google’s supremacy.

The principles are:

  • It must be Open Source.
  • It must be Peer-to-Peer.
  • It must be zealously anticommercial.
  • It must have only one client implementation.
  • It must be a Search Engine, and only a Search Engine.
  • It must be legally neutral.
  • It must be morally neutral.

Why Open Source?

This one is self-explanatory. A non-commercial enterprise can’t be scaled without crowdsourcing. Being open source will allow people not only to contribute to the code, but also to audit it. It also introduces a layer of transparency that is missing from commercial competitors. Any change we make could be analyzed by any user.

Why Peer-to-peer?

When you search for something on Google, to this day they show the number of results and how long the search took. This is a flex. They are showing off the might of their algorithm and hardware, and it is truly monumental. How can a truly decentralized search engine compete with that? By spreading the workload across anyone willing to provide computing power. There is a lot of computing power sitting idle in people’s pockets, desktops and even TVs, and it can be leveraged to power the algorithms.
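To make "spreading the workload" a little more concrete, here is a minimal sketch of one way index terms could be divided among volunteer devices. It is only an illustration and not a design decision: the use of rendezvous hashing, the peer names, and the functions `owner_of` and `assign_terms` are all assumptions made for this example.

```python
import hashlib


def owner_of(term: str, peers: list[str]) -> str:
    """Pick the peer responsible for a term (hypothetical helper).

    Uses rendezvous hashing: every peer gets a score for the term,
    and the peer with the highest score owns it.
    """
    def score(peer: str) -> int:
        digest = hashlib.sha256(f"{peer}:{term}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(peers, key=score)


def assign_terms(terms: list[str], peers: list[str]) -> dict[str, list[str]]:
    """Group terms by the peer that owns them."""
    shards: dict[str, list[str]] = {peer: [] for peer in peers}
    for term in terms:
        shards[owner_of(term, peers)].append(term)
    return shards


if __name__ == "__main__":
    # Peer names are placeholders for idle phones, desktops and TVs.
    peers = ["phone-a", "desktop-b", "tv-c"]
    terms = ["open", "web", "search", "engine", "peer"]
    for peer, owned in assign_terms(terms, peers).items():
        print(peer, owned)
```

The appeal of this kind of scheme is that any client can work out which peer to ask without a central coordinator, and when a device drops off the network only the terms it owned need a new home.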

The Disadvantages of Commercialism

Commercial incentives do not care about the product. Google destroyed the credibility of their search results to make their ads more credible. At least that’s what it seems like. It is in their commercial interests to make their product worse. Our goal is different, so our incentives need to be different. If we commercialize our product, we will fall into the same pitfalls that the current market solutions stumbled into.

We also have to go beyond that. The open source license needs to be so restrictive that nobody can make a commercial product using our technology. We do not want to end up like Elasticsearch and provide free labor and technology to one of the biggest companies on earth. A system that props up a new Google would defeat the purpose. We need to be better.

About Clients

Do you use PeerTube? Mastodon? If you want to turn a friend onto these technologies, where will you direct them? Decentralized social media has shown that client fragmentation is just a hindrance to adoption. The client is how the user will access this service. It is your brand. Having just one client would allow us to manage the brand. It also avoids spending resources on enforcement of the license. The client and the algorithm must be the same offering.

Indexing the Truth

An algorithm that takes your query and points you to the most relevant source is not the same as an algorithm that provides you with an answer. The former leaves the final decision to the user, while the latter makes the decision for them. Instead of allowing them to explore, it forces the answer on them. The search engine needs to be just that. It should not be more, or less.

Legal Neutrality

It should not be private citizens’ job to enforce the law. And when it comes to a distributed technology, which law should be enforced? The algorithm needs to not care what sort of result is illegal in any given jurisdiction. There is one genuine argument against this approach: most illegal information, even in the most restrictive States, is illegal for a reason and should be blocked.

Well, there is no reason any Government can’t use our search to target the source of the illegal content. Many resources are spent in the name of Law Enforcement. States can use those resources to do their job. While many governments do not share the egalitarian ideals of this project, we do not share their governmental and geopolitical interests.

Moral Neutrality

Moral Relativism is a downward spiral. On the other hand, Moral Absolutism is the cause of most mass violence that happens on this planet. We should avoid that at all costs.

We have the same conundrum as in the legal issue: what is our moral compass? Do we use the Abrahamic ethos? Buddhist? What about secular ethics, if such a concept can exist? Should we block all mentions of food recipes that include meat because animal consumption is immoral? What about alcohol consumption? I personally find alcohol consumption to be one of the biggest drivers of misery in the world. Should I block that?

Of course, not all ideas are so morally ambiguous. There is some vile shit on the internet. Depictions of people being harmed, invocations of violence, images of brutal violence etc. These things are very difficult to justify. So while the system is morally neutral, the users do not need to be. But instead of having a simple SafeSearch opt-in, we will have to offer a more complex opt-in mechanism on the client end.

This is colored a lot by my biases about human nature, which would be another post altogether. Suffice it to say it wouldn’t take long for the really vile trash to face collective retribution.
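The client-side opt-in mentioned above could take many shapes. As a rough sketch, assume each result carries a set of content labels and the user opts into labels one by one instead of flipping a single SafeSearch switch. The label names, the `Result` and `Preferences` types, and the `visible` function are all hypothetical, and where the labels would come from is left open here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Result:
    url: str
    # Hypothetical content labels attached to a result; empty means unlabeled.
    labels: frozenset = frozenset()


@dataclass
class Preferences:
    # Labels this user has explicitly chosen to see.
    opted_in: set = field(default_factory=set)


def visible(results: list, prefs: Preferences) -> list:
    """Keep a result only if every one of its labels has been opted into.

    The index itself stays neutral; the filtering happens on the client.
    """
    return [r for r in results if r.labels <= prefs.opted_in]


if __name__ == "__main__":
    results = [
        Result("https://example.org/recipe"),
        Result("https://example.org/report", frozenset({"graphic-violence"})),
    ]
    print([r.url for r in visible(results, Preferences())])
    print([r.url for r in visible(results, Preferences({"graphic-violence"}))])
```

Unlabeled results always pass through, so the network stays morally neutral while each user decides what they are willing to see.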

This is going to be a monumentally difficult task. Maybe even impossible. That doesn’t mean we should not try. Of course, nothing is perfect. In part 2, I will discuss the downsides of this approach.

PS: I do not have analytics on this site. So leave a comment, even a nasty one, here or on your preferred social site, or shoot an email to feedback at the domain of this site.

2 replies on “How to Build an Egalitarian, Decentralized Search Engine Part 1: The Principles”

Thank you for pointing it out. But there is a reason I did not mention the blockchain. The blockchain introduces its own set of incentives that I would want to avoid.
