Since my last article, many other alternatives have cropped up, bringing some very interesting features and concepts, but it still remains to be seen if they offer acceptable results in the fundamentally important area of relevant search results. This comparison sets out to analyze and compare the current batch of alternatives in 2020.
This is a nice, detailed review of the search engines named above. Each undergoes the same tests and the results are analyzed.
What this excellent article shows is that we need more search engines with their own crawlers and indexes. Most of the engines tested rely on Bing for their core organic results, which usually means that when Bing fails, they all fail. A near-global duopoly of just Google and Bing is ridiculous considering the size of the Internet audience in English alone.
There is a whole raft of new search engine startups, plus many established players, that all share one thing: their primary search source is Bing. Which leads me to ask: how many Bing-powered search engines can the market absorb?
Now don’t get me wrong: I’m not criticizing search engine projects for using Bing, nor am I criticizing Bing for selling access to its search feed. Many of these engines have differentiated themselves from each other and from Bing in innovative ways, and what appeals to one person might repel another, so having a large choice of Bing-powered engines is not bad. But in the end they are all retreads of Bing, dressed in different clothes.
Bing Powered Search Engines
DuckDuckGo (despite the many sources DDG uses, the organic search results you see are from Bing.)
The good thing is that all of the Bing-powered search engines together help to erode Google’s near monopoly on search. I like to think of it as Bing’s guerrilla war on Google, fought with surrogates: “death by a thousand cuts.” And all these Bing-powered engines give users a variety of UIs and features they may prefer over being stuck with Bing itself. Good, as far as it goes.
The bad thing is that we, the users, are still stuck with a duopoly. At the end of the day the entire global English-language searchverse has only two major search engines: Google and Bing. That’s it. You get only two opinions for finding things on the web, with no third, fourth, or fifth opinion. No matter how you dress it up, it’s still a duoculture. There is nobody else to turn to.
The second list, the meta-search engines, are stuck in the same boat: they are trapped in the duopoly too, forced to rely on Google and Bing (plus Bing-powered engines). Some are able to bring in smaller, independent engines that have their own crawlers and indexes, like Mojeek and Yandex, but those indexes are not deep enough.
Do They Realize It’s Bing?
Based on the comments I read across the web and on social networks, I’m pretty sure the average Joe or Jane user is unaware that the search results on their alternative search engine of choice are derived from Bing. In fact, I’ve seen many users swear that, say, DuckDuckGo’s results are so much better, deeper, and more relevant than Bing’s. The problem is that sooner or later the public is going to catch on. And we can’t just keep adding more Bing retreads to the mix.
The DuckDuckGo Question
Nobody has woven together a variety of different search sources around Bing with more skill than DuckDuckGo. Yet the backbone of DDG’s results remains Bing, to the extent that they are totally dependent on Bing to function as a search engine. As more Bing clones launch, DDG’s task of differentiating itself becomes harder. Yet I see no evidence of DDG trying to crawl and build its own index, which to me would be the right long-term strategy. DDG has a crawler, but they seem to use it only for their Answers feature. Other search engines like SwissCows and Qwant are building their own indexes in other languages, so we know it can be done, but only Mojeek seems to be trying in English. Again we are up against the barrier of the duopoly, which suppresses all other attempts at competition.
How long can we keep cloning Bing for variety in search? Ultimately the market, and regulation from the EU and/or US will decide. What the Bing clones do show is that demand is out there for something different – something not Google. But how long will we be happy with just cloning Bing?
This new and improved guide aims to be the most in-depth resource available on private search engines. We’ll examine the best private search engines for 2020, how to keep your data safe when searching, and also some search engines to avoid.
This is more in-depth than all the other similar guides I’ve read. It is not the usual list of search engine names either, and for 2020 there are some important changes to the usual lineup of private search engines that you should be aware of. I highly recommend reading it.
Over the past year I’ve been impressed with how much the Indieweb.org wiki has improved. Heck, it was good when I first saw it, but members are very active and keep editing, improving, adding, and tweaking relentlessly, so it just gets better.
Somebody in one of the Indieweb chat channels recently suggested that the Indieweb wiki would make a good seed site (aka starter crawl) for anyone starting to build a new search engine index. This is something I’ve been thinking about, off and on, for a couple of weeks, and I have to say I agree: the Indieweb.org wiki would make a good seed site for a web search engine.
What’s a “seed site”?
Briefly, a seed site, or starter crawl, is a site (or one of several sites) that a search engine crawler indexes first to find a wide variety of worthwhile pages. The crawler indexes the URLs it finds there, then indexes more pages on those sites, discovering in the process the URLs they link to, and so on. In the old days the Yahoo directory and the Dmoz directory were considered prime seed sites for search engines. Later Wikipedia came along, and it is still considered an important seed site for outbound links. These sites were considered prime starters in part because their outbound links had all been reviewed by human editors, so a certain level of quality could be presumed.
If you want to learn more about seed sites, I suggest reading Bill Slawski’s article, Seed Sites for Search Engine Web Crawls. The comments are worth reading too.
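The seed-crawl process described above can be sketched as a simple breadth-first traversal. This is only an illustrative sketch: the in-memory `link_graph` dictionary stands in for live page fetching, and all the URLs in it are made up.

```python
from collections import deque

def crawl_from_seeds(seeds, link_graph, max_pages=100):
    """Breadth-first crawl: start from seed URLs, follow outbound
    links, and return pages in the order they would be indexed.
    `link_graph` maps each URL to the outbound links found on that
    page (a stand-in for actually fetching and parsing the page)."""
    frontier = deque(seeds)
    seen = set(seeds)
    indexed = []
    while frontier and len(indexed) < max_pages:
        url = frontier.popleft()
        indexed.append(url)                   # "index" this page
        for link in link_graph.get(url, []):  # discover outbound links
            if link not in seen:              # never queue a URL twice
                seen.add(link)
                frontier.append(link)
    return indexed

# Toy link graph: the wiki links out to two blogs, which link onward.
graph = {
    "indieweb.org/wiki": ["blog-a.example", "blog-b.example"],
    "blog-a.example": ["blog-c.example"],
    "blog-b.example": ["blog-a.example"],
}
print(crawl_from_seeds(["indieweb.org/wiki"], graph))
```

Starting from a single human-curated seed, the crawler reaches blogs the seed never links to directly, which is exactly why a well-linked seed site matters.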
Why Use the Indieweb.org Wiki?
First, there is a wealth of good information on the wiki pages alone even without crawling the outbound links. Second, crawling the outbound links.
Now I would not use the Indieweb wiki as my only seed unless I were, somehow, creating an Indie Web only search engine but I think it would be good as part of a mix of other seed sources. The Web has changed in the last 20 years, commercial sites have taken over and the wiki’s outbound links lend a certain, needed counter balance to the commercial.
The outbound URLs are human-curated. This minimizes low-quality content.
It links to quality content that might take a while to find by other means, namely a lot of quality blogs, which also link out freely.
Wiki pages act as tags. This isn’t quite as useful for a search engine as a full directory taxonomy of categories, but it is useful.
The wiki is not outrageously huge. Make no mistake, it’s big and growing bigger, but it’s dwarfed by something like Wikipedia and easier to digest within the bandwidth limits of a startup.
It’s constantly being updated. This makes it a good source for re-crawls because new links are constantly being added.
Other Good Seed Sites:
Curlie.org – the successor to the old Dmoz (Open Directory Project). Volunteer editors have been cleaning out dead links for a couple of years, and possibly adding new listings, so it’s not quite as dead as one might think. For somebody starting a web search engine, it’s hard to ignore 3 million or more listings. The listings may skew toward older sites, but I’d gamble that their quality is better than links from Twitter or Facebook would be. Plus there’s that taxonomy. I would not spend time re-crawling it for new URLs after the initial starter crawl.
Wikipedia – they don’t link out quite as freely as they once did, but the links are much more up to date.
Reddit – or at least large parts of Reddit. It’s big and diverse, sub-Reddits act as tags, and new links are constantly being added, which helps you determine what is new and popular. This is a good place to start a crawl and to re-crawl for new links. Reddit was suggested to me by some very experienced SEOs when we were discussing this topic, and I trust their judgment.
Indie Map – Maybe. I’d include it as a starter but would skip re-crawling.
Hacker News – maybe for a seed crawl. I would try to tap HN, including the comments, for fresh new links.
Pinboard – constantly updating bookmarks. This would make a good seed site.
Agree or disagree on Indieweb.org as a seed site? Can you think of other seed sites I’ve missed? If so, leave a comment below. Thanks.
Search is going to change in March 2020 for Android users in the EU. From March, when EU users buy and set up a new Android phone, they will be presented with 4 choices for the phone’s default search engine. Google will be one choice, and Google has agreed to auction off the other 3 slots (so they can cash in). (I don’t know if this affects older Android phones that upgrade to a newer version of Android.)
All this is important because a certain percentage of people will choose a search engine other than Google. Also, once chosen, people rarely change the default search engine on their devices. This presents a huge chance for the alternative search engines to gain some recognition and market share within the EU. This is a big deal.
DuckDuckGo won a spot in every country. This is good, but can they keep users? I’ve heard their search results can be weak in some non-English searches, and this is where DDG’s sole reliance on Bing for the bulk of its results might be a liability. Can Bing, and therefore DDG, provide satisfactory results in all European languages?
Info.com (the old Infospace.com) won in every country too. Not a really good choice; kind of a waste of a slot, and it shows the weakness of the auction model.
Qwant won in most major EU countries. This is good. Qwant uses Bing for English language searches, but they have their own crawler and index for French, German, Italian and Spanish. I hear their results in French are quite good so Qwant stands a chance of gaining users here.
PrivacyWall: who are they, and where did they come from? I think they have their own index, which appears small. They had better crawl like crazy between now and March.
GMX is just a Google retread.
Regional search engines: Yandex (Russia) and Seznam (Czech Republic and Slovakia) are already dominant in their home languages, so I expect they will pick up even more market share from this.
Of course Google is trying to subvert the intent of the EU regulators by making this an auction to the highest bidders. It’s legal, but it proves the point that Android is open source in name only, a fiction; in reality it’s totally under Google’s control. Reserving placement for the highest bidders robs startups of badly needed operating and R&D funds and cripples charity-based search engines’ ability to fund their charitable work.
Money should not be the only deciding factor. Still, this is a rear-guard action on Google’s part. The walls of Google’s search monopoly on Android have been breached; will this allow newer EU-based search engines to come along?
My prediction is that both DuckDuckGo and Qwant will win some additional market share in Europe with this. Both of those search engines have enough comprehensive features, like Maps and Wikipedia, to compete in the mobile market, and I think they can retain users who try them. I don’t see that happening with Info.com, PrivacyWall, or GMX, but maybe they will rise to the occasion, add features, and meet users’ long-term expectations.
It appears new auctions will occur every 3 to 4 months, so it will be interesting to see how the lineups shift over time.
Of course none of this is available in the US, or in most of the rest of the world. In the US, Google retains its hold over Android, and at the pace US antitrust regulators are working, I don’t expect to see any significant opening up for a long time.
Get your popcorn out, this is going to be interesting.
Technorati ordered blogs by recency and relied on delicious and flickr for better tag results. Even if I did rebuild it you probably wouldn’t like it.
I miss Technorati and other RSS search engines. However, Kevin brings up a good point: Technorati was developed at a time when:
Twitter and Facebook, which now post breaking news in their timelines, did not yet exist;
Mainstream media was still publishing to the Web on a print-like timetable, mostly once or twice a day;
Search engines like Google still took some time to find, process, and rank new web pages (posts).
So Technorati’s focus on “recency” was appropriate for the time, but not really needed now. (Although I’d still like to see a good RSS search engine today anyway.)
What we need is a search engine that would deep-spider only non-commercial, independent blogs, or filter out all the commercial crap and spam, so that you get results only from bloggers. (Yes, the big spidering search engines sort of do this, but you are unlikely to find individual blog posts in the first 5 pages of results for any popular keyword search.)
As I see it, what we need is depth in such an index, with relevance rather than recency as the primary ranking factor, although recency could be a secondary factor. Plus you need to filter out the spam blogs. This implies an algorithm of some sort.
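One way such an algorithm could look, as a minimal sketch: relevance drives the score, recency adds only a small boost, and known spam domains are dropped outright. All the field names, weights, and example domains here are made up for illustration; a real engine would use proper text retrieval scoring rather than bare term overlap.

```python
import math
import time

def score(post, query_terms, now=None, spam_domains=frozenset()):
    """Score a blog post for a query. Relevance (term overlap) is the
    primary factor, recency is a lightly weighted secondary factor,
    and posts from known spam domains score zero."""
    if post["domain"] in spam_domains:
        return 0.0                               # spam filter first
    # Primary factor: fraction of query terms present in the text.
    words = set(post["text"].lower().split())
    relevance = sum(t in words for t in query_terms) / len(query_terms)
    if relevance == 0:
        return 0.0
    # Secondary factor: boost that decays with the post's age in days.
    now = time.time() if now is None else now
    age_days = (now - post["published"]) / 86400
    recency = math.exp(-age_days / 30)           # fades over ~a month
    return relevance + 0.2 * recency             # recency weighted lightly

posts = [
    {"domain": "blog-a.example", "text": "indie web search engines", "published": 0},
    {"domain": "spam.example", "text": "indie web search engines", "published": 0},
]
```

With the 0.2 weight, a fully relevant old post still outranks a half-relevant fresh one, which is the point: recency breaks ties, it doesn’t decide the race.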