Marginalia Search is a new web search engine worth bookmarking, using and paying attention to. It has its own crawler, algo(s) and index.
What makes Marginalia Search different is that it prefers text-heavy websites and penalizes sites made with modern web design. In practice this filters out most commercial websites, because those are the ones being churned out for sales or as advertisement farms, while it favors text-heavy weblogs and HTML sites.
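To make the idea of "prefer text-heavy, penalize modern design" concrete, here is a minimal sketch in Python. It is not Marginalia's actual algorithm, which is not described here. The scoring rule (share of visible text in the page, minus a small penalty per script tag), the class and function names, and the `script_penalty` value are all assumptions made up for illustration.

```python
from html.parser import HTMLParser


class TextRatioParser(HTMLParser):
    """Counts visible text characters and <script> tags in a page."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_tags += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text a reader would actually see.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def text_heaviness(html: str, script_penalty: float = 0.05) -> float:
    """Hypothetical score: visible-text share of the page, minus a script penalty."""
    if not html:
        return 0.0
    parser = TextRatioParser()
    parser.feed(html)
    parser.close()
    ratio = parser.text_chars / len(html)
    return max(0.0, ratio - script_penalty * parser.script_tags)


plain_page = (
    "<html><body><h1>My notes</h1><p>"
    + "Long paragraph of text. " * 20
    + "</p></body></html>"
)
scripted_page = (
    "<html><head><script>var a = 1;</script><script src='x.js'></script>"
    "<style>p{color:red}</style></head>"
    "<body><div><div><p>Buy now</p></div></div></body></html>"
)
```

With these two toy pages, `text_heaviness(plain_page)` comes out well above `text_heaviness(scripted_page)`, which is the kind of ordering a text-first engine would want. A real ranking would obviously need many more signals than this.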
Marginalia is about indexing the content web and ignoring the commercial web as much as possible, so its mission is very similar to our own here at Indieseek.xyz. The difference is that, as a crawling search engine, Marginalia uses much better technology and can scale better. This is a Good Thing, because one of the biggest complaints is that the Web has become boring and over-commercialized. Marginalia Search is trying to address this.
So far, I’m really liking what I see here.
I have added Marginalia Search to the directory in the following categories:
I am highlighting two search engines that index only Indie Web and/or Indieweb pages (please see below for my definitions of these terms). The two engines are quite different from each other, so I’m going to highlight both their similarities and differences.
Both engines allow submissions of URLs for inclusion.
Both engines are human reviewed at submission. This is a good thing, because a human editor can filter out spam and websites that just don’t belong better than any algorithm.
Both will recrawl at unknown intervals for changes and to detect dead URLs.
Both crawl for on-page text and maybe meta tags. I’m guessing here but this seems likely. Therefore, they may not do so well indexing pages that are mainly photos, artwork or images with very little text.
Both discover new pages/websites only through submissions and through URLs that the human editor adds to the index manually. So neither has a general crawler that just keeps following hyperlinks like a general search engine crawler.
Both have a nice “Random” result link which is a handy and fun starting point to surf the web.
Wiby.me – Wiby is designed to index Indie Web (aka Independent Web) pages, like the old non-commercial static HTML pages of the 1990s and early 2000s. It does not spider a whole website. It only crawls the unique URL submitted, so only one page. Because of that, it is not really designed to crawl non-commercial blogs. Wiby’s suggestion for blog webmasters is to submit one or two really good posts if you want to be in the index. Because it actually crawls on-page text, Wiby is more sophisticated than a web directory like Indieseek.xyz and has a better search function.
Searchmysite.net – Searchmysite is designed to crawl deeper (maybe 50 pages or permalinks) within a single domain. (I don’t know how it treats subdomains.) This makes it better at indexing Indieweb blogs and indeed, as I write this, most of Searchmysite’s index appears to be Indieweb blogs. I cannot speak for the editor, but I don’t see why Searchmysite would not also accept and crawl static HTML websites (i.e. retro or vintage HTML sites) so long as the site has some value and content that it can index, but they might not. Again, this is going to be mainly text. If you are looking for non-commercial blogs, Searchmysite is the better option to start with.
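The difference in crawl scope between the two engines can be sketched in a few lines of Python. This is only an illustration of "one submitted page" versus "up to about 50 pages within one domain", not either engine's real crawler. The function name `pages_to_index`, the `links` dictionary (a stand-in for fetching and parsing pages, so the sketch runs offline) and the example URLs are hypothetical, and treating subdomains as separate hosts is my assumption.

```python
from collections import deque
from urllib.parse import urlparse


def pages_to_index(start_url, links, max_pages=1):
    """Breadth-first walk from start_url, staying on its host, up to max_pages.

    `links` maps a URL to the list of URLs that page links to.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    selected = []
    while queue and len(selected) < max_pages:
        url = queue.popleft()
        selected.append(url)
        for target in links.get(url, []):
            # Assumption: a different host (including a subdomain) is out of scope.
            if target not in seen and urlparse(target).netloc == host:
                seen.add(target)
                queue.append(target)
    return selected


# A tiny made-up blog with one outbound link to another site.
links = {
    "https://blog.example/": [
        "https://blog.example/post-1",
        "https://blog.example/post-2",
        "https://other.example/",
    ],
    "https://blog.example/post-1": [
        "https://blog.example/post-3",
        "https://blog.example/",
    ],
}

single_page = pages_to_index("https://blog.example/", links)  # one-page style
site_crawl = pages_to_index("https://blog.example/", links, max_pages=50)
```

Here `single_page` holds only the submitted URL, while `site_crawl` picks up the home page and all three posts but never leaves for `other.example`.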
Future of Indie Search
I’m highlighting these two niche search engines because, in their own ways, they represent the future of Indie Web search. I’d like to see many more of them. Directories like Indieseek.xyz have their place, but being able to collect and search by actual on-page content goes far beyond what a traditional directory is capable of doing. These guys could put me out of business and I’d be just fine with that!
Microsoft wants to build into the core of WordPress the ability to automatically push new and updated URLs to Bing and other search engines.
Since so much of the web is built on WordPress, this could save current and future smaller engines a lot of crawling time. I think this could be good for the Indieweb and Independent Web of non-commercial bloggers.
I like that this isn’t just for Bing but that other search engines could use it too. It might help even the crawler playing field against Google.
However, beware: I can see this quickly being abused by SEOs, so I guess each search engine would have to develop some way to filter the dreck from the real content in this firehose.
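As one very rough idea of what a first-pass filter on such a firehose might look like, here is a sketch in Python that drops duplicate URLs and caps how many URLs a single host can push in one batch. This is purely my own illustration, not part of the WordPress proposal or any engine's real pipeline. The function name, the `per_host_limit` parameter and the example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def accept_pushed_urls(urls, per_host_limit=3):
    """Keep each URL at most once and cap how many one host may push per batch."""
    seen = set()
    per_host = Counter()
    accepted = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if not host or url in seen:
            continue  # not a usable absolute URL, or already pushed
        seen.add(url)
        if per_host[host] >= per_host_limit:
            continue  # this host has used up its quota for the batch
        per_host[host] += 1
        accepted.append(url)
    return accepted


batch = [
    "https://blog.example/new-post",
    "https://blog.example/new-post",  # duplicate push
    "https://spam.example/1",
    "https://spam.example/2",
    "https://spam.example/3",
    "not a url",
]
kept = accept_pushed_urls(batch, per_host_limit=2)
```

With a limit of 2, `kept` contains the one blog post and only the first two spam URLs. A quota like this does nothing about content quality, of course, so an engine would still need human review or ranking signals on top of it.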
System1 is an ad tech company that is getting into the Internet privacy business. They are best known for buying the privacy meta-search engine Startpage.
But wait, there is more.
Last year we acquired the Waterfox browser, which is known for, among other things, being privacy friendly. We are also working on a private mapping solution in our MapQuest business. We believe a combined offering, which could include VPN and other privacy-related services, would be a very interesting privacy bundle for our users. Stay tuned!
So they own Startpage, the Waterfox browser, and good old MapQuest, a pioneer in online mapping. Yeah, they have the beginning building blocks for a suite there if they can put them all together.
DuckDuckGo is more than just a search engine; they are a privacy company. The Brave browser is now expanding into privacy search. Now System1 has shown that they have bigger plans than just a search engine. Interesting.
Privacy – Mojeek was a privacy-respecting search engine way before privacy became cool.
Building their own index – this is really important because there are only three other large indexes of the Web in English: Google, Bing and Yandex. Mojeek comes in at #4.
The combination of respecting privacy and having its own index is what makes Mojeek a potential long-term, big-league player in the search engine business.
What has Changed Since 2018?
At first glance, not much has changed since my original review above. Mojeek still has the same ultra-clean look it had before. The big changes come in the search results, because Mojeek has been busy adding a couple of hundred servers, increasing their index size by crawling, and refreshing old pages in the index more often.
I’m seeing a fresh crawled date on at least one listing for every search I do.
I’m seeing that many other listings have been recrawled in the last few weeks so the index is fresher than ever.
There are definitely more pages in the index.
It’s Not Perfect
Mojeek has its spells of weirdness. Note these two searches.
A search for “marlin firearms” brings up both the main website for Marlin in the organic results and the Wikipedia entry in the sidebar.
Now look at a search for “savage firearms” and you see the Savage website in the organic results but this time no Wikipedia entry! The problem lies with Wikipedia which, for some reason, lists Savage as “savage fire arms” (“fire arms” as two words) and Mojeek can’t bridge that gap.
The thing is, when I first ran these searches I was looking for the Wikipedia entry, not the organic results, to get some history on these companies. That’s why I noticed it. Not a big deal, just a little weird. However, do note that the organic results are pretty good for both searches, and not exactly like the organic results Google or Bing would give you. That’s a very good thing.
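The “firearms” versus “fire arms” mismatch described above is the sort of gap that a simple normalization step can bridge. Here is a minimal sketch in Python. I have no idea how Mojeek actually matches queries to Wikipedia entries, so this is just one possible approach, and the function names are made up for illustration.

```python
import re


def spacing_key(text: str) -> str:
    """Lowercase the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def titles_match(query: str, title: str) -> bool:
    """True when query and title differ only in case, spacing or punctuation."""
    return spacing_key(query) == spacing_key(title)


one_word_vs_two = titles_match("savage firearms", "Savage Fire Arms")
different_brand = titles_match("marlin firearms", "Savage Fire Arms")
```

Here `one_word_vs_two` is `True` and `different_brand` is `False`. The trade-off is that squashing out spaces can also create false matches between unrelated phrases, and this version simply ignores non-ASCII letters, so a real engine would want something more careful.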
There are also times when Mojeek just doesn’t have the website I am looking for, but it does return a cluster of websites that mention the name of the website I wanted. I am not an expert, but I take that as a sign that Mojeek’s crawler just has not indexed that site yet. I used to see this a lot back in the early search engine era (when there were many search engines of varying sizes, and even on Google when it was very young), but younger web citizens who grew up with only the all-seeing Google might not have encountered this.
How I Use Mojeek
Mojeek is my default on my laptop, where 98% of my writing gets done.
I use it for most searches first. If it does not give me what I need, then I resort to another privacy engine as a backup. Yes, I often have to use the backup, but Mojeek comes up with some good pages I would never get on G or B.
I use it to find Wikipedia articles.
Spell Checking: Okay this is embarrassing, I’m a terrible speller. Mojeek has a great spell checker and I have one social network place I post at where my browser spell checker does not work, so I often use Mojeek to check the spelling of a word. I have no idea if Mojeek’s spell checker is any better than any other but it works well and maybe the uncluttered results page makes it my preferred – crutch.
I think I could get by with just Mojeek if I could only use one engine for some reason. But that really is not the point. Google and Bing are giving you their opinion of what the best pages are for a given search. It’s just an opinion. We should all be using several search engines for our research to get a broader view – more opinions. Out of all the trillions of pages on the web it is ludicrous to think that the first 10, 20 or 30 results on Google are everything that needs to be said on a topic. Every single search engine has a bias built into the algorithm in some way. Every. Single. One. Which is exactly why you should use several search engines with different indexes.
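If you wanted to combine the "opinions" of several engines in one list, one simple way is to interleave their ranked results and drop repeats. The sketch below does that in Python with made-up result lists. It does not call any real search engine API, and the function name and example URLs are hypothetical.

```python
from itertools import zip_longest


def merge_results(*ranked_lists):
    """Interleave ranked URL lists round-robin, keeping the first copy of each URL."""
    merged = []
    seen = set()
    # zip_longest pads the shorter lists with None, which we skip.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


engine_a = ["https://a.example/1", "https://shared.example/", "https://a.example/3"]
engine_b = ["https://shared.example/", "https://b.example/2"]
combined = merge_results(engine_a, engine_b)
```

In `combined`, each engine's top pick comes first and the page both engines returned appears only once. Round-robin is a deliberately neutral choice here: it gives every engine's opinion equal weight instead of trusting one index over the others.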
Mojeek is growing into its role as a general web search engine. Attempting to index the ever-growing Web is a noble endeavor. Their index has gotten much larger and fresher, and their algo gives good results. As they grow, they will continue to get even better.
Back when I started on the Web in the 1990s there were many major search engines, each showing just a part of the Web. I commonly used a suite of my favorite search engines and directories just to find a few websites or pages. Today my suite of privacy search engines consists of Mojeek, DuckDuckGo and, once in a great while, Startpage. I think you should add Mojeek to your searching suite.
Petal Search – New commercial search engine from Huawei. I think their index is fairly large. I don’t know if they are using another search engine (maybe Yandex?) for backfill. Assume it is not private.
Plumb One – Index is small, growing. Plumb uses Bing to provide search results when their own index does not have enough listings.
Search My Site – Open source, lists personal and independent websites only. This is a curated index. Only crawls sites that are submitted and only after human review. I like this one. The front page suggests searching for hobbies or interests and my searches of “role playing” and “indieweb” brought up lots of interesting results.
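The backfill arrangement mentioned for Plumb One above (fall back to another engine when your own index does not have enough listings) boils down to a small piece of logic. Here is a sketch in Python. It is not Plumb's actual code. The function name, the `min_results` threshold and the `fallback_search` callable are assumptions for illustration, and no real Bing API is called.

```python
def with_backfill(own_results, fallback_search, query, min_results=10):
    """Return own results, topped up from fallback_search when there are too few.

    `fallback_search` is any callable that takes a query and returns a list of URLs.
    It is only called when the engine's own index comes up short.
    """
    results = list(own_results)
    if len(results) >= min_results:
        return results
    seen = set(results)
    for url in fallback_search(query):
        if len(results) >= min_results:
            break
        if url not in seen:  # own listings keep their place; no duplicates
            seen.add(url)
            results.append(url)
    return results


def fake_fallback(query):
    # Stand-in for a licensed third-party index.
    return ["https://big.example/x", "https://own.example/1", "https://big.example/y"]


topped_up = with_backfill(["https://own.example/1"], fake_fallback, "tea", min_results=3)
```

In `topped_up`, the engine's own listing stays first and the two new fallback URLs fill the remaining slots.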
This is a great retro 1990s page chock full of resources and tips on how to surf the web in 2021. There are links to new tools and starting points for web surfing. This looks like a lot of fun and I intend to use this as my starting point.
Back in the 1990s we surfed the web, because search engines pretty much sucked. Yes, it took time to surf the web, but it was endlessly entertaining back in those days before everything became a commercial for some product. A lot of the tools we used to use to surf are long gone, but this guide lists replacements. Good stuff.
In addition to that page, the Webmaster at Sadgrl.online has built a remarkable website and I encourage you to explore the whole site. It’s not just nostalgia: she has created webrings, search engines and discussion groups, and listed extensive guides and resources. Well worth browsing.
If you are not using Google, chances are the search engine you are using is powered by Microsoft’s Bing search engine. This is because setting up a search engine that actively spiders the web to create its own index is expensive and a lot of work. But Microsoft is willing to license out Bing search for a fee to just about anybody, large or small. This means that a lot of search engines, from Yahoo to AOL, use Bing as their primary web search source.
Don’t get me wrong, Bing is a good search engine, but when you get to 15 or more Bing clones you start getting cross-eyed trying to figure out what makes one different from the others and from Bing.
So this is my list of just a couple of the best search engines that use Bing in their primary index and what makes them special.
DuckDuckGo – Bing and Yandex. Privacy: DDG does not track you nor even keep a record of your visit. DDG is a meta-search engine that mainly uses Bing but also brings in answers from hundreds of other sources depending on the query. For instance, they use Apple Maps for local searches and Wikipedia where relevant. For most people, DDG is a good default search engine that will work great as a daily driver without spying on you. DDG is one of my two default engines.
MetaGer – Mainly Bing in English. Privacy. MetaGer is a metasearch engine owned by a German non-profit, and they don’t track you or spy on you. What is cool about MetaGer is that they take privacy one step further: under each listing in the search results you will see a link to “OPEN ANONYMOUSLY”. If you click on that, MetaGer will try to open the site using an anonymous proxy! This really adds a lot of value to the search results. Journalists, those looking for sensitive information, and people in authoritarian countries should make a note of this. They also integrate Wikipedia into the search results in a minimalist way.
SwissCows – Bing in English. Privacy, based in Switzerland and protected by strict Swiss privacy laws, SC does not track you. SwissCows bills itself as family friendly and therefore Safe Search cannot be set lower than the Moderate setting. The combination of both privacy and family friendly search results makes SwissCows the perfect choice for a family or a kid’s computer or tablet or anything a child might use.
Ecosia – Bing. Charity donation, in this instance planting trees all over the world every time you search. Because of climate change fears, Ecosia has become a popular search engine. It gives you the whole feature rich Bing search experience but links it to a good cause. Privacy on Ecosia is sort of mid-range: better than if you were using Google, but Bing is still getting some of your information.
Givero – Bing. Charity donations. Like Ecosia (above), searches done on Givero spur donations to charitable organizations, only you get to choose among seven different charities with ecological, tech, child rescue or animal rescue missions. Again, privacy is mid-range like Ecosia (above), but the charitable causes are good.
The list is for everyday users. The best thing to do is figure out what is important to you (privacy or good causes) and give one of the likely search engines a try. Not being tracked or spied upon is a good feeling. Likewise, having your search activity contribute to good causes is a good feeling too.