Curlie.org, the successor to the defunct Dmoz Directory (aka the Open Directory Project or ODP), has come out of hibernation and is ready to be used by you and me. They are even accepting URL submissions and editor applications. It looks like a lot of the old deadwood listings have been cleared out and new listings added to the categories, and the whole site is usable and tidy. With over 3 million sites listed, Curlie is the largest human-edited directory of the Web.
See, Dmoz/ODP was created as a legitimate aid for navigating the Web, back before search engines got good. It never sold listings for the sake of link popularity with the search engines, so it is of much higher quality than the thousands of later directories that were built for the express purpose of charging a fee for links to get better search engine rankings. This makes Dmoz/ODP, and now Curlie, stand above the rest.
They Were Quiet
When AOL decided to shut down Dmoz, a group of editors took one of the last open source Dmoz data bundles and tried to carry on the Dmoz mission as a human-edited directory. Thus Curlie was born in 2017 as a successor to Dmoz. Curlie was available but seemed dormant: you could not submit sites and there were a lot of dead listings. But the editors were busy, getting rid of dead links and adding new ones over several years. This had to be a very big effort.
Now they are once again open and ready to be searched and browsed. Browsing Curlie is fun; I suggest you explore.
I’ve said this before: big directories cannot compete with search engines. But human-edited directories can discern quality, something that search engines still cannot do. So there is still a place for them, and I’m very glad that my favorite of all the really big directories has survived.
That makes it especially tragic to report that nearly all the traffic to the site is now from SEO spam bots, presumably searching for all that elusive SEO spam-free content.
This is why we can’t have nice things. Everything on the internet turns into a spam- or ad-infested flow.
I wish I could help this person, but I don’t have the technical skills.
Getting human traffic: Based on my experience with the directory, it takes a long time to get any traction with traffic from humans.
1. Having the blog helped, and having articles that were more than site news helped.
2. For the directory, traffic comes from:
a. webrings
b. direct links from other websites
c. social networks
d. discussion groups
e. other directories
f. search engines, which supply some traffic, but to the blog only. Search engines don’t like listing directories.
I get quite a bit of search engine traffic to the blog from DuckDuckGo, Bing, Mojeek, Wiby, Google, Yandex and some smaller engines, including Searchmysite.net. Together these add up to a steady little stream, but nowhere near commercial amounts of traffic. I have to say that social networks helped a lot at the very beginning.
Things we can all do to help Searchmysite.net:
If you have a blog or a static HTML website with lots of text, add your site to the Searchmysite.net search engine. Help grow his index.
Try out Searchmysite.net. Try it as a surf engine. If you like it, add it to your blogroll or links page.
Mojeek, the privacy-respecting search engine that has its own crawler and index, has started its own community to get feedback and input from its users. The community, located at community.mojeek.com, focuses on what Mojeek is doing, its features and feature requests, as well as general talk about web search, privacy, surveillance capitalism, web platforms and more. The Discourse-based forum is open to all.
This is a good move on Mojeek’s part. Through discussion and interaction they can get a more nuanced idea of what their users want, don’t want and find important. If you have preferences about what you expect from a web search engine, you should register and give them your suggestions.
Marginalia Search is a new web search engine worth bookmarking, using and paying attention to. It has its own crawler, algo(s) and index.
What makes Marginalia Search different is that it prefers text-heavy websites and penalizes sites made with modern web design. In practice this filters out commercial websites, because those are the ones being churned out for sales or as advertising farms, while it favors text-heavy weblogs and HTML sites.
Marginalia is about indexing the content web and ignoring the commercial web as much as possible, so its mission is very similar to our own here at Indieseek.xyz. The difference is that, as a crawling search engine, Marginalia uses much better technology and can scale better. This is a Good Thing, because one of the biggest complaints is that the Web has become boring and overcommercialized. Marginalia Search is trying to address this.
So far, I’m really liking what I see here.
I have added Marginalia Search to the directory in the following categories:
I am highlighting two search engines that index only Indie Web and/or Indieweb pages (please see below for my definitions of these terms). The two engines differ from each other, so I’m going to highlight both their similarities and their differences.
Both engines allow submissions of URLs for inclusion.
Both engines are human reviewed at submission. This is a good thing, because a human editor can filter out spam and websites that just don’t belong better than any algorithm can.
Both will recrawl at unknown intervals to pick up changes and to detect dead URLs.
Both crawl for on-page text and maybe meta tags. I’m guessing here, but this seems likely. Therefore, they may not do so well at indexing pages that are mainly photos, artwork or images with very little text.
Both discover new pages and websites only through submissions and through URLs that the human editor might add to the index manually. So neither has a general crawler that just keeps following hyperlinks the way a general search engine crawler does.
Both have a nice “Random” result link which is a handy and fun starting point to surf the web.
Wiby.me – Wiby is designed to index Indie Web (aka Independent Web) pages, like the old non-commercial static HTML pages of the 1990s and early 2000s. It does not spider a whole website; it crawls only the single URL submitted, so only one page. Because of that, it is not really designed to crawl non-commercial blogs. Wiby’s suggestion for blog webmasters is to submit one or two really good posts if you want to be in the index. Because it actually crawls on-page text, Wiby is more sophisticated than a web directory like Indieseek.xyz and has a better search function.
Searchmysite.net – Searchmysite is designed to crawl deeper (maybe 50 pages or permalinks) within a single domain. (I don’t know how it treats subdomains.) This makes it better at indexing Indieweb blogs, and indeed, as I write this, most of Searchmysite’s index appears to be Indieweb blogs. I cannot speak for the editor, but I don’t see why Searchmysite would not also accept and crawl static HTML websites (i.e. retro or vintage HTML sites) so long as the site has some value and content that it can index, though they might not. Again, this is going to be mainly text. If you are looking for non-commercial blogs, Searchmysite is the better option to start with.
Future of Indie Search
I’m highlighting these two niche search engines because, in their own ways, they represent the future of Indie Web search. I’d like to see many more of them. Directories like Indieseek.xyz have their place, but being able to collect and search actual on-page content goes far beyond what a traditional directory is capable of doing. These guys could put me out of business and I’d be just fine with that!
Microsoft wants to build into the core of WordPress the ability to automatically push new and updated URLs to Bing and other search engines.
Since so much of the web is built on WordPress, this could save current and future smaller engines a lot of crawling time. I think this could be good for the Indieweb and Independent Web of non-commercial bloggers.
I like that this isn’t just for Bing; other search engines could use it too. It might help even out the crawler playing field against Google.
However, beware: I can see this quickly being abused by SEOs, so I guess each search engine would have to develop some way to separate the dreck from the real content in this firehose.
System1 is an ad tech company that is getting into the Internet privacy business. They are best known for buying the privacy meta-search engine Startpage.
But wait, there is more.
Last year we acquired the Waterfox browser, which is known for, among other things, being privacy friendly. We are also working on a private mapping solution in our MapQuest business. We believe a combined offering, which could include VPN and other privacy-related services, would be a very interesting privacy bundle for our users. Stay tuned!
So they own Startpage, the Waterfox browser, and good old MapQuest, a pioneer in online mapping. Yeah, they have the beginnings of a suite there if they can put them all together.
DuckDuckGo is more than just a search engine; they are a privacy company. The Brave browser is now expanding into privacy search. Now System1 has shown that they have bigger plans than just a search engine. Interesting.
Privacy – Mojeek was a privacy-respecting search engine way before privacy became cool.
Building their own index – this is really important because there are only three other large indexes of the Web in English: Google, Bing and Yandex. Mojeek comes in at #4.
The combination of respecting privacy and having its own index is what makes Mojeek a potential long-term, big-league player in the search engine business.
What has Changed Since 2018?
At first glance not much has changed since my original review above. Mojeek still has the same ultra-clean look it had before. The big changes come in the search results, because Mojeek has been busy adding a couple of hundred servers, increasing their index size by crawling and refreshing old pages in the index more often.
I’m seeing a fresh crawled date on at least one listing for every search I do.
I’m seeing that many other listings have been recrawled in the last few weeks so the index is fresher than ever.
There are definitely more pages in the index.
It’s Not Perfect
Mojeek has its spells of weirdness. Note these two searches.
A search for “marlin firearms” brings up both the main website for Marlin in the organic results and the Wikipedia entry in the sidebar.
Now look at a search for “savage firearms” and you see the Savage website in the organic results but this time no Wikipedia entry! The problem lies with Wikipedia which, for some reason, lists Savage as “savage fire arms” (“fire arms” as two words) and Mojeek can’t bridge that gap.
The thing is, when I first ran these searches I was looking for the Wikipedia entry, not the organic results, to get some history on these companies. That’s why I noticed it. Not a big deal, just a little weird. However, do note that the organic results are pretty good for both searches, and not exactly like the organic results Google or Bing would give you. That’s a very good thing.
There are also times when Mojeek just doesn’t have the website I am looking for, but it does return a cluster of websites that mention the name of the website I wanted. I am not an expert, but I take that as a sign that Mojeek’s crawler just has not indexed that site yet. I used to see this a lot back in the early search engine era (when there were many search engines of varying sizes, and even on Google when it was very young), but younger web citizens who grew up with only all-seeing Google might not have encountered this.
How I Use Mojeek
Mojeek is my default on my laptop, where 98% of my writing gets done.
I use it first for most searches. If it does not give me what I need, then I resort to another privacy engine as a backup. Yes, I often have to use the backup, but Mojeek comes up with some good pages I would never get on G or B.
I use it to find Wikipedia articles.
Spell checking: Okay, this is embarrassing. I’m a terrible speller. Mojeek has a great spell checker, and there is one social network I post on where my browser’s spell checker does not work, so I often use Mojeek to check the spelling of a word. I have no idea if Mojeek’s spell checker is any better than any other, but it works well, and maybe the uncluttered results page is what makes it my preferred crutch.
I think I could get by with just Mojeek if I could only use one engine for some reason. But that really is not the point. Google and Bing are giving you their opinion of what the best pages are for a given search. It’s just an opinion. We should all be using several search engines for our research to get a broader view – more opinions. Out of all the trillions of pages on the web it is ludicrous to think that the first 10, 20 or 30 results on Google are everything that needs to be said on a topic. Every single search engine has a bias built into the algorithm in some way. Every. Single. One. Which is exactly why you should use several search engines with different indexes.
Mojeek is growing into its role as a general web search engine. Attempting to index the ever-growing Web is a noble endeavor. Their index has gotten much larger and fresher, and their algo gives good results. As they grow, they will continue to get even better.
Back when I started on the Web in the 1990s, there were many major search engines, each showing just a part of the Web. I commonly used a suite of my favorite search engines and directories just to find a few websites or pages. Today my suite of privacy search engines consists of Mojeek, DuckDuckGo and, once in a great while, Startpage. I think you should add Mojeek to your searching suite.
This is a great retro 1990s-style page chock-full of resources and tips on how to surf the web in 2021. There are links to new tools and starting points for web surfing. This looks like a lot of fun and I intend to use it as my starting point.
Back in the 1990s we surfed the web because search engines pretty much sucked. Yes, it took time to surf the web, but it was endlessly entertaining back in those days before everything became a commercial for some product. A lot of the tools we used to surf with are long gone, but this guide lists replacements. Good stuff.
In addition to that page, the webmaster at Sadgrl.online has built a remarkable website and I encourage you to explore the whole site. It’s not just nostalgia: she has created webrings, search engines and discussion groups, and has put together extensive guides and resources. Well worth browsing.