Over the past year I’ve been impressed with how much the Indieweb.org wiki has improved. Heck, it was good when I first saw it, but members are very active and relentlessly keep editing, improving, adding on and tweaking, so it just gets better.

Somebody in one of the Indieweb chat channels recently suggested that the Indieweb wiki would make a good seed site (aka starter crawl) for anyone starting to build a new search engine index. I’ve been thinking about this, off and on, for a couple of weeks, and I have to say I agree: the Indieweb.org wiki would make a good seed site for a web search engine.

What’s a “seed site”?

Briefly, a seed site, or starter crawl, is a site (or one of several sites) that a search engine crawler indexes first in order to find a wide variety of worthwhile pages. The crawler indexes the URLs it finds there, then indexes more pages on those sites, discovering in the process the URLs that they link to, and on and on. In the old days the Yahoo directory and the Dmoz directory were considered prime seed sites for search engines. Later Wikipedia came along, and it is still considered an important seed site for its outbound links. These sites were considered prime starters in part because the outbound links had all been reviewed by human editors, so a certain level of quality could be presumed.
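To make the "and on and on" part concrete, here is a minimal sketch of a seed crawl in Python. It is only an illustration under stated assumptions: the "web" is a small made-up dictionary of links rather than real HTTP fetches, and the names `TOY_WEB`, `crawl` and `toy_fetch_links` are hypothetical, not taken from any real crawler.

```python
from collections import deque

# Toy stand-in for the web: each URL maps to the URLs it links to.
# Every address here is made up for illustration.
TOY_WEB = {
    "https://seed.example/": ["https://blog-a.example/", "https://blog-b.example/"],
    "https://blog-a.example/": ["https://blog-b.example/", "https://blog-c.example/"],
    "https://blog-b.example/": ["https://seed.example/"],
    "https://blog-c.example/": [],
}


def toy_fetch_links(url):
    """Return the outbound links of a page (a real crawler would fetch and parse it)."""
    return TOY_WEB.get(url, [])


def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl: index the seeds, then the pages they link to, and so on."""
    frontier = deque()
    seen = set()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            frontier.append(seed)

    indexed = []
    while frontier and len(indexed) < max_pages:
        url = frontier.popleft()
        indexed.append(url)  # "index" the page
        for link in fetch_links(url):
            if link not in seen:  # queue each newly discovered URL only once
                seen.add(link)
                frontier.append(link)
    return indexed


order = crawl(["https://seed.example/"], toy_fetch_links)
print(order)
```

The better the seed's outbound links, the better the pages that land in the queue early, which is why human-reviewed seeds are attractive.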

If you want to learn more about seed sites, I suggest reading Bill Slawski’s article, Seed Sites for Search Engine Web Crawls. The comments are worth reading too.

Why Use the Indieweb.org Wiki?

First, there is a wealth of good information on the wiki pages alone, even without crawling the outbound links. Second, there are the outbound links themselves, which are well worth crawling.

Now, I would not use the Indieweb wiki as my only seed unless I were, somehow, creating an Indie Web only search engine, but I think it would be good as part of a mix of seed sources. The Web has changed in the last 20 years: commercial sites have taken over, and the wiki’s outbound links lend a certain, needed counterbalance to the commercial.

Other reasons:

  • The outbound URLs are human curated. This minimizes low quality content.
  • It links to some quality content that might take a while to find by other means: namely, a lot of quality blogs which also link out freely.
  • Wiki pages act as tags. This isn’t quite as useful for a search engine as a full directory taxonomy of categories, but it is useful.
  • The wiki is not outrageously huge.  Make no mistake, it’s big and growing bigger, but it’s dwarfed by something like Wikipedia and easier to digest within bandwidth limits of a startup.
  • It’s constantly being updated.  This makes it a good source for re-crawls because new links are constantly being added.
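The re-crawl point in the last bullet boils down to comparing the links found on a seed page now with the links found last time. A minimal sketch, assuming the crawler keeps the previous list of outbound links for each seed page (the function name `new_links` and the example URLs are hypothetical):

```python
def new_links(previous, current):
    """Return links found in the latest crawl but not in the earlier one, in page order."""
    known = set(previous)
    fresh = []
    for link in current:
        if link not in known:
            known.add(link)  # also drops duplicates within the current page
            fresh.append(link)
    return fresh


# Made-up outbound links from one seed page on two different crawl dates.
last_crawl = ["https://blog-a.example/", "https://blog-b.example/"]
this_crawl = [
    "https://blog-a.example/",
    "https://blog-d.example/",
    "https://blog-b.example/",
    "https://blog-d.example/",
]

print(new_links(last_crawl, this_crawl))
```

Only the links returned here need to go into the crawl queue, which keeps re-crawls of an actively edited seed cheap.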

Other Good Seed Sites:

  1. Curlie.org – the successor to the old Dmoz (Open Directory Project). Volunteer editors have been working on cleaning out dead links for a couple of years and possibly adding new listings, so it’s not quite as dead as one might think. For somebody starting a web search engine, it’s hard to ignore 3 million or more listings. Said listings may be older sites, but I’d gamble that the quality is better than links from Twitter or Facebook would be. Plus there is that taxonomy. I would not spend time re-crawling for new URLs after the initial starter crawl.
  2. Wikipedia – they don’t link out quite as freely as they once did, but it is much more up to date than Curlie.
  3. Reddit – or at least large parts of Reddit. It’s big and diverse. Sub-Reddits act as tags. It is constantly expanding with new links, and it helps you determine what is new and popular. This is a good place to start a crawl and to re-crawl for new links. Reddit was suggested to me by some very experienced SEOs when we were discussing this topic. I trust their judgment.
  4. Indie Map – Maybe. I’d include it as a starter but would skip re-crawling.
  5. Hacker News – Maybe for a seed crawl. I would try to tap HN, including the comments, for fresh links.
  6. Pinboard – constantly updating bookmarks.  This would make a good seed site.

 

Agree or disagree on Indieweb.org as a seed site?  Can you think of other seed sites I’ve missed?  If so, leave a comment below.  Thanks.

This was also posted to
/en/search-engines.


Like: Searching the Creative Internet

@davidcrawshaw we are here. Indieseek.xyz is here to help. Sure, we’re not a high tech search engine, but what could be more 1990’s than a web directory? You call it the Creative Internet (good name BTW) and we call it the Independent Web, but we’re talking about the same thing. Our mission is to try and index that “Creative Internet”.

And Indieseek.xyz is not alone. There are other indexes with similar goals. Just so you know that a few people are thinking the same way and trying.


Like: Scoping Out Basics of #IndieWeb Search

Sounds like a great project and very worthwhile.

I can understand opt in. I’m a little leery if the requirement is to use h-cards for that because, as we have seen, half the time that does not work. Also, it eliminates people on their own domain who might not have the ability to modify their site. Just thinking out loud.

 

This was also posted to
/en/search-engines.


In case you don’t know, when you do a search on Indieseek.xyz, the bottom of the search results gives you the option to continue that same search on your choice of search engines. With Findx shutting down, I had to find a replacement search engine.

I tend to favor privacy-respecting search engines wherever I can, and I did manage to find a privacy-respecting meta search engine, run by a non-profit, which has decent web results: MetaGer. They have replaced Findx. MetaGer has a very solid privacy policy and a 20 year track record. I was happy to find them.


I added Indieseek.xyz search to the Vivaldi browser. This means I can search Indieseek directly from the browser search box if I want, or from the address bar using a “nickname” shortcut. Kewl.

I had to add the search manually because Vivaldi’s auto detect did not work with Indieseek. But adding it manually is easy on Vivaldi, which is part of its beauty.

Here is an article, How to add custom search engines to your web browser, that gives instructions for all major browsers. To add Indieseek you will need to use the manual method if your browser has one. NOTE: the Firefox instructions won’t work. Firefox has changed the way they add search engines since Quantum came out. IMHO a step back.


We added a usability enhancement. When you perform a search on Indieseek.xyz you now have the choice of continuing your search on several different search engines.

For an example, click here (new tab) and scroll down to the bottom, where you will see it. Click on a search engine name and you are off searching there for the same query.

This allows users to start with Indieseek and, if we don’t have what they want, effortlessly carry on to another search engine.
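For anyone curious how a "continue this search elsewhere" row can work, here is a minimal sketch. It is not Indieseek's actual code: the engine names and URL templates below are made-up placeholders, and `continue_search_links` is a hypothetical helper. The idea is simply to URL-encode the visitor's query and drop it into each engine's search URL.

```python
from urllib.parse import quote_plus

# Hypothetical engines and URL templates; real engines each have their own format.
ENGINES = {
    "Engine A": "https://engine-a.example/search?q={query}",
    "Engine B": "https://engine-b.example/?query={query}",
}


def continue_search_links(query, engines=ENGINES):
    """Build one link per engine that repeats the visitor's query there."""
    encoded = quote_plus(query)  # spaces become "+", special characters are escaped
    return {name: template.format(query=encoded) for name, template in engines.items()}


links = continue_search_links("indie web")
for name, url in links.items():
    print(name, url)
```

Encoding the query matters: without it, a search containing characters like "&" or "+" would break the link.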
