Tuesday, August 10, 2021

Reasons Why Your Site May Not Be Indexed




After digging into the problem further, I thought I'd write a post for Moz to share my experience so others don't have to spend a lot of time hunting for answers about indexing.


Not being indexed means that your site, or some part of it, has not been added to Google's index, so no one will find your content in search results.


List of Crawl Problems


The first thing you should look at is your Google Search Console dashboard. Forget all the other tools available. If Google sees a problem with your site, that's the problem you want to address first. If there is a problem, the dashboard will show an error message. See below for an example. I have no issues with my own site at the moment, so I had to borrow someone else's screenshot. Thanks in advance, Nile :)


Crawl


The HTTP 404 status code is the one you'll probably see most often. It means that whatever page the link points to cannot be found. Anything other than a 200 (and possibly a 301) status code usually means something is wrong, and your site may not be working as intended for your visitors.
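
If you want to spot-check a few URLs yourself, here is a minimal Python sketch of the idea above. The function names `fetch_status` and `describe_status` and the labels they return are my own, made up for illustration; they are not anything Google or Search Console provides. It uses `http.client` so that a 301 is reported as-is instead of being silently followed.

```python
import http.client
from http import HTTPStatus
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the raw status code (redirects are not followed)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        return conn.getresponse().status
    finally:
        conn.close()


def describe_status(code: int) -> str:
    """Map a status code to a rough, hypothetical crawl-health label."""
    if code == HTTPStatus.OK:
        return "ok"
    if code == HTTPStatus.MOVED_PERMANENTLY:
        return "permanent redirect"
    if code == HTTPStatus.NOT_FOUND:
        return "not found"
    if 500 <= code < 600:
        return "server error"
    return "needs a look"
```

Something like `describe_status(fetch_status("https://example.com/some-page"))` would then give you a quick label for one page; the URL here is only a placeholder.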


Fix Crawl Errors


Usually this type of problem is caused by one or more of the following reasons:


Robots.txt - A text file in the root folder of your website that gives search engine crawlers certain guidelines. For example, if your robots.txt file contains the lines "User-agent: *" and "Disallow: /", it basically tells every crawler on the web to keep out and not index ANY content on your site.
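
A quick way to test what a given robots.txt allows is Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the helper name `is_crawlable`, the two sample files, and the example.com URL are my own assumptions, and Python's parser may not match Google's handling of every edge case.

```python
from urllib.robotparser import RobotFileParser

# Two sample robots.txt bodies (hypothetical, for illustration only).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # keeps every crawler out
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # an empty Disallow allows everything


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable(BLOCK_ALL, "https://example.com/page"))
    print(is_crawlable(ALLOW_ALL, "https://example.com/page"))
```

To check a live site instead of a string, `RobotFileParser` also has `set_url()` and `read()` methods that download the file for you.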


.htaccess - This is a hidden file that also lives in your WWW or public_html folder. You can toggle its visibility in most modern text editors and FTP clients. A badly configured .htaccess can do nasty things like cause infinite loops, which will never let your site load.


Meta tags - Make sure the pages that aren't getting indexed don't have this meta tag in their source code: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
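
Checking page source by hand gets tedious, so here is a small sketch that looks for a robots meta tag containing "noindex" using Python's built-in `html.parser`. The class and function names are hypothetical, and it only inspects the HTML you pass in; it does not look at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

Feed it the source of a page that refuses to show up in the index; a `True` result means the tag above is probably your culprit.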


Sitemap - Your sitemap isn't updating for some reason, and you keep submitting the old/broken one to Webmaster Tools. After you've resolved the issues shown in the Webmaster Tools dashboard, always make sure you generate a fresh sitemap and resubmit it.
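
Before resubmitting, it can help to confirm that the sitemap file at least parses and lists the URLs you expect. This is a rough sketch that assumes a standard XML sitemap using the sitemaps.org namespace; the function name `sitemap_urls` is my own, and it does not handle sitemap index files.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list:
    """Return every <loc> URL in a sitemap; raises ET.ParseError if the XML is broken."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

If this raises a parse error or returns URLs that no longer exist on your site, regenerate the sitemap before you resubmit it.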


URL Parameters - Within Webmaster Tools there is a section where you can set URL parameters, which tells Google which dynamic links you don't want indexed.


Your PageRank is too low - Matt Cutts revealed in an interview with Eric Enge that the number of pages Google crawls is roughly proportional to your PageRank.


Connectivity or DNS issues - Maybe, for whatever reason, Google's spiders can't reach your server when they try to crawl it. Perhaps your host is doing some maintenance on their network, or you've recently moved your site to a new home, in which case the DNS delegation could be holding up crawler access.
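
To rule out the basics, you can check from your own machine whether the hostname resolves and whether the server accepts connections. This is a minimal sketch with made-up helper names (`resolve_host`, `can_connect`); it only shows what your network sees, which may differ from what Google's crawlers see.

```python
import socket


def resolve_host(hostname: str) -> list:
    """Return the sorted IP addresses a hostname resolves to, or [] if DNS lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_connect(hostname: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to hostname:port succeeds within the timeout."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

An empty list from `resolve_host` right after a move usually points at DNS rather than the server itself.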


Inherited issues - You may have registered a domain that had a previous life. I had a client who got a new domain (or so they thought) and did everything by the book: wrote good content, nailed the on-page stuff, had a few great inbound links. Yet Google refused to index them, despite accepting their sitemap.


After some investigation, it turned out that the domain had been in use a few years earlier and was part of a large link-spam operation. We had to file a reconsideration request with Google.
