Why is Google not indexing some of my webpages?
Are you wondering why Google is not indexing some of your pages, even after you submit them through sitemaps? Read this article to learn why some URLs are not indexed by search engines and how to resolve the issue.
I have seen this question in numerous blogging-related forums: "Some of my web pages are not indexed by Google. How can I fix this problem?" Just now an online buddy asked me the same question. Since I have answered it several times in many forums, I thought I would write a short post about it and share it with everyone who has the same query.
Each search engine has its own process for indexing webpages. Typically, its automated spiders crawl the page, store it in a database, and finally decide whether or not to index it based on several parameters.
Let us look at some of the common reasons why Google does not index some of the webpages you submit.
1. It takes time to crawl and index pages from new and small sites
Even though Google is very quick at crawling active and reputable sites, it may take considerably longer to crawl new pages from small sites. If you have just started a new site, it may take several months before Google correctly understands your posting frequency and adjusts its crawling interval. If your site is just a few weeks old, don't be surprised if Google takes a few weeks to crawl and index your new pages. There is nothing to worry about; this is normal. If you publish new articles regularly and consistently, Google will eventually learn the pattern and may start crawling your posts sooner.
2. Duplicate content
Once Google crawls your webpages, it may or may not add them to its index immediately. It runs several different analyses before it decides whether to add a page to the index or ignore it.
Even if a page is indexed, Google may remove it at a later point if it thinks another version of the page already exists in the index. If Google sees multiple copies of the same page (pages with the same content), it may keep only one version and ignore the rest. This can happen even if two pages are not identical but have very similar content.
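To get a feel for what "very similar content" means, here is a minimal sketch that compares two pieces of text by their overlapping three-word sequences. This is only an illustration of one common similarity measure (Jaccard similarity on word shingles). Google does not publish how it detects duplicates, so do not read this as its actual method. The function names and the sample texts are made up for this example.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the shingle sets: 0.0 (no overlap) to 1.0 (identical)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)

# Made-up sample texts: the first two differ only in their last few words.
page_one = "Google may keep only one version of a page and ignore the rest."
page_two = "Google may keep only one version of a page and drop the others."
page_three = "Robots.txt can block search engines from crawling a directory."

print(round(similarity(page_one, page_two), 2))    # high score: near duplicates
print(round(similarity(page_one, page_three), 2))  # low score: unrelated text
```

If two of your own pages score high in a check like this, it is worth asking whether both really need to exist.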
3. Too little content
If a page has too little content, or Google does not see much value in it, that may be reason enough for Google to ignore the page. Such pages may also be removed from the index later, once Google discovers that the content is not very valuable.
4. Spun content
Some bloggers spin articles using software and publish the spun version. Publishing such content is considered one of the worst practices in online publishing. Sometimes it may take a while for Google to identify spun content and remove it from the index.
5. Page not reachable
Search engines navigate to pages by following links in the pages that they crawl. If you have a page on your site but no other page links to it, search engines may never find it.
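If you can export a list of your pages and the links each one contains, finding unreachable pages is simple. The sketch below assumes you already have that data. The URLs and the `find_orphans` helper are hypothetical and only illustrate the idea.

```python
def find_orphans(all_pages, links):
    """Return the pages in all_pages that no other page links to.

    links maps each page URL to the list of URLs it links out to.
    """
    linked_to = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a page linking to itself does not help discovery
                linked_to.add(target)
    return sorted(page for page in all_pages if page not in linked_to)

# Hypothetical site structure, made up for illustration.
all_pages = ["/", "/about", "/blog/post-1", "/blog/post-2", "/landing-page"]
links = {
    "/": ["/about", "/blog/post-1"],
    "/about": ["/"],
    "/blog/post-1": ["/blog/post-2", "/"],
    "/blog/post-2": ["/blog/post-2"],
}

print(find_orphans(all_pages, links))  # ['/landing-page']
```

Here `/landing-page` exists on the site but nothing links to it, so a crawler that only follows links would never reach it.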
6. Page has only images or videos but no text
If your page contains only images or videos, search engines may not index it.
7. Robots.txt file blocking urls
A robots.txt file is used to block certain directories or URLs from search engines and other robots. Open your robots.txt file and check whether any directives in it block search engines from crawling the webpages in question.
How to get all of your pages indexed?
It is normal for Google to drop some of your pages for the reasons mentioned above. However, if you think some of your pages are left out for no valid reason and they meet all quality criteria, then take a detailed look at each page.
- Has the page been published for at least a few days? Sometimes it may take a few days for Google to crawl your page.
- Have you accidentally marked the page with the "noindex" meta tag? View the source of the page in your browser and look for something like this:
<meta name="googlebot" content="noindex" />
You may have a slightly different version of it. Search for "noindex" in your source and see if you find something like it. The above tag is a directive to Google requesting it to not index your page.
- Do you have other pages with similar content? Take a short phrase (4-5 words) from the page and search for it in Google, placing it within double quotes to find exact matches. Check whether the same text appears on any other pages on your site or on other sites. If there are pages with similar content on your own site, you may want to remove the duplicates. If the same content appears on other sites, it is possible that they copied your article but Google thinks you copied from them. File a DMCA complaint against those websites to get the copied articles removed.
Read more about filing DMCA complaint
- Do you have sufficient content on the page? If there is too little text on the page, or the page has only images, Google may not index it. Try adding a couple of paragraphs of text.
- If no other pages link to the page, add a few links to it from your home page or from various other pages. Also, make sure the page is included in your sitemap file and the sitemap is submitted to Google Webmaster Tools.
- Check your robots.txt file and make sure it is not blocking the pages. Also, include a Sitemap directive in the robots.txt file that gives the URL of your sitemap file.
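Two of the checks above, the "noindex" meta tag and the robots.txt rules, are easy to automate. Here is a minimal sketch using only Python's standard library. The helper names (`has_noindex`, `is_blocked`), the sample robots.txt and the example.com URLs are assumptions made for illustration. In practice you would download the page source and the robots.txt file from your own site first.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class NoindexFinder(HTMLParser):
    """Collect robots/googlebot meta tags whose content includes "noindex"."""

    def __init__(self):
        super().__init__()
        self.blocking_tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        name = attr.get("name", "").lower()
        content = attr.get("content", "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.blocking_tags.append((name, content))

def has_noindex(html):
    """Return True if the page source carries a noindex meta tag."""
    finder = NoindexFinder()
    finder.feed(html)
    return bool(finder.blocking_tags)

def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

# Hypothetical robots.txt content, including a Sitemap directive.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

page_source = '<html><head><meta name="googlebot" content="noindex" /></head></html>'

print(has_noindex(page_source))
print(is_blocked(SAMPLE_ROBOTS, "https://www.example.com/private/page.html"))
print(is_blocked(SAMPLE_ROBOTS, "https://www.example.com/blog/post.html"))
```

With this sample data, the first two checks report a problem (the page carries a noindex tag, and the `/private/` URL is disallowed), while the blog post URL is not blocked. Python's parser is a simple implementation of the robots.txt rules and may not match Google's handling in every edge case, so treat the result as a first check rather than the final word.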
Once you identify and resolve the problem, you may resubmit the url to Google using Google Webmaster Tools.
Read How to submit your newly launched website to Google Search Index
Tony Sir, I think this is one of the most common issues bloggers are facing nowadays (including me). Through this post, I have come to know many new reasons affecting a website's crawling and indexing. Very helpful solution. Thanks.