Why did Google stop indexing web pages from our sitemap.xml?
We are seeing some pages that exist in our
sitemap.xml but are inexplicably missing from Google's public search index.
You can't download https://superuser.com/sitemap.xml (we protect this file because there have been problems with it in the past), but Googlebot can. We have verified via Google Webmaster Tools that the
sitemap.xml file was downloaded today and is rated OK with no errors (green checkmark).
sitemap.xml contains a list of the last 50,000 questions asked on our site. For example, this question ...
... exists in the
sitemap.xml as ...
<url>
  <loc>https://superuser.com/questions/201610/how-to-see-the-end-of-a-long-chain-of-symbolic-links</loc>
  <lastmod>2010-10-20</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.2</priority>
</url>
Searching for "How to see the end of a long chain of symbolic links" returns only one result, on questionhub.com, which is scraping our data (a whole different problem).
You can increment the question number, do an exact search for the question title, and you will see this pattern persist.
These URLs are in sitemap.xml but they are not showing up in Google's index, and yet they show up on sites that scrape our Creative Commons data. Why would that be?
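One way to double-check the premise that a given question URL really is listed in the sitemap is to parse the file and look for its `<loc>`. The following is only a minimal Python sketch: the function name `sitemap_locs` is illustrative, and the sample text simply wraps the entry quoted above in a `urlset` element rather than reproducing the real file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_locs(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }

# Hypothetical sample: the entry from the question wrapped in a urlset.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://superuser.com/questions/201610/how-to-see-the-end-of-a-long-chain-of-symbolic-links</loc>
    <lastmod>2010-10-20</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.2</priority>
  </url>
</urlset>
"""

TARGET = ("https://superuser.com/questions/201610/"
          "how-to-see-the-end-of-a-long-chain-of-symbolic-links")

if __name__ == "__main__":
    print(TARGET in sitemap_locs(SAMPLE_SITEMAP))
```

Being listed only shows that the URL was submitted; it says nothing about whether Google has indexed it.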
It looks like Google was having some technical crawl problems this week, which sound very much like what we were experiencing:
No one seems to be immune from a Google indexing problem that has many site owners frustrated. Blogs and websites, big and small, aren't being indexed as quickly as they normally are, if they're being indexed at all.
John from Google replied to the thread in the Webmaster forums, saying:
Just to be clear, the issues from this thread, which I have reviewed in detail, are not due to changes in our policies or changes in our algorithms; they are due to a technical issue on our side that will be visibly resolved as soon as possible (it may take up to a few days to be visible for all sites though)
Google doesn't make any promise or guarantee that pages in a sitemap will be indexed.
My experience has been that a page needs to be linked to (from a page of some authority) to show up. Is that page/question linked to, directly or indirectly, from a page with some authority?
E.g. if the superuser.com homepage (which presumably has many inlinks) linked directly to this question, or linked to it indirectly via a number of other pages, then you could expect it to be indexed.
Google doesn't guarantee that we'll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it.
It appears that Google is reporting that 46,514 submitted links are in the index. Could it be an issue with (I hate to say it) PageRank? The scraping sites may be doing a better job of cross-linking etc. and being ranked higher. Just a thought.
This search, site:superuser.com How to see the end of a long chain of symbolic links, also seems to be picking up your sitemap.xml properly, albeit not returning the expected results.
I think Google might be having a hard time indexing your site; 50,000 is a lot. So my suggestion would be to break your sitemap down into pieces, like so:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
If you break it down, you will have a better chance of getting those 50,000 URLs indexed.
Sitemaps.org's explanation of the issue:
You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). If you would like, you may compress your Sitemap files using gzip to reduce your bandwidth requirement; however the sitemap file once uncompressed must be no larger than 10MB. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.
If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file. Sitemap index files may not list more than 50,000 Sitemaps and must be no larger than 10MB (10,485,760 bytes) and can be compressed. You can have more than one Sitemap index file. The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file.
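The splitting described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a reference implementation: the function names, the `sitemapN.xml` file naming, and the `base_url` parameter are all hypothetical. It enforces only the 50,000-URL limit and does not check the 10MB size limit.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit quoted from sitemaps.org
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_sitemap(urls):
    """Render one Sitemap file containing only <loc> entries."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    lines += [f"  <url><loc>{escape(u)}</loc></url>" for u in urls]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"

def build_sitemap_index(sitemap_urls):
    """Render a Sitemap index file that lists each Sitemap file's URL."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    for url in sitemap_urls:
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"

def split_sitemap(urls, base_url):
    """Return (index_xml, {filename: sitemap_xml}) for a list of URLs."""
    files = {
        f"sitemap{n}.xml": build_sitemap(part)  # hypothetical naming scheme
        for n, part in enumerate(chunk_urls(urls), start=1)
    }
    index_xml = build_sitemap_index([f"{base_url}/{name}" for name in files])
    return index_xml, files
```

For instance, `split_sitemap(urls, "https://www.example.com")` with 120,000 URLs would produce three Sitemap files plus one index that lists them.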
The question was only asked yesterday. Give Googlebot a chance; you aren't the only site on the internet that it has to crawl, ya know :)
If questions are normally indexed within a day or so, and a week goes by and this one still isn't indexed, then I might be worried. But definitely not after one day.
With this kind of thing there are a lot of possible answers.
I would start by asking how many pages you actually have. You submitted 50,000 URLs, and a quick site:superuser.com search shows 125,000 indexed. Do you think you only have 50K URLs and are submitting all of them, but Google is finding 2-3 copies of each page? Or maybe you have 1 million URLs and only 12.5% are getting indexed? Getting the big picture helps to direct where to look for issues.
If nothing seems wrong with step one, I would move on to content. It looks like QH has a lot more content on their page and links out to many other "resources". Even though all their content is scraped, it is possible Google considers their page more valuable, since they provide more resources/information to the user. If they are considered the authority and all your content is the same as theirs, it is possible Google won't index yours even though you are the original.
If you are convinced that is not the issue, build some high-quality links to it: blog this question on some popular employee blogs or ask some friends to blog about it. Perhaps, if you have SEO friends who run popular blogs, they would write a case study about it, etc.
If you get a lot of strong links and it is still not getting indexed, look for reasons it might be penalized (in most cases this won't be the issue, but it never hurts to check).
If none of this works, then 9 times out of 10 it is a simple technical issue that has been overlooked (robots exclusion or something similar).
If you still have no answer after going through this, ask Google and hope they get you an answer.
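The robots exclusion check mentioned above can be done offline with Python's standard `urllib.robotparser`. This is a minimal sketch: the helper name `is_crawlable` and the sample robots.txt are hypothetical and do not reflect any real site's rules.

```python
from urllib.robotparser import RobotFileParser

def is_crawlable(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /sitemap.xml

User-agent: Googlebot
Disallow: /users/
"""

if __name__ == "__main__":
    url = "https://www.example.com/questions/201610/some-title"
    print(is_crawlable(SAMPLE_ROBOTS_TXT, "Googlebot", url))
```

If a URL that should be indexed comes back as not crawlable for Googlebot, the robots.txt rules are the first thing to fix. Note that a page can also be excluded by a robots meta tag or an HTTP header, which this check does not cover.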