
SEO rel="noindex" Problem #2970

Closed
shmaltz opened this issue Jun 13, 2021 · 8 comments
Labels
bug Something isn't working

Comments


shmaltz commented Jun 13, 2021

Search Console has been reporting that a lot of the links on my website have the rel="noindex" tag, and it is not indexing them.

Truth is, all the pages on my website are currently marked to be indexed, and there is no rel="noindex" tag on the pages.

I checked the http headers too, using this tool.

When I tried recrawling one of the problematic pages in Search Console, it still showed the issue, and it showed Referring page: my-yourls-link.com.

I checked the headers of that URL using the same tool as above, and sure enough, the YOURLS link contains an X-Robots-Tag => noindex header.

While it seems to make sense to have this tag on the YOURLS link, I'm not sure why Google is not indexing the main URL.

I checked a Bit.ly link and ran it through the header checker tool, and they don't have any noindex tags.
Same with TinyURL. No noindex tags.
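For reference, the kind of header check described above can be sketched in PHP. This is a minimal illustration, not part of YOURLS: the helper name `robots_tag_headers` is made up, and `https://sho.rt/abc` is a placeholder short link, not a real endpoint.

```php
<?php
// Minimal sketch: pull any X-Robots-Tag lines out of a list of response
// headers, in the flat string format returned by PHP's get_headers().
function robots_tag_headers( array $headers ) {
    return array_values( array_filter( $headers, function ( $h ) {
        return stripos( $h, 'X-Robots-Tag:' ) === 0;
    } ) );
}

// Live check -- 'https://sho.rt/abc' is a placeholder short link:
// print_r( robots_tag_headers( get_headers( 'https://sho.rt/abc' ) ) );
```

Note that get_headers() follows redirects by default, so for a short URL you get the headers of every hop in the chain, which is exactly what you want when hunting for a stray noindex.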

Seems like the noindex tag should be removed. None of the industry-standard URL shorteners have it.

Perhaps we should undo this commit.

shmaltz added the bug label Jun 13, 2021
Member

ozh commented Jun 13, 2021

As I understand it, it's good that we have this X-Robots-Tag: noindex header, which tells robots to follow, but not index, short URLs.

Case in point: https://www.google.com/search?q=site:bit.ly shows 78M results, but none of them are actually content from Bitly; they are results from other sites.
While Bitly is probably happy with this (they're basically stealing content), this would be extremely detrimental SEO-wise if you end up with duplicated content between your actual website and your URL shortener (see issue #2202, which led to the commit you're referring to).

Maybe there is something specific to your site content and short URL sharing strategy? I just dug a bit into the Search Console for https://blog.yourls.org/ and cannot see anything like what you report.

I think a better option, instead of reverting the commit, would be to add a filter to it and allow plugins to bypass or modify it
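For illustration, a plugin built on such a filter might look like the sketch below. The hook name 'send_robots_noindex' and the expected return value are assumptions made up for this example, not the actual API; only yourls_add_filter() itself is real YOURLS plumbing.

```php
<?php
/*
Plugin Name: Skip noindex
Description: Sketch of a plugin that would disable the X-Robots-Tag noindex header
*/

// 'send_robots_noindex' is a hypothetical hook name for illustration only;
// check the actual core change for the real filter name and semantics.
yourls_add_filter( 'send_robots_noindex', 'skipnoindex_filter' );

function skipnoindex_filter( $send ) {
    // Hypothetically: returning false tells core not to send the header.
    return false;
}
```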

Author

shmaltz commented Jun 13, 2021

According to Google, a 301 alone is enough to transfer all link juice.

It would be awesome if you could add a filter to the app to remove the noindex header.

Thanks!

@promotionsonly

I too was receiving these errors for my main domain (not the YOURLS short domain, but the domain the short URLs were pointing to). Google wasn't indexing the main URLs. #2882

I removed the X-Robots-Tag header from functions.php, and soon the GSC issues went away for my main site. I still do not know why Google was not indexing my main URLs (only the ones that were being reached via short URLs). It's as if it was 'carrying' the noindex X-Robots-Tag signal over to my main domain.

Now I'm going to go off on a bit of a tangent: Google was still 'indexing' my short URLs despite the noindex X-Robots-Tag header.

I think having the 301 redirect is enough; Google is not 'indexing' the short URL. Using the site: operator is not a clear indication of what is actually indexed. Use GSC > Coverage > Valid to see what is actually indexed and will appear in regular search queries.

If you run the site: command on your short URL domain and copy a short URL from the results into GSC > URL Inspection, you will see that it returns "URL is not on Google", meaning the page is not in the index, even though it appeared via site:. So what 'index' is the site: command querying? I do not know; my theory is that site: queries some "temporary, half-way-there" type of index. The results that appear in site: are in GSC > Coverage > Excluded.

I have yet to see any short URL pages returned by regular search queries. Try googling the content of the page, narrowing it down with quotes (“page content”) to find the exact-match results. What I receive in the SERPs is the main URL result, not the short URL results (which appear in site:). You can even try it with the Bitly results... the Bitly results do not get returned.

My guess is that Google filters out the short-URL results because they haven't made it to the index. The 301 is working as it should, which is why the short URL doesn't appear in Coverage > Valid. It gets added to Excluded > Page with redirect, and Google serves the main landing page result to the user. It has understood what the URL is supposed to do.

I also think we are confusing Google with the noindex X-Robots-Tag: half of the results appear in Coverage > Excluded by 'noindex' tag, and half appear in Page with redirect.

Now, a question can be raised whether it's OK (SEO-wise) to have these URLs appear in site: and let Google do its thing by excluding them from the "actual regular search query index". Would it affect the main site that you are pointing to?

Author

shmaltz commented Sep 3, 2021

Any update on this?

Member

ozh commented Sep 3, 2021

I think a better option, instead of reverting the commit, would be to add a filter to it and allow plugins to bypass or modify it

Pull requests welcome if anyone is interested in addressing this issue!

ozh closed this as completed Sep 3, 2021

alroberts commented Dec 27, 2022

Hey @ozh - not a big user of GitHub, so apologies if this is not the correct way to re-highlight the issue.

I too have suffered with my entire website being removed from Google. This issue has caused me headaches for the best part of a year, and pretty much ALL of my search engine visibility has gone, with the exception of pages which weren't passed through this great shortener.

A bit of context here: YOURLS serves a WordPress website via a WordPress plugin that basically replaces the default WordPress short URL with the ones created by YOURLS.

Both the shortener and the website reside on the same parent domain; the difference is, YOURLS sits in a subfolder (https://domain.com/s/).

I have just gone into the functions.php file and changed the following from true to false:

// Tell (Google)bots not to index this short URL, see #2202
if ( !headers_sent() ) {
    header( "X-Robots-Tag: noindex", false );
}

Is this the "fix" here? Google ought to be able to understand that it's a 301 redirect, but it seems that it inherits the noindex attribute and passes that along to the full URL.

It's sad :( My website once enjoyed top search results, but the past year has seen that diminish, and it's taken me forever to figure out why.

Author

shmaltz commented Dec 27, 2022

@alroberts Curious if this will fix the problem. Our sites have also been suffering from this issue.

Member

ozh commented Feb 12, 2023

People monitoring this issue: YOURLS 1.9.2 (soon) will address it with a filter, allowing you to use a plugin to fine-tune the SEO behavior you'd like. See #3517 for more info.
