Search Engine Optimization
AlexandrMa, 2022-02-01 14:15:35

How to block PDF indexing?

The site has about 1,000 pages and roughly 500 PDF files. These files need to be hidden from indexing: they are not unique and generally superfluous, and they interfere with promoting the pages themselves. Users only need to be able to download them. I added the following entries to robots.txt

Disallow: /pdf/
Disallow: *.pdf

for all bots, and separately for Google and Yandex.

As a result, Yandex stopped indexing the files, but Google stubbornly continues. A year has passed and nothing has changed: Google Search Console reports "Indexed, though blocked by robots.txt".

Does it make sense to move the files to a subdomain with a redirect?


2 answers
Pavel Sayk (@PiSaiK), 2022-02-01

The best option is for the final document itself to be served with a noindex header. For example, add this snippet to the site's root .htaccess file or to httpd.conf (requires mod_headers):

<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>

Then proceed as follows:
1. Stop disallowing the PDFs in robots.txt; otherwise Googlebot never fetches the files and never sees the header.
2. Serve the header as shown above.
3. Add rel="nofollow" to links pointing to the documents.
After that, the documents will gradually drop out of the index.
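To confirm the setup, fetch a sample PDF's response headers (e.g. with curl -I) and check the X-Robots-Tag value. A minimal sketch of how a crawler breaks that value down into directives (the parse_robots_tag helper and the sample value are illustrative, not part of any real library):

```python
def parse_robots_tag(header_value):
    """Split an X-Robots-Tag header value into a set of lowercase directives."""
    return {d.strip().lower() for d in header_value.split(",") if d.strip()}


# The value the .htaccess snippet above would send:
directives = parse_robots_tag("noindex, nofollow")
print("noindex" in directives)  # True: a crawler seeing this drops the page from the index
```

The key point this illustrates: the directive lives in the HTTP response, so the crawler must be allowed to fetch the file to see it, which is why step 1 removes the robots.txt block.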

Artem Gvozdev (@arty23_03), 2022-02-01

The easiest option is to put HTTP Basic authentication on the URLs that serve PDFs: bots cannot index what they cannot fetch.
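A minimal sketch of that approach in Apache, in the same style as the .htaccess snippet above (assumes mod_auth_basic; the realm name and the htpasswd path are placeholders to adapt to your server):

```apache
# Require a login for any .pdf — crawlers receive 401 and cannot index the files.
<Files ~ "\.pdf$">
  AuthType Basic
  AuthName "Downloads"
  AuthUserFile /etc/apache2/.htpasswd
  Require valid-user
</Files>
```

Note the trade-off: unlike the X-Robots-Tag approach, this also forces real users to enter a password before downloading.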
