
In general, we hope that search spiders visit our websites as much as possible. I believe every new webmaster quickly becomes a frequent visitor of webmaster tools, checking first thing in the morning how many of their pages have been indexed: happy when the number rises, worried when it drops. I want to tell my fellow webmasters that this is completely unnecessary. The number of indexed pages is not the goal; our focus should be on how to bring more Baidu search traffic to the site.

robots.txt is the file that controls how search engines crawl a website. With a simple syntax it tells search engines which pages may be crawled and which may not. For an introduction to robots.txt and how to write it, you can refer to this earlier post: "The wording of robots.txt, the web spider access control file".

You may want to say: isn't it better to have as many pages indexed as possible? Actually, no. As everyone knows, search engines compare the similarity of web pages, not only between different sites but also between pages of the same site, and two pages that are too similar will dilute each other's weight. For example, on a personal blog the author archive and the home page contain almost the same content, so we can simply block spiders from accessing the author archive pages. Below I explain in detail how to write robots.txt for WordPress SEO optimization.
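But first, for readers who have never written one, here is a minimal sketch of the file format (the domain, directory and file names are placeholders, not the rules recommended later in this article):

# robots.txt lives at the site root, e.g. https://example.com/robots.txt
# "*" means the rules below apply to every crawler
User-agent: *
# Forbid crawling of everything under this hypothetical directory
Disallow: /example-private/
# Allow one path back in, overriding the Disallow above
Allow: /example-private/readme.html
# Tell crawlers where the XML sitemap is (absolute URL)
Sitemap: https://example.com/sitemap.xml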

How to write robots.txt in WordPress for SEO optimization

 

First, block links that do not need to be indexed

1. Block crawling of on-site search results

Disallow: /?s=* This hardly needs explaining: it blocks crawling of on-site search result pages. Even if no such links exist inside the site, they may well exist outside it, and if they get indexed they create duplicate content much like tag pages.
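As a quick illustration (the search terms are made up), the rule matches URLs like these:

# On-site search result URLs, for example:
#   /?s=wordpress
#   /?s=robots+txt
Disallow: /?s=*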

2. Block crawling of program files

Disallow: /wp-*/ blocks spiders from crawling the WordPress program files. wp-* covers folders such as wp-admin and wp-includes, which search spiders have no need to crawl; keeping them out saves search engine spider resources.
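A sketch of what this pattern covers (the directory names are the WordPress defaults):

# "/wp-*/" matches the standard WordPress system directories, e.g.:
#   /wp-admin/
#   /wp-includes/
#   /wp-content/plugins/
# (note that /wp-content/, which also holds uploaded images, is matched too)
Disallow: /wp-*/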

3. Block feeds

Disallow: /feed/*
Disallow: /*/*/feed/*
Disallow: /*/*/*/feed/*

The feed links in the page header exist mainly to prompt browser users to subscribe to the site, but a typical site already has RSS output and a sitemap, so blocking search engines from crawling these links is quite necessary. The content of a feed is basically a repetition of your article content, and duplicate content leads Baidu to reduce the weight of individual pages; blocking feeds also saves spider resources and reduces server load.
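For illustration (the category and date paths are hypothetical), these patterns cover feeds at several URL depths:

# The site-wide feed, e.g. /feed/
Disallow: /feed/*
# Feeds nested under archives, e.g. /category/linux/feed/
Disallow: /*/*/feed/*
# Deeper nested feeds, e.g. /2015/04/19/feed/ with date-based permalinks
Disallow: /*/*/*/feed/*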

4. Block crawling of comment reply links

Disallow: /*?replytocom*
Disallow: /comments/
Disallow: /*/comments/

These rules block comment reply links. It should be pointed out that blocking them does not mean spiders are forbidden from indexing the comment section of your articles. When a reply link is opened, the resulting page contains only a single comment; there is no need at all for such pages to be indexed, and blocking them also saves spider resources.
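For example (the post slug and comment ID are made up), the kind of URL blocked here looks like this:

# A reply link that renders a page holding a single comment, e.g.:
#   /hello-world/?replytocom=123
Disallow: /*?replytocom*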

5. Block other links to avoid duplicate content and privacy issues

Disallow: /date/
Disallow: /author/
Disallow: /category/
Disallow: /?p=*&preview=true
Disallow: /?page_id=*&preview=true
Disallow: /wp-login.php

Whether to create these rules depends on your own needs. Blocking the date, author and category archive pages avoids excessive duplicate content, while blocking the preview links and the login page keeps private pages out of the index.
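The kinds of URLs these rules target look like the following (the example paths are hypothetical and depend on your permalink settings):

# Date archive, e.g. /date/2015/04/
Disallow: /date/
# Author archive, e.g. /author/admin/
Disallow: /author/
# Category archive, e.g. /category/linux/
Disallow: /category/
# Draft previews, e.g. /?p=123&preview=true
Disallow: /?p=*&preview=true
Disallow: /?page_id=*&preview=true
# The WordPress login page
Disallow: /wp-login.php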

6. Disallow: /?p=*

This blocks crawling of short links. The default page header contains a short link, and the Baidu search spider will try to crawl it. Although the short link eventually 301-redirects to the permalink, crawling it still wastes spider resources.
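For illustration (the post ID and permalink are made up):

# The default short link uses the numeric post ID and 301-redirects
# to the permalink, e.g. /?p=123 -> /hello-world/
Disallow: /?p=*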

7. Block specific file formats

Disallow: /*.js$
Disallow: /*.css$

These rules block crawling of .js and .css files, which saves spider resources and reduces server load. You can also decide, according to your actual needs, whether to block your images from being crawled, as sketched below.
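If you do decide to keep images out of the crawl (optional, and not part of the rules above), the same end-of-URL anchor "$" can be used; a sketch:

# "$" anchors the pattern to the end of the URL, so only files with
# these extensions are matched
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.png$
Disallow: /*.gif$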

8. Other pages you do not want crawled

Disallow: /*?connect=*
Disallow: /kod/*
Disallow: /api/*

  • /*?connect=*: my blog's login link
  • /kod/*: my online file manager link
  • /api/*: my homemade API link

Second, several things to pay attention to when using robots.txt:

  • 1. A crawler that has its own User-agent group follows only that group and ignores the rules under the wildcard "*" User-agent (see the sketch after this list);
  • 2. Directives are case-sensitive, and unknown directives are ignored (I verified this blog's robots.txt with the testing tool in Google Webmaster Tools);
  • 3. Everything after a "#" character is treated as a comment and ignored;
  • 4. You can include a link to the sitemap file, making it easier for search engine spiders to crawl the whole site;
  • 5. Each line represents one directive; blank lines and extra whitespace are ignored;
  • 6. Use the Allow directive as sparingly as possible, because different search engines treat Allow directives in different positions differently.
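A sketch illustrating points 1, 3 and 4 (the paths and the sitemap URL are placeholders):

# Everything after "#" is a comment and is ignored (point 3)
User-agent: *
Disallow: /example-private/

# Baiduspider has its own group, so it follows only these rules and
# ignores the "*" group above (point 1)
User-agent: Baiduspider
Disallow: /example-baidu-only/

# A sitemap link helps spiders find the whole site (point 4)
Sitemap: https://example.com/sitemap.xml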

The Disallow rules above are not mandatory; write them according to your own needs. It is also advisable to check in the Baidu webmaster tools whether your site's robots.txt behaves as intended.

Third, using the robots.txt tool in the Baidu webmaster tools

The robots.txt tool in the Baidu webmaster tools is available at http://zhanzhang.baidu.com/robots/index. How to use it:

  • Detect and update: enter your site in the text box and click "detect and update". Baidu will fetch your robots.txt file; if you have updated it recently, this immediately notifies the Baidu search spider to refresh its crawling rules, so your changes take effect right away.
  • Rule verification: you can fetch your own robots.txt, verify that its syntax is correct, and check whether the URLs you want to keep spiders away from are effectively blocked;
  • Create and generate: generates a robots.txt according to your needs in a foolproof way; novice webmasters may want to give it a try.

Appendix

Wang Baiyuan’s blog robots.txt is shared as follows:
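A consolidated version assembled from the rules discussed in this article would look roughly like the following (a sketch; the file actually deployed on this blog may differ in detail, and the commented-out sitemap path is an assumption):

User-agent: *
# On-site search results
Disallow: /?s=*
# WordPress program files
Disallow: /wp-*/
# Feeds
Disallow: /feed/*
Disallow: /*/*/feed/*
Disallow: /*/*/*/feed/*
# Single-comment reply pages
Disallow: /*?replytocom*
Disallow: /comments/
Disallow: /*/comments/
# Archive, preview and login pages
Disallow: /date/
Disallow: /author/
Disallow: /category/
Disallow: /?p=*&preview=true
Disallow: /?page_id=*&preview=true
Disallow: /wp-login.php
# Short links
Disallow: /?p=*
# Script and style files
Disallow: /*.js$
Disallow: /*.css$
# Site-specific pages
Disallow: /*?connect=*
Disallow: /kod/*
Disallow: /api/*

# A sitemap link can be appended as well; the exact path is an assumption:
# Sitemap: https://wangbaiyuan.cn/sitemap.xml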

 

 

This article is protected by copyright law and may not be reproduced without permission. If you need to reprint it, please contact the author or obtain authorization via the copyright page. If you find this article useful, you can click "Sponsor the Author" below to support the author!

Reprint source: Baiyuan's Blog >> https://wangbaiyuan.cn/en/wordpress-how-to-write-a-robots-txt-to-seo-optimization-2.html
