Bernhard Scheirle


About Me

Hello, my name is Bernhard.
I'm a computer science student at Karlsruhe Institute of Technology.



SEO – How to exclude pages and articles with Pelican


If you want to prevent search engine robots from crawling certain pages or articles (such as your impressum, i.e. the legal notice), you basically have two options:

  1. robots.txt
  2. noindex HTML meta tag

robots.txt

Since there are already tons of tutorials and articles about how a robots.txt works and how to write one, I'll skip that here. If you want more information, visit robotstxt.org/robotstxt.

How to use a robots.txt with Pelican [1]

First, create your robots.txt and save it as content/extra/robots.txt. Then tell Pelican to copy it to the output folder by adding extra to the STATIC_PATHS [2] list. Now all files in content/extra are copied to output/extra/. This is of course the wrong folder for your robots.txt, since crawlers only look for it at the root of the site. With the EXTRA_PATH_METADATA [3] setting you can change its output path to output/robots.txt.

Example – pelicanconf.py (excerpt)

STATIC_PATHS = ['extra', 'images']
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
}
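
To make the effect of the two settings easier to see, here is a small sketch in plain Python that imitates the mapping from a file below content/ to its location below output/. It does not call Pelican; the function name output_path and the sample file names in the usage note are made up for illustration, and Pelican's real implementation handles more cases than this.

```python
from pathlib import PurePosixPath

# Values copied from the pelicanconf.py excerpt above.
STATIC_PATHS = ['extra', 'images']
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
}


def output_path(source, static_paths=STATIC_PATHS, overrides=EXTRA_PATH_METADATA):
    """Return the path below output/ for a file below content/.

    Hypothetical helper that only imitates the behaviour described in the
    text. Returns None when the file is not inside any static path.
    """
    source = PurePosixPath(source)
    # A file is only copied if it lives inside one of the static paths.
    if not any(PurePosixPath(p) in source.parents for p in static_paths):
        return None
    # An explicit 'path' override wins over the default mirrored location.
    override = overrides.get(str(source), {}).get('path')
    return override if override is not None else str(source)
```

With these settings, output_path('extra/robots.txt') gives 'robots.txt', a file such as 'images/logo.png' keeps its relative path, and a file outside the static paths (for example 'pages/about.md') yields None because it is not treated as a static file.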

robots.txt content

In my opinion, the robots.txt is only useful for excluding all your drafts (status: draft). For specific pages and articles you can use the HTML meta tag method, which is more flexible and easier to use.

Example – robots.txt

User-agent: *
Disallow: /drafts/
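
If you want to check what these two rules actually block, Python's standard library ships a robots.txt parser. The following is a minimal sketch; the helper name is_allowed and the example.com URLs are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Same rules as in the robots.txt example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines,
    # so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling is_allowed("https://example.com/drafts/my-post.html") returns False, while any URL outside /drafts/ is still allowed.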

noindex HTML meta tag

As above, if you want more general information on how the meta tag works, visit robotstxt.org/meta.

Example

<meta name="robots" content="noindex" />  <!-- Don't index this document -->
<meta name="robots" content="nofollow" /> <!-- Don't follow links in this document -->
<meta name="robots" content="noindex, nofollow" />  <!-- You guessed it -->

How to use the noindex HTML meta tag with Pelican [1]

The idea is to add an optional metadata keyword (named meta_robots) to your pages and articles. Depending on its value, robots are told not to index (or not to follow links in) the current document.

Valid values for meta_robots are "noindex", "nofollow", and the combination "noindex, nofollow".
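
Since a typo in this field would silently end up in the generated HTML, it can help to check the value before publishing. The sketch below is one possible way to do that in plain Python; normalize_meta_robots is a hypothetical helper and not part of Pelican, and it deliberately accepts only the values listed above.

```python
# Directives this article uses; the robots meta tag knows others as well.
ALLOWED_DIRECTIVES = ("noindex", "nofollow")


def normalize_meta_robots(value):
    """Return a canonical meta_robots string or raise ValueError."""
    # Split on commas, ignore surrounding whitespace and letter case.
    directives = [part.strip().lower() for part in value.split(",") if part.strip()]
    if not directives:
        raise ValueError("meta_robots must not be empty")
    unknown = [d for d in directives if d not in ALLOWED_DIRECTIVES]
    if unknown:
        raise ValueError(f"unsupported meta_robots directive(s): {unknown}")
    # Emit the directives in a fixed order and drop duplicates.
    return ", ".join(d for d in ALLOWED_DIRECTIVES if d in directives)
```

For example, normalize_meta_robots(" NoFollow , noindex") returns "noindex, nofollow", whereas a value such as "index" raises a ValueError.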

Example – Page / Article:

Title: My Title
Date: 2014-02-21 13:36
meta_robots: noindex
[...]

Theme – base.html:

<head>
{% if article %}
    {% set object = article %}
{% elif page %}
    {% set object = page %}
{% endif %}
{% if object and object.metadata['meta_robots'] %}
    <meta name="robots" content="{{ object.metadata['meta_robots'] }}" />
{% endif %}
[...]
</head>
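
To reason about what the template emits in each case, the same decision logic can be written out in plain Python. This is only a sketch: it models an article or page as a simple dict of metadata, which is an assumption made for illustration (Pelican's real objects are richer), and the function name robots_meta_tag is made up. The HTML escaping is an addition that the template above does not perform explicitly.

```python
from html import escape


def robots_meta_tag(article=None, page=None):
    """Mirror the base.html logic: use the article if present, else the page.

    article and page are plain metadata dicts here (an assumption).
    Returns the meta tag as a string, or "" if nothing should be emitted.
    """
    obj = article if article is not None else page
    value = obj.get("meta_robots") if obj is not None else None
    if not value:
        # No article/page in this context, or no meta_robots set.
        return ""
    return f'<meta name="robots" content="{escape(value, quote=True)}" />'
```

An article with meta_robots set to "noindex" yields the corresponding meta tag, while pages without the field and templates rendered without an article or page (such as index pages) produce no tag at all.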

  1. Pelican is a static site generator, written in Python, that requires no database or server-side logic. http://blog.getpelican.com/ 

  2. STATIC_PATHS documentation: docs.getpelican.com/…/?highlight=static_paths 

  3. EXTRA_PATH_METADATA documentation: docs.getpelican.com/…/?highlight=extra_path_metadata 
