Updating sitemap file on Heroku

Preparing your Heroku application to use AWS S3 is really easy: just follow the steps in the Heroku AWS guide.

Because a few more steps are needed to get everything working, I have summed up the instructions here so that they are all in one place. Our own website and this blog both use the sitemap_generator gem and are deployed on Heroku too.

Basic AWS and Heroku setup

  1. Sign up for AWS S3 and AWS CloudFront on Amazon Web Services
  2. Create your bucket in your AWS S3 management console
  3. Set your AWS access keys in the Heroku config (you’ll find them under ‘My Account’)
    heroku config:set AWS_ACCESS_KEY_ID=xxx AWS_SECRET_ACCESS_KEY=yyy
  4. Set the AWS bucket name in the Heroku config
    heroku config:set S3_BUCKET_NAME=your-bucket-name
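
A forgotten config variable usually only shows up later as a failed upload, so a quick check can help. The following is a minimal sketch in plain Ruby, assuming the three variable names set above; the helper name missing_s3_config is made up for this example and is not part of Heroku or any gem.

```ruby
# Config vars set in the steps above.
REQUIRED_S3_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY S3_BUCKET_NAME].freeze

# Hypothetical helper: returns the names of required vars that are missing or blank.
# Accepts ENV or any Hash-like object, which makes it easy to try out locally.
def missing_s3_config(env = ENV)
  REQUIRED_S3_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_s3_config
warn "Missing config vars: #{missing.join(', ')}" unless missing.empty?
```

You could run something like this from an initializer or a one-off heroku run console session before generating the sitemap.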

 

Configure sitemap_generator gem

The next step is to configure the sitemap_generator gem to use the S3Adapter so that it automatically uploads the generated sitemap to your S3 bucket. These instructions are taken from Generate Sitemaps on read only filesystems like Heroku.

  1. Add the fog gem to your application
  2. Set the following Heroku environment variables
    heroku config:set FOG_PROVIDER=AWS
    heroku config:set FOG_DIRECTORY=your-bucket-name
  3. Update your sitemap.rb configuration
    # Set the host name for URL creation
    SitemapGenerator::Sitemap.default_host = "http://www.yourhost.com"
    
    # pick a place safe to write the files
    SitemapGenerator::Sitemap.public_path = 'tmp/'
    
    # store on S3 using Fog
    SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new
    
    # inform the map cross-linking where to find the other maps
    SitemapGenerator::Sitemap.sitemaps_host = "http://#{ENV['FOG_DIRECTORY']}.s3.amazonaws.com/"
    
    # pick a namespace within your bucket to organize your maps
    SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
  4. Now you can create your sitemap file with the following command:
    heroku run rake sitemap:create
    This will create the sitemap1.xml.gz file and upload it via fog to your AWS S3 storage under the sitemaps/ folder.
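
To see where the uploaded file should end up, you can combine the sitemaps_host and sitemaps_path settings from the configuration above. This is a rough sketch in plain Ruby; s3_sitemap_url is a hypothetical helper written for this post, not something provided by sitemap_generator or fog, and it assumes the path ends with a slash as in the configuration.

```ruby
require 'uri'

# Hypothetical helper mirroring the sitemaps_host and sitemaps_path settings above.
# The path must end with a slash, otherwise URI.join replaces it instead of appending.
def s3_sitemap_url(bucket, path: 'sitemaps/', filename: 'sitemap1.xml.gz')
  host = "http://#{bucket}.s3.amazonaws.com/"
  URI.join(host, path, filename).to_s
end

puts s3_sitemap_url(ENV.fetch('FOG_DIRECTORY', 'your-bucket-name'))
```

Opening the printed URL in a browser is a quick way to confirm that the upload worked once the bucket is readable.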

 

Redirect to your sitemap1.xml.gz file on AWS S3

Now we need to make your sitemap available within your Rails application. The following steps are inspired by this blog post.

  1. In AWS CloudFront, add a new download distribution for your S3 bucket.
  2. Go to your S3 management console, navigate to your bucket, select the bucket and select ‘Make Public’ from the actions menu. If you forget to do this, you will get an Access Denied XML dump on your site instead of the requested resource.
  3. In your Rails application add the following route:
    get '/sitemap1.xml.gz' => 'sitemaps#show'
  4. Add a new SitemapsController to your application and redirect to the URL you get from CloudFront. Your CloudFront domain will look something like your-distribution-id.cloudfront.net. Using this domain you can now access your sitemap file via the path /sitemaps/sitemap1.xml.gz.
class SitemapsController < ApplicationController

  def show
    # Redirect to CloudFront and S3
    redirect_to "http://your-distribution-id.cloudfront.net/sitemaps/sitemap1.xml.gz"
  end

end
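
If you would rather not hard-code the CloudFront domain in the controller, the redirect target can be built from a config variable. The sketch below is plain Ruby and only an illustration: CLOUDFRONT_HOST is a hypothetical variable name that is not part of the setup above, and sitemap_redirect_url is a helper invented for this example.

```ruby
# CLOUDFRONT_HOST is a hypothetical config var; it is not part of the setup above.
# Falls back to the placeholder domain used in the controller example.
def sitemap_redirect_url(env = ENV)
  host = env.fetch('CLOUDFRONT_HOST', 'your-distribution-id.cloudfront.net')
  "http://#{host}/sitemaps/sitemap1.xml.gz"
end

# In SitemapsController#show this would replace the literal URL:
#   redirect_to sitemap_redirect_url
puts sitemap_redirect_url
```

With this approach, switching to a different distribution only needs a heroku config:set call instead of a new deploy.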

 

Now your sitemap1.xml.gz is available again at http://your-domain.com/sitemap1.xml.gz. This is very useful because Google Webmaster Tools, for example, does not allow you to add sitemap files from a domain other than your own.

One response to “Updating sitemap file on Heroku”

  1. The same is also possible with:
    robots.txt
    Sitemap: http://my-site.s3.amazonaws.com/sitemaps/sitemap.xml.gz

    config/routes.rb
    get '/sitemap.xml.gz', to: redirect("http://#{ENV['FOG_DIRECTORY']}.s3.amazonaws.com/sitemaps/sitemap.xml.gz")
