Using mod_pagespeed

mod_pagespeed is an Apache 2.x module. It is added into an existing Apache installation, and configured with the pagespeed.conf configuration file.

mod_pagespeed can be installed from source or in binary form.

Installation Tips

In binary form, mod_pagespeed is available as a Debian package for Linux distributions such as Ubuntu, installable with dpkg. It is also available as an RPM package for CentOS or compatible Linux distributions.

You can browse or check out the source code in the open source repository.

Configuration

mod_pagespeed contains an Apache "output filter" plus several content handlers.

Note: The location of the configuration file is dependent on the Linux distribution on which mod_pagespeed is installed.

On Debian/Ubuntu Linux distributions, the directory will be:

/etc/apache2/mods-available

On CentOS/Fedora, the directory will be:

/etc/httpd/conf.d

The mod_pagespeed configuration directives should be wrapped inside an IfModule block:

    <IfModule pagespeed_module>
    ....
    </IfModule>
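
As an illustration of the wrapping, the following minimal sketch fills the block with a few directives that are described later in this document. It is an assumed arrangement for demonstration, not the contents of the shipped pagespeed.conf.

    <IfModule pagespeed_module>
        # Turn the module on and route output through its filter.
        ModPagespeed on
        SetOutputFilter MOD_PAGESPEED_OUTPUT_FILTER

        # Paths suggested in the server-side cache section below.
        ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/"
        ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/"
    </IfModule>

Because of the IfModule wrapper, Apache skips these directives when the module is not loaded instead of failing on unknown directive names.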

Configuring Handlers

mod_pagespeed contains three handlers:

  1. mod_pagespeed_resource_generator: serves optimized resources
  2. mod_pagespeed_statistics: shows server statistics since startup, from which one can compute average latency, and thereby measure the effectiveness of various rewriting passes
  3. mod_pagespeed_beacon: part of the infrastructure we provide for measuring page latency.

The following settings for the handlers can be used as a guideline:

    # This page shows statistics about the mod_pagespeed module.
    <Location /mod_pagespeed_statistics>
        Order allow,deny
        # One may insert other "Allow from" lines to add hosts that are
        # allowed to look at generated statistics.  Another possibility is
        # to comment out the "Order" and "Allow" options from the config
        # file, to allow any client that can reach the server to examine
        # statistics.  This might be appropriate in an experimental setup or
        # if the Apache server is protected by a reverse proxy that will
        # filter URLs to avoid exposing these statistics, which may
        # reveal site metrics that should not be shared otherwise.
        Allow from localhost
        SetHandler mod_pagespeed_statistics
    </Location>

    # This handles the client-side instrumentation callbacks which are injected
    # by the add_instrumentation filter.
    <Location /mod_pagespeed_beacon>
        SetHandler mod_pagespeed_beacon
    </Location>
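
The comments above mention adding further "Allow from" lines. A sketch of what that might look like follows. The 192.168.1.0/24 network is a placeholder chosen for illustration, not a value from the mod_pagespeed documentation.

    <Location /mod_pagespeed_statistics>
        Order allow,deny
        Allow from localhost
        # Hypothetical internal network; replace with your own range.
        Allow from 192.168.1.0/24
        SetHandler mod_pagespeed_statistics
    </Location>

With "Order allow,deny", any client that matches no "Allow from" line is denied by default, so only localhost and the listed network would reach the statistics page.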

Setting up the Output Filter

The output filter is used to parse, optimize, and re-serialize HTML content that is generated elsewhere in the Apache server.

Note: This output filter always generates uncompressed HTML. It runs upstream of mod_deflate, so it does not interfere with the compression that mod_deflate performs.

    # Direct Apache to send all HTML output to the mod_pagespeed output handler.
    SetOutputFilter MOD_PAGESPEED_OUTPUT_FILTER

Configuring Domains

In addition to optimizing HTML, mod_pagespeed optimizes resources (JavaScript, CSS, images), but only those served from domains that are explicitly listed in the configuration file. For example:

    ModPagespeedDomain http://my_site.com
    ModPagespeedDomain http://cdn.my_site.com

mod_pagespeed will rewrite resources served from these two explicitly listed domains. Additionally, it will rewrite resources that are served from the same domain as the HTML file, or are specified as a path relative to the HTML. When resources are rewritten, their domain and path are not changed. However, the leaf-name is changed to encode rewriting information that can be used to identify and serve the optimized resource.
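
The rules above can be sketched with a small annotated fragment. The page URL, file names, and the other.example.com host are hypothetical and serve only to show which references would be eligible for rewriting.

    ModPagespeedDomain http://my_site.com
    ModPagespeedDomain http://cdn.my_site.com

    # For a page served from http://my_site.com/index.html:
    #   http://cdn.my_site.com/logo.png    listed domain:     eligible
    #   styles/site.css                    relative path:     eligible
    #   http://other.example.com/lib.js    domain not listed: not rewritten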

Configuring Server-Side Cache for mod_pagespeed

In order to rewrite resources, mod_pagespeed must cache them on the server. The output filter must be configured with paths where it can write cache files, and tuned to limit the amount of disk space consumed. The file-based cache has a built-in LRU mechanism to remove old files, targeting a certain total disk space usage, and a certain interval for the cleanup process. It is also useful to have a small in-memory write-through LRU cache that is kept in each Apache process. Keep in mind that in pre-fork mode, Apache spawns dozens of processes, so the total memory consumed (ModPagespeedLRUCacheKbPerProcess * num_processes) must fit into the capabilities of the HTTP server.

The default values, which perform reasonably well, are:

    ModPagespeedFileCacheSizeKb          102400
    ModPagespeedFileCacheCleanIntervalMs 3600000
    ModPagespeedLRUCacheKbPerProcess     1024
    ModPagespeedLRUCacheByteLimit        16384
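
As a worked example of the memory estimate, the sketch below assumes a prefork server capped at 50 child processes. The MaxClients value is a hypothetical figure chosen to keep the arithmetic simple.

    # Hypothetical prefork limit: at most 50 child processes.
    <IfModule mpm_prefork_module>
        MaxClients 50
    </IfModule>

    # Worst case for the in-memory caches:
    #   1024 KB per process * 50 processes = 51200 KB (50 MB)
    ModPagespeedLRUCacheKbPerProcess     1024

If that total is too large for the host, lowering either the process limit or the per-process cache size reduces it proportionally.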

mod_pagespeed requires a file-path for the cache. The user can use the following suggested setting:

    ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/"

mod_pagespeed also requires another file-path, although it's not currently used. It's reserved for future use as a shared database in a multi-server environment. The user can use the following suggested setting:

    ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/"

Configuring HTML rewriting filters

The 'ModPagespeedEnableFilters' configuration file directive allows specification of one or more filters by name, separated by commas. The directive may be issued multiple times, to split the filter lists logically. The order of the directives is not important: the rewriters are run in a pre-defined order, which corresponds to the order presented in the documentation.

    # Adds a 'head' element to the document if one is not supplied already.
    # Several other filters turn this on automatically even if it is not
    # specified explicitly.
    ModPagespeedEnableFilters add_head

    # Combines multiple heads into one.  Technically HTML documents
    # are not allowed to have multiple <head>s, but sites which
    # aggregate content from multiple sources sometimes have them.
    # This filter moves the content from later <head>s into the first
    # head.  This filter can change the order of content (e.g. CSS and
    # JS) in the later <head>s relative to intervening <body>
    # elements.
    # ModPagespeedEnableFilters combine_heads

    # Experimental filter to completely remove scripts from a page.  Obviously
    # this will break functionality, and is disabled by default.  This can be
    # used to facilitate timing tests showing the maximum benefits of improving
    # javascript.
    # ModPagespeedEnableFilters strip_scripts

    # Large blocks of inline JavaScript and CSS can benefit from
    # moving into external files so they can be cached in the browser,
    # even if the HTML is not cacheable.  See
    # ModPagespeedCssOutlineMinBytes and ModPagespeedJsOutlineMinBytes.
    # These filters can currently only be enabled in a
    # single-server configuration.
    ModPagespeedEnableFilters outline_css,outline_javascript

    # Moves CSS elements into the <head>.  This filter requires add_head.
    ModPagespeedEnableFilters move_css_to_head

    # Combines multiple CSS elements into one
    ModPagespeedEnableFilters combine_css

    # Rewrites JavaScript and CSS files to remove excess whitespace and comments
    ModPagespeedEnableFilters rewrite_css,rewrite_javascript

    # Inlines small CSS and JS files into the HTML document.  See
    # ModPagespeedCssInlineMaxBytes and ModPagespeedJsInlineMaxBytes.
    ModPagespeedEnableFilters inline_css,inline_javascript

    # Optimizes images, re-encoding them for smaller byte-size, removing
    # excess pixels that will not be displayed, and inlining small images.
    # See ModPagespeedImgInlineMaxBytes
    ModPagespeedEnableFilters rewrite_images

    # Adds width/height attributes to <img> tags that lack them.
    ModPagespeedEnableFilters insert_img_dimensions

    # Removes comments in HTML files.  This is turned off by default to avoid
    # breaking common techniques for deferring javascript execution by running
    # javascript that searches the DOM for HTML comments.
    ModPagespeedEnableFilters remove_comments

    # Removes excess whitespace in HTML files (avoiding "pre",
    # "script", "style", and "textarea").  This is off by default as
    # it's possible to apply these tags to the DOM in a way that is
    # not statically detected.  If your web-site only uses those
    # attributes directly in HTML markup, and does not apply them via
    # CSS and javascript, then it's safe to enable this filter.
    # ModPagespeedEnableFilters collapse_whitespace

    # Removes attributes which are not significant according to the HTML spec.
    ModPagespeedEnableFilters elide_attributes

    # Finds all images, CSS, and JavaScript resources that are publicly
    # cacheable for less than one month, and extends their cache lifetime to
    # one year.  This is safe because this transformation adds a content
    # hash to the URL so that if the content changes the URL will change, and
    # we will get the desired caching behavior.  All other filters that write
    # resources use the same content-hashing technique to allow safe caching
    # for one year.
    ModPagespeedEnableFilters extend_cache

    # Removes quotes around HTML attributes that are not lexically required
    ModPagespeedEnableFilters remove_quotes

    # Adds javascript at the beginning and end of the page to allow latency
    # information to be sent back to the server.  This information can be
    # viewed from a web browser from /mod_pagespeed_statistics
    ModPagespeedEnableFilters add_instrumentation
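
Because the directive may be repeated and the order of directives is not important, the two forms sketched below would be expected to enable the same three filters. The particular filters chosen are arbitrary examples from the list above.

    # Form 1: one directive with a comma-separated list.
    ModPagespeedEnableFilters combine_css,rewrite_css,extend_cache

    # Form 2: the same filters split across two directives, in a different
    # order.  Use one form or the other.
    ModPagespeedEnableFilters extend_cache
    ModPagespeedEnableFilters rewrite_css,combine_css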

Configuration Options for the Filters

By default mod_pagespeed uses a core set of filters that are generally safe for most web sites. To disable the core set, one can specify

    ModPagespeedRewriteLevel PassThrough

and then enable specific filters with the ModPagespeedEnableFilters directive.
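
For example, a configuration that skips the core set and runs only two HTML-level filters might look like the following sketch. The choice of filters is illustrative.

    # Disable the core set, then opt in to individual filters.
    ModPagespeedRewriteLevel PassThrough
    ModPagespeedEnableFilters collapse_whitespace,remove_comments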

The core set of filters is set to:

   add_head
   combine_css
   rewrite_css
   rewrite_javascript
   inline_css
   inline_javascript
   rewrite_images
   insert_img_dimensions
   extend_cache

To turn off specific filters in the core set, specify:

    ModPagespeedDisableFilters filtera,filterb

and remove all ModPagespeedEnableFilters directives.

To turn on specific filters not in the core set, specify:

    ModPagespeedEnableFilters filtera,filterb
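
With real filter names substituted for the filtera and filterb placeholders, the two cases might look like the sketch below. The filters chosen are arbitrary examples, and the two alternatives are meant to be used separately.

    # Alternative A: keep the core set but skip the image-related filters.
    ModPagespeedDisableFilters rewrite_images,insert_img_dimensions

    # Alternative B: keep the core set and add two filters outside it.
    ModPagespeedEnableFilters remove_comments,elide_attributes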

Tuning the Filters

Once the rewriters are selected, some of them may also be tuned. These parameters control the inlining and outlining thresholds of various resources.

    ModPagespeedCssInlineMaxBytes        2048
    ModPagespeedImgInlineMaxBytes        2048
    ModPagespeedJsInlineMaxBytes         2048
    ModPagespeedCssOutlineMinBytes       3000
    ModPagespeedJsOutlineMinBytes        3000

Note: The default settings are reasonable and intuitive, but as of this writing (Oct 2010) have not been experimentally tuned.
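
As a sketch of how these thresholds might be adjusted, the fragment below inlines somewhat larger files and outlines only larger inline blocks. The numbers are hypothetical and are not tuned recommendations.

    # Inline CSS and JavaScript files of up to 4096 bytes (2048 above).
    ModPagespeedCssInlineMaxBytes        4096
    ModPagespeedJsInlineMaxBytes         4096

    # Outline only inline blocks of at least 5000 bytes (3000 above).
    ModPagespeedCssOutlineMinBytes       5000
    ModPagespeedJsOutlineMinBytes        5000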

Configuring mod_pagespeed_examples

mod_pagespeed ships with a directory of sample HTML, Javascript, Image, and CSS files to demonstrate the rewrite passes that it executes. These also form the basis of an installation smoke-test to ensure that the configured system is operating correctly. Assuming the files are installed in /var/www/mod_pagespeed_example, the following configuration file fragment will enable them to be served using reasonable caching headers.

    # These caching headers are set up for the mod_pagespeed example, and
    # also serve as a demonstration of good values to set for the entire
    # site, if it is to be optimized by mod_pagespeed.

    <Directory /var/www/mod_pagespeed_example>
      # To show that mod_pagespeed rewrites web pages, we must
      # turn off Etags for HTML files and eliminate caching altogether.
      # mod_pagespeed should rewrite HTML files each time they are served.
      # The first time mod_pagespeed sees an HTML file, it may not optimize
      # it fully.  It will optimize better after the second view.  Caching
      # defeats this behavior.
      <FilesMatch "\.(html|htm)$">
        Header unset Etag
        Header set Cache-control "max-age=0, no-cache, no-store"
      </FilesMatch>

      # Images, styles, and javascript are all cache-extended for
      # a year by rewriting URLs to include a content hash.  mod_pagespeed
      # can only do this if the resources are cacheable in the first place.
      # The origin caching policy, set here to 10 minutes, dictates how
      # frequently mod_pagespeed must re-read the content files and recompute
      # the content-hash.  As long as the content doesn't actually change,
      # the content-hash will remain the same, and the resources stored
      # in browser caches will stay relevant.
      <FilesMatch "\.(jpg|jpeg|gif|png|js|css)$">
        Header unset Etag
        Header set Cache-control "public, max-age=600"
      </FilesMatch>
    </Directory>

Configuring Caching

mod_pagespeed requires publicly cacheable resources to provide maximum benefit. As discussed in the "Cache Extender" filter, the origin TTL specified in the Apache configuration file dictates how quickly changes made to the source can propagate to users' browser caches. However, using mod_pagespeed, resources referenced statically from HTML files will be served with a one-year cache lifetime, but with a URL that is versioned using a content hash.

The cache settings suggested above for mod_pagespeed_example also serve as our recommended starting point for ensuring that your site's content is cacheable, and thus rewritable by mod_pagespeed.
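
Applied to a site's own content, the resource portion of those settings might be sketched as follows. The /var/www/my_site directory is a hypothetical document root, and the Header directives assume that mod_headers is loaded.

    # Hypothetical document root; adjust to your own site.
    <Directory /var/www/my_site>
      # Make static resources publicly cacheable for 10 minutes at the
      # origin so that mod_pagespeed is able to rewrite them.
      <FilesMatch "\.(jpg|jpeg|gif|png|js|css)$">
        Header unset Etag
        Header set Cache-control "public, max-age=600"
      </FilesMatch>
    </Directory>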

Basic Operations

Turning OFF mod_pagespeed

To turn off mod_pagespeed completely, insert as the top line of pagespeed.conf:

    ModPagespeed off

Turning ON mod_pagespeed

To turn mod_pagespeed ON, insert as the top line of pagespeed.conf:

    ModPagespeed on

Trying out mod_pagespeed using mod_proxy

Ideally, you will experiment with mod_pagespeed on an Apache server that is already serving its own content. However, to experiment with mod_pagespeed on an Apache server that does not serve its own content, you can set up Apache as a proxy:

    # Proxy configuration file to enable mod_pagespeed to rewrite external
    # content.  In this configuration we assume a browser proxy,
    # pointing to HOSTNAME:80.

    LoadModule proxy_module /etc/apache2/modules/mod_proxy.so
    # Depends: proxy
    LoadModule proxy_http_module /etc/apache2/modules/mod_proxy_http.so

    <IfModule mod_proxy.c>
      ProxyRequests On
      ProxyVia On

      # This block allows connections from all clients.  Restrict the
      # "Allow from" line (for example to LAN clients) before exposing it.
      <Proxy *>
        AddDefaultCharset off
        Order Deny,Allow
        Allow from all
      </Proxy>
      ProxyPreserveHost On
      ProxyStatus On
      ProxyBadHeader Ignore

      # Enable/disable the handling of HTTP/1.1 "Via:" headers.
      # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
      # Set to one of: Off | On | Full | Block
      ProxyVia On
    </IfModule>

Set the browser proxy to point to that proxy server, and you will then be able to view any Internet site rewritten by Apache and mod_pagespeed.
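
Because a forward proxy that accepts all clients can be abused, it may be worth limiting access while experimenting. The following sketch restricts the proxy to a private network, using the same Apache 2.2 access-control syntax as above. The 192.168.0.0/16 range is a placeholder.

    <IfModule mod_proxy.c>
      <Proxy *>
        Order Deny,Allow
        Deny from all
        # Hypothetical private range; substitute your own network.
        Allow from 192.168.0.0/16
      </Proxy>
    </IfModule>

With "Order Deny,Allow", the Deny rules are evaluated first and the Allow rules can then override them, so only clients in the listed range would be able to use the proxy.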