Friday, March 11, 2011

mod_pagespeed Authorizing and Mapping Domains

Authorizing Domains

In addition to optimizing HTML files, mod_pagespeed optimizes only those resources (JavaScript, CSS, images) that are served from authorized domains. Domains other than the HTML page's own must be listed explicitly in the configuration file. For example:
ModPagespeedDomain http://example.com
ModPagespeedDomain http://cdn.example.com
mod_pagespeed will rewrite resources found on these two explicitly listed domains. Additionally, it will rewrite resources that are served from the same domain as the HTML file, or that are specified as a path relative to the HTML. When resources are rewritten, their domain and path are not changed. However, the leaf name is changed to encode rewriting information that can be used to identify and serve the optimized resource.
These directives can be used in .htaccess files and <Directory> scopes.
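The authorization rule and the leaf-name change can be modeled in a few lines. The Python sketch below is a simplified illustration, not mod_pagespeed's implementation: it assumes a hypothetical AUTHORIZED_DOMAINS set standing in for the ModPagespeedDomain entries, compares host names only (ignoring scheme and port), and builds the rewritten leaf name in the name.pagespeed.FILTER.HASH.ext form that appears in the cache-extension examples in this post.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical stand-in for the ModPagespeedDomain entries above
# (host names only; scheme and port are ignored in this sketch).
AUTHORIZED_DOMAINS = {"example.com", "cdn.example.com"}


def is_rewritable(html_url: str, resource_ref: str) -> bool:
    """True if a resource referenced from html_url may be rewritten."""
    # Relative references resolve to the HTML page's own domain.
    resource_host = urlsplit(urljoin(html_url, resource_ref)).hostname
    html_host = urlsplit(html_url).hostname
    return resource_host == html_host or resource_host in AUTHORIZED_DOMAINS


def rewritten_url(resource_url: str, filter_id: str, content_hash: str) -> str:
    """Keep domain and path; encode rewrite info in the leaf name only."""
    directory, _, leaf = resource_url.rpartition("/")
    extension = leaf.rsplit(".", 1)[-1]  # e.g. "css" for "style.css"
    return f"{directory}/{leaf}.pagespeed.{filter_id}.{content_hash}.{extension}"


if __name__ == "__main__":
    page = "http://www.example.org/index.html"
    print(is_rewritable(page, "js/app.js"))                     # True: relative path
    print(is_rewritable(page, "http://cdn.example.com/a.css"))  # True: listed domain
    print(is_rewritable(page, "http://other.net/a.css"))        # False
    print(rewritten_url("http://example.com/styles/style.css", "ce", "HASH"))
```

Note that only the final path segment changes, so the rewritten resource stays on the same domain and under the same directory as the original.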

Mapping Origin Domains

In order to improve the performance of web pages, mod_pagespeed must examine and modify the content of resources referenced on those pages. To do that, it must fetch those resources over HTTP, using the URL specified in the HTML page.
In some cases, the URL specified in the HTML file is not the best URL to use to fetch the resource from the Apache server. Scenarios where this is a concern include:
  1. The server is behind a load balancer, and it's more efficient to reference the server directly by its IP address or as 'localhost'.
  2. The server has a special DNS configuration.
  3. The server is behind a firewall that prevents outbound connections.
  4. The server is running in a CDN or proxy, and must go back to the origin server for the resources.
In these situations the remedy is to map the origin domain:
ModPagespeedMapOriginDomain origin_to_fetch_from origin_specified_in_html
Wildcards can also be used in the origin_specified_in_html, e.g.
ModPagespeedMapOriginDomain localhost *.example.com
By specifying a source domain (origin_specified_in_html) in this directive, you are authorizing mod_pagespeed to rewrite resources found on that domain. For example, in the directive above, '*.example.com' is authorized for rewrites from HTML files, but 'localhost' is not. See ModPagespeedDomain.
When mod_pagespeed fetches resources from a mapped origin domain, it specifies the source domain in the Host: header in the request.
These directives can be used in .htaccess files and <Directory> scopes.
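As an illustration of what the origin mapping changes, the Python sketch below computes the URL the server would fetch and the Host: header it would send. It is a hedged model, not mod_pagespeed's code: ORIGIN_MAPS is a hypothetical list of (origin_to_fetch_from, origin_specified_in_html) pairs, and Python's fnmatch is an assumed stand-in for mod_pagespeed's own wildcard matching.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit, urlunsplit

# Hypothetical ModPagespeedMapOriginDomain entries:
# (origin_to_fetch_from, origin_specified_in_html)
ORIGIN_MAPS = [("localhost", "*.example.com")]


def fetch_request(resource_url: str) -> tuple[str, dict[str, str]]:
    """Return (url_to_fetch, request_headers) for a resource named in HTML."""
    parts = urlsplit(resource_url)
    for fetch_from, pattern in ORIGIN_MAPS:
        if fnmatch(parts.hostname, pattern):
            # Connect to the mapped origin, but keep the domain from the HTML
            # in the Host: header so the server can pick the right site.
            return urlunsplit(parts._replace(netloc=fetch_from)), {"Host": parts.netloc}
    # No mapping applies: fetch the URL exactly as written in the HTML.
    return resource_url, {"Host": parts.netloc}


if __name__ == "__main__":
    print(fetch_request("http://www.example.com/styles.css"))
    # ('http://localhost/styles.css', {'Host': 'www.example.com'})
```

With fnmatch semantics, the pattern *.example.com matches www.example.com but not the bare example.com; this sketch does not establish whether mod_pagespeed's wildcards behave identically.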

Mapping Rewrite Domains

When mod_pagespeed rewrites a resource, it updates the HTML to refer to the resource by its new name. Generally mod_pagespeed leaves the resource at the same origin and path that was originally found in the HTML. However, it is possible to map the domain of rewritten resources. Examples of why this might be desirable include:
  1. Serving static content from cookieless domains, to reduce the size of HTTP requests from the browser. See Minimizing Payload.
  2. Moving content to a Content Delivery Network (CDN).
This is done using the configuration file directive:
ModPagespeedMapRewriteDomain domain_to_write_into_html domain_specified_in_html
Wildcards can also be used in the domain_specified_in_html, e.g.
ModPagespeedMapRewriteDomain cdn.example.com *example.com
Note: It is the responsibility of the site administrator to ensure that Apache httpd is installed with mod_pagespeed on the domain_to_write_into_html. This might be a separate server, or a single server with multiple domains mapped into it. The files must be accessible via the same path on the destination server as was specified in the HTML file. No other files should be stored on the domain_to_write_into_html; it should be functionally equivalent to domain_specified_in_html.
For example, if mod_pagespeed cache-extends http://www.example.com/styles/style.css to http://cdn.example.com/styles/style.css.pagespeed.ce.HASH.css, then cdn.example.com must have a mechanism in place to either rewrite that file itself or fetch the rewritten content from the origin server.
Note: It is the responsibility of the site administrator to ensure that moving resources onto other domains does not create a security vulnerability. In particular, if the target domain has cookies, then any JavaScript loaded from a resource moved to that domain will gain access to those cookies. In general, moving resources to a cookieless domain is a great way to improve security. Be aware that CSS can load JavaScript in certain environments.
By specifying a domain in this directive, either as source or destination, you are authorizing mod_pagespeed to rewrite resources found on this domain. See ModPagespeedDomain.
These directives can be used in .htaccess files and <Directory> scopes.
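The rewrite mapping only swaps the host of an already-rewritten URL, which is why the destination server must serve the same paths. The Python sketch below models that step using the wildcard example above; it is an illustration, not mod_pagespeed's code. REWRITE_MAPS is a hypothetical list of (domain_to_write_into_html, domain_specified_in_html) pairs, and fnmatch is an assumed stand-in for mod_pagespeed's wildcard matching.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit, urlunsplit

# Hypothetical ModPagespeedMapRewriteDomain entries:
# (domain_to_write_into_html, domain_specified_in_html)
REWRITE_MAPS = [("cdn.example.com", "*example.com")]


def map_rewrite_domain(rewritten_url: str) -> str:
    """Move a rewritten resource URL onto its mapped domain.

    Only the host changes; the path, including the .pagespeed. leaf name,
    is preserved, so the destination must serve the same paths.
    """
    parts = urlsplit(rewritten_url)
    for write_domain, pattern in REWRITE_MAPS:
        if fnmatch(parts.hostname, pattern):
            return urlunsplit(parts._replace(netloc=write_domain))
    return rewritten_url  # Unmapped domains are left untouched.


if __name__ == "__main__":
    print(map_rewrite_domain(
        "http://www.example.com/styles/style.css.pagespeed.ce.HASH.css"))
    # http://cdn.example.com/styles/style.css.pagespeed.ce.HASH.css
```

Because *example.com (with no dot after the asterisk) also matches the bare example.com, both example.com and www.example.com end up on cdn.example.com in this sketch.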

Sharding Domains

Best practices suggest minimizing round-trip times by parallelizing downloads across hostnames. mod_pagespeed can partially automate this for resources that it rewrites, using the directive:
ModPagespeedShardDomain domain_to_shard shard1,shard2,shard3...
Wildcards cannot be used in this directive.
This distributes rewritten URLs among the specified shard domains. For example:
ModPagespeedShardDomain example.com static1.example.com,static2.example.com
Using this directive, mod_pagespeed will distribute roughly half the resources rewritten from example.com into static1.example.com, and the rest to static2.example.com. You can specify as many shards as you like. The optimum number of shards is a topic of active research, and is browser-dependent. Configuring between 2 and 4 shards should yield good results. Changing the number of shards will cause mod_pagespeed to choose different names for resources, resulting in a partial cache flush.
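The post does not spell out how a shard is chosen. One plausible approach, assumed here purely for illustration, is to hash the rewritten URL's path and take the result modulo the number of shards: each resource then always lands on the same shard, which keeps browser caches effective, while many resources spread roughly evenly. The choose_shard helper below is hypothetical.

```python
import hashlib


def choose_shard(url_path: str, shards: list[str]) -> str:
    """Deterministically pick a shard for a rewritten resource path."""
    # Use a stable digest: Python's built-in hash() of a str changes
    # between processes, which would break cross-request consistency.
    digest = hashlib.sha256(url_path.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


if __name__ == "__main__":
    shards = ["static1.example.com", "static2.example.com"]
    picks = [choose_shard(f"/images/img{i}.png", shards) for i in range(1000)]
    for shard in shards:
        print(shard, picks.count(shard))
```

A modulo-based choice like this also shows why changing the number of shards renames resources: with three shards, the hash modulo 3 differs from the hash modulo 2 for many paths, so those resources move to a different host and miss the browser cache.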
When used in combination with ModPagespeedMapRewriteDomain, the rewrite mappings are applied first, and then the shard is selected. Origin domains are always tracked, so that when a browser requests a sharded URL from the Apache server, mod_pagespeed can find the original resource.
Let's look at an example:
ModPagespeedShardDomain example.com static1.example.com,static2.example.com
ModPagespeedMapRewriteDomain example.com www.example.com
ModPagespeedMapOriginDomain localhost example.com
In this example, example.com and www.example.com are "tied" together via ModPagespeedMapRewriteDomain. The origin mapping to localhost propagates automatically to www.example.com, static1.example.com, and static2.example.com. So when mod_pagespeed cache-extends an HTML stylesheet reference to http://www.example.com/styles.css, the stylesheet will be:
  1. Fetched from localhost by the server rewriting the HTML
  2. Rewritten to http://example.com/styles.css.pagespeed.ce.HASH.css
  3. Sharded to http://static1.example.com/styles.css.pagespeed.ce.HASH.css
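The three steps can be traced with a small Python model of the example configuration. This is a sketch under stated assumptions rather than mod_pagespeed's actual logic: the lookup tables are hypothetical dictionaries mirroring the three directives, and the shard is assumed to come from a stable hash of the rewritten path, so in this model the stylesheet may land on static2.example.com rather than static1.example.com. The canonical_domain helper shows how tied domains, including the shards, resolve back to localhost when a browser later requests the sharded URL.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Hypothetical lookup tables mirroring the three directives in the example.
SHARDS = {"example.com": ["static1.example.com", "static2.example.com"]}  # ShardDomain
REWRITE_MAP = {"www.example.com": "example.com"}  # MapRewriteDomain
ORIGIN_MAP = {"example.com": "localhost"}  # MapOriginDomain


def with_host(url: str, host: str) -> str:
    """Replace the host of url, keeping scheme, path and query."""
    return urlunsplit(urlsplit(url)._replace(netloc=host))


def canonical_domain(host: str) -> str:
    """Map www.example.com or any shard back to the tied domain example.com."""
    if host in REWRITE_MAP:
        return REWRITE_MAP[host]
    for domain, shards in SHARDS.items():
        if host in shards:
            return domain
    return host


def fetch_url_for(url: str) -> str:
    """URL the server fetches from: the origin mapping covers all tied domains."""
    canonical = canonical_domain(urlsplit(url).hostname)
    return with_host(url, ORIGIN_MAP.get(canonical, canonical))


def rewrite_reference(html_ref: str, leaf_suffix: str) -> str:
    """Return the URL written into the HTML for a rewritten resource."""
    # Rewrite mapping is applied first: www.example.com -> example.com.
    canonical = canonical_domain(urlsplit(html_ref).hostname)
    rewritten = with_host(html_ref + leaf_suffix, canonical)
    # Then a shard is selected from a stable hash of the rewritten path.
    shards = SHARDS.get(canonical)
    if shards:
        digest = hashlib.sha256(urlsplit(rewritten).path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(shards)
        rewritten = with_host(rewritten, shards[index])
    return rewritten


if __name__ == "__main__":
    ref = "http://www.example.com/styles.css"
    print(fetch_url_for(ref))  # http://localhost/styles.css
    sharded = rewrite_reference(ref, ".pagespeed.ce.HASH.css")
    print(sharded)  # static1 or static2, depending on the hash
    print(fetch_url_for(sharded))  # http://localhost/styles.css.pagespeed.ce.HASH.css
```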
Note: It is the responsibility of the site administrator to set up the shard entries in their DNS configuration (for example, as CNAME records). Also, please see the note above about the servers for rewrite domains; it applies to sharded domains as well. They must have access to the same content as the original domain.
By specifying a domain in this directive, either as source or destination, you are authorizing mod_pagespeed to rewrite resources found on this domain. See ModPagespeedDomain.
This directive can be used in .htaccess files and <Directory> scopes. However, be very careful about using ModPagespeedShardDomain in .htaccess files: to maximize browser-cache effectiveness, sharding should be consistent across an entire website.
