Recommendations for a Green Web for Administrators

Configure HTTP Caching Support

Since web designers and administrators cannot influence the configuration of the user’s web browser, they have to focus on the server-side aspects of caching. Approximately 80% of web users have their browsers configured for caching, whereas about 20% always have an empty cache (Theurer 2007). Including caching metadata in the web server's responses can significantly decrease the number of HTTP requests and the size of HTTP responses. Caching in HTTP/1.1 is designed to reduce

  • the need to send requests to servers ("expiration" mechanism) and
  • the need to send full responses back to the clients ("validation" mechanism).

The validation mechanism does not reduce the number of HTTP requests, but it reduces the payload of the HTTP responses sent back to the client and thus saves network bandwidth (Fielding et al. 1999, p. 74).
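How the two mechanisms interact on the client side can be sketched roughly as follows. This is a simplified illustration, not a complete HTTP cache: the CacheEntry class, the next_action function and the returned action names are hypothetical, and freshness is reduced to a single max-age value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical cache record for one resource."""
    stored_at: float                     # when the response was stored (seconds since epoch)
    max_age: int                         # freshness lifetime in seconds
    etag: Optional[str] = None           # Entity Tag validator, if the server sent one
    last_modified: Optional[str] = None  # Last-Modified validator, if the server sent one


def next_action(entry: Optional[CacheEntry], now: float) -> str:
    """Decide what a client would do for a resource at time `now`."""
    if entry is None:
        return "fetch"        # nothing cached: a full request is needed
    if now - entry.stored_at < entry.max_age:
        return "use-cache"    # expiration mechanism: no request at all
    if entry.etag or entry.last_modified:
        return "revalidate"   # validation mechanism: conditional request
    return "fetch"            # stale and no validator: full request
```

A fresh entry causes no network traffic at all, while a stale entry with a validator still costs one request but may avoid the full response body.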

To make use of the expiration mechanism, administrators can have their HTTP servers send an Expires or Cache-Control header with each response. The Expires header, as described in HTTP/1.0 (Berners-Lee et al. 1996, p. 41), defines the absolute date after which the response is expected to be stale. One minor problem with the Expires header is that it uses explicit date and time strings and thus requires server and client clocks to be synchronized (Crocker 1982, p. 26; Souders 2007, p. 22). HTTP/1.1 overcomes that limitation with the Cache-Control header, whose max-age directive defines the number of seconds for which the requested resource may remain in the cache. To stay compatible with older HTTP clients that do not support HTTP/1.1, one can send the Expires header alongside the Cache-Control header. In that case, the Cache-Control header overrides the Expires header.
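As a minimal sketch of what sending both headers amounts to, the following Python function derives an absolute Expires date and a relative max-age value from the same lifetime. The function name expiration_headers and its parameters are assumptions made for this illustration; a real web server generates these headers itself.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiration_headers(lifetime: timedelta, now: datetime) -> dict:
    """Build Expires and Cache-Control headers for the same lifetime.

    `now` must be a timezone-aware datetime in UTC.
    """
    expires_at = now + lifetime
    return {
        # Absolute date: depends on the client's clock agreeing with the server's
        "Expires": format_datetime(expires_at, usegmt=True),
        # Relative lifetime in seconds: independent of the client's clock
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }


headers = expiration_headers(timedelta(days=355), datetime.now(timezone.utc))
```

The Expires value changes with every response, whereas the max-age value stays constant for a given lifetime.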

The validation mechanism is used by HTTP clients whose cache holds an entry for a resource that has already expired. In that case, the client may send a conditional HTTP request to the server. This conditional request looks exactly like a normal HTTP request, but in addition it carries a so-called "validator" that the server may use to decide whether the copy held by the client is still up to date or needs to be refreshed. If it needs a refresh, the new data is sent to the client; otherwise the server responds with the HTTP status code "304 Not Modified". Two validators may be used: Last-Modified dates or Entity Tag cache validators. With a Last-Modified date, a cache entry is considered valid if the requested resource has not been modified since the given date. An Entity Tag is a unique identifier for a specific version (entity) of a resource (Fielding et al. 1999, p. 85). How Entity Tags are calculated depends on the implementation of the web server. The HTTP/1.1 specification states that servers should send both Entity Tag and Last-Modified values in their responses. HTTP/1.1 clients are required by the specification to use Entity Tags in cache-conditional requests if the server provides them. In addition, clients should also send a Last-Modified date if one was set (Fielding et al. 1999, p. 86).
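The server-side decision described above might be sketched as follows. This is a simplified illustration under stated assumptions: the function name validate and its parameters are hypothetical, Entity Tags are compared as plain strings (weak validators are not handled), and the Entity Tag is given precedence whenever the client sends one.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def validate(request_headers: dict, etag: str, last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still current, else 200.

    `last_modified` must be a timezone-aware datetime in UTC.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The client may list several Entity Tags separated by commas
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in candidates or etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparsable date: fall back to a full response
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds
        return 304 if last_modified.replace(microsecond=0) <= since else 200

    return 200  # unconditional request: send the full response
```

A 304 result means that only headers travel back to the client, which is where the bandwidth saving of the validation mechanism comes from.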

To reduce the total number of HTTP requests and the size of HTTP payloads, we suggest configuring client cache support properly. This means:

  1. setting far-future expiration dates and Cache-Control headers for resources that change infrequently
  2. setting Last-Modified headers and Entity Tags for all resources that do not need recalculation on subsequent requests (mainly static content)

A simple example configuration fragment for the popular Apache web server may look like this:

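# Requires the mod_expires module to be enabled
ExpiresActive On
<FilesMatch "\.(html|jpg|png|js|css)$">
    # Treat matching resources as fresh for 355 days after they are requested
    ExpiresDefault "access plus 355 days"
    # Derive Entity Tags from the file's modification time and size
    FileETag MTime Size
</FilesMatch>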

The Apache configuration directive ExpiresDefault handles the generation of both an Expires header and a Cache-Control header for the given resource types.

Use Compression

Most modern web browsers support some form of compression. Compression reduces the response size and thus the transfer time, which in turn lowers power consumption.

Web browsers usually support the GZIP or DEFLATE compression format. Both formats are explicitly named in the HTTP/1.1 specification. Web browsers use the Accept-Encoding header to indicate which content encodings they support. A web server may compress the content using one of the methods listed by the browser and must state in the Content-Encoding response header which method it used (Fielding et al. 1999).
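The negotiation can be sketched in a few lines of Python. This is a simplified illustration, not how any particular web server implements it: the function name encode_body is hypothetical, quality values are only checked for being greater than zero, and GZIP is always preferred over DEFLATE when both are acceptable.

```python
import gzip
import zlib


def encode_body(body: bytes, accept_encoding: str):
    """Return (payload, content_encoding), where content_encoding may be None."""
    offered = {}
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        quality = 1.0  # default when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        offered[name] = quality

    if offered.get("gzip", 0.0) > 0.0:
        return gzip.compress(body), "gzip"
    if offered.get("deflate", 0.0) > 0.0:
        # The HTTP "deflate" coding is the zlib format wrapped around DEFLATE data
        return zlib.compress(body), "deflate"
    return body, None  # no supported encoding offered: send uncompressed


payload, encoding = encode_body(b"<html>...</html>" * 50, "gzip, deflate")
```

If the returned encoding is not None, the server would send it as the value of the Content-Encoding response header.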

Compression is not as simple as it seems, because some older browser versions claim to support compression but actually do not, owing to incompatibilities or bugs. On the other hand, more than 95% of all installed and used browsers in Europe are known to support GZIP compression (ADTECH 2008). Therefore, with regard to our goal of "Green Web Engineering", it is reasonable to enable compression on the server side.

However, not all content types are suitable for compression, e.g. compressed image file formats, compressed music and video files, or PDF documents. Compressing these file types is sometimes even counterproductive. Hence, compression should be used for files that compress well, such as text-based files. With a simple example website that was served with GZIP compression by an Apache web server, we achieved the traffic savings shown in Table 1. The average saving for the whole example site is approx. 60% (assuming PNG images are not compressed).

Table 1: Total vs. compressed file size example.

Example content   Total size (KB)   Compressed size (KB)   Savings
index.html                   5.45                   2.44     55.3%
style.css                    2.73                   0.68     75.1%
prototype.js               126.00                  29.51     76.6%
ida-logo.png                24.80                  24.86     -0.2%
ucb-logo.png                 9.27                   9.28     -0.1%
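The overall saving of approx. 60% can be reproduced from the figures in Table 1 with a short calculation. The function name total_saving and the skip_suffixes parameter are assumptions made for this sketch; the sizes are taken directly from the table.

```python
# (total size, compressed size) in KB, as listed in Table 1
sizes_kb = {
    "index.html": (5.45, 2.44),
    "style.css": (2.73, 0.68),
    "prototype.js": (126.00, 29.51),
    "ida-logo.png": (24.80, 24.86),
    "ucb-logo.png": (9.27, 9.28),
}


def total_saving(sizes: dict, skip_suffixes: tuple = (".png",)) -> float:
    """Fraction of traffic saved when files matching skip_suffixes are sent uncompressed."""
    total = sum(original for original, _ in sizes.values())
    sent = sum(
        original if name.endswith(skip_suffixes) else compressed
        for name, (original, compressed) in sizes.items()
    )
    return 1.0 - sent / total


print(f"{total_saving(sizes_kb):.1%}")
```

Passing an empty tuple for skip_suffixes compresses the PNG files as well, which yields a slightly lower saving because both images grow when compressed.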

Apply "Green IT" Concepts

Several web hosting providers already offer hosting powered by renewable energy. Additionally, administrators can apply current Green IT techniques such as virtualization strategies.

References

  • ADTECH, 2008. Survey Unveils Extent of Internet Explorer Domination Across the European Browser Landscape. [Online] ADTECH: London. Available at: www.adtech.com [Accessed 10 Oct. 2009].
  • Berners-Lee, T., Fielding, R. & Frystyk, H., 1996. Hypertext Transfer Protocol -- HTTP/1.0. Request for Comments 1945. [Online] Network Working Group. Available at: tools.ietf.org [Accessed 10 Oct. 2009].
  • Crocker, D. H., 1982. Standard for the Format of ARPA Internet Text Messages. Request for Comments 822. [Online] University of Delaware. Available at: tools.ietf.org html/rfc822 [Accessed 10 Oct. 2009].
  • Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P. & Berners-Lee, T., 1999. Hypertext Transfer Protocol -- HTTP/1.1. Request for Comments 2616. [Online] The Internet Society. Available at: tools.ietf.org rfc2616 [Accessed 10 Oct. 2009].
  • Souders, S., 2007. High Performance Web Sites. Sebastopol: O’Reilly Media.
  • Theurer, T., 2007. Performance Research, Part 2: Browser Cache Usage - Exposed! [Online] Available at: www.yuiblog.com performance-research-part-2/ [Accessed 10 Oct. 2009].

Acknowledgements

This text has been published by INSTICC Press:

  • Dick, Markus; Naumann, Stefan; Held, Alexandra: Green Web Engineering. A Set of Principles to Support the Development and Operation of "Green" Websites and their Utilization during a Website’s Life Cycle. Filipe, Joaquim; Cordeiro, José (eds.): WEBIST 2010 : Proceedings of the 6th International Conference on Web Information Systems and Technologies, April 7 - 10, 2010, Valencia, Spain, Volume 1. Setúbal: INSTICC Press, 2010, pp. 48 - 55.