Static content generation
Ian Bicking has an interesting piece on using static publishing in a CMS. The choice between static and dynamic when building software for the web is a critical one, and one that I think deserves in-depth discussion.
In a dynamic site, pages are assembled “on the fly” as and when they are requested. Most PHP-powered sites do this, as PHP as a technology actively encourages dynamic content creation. Generating pages dynamically allows for all sorts of clever applications, from random quote generators to full-on web applications such as Hotmail.
In a static publishing system, HTML pages are pre-generated by the publishing software and stored as flat files on the web server, ready to be served. This approach is less flexible than dynamic generation in many ways and is often ignored as an option as a result, but in fact the vast majority of content sites consist of primarily static pages and could be powered by static content generation without any loss of functionality to the end user.
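To make the idea concrete, here is a minimal sketch of static generation in Python. It is not the code behind any of the sites mentioned here; the template, the `entries` structure and the function names are all assumptions made up for illustration. Each entry is rendered once and written to a flat `.html` file that a web server can then serve directly.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical page template; a real system would use a proper template engine.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><title>{title}</title></head>
<body>
<h1>{title}</h1>
<p>{body}</p>
</body>
</html>
"""


def render_page(title, body):
    """Return the full HTML for one page, escaping the content."""
    return PAGE_TEMPLATE.format(title=html.escape(title), body=html.escape(body))


def generate_site(entries, output_dir):
    """Write one flat .html file per entry and return the paths written."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, entry in entries.items():
        path = output_dir / f"{slug}.html"
        path.write_text(render_page(entry["title"], entry["body"]), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Hypothetical content; in practice this would come from the CMS database.
    entries = {"about": {"title": "About", "body": "Static pages are cheap to serve."}}
    with tempfile.TemporaryDirectory() as tmp:
        for path in generate_site(entries, tmp):
            print(path.name)
```

All the work happens at publishing time, so a request for a page costs the server nothing more than a file read.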
The most widespread example of a static publishing system I’ve seen is Movable Type, which rebuilds static files for a site each time a weblog entry is added or modified, although it can be configured to serve content dynamically instead.
At first glance, the benefits of dynamic publishing are obvious. What is frequently ignored are the benefits of static publishing, at least for content-driven sites which don’t have any heavy need for dynamic features. The most obvious benefit is performance; serving static files is what web servers such as Apache are optimised to do, and they can do it fast. A second advantage is reliability, as Ian explains:
A big part is that it takes the pressure off of going live. I can be sure before going live that the public website is correct. The actual CMS may explode in flames, but the site will be fine. Going live with a web application is always a stressful process, and anything that reduces the stress of that is a great benefit. As time goes on, static publishing is also a big stress reduction for the system administrator, since a simple Apache configuration is a lot more reliable under different loads and configurations than any dynamic site will be.
I’ve been developing dynamic sites almost exclusively for the past two or three years, but a couple of my most recent projects were static rather than dynamic. These were the LJWorld.com Coupons site and the KUSports.com photo galleries. I wanted to write both of these in Python, because doing so would make the process of transferring them over to our new mod_python powered CMS (currently in development) far less involved. Unfortunately our main production servers don’t currently have mod_python configured, and we weren’t overly keen on setting it up there for the sake of a couple of small projects. Instead I decided to write the administration interfaces using Python CGI scripts, but generate the actual front end pages (which would see far heavier traffic) as static files.
In addition to the performance and reliability benefits, static generation provides a simple “staging area” style feature for free. Both the coupons and the gallery interfaces allow users to make multiple changes to site content safe in the knowledge that none of the changes will become visible until the “Publish Site” button is selected. At first I was worried that this extra step could prove confusing, but in practice it allows our content producers to make changes in a safe environment, without fear of accidentally breaking the public site while they are working.
Static content generation certainly isn’t appropriate for every project, but for plain content sites that don’t need dynamic features it’s a much more viable option than many people think.
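The “Publish Site” step can be sketched roughly as follows. This is an illustration under assumptions, not the actual coupons or gallery code: it assumes edits are rendered into a hypothetical staging directory, and the `publish_site` name and the `.incoming`/`.outgoing` suffixes are invented. Nothing reaches the public directory until the function is called.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish_site(staging_dir, public_dir):
    """Replace the public site with a copy of the staged pages."""
    staging_dir = Path(staging_dir)
    public_dir = Path(public_dir)
    # Hypothetical scratch directories that sit next to the public one.
    incoming = public_dir.with_name(public_dir.name + ".incoming")
    outgoing = public_dir.with_name(public_dir.name + ".outgoing")

    # Clear leftovers from any earlier run that failed part-way through.
    for leftover in (incoming, outgoing):
        if leftover.exists():
            shutil.rmtree(leftover)

    # Copy first, so the slow part happens away from the live site.
    shutil.copytree(staging_dir, incoming)

    # Swap with two renames; the site is missing only between them.
    if public_dir.exists():
        os.rename(public_dir, outgoing)
    os.rename(incoming, public_dir)
    if outgoing.exists():
        shutil.rmtree(outgoing)
```

Copying before swapping keeps the window in which visitors could see a half-published site very short, though the two renames mean it is not strictly atomic.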
James Robertson - 13th December 2003 08:14 - #
KO - 13th December 2003 09:10 - #
I've been a commercial web developer for oooooh... a looong time now - mainly in PHP, and an applications developer before that. One of my first major sites was Cinema.com. It's changed a bit since I left the company - but not much. Originally it was developed to showcase some of the parent company's (www.easysoft.com) ODBC middleware.
It started out as a dynamic web site - every page was assembled on the fly using PHP (under Apache on Linux) fed from a SQL Server database.
After a while, we decided to cache the pages to reduce the drain on SQL Server, as content wasn't changing THAT often. Once new content was added, my caching system would re-cache the new pages (just the generated HTML).
This was great in the beginning - pages loaded faster and SQL Server could go have a snooze in the afternoon....
After about 3 months of adding new content (4 people typing in news, cinema show times, movie reviews, interviews, synopses, cast and crew details...) the cache was managing about 500,000 pages with approximately 100 pages being added every day and we were running out of physical disk space to store the cache!
In the end, we disabled the cache and reclaimed the oodles of disk space for other projects.
I guess I'm saying... static content is great for smaller sites, but needs some serious resources when implemented on large scale sites.
Richard Allsebrook - 13th December 2003 09:46 - #
I've been experimenting a bit with using Ant for website deployment. The source tree:
- /content
- /transformations
- /wwwroot: .htaccess and stylesheets
Now, in the build.xml for Ant some targets were defined:
- build
- clear-build
- validate-build
- deploy
Now, starting Ant would build the entire site to a /build dir, validate all .xhtml files in the build and on success upload the build by FTP to the server.
All steps (building, validating, deploying) were performed lazily by Ant. I have yet to experiment some more with this approach, but for this test, everything worked like a charm.
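For readers without Ant to hand, the lazy part can be sketched in Python. This assumes "lazy" means skipping outputs that are already newer than their sources, which is a guess at the behaviour described; the directory layout, file extensions and function names are hypothetical, and `transform` stands in for whatever transformation is actually applied.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(source, target):
    """True if the target is missing or older than its source."""
    source, target = Path(source), Path(target)
    return not target.exists() or source.stat().st_mtime > target.stat().st_mtime


def build_lazily(source_dir, build_dir, transform):
    """Rebuild only out-of-date pages; return the names of rebuilt files."""
    source_dir, build_dir = Path(source_dir), Path(build_dir)
    build_dir.mkdir(parents=True, exist_ok=True)
    rebuilt = []
    for source in sorted(source_dir.glob("*.xml")):
        target = build_dir / (source.stem + ".xhtml")
        if needs_rebuild(source, target):
            content = source.read_text(encoding="utf-8")
            target.write_text(transform(content), encoding="utf-8")
            rebuilt.append(target.name)
    return rebuilt
```

A second run over unchanged sources rebuilds nothing, which is what keeps repeated builds cheap.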
Martijn Vermaat - 13th December 2003 13:20 - #
Disk space is cheap. It scales well....
Processing power and software licenses (SQL Server) don't scale...
ssn - 13th December 2003 17:40 - #
Surely it'd be possible to cache a PHP site on Apache easily with Squid too. It's not rocket science, but I must admit that you need to be more intelligent than a Microsoft shop developer to configure it.
Peter Bengtsson - 13th December 2003 17:59 - #
Yup, disk space is cheap, but in the end, it was cheaper to upgrade the SQL Server box (a bit more RAM and a few query tweaks) to keep up with demand and turn off the cache.
As for the licence, we had the dual-processor doesnt-matter-how-many-connect(tm) version so it was all bought and paid for.
I wasn't trying to imply this was a catch-all statement - some large sites work fine with static content.
Richard Allsebrook - 13th December 2003 18:24 - #
Richard Allsebrook - 13th December 2003 18:27 - #
I was planning to mention caching as a third option in my blog entry but it slipped my mind when I sat down to actually write it. Caching is a great way of combining the benefits of static and dynamic publishing - in fact, this blog uses caching to reduce the load on the database. If you view source on the front page or any of the entry pages you'll see a comment at the bottom of the HTML stating the time that the page was last dynamically created - the front page recreates about once a minute (to keep the "X hours X minutes" stuff up to date) while the entry pages are regenerated when a new comment is added.
There's one big disadvantage to this: the list of recently updated blogs down the right hand side is cached for the entries, so an entry page that hasn't had a comment in a while will display an out of date blogroll. The smartest fix for this would be to remove the blogroll from entry pages (where it isn't really needed) - I've been meaning to do that for nearly a year and just haven't got round to it yet.
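As a rough illustration of that kind of caching (a sketch, not this blog's actual code), the following Python keeps generated pages in memory, stamps each with a comment recording when it was created, regenerates a page once it is older than an optional maximum age, and lets a page be thrown away explicitly, for instance when a new comment arrives. The `PageCache` class and its method names are invented for the example.

```python
import time


class PageCache:
    """Cache rendered pages, regenerating them when stale or invalidated."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the behaviour can be tested
        self._pages = {}     # key -> (html, created_at)

    def get(self, key, render, max_age=None):
        """Return the cached page, calling render() only when needed."""
        now = self._clock()
        cached = self._pages.get(key)
        if cached is not None:
            page, created_at = cached
            if max_age is None or now - created_at < max_age:
                return page
        # Stamp the page so "view source" shows when it was generated.
        page = render() + f"\n<!-- generated at {now:.0f} -->"
        self._pages[key] = (page, now)
        return page

    def invalidate(self, key):
        """Drop a cached page, e.g. after a new comment is posted."""
        self._pages.pop(key, None)
```

Under this sketch the front page would be fetched with `max_age=60`, while entry pages would be fetched with no maximum age and invalidated whenever a comment is added, which is also why anything else embedded in an entry page goes stale.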
Simon Willison - 13th December 2003 18:40 - #
P01 - 13th December 2003 22:49 - #
The benefits of a static CMS go deep into the land of where you can deploy your site, the resources it will gulp to run, and the necessity of the complexity in creating it.
For instance, a site with dynamic content that carries a lot of documentation cannot be easily shipped on a product CD if you so wanted. Other times you have low bandwidth or resources for your webserver, say for small but important projects. And sometimes installing and patching and updating a host of various technologies is overkill for the content you're producing.
Blatant plug: All these things were the very reason I created the xSiteable project for advanced but static web sites. I can be slashdotted anytime, even on my little webserver, without any problems. :)
Alexander - 14th December 2003 01:57 - #
You could also use server side includes with your static pages. Some people do this with MT, to minimise the number of page rewrites needed.
I've got one site which is made up of a bunch of hand coded static pages which then use SSI to insert static files published from their products/content CMS. Doing it this way meant I only needed to have the CMS publish simple tables, no complicated template guff. Also, I can combine and re-use little bits of content as needed without having to remember every place where it's used.
Simon, you could put your blog roll into a separate file, and then SSI it into every page that needed it.
Eric Scheid - 14th December 2003 07:36 - #
<object data="/blogroll.htm" type="text/html" ></object>. However, that way the external page is not included but embedded, which could cause some accessibility problems.
P01 - 14th December 2003 09:46 - #
Kurt Wiersma - 15th December 2003 17:28 - #
www.indymedia.org ran off some code that would cache lots of the different components of the page (while leaving others dynamic), including all the index pages for posted articles. When the number of posted articles was somewhere around 200,000 (300,000 now), we started realizing something was amiss -- turned out every index for the entire site was being recreated every time someone posted an article. Scaling issue, I guess. Reads were fast, though...
That site is now being migrated to Mir, Java-based software that's even more based around static publishing. It publishes to SSIs, so that it's reasonable to have up-to-date lists of recent posts in every page. But I'm sure they've learned from these scaling issues we've had with other software.
You still have to be real careful with the database queries, though. It's easy to write an unindexable query, and you'll never see the performance issues until it's too late.
Ian Bicking - 17th December 2003 07:53 - #
I have set up a blog for discussing ideas relating to a simple, 1-Man CMS which allows "simple folk" to create and modify their website.
This is just an idea so far and many other systems come close to what I envision but are PHP/MySQL web based.
The idea is to use a client based tool, XML files for content storage, XSLT for templating, HTA for the UI and FTP for publishing. We want to avoid large technical requirements (MySQL, PHP...) for the user.
By using simple standard technologies I hope to minimize internal application complexity and increase robustness while also allowing for ultimate flexibility for the webmaster. In short the app should be:
- For the user: Simple
- For the webmaster: Simple*
- For the developer: Simple
*If they know the standard tools (XML, XSLT, XPath, HTML)
The basic formula is: XML + XSLT = My Website
PS: Why is this blog demanding XHTML in the comments???
Jack - 19th September 2005 21:45 - #