There seems no automatic way to keep a site valid with web standards unless you close it off to the rest of the world to contribute to it. I will not do that anytime soon.
There is: I’m doing it. Next Thursday will mark the one-year anniversary of my switching to application/xhtml+xml as the content-type header for this site, for user agents that support it. Using that content-type forces Gecko-based browsers to refuse to render any page that is not well-formed XML, so if a page is broken I hear about it pretty quickly.
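For anyone curious what "for user agents that support it" looks like in practice, here is a minimal sketch of the idea, written in Python for illustration rather than taken from this site's own code. The function name `choose_content_type` is hypothetical, and the logic is an assumption about one reasonable approach: send application/xhtml+xml only when the browser's Accept header lists it with a non-zero q-value, and fall back to text/html otherwise.

```python
def choose_content_type(accept_header):
    """Pick a Content-Type based on the request's Accept header.

    Hypothetical helper: returns application/xhtml+xml only when the
    client explicitly lists it with q > 0, otherwise text/html.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a media type with no q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9"))
    print(choose_content_type("text/html, */*"))
```

The sketch deliberately ignores wildcards such as `*/*`, so a browser that never mentions XHTML by name simply gets text/html.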
So how do you keep a frequently updated site with data from external sources and user comments valid? There are really only two things you need to do. Firstly, ensure that everything going into the system (entries and comments) is valid XHTML. I do that using a simple validation system for comments and a bookmarklet for my own entries. Secondly, any and all data from external sources (my blogroll from blo.gs, blogmark URLs added using a bookmarklet) needs to be entity-escaped before being displayed on the site. In my case, a call to PHP’s htmlspecialchars() function is all that’s needed.
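The two steps above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code this site runs: `is_well_formed_fragment` and `escape_external` are hypothetical names, the parser check only tests XML well-formedness (it does not validate against the XHTML DTD, and it rejects named entities such as `&nbsp;` because no DTD is loaded), and `html.escape` stands in for PHP's htmlspecialchars().

```python
import html
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element.

    Well-formedness only: this does not check that the elements used
    are legal XHTML.
    """
    try:
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return False
    return True


def escape_external(text):
    """Entity-escape untrusted text (&, <, >, and quotes) before display."""
    return html.escape(text, quote=True)


if __name__ == "__main__":
    comment = "<p>Nice post, <em>thanks</em></p>"
    print(is_well_formed_fragment(comment))
    print(escape_external('Tom & "Jerry" <b>'))
```

A comment that fails the first check would be bounced back to its author for correction, while anything pulled in from outside is passed through the second function and never trusted as markup.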
I’m not saying my system is ideal: the need for well-formed markup in comments is a major usability issue, even on a site with an audience consisting mostly of web developers. But it’s certainly possible to operate a site in XHTML with frequent updates and user comments while staying valid at the same time.