Feed Sign in with OpenID OpenID

Simon Willison’s Weblog

Object relational mappings are over-rated...

... At least according to Tim Bray:

Lots of architects have learned, painfully, that you usually can’t magick relational rows away behind object/class abstractions. The right way to think about a database is as a set of normalized tables that are designed to be addressed with SQL strings.

This brings in to focus something I’ve been thinking about for quite a while. I always try to abstract away the details of the data storage layer behind some kind of interface, so that none of my front end code ever has to make a direct database call, instead calling methods on my application objects with names like “get_user_by_id” or “get_blog_entries_for_date”. The idea is that I could, if I wanted to, completely switch out the underlying data source. I should also be able to write new front-end functionality that calls the existing application layer methods without having to change anything in the data access code.

There are a couple of problems here. Firstly, I’ve already got a data abstraction layer in the form of SQL. Although there are many discrepancies between implementations, basic SQL statements such as selects and joins should work on pretty much any database. Additionally, in my three years of building web applications I can’t think of a single time I’ve needed to change the data layer for an existing application.

Modern database features such as stored procedures and constraints allow you to move huge chunks of application logic that relates to data coherency in to the database itself. In fact, PostgreSQL even allows you to write stored procedures in third party languages such as Python or Perl. With carefully written stored procedures, much of the important application logic could live in the database along with the data itself—a concept comparable to object oriented programming where the data and the methods that can be run on the data are encapsulated together.

I certainly haven’t made up my mind one way or the other, and at the end of the day the best approach is the one that gets a project completed in a reasonable time frame and that results in easily maintainable code. Plenty of food for thought, in any case.

This is Object relational mappings are over-rated... by Simon Willison, posted on 1st January 2004.

View blog reactions

Next: Targets for 2004

Previous: Happy New Year!

12 comments

  1. The biggest problem of using advanced SQL features available in databases like PostgreSQL is lock-in. If you have all your logic in stored procedures and constraints, good look moving that application anywhere without major overhall.

    Like you said, an advantage of keeping logic in the database is it makes it equivalent to an independent layer of the program, complete with logic and data. This allows you to have two completely seperate applications access the data more easily.

    Calling stored procedures is a pain in many languages anyway, so that benefit may be reduced.

    On a side note: How do you use Python for stored procedures in PostgreSQL?

    -Bryan

    Bryan Helmkamp - 1st January 2004 22:57 - #

  2. Like you said, an advantage of keeping logic in the database is it makes it equivalent to an independent layer of the program, complete with logic and data. This allows you to have two completely seperate applications access the data more easily.

    Thanks for making my own key point much clearer! I also agree with your point about lock-in, but I do think it is often an over-emphasized problem - at some point you have to lock in to an underlying technology (usually the programming language you use) and locking in to an open source project is inherently less risky than a commercial one, where you have no chance of fixing any critical bugs yourself (or hiring someone to do it for you).

    To write Postgres stored procedures in Python see this article. It seems to work pretty well.

    Simon Willison - 1st January 2004 23:08 - #

  3. I have spent lots of brain time on this. It seems to me that as far as abstracting database access goes, these are your options:
    • SQL statements inline with other logic - no abstraction.
    • Functions wrapping SQL statements. They probably return a list or dictionary (to use Pythonic terms) and might do some validation of the arguments.
    • Objects wrapping data in the database. Most of the methods wrap a SQL statement. Most people use a 1:1 table->object mapping, but it has occured to me that's pretty limited. Multi-table JOINs are a fact of life, you might as well make your objects deal with it.
    • Use an external format of your own specification to define the schema and the data objects. Write a program to generate the SQL statements and the data objects. From this meta-schema you can rebuild the database, migrate between schema and generate your code. This was recommended to me by Dave Thomas, so it is verifiably pragmatic :). They might use this technique at Thoughtworks as well, I don't recall.
    What I've ended up doing is functions wrapping SQL statements. These are easy to 'test first by sight'. Everything else I've done ended up stalling in trying to write the tests first, which brings me to the conundrum I ran into: How much/what kind of unit testing/mock testing is appropriate for your data access layer?

    Adam Keys - 2nd January 2004 20:44 - #

  4. stored procedures are okay, but they seduce you, because they are, well, procedural

    model your data relationships properly, pay particular attention to primary and foreign keys, and do not be shy about defining domains and constraints

    a database is not a repository of whatever your application objects, methods, functions, processes, wrappers, and api calls might deem necessary; a well designed database should drive the design of application logic, not the other way round

    and as for portability, one word: views

    separation of application logic from data structures predates object-oriented programming by decades; it is heartwarming to see object-oriented programmers discover its merits

    rudy - 3rd January 2004 03:03 - #

  5. I don't know how many of you have read anything by Fabian Pascal, but he goes on at length about his distaste for the whole object-relational approach and ODBMS's in general. When I first read some of his stuff, it was a bit of a tough chew, but dealing with the hunk of bastardized code in the job I presently have has made all of his rantings a lot clearer.

    "There are a couple of problems here."

    There are a couple of other problems here as well. First, if you use an RDMBS at the foundation, you are likely to see what the ODBMS crowd calls an 'impedance mismatch'. This obviously is becasue objects don't relate (or map) to tables. It takes tons of code to get this to work and it's sloooowww! So they come up with the ODBMS designed specifically to facilitate this approach. But as it turns out, it's still slow (so somebody comes up with 'http://www.intersystems.com/match5/') to help the objects and their data remain persisent to avoid the time consuming process of traversing a tree like structure.

    But this brings up the potential problem of data being less than consistent. What a mess! Not to mention the fact that storing data in tree like structures also makes it real easy to store duplicate data in the database and potentially run into corruption issues later on.

    As a matter of fact, we ran into this just the other day and I didn't realize what it was exactly until just now!

    Ultimately, OR mapping may have a use in some areas, but IN MOST CASES, this IS NOT a good idea. Objects have their place in helping to ease (if not solve) 'that software crisis', but this should'nt be expanded to include the datastore.

    Cheers,

    BDKR - 5th January 2004 03:14 - #

  6. I think any object-oriented programmer worth his or her salt will create a ORM if they aren't already using one. When you start using an RDBMS in a OO program, you quickly want domain objects. When you create domain objects, you can't help but start to factor out some of the SQL generation and create higher-level data declarations, which is the same thing as an ORM.

    That said, many people are not OO programmers (or at least are willing to forsake that tradition at times). Working with an RDBMS certainly does not reward OO thinking, and working heavily with SQL even less so. The relational model is very different from the object model, and things like stored procedures and views do nothing to bridge that gap.

    I think there are real advantages of an RDBMS over an ODBMS. The RDBMS tends to have a highly explicit and robust data model, which is extremely important for data that is kept for the long-term. I think it's a worthwhile exercise to spend time thinking out the persistent data model, and an ORM can make that data model easily manifest in an OO world. In the process you eschew some of the relational features, but I find that to be an acceptable compromise.

    I agree that backend portability isn't that important (unless you are distributing your software). But domain objects are important, and a robust data model is important, and both of these are well served by an RDBMS + ORM.

    Ian Bicking - 11th January 2004 19:04 - #

  7. I see two issues here. There is the original statement about data abstraction and the later statements about ODBMS. I can not see how they belong in the same discussion. Data abstraction and data storage are two very different concepts.

    First of all ODBMS is not meant to replace RDBMS. It provides a persistent storage of highly complex data for large applications that need to save objects (simple desc). I will admit the ODBMS craze seems to have come and gone but it is still has a niche.

    I think you guys are a little off when it comes to data abstraction. I am aware that part of the motivation for abstraction is the ability to swap the 'dbms' but that is only part of the picture. The idea is that the data storage structure and the application code/structure will evolve in different ways and at different rates over the lifespan of the app. Keeping the two parts independent provides a certain flexibility when making changes. The simplest example is one already mentioned, the user validation/management functions/classes. Often there are changes to user management after a project leaves the beta stage

    Sujeevan Ratnasingham - 7th April 2004 01:48 - #

  8. Mmmh,

    have also spent a lot of time thinking about OR mapping and whether its worth it. I really feel like we have gone full circle.

    Early on we had dumb green screen terminals talking to a central mainframe - all the business rules where implemented by a few COBOL developers. There was a symbiotic relationship between these people and the company. The code was cryptic, the changes slow and the frontend not very user friendly.

    Next it was the user friendly fat VB/Delphi/C++ client with a database control or some embedded SQL). Then client-server became the big buzz word. The advantage being that the server formed a business API (and the server could talk to mainframes etc).

    After that it was working with a business objects that represented the rows (since debugging a 700 line stored procedures was such a nightmare) and having the application distributed across multiple machines (in the belief that this would give you more scalability - Corba vendors conned us). Also languages like java arrived which where going to solve the problems of memory leaks and other had to trace C++ type problems - (You still get memory leaks and multithreading problems with java - not to mention the fact that its slow)

    Also web frontends appeared since rolling out 1000 fat clients was such a nightmare with dll conflicts. But then there where the problems with browser compatibility and the browser wars. Staff turnover increased as y2k and dot com boom started.

    Now days I work with EJB and a slow compile build cycle but we have started developing some of our tools in php where the one page contains the sql, the business logic and the presentation layer.

    Many languages seem to be moving towards OO features, runtime errors and regular expressions (java=JUnit, java 1.4, PHP = PHP 5)

    It seems to me the secret is to start with a good database design, have a few individuals writing stored procedures and simple php simple libraries of validation/formatting etc business rules. Oh and a symbiotic relationship between developer and company. A bit like the mainframe days ;)

    Paul Birnie - 13th April 2004 18:22 - #

  9. The Best in Adult Toys.

    Sex Toys - 16th May 2004 19:59 - #

  10. Hey Simon, there's a spam comment on this post you should clean up.

    Keith - 22nd November 2004 18:49 - #

  11. Am looking the qualities(merits) of RDBMS over DBMS

    Juvenali - 1st September 2005 12:03 - #

  12. have also spent a lot of time thinking about OR mapping and whether its worth it. I really feel like we have gone full circle. It seems to me the secret is to start with a good database design, >>>>>>> me too :=) Good database design ENTITY > PARTY > ORGANIZATION > SALES_ORGANIZATION For starters in database modelling we all think we got to see everything and do the most normalized design ever so we go to the SALES_ORGANIZATION model.After that we soon realize that concrete model of ORGANIZATION's bring us just so many entityTypes we don't have enough resources to build it. For OO fans they go to the most abstract model where they could define their data objects as ENTITY and everything would became just so adaptable.They even build metadata editors for that purpose.But after they finish their application they realize that they have missed only a single point which is "performance".(yes they are slowwwwwwwwww) To have a balance between both worlds , the good data model(not best) will have PARTY table and have meaningfull attributeTypes which are practical.Such as there could be many to many relationships between countries and people in theory.But the fact is we don't recognize that in practice.We should look out for performance especially in our core objects such as PARTY which describes PERSON and ORGANIZATION. PS:Also it is kind of nonsense using entities for dataobjects just because we wanna reuse our code.With our reuse we give up using many tools that are avaible to us.

    Erdem Sahin - 6th March 2006 17:30 - #

Comments are closed.

Previously hosted at http://simon.incutio.com/archive/2004/01/01/foodForThought

A django site