The first thing to decide is what defines the identity of a persisted entity. There are 3 choices:
1. Object Instance. Each object instance is unique. This is the default Java behavior.
2. Business Key. Fields are chosen to define the identity of the object.
3. Primary Key. The primary key on the related database table is used.
With the object instance definition, every object that gets created is different from all other objects, even when they share the same data and represent the same row in the database.
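Here is a minimal sketch of what that looks like. InstancePerson is a hypothetical entity invented for this example. It does not override equals or hashCode, so it inherits the instance-based behavior of java.lang.Object.

```java
// Hypothetical entity: no equals/hashCode override, so identity is the instance itself.
class InstancePerson {
    private final Long id;
    private final String name;

    InstancePerson(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class InstanceIdentityDemo {
    public static void main(String[] args) {
        // Both objects carry the same data and would map to the same row.
        InstancePerson a = new InstancePerson(1L, "Ann");
        InstancePerson b = new InstancePerson(1L, "Ann");

        System.out.println(a.equals(b)); // false: two different instances
        System.out.println(a.equals(a)); // true: same instance
    }
}
```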
Most (but not all) entities have a natural key: one or more fields that make that entity unique. An employee has a social security number. A magazine has a name and publication date. These fields are usually separate from the primary key, especially when Hibernate is used. Under this definition, 2 entities can share the same identity yet represent 2 different rows in the database.
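As a rough sketch of the business key definition, take the magazine example. The Magazine class, its field names, and the setId method are assumptions made for illustration. The point is that equals and hashCode look only at the name and publication date and ignore the primary key.

```java
import java.time.LocalDate;
import java.util.Objects;

// Hypothetical entity whose identity is its business key (name + publication date).
class Magazine {
    private Long id; // primary key, deliberately ignored by equals/hashCode
    private final String name;
    private final LocalDate publicationDate;

    Magazine(String name, LocalDate publicationDate) {
        this.name = name;
        this.publicationDate = publicationDate;
    }

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Magazine)) return false;
        Magazine other = (Magazine) o;
        return Objects.equals(name, other.name)
                && Objects.equals(publicationDate, other.publicationDate);
    }

    @Override
    public int hashCode() {
        // Must use exactly the fields that equals compares.
        return Objects.hash(name, publicationDate);
    }
}
```

Two Magazine objects with the same name and date are equal here even if they were given different ids, which is exactly the situation described above.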
Using the primary key will ensure consistency between the identity of an object and the identity of the database row backing that object. The exception is a new object that has not yet been assigned a primary key.
At first glance, the definition based on the primary key is the correct one from a semantic point of view. However, there are scenarios in which the other 2 definitions make more sense.
Before choosing which definition to use, the implementation consequences should be taken into account. The definition determines how equals, hashCode, and in some cases compareTo should be implemented. If two objects have the same identity then equals should return true, hashCode should return the same value, and compareTo should return 0. The implementations of equals and hashCode must be consistent with each other. There is more leeway with compareTo.
A thread on the Hibernate forums discusses many different implementations and their consequences: http://forum.hibernate.org/viewtopic.php?t=928172&postdays=0&postorder=asc&start=0&sid=e34c88ebf24b90219be45fbc81752084
Strictly from a semantic point of view, the object instance definition is the least useful, but it is the easiest to implement (since it’s the default in Java) and has the fewest implementation consequences.
The business key definition seems to have lost favor recently. It does offer better semantics than object identity, but is harder to implement (although some IDEs can generate it for you).
Identity based on the primary key is the best from a semantic point of view, but has the most implementation consequences. Consider this example. A Person object is created. The id defaults to a value (usually null or 0). This is the first problem. If another Person object is created, it too will have the default id value. The 2 Person objects will then be considered equal, even if they represent 2 different people and will eventually be stored in 2 different database rows.
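A small sketch of this first problem, assuming a hypothetical PkPerson class whose id stays null until the object is saved:

```java
import java.util.Objects;

// Hypothetical entity whose identity is based only on the primary key.
class PkPerson {
    private Long id; // null until the object is stored
    private final String name;

    PkPerson(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PkPerson)) return false;
        return Objects.equals(id, ((PkPerson) o).id); // null equals null
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 while id is null
    }
}

class UnsavedEqualityDemo {
    public static void main(String[] args) {
        PkPerson ann = new PkPerson("Ann");
        PkPerson bob = new PkPerson("Bob");
        // Two different people, both unsaved, are treated as the same entity.
        System.out.println(ann.equals(bob)); // true
    }
}
```

Adding both objects to a HashSet would keep only one of them, since the set sees the second as a duplicate.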
The second problem is that the object’s identifier is generated and assigned when the object is stored. The Java libraries assume the identity of an object is based on immutable fields, and therefore doesn’t change: if 2 objects are considered equal, they remain equal for the life of both objects. Since the id changes, this isn't true. More importantly, hashCode will return a different value after the id is assigned.
The implications of this are mostly confined to HashSets and HashMaps. When an object is stored in either collection, its hash code is computed and used to assign the object to a bucket. When the id changes, so does the hash code, and with it the bucket the collection will search. If the object is looked up afterwards, it can’t be found, because it still sits in the old bucket.
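The lost-object effect can be reproduced in a few lines. This is only a sketch: MutableIdPerson is a made-up entity, and the call to setId stands in for whatever the persistence layer does when it assigns the generated id.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity: identity is the primary key, which changes on save.
class MutableIdPerson {
    private Long id; // null until "saved"

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MutableIdPerson)) return false;
        return Objects.equals(id, ((MutableIdPerson) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

class LostInSetDemo {
    // Returns whether the set can still find the person after the id is assigned.
    static boolean foundAfterIdAssigned() {
        Set<MutableIdPerson> people = new HashSet<>();
        MutableIdPerson p = new MutableIdPerson();
        people.add(p);  // stored under the hash code of the null id
        p.setId(42L);   // simulates the id being assigned on save
        return people.contains(p); // lookup now uses the new hash code
    }

    public static void main(String[] args) {
        System.out.println(foundAfterIdAssigned()); // false
    }
}
```

The object is still physically in the set (its size is still 1), but contains and remove can no longer reach it.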
The link above gave 3 solutions:
1. Switch to another identity definition.
2. Use a different identity definition if equals or hashCode are called before the object is persisted. The object must remember which definition it used so that its identity stays consistent for its lifetime. A variation of the same definition can also be used.
3. Generate the id when the object is created. If there is a version field, Hibernate can use it to distinguish between insert and update operations. If there isn’t a version field, the application must ensure the correct operation is used.
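The third solution might look something like the sketch below. Using java.util.UUID for the id is my assumption, since the thread does not prescribe a generator. UuidPerson, markSaved, and isNew are hypothetical names, and the version field is a plain Integer standing in for a Hibernate-managed version column.

```java
import java.util.UUID;

// Hypothetical entity whose id is assigned at construction and never changes.
class UuidPerson {
    private final UUID id = UUID.randomUUID();
    private Integer version; // null until first save; stand-in for a version column

    UUID getId() {
        return id;
    }

    // Simulates what the persistence layer would do after an insert.
    void markSaved() {
        version = (version == null) ? 0 : version + 1;
    }

    // With the id always present, a null version is what marks the object as new.
    boolean isNew() {
        return version == null;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UuidPerson)) return false;
        return id.equals(((UuidPerson) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode(); // stable before and after saving
    }
}
```

Because the id never changes, the hash code is the same before and after the save, so the object stays findable in a HashSet or HashMap.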
I’ll add two more options. First, don’t use HashSets. A List can be used instead in most cases. The application can ensure there are no duplicates in the list.
Second, have the HashSets and HashMaps rehash their contents when the entities they contain change, by applying the observer pattern. When an object changes its identifier, it notifies the collection that contains it, and the collection then rehashes that object.
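Here is one way the observer idea could be sketched. Everything in it (IdChangeListener, ObservedPerson, RehashingSet) is invented for illustration and is not part of Java or Hibernate. The listener gets called both before and after the change, because the object has to be removed while its old hash code is still in effect.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical callback interface for collections that hold the entity.
interface IdChangeListener {
    void beforeIdChange(ObservedPerson p);
    void afterIdChange(ObservedPerson p);
}

// Hypothetical entity that announces changes to its identifier.
class ObservedPerson {
    private Long id;
    private final List<IdChangeListener> listeners = new ArrayList<>();

    void addListener(IdChangeListener l) {
        listeners.add(l);
    }

    void setId(Long newId) {
        for (IdChangeListener l : listeners) l.beforeIdChange(this);
        this.id = newId;
        for (IdChangeListener l : listeners) l.afterIdChange(this);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ObservedPerson)) return false;
        return Objects.equals(id, ((ObservedPerson) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

// Wrapper around a HashSet that re-inserts an entity when its id changes.
class RehashingSet implements IdChangeListener {
    private final Set<ObservedPerson> delegate = new HashSet<>();
    private ObservedPerson inFlight; // entity removed and waiting to be re-added

    void add(ObservedPerson p) {
        if (delegate.add(p)) p.addListener(this);
    }

    boolean contains(ObservedPerson p) {
        return delegate.contains(p);
    }

    @Override
    public void beforeIdChange(ObservedPerson p) {
        // Remove under the old hash code; remember it only if it was present.
        inFlight = delegate.remove(p) ? p : null;
    }

    @Override
    public void afterIdChange(ObservedPerson p) {
        if (inFlight == p) {
            delegate.add(p); // re-insert under the new hash code
            inFlight = null;
        }
    }
}
```

This keeps lookups working after a save, at the cost of every entity carrying a list of the collections that contain it. It also does nothing about two unsaved entities looking equal.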
Conclusion
There is no single right answer. Which definition and implementation you use depends on the context. The best approach is to use the Java default implementation (the object instance definition) as the default. If another definition is needed for a specific entity, then equals and hashCode can be overridden for it. The business key definition is the next best solution, since it is easier to implement than the primary key definition.
What do you think? Are there other definitions? Should everyone always use the primary key definition? Don't forget to rate the post +1 Martin Fowler (brilliant) or +1 Paris Hilton (stupid).
