Mapping to datasource

In PoEAA (Patterns of Enterprise Application Architecture), this is mainly concerned with relational databases, but it can apply to other data sources such as HTTP request/response, messaging, etc.
Deals with the data source layer
Keeps database logic separate from application logic instead of mixing the two

Good design

Separate database code from other code (business and presentation)
- provide database classes to access the db
- all sql in same place
- factories for each db
- connection pools
- Error handling
  - sql exceptions isolated to this layer
  - wrap as runtime exception
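A minimal sketch of the error-handling points above, assuming Python's built-in sqlite3 driver. `DataAccessError`, `PersonGateway` and the `person` table are hypothetical names used only for illustration.

```python
import sqlite3
from typing import Optional


class DataAccessError(RuntimeError):
    """Hypothetical unchecked exception exposed by the data source layer."""


class PersonGateway:
    """All SQL for the (hypothetical) person table lives in this class."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def find_name(self, person_id: int) -> Optional[str]:
        try:
            row = self._connection.execute(
                "SELECT name FROM person WHERE id = ?", (person_id,)
            ).fetchone()
        except sqlite3.Error as exc:
            # The driver-specific error is wrapped here and never leaves this layer.
            raise DataAccessError(f"could not load person {person_id}") from exc
        return row[0] if row else None
```

Callers in the business or presentation code only ever see `DataAccessError`, so they need no import of the database driver.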

Architectural patterns

These patterns drive the way in which the domain logic talks to the database
The choice is strongly affected by how you design your domain logic.
The choice you make here is far-reaching for your design and thus difficult to refactor
Issues with database usage
- Many developers don't know how to write good SQL, so queries should be tuned by experts (DBAs)

Active Record

Simple, for uncomplicated structures
Each entity represents a table row and knows how to connect to the DB, persist itself to the DB and find other sibling objects from the DB, using static methods.
- The entity has simple business logic
Some domain logic might be in here or the calling code (ie transaction scripts)
data structure (Fields) should match exactly the database table
Good for
- CRUD ops
- Derivations and validations based on a single record
- Work for single table
Issues
- It has an enormous number of responsibilities, breaking the Single Responsibility Principle (SRP)
- As complexity grows and we refactor the domain objects into smaller objects, we lose the one-to-one mapping of objects to table rows and the pattern breaks.
- Logic and SQL are mixed, so it is hard to test
- Couples the object design to the database design, which makes it more difficult to refactor either design
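A minimal Active Record sketch, assuming an in-memory SQLite database. The `person` table, the module-level `_db` connection and the `exemption` rule are all hypothetical; the point is only that the entity carries its fields, its own persistence code and a little single-record logic.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Assumption: one shared in-memory database; a real application would use a pool.
_db = sqlite3.connect(":memory:")
_db.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, dependents INTEGER)"
)


@dataclass
class Person:
    # Fields match the table columns one to one.
    name: str
    dependents: int
    id: Optional[int] = None

    @staticmethod
    def find(person_id: int) -> Optional["Person"]:
        row = _db.execute(
            "SELECT name, dependents, id FROM person WHERE id = ?", (person_id,)
        ).fetchone()
        return Person(*row) if row else None

    def save(self) -> None:
        if self.id is None:
            cursor = _db.execute(
                "INSERT INTO person (name, dependents) VALUES (?, ?)",
                (self.name, self.dependents),
            )
            self.id = cursor.lastrowid
        else:
            _db.execute(
                "UPDATE person SET name = ?, dependents = ? WHERE id = ?",
                (self.name, self.dependents, self.id),
            )

    def exemption(self) -> int:
        # Hypothetical business rule derived from a single record.
        return 1500 + 750 * self.dependents
```

Note how SQL and business logic sit in the same class, which is exactly the coupling listed under the issues above.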

Gateway

The gateway represents a table in the DB and takes care of connecting, persisting and finding objects in the DB.
Gateways should be as simple as possible: generic, mechanical, reusable, maybe even generated. They should know nothing about business logic.
make a direct mapping from table column to object property
Usually stateless
Types
- Table Data Gateway
  - one instance handles all rows in table.
  - usually one gateway per table,
    but you can have a single Table Data Gateway that handles all methods for all tables
  - works well with Table Module.
  - Good place to have stored procedures
  - Can return appropriate domain object if accessed via domain layer
    issue - you then have bidirectional dependencies between the domain objects and the gateway
    Fix - use a data mapper
  - issues
    How it returns one or many rows from find queries
    one option is to return a data transfer object
- Row Data Gateway
  - handles single row in table. For each row a new instance is created.
    acts as an object that exactly mimics a single record
  - contains only data access logic, it does NOT contain any Domain Logic.
  - works well with Transaction Script.
    not worth using with domain model, instead use active record
  - With Metadata Mapping, It can be used for creating automatically built database access code.
  - Use a separate finder class for queries, which returns instances of this gateway
Issues
- If changes are made to either the model or the DB, they need to be reflected in its counterpart.
- Fix - avoid changes in lots of places by making the row gateway match the DB structure exactly
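The two gateway types can be sketched side by side. This is an illustration under assumptions: a hypothetical `person` table with `id` and `name` columns, accessed through sqlite3, with the class names invented for the example.

```python
import sqlite3
from typing import List, Optional, Tuple


class PersonTableGateway:
    """Table Data Gateway: one stateless instance handles every row."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find(self, person_id: int) -> Optional[Tuple[int, str]]:
        return self._conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (person_id,)
        ).fetchone()

    def find_all(self) -> List[Tuple[int, str]]:
        return self._conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()

    def insert(self, name: str) -> int:
        return self._conn.execute(
            "INSERT INTO person (name) VALUES (?)", (name,)
        ).lastrowid


class PersonRowGateway:
    """Row Data Gateway: one instance per row, fields mirror the columns."""

    def __init__(self, conn: sqlite3.Connection, person_id: int, name: str) -> None:
        self._conn = conn
        self.id = person_id
        self.name = name

    def update(self) -> None:
        self._conn.execute(
            "UPDATE person SET name = ? WHERE id = ?", (self.name, self.id)
        )


class PersonFinder:
    """Separate query class that returns row gateways."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find(self, person_id: int) -> Optional[PersonRowGateway]:
        row = self._conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (person_id,)
        ).fetchone()
        return PersonRowGateway(self._conn, *row) if row else None
```

Neither class contains business logic; both only move data between columns and plain values.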

Data Mapper

The data mapper makes the translation between a set of data and one or several objects and from an object into one or several tables.
It can, and should, use some kind of entity management system to cache the objects that were already extracted from the DB.
the concrete task of connecting, finding and persisting a set of data in the DB should be left to a data gateway.
- implement a data mapper as a decorator around a data gateway.
Isolates the domain and data source layer, by having indirection layer, which maps the domain objects to the database tables
- Allows both layers to vary independently
Useful for more complex domain logic
- When patterns and OO design do not fit well with the structure of the database tables, this lets you separate the domain and data layers so the domain can take advantage of patterns etc.
- This means data needs to be transferred between the domain and data layers, and transformed to meet each layer's needs
The whole layer of Data Mappers can be substituted, either for testing purposes or to allow a single domain layer to work with different databases
Mappers need a variety of strategies to handle classes that turn into multiple fields, classes that have multiple tables, classes with inheritance, and the joys of connecting together objects once they've been sorted out.
For inserts and updates, the database mapping layer needs to understand what objects have changed, which new ones have been created, and which ones have been destroyed. It also has to fit the whole workload into a transactional framework.
- This is the job of the Unit of Work
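A minimal Data Mapper sketch, assuming sqlite3 and a hypothetical `customers` table whose columns (`full_name`, `credit_limit_cents`) deliberately differ from the domain object's attributes. `Customer` and `CustomerMapper` are illustrative names, not taken from any library.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Plain domain object: no SQL and no knowledge of the mapper."""
    name: str
    credit_limit: int  # whole currency units
    id: Optional[int] = None

    def can_order(self, amount: int) -> bool:
        return amount <= self.credit_limit


class CustomerMapper:
    """Translates between Customer objects and rows of the customers table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT full_name, credit_limit_cents, id FROM customers WHERE id = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            return None
        full_name, cents, row_id = row
        # Transform the stored representation into the domain representation.
        return Customer(name=full_name, credit_limit=cents // 100, id=row_id)

    def insert(self, customer: Customer) -> None:
        customer.id = self._conn.execute(
            "INSERT INTO customers (full_name, credit_limit_cents) VALUES (?, ?)",
            (customer.name, customer.credit_limit * 100),
        ).lastrowid

    def update(self, customer: Customer) -> None:
        self._conn.execute(
            "UPDATE customers SET full_name = ?, credit_limit_cents = ? WHERE id = ?",
            (customer.name, customer.credit_limit * 100, customer.id),
        )
```

Because `Customer` never imports the mapper, the mapper layer can be swapped for a test double or for a mapper targeting a different database.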

Object-Relational Behavioural Patterns

Repeatedly persisting and loading the same object to/from the DB can have a severe impact on performance
We need to prevent concurrently changing and persisting different instances that represent the same data
We need to keep track of the objects that are loaded, in order to:
- Not read an object from the DB more than once
- Not write an object to the DB more than once
- Always use the same object version for all editing operations (prevents concurrency conflicts)
- Limit the entity nested graph loaded

Unit of work

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.
Keeps all this information in one place
- it is encapsulated, so you don't have to worry about it
A platform for more complicated situations, such as handling business transactions that span several system transactions using locking
Avoids making too many calls to the DB
- Make changes to objects (in memory) and submit all changed objects in one or a few calls
Let it know about objects you've read so that it can check for inconsistent reads by verifying that none of the objects changed on the database during the business transaction.
Implementation
- When it comes time to commit, the Unit of Work decides what to do. It opens a transaction, does any concurrency checking and writes changes out to the database
  - Good
    Application programmers never explicitly call methods for database updates. This way they don't have to keep track of what's changed or worry about how referential integrity affects the order in which they need to do things
- One option is to create a snapshot of each entity when loading it and compare each managed entity to its snapshot when flushing the managed data into the DB, so that we only persist the modified objects.
- Another is state tracking: an object is registered as clean when loaded, its state changes to dirty when modified, and only dirty objects are written to the DB
- Good
  - allows for transparency (implicit persistence of changed entities)
  - decoupling of the entity from the UoW
- Bad
  - adds some computation overhead
- How to keep track of objects that have changed
  - the caller doing it (caller registration)
    Any objects that aren't registered won't be written out on commit
    Can lead to forgetfulness, but gives flexibility to make in-memory changes that don't need to be committed
    Better to make such changes on a copy
  - by getting the object to tell the Unit of Work (object registration)
    place registration methods in object methods. Loading an object from the database registers the object as clean
    the Unit of Work needs either to be passed to the object or to be in a well-known place
    the developer of the object has to remember to add a registration call in the right places.
    Good place for aspect-oriented programming (AOP)
  - Unit of work controller
    Unit of Work handles all reads from the database and registers clean objects whenever they're read. Rather than marking objects as dirty the Unit of Work takes a copy at read time and then compares the object at commit time
    Good
    it allows a selective update of only those fields that were actually changed
    avoids registration calls in the domain objects
For testing
- providing a transient constructor that doesn't register with the Unit of Work
- providing a Special Case Unit of Work that does nothing with a commit
When
- update order when a database uses referential integrity.
  - Most of the time you can avoid this issue by ensuring that the database only checks referential integrity when the transaction commits rather than with each SQL call
  - In smaller systems this can be done with explicit code that contains details about which tables to write first based on the foreign key dependencies.
  - In a larger application it's better to use metadata to figure out which order to write to the database.
- minimize deadlocks
  - If every transaction uses the same sequence of tables to edit, you greatly reduce the risk of deadlocks.
  - to hold a fixed sequence of table writes so that you always touch the tables in the same order.
- handling batch updates
  - a batch update is to send multiple SQL commands as a single unit so that they can be processed in a single remote call.
    Useful when DB changes are sent in rapid succession
  - Example impl
    JDBC -> to batch individual statements
    manual -> building up a string that has multiple SQL statements and then submitting as one statement
Alternative
- to explicitly save any object whenever you alter it.
  - you may get many more database calls than you want, since every alteration triggers its own write
- You can leave all your updates to the end by keeping track of all the objects that have changed.
  - you can use variables in your code for this, but they soon become unmanageable once you have more than a few
- Give each object a dirty flag that you set when the object changes. Then you need to find all the dirty objects at the end of your transaction and write them out
  - Depends on how easy it is to find the dirty objects
  - A general object network such as a domain model is harder to traverse than a single hierarchy
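A minimal sketch of a Unit of Work using caller registration, assuming sqlite3 and a hypothetical `account` table. `Account`, `UnitOfWork` and the `register_*` method names are illustrative; a fuller version would delegate the SQL to mappers and order the writes by foreign key dependencies.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(eq=False)  # compare by identity, not by field values
class Account:
    owner: str
    balance: int
    id: Optional[int] = None


class UnitOfWork:
    """Collects new, dirty and removed objects and writes them in one transaction."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._new: Dict[int, Account] = {}
        self._dirty: Dict[int, Account] = {}
        self._removed: Dict[int, Account] = {}

    def register_new(self, obj: Account) -> None:
        self._new[id(obj)] = obj

    def register_dirty(self, obj: Account) -> None:
        # A new object is inserted with its latest state anyway.
        if id(obj) not in self._new:
            self._dirty[id(obj)] = obj

    def register_removed(self, obj: Account) -> None:
        if self._new.pop(id(obj), None) is not None:
            return  # it never reached the database, nothing to delete
        self._dirty.pop(id(obj), None)
        self._removed[id(obj)] = obj

    def commit(self) -> None:
        # "with conn" commits on success and rolls back if any statement fails.
        with self._conn:
            for a in self._new.values():
                a.id = self._conn.execute(
                    "INSERT INTO account (owner, balance) VALUES (?, ?)",
                    (a.owner, a.balance),
                ).lastrowid
            for a in self._dirty.values():
                self._conn.execute(
                    "UPDATE account SET owner = ?, balance = ? WHERE id = ?",
                    (a.owner, a.balance, a.id),
                )
            for a in self._removed.values():
                self._conn.execute("DELETE FROM account WHERE id = ?", (a.id,))
        self._new.clear()
        self._dirty.clear()
        self._removed.clear()
```

Nothing touches the database until `commit()`, so several in-memory changes to the same object cost a single write.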

Identity Map

Ensures that each object gets loaded only once by keeping every loaded object in a map. Looks up objects using the map when referring to them
- During the session (request/response or message processing)
The identity map is the structure used by the unit of work to keep track of the loaded entities.
Load once from the database, then read and update many times in memory
It acts as a caching mechanism, but its primary goal is to maintain unique entities in memory.
Improves consistency
- e.g. avoids updating the same row twice inconsistently
Not for performance
Good
- Only load once from database
- acts as a cache for database reads
- Avoids update conflicts within a single session
Bad
- Can store a lot of data
- If there are changes in the database, the map will hold out-of-date data.
  - Work only with the in-memory data
  - Reload into the map when the transaction is done, or after some time if the in-memory data has not been changed
- If the data in the map is lost (e.g. the app crashes) before it is written out, the database is never updated and holds stale data
  - Can use a persistent cache
When to use
- For immutable objects, you may not need it
  - no need to worry about modification issues
  - but still useful as a cache, and it helps to prevent the use of the wrong form of equality test
- Not needed for dependent mappings
  - as persistence is controlled by the parent
- Not helpful for updates that cross sessions
Implementation
- have a series of maps containing objects that have been pulled from the database
- With an isomorphic schema, you'll have one map per database table.
- When you load an object from the database, you first check the map. If there's an object in it that corresponds to the one you're loading, you return it.
  - If not, you go to the database, putting the objects into the map for future reference as you load them
- Keys of map
  - primary key of the corresponding database table
    This works well if the key is a single column and immutable.
  - A surrogate primary key fits in very well with this approach because you can use it as the key in the map.
  - should be simple type, so comparisons are easy
  - encapsulating different kinds of database key behind a single key object.
- Explicit or Generic
  - An explicit Identity Map is accessed with distinct methods for each kind of object you need,
    e.g. findPerson(1)
  - Good
    Compile-time checking
    explicit interface
    it's easier to see what maps are available and what they're called.
    Specific methods for each table/map, tailor-made for the application
  - Bad
    Need to add a method and a map every time a new table is added
  - A generic map uses a single method for all kinds of objects, with perhaps a parameter to indicate which kind of object you need,
    e.g. find("Person", 1).
    Good
    Reusable; no need to create a new one every time a new table or map is added
    Bad
    Too abstract; specific methods might better meet your needs
    Checked only at runtime (e.g. via reflection)
    only use a generic map if all your objects have the same type of key.
- Amount
  - either one map per class or one map for the whole session
  - A single map for the session works only if you have database-unique keys
    Once you have one Identity Map, the benefit is that you have only one place to go and no awkward decisions about inheritance.
    you don't have to add new ones when you add database tables.
  - one map per class/table
    works well if your database schema and object model are the same
    if the schemas differ, it is better to base the maps on the objects
    issue - multiple maps need transactional protection
  - Issues with inheritance
    Keeping them separate can make polymorphic references much more awkward, since any lookup needs to know to look in all maps
    use a single map for each inheritance tree, but that means that you should also make your keys unique across inheritance trees
- Location
  - Easy to find
  - Tied to the process context you are working in
  - ensure that each session gets its own instance that's isolated from any other session's instance
    put the Identity Map on a session-specific object
    Place in unit of work
    Place in a registry that's tied to the session
  - Use of multiple maps means need of transactional protection
    Can use object database as a transactional cache
  - Read only data
    No need to worry about sharing across sessions
    In performance-intensive systems it can be very beneficial to load in all read-only data once and have it available to the whole process
    keep read-only Identity Maps in a process context and your updatable Identity Maps in a session context
    rarely updated data
    Can be treated as read-only
    When it does change, flush the process-wide Identity Map, potentially bouncing the server
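A minimal sketch of a generic, session-scoped Identity Map and a finder that checks it before going to the database. It assumes sqlite3 and a hypothetical `person` table; `IdentityMap`, `PersonFinder` and the `db_reads` counter are illustrative names.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Person:
    id: int
    name: str


class IdentityMap:
    """Generic map for one session, keyed by (class, primary key)."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[type, Any], Any] = {}

    def get(self, cls: type, key: Any) -> Optional[Any]:
        return self._objects.get((cls, key))

    def add(self, key: Any, obj: Any) -> None:
        self._objects[(type(obj), key)] = obj


class PersonFinder:
    def __init__(self, conn: sqlite3.Connection, identity_map: IdentityMap) -> None:
        self._conn = conn
        self._map = identity_map
        self.db_reads = 0  # counter kept only to make the behaviour visible

    def find(self, person_id: int) -> Optional[Person]:
        # First check the map; only go to the database on a miss.
        cached = self._map.get(Person, person_id)
        if cached is not None:
            return cached
        self.db_reads += 1
        row = self._conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (person_id,)
        ).fetchone()
        if row is None:
            return None
        person = Person(*row)
        self._map.add(person_id, person)
        return person
```

Two lookups of the same key return the very same object, so an in-memory edit made through one reference is visible through the other.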

Lazy Loading

An object that doesn't contain all the data you need but knows how to get it.
- Deals with rich data models and loading object hierarchies
When we have many nested entities, loading one entity from the DB may end up loading a huge set of data that we may not even need.
Using lazy loading, we store a reference to the nested entity instead of the entity itself.
- When we first access the nested entity we load the entity from the DB.
Good
- Save time and resources.
- Avoids loading lots of data at the start, but only when needed
Bad
- Adds complexity, so it needs a good case to justify its use
- “Ripple loading”, aka the “N+1 problem”, happens when we have a list of entities with nested lazy-loaded entities. When we loop through the list, we load the nested entities one by one instead of all in one go.
  - Fix - use eager loading (pre-load the nested entities) when loading lists of entities.
- Lazy Load can easily cause more database accesses than you need.
  - Fix - never have a collection of Lazy Loads; rather, make the collection itself a Lazy Load and, when you load it, load all the contents.
  - a limitation of this tactic is when the collection is very large
How
- lazy init
  - simple
  - every access to the field checks first to see if it's null. If so, it calculates the value of the field before returning the field
  - have to ensure that the field is self-encapsulated, meaning that all access to the field, even from within the class, is done through a getting method.
  - Using a null to signal a field that hasn't been loaded yet
    unless null is a valid value, need a special case to use as check to start lazy loading
  - Bad
    Dependency between object and database, works well with active record and table/row data gateway
- Virtual Proxy
  - Used with data mapper
  - layer of indirection
  - object that looks like the object that should be in the field, but doesn't actually contain anything. Only when one of its methods is called does it load the correct object from the database.
  - Good - it is substitutable for the object that should be there
  - Bad - it is not the same object, can have identity issues (ie for comparisons)
    need to override equals
  - Bad - lots of proxies might be needed
  - can have more than one virtual proxy for the same real object. All of these proxies will have different object identities, yet they represent the same conceptual object.
  - https://www.educative.io/answers/what-is-the-virtual-proxy-design-pattern
- value holder
  - object that wraps some other object. To get the underlying object you ask the value holder for its value, but only on the first access does it pull the data from the database.
  - Bad - the class needs to know that it's present, and you lose the explicitness of strong typing
    fix - ensuring that the value holder is never passed out beyond its owning class.
- ghost
  - is the real object in a partial state
  - When you load the object from the database it contains just its ID. Whenever you try to access a field it loads its full state
  - can be thought of as an object where every field is lazy-initialized in one fell swoop, or as a virtual proxy where the object is its own virtual proxy
  - instead of loading all data in one go, can load it in groups
  - you can put it immediately in its Identity Map
    this avoids all problems due to cyclic references when reading in data
  - Inheritance often poses a problem with Lazy Load.
    you'll need to know what type of ghost to create, which you often can't tell without loading the thing properly
- If data is quick to get hold of or commonly used, can fetch that first
- Lazy Load is a good candidate for aspect-oriented programming. You can put Lazy Load behavior into a separate aspect, which allows you to change the lazy load strategy separately as well as freeing the domain developers from having to deal with lazy loading issues
- have separate database interaction objects for the different use cases.
  - ie one with eager and one with lazy loads, and let code decide which to use (ie via strategy)
- Have a range of degrees of laziness
  - you really need only two: a complete load and enough of a load for identification purposes in a list. Adding more usually adds more complexity than is worthwhile.
When to use
- deciding how much you want to pull back from the database as you load an object, and how many database calls that will require
- pointless to use Lazy Load on a field that's stored in the same row as the rest of the object, because most of the time it doesn't cost any more to bring back extra data in a call, even if the data field is quite large
- only worth considering Lazy Load if the field requires an extra database call to access.
- deciding when you want to take the hit of bringing back the data
- good idea to bring everything you'll need in one call so you have it in place, particularly if it corresponds to a single interaction with a UI.
- The best time to use Lazy Load is when it involves an extra call and the data you're calling isn't used when the main object is used.
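A minimal sketch of the value holder variant, which also shows the special "not loaded" marker needed when null is a valid value. `ValueHolder`, `Supplier` and the `load_products` callable are hypothetical; in a real mapper the loader would run the extra database query.

```python
from typing import Any, Callable, List

# Sentinel distinct from None, so a loader may legitimately return None.
_NOT_LOADED = object()


class ValueHolder:
    """Wraps a value that is fetched only on first access."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._value: Any = _NOT_LOADED

    @property
    def value(self) -> Any:
        if self._value is _NOT_LOADED:
            self._value = self._loader()
        return self._value


class Supplier:
    def __init__(self, supplier_id: int, load_products: Callable[[], List[str]]) -> None:
        self.id = supplier_id
        # The holder stays private, so callers never see it.
        self._products = ValueHolder(load_products)

    @property
    def products(self) -> List[str]:
        # All access goes through this getter (self-encapsulated field).
        return self._products.value
```

Keeping the holder behind the `products` property means client code still works with a plain list, which addresses the typing concern noted for value holders.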

Reading Data

It's often better to pull out more rows and filter out the ones not needed than to issue one query for each row.
- never do repeated queries on the same table to get multiple rows
- although you have to be wary of locking too many rows with pessimistic concurrency control
To avoid multiple trips to the DB
- use joins so that you can pull multiple tables back with a single query
- but note that databases are optimized to handle up to three or four joins per query.
Other things that help read performance
- clustering commonly referenced data together
- careful use of indexes
- the database's ability to cache in memory.
You should profile your application with your specific database and data.
When reading in data I like to think of the methods as finders that wrap SQL select statements with a method-structured interface
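A minimal sketch of finder methods that wrap SELECT statements, contrasting one query per order with a single query for many orders. It assumes sqlite3 and a hypothetical `line_item` table; `LineItemFinder` and its `queries` counter are illustrative.

```python
import sqlite3
from typing import Dict, Iterable, List


class LineItemFinder:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self.queries = 0  # counter kept only to compare the two approaches

    def find_for_order(self, order_id: int) -> List[str]:
        self.queries += 1
        rows = self._conn.execute(
            "SELECT product FROM line_item WHERE order_id = ? ORDER BY id",
            (order_id,),
        ).fetchall()
        return [product for (product,) in rows]

    def find_for_orders(self, order_ids: Iterable[int]) -> Dict[int, List[str]]:
        ids = list(order_ids)
        result: Dict[int, List[str]] = {order_id: [] for order_id in ids}
        if not ids:
            return result
        # One placeholder per id keeps the query parameterised.
        placeholders = ", ".join("?" for _ in ids)
        self.queries += 1
        rows = self._conn.execute(
            f"SELECT order_id, product FROM line_item "
            f"WHERE order_id IN ({placeholders}) ORDER BY id",
            ids,
        ).fetchall()
        for order_id, product in rows:
            result[order_id].append(product)
        return result
```

Calling `find_for_order` in a loop costs one round trip per order, while `find_for_orders` fetches the same rows in one round trip and groups them in memory.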

Object-Relational Structural Patterns

used when mapping between in-memory objects and db tables
- Solves the difference in representation issue
  - Objects handle links by storing references that are held by the runtime of either memory-managed environments or memory addresses.
  - Relational databases handle links by forming a key into another table.
- objects can easily use collections to handle multiple references from a single field, while normalization forces all relation links to be single valued.
  - leads to reversals of the data structure between objects and tables.
  - An order object naturally has a collection of line item objects that don't need any reference back to the order.
  - the table structure is the other way around—the line item must include a foreign key reference to the order since the order can't have a multivalued field.
Mostly used for Data Mapper
- some for row gateway & active record
- not used with table gateway
Mapping 1-1, 1-N and N-N relationships is simple to do with an ORM
If we need to query by the members of a value object (VO), we can explode the VO members into columns of the entity to which it belongs.
If we don't need to query by the members of the value object, we can store it as a string in one column of the entity. This string can be in any object serialization format.

Identity Field

Saves a database ID field in an object to maintain identity between object in memory and database row.
Store the primary key in an object
One to One Relationship.

Foreign Key Mapping

Maps an association between objects to a foreign key reference between tables
Let's say you want to save a record object which is linked to an album object.
Changes to an album object affect its records.
- Imagine I am deleting an album: all records linked to that album need to be deleted as well.
One to Many Relationship.
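A minimal sketch of the album and record example, assuming SQLite, which enforces foreign keys only when `PRAGMA foreign_keys` is switched on. The schema, `open_database` and `insert_album` are hypothetical; the cascade rule is one way to get the "delete the album, delete its records" behaviour described above.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional

SCHEMA = """
CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE record (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    album_id INTEGER NOT NULL REFERENCES album(id) ON DELETE CASCADE
);
"""


@dataclass
class Record:
    title: str
    id: Optional[int] = None


@dataclass
class Album:
    title: str
    records: List[Record] = field(default_factory=list)
    id: Optional[int] = None


def open_database() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def insert_album(conn: sqlite3.Connection, album: Album) -> None:
    album.id = conn.execute(
        "INSERT INTO album (title) VALUES (?)", (album.title,)
    ).lastrowid
    for record in album.records:
        # The in-memory collection on the album becomes a foreign key
        # column on each record row, pointing back at the album.
        record.id = conn.execute(
            "INSERT INTO record (title, album_id) VALUES (?, ?)",
            (record.title, album.id),
        ).lastrowid
```

The object holds a collection of records, while the table holds the link on the record side, which is the reversal of structure mentioned in the structural patterns overview.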

Association Table Mapping

Saves an association as a table with foreign keys to the tables that are linked by the association.
Many to Many Relationship.

Dependent Mapping

Has one class perform the database mapping for a child class.
Some of the objects naturally appear in the context of another object and they are not referenced by any other table.
- So you can save those objects with their parent's mapper.

Embedded Value

Many small objects make sense in an OO system but may not make sense as a table in a database.
Embedded Value maps an object into several fields of another object's table.
Example: employee salary
You would not create a separate table which contains employees' salary amounts and their currency types.
Instead you would add them as two extra columns of the employee table.
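The salary example can be sketched as follows, assuming sqlite3 and a hypothetical `employee` table with `salary_amount` and `salary_currency` columns. `Money`, `Employee` and `EmployeeMapper` are illustrative names; the amount is stored as text here so that the `Decimal` value round-trips exactly.

```python
import sqlite3
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Money:
    """Small value object with no table of its own."""
    amount: Decimal
    currency: str


@dataclass
class Employee:
    name: str
    salary: Money
    id: Optional[int] = None


class EmployeeMapper:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def insert(self, employee: Employee) -> None:
        # The Money object is flattened into two columns of the owner's row.
        employee.id = self._conn.execute(
            "INSERT INTO employee (name, salary_amount, salary_currency) VALUES (?, ?, ?)",
            (employee.name, str(employee.salary.amount), employee.salary.currency),
        ).lastrowid

    def find(self, employee_id: int) -> Optional[Employee]:
        row = self._conn.execute(
            "SELECT name, salary_amount, salary_currency, id FROM employee WHERE id = ?",
            (employee_id,),
        ).fetchone()
        if row is None:
            return None
        name, amount, currency, row_id = row
        return Employee(name, Money(Decimal(amount), currency), row_id)
```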

Serialized LOB

Saves a graph of objects by serializing them into a single large object (LOB), which it stores in a database field (either a BLOB or a CLOB).
Serialization is also used for network communication, or to store an object with its state for use at some time in the future.
These days, an XML/JSON format is usually used for that.
Good
- save a lot on round trips to db
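A minimal sketch of a Serialized LOB using JSON in a text column, assuming sqlite3 and a hypothetical `customer` table with a `departments` column. The function names and the shape of the department graph are invented for the example.

```python
import json
import sqlite3
from typing import Any, Dict


def save_departments(conn: sqlite3.Connection, customer_id: int,
                     departments: Dict[str, Any]) -> None:
    # The whole object graph is written as one character LOB.
    conn.execute(
        "UPDATE customer SET departments = ? WHERE id = ?",
        (json.dumps(departments), customer_id),
    )


def load_departments(conn: sqlite3.Connection, customer_id: int) -> Dict[str, Any]:
    (lob,) = conn.execute(
        "SELECT departments FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return json.loads(lob)
```

The graph comes back in a single read, but the database cannot easily query inside the serialized column, which is the trade-off against exploding members into columns.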

Single Table Inheritance

Relational databases do not support inheritance, so when mapping your objects you need to find a way of saving them.
Single Table Inheritance uses a single table which contains all fields for all subtypes; each subtype only uses the columns that are relevant to it.
First choice; use the other two when issues with irrelevant and wasted columns affect performance
Good
- Minimises the joins, which can add up when processing an inheritance structure in multiple tables
- easy to modify
Issue
- Some columns are only used by a subset of the class hierarchy
  - leads to a non-normalised table
  - wastes space
    although many databases do a very good job of compressing wasted table space.
  - the single table may end up being too large, making it a bottleneck for accesses.
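A minimal sketch of Single Table Inheritance, assuming sqlite3 and a hypothetical `player` table with a `type` discriminator column. The player classes and the two mapping functions are illustrative only.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

SCHEMA = """
CREATE TABLE player (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL,      -- discriminator: which subclass the row holds
    name TEXT NOT NULL,
    ranking INTEGER,         -- used by tennis players only
    position TEXT            -- used by basketball players only
);
"""


@dataclass
class Player:
    name: str


@dataclass
class TennisPlayer(Player):
    ranking: int


@dataclass
class BasketballPlayer(Player):
    position: str


def insert_player(conn: sqlite3.Connection, player: Player) -> int:
    # Columns that do not apply to this subclass are left NULL.
    return conn.execute(
        "INSERT INTO player (type, name, ranking, position) VALUES (?, ?, ?, ?)",
        (
            type(player).__name__,
            player.name,
            getattr(player, "ranking", None),
            getattr(player, "position", None),
        ),
    ).lastrowid


def find_player(conn: sqlite3.Connection, player_id: int) -> Optional[Player]:
    row = conn.execute(
        "SELECT type, name, ranking, position FROM player WHERE id = ?",
        (player_id,),
    ).fetchone()
    if row is None:
        return None
    kind, name, ranking, position = row
    if kind == "TennisPlayer":
        return TennisPlayer(name, ranking)
    if kind == "BasketballPlayer":
        return BasketballPlayer(name, position)
    raise ValueError(f"unknown player type: {kind}")
```

Loading any player is a single-table read with no joins, at the cost of NULL columns in every row.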

Class Table Inheritance

Represents an inheritance hierarchy of classes with one table for each class.
Maps each class to a table; each table holds only the columns for the fields declared by its class, and inherited fields live in the parent class's table.
Example
- Tennis player has a table for itself, basketball player has a table for itself and there is a table called player which contains the common fields for both player types. So a tennis player uses the player table and the tennis player table to store its data.
Good
- hierarchy changes have little impact on the schema
Issues
- You need to manage two tables to load one object's data.
- Needs joins (or multiple queries), which hurts performance
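A minimal sketch of the tennis player example under Class Table Inheritance, assuming sqlite3. The schema and the two functions are hypothetical; the subclass table shares its primary key with the `player` table.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

SCHEMA = """
CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tennis_player (
    id INTEGER PRIMARY KEY REFERENCES player(id),
    ranking INTEGER NOT NULL
);
"""


@dataclass
class Player:
    name: str


@dataclass
class TennisPlayer(Player):
    ranking: int


def insert_tennis_player(conn: sqlite3.Connection, player: TennisPlayer) -> int:
    # One object is written as two rows, inside a single transaction.
    with conn:
        player_id = conn.execute(
            "INSERT INTO player (name) VALUES (?)", (player.name,)
        ).lastrowid
        conn.execute(
            "INSERT INTO tennis_player (id, ranking) VALUES (?, ?)",
            (player_id, player.ranking),
        )
    return player_id


def find_tennis_player(conn: sqlite3.Connection, player_id: int) -> Optional[TennisPlayer]:
    # Loading one object needs a join across the class tables.
    row = conn.execute(
        "SELECT p.name, t.ranking FROM tennis_player t "
        "JOIN player p ON p.id = t.id WHERE t.id = ?",
        (player_id,),
    ).fetchone()
    return TennisPlayer(*row) if row else None
```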

Concrete Table Inheritance

Represents an inheritance hierarchy of classes with one table for each concrete class.
Example
- Tennis player has a table for itself, and basketball player has a table for itself. All data related to a tennis player object is stored in the tennis player table.
Issues
- For each change to the main class (player class) you need to make changes to each player table in the database.
- brittle to changes
- the lack of a superclass table can make key management awkward and get in the way of referential integrity, although it does reduce lock contention on the superclass table
Good
- Avoids joins

Inheritance Mappers

A structure to organize database mappers to handle inheritance hierarchies.
You need to minimize the amount of code needed to save and load the data to and from the database, and you need to manage superclass/subclass loading and saving. Both can be handled by an inheritance mapper.

Object-Relational Metadata Mapping Patterns

Metadata Mapping

Query Object

Repository
