Wednesday, October 29, 2014

Contained FHIR Resources

Currently under discussion in the FHIR community is what to do with contained resources, and how they could be searched (or not).  I argue that they should be searchable, and that FHIR should specify a standard way to return them, but also that NOT every FHIR-conforming system need be able to search for contained resources.  At present, to retrieve any type of contained resource in FHIR, you must perform a query that returns its container; you cannot directly query the resource type of the contained resource.  You can chain a query into the resource type, but then you have to know the association between the container and the contained resource.

But not every system will know that, and the system designer may never even have considered it. In analytics, the analytic engines often pick up some pretty strange associations in the data that they process. They completely ignore any notion of what you or I would consider to be elegant information system design.  Even so, having found an association through analytics, you now want to be able to do something based on it.  You might use the presence of a particular fact to drive some sort of decision support, perhaps a dashboard, or something else.  So, you need to be able to get to instances of that fact without necessarily knowing the association.

"Wait!" you argue, "of course the analytics system knows the associations too."  And so it does.  For this design.  But the association discovered could be an important trigger that works with other systems that work with similar, but not identical, data.  You find these sorts of associations published all the time in the medical literature: Presence of X leads to Y.  So even the analysis may be from somewhere else.  And you don't care about the container.  You care about finding all the instances of resource X so that you can prevent Y, or treat Y, or even simply reward the behavior associated with X and its related resources.  You might want to count all the Xs, or inspect them further to see if this case qualifies for some sort of intervention, or do something else with them.  What you need to do is not particularly important, just that you need to do something with these things.  So, if you cannot find a particular X, then you have data locked away that isn't all that useful.

Let's look at the mechanics of how this might be implemented:

Using an analogy, let's say you have a system representing people, drivers, cars, registrations, driver's licenses, and a registration authority and a licensing authority.  Each could be represented as a separate resource and might have a separate identity. And a road trip resource might have a destination, and be associated with a driver, a car, and the people who are its passengers.

But what do you do about the hitchhiker that you might need to keep track of for the road trip, but don't much care about later?  In FHIR, you could create the hitchhiker as a contained passenger resource associated with the road trip.  After all, outside of the context of the road trip, he doesn't much matter, but he should be listed as part of the resources associated with the road trip.

Now, as resources go, any person resource can be searched, and you can locate any person who has been on a road trip from Chicago to Milwaukee.  Or can you?  Actually you cannot.  Because if John picks up a hitchhiker (Keith) for a road trip from Chicago to Milwaukee, and decides he doesn't care to track Keith as a person, he could just contain Keith as a person resource in the road trip resource using the passenger attribute of the road trip resource.

So far, so good.  But now look at what happens when John goes to do some queries about how his car is being used.  The way that FHIR works today, he would be able to find all the drivers and all the road trips associated with the car registered in his name, but what he cannot find is all the passengers who have been in his car.  That's fine, you say, we didn't care about that passenger enough to make it a full-blown resource anyway.  And there is some benefit here, because you don't have to return that contained passenger as a resource.

BUT: What happens when you query for road trips?  The query specification says that you are permitted to support chaining of queries across associations.  So John could arguably query for a road trip where Keith was listed as a passenger.  But he could not query for passengers named Keith who have been on road trips with him and get that one where Keith was listed as a contained resource.  So even though the contained resource couldn't be returned, it still needs to be indexed so that the chained search works right.
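To make the indexing point concrete, here is a minimal Python sketch of a chained search over road trips. Everything in it is an assumption for illustration: RoadTrip and Person are the hypothetical resources from the analogy (not real FHIR resource types), the `store` dictionary stands in for a server's standalone resources, and `trips_with_passenger_named` is a made-up helper. The one piece borrowed from FHIR itself is that a contained resource is referenced by a local `#id`.

```python
# Standalone Person resources, keyed by their reference (hypothetical store).
store = {"Person/mary": {"resourceType": "Person", "id": "mary", "name": "Mary"}}

trips = [
    {
        "resourceType": "RoadTrip",
        "id": "trip-1",
        # Keith exists only inside this trip and is referenced as '#p1'.
        "contained": [{"resourceType": "Person", "id": "p1", "name": "Keith"}],
        "passenger": [{"reference": "#p1"}],
    },
    {
        "resourceType": "RoadTrip",
        "id": "trip-2",
        "passenger": [{"reference": "Person/mary"}],
    },
]


def resolve(reference, container, store):
    """Resolve a local ('#id') reference inside the container, else look in the store."""
    if reference.startswith("#"):
        for res in container.get("contained", []):
            if res.get("id") == reference[1:]:
                return res
        return None
    return store.get(reference)


def trips_with_passenger_named(trips, store, name):
    """Roughly what a chained search like RoadTrip?passenger.name=<name> has to do."""
    hits = []
    for trip in trips:
        for passenger in trip.get("passenger", []):
            person = resolve(passenger["reference"], trip, store)
            if person is not None and person.get("name") == name:
                hits.append(trip["id"])
                break  # one matching passenger is enough for this trip
    return hits


keith_trips = trips_with_passenger_named(trips, store, "Keith")
mary_trips = trips_with_passenger_named(trips, store, "Mary")
```

If `resolve` skipped the `#` branch, the search for Keith would come back empty even though he is sitting right there in the trip, which is why the contained resource has to be indexed even if it is never returned on its own.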

Implicitly, you can view a containment as a particular kind of association (an aggregation).  The container has the contained resource embedded within it.  And the contained resource has this implicit association with one and only one container.  So what if we said that "_container" acted as if it were an attribute on the Any resource that resolved to the resource holding a contained resource?  If we did that, we could resolve a lot of challenges with search on contained resources.

If you wanted to search for resources that were only contained, you could do that by ensuring that _container was valued.  If you wanted to search for resources that were not contained, you could do that by ensuring _container was not valued.  If you wanted to search for resources that were contained by a resource of a particular type, you'd use _container:ResourceType, just as you would for other associations.
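Here is a rough Python sketch of those three cases. Keep in mind that `_container` is my proposal, not something in the FHIR specification, and that `index_resources`, `search`, and the RoadTrip/Person data are hypothetical names used only for illustration.

```python
def index_resources(resources):
    """Flatten top-level resources and their contained resources into index entries.

    Each entry is (resource, container); container is None for a standalone
    resource. This models the proposed implicit _container association.
    """
    entries = []
    for res in resources:
        entries.append((res, None))
        for inner in res.get("contained", []):
            entries.append((inner, res))
    return entries


def search(entries, resource_type, container=None):
    """Filter by resource type and by the proposed _container association.

    container=None  -> no filter on containment
    container=True  -> only contained resources (_container is valued)
    container=False -> only standalone resources (_container is not valued)
    container="X"   -> only resources contained by a resource of type X
    """
    hits = []
    for res, holder in entries:
        if res["resourceType"] != resource_type:
            continue
        if container is True and holder is None:
            continue
        if container is False and holder is not None:
            continue
        if isinstance(container, str) and (
            holder is None or holder["resourceType"] != container
        ):
            continue
        hits.append(res)
    return hits


resources = [
    {"resourceType": "Person", "id": "john", "name": "John"},
    {
        "resourceType": "RoadTrip",
        "id": "trip-1",
        "contained": [{"resourceType": "Person", "id": "p1", "name": "Keith"}],
    },
]
entries = index_resources(resources)

contained_only = [r["name"] for r in search(entries, "Person", container=True)]
standalone_only = [r["name"] for r in search(entries, "Person", container=False)]
in_road_trips = [r["name"] for r in search(entries, "Person", container="RoadTrip")]
```

The point of the sketch is that once containment is treated as just another association, all three searches fall out of the same index.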

The mechanics of how this would work are pretty clear, and for those use cases where you want to access "contained" resources, you now have a way to specify that they be present in the search results.  For a contained resource, you could ask to _include resources that appear along the _container association.  If I lost you, let me put it this way.  When John goes to search for people who have been on road trips in his car, he can say GET [base]/Person?_container:RoadTrip&_include=Person._container and he would get a bunch of Person resources, and where they were contained, he would get their containers (which would happen to include those Person resources)*.
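As a sketch of what a server might do for the query above, assuming the proposed `_container` association and the hypothetical RoadTrip and Person resources from the analogy (the function name and the result layout are made up for illustration, not a real FHIR bundle):

```python
def search_with_container_include(resources, resource_type, container_type):
    """Find resources of resource_type contained in container_type resources,
    and also return each container once, as _include=...._container would ask."""
    matches, included = [], []
    for holder in resources:
        if holder["resourceType"] != container_type:
            continue
        found = [
            inner
            for inner in holder.get("contained", [])
            if inner["resourceType"] == resource_type
        ]
        if found:
            matches.extend(found)
            included.append(holder)  # added once, however many matches it holds
    return {"match": matches, "include": included}


resources = [
    {"resourceType": "Person", "id": "john", "name": "John"},
    {
        "resourceType": "RoadTrip",
        "id": "trip-1",
        "contained": [
            {"resourceType": "Person", "id": "p1", "name": "Keith"},
            {"resourceType": "Person", "id": "p2", "name": "Bob"},
        ],
    },
    {"resourceType": "RoadTrip", "id": "trip-2"},
]

result = search_with_container_include(resources, "Person", "RoadTrip")
matched_names = [p["name"] for p in result["match"]]
included_ids = [t["id"] for t in result["include"]]
```

Notice that each included container still carries its contained Person resources, so the matches are only really usable in the context of the container that comes back with them.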

Now, an analytics system that simply wants to keep track of what Keith is doing really doesn't care whether the activity is RoadTrip, or Sleeping.  It just wants to find Keith and see what he is up to.  It doesn't care about all the associations, it just needs to report to Keith's wife what he is doing.  Now it can do so without any prior knowledge about how John is keeping track of what he does with Keith.

   -- Keith

* Looking at this a bit more deeply, I can see that GET [base]/Person?_container:RoadTrip is pretty useless without the _include=Person._container.  I'm not sure where to go with that.  We could determine that use of _container required _include, we could make that automatic, or we could come up with something else.  For now, I'm not really worried about it, since it doesn't much matter for the immediate discussion.

