There’s an important fix in .NET 4.6 related to the bookmarking features of Windows Workflow Foundation (WF). The mechanisms by which workflow instances are bookmarked, suspended and resumed are fairly complex, and I don’t think the details are very well documented. Having fought through some issues myself, and received some help via MSDN forums (thanks Jim Carley) and bug reports to Microsoft, here’s how I understand bookmarks to work.
First, what is a bookmark? As its name suggests, it’s a way to remember where you were in the workflow, so that you can continue where you left off if your workflow idles (and is subsequently evicted from the runtime) for some reason.
As it transpires, there seem to be two types of bookmark. There are those that arise from the enforcement of the ordering of the operations that define the workflow (Jim calls these “protocol bookmarks”). The Receive activity is a classic example of this type of bookmark, and you can “resume” them by name (indeed, that’s how the runtime ultimately “invokes” the workflow operation due to a message coming into the service endpoint).
There are also bookmarks that are not so obvious when looking at a workflow design – they’re introduced by the workflow runtime for its own housekeeping purposes (“internal bookmarks”). The Delay activity is an example of this – the runtime creates a bookmark so that the instance can resume automatically at the appropriate time. The Pick activity is another, less obvious example; here the runtime needs a bookmark that gets resumed once the Trigger of a PickBranch completes, so that it can subsequently schedule the Action for that branch. Even if the trigger also contains protocol bookmarks, or if the Action part does nothing, the internal bookmark is still required.
When a message for a protocol bookmark (i.e. Receive) comes in, if there’s no active protocol bookmark that matches it, ordinarily the caller will see an “out-of-order” FaultException. However, if there are also active internal bookmarks, the runtime will wait to see if that internal bookmark gets resumed (which could potentially cause the protocol bookmark that we’re actually trying to resume to become active, in which case we can just resume it, transparently to the caller). If the internal bookmark does not get resumed, eventually a TimeoutException will be returned to the caller. Consider a pre-4.6 workflow such as the following (WorkflowA):
The protocol for this workflow is simple – we want StepA to happen first, then DoSomething (which for our purposes is assumed to not create an internal bookmark at any point), and then StepB to happen only after all that. For a particular instance of this workflow, let’s consider some operation call sequences:
I think these are all fairly intuitive and work as expected.
Now consider a workflow like this (WorkflowB):
Here we have a choice between two operations before the workflow instance completes. Here are our three test call sequences for this workflow:
Our first two scenarios actually work the same way in both workflows, but WorkflowB times out for the third case.
This is not intuitive and certainly not ideal, since a caller would have to wait for the timeout to expire, and wouldn’t actually be told the truth about the failure even after that.
Unfortunately, with content-based correlation, this scenario is likely to happen if two separate callers attempt to open a workflow for the same correlated identity at about the same time. The first caller will succeed and create the workflow instance; the second caller will time out. Ideally you’d want the second caller to immediately get the “Out-of-Order” exception, but you can’t guarantee that in the presence of internal bookmarks.
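To make the rule concrete, here is a toy model in Python of how I understand the runtime to treat an incoming message. It is not Workflow Foundation code: `dispatch`, `Outcome` and the `StepC` operation are names I made up for illustration, and it simplifies things by assuming the pending internal bookmark never makes the requested operation available.

```python
from enum import Enum


class Outcome(Enum):
    RESUMED = "protocol bookmark resumed"
    OUT_OF_ORDER = "out-of-order FaultException"
    TIMEOUT = "TimeoutException after waiting"


def dispatch(operation, protocol_bookmarks, has_internal_bookmarks):
    """Toy model of the pre-4.6 behaviour described above (not a real WF API).

    operation              -- name of the incoming operation, e.g. "StepA"
    protocol_bookmarks     -- names of the Receive bookmarks currently active
    has_internal_bookmarks -- True if the runtime holds housekeeping bookmarks
                              (Delay, Pick, ...) for this instance
    """
    if operation in protocol_bookmarks:
        # A matching Receive is waiting, so the message resumes it.
        return Outcome.RESUMED
    if has_internal_bookmarks:
        # The runtime waits in case an internal bookmark activates the Receive.
        # Assumption in this model: it never does, so the caller times out.
        return Outcome.TIMEOUT
    # Nothing could ever make this operation valid: fail immediately.
    return Outcome.OUT_OF_ORDER


# A sequential instance waiting on StepB, with no internal bookmarks:
print(dispatch("StepA", {"StepB"}, False).value)

# A Pick-style instance waiting on StepB or a hypothetical StepC:
print(dispatch("StepA", {"StepB", "StepC"}, True).value)
```

The second call is the competing-creators case: the repeated creation message matches no active Receive, but because internal bookmarks are present the caller waits for the timeout instead of getting the fault straight away.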
So, don’t use activities that use internal bookmarks. Right?
Well, they’re pretty much unavoidable if you want to have an extensible workflow framework. If you want to give domain experts the full range of activities with which to develop their workflows, banning Delay, Pick, and State is a serious hindrance. Pick is especially useful for a “four-eyes” class of workflows whereby a request for an action to be performed must be reviewed and approved (or rejected) by a different user.
So, to the solution.
In 4.6, the above workflows behave by default exactly as they did before. However, you can now take control of what the runtime does when it encounters a message for a Receive for which there is no active bookmark, while internal bookmarks are present. The new setting can be placed in the app.config (or web.config) file under appSettings:
<add key="microsoft:WorkflowServices:FilterResumeTimeoutInSeconds" value="60"/>
With this setting, you can specify how long the timeout should be in our “Competing Creators” scenario above. If you set it to zero, you won’t actually get a TimeoutException at all – instead you’ll get the same “Out-of-Order” FaultException you’d get in the “Premature” scenario. This seems to solve the problem nicely.
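As a rough illustration of where the setting sits and what it changes, the Python sketch below parses a minimal config fragment and applies my understanding of the rule. Only the key name comes from the real setting; the helper functions are hypothetical, the surrounding `<configuration>` element is the standard .NET config root, and the model again assumes the internal bookmark never makes the requested Receive available.

```python
import xml.etree.ElementTree as ET

# Minimal app.config / web.config fragment with the timeout set to zero.
APP_CONFIG = """<configuration>
  <appSettings>
    <add key="microsoft:WorkflowServices:FilterResumeTimeoutInSeconds" value="0"/>
  </appSettings>
</configuration>"""

KEY = "microsoft:WorkflowServices:FilterResumeTimeoutInSeconds"


def read_filter_resume_timeout(config_xml):
    """Return the configured timeout in seconds, or None if the key is absent."""
    root = ET.fromstring(config_xml)
    for entry in root.findall("./appSettings/add"):
        if entry.get("key") == KEY:
            return int(entry.get("value"))
    return None  # not configured: default (pre-4.6 style) behaviour applies


def unmatched_receive_outcome(has_internal_bookmarks, timeout_seconds):
    """Toy model: what the caller sees when no active Receive matches."""
    if not has_internal_bookmarks or timeout_seconds == 0:
        # Either nothing to wait for, or waiting has been switched off.
        return "out-of-order FaultException"
    # The runtime waits (for the configured or default period), then gives up.
    return "TimeoutException"


timeout = read_filter_resume_timeout(APP_CONFIG)
print(unmatched_receive_outcome(True, timeout))
```

With the value at zero, the competing-creators call falls into the same branch as an instance with no internal bookmarks at all, which is the behaviour I wanted in the first place.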