Docker Support in Visual Studio 2017

Having experimented with Docker on Windows about a year ago (on a Technical Preview of Windows Server 2016), I thought I’d take a quick look at how container support has evolved with the latest version of the operating system and dev tools. At the time the integration of Docker into Windows was fairly flaky (to be expected for such a bleeding edge technology), but I was able to get a relatively large Windows-dependent monolith to run inside a set of Windows Containers. Most of this involved crafting Dockerfiles and running docker commands manually (and eventually scripting these commands to automate the build process).

Docker tools are now included in Visual Studio, so you can enable Docker support in a new (web) project when you create it:


Alternatively, you can add Docker support after the fact from the project context menu:


Either way, you get the following:

  • a default Dockerfile in your project. This is for packaging a container image for your service automatically.
  • a default docker-compose.yml file in your solution. This is for orchestrating the topology and startup of all of the services in your solution.
  • a docker-compose.ci.build.yml file in your solution. You can actually spin up a “build” container in order to run the dotnet commands to restore Nuget packages, build and publish (stage) your binaries and dependencies. This is great because now even the build process will be consistent across all developers machines (and the central build system), so in addition to eliminating the “runs on my machine” problem, it also alleviates any “builds on my machine” problems, where developers might have different tool versions or environment setup.

The first time you run, there’s a significant delay while the appropriate image files are downloaded, but this is a one-time hit.

Your service image contains a CLR debugger, which Visual Studio can hook up to, so you can step through your service code.

What’s interesting (and not a little surprising) is that if you’re targeting .NET Core, your image will target a Linux base image by default. Docker for Windows also defaults to running containers in a Linux Hyper-V VM, so when you hit F5 the container will run your .NET Core service on Linux, and you’ll be remote debugging into a Linux machine.

All very cool stuff, and works well right out of the box without too much fuss. Of course, you can also do all of this manually from the command-line if you prefer to develop in a lightweight editor like VSCode.


Microsoft Azure Hybrid–At Last!

From Reuters: http://www.reuters.com/article/us-microsoft-azure-idUSKBN19V1KQ

I think many enterprises have been waiting for this for a long time, and will trigger a significant uptick in Azure usage. There are a couple of things about the cloud that instill resistance for enterprises:

Privacy/Security/Regulatory. Especially in finance, and especially in European jurisdictions, there are often regulatory requirements around where data can physically reside, so in some cases the shared public cloud isn’t even an option. Even without regulations, putting sensitive data in the cloud has made enterprises nervous.

Development cost. Unless you’ve demonstrated spectacular foresight and discipline with your application architecture (and you haven’t), there’s going to be some redesign work required in order to use the cloud effectively. That’s going to take a while to build and test, and you’re going to be paying for your cloud infrastructure through all of that as pure overhead until you go live.

The ability to run some parts of your applications on the cloud and some on premises, yet still use the same tools, patterns and skillsets across both, largely dismisses these two hurdles. The holy grail is that you can keep your sensitive data and services locally on premise, while still taking advantage of the ability to scale up public-facing front-end applications, and for development and testing you can utilize a “local Azure”, running on hardware you own, that looks and feels like the shared public version, without incurring a significant cost overhead. When you’re happy with your solution, you can easily move the less sensitive parts to the public cloud, because it’s the same animal you’ve been using on-premise.

I’m looking forward to experimenting with Azure again to see how flexible this hybrid approach is in reality.


Disappearing Scrollbar in Edge

This has been bugging the hell out of me lately. I’m not sure when Edge started doing this, but it’s really quite annoying if you’re accustomed to using the vertical scrollbar on the right of the browser to “page” down or up by clicking the areas above and below the actual scrollbar “thumb” (the bit you can drag to scroll quickly). It doesn’t seem to happen on every site, it seems to be consistent on sites with content that spans all the way to the right-hand edge.

When reading a long blog post or web page, I usually position the mouse pointer near the bottom of the scrollbar and click to page down through the article as I progress. Unfortunately, in Edge the scrollbar now entirely disappears after a few seconds of mouse inactivity, and doesn’t automatically reactivate on a mouse click – only when you move the mouse pointer completely off the area where the scrollbar used to be, and then back over to where it should be. I find myself constantly having to look over to the right in order to perform that little dance every minute or so. Very distracting. I guess I could use the keyboard for Page Up/Page Down, but on laptops nowadays you need to use the Fn key to get those, and I find a left-click more convenient.

I wonder who thought this would be a good idea? I suppose there’s some non-mouse, non-desktop scenario (mobile?) where it makes sense, but on desktops it certainly violates the Principle of Least Astonishment, and since I don’t see any options to turn the behavior off, I guess I’ll have to live with it or switch back to Firefox…


Graph Databases and Neo4j

My latest consulting project made heavy use of the graph database product Neo4j. I had not previously had an opportunity to look at graph databases, so this was a major selling point in accepting the gig. I had also suffered through some major pain with relational databases via object-relational mapping layers (ORMs), so I was keen to experience a different way of managing large-scale data sets in financial systems.

I won’t (can’t) for various reasons provide a lot of detail about the specific use cases on this project, but I’ll put out some thoughts on the general pros and cons of graph databases based on several months of experience with Neo4j. I think I’m going to be using (and recommending) it a lot more going forward.

What is a graph database?

A graph essentially consists of only two main types of data structure: nodes and relationships. Nodes typically represent domain objects (entities), and relationships represent how those entities, well, how they are related to each other. In graph databases, both nodes and relationships are first-class concepts. Compare this to relational databases, where rows (in tables) are the core construct and relationships must be modeled with foreign keys (row attributes) and JOIN-ed when querying, or to document-store databases, where whole aggregates model entities and contain references to other aggregates (which usually must be loaded with separate queries and in their entirety in order to “dereference” a related data point). Depending on your use cases these queries can be complex and/or expensive.

Why graphs?

Performance. If your data is highly connected (meaning that the relationships between entities can be very fluid, complex and are equally or more important than the actual entities and properties), a graph will be a more natural way to model that data. Typical examples of highly connected data sets can be found in recommendation engines, security/permission systems and social network applications. Since all relationship “lookups” are essentially constant-time operations (as opposed to index- or table-scans), even very complex multi-level queries perform extremely efficiently if you know which nodes to start with. Graph folks term this “index-free adjacency”. Anecdotally, over the course of the last few months, with a database of the order of a hundred million nodes/relationships, executing queries that sometimes traversed tens of thousands of nodes across up to a dozen levels of depth, I don’t think I saw any queries take more than about 100ms to complete. For large result sets the overhead of transferring and processing results client-side was by far biggest bottleneck in the application.

Query Complexity. Queries against graph data can be a lot more concise and expressive than an equivalent SQL query that joins across multiple tables (especially in hierarchical use-cases).

What are the downsides?

Few or no schema constraints. If you’re used to your database protecting you against storing data of the wrong type, or forcing proper referential integrity according to a strict data model definition, graph databases are unlikely to make you happy. Even a stupid mistake like storing the string version of a number in a property that should be numeric can lead to a lot of head-scratching, since the graph will happily allow you to do that (but won’t correctly match when querying). At first I thought this was a deal-breaker, but it’s not as big a problem as you’d think if you’re following test-driven disciplines and institute some automated tools for periodic sanity/consistency checks. You’re not waiting for a customer to find your query/ORM-mapping bugs at run-time anyway, right? Besides, a typical major pain point in RDMBS/ORM-based systems are schema upgrades as your rigid data model changes, especially when relations subtly change shape between versions.

Hardware requirements. Optimally, in production you’re going to want a dedicated multi-node cluster where each node has enough RAM to potentially keep your entire graph in memory, so budget accordingly.


Neo4j is one of many graph databases (this technology is very hot right now and new ones seem to pop up every week). It is widely used by many large corporations and they are well ahead of the alternatives in terms of market share. I don’t want to give a full detailed review of Neo4j in this post, but their main selling points are:

ACID compliance. Many NoSQL technologies sometimes can only offer “eventual consistency”.

High availability clustering. Neo4j recently introduced a new form of HA called “Causal Clustering”, based on the Raft Protocol, which I haven’t yet had a chance to evaluate. At first I read this as “Casual Clustering” and had visions of nodes deciding arbitrarily on a whim whether or not they wanted to join or pull out of the pool or respond to queries… The older form of clustering replicates the graph across multiple physical nodes, and nodes elect a “master” which will be the primary target of all write operations. Non-master nodes can be load-balanced for distributed read operations.

Integrated query language (Cypher). This is a very elegant and succinct (compared to SQL) pattern-matching language for graph queries. Statements match patterns in the graph and then act on the resulting sub-graph (returning, updating or adding nodes and/or relationships).

Web interface. The interface is very nice and offers great visualizations of graph results. You can issue and save Cypher queries, perform admin-level operations and inspect the “shape” of your graph (i.e. what node labels and relationship types you currently have).

Flexible APIs. There is an HTTP REST interface, but they also now offer a TCP binary protocol called BOLT which is much more efficient.

Native Graph Model. Neo4j touts the purity of the graph model over other alternatives which attempt to mix and match different paradigms (eg. documents, key/value store and graph). I don’t have any direct experience of such “multi-model” databases, but I plan to compare Neo4j with other graph database products on a project in the future, so stay tuned.

Extensibility. You can add your own procedures and extensions in Java (very similar to CLR stored procedures in SQL Server).

Support. I can only speak to the enterprise-level experience, but Neo4j are very quick to respond to support questions and really stick with you until the problem is resolved.

On the whole, I think graph databases in general, and Neo4j in particular, have a bright future ahead.

Anyone else using graphs yet?


Project.json Nuget Package Management

Recently we've been looking at the new approach to Nuget package management in .NET projects, dubbed Project JSON. This initiative started in the ASP.NET world, but seems to be becoming the preferred method for .NET projects to declare Nuget package dependencies. The new approach changes the way that packages are resolved and restored, and has some very useful advantages.


The larger context for looking at this was that we're starting to push several of our core framework assemblies into a separate source control repository, separate from the rest of the applications and modules that make use of those core components to create shipping products. The core components use Semantic Versioning ("product" releases are versioned according to Marketing!), are delivered as Nuget packages internally to our developers, and are no longer branched along with the Main codebase – they follow a “straight-line” of incremental changes.

We expect to gain a few benefits from this. First, developers will not have to spend compute cycles compiling the core components. Second, there will be a natural barrier to changing the core components. This may be viewed as an advantage or a disadvantage depending on your outlook, but the idea is that over time the core components should become and remain very stable (in terms of rate-of-change, not code quality), and "ship" on their own cadence. If core projects are all simple project references alongside the application code in the same solution, it's very easy and tempting to make small (sometimes breaking) changes to these components while working on something else, without necessarily fully considering the larger impact of those changes (or independently verifying the changes with new unit tests).

Perhaps a core architecture team will be responsible for the roadmap of core framework changes, and the resulting packages will be treated like any other third-party dependency by application and product developers. It's important that changes to the "stable" core components are carefully planned and tested, since many other components (and possibly customer-developed extensions) depend upon them. The applications and business modules that ultimately form shipping products are less "stable", in the sense that they can change quite frequently as customer requirements change.

However, there are also some fundamental implications to doing this. Developers still need to be able to step into core component code when debugging. Nuget packages support a .symbols.nupkg file alongside the actual binary package, and build servers like TeamCity can serve up those symbols to Visual Studio automatically when stepping into code from the package.

In addition, released software should be resistant to casually taking new (or updated) Nuget dependencies once shipped, since this can complicate "hotfixes" and customer upgrades. The Main branch, of course, representing the next major shipping version, could update to the latest dependencies more freely.

Nuget today

By default, Nuget manages package dependencies via a packages.config XML file that gets added to the project the first time you add a reference to a Nuget package. The name and version of the dependency (and, recursively, any dependencies of that package) is recorded in the packages.config file, and Nuget will use this information during package restore (which will usually take place just before compiling), to obtain the right binaries, targets and propeties to include in the compilation. In addition to the entries in packages.config, references are also added to your project file (e.g. the .csproj file for C# projects). Some packages are slightly more complex than simply adding references to one or more managed assemblies, however (for example those that may include and P/Invoke native libraries). In these cases, even more "cruft" is added to the project file, to manage the insertion of special targets and properties into the MSBuild process so that the project can compile correctly and leave you with something runnable in your bin folder.

Resolved packages are usually stored in the source tree in a packages folder next to the solution file, and you usually want to exclude these from source control to avoid bloating the codebase unnecessarily.

Another snag to watch out for is potential version conflicts between projects that call for different versions of the same dependency. In the packages.config file, it's possible to declare that you want an exact version of a dependency, but by default you're saying "I want any version on or after this one", so if multiple projects ask for different versions, Nuget is forced to add an app.config file to your project with the appropriate binding redirects, to allow for the earlier references to be resolved at runtime to the latest version.

When you update a dependency to a newer version, every packages.config file that refers to the old version, and every project file that contains references to assemblies from those packages, and every app.config that has binding redirects for those assemblies, needs to be updated. This produces a lot of churn in source control and actually can take a while on large solutions with many dependencies.

Clearly there are a lot of moving parts to getting this all to work, and Nuget actually does a great job in managing it all, but it can get quite fragile if you have many Nuget references - even more so if you also have manually-maintained targets and properties in a project file for other custom build steps, or if you have a combination of managed and native (e.g. .vcxproj) files that require Nuget references. Nuget doesn't yet do well with crossing the managed-native boundary with these references, so you're usually just using Nuget to install the right package, and then manually maintaining the references yourself in the project files. In this world it's easy to make a mistake when editing files that Nuget is also trying to keep straight during package updates.

Nuget future

With the new approach, things are a lot simpler and cleaner. There's really only one file to worry about – a JSON file called project.json that replaces your packages.config file, and contains the same package name and version specifications. The neat thing is that this one file is all that Nuget needs to resolve dependencies – in your project file you no longer even need assembly references to the individual libraries delivered by the Nuget package, nor any special .targets or .props file references. At package restore time, Nuget will build a whole dependency graph of the packages that will be used in the build process, and will inject the appropriate steps into the executed build commands as necessary. You'll get warnings or errors if there are potential version conflicts for any packages.

Package files are no longer downloaded to the packages folder next to your solution – instead by default they're cached under the user's profile folder, so packages now only ever really need to be downloaded one time, and you don't have to care about excluding the packages folder from source control, because it's no longer in your source tree anyway.

Another neat advantage is that your project.json file only really needs to contain top-level dependencies – any packages that those dependencies in turn depend upon do not need to be explicitly mentioned in the project.json file, but the restore process will still correctly resolve them.

But by far the biggest win for our scenario is project.json's Floating Versions capability. This allows us to define the "greater than or equal to" version dependency with a wildcard (e.g. 1.0.*), such that during package restore, Nuget will use the latest available (1.0.X) package in the repository (by default, a specific version specification calls for an exact match). Now, when our core components are updated by the core team (at least for non-breaking changes), literally nothing (not even the project.json file) needs to change in our application and product codebase – a recompile will pick up the new versions from TeamCity automatically. Of course, when we branch Main for a specific release, we will likely update the version specifications in those branches to remove the wildcard and "fix" on a specific "official" version of our core dependencies, so that when we work on release branches to fix bugs, we can't inadvertently introduce a new core package dependency.

The new approach is available in Visual Studio 2015 Update 1. Unfortunately there's no automated way yet (that I know of) to upgrade existing projects, so I've been doing the following, based on Oren Novotny's advice:

  1. Add an "empty" project.json file to the project in Visual Studio.

  2. Delete the packages.config file.

  3. Unload the project in Visual Studio and edit it. Remove any elements that refer to Nuget .targets or .props files, and remove any reference elements with HintPaths that point to assemblies under the old packages folder.

  4. Reload the project.

  5. Build the project. Watch a bunch of compilation errors fly by, but these will help to identify which Nuget packages to re-reference. On large codebases it's possible to end up with redundant references as code is refactored, so this step has actually helped us to cleanup some dependencies.

  6. Right-click the project and "Manage Nuget packages". Add a reference to one of the packages for which there was a compilation error. If you actually know the dependencies between Nuget packages you can be smart here and start with the top-level ones.

  7. Repeat from 5. until the build succeeds.

For a large codebase this is a lot of mechanical work, but so far the benefits have been quite positive. I may attempt to put together a VSIX context menu to automate the conversion if I get a chance, but I suspect that the Project JSON folks are already working on that and will beat me to it...