Jimmy Bogard

Strong opinions, weakly held

AutoMapper 3.2.0 released

Tue, 04/15/2014 - 15:28

Full release notes on the GitHub site

Big features/improvements:

  • LINQ queryable extensions greatly improved
    • ICollection supported
    • MaxDepth supported
    • Custom MapFrom expressions supported (including aggregations)
    • Inherited mapping configuration applied
  • Windows Universal Apps supported
  • Fixed NuGet package to not have DLL in project
  • iOS confirmed to work
  • ReverseMap ignores both directions (only one Ignore() or IgnoreMap attribute needed)
  • Pre-conditions on member mappings (called before resolving anything; see the sketch below)
  • Exposing ResolutionContext everywhere, including current mapping engine instance
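To make a couple of these concrete – the ReverseMap ignore behavior and pre-conditions – here's a rough sketch of what they look like in configuration. The Order/OrderDto types are hypothetical, and the exact API shapes are worth verifying against the release notes:

Mapper.CreateMap<Order, OrderDto>()
    // With ReverseMap below, this single Ignore() applies in both directions.
    .ForMember(d => d.InternalNotes, opt => opt.Ignore())
    .ForMember(d => d.Total, opt =>
    {
        // Pre-condition: evaluated before any resolution happens for this member.
        opt.PreCondition(src => src.LineItems != null);
        opt.MapFrom(src => src.LineItems.Sum(li => li.Price));
    })
    .ReverseMap();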

A lot of small improvements, too. I've ensured that every new extension to the public API includes code documentation. The toughest part of this release was coming up with a good solution for multi-platform support and MSBuild's refusal to copy indirect references to all projects.

As always, if you find any issues with this release, please report over on GitHub.

Enjoy!


Using AutoMapper to perform LINQ aggregations

Tue, 04/08/2014 - 18:41

In the last post I showed how AutoMapper and its LINQ projection can prevent SELECT N+1 problems and other lazy loading problems. That was pretty cool, but wait, there’s more! What about complex aggregation? LINQ can support all sorts of interesting queries that, when done in memory, can result in really inefficient code.

Let’s start small: what if, in our model of courses and instructors, we wanted to display the number of courses an instructor teaches and the number of students in a class? This is easy to do in the view:

@foreach (var item in Model.Instructors)
{
    <tr>
        <td>
            @item.Courses.Count()
        </td>
    </tr>
}
<!-- later down -->
@foreach (var item in Model.Courses)
{
    <tr class="@selectedRow">
        <td>
            @item.Enrollments.Count()
        </td>
    </tr>
}

But at runtime this will result in another SELECT for each row to count the items:

[SQL Profiler: a separate COUNT query issued for each row]

We could eager fetch those rows ahead of time, but even that is less efficient than just performing a SQL correlated subquery to calculate that count. With AutoMapper, we can just declare this property on our ViewModel class:

public class CourseModel
{
    public int CourseID { get; set; }
    public string Title { get; set; }
    public string DepartmentName { get; set; }
    public int EnrollmentsCount { get; set; }
}

AutoMapper can recognize extension methods, and automatically looks for extension methods in System.Linq – so the EnrollmentsCount property matches up with the Enrollments.Count() extension method. The underlying expression created looks something like this:

courses =
    from i in db.Instructors
    from c in i.Courses
    where i.ID == id
    select new InstructorIndexData.CourseModel
    {
        CourseID = c.CourseID,
        DepartmentName = c.Department.Name,
        Title = c.Title,
        EnrollmentsCount = c.Enrollments.Count()
    };

LINQ providers can recognize that aggregation and use it to alter the underlying query. Here’s what that looks like in SQL Profiler:

SELECT 
    [Project1].[CourseID] AS [CourseID], 
    [Project1].[Title] AS [Title], 
    [Project1].[Name] AS [Name], 
    (SELECT 
        COUNT(1) AS [A1]
        FROM [dbo].[Enrollment] AS [Extent5]
        WHERE [Project1].[CourseID] = [Extent5].[CourseID]) AS [C1]
    FROM --etc etc etc

That’s pretty cool: just create a property with the right name on your view model, and you get an optimized query built for the aggregation.
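For completeness, here’s roughly how that projection gets kicked off – a minimal sketch, assuming the same db context and mapping configuration used throughout this post:

// EnrollmentsCount is matched to Enrollments.Count() and folded into
// the generated Select – no extra configuration needed.
var courses = db.Courses
    .Project().To<InstructorIndexData.CourseModel>()
    .ToList();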

But wait, there’s more! What about more complex operations? It turns out that we can do whatever we like in MapFrom as long as the query provider supports it.

Complex aggregations

Let’s do something more complex. How about counting the number of students whose name starts with the letter “A”? First, let’s create a property on our view model to hold this information:

public class CourseModel
{
    public int CourseID { get; set; }
    public string Title { get; set; }
    public string DepartmentName { get; set; }
    public int EnrollmentsCount { get; set; }
    public int EnrollmentsStartingWithA { get; set; }
}

Because AutoMapper can’t infer what the heck this property means – there’s no match on the source type, even including extension methods – we’ll need to create a custom mapping projection using MapFrom:

cfg.CreateMap<Course, InstructorIndexData.CourseModel>()
    .ForMember(m => m.EnrollmentsStartingWithA, opt => opt.MapFrom(
        c => c.Enrollments.Where(e => e.Student.LastName.StartsWith("A")).Count()
    )
);

At this point, I need to make sure I pick the overloads of the aggregation methods that my LINQ query provider supports. There’s another overload of Count() that takes a predicate to filter items, but it’s not supported; instead, I need to chain a Where and then Count. The SQL generated is now efficient:
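To make the distinction concrete, here are the two shapes side by side as expression-typed locals (an illustrative fragment, not actual AutoMapper code):

// The predicate overload of Count() – not translatable by my provider:
Expression<Func<Course, int>> unsupported =
    c => c.Enrollments.Count(e => e.Student.LastName.StartsWith("A"));

// Where() followed by the parameterless Count() – translates fine:
Expression<Func<Course, int>> supported =
    c => c.Enrollments.Where(e => e.Student.LastName.StartsWith("A")).Count();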

SELECT 
    [Project2].[CourseID] AS [CourseID], 
    [Project2].[Title] AS [Title], 
    [Project2].[Name] AS [Name], 
    [Project2].[C1] AS [C1], 
    (SELECT 
        COUNT(1) AS [A1]
        FROM  [dbo].[Enrollment] AS [Extent6]
        INNER JOIN [dbo].[Person] AS [Extent7]
            ON ([Extent7].[Discriminator] = N'Student')
            AND ([Extent6].[StudentID] = [Extent7].[ID])
        WHERE ([Project2].[CourseID] = [Extent6].[CourseID])
            AND ([Extent7].[LastName] LIKE N'A%')) AS [C2]

This is a lot easier than pulling back all students and looping through them in memory. I can go pretty crazy here: as long as those query operators are supported by your LINQ provider, AutoMapper will pass your MapFrom expression through to the final output Select expression. Here’s the equivalent Select LINQ projection for the above example:

courses =
    from i in db.Instructors
    from c in i.Courses
    where i.ID == id
    select new InstructorIndexData.CourseModel
    {
        CourseID = c.CourseID,
        DepartmentName = c.Department.Name,
        Title = c.Title,
        EnrollmentsCount = c.Enrollments.Count(),
        EnrollmentsStartingWithA = c.Enrollments
            .Where(e => e.Student.LastName.StartsWith("A")).Count()
    };

As long as you can LINQ it, AutoMapper can build it. Combined with preventing lazy loading problems, this is a compelling reason to go the view model/AutoMapper route, since the underlying LINQ provider can build a correct, efficient SQL query better than we can by hand. That, I think, is wicked awesome.


Using AutoMapper to prevent SELECT N+1 problems

Thu, 04/03/2014 - 15:13

Back in my post about efficient querying with AutoMapper, LINQ and future queries, one piece I glossed over was how View Models and LINQ projection can prevent SELECT N+1 problems. In the original controller action, I had code like this:

public ActionResult Index(int? id, int? courseID)
{
    var viewModel = new InstructorIndexData();
 
    viewModel.Instructors = db.Instructors
        .Include(i => i.OfficeAssignment)
        .Include(i => i.Courses.Select(c => c.Department))
        .OrderBy(i => i.LastName);
 
    if (id != null)
    {
        ViewBag.InstructorID = id.Value;
        viewModel.Courses = viewModel.Instructors.Where(
            i => i.ID == id.Value).Single().Courses;
    }
 
    if (courseID != null)
    {
        ViewBag.CourseID = courseID.Value;
        viewModel.Enrollments = viewModel.Courses.Where(
            x => x.CourseID == courseID).Single().Enrollments;
    }
 
    return View(viewModel);
}

See that “Include” part? That’s because the view shows information from navigation and collection properties on my Instructor model:

public class Instructor : Person
{
    [DataType(DataType.Date)]
    [DisplayFormat(DataFormatString = "{0:yyyy-MM-dd}", ApplyFormatInEditMode = true)]
    [Display(Name = "Hire Date")]
    public DateTime HireDate { get; set; }

    public virtual ICollection<CourseInstructor> Courses { get; set; }
    public virtual OfficeAssignment OfficeAssignment { get; set; }
}

public abstract class Person
{
    public int ID { get; set; }

    [Required]
    [StringLength(50)]
    [Display(Name = "Last Name")]
    public string LastName { get; set; }
    [Required]
    [StringLength(50, ErrorMessage = "First name cannot be longer than 50 characters.")]
    [Column("FirstName")]
    [Display(Name = "First Name")]
    public string FirstMidName { get; set; }

    [Display(Name = "Full Name")]
    public string FullName
    {
        get
        {
            return LastName + ", " + FirstMidName;
        }
    }
}

If I just use properties on the Instructor/Person table, only one query is needed. However, if my view happens to use information from other tables, additional queries are needed. If I’m looping through a collection association, I could potentially have a query issued for each loop iteration. Probably not what was expected!

ORMs let us address this by eagerly fetching associations via JOINs. In EF this is done with the “Include” method on a LINQ query; in NHibernate, via Fetch (depending on the query API you use). This addresses the symptom, but it’s not a good long-term solution.
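For reference, the eager-fetch versions look something like this (EF 6 shown; the NHibernate line assumes the NHibernate.Linq Fetch extension):

// Entity Framework: eager load the association via Include
var instructors = db.Instructors
    .Include(i => i.OfficeAssignment)
    .ToList();

// NHibernate LINQ: the same idea via Fetch
var nhInstructors = session.Query<Instructor>()
    .Fetch(i => i.OfficeAssignment)
    .ToList();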

Because our domain model exposes all data available, it’s easy to just show extra information on a view without batting an eye. However, unless we keep a database profiler open at all times, it’s not obvious to me as a developer that a given association will result in a new query. This is where AutoMapper’s LINQ projections come into play. First, we have a View Model that contains only the data we wish to show on the screen, and nothing more:

public class InstructorIndexData
{
    public IEnumerable<InstructorModel> Instructors { get; set; }

    public class InstructorModel
    {
        public int ID { get; set; }

        [Display(Name = "Last Name")]
        public string LastName { get; set; }
            
        [Display(Name = "First Name")]
        public string FirstMidName { get; set; }

        [DisplayFormat(DataFormatString = "{0:yyyy-MM-dd}", ApplyFormatInEditMode = true)]
        [Display(Name = "Hire Date")]
        public DateTime HireDate { get; set; }

        public string OfficeAssignmentLocation { get; set; }

        public IEnumerable<InstructorCourseModel> Courses { get; set; } 
    }

    public class InstructorCourseModel
    {
        public int CourseID { get; set; }
        public string Title { get; set; }
    }
}

At this point, if we used AutoMapper’s normal Map method, we could still potentially have SELECT N+1 problems. Instead, we’ll use the LINQ projection capabilities of AutoMapper:

var viewModel = new InstructorIndexData();

viewModel.Instructors = db.Instructors
    .OrderBy(i => i.LastName)
    .Project().To<InstructorIndexData.InstructorModel>();
 

Which results in exactly one query to fetch all Instructor information, using LEFT JOINs to pull in various associations. So how does this work? The LINQ projection is quite simple – it merely looks at the destination type to build out the Select portion of a query. Here’s the equivalent LINQ query:

from i in db.Instructors
orderby i.LastName
select new InstructorIndexData.InstructorModel
{
    ID = i.ID,
    FirstMidName = i.FirstMidName,
    LastName = i.LastName,
    HireDate = i.HireDate,
    OfficeAssignmentLocation = i.OfficeAssignment.Location,
    Courses = i.Courses.Select(c => new InstructorIndexData.InstructorCourseModel
    {
        CourseID = c.CourseID,
        Title = c.Title
    }).ToList()
};

Since Entity Framework recognizes our SELECT projection and can automatically build the JOINs based on the data we include, we don’t have to do anything to Include any navigation or collection properties in our SQL query – they’re automatically included!

With AutoMapper’s LINQ projection capabilities, we eliminate any possibility of lazy loading or SELECT N+1 problems in the future. That, I think, is awesome.


Successful IoC container usage

Thu, 03/20/2014 - 16:08

Every now and again I hear the meme that IoC containers are bad: they lead to bad developer practices, they’re too complicated, and on and on. IoC containers – like any sharp tool – can be easily abused. Dependency injection, as a concept, is here to stay. Heck, it’s even in Angular.

Good usage of IoC containers goes hand in hand with good OO design. Dependency injection won’t make your design better, but it can enable better design.

So what do I use IoC containers for? First and foremost, dependency injection. If I have a 3rd-party dependency, I’ll inject it. This enables me to swap implementations or isolate that dependency behind a façade. Additionally, if I want to provide different configurations of that component for different environments, dependency injection allows me to modify that behavior without modifying services using that component.

I am, however, very judicious in my facades. I don’t wrap 3rd-party libraries the way a Repository wraps your DbContext or ISession. If a library needs simplification or unification (the Adapter pattern), that’s where wrapping the dependency helps.

I also don’t create deep compositional graphs. I don’t get stricken with service-itis, where every function has to have an IFooService and FooService implementation.

Instead, I focus on capturing concepts in my application. In one I’m looking at, I have concepts for:

  • Queries
  • Commands
  • Validators
  • Notifications
  • Model binders
  • Filters
  • Search providers
  • PDF document generators
  • REST document readers/writers

Each of these has multiple implementers of a common interface, often a generic interface (see the sketch after this list). These are all examples of good OO design – the behavioral patterns, including:

  • Chain of responsibility
  • Command
  • Mediator
  • Strategy
  • Visitor
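To make “concepts as generic interfaces” concrete, the queries concept might be captured with something like this – illustrative names, not the actual code:

// A query is a message describing what to fetch...
public interface IQuery<out TResult> { }

// ...and a handler knows how to execute one kind of query.
public interface IQueryHandler<in TQuery, out TResult>
    where TQuery : IQuery<TResult>
{
    TResult Handle(TQuery query);
}

The container’s job is then simply to find all the closed implementations of IQueryHandler<,> and wire them up.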

I strive to find concepts in my system and build abstractions around those concepts. The IModelBinderProvider interface, for example, is a chain of responsibility implementation: we have the concept of providing a model binder based on inputs, with each provider deciding whether or not to provide a model binder.
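A minimal sketch of that chain, using the System.Web.Mvc IModelBinderProvider shape (the composing class here is hypothetical):

public class ModelBinderProviderChain
{
    private readonly IEnumerable<IModelBinderProvider> _providers;

    public ModelBinderProviderChain(IEnumerable<IModelBinderProvider> providers)
    {
        _providers = providers;
    }

    public IModelBinder GetBinder(Type modelType)
    {
        // The first provider that opts in wins; a null return passes
        // the request down the chain.
        return _providers
            .Select(provider => provider.GetBinder(modelType))
            .FirstOrDefault(binder => binder != null);
    }
}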

The final usage is around lifecycle/lifetime management. This is much easier if you have a container and ecosystem that provide explicit scoping using child/nested containers. Web API, for example, has an “IDependencyScope” which acts as a composition root for each request. I either have singleton components, composition-root-scoped components (like your DbContext/ISession), or resolve-scoped components (instantiated once per call to Resolve).
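As a sketch, a composition root over a container that supports nested containers might look like this (StructureMap-flavored, shown as an assumption rather than actual Web API integration code):

public class NestedContainerDependencyScope : IDependencyScope
{
    private readonly IContainer _container; // one nested container per request

    public NestedContainerDependencyScope(IContainer container)
    {
        _container = container;
    }

    public object GetService(Type serviceType)
    {
        return _container.TryGetInstance(serviceType);
    }

    public IEnumerable<object> GetServices(Type serviceType)
    {
        return _container.GetAllInstances(serviceType).Cast<object>();
    }

    public void Dispose()
    {
        // Disposing the nested container disposes its scoped components
        // (your DbContext/ISession) along with it.
        _container.Dispose();
    }
}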

Ultimately, successful container usage comes down to proper OO, limiting abstractions, and focusing on concepts. Composition can be achieved in many forms – often supported directly in the language, such as pattern matching or mixins – but no language has it perfect, so being able to rely on dependency injection without a lot of fuss can be extremely powerful.


Avoid many-to-many mappings in ORMs

Wed, 03/12/2014 - 15:30

Going through and reviewing the Contoso University codebase, really to get caught up on EF 6 features, I found a relationship between two tables that resulted in a many-to-many mapping. We have these tables:

[Diagram: Course and Person tables linked through the CourseInstructor junction table]

A Course can have many Instructors, and a Person (Instructor) can have many Courses. The EF code-first mapping for this relationship looks like:

modelBuilder.Entity<Course>()
    .HasMany(c => c.Instructors).WithMany(i => i.Courses)
    .Map(t => t.MapLeftKey("CourseID")
        .MapRightKey("InstructorID")
        .ToTable("CourseInstructor"));

The NHibernate mapping would look similar, with a .ManyToMany() mapping on one or both sides of the relationship. From the Course and Person entities, I treat the one-to-many direction as a normal collection:

public class Course
{
    /* Blah blah blah */
    public virtual ICollection<Instructor> Instructors { get; set; }
}

public class Instructor : Person
{
    /* Blah blah blah */
    public virtual ICollection<Course> Courses { get; set; }
}

From each direction, we don’t ever interact with the junction table; we just follow the relationship from each side as if it were a one-to-many. There are a few reasons why I don’t like this sort of modeling. Many-to-many relationships are normal in databases, but in my entity model I don’t like treating these relationships as if the junction table doesn’t exist. Some of the reasons include:

  • No place to put behavior/data concerning the relationship. There is no CourseInstructor class to add properties to
  • In order to navigate a direction, I have to query through the originating entity, instead of starting with the junction table
  • It’s not obvious as a developer that the many-to-many relationship exists – I have to look and compare both sides to understand the relationship
  • The queries that result in this model often don’t line up to the SQL I would have written myself

For these reasons, I instead always start with my junction tables modeled explicitly:

public class CourseInstructor
{
    public virtual int CourseID { get; set; }
    public virtual int InstructorID { get; set; }
    public virtual Course Course { get; set; }
    public virtual Instructor Instructor { get; set; }
}
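The EF code-first mapping for the explicit junction entity then becomes a composite key over the two foreign keys – a sketch, assuming the key properties shown above:

modelBuilder.Entity<CourseInstructor>()
    .HasKey(ci => new { ci.CourseID, ci.InstructorID })
    .ToTable("CourseInstructor");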

From each side of the relationship, I can decide (or not) to model each direction of this relationship:

public class Course
{
    /* Blah blah blah */
    public virtual ICollection<CourseInstructor> Instructors { get; set; }
}
 
public class Instructor : Person
{
    /* Blah blah blah */
    public virtual ICollection<CourseInstructor> Courses { get; set; }
}

Many times, I’ll even avoid creating the collection properties on my entities, to force myself to decide whether I’m constraining my selection or whether I really need to grab the entities on the other side. I can now build queries like this:

courses = db.CourseInstructors
    .Where(ci => ci.InstructorID == id)
    .Select(ci => ci.Course);

I can skip going through the other side of the many-to-many relationship altogether, and start straight from the junction table. It’s obvious to the developer, and often the ORM itself has an easier time constructing sensible queries.
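The same shape works going the other direction – here’s a sketch of pulling the instructors for a given courseID:

instructors = db.CourseInstructors
    .Where(ci => ci.CourseID == courseID)
    .Select(ci => ci.Instructor);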

I do lose a bit of convenience around pretending the junction table doesn’t exist from the Course and Instructor entities, but I’m happy with the tradeoff of a little convenience for greater flexibility and explicitness.


Efficient querying with LINQ, AutoMapper and Future queries

Tue, 03/11/2014 - 21:31

Even after all these years, I’m still a big fan of ORMs. One common complaint over the years is that people use ORMs identically for both reads and writes. With writes, I want tracked entities, managed relationships, cascades and the like. With reads, I don’t need any of that gunk, just an efficient means of getting a read-only view of my data. If I need to make multiple queries to gather the data, I often wind up with queries that return lots of data over multiple round trips.

We can do better!

Let’s say we have a controller action (taken from the Contoso University Entity Framework sample) that pulls in instructor, course, and enrollment information:

public ActionResult Index(int? id, int? courseID)
{
    var viewModel = new InstructorIndexData();
 
    viewModel.Instructors = db.Instructors
        .Include(i => i.OfficeAssignment)
        .Include(i => i.Courses.Select(c => c.Department))
        .OrderBy(i => i.LastName);
 
    if (id != null)
    {
        ViewBag.InstructorID = id.Value;
        viewModel.Courses = viewModel.Instructors.Where(
            i => i.ID == id.Value).Single().Courses;
    }
 
    if (courseID != null)
    {
        ViewBag.CourseID = courseID.Value;
        viewModel.Enrollments = viewModel.Courses.Where(
            x => x.CourseID == courseID).Single().Enrollments;
    }
 
    return View(viewModel);
}

This doesn’t look so bad at first glance, but what isn’t so obvious here is that this involves four round trips to the database, one for each set of data, plus some wonky lazy loading I couldn’t figure out:

[SQL Profiler: four separate round trips, plus extra lazy-load queries]

We could alter the original query to eagerly fetch those other two items with left outer joins, but that could seriously increase the amount of data returned. Since I’m only interested in one instructor/course at a time here, I don’t really want to pull back all courses and enrollees.

There’s a bigger issue here too – I’m passing a live queryable around, making it possible to modify, iterate and otherwise make a general mess of things. Additionally, I pull back live entities and all entity data – again, more inefficient the wider and larger my tables become. Since the entities could be live, tracked entities, I’d want to be careful not to modify them on the way to the view for reading purposes.

Ideally, I’d hit the database exactly once, for only the data I need and nothing more. This is what I often see people create stored procedures for: building up the exact resultset needed at the database level. First, let’s pull in AutoMapper and create a ViewModel that represents our projected data:

public class InstructorIndexData
{
    public IEnumerable<InstructorModel> Instructors { get; set; }
    public IEnumerable<CourseModel> Courses { get; set; }
    public IEnumerable<EnrollmentModel> Enrollments { get; set; }

    public class InstructorModel
    {
        public int ID { get; set; }

        [Display(Name = "Last Name")]
        public string LastName { get; set; }
            
        [Display(Name = "First Name")]
        public string FirstMidName { get; set; }

        [DisplayFormat(DataFormatString = "{0:yyyy-MM-dd}", ApplyFormatInEditMode = true)]
        [Display(Name = "Hire Date")]
        public DateTime HireDate { get; set; }

        public string OfficeAssignmentLocation { get; set; }

        public IEnumerable<InstructorCourseModel> Courses { get; set; } 
    }

    public class InstructorCourseModel
    {
        public int CourseID { get; set; }
        public string Title { get; set; }
    }

    public class CourseModel
    {
        public int CourseID { get; set; }
        public string Title { get; set; }
        public string DepartmentName { get; set; }
    }

    public class EnrollmentModel
    {
        [DisplayFormat(NullDisplayText = "No grade")]
        public Grade? Grade { get; set; }
        public string StudentLastName { get; set; }
        public string StudentFirstMidName { get; set; }
        public string StudentFullName
        {
            get
            {
                return StudentLastName + ", " + StudentFirstMidName;
            }
        }
    }
}

We can flatten many members out (Department.Name to DepartmentName). Next, let’s modify our controller action to project with LINQ and AutoMapper:

public ActionResult Index(int? id, int? courseID)
{
    var viewModel = new InstructorIndexData();

    viewModel.Instructors = db.Instructors
        .Project().To<InstructorIndexData.InstructorModel>()
        .OrderBy(i => i.LastName);

    if (id != null)
    {
        ViewBag.InstructorID = id.Value;
        viewModel.Courses = db.Instructors
            .Where(i => i.ID == id)
            .SelectMany(i => i.Courses)
            .Project().To<InstructorIndexData.CourseModel>();
    }

    if (courseID != null)
    {
        ViewBag.CourseID = courseID.Value;
        viewModel.Enrollments = db.Enrollments
            .Where(x => x.CourseID == courseID)
            .Project().To<InstructorIndexData.EnrollmentModel>();
    }

    return View(viewModel);
}

Finally, we’ll need to configure AutoMapper to build mapping definitions for these types:

Mapper.Initialize(cfg =>
{
    cfg.CreateMap<Instructor, InstructorIndexData.InstructorModel>();
    cfg.CreateMap<Course, InstructorIndexData.InstructorCourseModel>();
    cfg.CreateMap<Course, InstructorIndexData.CourseModel>();
    cfg.CreateMap<Enrollment, InstructorIndexData.EnrollmentModel>();
});

With these changes, our SQL has improved (somewhat) in reducing the data returned to only what I have in my view models:

exec sp_executesql N'SELECT 
    [Extent1].[CourseID] AS [CourseID], 
    [Extent1].[Grade] AS [Grade], 
    [Extent2].[LastName] AS [LastName], 
    [Extent3].[FirstName] AS [FirstName]
    FROM   [dbo].[Enrollment] AS [Extent1]
    LEFT OUTER JOIN [dbo].[Person] AS [Extent2] ON ([Extent2].[Discriminator] = N''Student'') AND ([Extent1].[StudentID] = [Extent2].[ID])
    LEFT OUTER JOIN [dbo].[Person] AS [Extent3] ON ([Extent3].[Discriminator] = N''Student'') AND ([Extent1].[StudentID] = [Extent3].[ID])
    WHERE [Extent1].[CourseID] = @p__linq__0',N'@p__linq__0 int',@p__linq__0=2042

We’re now only selecting the columns we’re interested in. I’m not an EF expert, so this is about as good as it gets, SQL-wise. EF does, however, recognize we’re using navigation properties, and alters the SQL accordingly with joins.

We’re still issuing three different queries to the server; how can we get them all back at once? We can do this with Future queries, an extension to EF that allows us to gather up multiple queries and execute them all when the first one executes. After pulling in the “EntityFramework.Extended” NuGet package, we only need to add “Future” to the LINQ methods in our controller:

public ActionResult Index(int? id, int? courseID)
{
    var instructors = db.Instructors
        .OrderBy(i => i.LastName)
        .Project().To<InstructorIndexData.InstructorModel>()
        .Future();

    var courses = Enumerable.Empty<InstructorIndexData.CourseModel>();
    var enrollments = Enumerable.Empty<InstructorIndexData.EnrollmentModel>();

    if (id != null)
    {
        ViewBag.InstructorID = id.Value;
        courses = db.Instructors
            .Where(i => i.ID == id)
            .SelectMany(i => i.Courses)
            .Project().To<InstructorIndexData.CourseModel>()
            .Future();
    }

    if (courseID != null)
    {
        ViewBag.CourseID = courseID.Value;
        enrollments = db.Enrollments
            .Where(x => x.CourseID == courseID)
            .Project().To<InstructorIndexData.EnrollmentModel>()
            .Future();
    }

    var viewModel = new InstructorIndexData
    {
        Instructors = instructors.ToList(),
        Courses = courses.ToList(),
        Enrollments = enrollments.ToList()
    };

    return View(viewModel);
}

Which results in all 3 queries getting sent at once to the server in one call:

[SQL Profiler: all three queries batched into a single command]

One other modification I made was to ensure that all projection occurs within the controller action, by calling “ToList” on all the IQueryable/FutureQuery objects. I’d rather not have the view be able to modify the query or otherwise introduce any potential problems.

Now, the SQL generated is…interesting to say the least, but that’s not something I can control here. What has improved is I’m now only returning exactly the data I want, into objects that aren’t tracked by Entity Framework (and thus can’t be accidentally modified and updated through change tracking), and all my data is transferred in exactly one database command. I intentionally left the model/mapping alone so that it was a simple conversion, but I would likely go further to make sure the manner in which I’m querying is as efficient as possible.

AutoMapper’s auto-projection of LINQ queries plus Entity Framework’s FutureQuery extensions lets me be as efficient as possible in querying with LINQ, without resorting to stored procedures.


Reducing NServiceBus Saga load

Thu, 02/27/2014 - 23:36

When presented with concurrency issues with NServiceBus sagas, you’re generally presented with two options:

  • Relax the transaction isolation level
  • Reduce worker thread count, forcing serialized processing of messages

Neither of these is generally a great solution, as neither actually tries to solve the problem of concurrent access to our shared resource (the saga entity). The process manager pattern can be quite powerful for solving asynchronous workflow problems, but it does come with a cost – shared state.

Suppose we had a process that received a batch of operations to perform, and needed to notify a third party when the batch of operations is done. It looks like we need something to keep track of what’s “done” or not, and something to perform the work. Keeping track of work to be done sounds like a good fit for a saga, so our first attempt might look something like this:

[Diagram: batch saga receiving a work-done message from every worker]

Our process will be:

  1. Send message to start batch saga
  2. Send messages to workers for each item of work to be done
  3. Listen for work done messages, check if work done
  4. If work done, send batch done message

The problem with this approach is that we’re creating a shared resource for our work to be done. Even if we do something completely naïve for tracking work:

public class BatchSaga : IContainSagaData {
  public int TotalWork { get; set; }
  public int WorkCompleted { get; set; }
  
  public void Handle(WorkCompleted message) {
    WorkCompleted++;
    
    if (WorkCompleted == TotalWork) {
      Bus.Send(new BatchDone());
      MarkAsComplete();
    }
  }
}

Even if we’re only tracking the count of work completed (or decrementing a counter – it doesn’t matter), the problem is that only one “work done” message can be processed at a time. Our actual work might be isolated, letting us scale out our workers to N nodes, but the notifications of done still have to get back into a single-file line for our saga. Even if we up the worker count on the saga side, modifications to our saga entity must be serialized, done only one at a time. Upping the number of workers on the saga side only leads to concurrency violations, exceptions, and an overall much slower process.

Reduction through elimination

I picture this as a manufacturing facility supervisor. A batch of work comes in, and the supervisor hands out work to workers. Can you imagine if, after each item was completed, the worker sent an email to the supervisor, with their checklist, to notify them they were done? The supervisor would quickly become overwhelmed by the sheer volume of email, to be processed one by one.

We need to eliminate our bottleneck in the supervisor by separating out responsibilities. Currently, our supervisor/saga has two responsibilities:

  1. Keep track of work done
  2. Check if work complete

But doesn’t a worker already know whether its work is done or not? Why does the worker need to notify the supervisor that work is done? What if this responsibility were the worker’s alone?

Let’s see if we can modify our saga to be a little more reasonable. What if we could easily update each item of work individually, separate from the others? I imagine a tally sheet, where each worker can go up to a big whiteboard and check their item off the list. No worker interferes with another, as each is only concerned with its own little box. The saga is responsible for creating the initial board of work, but the workers update themselves.
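Checking off a box can be nothing more than a single-row update – a plain ADO.NET sketch, where the BatchWorkItem table, the message shape, and connectionString are all assumptions:

public class WorkItemHandler : IHandleMessages<DoWork>
{
    public void Handle(DoWork message)
    {
        // ... perform the actual work ...

        // Check off our own box; no other worker touches this row.
        using (var connection = new SqlConnection(connectionString))
        using (var command = connection.CreateCommand())
        {
            command.CommandText =
                "UPDATE BatchWorkItem SET Done = 1 " +
                "WHERE BatchId = @BatchId AND ItemId = @ItemId";
            command.Parameters.AddWithValue("@BatchId", message.BatchId);
            command.Parameters.AddWithValue("@ItemId", message.ItemId);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}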

At this point, our saga starts to look like:

[Diagram: workers check off their own items on a shared tally; the saga only reads it]

Our saga now only checks the sheet, which doesn’t block a worker updating it. Our saga now only reads; it never writes. In this picture, we still get notifications for every single worker, and those still go into a single queue. We can modify our saga slightly: instead of getting notifications from every worker, we register a timeout message. Does the “batch done” message need to go out immediately after the last worker is done, or some time later? If we only need to notify that the batch is done, we can use timeouts instead, and simply poll every so often to check for done-ness.

With timeouts, we greatly reduce network traffic and, potentially, the time between when workers are actually done and when we notify that we’re done. Suppose we have 100K items to send to our workers. That means 100K “Work Done” messages needing to be processed by our saga. How long will that take? Instead, a timeout message can just periodically check done-ness:

[Diagram: the saga polls for done-ness via timeout messages instead of per-worker notifications]
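In code, the polling side might look roughly like this (NServiceBus 4-era saga API; the message types and the tally check are assumptions):

public class BatchSaga : Saga<BatchSagaData>,
    IAmStartedByMessages<StartBatch>,
    IHandleTimeouts<CheckBatchDone>
{
    public void Handle(StartBatch message)
    {
        // Hand the work out to the workers here, then poll instead of
        // listening for every individual "done" message.
        RequestTimeout<CheckBatchDone>(TimeSpan.FromSeconds(30));
    }

    public void Timeout(CheckBatchDone state)
    {
        if (AllItemsCheckedOff()) // a dirty read of the tally sheet is fine here
        {
            Bus.Send(new BatchDone());
            MarkAsComplete();
        }
        else
        {
            RequestTimeout<CheckBatchDone>(TimeSpan.FromSeconds(30));
        }
    }

    private bool AllItemsCheckedOff()
    {
        // Query the tally store for any remaining unfinished items (assumed).
        return false;
    }
}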

We can even relax our constraints and allow dirty reads when checking the work. This is now possible since recording the work and checking the work are two different steps. We’ve also greatly reduced our network load and provided predictability for our SLA on notifying when work is done.

Reducing load

To reduce the load on our saga, we needed to clearly delineate its responsibilities. It’s easy to build a chatty system and not feel the pain when you have small datasets and no network load. Once we start imagining how the real world tackles problems like these, the realities of network computing become much more obvious, and a clear solution presents itself. In this case, a supervisor receiving notifications for everything and keeping track of a giant list just wouldn’t scale.

By going with a less intensive option and trading immediate notification for predictability, we’ve actually increased the accuracy of our system. It’s important to keep in mind the nature of queues as FIFO systems with limited concurrency, the fact that sagas hold a shared resource, and what this implies for the workflows and business processes you model.


Converting AutoMapper to a Portable Class Library

Thu, 02/20/2014 - 01:11

In the early days of AutoMapper, I got requests to support other platforms (mainly Silverlight). I didn’t have any experience with those platforms, but I thought it would be easy. It was not. I wound up with a picture like this:

[Screenshot: solution with a separate AutoMapper project per platform]

A separate project per platform, using tricks like “Add as Link” to have multiple projects share the same source files. The problem with this approach was that I had to remember to add files across all projects, and some features weren’t available on some platforms. I resorted to compiler directives and, ultimately, dropped support for the other platforms because it slowed me down.

I gave up on supporting multiple platforms until Portable Class Libraries entered the picture. Converting AutoMapper to a PCL was a bit of a journey, but a worthwhile one, as I now get to support, with one library:

  • .NET 4
  • Silverlight 5
  • Windows Phone 8
  • Metro (whatever it’s called now)
  • Android
  • iOS

And even more important, now that I’ve made the journey, new features/refactorings/whatever are much, much easier, as I now have one library to work from.

Well, almost. We’ll get back to that. First, let’s look at the journey for going PCL.

Initial conversion

My first step was to simply understand what was supported and what wasn’t. I started by converting the existing library to PCL, and just see what didn’t work.

PCLs work by creating subset targeting libraries for different platform choices, and using type forwarding at runtime to direct calls to the correct platform API. But not every feature is supported on every platform, so you’ll find different profiles created for the different combinations of platforms you select, representing the common set of APIs. Don’t have the API supported on all of your platforms? It’s not available to you. The profiles have figured out the hard part of what is common across all the APIs, and exposed it directly for you.

At first, I had a lot of compile errors. Things that weren’t available across my initial set of profiles I supported included:

  • Anything System.Data
  • Concurrent collections
  • Random other types (HashSet, TypeDescriptor, etc)
  • Reflection.Emit
  • Expression tree compilation (!)
  • Random overloads (not all type metadata method overloads exist, just the most parameter-ful)

The last one was easy: the BCL team didn’t port every overload of every method – that’s time better spent exposing other types. Remember, just because a type exists on all platforms doesn’t mean it’s supported in a PCL; the BCL team has to expose it in their profile-specific assembly.

After the easy fixes, I had a choice. I wanted to keep all the features and performance for AutoMapper’s major user base – .NET 4 – but how to expose that behavior? I had two options:

  • Feature-specific extensions
  • Platform-specific extensions

I decided to go with the latter, as the current usage scenario is just “install-package AutoMapper” and go. Luckily, NuGet supports PCLs out of the box, so I can create a single NuGet package that includes platform-specific assemblies.

Platform-specific assemblies through dependency injection

Looking through the different unsupported objects in my base portable class library, I saw two main types:

  • Behavior that exists, but accomplishable in a different way
  • Behavior that does not exist and is not possible

One example is something like a ConcurrentDictionary – sure, that doesn’t exist in some platforms, but does my core AutoMapper assembly need to depend directly on ConcurrentDictionary? Or can it use an abstraction? In my case, I wound up creating abstractions for different implementations, like:

  • Lazy<T>
  • ReaderWriterLockSlim
  • IDictionary<TKey, TValue>

And then each of these would have an interface for a factory:

public interface INullableConverterFactory
{
  INullableConverter Create(Type nullableType);
}

A platform-specific library could then provide an override implementation for that type:

public class NullableConverterFactoryOverride : INullableConverterFactory
{
  public INullableConverter Create(Type nullableType)
  {
    return new NullableConverterImpl(new NullableConverter(nullableType));
  }
}
Finally, I needed a way to load the correct platform-specific implementation at runtime. I used a technique found in the PCL Contrib project called the “Probing Adapter Resolver”. A class that needs a platform-specific implementation asks a common resolver for one:

public class EnumMapper : IObjectMapper
{
  private static readonly INullableConverterFactory NullableConverterFactory =
    PlatformAdapter.Resolve<INullableConverterFactory>();

  // ...
}

At runtime, the probing resolver scans for assemblies named “AutoMapper.Xyz” where “Xyz” is a known platform name (NET4, SL4, WP8 etc.). If there is an implementation of that interface in that platform-specific assembly, I’ll use that one. Otherwise, I fall back on what is defined in AutoMapper.
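A simplified sketch of how such a probing resolver can work – illustrative, not the actual PCL Contrib code, and with locking omitted for brevity:

public static class PlatformAdapter
{
    private static readonly string[] KnownPlatforms = { "Net4", "SL4", "WP8" };
    private static readonly IDictionary<Type, object> Cache = new Dictionary<Type, object>();

    public static T Resolve<T>()
    {
        object implementation;
        if (!Cache.TryGetValue(typeof(T), out implementation))
        {
            implementation = Probe(typeof(T));
            Cache[typeof(T)] = implementation;
        }
        return (T)implementation;
    }

    private static object Probe(Type interfaceType)
    {
        foreach (var platform in KnownPlatforms)
        {
            try
            {
                var assembly = Assembly.Load(new AssemblyName("AutoMapper." + platform));

                // Convention: INullableConverterFactory -> NullableConverterFactoryOverride
                var typeName = interfaceType.Namespace + "."
                    + interfaceType.Name.Substring(1) + "Override";

                var overrideType = assembly.GetType(typeName);
                if (overrideType != null)
                    return Activator.CreateInstance(overrideType);
            }
            catch (Exception)
            {
                // Platform assembly not present; keep probing.
            }
        }
        return null; // fall back to the default implementation in AutoMapper
    }
}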

Because NuGet only installs specific assemblies for your platform, I can make sure that for .NET 4, only AutoMapper.dll and AutoMapper.Net4.dll are referenced, even though all the platform assemblies are in the NuGet package.

This also means that I can supply additional features for platforms that support them. .NET 4 supports IDataReader, so that feature is supported in that platform, discovered dynamically at runtime. The resolver caches those probes, so it’s fast once the correct override type is resolved. It also means that I can supply different strategies based on what’s available. Windows Phone 8 doesn’t support Reflection.Emit or compiling expression trees, so it gets a slower reflection-based implementation of mapping to classes.

I got all of this up and running, compiling, tested, and released in AutoMapper 3.0. Then a fun (ha) exception started cropping up on the mailing list and in GitHub issues.

MSBuild, indirect dependencies and you

Those that have used ELMAH or NHibernate might have run into this issue already. Suppose we have the following solution structure:

  • Core
    • Elmah
  • UI
    • Core

My Core project references Elmah, and my UI project references Core (but not Elmah). Nothing in Core actually uses Elmah; instead, it’s the Web.config in the UI project that configures it. What you’ll find is that even though you’ve explicitly referenced Elmah in Core, because your Core assembly doesn’t actually link to it, MSBuild will not copy the Elmah assembly over to the UI project’s output.

For AutoMapper, this meant that because the platform-specific assemblies were never referenced by user code, and discovered dynamically at runtime, the possibility existed that MSBuild would not copy the assembly over to your ultimate application’s output folder. I detailed the problem in a post a few months ago, along with a few solutions.

It’s a stupid, annoying problem, but isn’t going away any time soon. My options were to:

  • Instruct users to reference AutoMapper in every project
  • Create a build warning
  • Include the platform-specific assembly as content, to force it to be copied over

I wound up going with the last option. Initially, I did this in your project itself, and you’d see “AutoMapper.Net4.dll” in your project as content – confusing to users, and not always wanted. Instead, I wound up creating an MSBuild hook to do this dynamically, injecting into the build pipeline directly to add the platform-specific assembly as content:

<?xml version="1.0" encoding="utf-8"?>
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <!-- Build sequence modification -->
  <Target Name="CopyAutoMapperAssembly"
          AfterTargets="ResolveAssemblyReferences">
    <CreateItem Include="%(ReferencePath.FullPath)"
                Condition="$([System.String]::new('%(ReferencePath.Filename)').StartsWith('AutoMapper.'))">
      <Output TaskParameter="Include"
              ItemName="_AutoMapperReference"/>
    </CreateItem>

    <Message Importance="low" Text="AutoMapper Platform Extension Assembly: %(_AutoMapperReference.FullPath)"/>

    <ItemGroup>
      <Content Include="%(_AutoMapperReference.FullPath)" Condition="'%(_AutoMapperReference.FullPath)' != ''">
        <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
      </Content>
    </ItemGroup>
  </Target>
</Project>

This method doesn’t modify your project file to include the assembly as content; instead, it dynamically inserts that content item during the build. Your project file doesn’t change, and you don’t see the platform-specific assembly as content in your solution. The only change to your project file is a reference to this new targets file, a common approach NuGet packages use to provide build-time behavior. This change isn’t released yet, but is in the latest pre-release package.

It’s not bullet-proof, but it’s much better than my previous approaches. If you manually delete bin folders, MSBuild can be tricked into thinking nothing has changed and won’t execute this script. It’s a corner case, and one I can live with.

Conclusion

Overall, I’m happy with the overall approach, and it’s made AutoMapper more pluggable. I now have better ways of extending AutoMapper’s behavior, simply because I was forced to. I can target multiple platforms quite easily, and can rely on PCLs to work out the differences.

If you’re developing a reusable library, I’d go PCL from the start, as it’s much easier to do so now than later. I’d be careful about which target platforms to support, as the more platforms you support, the greater the chance some feature doesn’t exist across all of them. But there are ways around this, as I’ve found, and the end result is still light years better than my old approach of just copying a bunch of files around, hoping for the best.
