Entity Framework 6 – Bulk Insert and Returning of Generated Primary Keys

Bit of a code dump today. I was tasked with making database insertions quicker. On a good day we have millions of rows which we inserted using the Entity Framework – which is inefficient and done one row at a time.

I’ve started by working with the library EFUtilities which looked very promising – it uses the Entity Frameworks’s metadata and model to create a temporary table, bulk inserts into it, and copies data to the real table. The following code expand the capabilities of the library:

  • Improved thread-safety.
  • Add mode of ignoring failures – Sometimes we want to insert millions of rows, and we don’t really care about a few invalid ones.
  • Main motivation – Return the generated primary keys (or IDs) of the inserted rows. EFUtilities leaved all IDs as 0, which is very limiting.

Continue reading

Advertisement

Finding intersecting ranges – Small variant using Expression.Constant

Last time we saw how use expression to find intersecting ranges. Here’s another useful overload of the same method – it is probably common to have one of ranges on the entity object (for example – the employee’s vacation), while knowing the other (for example the week of training). In that case, we can revise the implementation a little to use to concrete values:

public static Expression<Func<TEntity, bool>>
    RangesIntersect<TEntity, TComparable>(TComparable from, TComparable to,
                                        Expression<Func<TEntity, TComparable>> getEntityRangeStart,
                                        Expression<Func<TEntity, TComparable>> getEntityRangeEnd)
{
    ParameterExpression newLambdaParameter = Expression.Parameter(typeof(TEntity));
    var rewireStart = RewireLambdaExpression(getEntityRangeStart, newLambdaParameter);
    var rewireEnd = RewireLambdaExpression(getEntityRangeEnd, newLambdaParameter);

    return Expression.Lambda<Func<TEntity, bool>>(
        RangesIntersect(Expression.Constant(from), Expression.Constant(to),
                        rewireStart.Body, rewireEnd.Body),
        newLambdaParameter);
}

public static IQueryable<TEntity> WhereRangesIntersect<TEntity, TComparable>(
    this IQueryable<TEntity> query,
    TComparable range1Start,
    TComparable range1End,
    Expression<Func<TEntity, TComparable>> getEntityRange2Start,
    Expression<Func<TEntity, TComparable>> getEntityRange2End)
{
    return query.Where(RangesIntersect(range1Start, range1End, getEntityRange2Start, getEntityRange2End));
}

Example Use

using (VacationsEntities model = new VacationsEntities())
{
    var trainingStart = new DateTime(2013, 1, 1);
    var trainingEnd = trainingStart.AddDays(7);
    var query = model.Vacations
                     .WhereRangesIntersect(trainingStart,
                                           trainingEnd,
                                           v => v.Start.Value,
                                           v => v.End.Value);
    // ...

Resulting SQL

The Entity Framework chose to convert the two Expression.Constants to date literals instead of using SQL parameters:

SELECT 
[Extent1].[VacationId] AS [VacationId], 
[Extent1].[EmployeeId] AS [EmployeeId], 
[Extent1].[Start] AS [Start], 
[Extent1].[End] AS [End]
FROM [Vacations] AS [Extent1]
WHERE (convert(datetime, '2013-01-01 00:00:00.000', 121) <= [Extent1].[End]) AND
      ([Extent1].[Start] <= convert(datetime, '2013-01-08 00:00:00.000', 121))

Finding intersecting ranges using expression trees and the Entity Framework

Background

We’re using the Entity Framework as ORM. We have entities of Employee and Vacation. Employees may go on vacations, that’s fair. A vacation is a range of dates: start and end.
Our company will hold a week long safety training, so we need to find all employees who will go on vacation and will miss any day of the training.

Or, more generally – we have two ranges, and we want to find only the items where the ranges intersect.

Sounds simple enough but as it turns out, there is no immediate way of creating such a query. Naturally, we are more interested in a general solution, not a where clause that we would have to duplicate each time.

Logic

Given two ranges, we want to return true if the ranges intersect. A simple implementation would be:

bool intersecting = (range1Start <= range2End) &&
                    (range2Start <= range1End);

<= means that we treat our ranges as inclusive at both ends – {1,2} and {2,3} would intersect at the point 2. < would be exclusive at any end or both.

Implementation

The Entity Framework works by translating expression trees into SQL. We can’t write regular code like we would with Linq to Objects, that wouldn’t work: the Entity Framework only understands Expressions:

/// <summary>
/// Get two ranges represented by four expressions, and check if they intersect. Inclusive on both ends,
/// and allow intersection in a single point.
/// </summary>
/// <param name="range1From">Start value of the first range</param>
/// <param name="range1To">End value of the first range</param>
/// <param name="range2From">Start value of the second range</param>
/// <param name="range2To">End value of the second range</param>
/// <returns>A BinaryExpression that checks if the two ranges intersect.</returns>
public static BinaryExpression RangesIntersect(Expression range1From, Expression range1To,
                                               Expression range2From, Expression range2To)
{
    return(Expression.AndAlso(
        Expression.LessThanOrEqual(range1From, range2To),
        Expression.LessThanOrEqual(range2From, range1To)
        ));
}

Not That Easy

Next, we want to use the Queryable.Where overload that takes a predicate of type Expression<Func<TSource, Boolean>>.
My first attempt, which was wrong, was:

public static Expression<Func<TEntity, bool>> RangesIntersect<TEntity, TComparable>(
                        Expression<Func<TEntity, TComparable>> getEntityRange1Start,
                        Expression<Func<TEntity, TComparable>> getEntityRange1End,
                        Expression<Func<TEntity, TComparable>> getEntityRange2Start,
                        Expression<Func<TEntity, TComparable>> getEntityRange2End)
{
    (!) Incorrect code
    return Expression.Lambda<Func<TEntity, bool>>(
        RangesIntersect(getEntityRange1Start.Body, getEntityRange1End.Body,
                        getEntityRange2Start.Body, getEntityRange2End.Body),
        getEntityRange1Start.Parameters);
}

Let’s make a quick test to see how it doesn’t work:

var trainingStart = new DateTime(2013, 1, 1);
var trainingEnd = trainingStart.AddDays(7);
var vacation = new Vacation {Start = new DateTime(2013, 1, 5),
                             End = new DateTime(2013, 1, 10)};

var intersectExpression = RangeIntersection
      .RangesIntersect<Vacation, DateTime?>(
             v => trainingStart, v => trainingEnd,
             v => v.Start, v => v.End);
bool intersect = intersectExpression.Compile()(vacation);

Exception:

variable ‘v’ of type ‘Vacation’ referenced from scope ”, but it is not defined

Had we used the Expression in Linq, we would have gotten the exception:

The parameter ‘v’ was not bound in the specified LINQ to Entities query expression.

This fails because the parameter of the first Expression (which used in getEntityRange1Start.Parameters) is not present in the other three Expressions. It looks like all of them should be using the same v, but they are independent.
To get it to work we need all four expressions to use the same parameter. A quick search finds the handy ExpressionSubstitute class, by Marc Gravell:

/// <summary>
/// Visit an Expression, and replace one sub expression with another.
/// </summary>
/// <remarks> http://stackoverflow.com/a/8611105/7586 </remarks>
class ExpressionSubstitute : ExpressionVisitor
{
    public readonly Expression from, to;
    public ExpressionSubstitute(Expression from, Expression to)
    {
        this.from = from;
        this.to = to;
    }
    public override Expression Visit(Expression node)
    {
        if (node == from) return to;
        return base.Visit(node);
    }
}

Now we can easily change the expressions’ parameters:

public static Expression<Func<TEntity, TReturnType>> RewireLambdaExpression<TEntity, TReturnType>(
    Expression<Func<TEntity, TReturnType>> expression,
    ParameterExpression newLambdaParameter)
{
    var newExp = new ExpressionSubstitute(expression.Parameters.Single(), newLambdaParameter).Visit(expression);
    return (Expression<Func<TEntity, TReturnType>>)newExp;
}

Correct Code

We can finally rewire the expressions, and write the extension method:

public static Expression<Func<TEntity, bool>> RangesIntersect<TEntity, TComparable>(
                                Expression<Func<TEntity, TComparable>> getEntityRange1Start,
                                Expression<Func<TEntity, TComparable>> getEntityRange1End,
                                Expression<Func<TEntity, TComparable>> getEntityRange2Start,
                                Expression<Func<TEntity, TComparable>> getEntityRange2End)
{
    ParameterExpression newLambdaParameter = Expression.Parameter(typeof(TEntity));
    var rewireStart1 = RewireLambdaExpression(getEntityRange1Start, newLambdaParameter);
    var rewireEnd1 = RewireLambdaExpression(getEntityRange1End, newLambdaParameter);
    var rewireStart2 = RewireLambdaExpression(getEntityRange2Start, newLambdaParameter);
    var rewireEnd2 = RewireLambdaExpression(getEntityRange2End, newLambdaParameter);

    return Expression.Lambda<Func<TEntity, bool>>(RangesIntersect(rewireStart1.Body, rewireEnd1.Body,
                                                                  rewireStart2.Body, rewireEnd2.Body),
                                                  newLambdaParameter);
}

public static IQueryable<TEntity> WhereRangesIntersect<TEntity, TComparable>(
    this IQueryable<TEntity> query,
    Expression<Func<TEntity, TComparable>> getEntityRange1Start,
    Expression<Func<TEntity, TComparable>> getEntityRange1End,
    Expression<Func<TEntity, TComparable>> getEntityRange2Start,
    Expression<Func<TEntity, TComparable>> getEntityRange2End)
{
    return query.Where(RangesIntersect(getEntityRange1Start, getEntityRange1End,
                                       getEntityRange2Start, getEntityRange2End));
}

Result

Now we can write Linq queries that look like:

using (VacationsEntities model = new VacationsEntities())
{
    var trainingStart = new DateTime(2013, 1, 1);
    var trainingEnd = trainingStart.AddDays(7);
    var query = model.Vacations
        .WhereRangesIntersect(v => trainingStart, v => trainingEnd,
                              v => v.Start, v => v.End);
    // ...
}

We can also look at the resulting SQL, using ((ObjectQuery<Vacation>)query).ToTraceString():

SELECT 
[Extent1].[VacationId] AS [VacationId], 
[Extent1].[EmployeeId] AS [EmployeeId], 
[Extent1].[Start] AS [Start], 
[Extent1].[End] AS [End]
FROM [Vacations] AS [Extent1]
WHERE (@p__linq__0 <= [Extent1].[End]) AND
      ([Extent1].[Start] <= @p__linq__1)

The expression also works with OData. For example, using Stack Overflow’s endpoint:

var uri = new Uri("http://data.stackexchange.com/stackoverflow/atom");
var entities = new StackOverflowOData.Entities(uri);
var trainingStart = new DateTime(2012, 3, 15);
var trainingEnd = trainingStart.AddDays(7);

var q = entities.Posts
    .WhereRangesIntersect(p => p.CreationDate.Value, p => p.ClosedDate.Value,
                          p => trainingStart, p => trainingEnd).Take(10);

The url filter will be (using q.ToString()):

&$filter=(CreationDate le datetime'2012-03-22T00:00:00') and
         (datetime'2012-03-15T00:00:00' le ClosedDate)

To me, this is has been a nice exercise, and finally gave me a reason to play with expression trees.

See Also