2013年12月24日星期二

Should you return IQueryable from Your Repositories

    Should you return IQueryable<T> from Your Repositories
    Posted a year ago by Alex Ford (edited a year ago)


    The internet has been abuzz for a long time with fierce debates about whether or not you should return IQueryable<T> types from your repositories. I do have a fairly strong opinion on this issue so I figured I would weigh in on the discussion. First let's take a look at what they are talking about. We'll say they are using Entity Framework for the sake of my examples but they could just as easily be using LINQ to SQL.
    Why would someone want to return IQueryable<T>?
    Well it is true that IQueryable<T> is extremely powerful because any LINQ written against an IQueryable collection is supposed to be translated into an expression tree which is then translated into the data store's native language, such as SQL. This means that you only get back the data you query for rather than having all the data returned and then filtered down in memory on the server. Take a look at this quick example of a repository that returns IQueryable.
    public interface IBookRepository
    {
       
    IQueryable<Book> Books { get; }
       
    void AddBook(Book book);
       
    void DeleteBook(Book book);
       
    void SaveChanges();
    }
    public class BookRepository : IBookRepository
    {
       
    private Entities _objectContext;
       
    public IQueryable<Book> Books { get { return _objectContext.Books; } }
    public BookRepository()
        {
            _objectContext =
    new Entities();
        }
    public void AddBook(Book book)
        {
            _objectContext.
    Books.AddObject(book);
        }
    public void DeleteBook(Book book)
        {
            _objectContext.
    Books.DeleteObject(book);
        }
    public void SaveChanges()
        {
            _objectContext.
    SaveChanges();
        }
    }
    The thing people love about this interface is that you can get books however you want.
    IBookRepository bookRepo = new BookRepository();
    var booksByRowling = bookRepo.Books.Where(x => x.Author == "J.K. Rowling");
    You are in complete control over how the data is retrieved which at first seems really awesome, but I'll show you in a minute why I don't think it's a good practice.
    Why I don't think you should use IQueryable<T> outside of your repository.
    I don't believe that you should return IQueryable for several reasons. First, it breaks unit testing capabilities. Half the point of compartmentalizing your data persistence logic into repositories is so that you can isolate and test that logic. If you're passing the buck up to the domain logic level then you're no longer able to test data persistence logic and domain logic by itself. In my opinion it completely breaks separation of concerns. You can't unit test a property of type IQueryable because you couldn't possibly account for all the logic possibilities that could be handed off from your domain logic. The person writing the domain logic is completely responsible for writing sound queries against the database! That's just absurd in my humble opinion.
    Straight away the inability to properly test a component of your application should be reason enough to steer clear. However, there is another major issue that I haven't really seen discussed elsewhere on the web in relation to this debate. Returning IQueryable from your repository automatically forces the domain logic to make several assumptions. As I sort of mentioned before the first assumption is that the domain is responsible for writing query logic. If you divide up the pieces of your application among different developers then those developers working on domain logic are forced to write code against that IQueryable interface in order to retrieve the data they want. The second assumption it makes is that the data store will ALWAYS know how to properly implement IQueryable. If you ever try to re-factor your repository layer to use a different data storage technique then you will likely also have to re-factor your domain logic.
    Aside from test-ability the other major reason for implementing the repository pattern is to create an application seam. The repository layer is supposed to hide the data persistence code from the rest of the application. The domain logic is ideally supposed to talk to the repository layer through interfaces and remain ignorant of what is actually going on behind those interfaces. This enables you to re-write your interfaces at any time to use a completely different data persistence technology. By returning IQueryable you automatically tie your repository layer to any data persistence technology that properly implements IQueryable. Yes technically you could implement IQueryable yourself and attempt to translate the LINQ written against it into something usable for whatever data persistence tech you're using, but that's a huge commitment and far from fool-proof. If you still decide to return IQueryable then you may as well resolve yourself to the fact that you will be stuck using either LINQ to SQL or Entity Framework because as of this writing I am not aware of any other tech that properly implements IQueryable.
    The proper way to do it.
    I definitely understand the desire to return such a useful and powerful type. But for the sake of test-ability and maintainability I highly advise against it. Instead you should be exposing necessary functions through your repository interfaces. Make your methods more verbose so that writing domain logic that uses it is simple and easy. For example, instead of writing a method called GetBooks write a method called GetBooksByAuthor(string author) and another called GetBooksByDateRange(DateTime startDate, DateTime endDate), etc. Yes the method names get larger and yes you will be writing more methods, but when you think about it from the perspective of maintainability it becomes pretty clear that it's the right way to go.
    Many people have argued that IQueryable<T> makes it so easy to do paging because you can just do var pageOfBooks = bookRepo.Books.skip(itemsPerPage * (page-1)).Take(itemsPerPage); in your domain logic. It's not that much harder to write a method on your repository and implement that paging logic there.
    public IList<Book> GetBooksByPage(int page, int itemsPerPage)
    {
       
    return _objectContext.Books.Skip(itemsPerPage * (page-1)).Take(itemsPerPage).ToList();
    }
    You still get the benefit of having that paging executed in the data store so you're not returning superfluous rows that you're not even going to use, but it's compartmentalized in a repository method so that it's hidden from the rest of your application.
    The counter-argument to this is a scenario where you need a page of results but you also need the count for how many items there are total (usually for calculating the maximum number of pages). To do this people often think they need two methods. One method to get the paged result set and another method to get the item count. If you did have to use two methods for that it does seem a little silly, but you don't. See my post about how to properly return a paged result set.
    A related discussion is whether or not you should return IEnumerable<T> or IList<T>. Personally I think it really depends. IEnumerable is usually used for lazy loading, which means that the collection isn't populated until you iterate over it (such as in a foreach loop or by using certain LINQ methods like ToList()). This means that if you loop over data in an MVC view then the data is being retrieved at the time it's being rendered to the view. There are definitely edge-cases where this is useful, such as when accessing a navigation property on an entity object, but it's not really useful in the verbose repository pattern that I'm promoting. This is because I like my repository methods to be very specific which almost always means that whatever data I'm getting back is going to be used because I'm calling specific methods to get specifically what I want, instead of calling generic methods and then filtering through the results. So I always return IList and call the ToList() LINQ method after my query results in my repository method so that the query is executed and the data is loaded immediately. By the time my data hits my domain it is already in memory waiting to be consumed.
    The bottom line is that returning IEnumerable is questionable at best, but returning IQueryable is useless and ruins maintainability and test-ability.

    7 Comment(s)


    posted 2 months ago
    It's all a matter of opinion. In your scenario your repository is eventually going to have a very large number of properties and methods as you'll have to go back to modify your repository every time you want a result in a different layout or format. Also, if your repository is based on an interface to support Dependency Injection, your interface is going to be massive. You'll have to modify your interface, then any fakes that you're using in your tests will have to implement those new methods and will probably just have notsupportedexceptions in them.
    An even bigger problem comes when you start to have a more complex object model in your entities and you need to pull back a large number of entities which reference a sub entity collection (one to many). if you're always returning IEnumerable<entity> then you'll actually be returning entire rows of data when you might only need a couple fields from the entities in question.
    An example (pretend it's a controller action method):
    var author = bookRepo.GetAuthorByName("J.K. Rowling"); // say this returns an Author
    var books = author.Books;
    return View(books.Select(b => b.Title);
    In the above, the entire author record is returned, then the entire book records are returned just to get the name.
    Instead if it were returning IQueryable<Author>, with a slight modification the select from the database server returns only the book titles in a simpler query. Less across the wire, less for the SQL server to process.
    var bookTitles = bookRepo.Authors.Where(a => a.Name == "J.K. Rowling")
    .
    SelectMany(a => a.Books.Select(b => b.Title))
    .
    ToList();
    var books = author.Books;
    return View(bookTitles);
    Analyze the query with sql profiler or the like.
    I agree with the Testing pains or IQueryable, but the flexibility of IQueryable outweighs that in my opinion.
    Just be sure to have both unit tests using faked/mocked entities along with integration tests against a test database.
    Mattias
    posted 6 months ago
    Interesting. I am making my first web api using entity framework these days, and I have been returning IQueryables from my repositories so far, knowing about the limitations it puts on my ability to write good tests. The reason is of course because my controller needs to shape the data (include/exclude fields) based on the request. Sometimes you want the response to include images, sometimes you'd just want links to them. Also paging is something I handle outside the repository by now. I will probably refactor this tomorrow. I can't really picture it without seeing my code, but I am afraid that, sometimes, when I use multiple repositories (in a unit of work) to get my job done, I will have to access the database multiple times. We'll see. :)

    posted 7 months ago
    Actually Robert that's why my repository methods are more specific and only return a page of results.
    public IList&lt;Book&gt; GetBooksByPage(int page, int itemsPerPage)
    {
       
    return _objectContext.Books.Skip(itemsPerPage * (page-1)).Take(itemsPerPage).ToList();
    }
    _objectContext.Books is still IQueryable within the repository so SQL is generated that only returns a single page of results from the database and then that page of results is returned from the repository as a list.
    Roberto Rodríguez Bouzo
    posted 7 months ago
    About the using of Iqueryable or Ienumerable there is concern I think you haven´t talked about. What happen when you have a database with let´s say 500000 records. If you dont use IQueryable you will be getting all records from the database in any single query you make. And therefore you app will we really slow.
    Thanks
    steve
    posted 7 months ago
    I think your post assumes you are doing all the design and development from scratch. And have a limited number of tables... a perfect world scenario.
    As always, legacy issues come into account, and the current project I am working on has hundreds of tables and the majority of code is very old.
    I do not have the time nor patience to write an individual repository for every single object. In this case, a generic Repository&lt;T&gt; that returns IQueryable&lt;T&gt; works out really nicely.
    And I am still able to implement a fake repository that takes an IEnumerable&lt;T&gt; in the constructor to unit test my queries against. Sure it isn't the exact same thing as pure SQL, but at some point, don't you have to trust the framework is doing what it is supposed to be doing? Hence why you don't write unit tests for Math.Add()
    You are allowed to treat those things as reliable/safe.
    Custom repositories per table/object is not a realistic pattern in my experience. If it is a heavily used application, it isn't maintainable over a significant length of time. You will be swamped by these individual classes, and when maintaining that layer is harder than maintaining an ORM layer, what is the point of even having an ORM anyway? Just write stored procs in custom classes, it'll be faster. Bleh!
    posted 9 months ago
    Thanks for this article. I've been reading a lot about what patterns to use for data access with LINQ to SQL recently and I really agree with you.
    Now I just need to figure out how to write efficient LINQ to SQL queries with eager loading :-)
    SoftJunk
    posted a year ago
    I love this post very much as I always believe repository should not let higher layers to be linked with database. Thought my repository do have only one SELECT method, but my service layer sets all kind of where conditions an d order conditions and paging etc information in different properties of repository (List, dictionary, string etc).
    And in the select method I generate the expression to pull the data from database. So I am not in problem if I have to go to different database source, cause I just have to change the code in repository to interact with database :)