Should
you return IQueryable<T> from Your Repositories
The internet has been abuzz for a long time with fierce
debates about whether or not you should return IQueryable<T> types from
your repositories. I do have a fairly strong opinion on this issue so I
figured I would weigh in on the discussion. First let's take a look at what
they are talking about. We'll say they are using Entity
Framework for the sake of my examples but they
could just as easily be using LINQ to SQL.
Why would someone want to return IQueryable<T>?
Well it is true that IQueryable<T> is
extremely powerful because any LINQ written against an IQueryable collection
is supposed to be translated into an expression tree which is then translated
into the data store's native language, such as SQL. This means that you only
get back the data you query for rather than having all the data returned and
then filtered down in memory on the server. Take a look at this quick example
of a repository that returns IQueryable.
public interface IBookRepository
{
IQueryable<Book> Books { get; }
void AddBook(Book book);
void DeleteBook(Book book);
void SaveChanges();
}
{
IQueryable<Book> Books { get; }
void AddBook(Book book);
void DeleteBook(Book book);
void SaveChanges();
}
public class BookRepository : IBookRepository
{
private Entities _objectContext;
public IQueryable<Book> Books { get { return _objectContext.Books; } }
{
private Entities _objectContext;
public IQueryable<Book> Books { get { return _objectContext.Books; } }
public BookRepository()
{
_objectContext = new Entities();
}
{
_objectContext = new Entities();
}
public void AddBook(Book book)
{
_objectContext.Books.AddObject(book);
}
{
_objectContext.Books.AddObject(book);
}
public void DeleteBook(Book book)
{
_objectContext.Books.DeleteObject(book);
}
{
_objectContext.Books.DeleteObject(book);
}
public void SaveChanges()
{
_objectContext.SaveChanges();
}
}
{
_objectContext.SaveChanges();
}
}
The
thing people love about this interface is that you can get books however you
want.
IBookRepository bookRepo = new BookRepository();
var booksByRowling = bookRepo.Books.Where(x => x.Author == "J.K. Rowling");
var booksByRowling = bookRepo.Books.Where(x => x.Author == "J.K. Rowling");
You
are in complete control over how the data is retrieved which at first seems
really awesome, but I'll show you in a minute why I don't think it's a good
practice.
Why I don't think you should use IQueryable<T> outside of your repository.
I don't believe that you should return IQueryable for
several reasons. First, it breaks unit testing capabilities. Half the point of
compartmentalizing your data persistence logic into repositories is so that
you can isolate and test that logic. If you're passing the buck up to the
domain logic level then you're no longer able to test data persistence logic
and domain logic by itself. In my opinion it completely breaks separation of
concerns. You can't unit test a property of type IQueryable because
you couldn't possibly account for all the logic possibilities that could be
handed off from your domain logic. The person writing the domain logic is
completely responsible for writing sound queries against the database! That's
just absurd in my humble opinion.
Straight away the inability to properly test a component
of your application should be reason enough to steer clear. However, there is
another major issue that I haven't really seen discussed elsewhere on the web
in relation to this debate. Returning IQueryable from your repository automatically forces the domain
logic to make several assumptions. As I sort of mentioned before the first
assumption is that the domain is responsible for writing query logic. If you
divide up the pieces of your application among different developers then those
developers working on domain logic are forced to write code against that IQueryable interface
in order to retrieve the data they want. The second assumption it makes is
that the data store will ALWAYS know how to properly implement IQueryable. If you
ever try to re-factor your repository layer to use a different data storage
technique then you will likely also have to re-factor your domain logic.
Aside from test-ability the other major reason for
implementing the repository pattern is to create an application seam. The
repository layer is supposed to hide the data persistence code from the rest
of the application. The domain logic is ideally supposed to talk to the
repository layer through interfaces and remain ignorant of what is actually
going on behind those interfaces. This enables you to re-write your interfaces
at any time to use a completely different data persistence technology. By
returning IQueryable you automatically tie your repository layer to any data
persistence technology that properly implements IQueryable. Yes technically you
could implement IQueryable yourself and attempt to translate the LINQ written
against it into something usable for whatever data persistence tech you're
using, but that's a huge commitment and far from fool-proof. If you still
decide to return IQueryable then you may as well resolve yourself to the fact that
you will be stuck using either LINQ to SQL or Entity Framework because as of
this writing I am not aware of any other tech that properly implements IQueryable.
The proper way to do it.
I definitely understand the desire to return such a
useful and powerful type. But for the sake of test-ability and maintainability
I highly advise against it. Instead you should be exposing necessary functions
through your repository interfaces. Make your methods more verbose so that
writing domain logic that uses it is simple and easy. For example, instead of
writing a method called GetBooks write a method called GetBooksByAuthor(string author) and another
called GetBooksByDateRange(DateTime startDate, DateTime endDate), etc. Yes
the method names get larger and yes you will be writing more methods, but when
you think about it from the perspective of maintainability it becomes pretty
clear that it's the right way to go.
Many people have argued that IQueryable<T> makes it so easy to do paging because you can just do var pageOfBooks =
bookRepo.Books.skip(itemsPerPage * (page-1)).Take(itemsPerPage); in your domain logic. It's not that much harder to
write a method on your repository and implement that paging logic there.
public IList<Book> GetBooksByPage(int page, int
itemsPerPage)
{
return _objectContext.Books.Skip(itemsPerPage * (page-1)).Take(itemsPerPage).ToList();
}
{
return _objectContext.Books.Skip(itemsPerPage * (page-1)).Take(itemsPerPage).ToList();
}
You
still get the benefit of having that paging executed in the data store so
you're not returning superfluous rows that you're not even going to use, but
it's compartmentalized in a repository method so that it's hidden from the
rest of your application.
The counter-argument to this is a scenario where you
need a page of results but you also need the count for how many items there
are total (usually for calculating the maximum number of pages). To do this
people often think they need two methods. One method to get the paged result
set and another method to get the item count. If you did have to use two
methods for that it does seem a little silly, but you don't. See
my post about how to properly return a paged
result set.
A related discussion is whether or not you should return
IEnumerable<T> or IList<T>. Personally
I think it really depends. IEnumerable is usually used for lazy loading, which means that the collection isn't populated until
you iterate over it (such as in a foreach loop or by using certain LINQ methods like ToList()). This means that if you loop over data in an MVC view
then the data is being retrieved at the time it's being rendered to the view.
There are definitely edge-cases where this is useful, such as when accessing a
navigation property on an entity object, but it's not really useful in the
verbose repository pattern that I'm promoting. This is because I like my
repository methods to be very specific which almost always means that whatever
data I'm getting back is going to be used because I'm calling specific methods
to get specifically what I want, instead of calling generic methods and then
filtering through the results. So I always return IList and call the ToList() LINQ method after my query results in my repository
method so that the query is executed and the data is loaded immediately. By
the time my data hits my domain it is already in memory waiting to be
consumed.
The bottom line is that returning IEnumerable is
questionable at best, but returning IQueryable is useless and ruins maintainability and test-ability.
7 Comment(s)
posted 2 months ago
It's
all a matter of opinion. In your scenario your repository is eventually going
to have a very large number of properties and methods as you'll have to go
back to modify your repository every time you want a result in a different
layout or format. Also, if your repository is based on an interface to support
Dependency Injection, your interface is going to be massive. You'll have to
modify your interface, then any fakes that you're using in your tests will
have to implement those new methods and will probably just have
notsupportedexceptions in them.
An
even bigger problem comes when you start to have a more complex object model
in your entities and you need to pull back a large number of entities which
reference a sub entity collection (one to many). if you're always returning
IEnumerable<entity> then you'll actually be returning entire rows of
data when you might only need a couple fields from the entities in question.
An
example (pretend it's a controller action method):
var author = bookRepo.GetAuthorByName("J.K.
Rowling"); // say this returns an Author
var books = author.Books;
return View(books.Select(b => b.Title);
var books = author.Books;
return View(books.Select(b => b.Title);
In
the above, the entire author record is returned, then the entire book records
are returned just to get the name.
Instead
if it were returning IQueryable<Author>, with a slight modification the
select from the database server returns only the book titles in a simpler
query. Less across the wire, less for the SQL server to process.
var bookTitles =
bookRepo.Authors.Where(a => a.Name == "J.K.
Rowling")
.SelectMany(a => a.Books.Select(b => b.Title))
.ToList();
var books = author.Books;
return View(bookTitles);
.SelectMany(a => a.Books.Select(b => b.Title))
.ToList();
var books = author.Books;
return View(bookTitles);
Analyze
the query with sql profiler or the like.
I
agree with the Testing pains or IQueryable, but the flexibility of IQueryable
outweighs that in my opinion.
Just
be sure to have both unit tests using faked/mocked entities along with
integration tests against a test database.
Mattias
posted 6 months ago
Interesting.
I am making my first web api using entity framework these days, and I have
been returning IQueryables from my repositories so far, knowing about the
limitations it puts on my ability to write good tests. The reason is of course
because my controller needs to shape the data (include/exclude fields) based
on the request. Sometimes you want the response to include images, sometimes
you'd just want links to them. Also paging is something I handle outside the
repository by now. I will probably refactor this tomorrow. I can't really
picture it without seeing my code, but I am afraid that, sometimes, when I use
multiple repositories (in a unit of work) to get my job done, I will have to
access the database multiple times. We'll see. :)
posted 7 months ago
Actually
Robert that's why my repository methods are more specific and only return a
page of results.
public IList<Book> GetBooksByPage(int page, int
itemsPerPage)
{
return _objectContext.Books.Skip(itemsPerPage * (page-1)).Take(itemsPerPage).ToList();
}
{
return _objectContext.Books.Skip(itemsPerPage * (page-1)).Take(itemsPerPage).ToList();
}
_objectContext.Books is still IQueryable within the repository so SQL is generated that only
returns a single page of results from the database and then that page of
results is returned from the repository as a list.
Roberto Rodríguez Bouzo
posted 7 months ago
About
the using of Iqueryable or Ienumerable there is concern I think you haven´t
talked about. What happen when you have a database with let´s say 500000
records. If you dont use IQueryable you will be getting all records from the
database in any single query you make. And therefore you app will we really
slow.
Thanks
steve
posted 7 months ago
I
think your post assumes you are doing all the design and development from
scratch. And have a limited number of tables... a perfect world scenario.
As
always, legacy issues come into account, and the current project I am working
on has hundreds of tables and the majority of code is very old.
I
do not have the time nor patience to write an individual repository for every
single object. In this case, a generic Repository<T> that
returns IQueryable<T> works out really nicely.
And
I am still able to implement a fake repository that takes an
IEnumerable<T> in the constructor to unit test my queries
against. Sure it isn't the exact same
thing as pure SQL, but at some point, don't you have to trust the framework is
doing what it is supposed to be doing? Hence why you don't write unit tests
for Math.Add()
You
are allowed to treat those things as reliable/safe.
Custom
repositories per table/object is not a realistic pattern in my experience. If
it is a heavily used application, it isn't maintainable over a significant
length of time. You will be swamped by these individual classes, and when
maintaining that layer is harder than maintaining an ORM layer, what is the
point of even having an ORM anyway? Just write stored procs in custom classes,
it'll be faster. Bleh!
posted 9 months ago
Thanks
for this article. I've been reading a lot about what patterns to use for data
access with LINQ to SQL recently and I really agree with you.
Now
I just need to figure out how to write efficient LINQ to SQL queries with
eager loading :-)
SoftJunk
posted a year ago
I
love this post very much as I always believe repository should not let higher
layers to be linked with database. Thought my repository do have only one
SELECT method, but my service layer sets all kind of where conditions an d
order conditions and paging etc information in different properties of
repository (List, dictionary, string etc).
And
in the select method I generate the expression to pull the data from database.
So I am not in problem if I have to go to different database source, cause I
just have to change the code in repository to interact with database :)