EF custom queries

Jan 1, 2014 at 12:15 PM
Custom queries are a very important feature, because some complex queries cannot be expressed in LINQ, or a specific DB has custom query tags/hints/notes that deviate from the SPARQL standard.

I played a bit with running custom queries on a context and found various issues, so I'll first describe how I would like it to behave.


Consider an entity of type Person (Id, Name, Age), like the one in the unit tests.

1 . load list of persons(lazy load):
var selectQuery = @"
select ?id
where {
        ?id a <http://www.example.org/schema/Person>
}
";
context.ExecuteQuery<Person>(selectQuery);
2 . load list of persons(eager load):
var selectQuery = @"
select ?id ?name ?age
where {
       ....
}
";
context.ExecuteQuery<Person>(selectQuery, lazyLoad: false);
// Since there are more select variables and each variable name corresponds to a Person
// property, I would expect the mapping to work by convention; this should also work for
// anonymous types if the variable and property names match.

3 . same as 2 but with mapping.
This is useful when the variable names don't match the property names
context.ExecuteQuery<Person>(selectQuery, variableToPropertyMapping);
4 . load list of persons(eager load), but as triples:
var selectQuery = @"
select ?s ?p ?o
where {
        ?s ?p ?o .
        ?s a <http://www.example.org/schema/Person>
}";
context.ExecuteQuery<Person>(selectQuery, lazyLoad: false, loadAsTriples: true);
// In this case the results are similar to a CONSTRUCT but created with a SELECT.

5 . bind a query result to whatever type I want, independently of query execution:
Consider a query that returns triples for more than one type (load more data in one call)
var queryResult = context.ExecuteQuery(selectQuery, loadAsTriples: true);
var persons = context.BindResultAs<Person>(queryResult);
// or: persons = queryResult.As<Person>();
var customers = context.BindResultAs<Customer>(queryResult);
6 . It would be nice to have some unit tests or docs examples of custom queries.
7 . Nice to have for the future: allow injecting custom expressions inside LINQ expressions, so we can have type-safe queries with some custom parts.
Consider that I want to limit some queries to a maximum execution time and the DB allows this. If I can't inject this into a LINQ query in some way, then all those queries need to be custom strings instead.

For example:

db:info db:maxQueryExecutionTime 20s
select ?s
where{
db:info db:whereOptimizer Value.
?s a _person.

}

I tried to work with a custom query in code, but "context.ExecuteQuery<Person>" accepts a "SparqlQueryContext" whose constructor takes lots of parameters that are unnecessary when the type is already defined in the context:
var queryContext = new SparqlQueryContext(
    @"select *
where {
        ?s a <http://www.example.org/schema/Person>
}",
    new List<Tuple<string, string>>(),
    null,
    new List<string>(),
    new List<Tuple<MemberInfo, string>>(),
    null);
var results = context.ExecuteQuery<Person>(queryContext);
Case 1 works OK with the code above, but would look much better as
"context.ExecuteQuery<Person>(selectQuery)".

I don't think cases 2, 3, 4 and 5 are possible at the moment, at least not in an elegant way.
Coordinator
Jan 2, 2014 at 5:58 PM
feugen24 wrote:
1 . load list of persons(lazy load):
I think this could be quite easy to add support for - the requirement would be a query that results in a single column, result values would be expected to be URIs and the results are just data bound as normal. Basically this could provide a wrapper around the current context.ExecuteQuery<T> method that creates the SparqlQueryContext for you.
2 . load list of persons(eager load):
This is more complicated because in the general case you will have a much more complex structure than just an entity with two literal properties. The most important issue to consider is entities that have collection properties (e.g. if the Person class had a Friends property). In that case a SELECT query won't really cut it because of the way you get a row for every possible solution - add two or three collection properties and you quickly end up with an explosion in the number of rows in the query results. A better approach would be to require eager loading queries to CONSTRUCT a graph from which the entities can then be loaded. If you do that then the requirement could be that your query CONSTRUCTs a graph that contains all of the entities and properties you want to load and includes some special resource (with a well-known identifier) that all of the result entities are connected to in some way. Note that it is not enough to rely on the rdf:type of the entity because you could have entities of that type that are not the direct results of the query, but are in fact just property values for the entities that are the results of the query (think again of the Friends property as an example).
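The row explosion described above can be made concrete with a little arithmetic. The following is only an illustrative sketch: the class and method names are made up for this example, the collection sizes are hypothetical, and it assumes every collection property is non-empty and bound to its own variable in the SELECT.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowExplosion
{
    // Rows a SELECT returns for ONE entity when each collection property is
    // bound to its own variable: the product of the collection sizes.
    // Assumes every collection is non-empty.
    public static long SelectRowCount(IEnumerable<int> collectionSizes) =>
        collectionSizes.Aggregate(1L, (rows, size) => rows * size);

    // Triples a CONSTRUCT needs to carry the same collection values:
    // one triple per value, so the sum of the collection sizes.
    public static long ConstructTripleCount(IEnumerable<int> collectionSizes) =>
        collectionSizes.Sum(size => (long)size);

    public static void Main()
    {
        // Hypothetical Person with 10 friends, 5 email addresses and 4 phone numbers.
        var sizes = new[] { 10, 5, 4 };
        Console.WriteLine($"SELECT rows:       {SelectRowCount(sizes)}");
        Console.WriteLine($"CONSTRUCT triples: {ConstructTripleCount(sizes)}");
    }
}
```

For this one entity the SELECT form yields 200 rows, while the CONSTRUCT form carries the same values in 19 triples.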

3 . same as 2 but with mapping.
Same issue with collections as (2).

4 . load list of persons(eager load), but as triples:
This is a possibility I suppose, but in practice it doesn't gain you much over the CONSTRUCT form of the query. I guess one reason for supporting it might be to support a SPARQL endpoint that only allows SELECT queries. One difference here would be that if you limit the values in ?s to only being the IDs of the entities you want in the results set, then it is a bit easier for the results processor to know what to include and what not to include...so maybe it's not a totally crazy idea :-)
5 . bind a query result to whatever type I want, independently of query execution:
Consider a query that returns triples for more than one type (load more data in one call)
var queryResult = context.ExecuteQuery(selectQuery, loadAsTriples: true);
var persons = context.BindResultAs<Person>(queryResult);
// or: persons = queryResult.As<Person>();
var customers = context.BindResultAs<Customer>(queryResult);
Maybe, though I wouldn't necessarily see this as a filter but rather as a way of projecting the same set of results as different kinds of things (so if your query returns 100 entities, both persons and customers would contain 100 results, regardless of what rdf:type the entities have). If you wanted a filter, that could also be added as a client-side LINQ expression:
var onlyCustomers = queryResult.OfType<Customer>();
6 . It would be nice to have some unit tests or docs examples of custom queries.
I agree. I think it would make sense to work some more on what sort of things are feasible, get a good syntax for those worked out, and use them as the basis for creating some unit tests so that we do test-driven development of this stuff.
7 . Nice to have for the future: allow injecting custom expressions inside LINQ expressions, so we can have type-safe queries with some custom parts.
Consider that I want to limit some queries to a maximum execution time and the DB allows this. If I can't inject this into a LINQ query in some way, then all those queries need to be custom strings instead.
I think this might be better handled by tidying up the way in which the context is constructed so that you can create overrides to hook and modify aspects of the SPARQL query creation and execution process.
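One possible shape for such an override hook is sketched below. This is purely hypothetical: none of these class or method names come from BrightstarDB, execution is replaced by a stand-in that just returns the final query text, and the hint line reuses the made-up syntax from the example earlier in the thread.

```csharp
using System;

// Hypothetical base class; a real context would send the query to the store.
public class QueryRunner
{
    // Override point: receives the SPARQL text and returns the text to send.
    protected virtual string PrepareQuery(string sparql) => sparql;

    // Stand-in for execution: returns the final query text so the effect is visible.
    public string Execute(string sparql) => PrepareQuery(sparql);
}

// Prepends a store-specific hint line to every query that passes through it.
public class HintedQueryRunner : QueryRunner
{
    private readonly string _hint;

    public HintedQueryRunner(string hint)
    {
        _hint = hint;
    }

    protected override string PrepareQuery(string sparql) => _hint + "\n" + sparql;
}

public static class HookDemo
{
    public static void Main()
    {
        var runner = new HintedQueryRunner("db:info db:maxQueryExecutionTime 20s");
        Console.WriteLine(runner.Execute("select ?s where { ?s ?p ?o }"));
    }
}
```

With a hook of this kind the query itself could still be produced from a type-safe LINQ expression, and only the store-specific part would be added as text.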
Jan 3, 2014 at 4:34 PM
1 . load list of persons(lazy load):
I think this could be quite easy to add support for - the requirement would be a query that results in a single column, result values would be expected to be URIs and the results are just data bound as normal. Basically this could provide a wrapper around the current context.ExecuteQuery<T> method that creates the SparqlQueryContext for you.
Maybe SparqlQueryContext should be renamed to "SparqlLinqQueryContext", and maybe SparqlLinqQueryContext should derive from SparqlQueryContext. Its current description reads:
Manages the context information required during the processing of an entity framework LINQ query into SPARQL
At the moment it carries a lot of unnecessary stuff for just a query.

2 ,3 .
This is more complicated because in the general case you will have a much more complex structure than just an entity with two literal properties. The most important issue to consider is entities that have collection properties (e.g. if the Person class had a Friends property). In that case a SELECT query won't really cut it because of the way you get a row for every possible solution - add two or three collection properties and you quickly end up with an explosion in the number of rows in the query results. A better approach would be to require eager loading queries to CONSTRUCT a graph from which the entities can then be loaded. If you do that then the requirement could be that your query CONSTRUCTs a graph that contains all of the entities and properties you want to load and includes some special resource (with a well-known identifier) that all of the result entities are connected to in some way. Note that it is not enough to rely on the rdf:type of the entity because you could have entities of that type that are not the direct results of the query, but are in fact just property values for the entities that are the results of the query (think again of the Friends property as an example).
Agreed.

4 . load list of persons(eager load), but as triples:
This is a possibility I suppose, but in practice it doesn't gain you much over the CONSTRUCT form of the query. I guess one reason for supporting it might be to support a SPARQL endpoint that only allows SELECT queries. One difference here would be that if you limit the values in ?s to only being the IDs of the entities you want in the results set, then it is a bit easier for the results processor to know what to include and what not to include...so maybe it's not a totally crazy idea :-)
What is gained over CONSTRUCT is the possibility of returning ordered triples from the server. I think it's the only way I can do that (it's related to the example I gave in the other post).
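To illustrate why the ordering matters, here is a minimal sketch of how a results processor might fold ordered ?s ?p ?o rows into per-entity groups without losing the order the server returned them in. The type and method names are assumptions made for this example only, and rows are modelled as plain string tuples rather than any real result type.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedTriples
{
    // Groups (s, p, o) rows by subject, keeping the subjects in the order in
    // which they first appear in the rows, so a server-side ordering survives.
    public static List<(string Subject, List<(string P, string O)> Properties)> GroupBySubject(
        IEnumerable<(string S, string P, string O)> rows)
    {
        var index = new Dictionary<string, List<(string P, string O)>>();
        var ordered = new List<(string Subject, List<(string P, string O)> Properties)>();
        foreach (var (s, p, o) in rows)
        {
            if (!index.TryGetValue(s, out var props))
            {
                // First time this subject is seen: remember its position.
                props = new List<(string P, string O)>();
                index[s] = props;
                ordered.Add((s, props));
            }
            props.Add((p, o));
        }
        return ordered;
    }

    public static void Main()
    {
        // Hypothetical rows, already ordered by the server.
        var rows = new List<(string S, string P, string O)>
        {
            ("ex:bob", "ex:name", "Bob"),
            ("ex:bob", "ex:age", "30"),
            ("ex:alice", "ex:name", "Alice"),
        };
        foreach (var (subject, properties) in GroupBySubject(rows))
        {
            Console.WriteLine($"{subject}: {properties.Count} properties");
        }
    }
}
```

A graph built by CONSTRUCT is a set of triples with no defined order, which is why the SELECT form is the one that can carry the ordering.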


5 . bind a query result to whatever type I want, independently of query execution:
Consider a query that returns triples for more than one type (load more data in one call)
var queryResult = context.ExecuteQuery(selectQuery, loadAsTriples: true);
var persons = context.BindResultAs<Person>(queryResult);
// or: persons = queryResult.As<Person>();
var customers = context.BindResultAs<Customer>(queryResult);
Maybe, though I wouldn't necessarily see this as a filter but rather as a way of projecting the same set of results as different kinds of things (so if your query returns 100 entities, both persons and customers would contain 100 results, regardless of what rdf:type the entities have). If you wanted a filter, that could also be added as a client-side LINQ expression:
var onlyCustomers = queryResult.OfType<Customer>();
"queryResult.OfType<Customer>()" seems OK. The scenario I was thinking of is loading multiple types of entities (e.g. Customers, Orders, Items) in a query that is in fact a union of 3 subqueries, one for each type. So the query would return triples (with SELECT or CONSTRUCT) for many types. Then I can retrieve all the entities in one call and filter on type.
Jan 6, 2014 at 5:27 PM
I have made some changes related to custom queries. Could I create a fork?
Coordinator
Jan 6, 2014 at 7:02 PM
feugen24 wrote:
I have made some changes related to custom queries. Could I create a fork?
Please do - I pretty much agree with everything you have written above. It would be best if you could fork off from the develop branch. At the moment I am working on restructuring the constructor chain so that you can insert the ISparqlQueryProcessor and ISparqlUpdateProcessor directly from code, and on fixing the bugs in the LINQ stuff that this has thrown up. I haven't started on any of the stuff above yet and I would be happy to review and merge a fork.
Jan 7, 2014 at 11:00 AM
Created pull request.