Query return results logic

Dec 28, 2013 at 9:48 AM
Hi,

I've changed DESCRIBE to CONSTRUCT in order to make it work with Take/Skip, but there are parts of the code where I don't understand why they are implemented the way they are.

In SparqlResultDataObjectHelper.binddataobjects method there is
    if (xmlReader.IsStartElement())
    {
        if ("RDF".Equals(xmlReader.LocalName) && RdfNamespace.Equals(xmlReader.NamespaceURI))
        {
            var dataRdfObj = BindRdfDataObjects(xmlReader);
            return dataRdfObj;
        }
    }
    var dataObj = BindDataObjects(xmlReader);
then when creating the query:
new SparqlQueryContext(
                    useDescribe && !_queryBuilder.IsDistinct && !_queryBuilder.IsOrdered
                        ? _queryBuilder.GetSparqlDescribeString()
                        : _queryBuilder.GetSparqlString(),
This means that for IsDistinct or IsOrdered queries it expects only the entity URI ids (one triple per entity/object) and loads the rest later in DataObject (CheckLoaded), so it is lazy loading in this case. But when I just use Skip/Take (without distinct/order), it returns all the triples, so it actually loads them on the spot (BindRdfDataObjects).

This is confusing, because I would expect the same behavior regardless of the query type. I'm thinking of having a flag (lazy loading) on the context/query: when it's true, just use a SPARQL SELECT to get only one triple per entity (and lazy load the rest when a property is read); otherwise load all the entity triples using CONSTRUCT.
That way it would be consistent.
So at this point I'm not sure why and how you expect it to behave (maybe there are some limitations I don't know about).
Coordinator
Dec 28, 2013 at 1:42 PM
DISTINCT queries are excluded because there is no DESCRIBE DISTINCT in SPARQL.
ORDER BY queries are excluded because there is no guarantee of the order of triples in the graph returned by the DESCRIBE so you would end up having to apply the sort on the client side, which just gets complicated.

Because of these limitations (especially for DISTINCT, but also because of the complexity of implementing generic client-side sorting) there is no user control over what is "lazy loaded" vs what is "eager loaded" through a DESCRIBE query.
Dec 29, 2013 at 9:29 AM
Edited Dec 30, 2013 at 12:29 PM
OK, I understand the DESCRIBE limitation, but CONSTRUCT also doesn't preserve ordering (you suggested in another post to use CONSTRUCT instead of DESCRIBE).

The problem is that in the case of ordering/distinct, instead of returning all the data, the server returns an ordered list of ids, and then if a user wants to display this in a UI, the API will issue a query for each person in order to load that entity's triples. So for 50 persons it will do 51 queries.

So I tested a bit to see what a query would look like that allows all of this to be eager loaded:

For a list of persons with name, age - eager loading:
#CONSTRUCT { ?s ?p ?o}
#where{
        SELECT ?s ?p ?o
        where {
                ?s ?p ?o.
                
                #1#{?s a ?o}
                #2#UNION  {?s <http://www.example.org/schema/name> ?o}
                #3#UNION  {?s <http://www.example.org/schema/age> ?o}.
                {
                        # get entities with filters and pagination
                        select distinct ?s ?age
                        where
                        {
                                ?s <http://www.example.org/schema/age> ?age.
                                ?s a <http://www.example.org/schema/Person>.
                        }
                        order by desc(?age)
                        limit 2
                        offset 2
                }
        }
        order by desc(?age)
#}
I'm not sure if un-commenting CONSTRUCT would preserve order on all DBs (it seems to in my tests).
Un-commenting #1,2# would be a projection on name, #1,3# on age.


Or maybe an easier alternative is to add OPTIONAL clauses for each property in case there is no .select projection in the query, so the generated select would look like: "select ?personid ?age ?name"
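
A rough sketch of what that OPTIONAL-based alternative might look like, reusing the example.org schema from the query above. The variable names and the choice of properties are assumptions for illustration, not what the library currently generates.

```sparql
# Sketch only: one row per (person, age, name) combination.
SELECT ?personid ?age ?name
WHERE {
  {
    # Page over the entity ids first, so LIMIT/OFFSET count entities.
    SELECT DISTINCT ?personid ?age
    WHERE {
      ?personid a <http://www.example.org/schema/Person> .
      ?personid <http://www.example.org/schema/age> ?age .
    }
    ORDER BY DESC(?age)
    LIMIT 2
    OFFSET 2
  }
  # A person without a name still produces a row, with ?name unbound.
  OPTIONAL { ?personid <http://www.example.org/schema/name> ?name }
}
ORDER BY DESC(?age)
```

Note that a person with several names would still come back as several rows, so the client has to group rows by ?personid.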
Coordinator
Dec 31, 2013 at 2:51 PM
In general graphs are not ordered, so I think that there is no guarantee that ordering would be preserved and as far as I know there is no requirement that ordering needs to be preserved written into the spec.

One way you could possibly make this more efficient right now would be to query first just for the entity IDs, then use those to create another query that uses that list of IDs in a Contains() LINQ expression. That way you can get your results in two round trips instead of N+1. I have no idea how efficient the query processing on the server would be though...
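
At the SPARQL level, the second of those two round trips could look roughly like the sketch below. It assumes the first query returned two person ids; the two person URIs are made-up placeholders, and whether a Contains() expression is actually translated to VALUES or to a FILTER with IN is an assumption that is not confirmed in this thread.

```sparql
# Second round trip (sketch): fetch all triples for the ids
# returned by the first query. The person URIs are hypothetical.
SELECT ?s ?p ?o
WHERE {
  VALUES ?s {
    <http://www.example.org/people/alice>
    <http://www.example.org/people/bob>
  }
  ?s ?p ?o .
}
```

The rows come back in no particular order, so the client would re-apply the order of the id list from the first query.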

However, in terms of future development, I thought about this some more over the last couple of days, and I think one possible way to support ordering in eager loading scenarios would be to ensure that the sort variable values are separately recorded in the CONSTRUCTed graph using well-known triples. Then on the client side it would be possible to query the results graph to get the actual items in the sort order. It would mean two queries (one on the server side and then another on the client side) but only one round trip.

However, this won't work when using DISTINCT in conjunction with LIMIT and OFFSET, because the DISTINCT in SPARQL applies to all of the SELECT variable bindings, which means that an entity could end up with multiple solutions for the sort variables. If you aren't paging that's not a big deal, since a Distinct() LINQ filter can be applied on the client side. But if you are paging, this changes the number of results, so your OFFSET and LIMIT values can end up out of sync with the actual number of returned entities. So, although I think it might be possible to at least extend eager loading support to sorted queries, I have a feeling that paged sorted queries where you expect each entity to appear only once are going to be a problem. (Not a problem if your data ensures that you only have one possible solution for the sort variables, of course, but in the general case we cannot assume that to be true.)
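
As a minimal sketch of the "well-known triples" idea, the server-side query might record the sort value next to each entity like this. The predicate <http://www.example.org/sort/key> is a hypothetical name chosen for illustration, not something defined by the library.

```sparql
# Sketch: eager load via CONSTRUCT, recording the sort value per entity.
# <http://www.example.org/sort/key> is a hypothetical well-known predicate.
CONSTRUCT {
  ?s ?p ?o .
  ?s <http://www.example.org/sort/key> ?age .
}
WHERE {
  {
    SELECT ?s ?age
    WHERE {
      ?s a <http://www.example.org/schema/Person> .
      ?s <http://www.example.org/schema/age> ?age .
    }
    ORDER BY DESC(?age)
    LIMIT 2
    OFFSET 2
  }
  ?s ?p ?o .
}
```

The client would then run a small SELECT over the returned graph, ordering by the value of the sort-key triple, to recover the sequence. If a person has two age values, the inner SELECT returns that person twice, which is exactly the paging mismatch described above.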

I'm going to log a task to look into this a bit. I feel like the more we can make EF queries eager loading by default (maybe later adding an option for lazy loading), the better - especially for client-server scenarios.
Jan 3, 2014 at 12:06 PM
Edited Jan 3, 2014 at 12:57 PM
In general graphs are not ordered, so I think that there is no guarantee that ordering would be preserved and as far as I know there is no requirement that ordering needs to be preserved written into the spec.
I'm assuming you refer to CONSTRUCT.

I kind of understand what you are saying but I don't understand why this won't work with the above query, considering inner query only filters on ids. So now it would look like
                SELECT ?s ?p ?o
                where {
                        ?s ?p ?o.
                        ?s <http://www.example.org/schema/age> ?age.
                        {
                                select distinct ?s
                                where
                                {
                                        ?s <http://www.example.org/schema/age> ?age.
                                        ?s a <http://www.example.org/schema/Person>.
                                }
                                order by desc(?age)
                                limit 3
                                offset 2
                        }
                }
                order by desc(?age)
     
The inner query would filter on person id's and the outer query would get triples in order because of the second order by. So on client it's just a matter of parsing the triples. Am I missing something?
Coordinator
Jan 3, 2014 at 2:33 PM
feugen24 wrote:
In general graphs are not ordered, so I think that there is no guarantee that ordering would be preserved and as far as I know there is no requirement that ordering needs to be preserved written into the spec.
I'm assuming you refer to CONSTRUCT.

I am yes (also applies to DESCRIBE).

That's some pretty torturous SPARQL you have there :-) - I can't see why it wouldn't work, but I would love to see what the query processor makes of it. It gets harder to apply this pattern if you sort on an indirect property or on some aggregate (e.g. sort by number of friends, or sort by date/time of last tweet), but it is probably still doable. Of course, if all the EF has to do is assume that you are giving it ordered triples, then it's not a problem that EF needs to solve, but it is a problem for the person writing the custom query, so having some patterns like this to follow would be really useful.
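
For the aggregate case, one possible shape of the same pattern is sketched below, sorting persons by number of friends. The property <http://www.example.org/schema/friend> is a hypothetical addition to the example schema, and this is only one way such a query could be written.

```sparql
# Sketch: page and sort on an aggregate, then pull each entity's triples.
# <http://www.example.org/schema/friend> is a hypothetical property.
SELECT ?s ?p ?o ?friendCount
WHERE {
  {
    SELECT ?s (COUNT(?friend) AS ?friendCount)
    WHERE {
      ?s a <http://www.example.org/schema/Person> .
      ?s <http://www.example.org/schema/friend> ?friend .
    }
    GROUP BY ?s
    ORDER BY DESC(?friendCount) ?s
    LIMIT 3
    OFFSET 2
  }
  ?s ?p ?o .
}
ORDER BY DESC(?friendCount) ?s
```

Adding ?s as a secondary sort key keeps the triples of one entity together when two persons have the same count. Persons with no friends at all are not matched by this sketch; including them would need an OPTIONAL around the friend pattern.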