SPARQL Query Speed Differences...

Mar 25, 2014 at 5:49 PM
I have a SPARQL query that takes 150-300ms to complete, but if I remove one criterion, it comes back in 1ms. Here are my examples:

PREFIX properties: <http://www.me.com/0.1/PropertyTypes/>
PREFIX links: <http://www.me.com/0.1/LinkTypes/>
PREFIX id: <http://www.brightstardb.com/.well-known/genid/>
SELECT DISTINCT *
WHERE {
  ?Memory properties:Address "O:1.2"^^<http://www.w3.org/2001/XMLSchema#string> .
  ?Memory links:ToDataFile ?DataFile .
  ?DataFile links:ToProgram id:1b38c98e-eab7-41db-bad7-49016c751c1a .
}
SLOW: 150-300ms

PREFIX properties: <http://www.me.com/0.1/PropertyTypes/>
PREFIX links: <http://www.me.com/0.1/LinkTypes/>
PREFIX id: <http://www.brightstardb.com/.well-known/genid/>
SELECT DISTINCT *
WHERE {
  ?Memory properties:Address "O:1.2"^^<http://www.w3.org/2001/XMLSchema#string> .
  ?Memory links:ToDataFile ?DataFile .
}
FAST: 1ms

Notice that the additional criterion:
?DataFile links:ToProgram id:1b38c98e-eab7-41db-bad7-49016c751c1a .
causes the query to be very slow. In my store, there is only one Memory object with an Address of "O:1.2" (for now), and both queries return a single result (as they should). Why would adding a criterion that matches an exact genid make the query so slow? Is there another SPARQL syntax that would be faster when matching against a known genid?

I'm new to SPARQL (a month or so). It took me a while to figure out that I needed the extra ^^ datatype in the Address matching criterion. A regex FILTER was too slow.

Also, when the SPARQL query is executed, is each criterion evaluated in order, narrowing the result set left by the previous criteria? If so, would putting the most restrictive criterion first improve the performance of the query? Are there any good links on the web that describe how to write optimized SPARQL queries or explain what is going on behind the scenes?

Thanks for your help,
Mike
Mar 25, 2014 at 9:38 PM
At first sight, I don't think you should see that difference.
I'm not sure what your dataset is, but in my work with triple stores I have noticed that each one has its own issues: for example, the query planner might get things wrong, or you may have found a bug.
So what I do is use the dotNetRDF (dnr) Store Manager to connect to multiple databases/SPARQL endpoints loaded with the same data, so I can see whether my query is wrong. Deploying a database takes some time up front, but after that it's easy to check for problems.

Links: here. Also try http://answers.semanticweb.com/ and Stack Overflow (filtered on the sparql tag); sort by hottest questions, or search for performance.
Coordinator
Mar 26, 2014 at 5:00 PM
I can't see an obvious reason why the query should be that much slower. There may just be a bad ordering of the joins between the different patterns, or it could be that the extra pattern generates a large number of candidates to join on. At the moment we only have very basic query planning implemented and I think it could quite simply be a case that the pattern of your query is being handled poorly.
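
To illustrate how join ordering alone can change the cost of an otherwise identical query, here is a minimal Python sketch. It is not how BrightstarDB evaluates queries; it is a naive nested-loop join over a made-up in-memory triple list, and every name in it (the data, `match`, `evaluate`) is an assumption for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]


def match(pattern: Pattern, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns: List[Pattern], triples: List[Triple]) -> Tuple[List[Binding], int]:
    """Join the patterns in the given order, counting triple comparisons."""
    bindings: List[Binding] = [{}]
    comparisons = 0
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                comparisons += 1
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings, comparisons


# Hypothetical data: 200 Memory objects, each linked to its own DataFile,
# and every DataFile linked to the same program.
triples: List[Triple] = []
for i in range(200):
    triples.append((f"mem{i}", "Address", f"O:1.{i}"))
    triples.append((f"mem{i}", "ToDataFile", f"file{i}"))
    triples.append((f"file{i}", "ToProgram", "prog1"))

selective_first = [
    ("?Memory", "Address", "O:1.2"),
    ("?Memory", "ToDataFile", "?DataFile"),
    ("?DataFile", "ToProgram", "prog1"),
]
broad_first = list(reversed(selective_first))

fast_rows, fast_cost = evaluate(selective_first, triples)
slow_rows, slow_cost = evaluate(broad_first, triples)
print(fast_rows == slow_rows, fast_cost, slow_cost)
```

Both orderings produce the same single row, but starting from the pattern that matches every DataFile carries 200 intermediate bindings through the remaining joins, so it does far more work. A real planner with indexes would narrow that gap, but a poor ordering can still show up as a latency difference like the one described.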

You will certainly find regex filtering to be slower than a direct match because the value can't be used in the index lookup, whereas if you specify the value directly in the triple pattern we can just do an index lookup with the value. The strange thing is that in this case the additional criterion should just be one more index lookup, which will result in a list of bindings for ?DataFile - could that pattern be returning a lot of results if it were evaluated by itself?
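
The difference between a direct match and a regex filter can be sketched in Python as follows. This is only a toy model under assumed data, not BrightstarDB's index structure: `index`, `direct_match` and `regex_match` are hypothetical names, and the index is a plain dictionary keyed on (predicate, object).

```python
import re
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
triples = [(f"mem{i}", "Address", f"O:1.{i}") for i in range(1000)]

# Index keyed on (predicate, object), built once up front.
index = defaultdict(list)
for s, p, o in triples:
    index[(p, o)].append(s)


def direct_match(predicate, value):
    """A known value can be used as a key: one dictionary lookup."""
    return index.get((predicate, value), [])


def regex_match(predicate, pattern):
    """A regex cannot be used as a key, so every candidate value is tested."""
    compiled = re.compile(pattern)
    return [s for s, p, o in triples if p == predicate and compiled.search(o)]


print(direct_match("Address", "O:1.2"))
print(regex_match("Address", r"^O:1\.2$"))
```

Both calls find the same subject, but the regex version has to visit all 1000 values while the direct match touches only one index entry.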

The other thing to be aware of is that the server does cache query results - as long as there are no new transactions on the store it will try to return cached results for your query - perhaps that is what happened for the fast query (though I'm not sure why it wouldn't also happen for the slower one).
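
One way to check whether caching explains the gap is to time several consecutive runs of each query and compare the first run against the later ones. The sketch below is a generic Python timing helper; `run_query` is a hypothetical callable standing in for whatever client code actually sends the query to the store, and `fake_query` exists only to make the example self-contained.

```python
import time


def time_runs(run_query, repeats=5):
    """Call `run_query` several times and return each duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


# Stand-in for a real query call (assumption: replace with your own client code).
calls = []


def fake_query():
    calls.append(1)


timings = time_runs(fake_query, repeats=5)
print([f"{ms:.3f} ms" for ms in timings])
```

If the first run of a query is slow and the repeats are fast, the fast numbers are probably cached results; if every run of the three-pattern query is slow, caching is unlikely to be the explanation.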

If you would like me to take a look in more detail, you can send me the data that is in your store (just an N-Triples export would be fine), then I can run the queries and trace through the query execution to see what it is doing under the hood. You can message me through my CodePlex account.

Cheers

Kal