EntityFramework + BrightstarDB are too slow

Aug 18, 2016 at 7:20 AM
Edited Aug 18, 2016 at 7:41 AM
Hi,

I have an ontology with a TypeA and a TypeB, and there is a many-to-many relationship between those types. My ontology has a custom predicate for compatibility, so I can query all B's that are compatible with A's using the following SPARQL query:
SELECT DISTINCT ?labelB
WHERE {
  ?a    a    my:typeA.
  ?b    a    my:typeB;
         my:compatible ?a;
         rdfs:label    ?labelB.
}
This query executes in about 44 ms and gets faster with each call (I used the Brightstar RDF Client API to query my store). Now I've tried to achieve the same with EF:
[Entity("my:typeA")]
public interface TypeA
{
    [Identifier]
    string Id { get; }

    [PropertyType("rdfs:label")]
    string Name { get; }

    [InversePropertyType("my:compatible")]
    ICollection<TypeB> CompatibleBs { get; }
}

[Entity("my:typeB")]
public interface TypeB
{
    [Identifier]
    string Id { get; }

    [PropertyType("rdfs:label")]
    string Name { get; }

    [InversePropertyType("my:compatible")]
    ICollection<TypeA> CompatibleAs { get; }
}
I then queried the store with the following LINQ query:
var allCompatibleBs = context.TypeAs.SelectMany(a => a.CompatibleBs);
This query took about 35-50 seconds, which is much slower than the SPARQL query.

EDIT: I've added a Distinct() to my LINQ query. Now it takes around 3 seconds, but this is still not as fast as the RDF Client query.
var allCompatibleBs = context.TypeAs.SelectMany(a => a.CompatibleBs).Distinct();
So is there a way to inspect the SPARQL query that is generated from the LINQ expression, or do you see something else I did wrong?

Kind regards,
Severin
Coordinator
Aug 19, 2016 at 8:55 AM
Hi,

The biggest difference between the SPARQL query and the LINQ query is that the LINQ query is retrieving all properties of the B's, whereas the SPARQL query is only retrieving the labels. You could do a query like:
var allCompatibleBs = context.TypeAs.SelectMany(a=>a.CompatibleBs.Select(b=>b.Name));
That would be closer to the SPARQL query and may execute faster.

The SPARQL query that is generated from the LINQ should get logged by B* when it is executed. If you are running B* as a service the query should be logged to the log.txt file which will typically be in the same directory where B* stores the data. If B* is embedded in your own application then the simplest thing might be to force it to log to the console using:
Logging.EnableConsoleOutput(true);
Otherwise take a look at http://brightstardb.readthedocs.io/en/latest/Running_BrightstarDB/#configuring-logging for more on setting up the logging.

Hope this helps!

If you can share more information about the query being generated and/or some code and data, that would be really helpful for me to see if there are any performance improvements I can make to the LINQ-to-SPARQL part of the process.

Cheers

Kal
Aug 19, 2016 at 10:57 AM
Edited Aug 22, 2016 at 5:55 AM
So my old and slow version gives this log output:
BrightstarDB Information: 0 : Query semantic-platform-database-eval CONSTRUCT {?x003Cgeneratedx003Ex005Fx0030 ?x003Cgeneratedx003Ex005Fx0030_p ?x003Cgeneratedx003Ex005Fx0030_o .?x003Cgeneratedx003Ex005Fx0030 <http://www.brightstardb.com/.well-known/model/selectVariable> "x003Cgeneratedx003Ex005Fx0030" .}WHERE {?x003Cgeneratedx003Ex005Fx0030 ?x003Cgeneratedx003Ex005Fx0030_p ?x003Cgeneratedx003Ex005Fx0030_o .{ SELECT DISTINCT ?x003Cgeneratedx003Ex005Fx0030 WHERE {?c a <http://mynamespace.com#typeA> .?x003Cgeneratedx003Ex005Fx0030 a <http://mynamespace.com#typeB> .?x003Cgeneratedx003Ex005Fx0030 <http://mynamespace.com#compatible> ?c .}} }
BrightstarDB Information: 0 : Query CONSTRUCT { 
  ?x003Cgeneratedx003Ex005Fx0030 ?x003Cgeneratedx003Ex005Fx0030_p ?x003Cgeneratedx003Ex005Fx0030_o . 
  ?x003Cgeneratedx003Ex005Fx0030 <http://www.brightstardb.com/.well-known/model/selectVariable> "x003Cgeneratedx003Ex005Fx0030" . 
}
WHERE 
{ 
  ?x003Cgeneratedx003Ex005Fx0030 ?x003Cgeneratedx003Ex005Fx0030_p ?x003Cgeneratedx003Ex005Fx0030_o . 
  { { {SELECT DISTINCT ?x003Cgeneratedx003Ex005Fx0030 WHERE
   { 
     ?c a <http://mynamespace.com#typeA> . 
     ?x003Cgeneratedx003Ex005Fx0030 <http://mynamespace.com#compatible> ?c . 
     ?x003Cgeneratedx003Ex005Fx0030 a <http://mynamespace.com#typeB> . 
   }
  }}}
}
With your suggestion (the extra Select) it creates a much faster query:
BrightstarDB Information: 0 : Query semantic-platform-database-eval SELECT DISTINCT ?x003Cgeneratedx003Ex005Fx0032 WHERE {?c a <http://mynamespace.com#typeA> .?x003Cgeneratedx003Ex005Fx0032 <http://mynamespace.com#compatible> ?c .}
BrightstarDB Information: 0 : Query SELECT DISTINCT ?x003Cgeneratedx003Ex005Fx0032 WHERE
{ 
  ?c a <http://mynamespace.com#typeA> . 
  ?x003Cgeneratedx003Ex005Fx0032 <http://mynamespace.com#compatible> ?c . 
}
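As a reading aid for the log output above: the generated variable names appear to escape special characters as "x" followed by four hex digits. That is an assumption based only on the names shown in this thread, not on documented BrightstarDB behaviour, and the helper name LogVariableDecoder is hypothetical. A minimal sketch that decodes such a name:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LogVariableDecoder
{
    // Assumption: every "xHHHH" run (an 'x' plus four hex digits) encodes one UTF-16 code unit.
    private static readonly Regex Escape = new Regex("x([0-9A-Fa-f]{4})");

    // Replaces each escape with the character it stands for; all other text is left unchanged.
    public static string Decode(string encoded)
    {
        return Escape.Replace(encoded, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        // Variable name copied from the log output in this thread.
        Console.WriteLine(Decode("x003Cgeneratedx003Ex005Fx0030")); // prints "<generated>_0"
    }
}
```

Note that a plain name that happens to contain "x" followed by four hex digits would also be rewritten, so this is only suitable for eyeballing logs.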
The first query doesn't look that bad compared to the SPARQL query I am using directly with the RDF Client API. But neither query contains "rdfs:label" for the Name property. Why is that?

EDIT:
When I remove the CONSTRUCT from the first (old and slow) query, it is as fast as the second query. So it seems that the CONSTRUCT is the troublemaker. The CONSTRUCT is slow and doesn't return a result at all (I double-checked it with Polaris). So how can I get all items of a dataset efficiently without any extra LINQ statement?