
Real-world test: am I on the right track?

Aug 2, 2014 at 8:43 AM
Hello,

I have done some "real-world tests":
create a new material, then add, move and remove stock, ten stock operations per pass.

Each test has 1000 passes.

BrightstarDB BTree store :
Time : 603 sec
StoreSize : 3535 MB

BrightstarDB SQLite store :
Time : 650 sec
StoreSize : 30 MB

The good old ORM:
Time : 148 sec
StoreSize : 6 MB

Yes, the values are correct.

What happened here:
My first test took about 4 hours. In some of the LINQ queries I had the IsMaster condition in first place and the material == "22777" condition after it, so dotNetRDF made no optimization: it executed the first statement against all the data and then checked the results for the material number.
I changed this to "where material == "22777" && IsMaster" and it runs much faster (from 4 hours down to about 600 seconds).
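The effect of the condition order can be sketched in plain Python (hypothetical material data and condition functions; dotNetRDF's internals differ, this only illustrates why putting the selective condition first reduces the work):

```python
# Sketch: why evaluating the most selective condition first matters.
# Hypothetical data: 10,000 materials, all with IsMaster set, and only
# one with the number "12777".
items = [{"number": str(n), "is_master": True} for n in range(10_000, 20_000)]

def evaluate(items, conditions):
    """Apply the conditions left to right, counting candidate checks."""
    checks = 0
    candidates = items
    for cond in conditions:
        checks += len(candidates)
        candidates = [i for i in candidates if cond(i)]
    return candidates, checks

is_master = lambda i: i["is_master"]
number_match = lambda i: i["number"] == "12777"

# IsMaster first: the first condition keeps all 10,000 rows as candidates,
# so the second condition is checked 10,000 times as well (20,000 checks).
_, slow = evaluate(items, [is_master, number_match])
# Selective condition first: only one row survives the first pass,
# so the second condition runs once (10,001 checks).
_, fast = evaluate(items, [number_match, is_master])
```

The same data and the same conditions, but roughly half the work, purely from the evaluation order.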

At the end of the test I query 1000 materials (with only 2 attributes filled) into a list; this takes about 3 seconds (the expected execution time is 300 ms). There is a sort-by clause, so dotNetRDF first gathers every material number to do the sort, and then gathers each subject.
If I do the sort after a ".ToList()" it takes about 2 seconds. The time is spent gathering all the subjects ...

I'm a little frustrated; I have spent so much time and I'm not sure what to do now.
Aug 2, 2014 at 9:00 AM
Hi,

Please be patient - BrightstarDB is relatively new and work to improve performance is on the plan. I believe that we can definitely improve performance. A lot of it will come from implementing BrightstarDB-specific optimizations for the default dotNetRDF query algebra - as your testing has shown, some simple changes will yield big improvements. Then there is moving on to building more specific indexes and improving raw triple throughput. This is not work that will happen overnight, but it is important and it will receive attention.

I hope you will continue to work with us to help make BrightstarDB better.

Cheers

Kal
Aug 2, 2014 at 1:04 PM
Hello,

thank you.

I have found many things to make it faster.

But with a newer version of dotNetRDF all the queries run sequentially. I have tried to debug the code, but I have found nothing ...

I think the main problem is dotNetRDF.

I have not found a good query optimizer. Normally a simple query optimizer looks for the condition with the smallest possible result set and moves that condition up front; for that, the query optimizer needs index information.
I also think it cannot combine indices like this: http://www.codeproject.com/Articles/375413/RaptorDB-the-Document-Store#mgindex
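A minimal sketch of the kind of optimizer described here, assuming per-pattern cardinality estimates are available from index statistics (the patterns and numbers below are hypothetical):

```python
# Sketch of a minimal cardinality-based triple-pattern reorderer: evaluate
# the pattern with the smallest estimated result set first, so the later
# patterns only have to extend a small set of bindings.
def reorder(patterns, estimate):
    """Sort the patterns by their estimated cardinality, smallest first."""
    return sorted(patterns, key=estimate)

# Hypothetical index statistics: the material number matches one subject,
# IsMaster matches half the store, the type pattern matches everything.
stats = {
    ("?m", "db:material/nummer", '"12777"'): 1,
    ("?m", "db:material/isMaster", "true"): 5000,
    ("?m", "rdf:type", "db:Material"): 10000,
}

plan = reorder(stats.keys(), stats.get)
# The number lookup is evaluated first, the unselective type scan last.
```

This is exactly the reordering that had to be done by hand in the LINQ query above; with statistics, the engine could do it automatically.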

Here is an actual, very small query. You can see what dotNetRDF is doing:
BrightstarDB Information: 0 : 13.35.01.289 (9) Query start : 13.35.01.289
BrightstarDB Information: 0 : 13.35.01.289 (9) Query WMS CONSTRUCT {?BewArt ?BewArt_p ?BewArt_o .?BewArt <http://db/sel> "BewArt" .}WHERE {?BewArt ?BewArt_p ?BewArt_o .{ SELECT ?BewArt WHERE {?BewArt<http://db/bestandsArt/isWA> true.?BewArt<http://db/bestandsArt/isMaster> true.?BewArt a <http://db/BestandsArt> .} LIMIT 1} }
BrightstarDB Information: 0 : 13.35.01.289 (9) Query CONSTRUCT { 
  ?BewArt ?BewArt_p ?BewArt_o . 
  ?BewArt <http://db/sel> "BewArt" . 
}
WHERE 
{ 
  {SELECT ?BewArt WHERE
   { 
     ?BewArt <http://db/bestandsArt/isWA> true . 
     ?BewArt <http://db/bestandsArt/isMaster> true . 
     ?BewArt a <http://db/BestandsArt> . 
   }
  LIMIT 1 }
  ?BewArt ?BewArt_p ?BewArt_o . 
}

BrightstarDB Information: 0 : 13.35.01.289 (9) Match :  | @predicate - http://db/bestandsArt/isWA | @object - true | @graphId - http://db/graph
BrightstarDB Information: 0 : 13.35.01.292 (9) Match :  | @subject - http://id/86695718-d384-4c10-94e9-883514d590b2 | @predicate - http://db/bestandsArt/isMaster | @object - true | @graphId - http://db/graph
BrightstarDB Information: 0 : 13.35.01.294 (9) Match :  | @subject - http://id/86695718-d384-4c10-94e9-883514d590b2 | @predicate - http://www.w3.org/1999/02/22-rdf-syntax-ns#type | @object - http://db/BestandsArt | @graphId - http://db/graph
BrightstarDB Information: 0 : 13.35.01.297 (9) Match :  | @subject - http://id/86695718-d384-4c10-94e9-883514d590b2 | @graphId - http://db/graph
I'm not sure what I can do to make this better. We need a good query optimizer and a better query execution unit ...

I'm glad that I use both the EntityName and the FieldName to identify a field. One of my queries matched on IsMaster, which exists in every entity ...

Have you looked at the store size ?
Aug 2, 2014 at 1:46 PM
I agree that index statistics are going to be really important. I added some basic statistics as an optional item to BrightstarDB in a previous release. The statistics are pretty basic at the moment (just a count of the number of subjects and objects for each predicate), but they are a good start, and it's probably not too hard to add other statistics (e.g. a count of the distinct subjects and objects for a predicate, so you know if a predicate tends to be clustered).
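The kind of statistics described above can be sketched like this (a toy in-memory model with made-up triples, not the BrightstarDB implementation):

```python
# Sketch: per-predicate statistics of the kind described above, computed
# from a list of (subject, predicate, object) triples. Hypothetical data.
from collections import defaultdict

def predicate_stats(triples):
    """Count triples and distinct subjects/objects per predicate."""
    subjects = defaultdict(list)
    objects = defaultdict(list)
    for s, p, o in triples:
        subjects[p].append(s)
        objects[p].append(o)
    return {
        p: {
            "count": len(subjects[p]),
            "distinct_subjects": len(set(subjects[p])),
            "distinct_objects": len(set(objects[p])),
        }
        for p in subjects
    }

triples = [
    ("m1", "db:isMaster", "true"),
    ("m2", "db:isMaster", "true"),
    ("m1", "db:status", "9"),
]
stats = predicate_stats(triples)
# db:isMaster: 2 triples, 2 distinct subjects, 1 distinct object -- a low
# distinct-object count signals a clustered, poorly selective predicate.
```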

DNR has some interfaces for plugging in query optimization - I need to look in more detail at how that works and what options there are for extending/overriding the algebra. It's important to be able to pull things like FILTER statements up into the pattern-matching parts of the algebra, as that is another good way to reduce intermediate result set sizes. I'm just not sure if/how that is possible with the default query algebra DNR produces.

On store size - do your statistics include the transactions.bs file? As I have already said a couple of times, I do plan to allow this to be turned off, as it is a big source of store size. It's really not part of the actual indexes for the store; it's never used, and it's there to provide a feature that really no one uses and probably no one will use until there is some support for replication.

There are other size optimizations that could be applied, such as using prefixes for the keys in BTree nodes and compressing nodes. You will notice that data.bs, for example, compresses really well (I found that a 400MB data.bs file would zip to 51MB using just standard ZIP compression levels), so we are almost certainly missing a trick by not using a zip stream to read/write pages - I just need to figure out how to make that work with our current approach to random access on store pages. Size optimization may also have a beneficial effect on performance (though there may be a trade-off with the processor overhead of compressing and decompressing).
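A quick sketch of how well a repetitive BTree-style page might compress with a standard deflate stream (synthetic page contents, not real store data):

```python
# Sketch: compressing a synthetic, repetitive 4 KB page with zlib.
# The page contents are made up; the point is that repeated key prefixes,
# as found in BTree nodes, compress extremely well.
import zlib

page = (b"http://db/material/nummer\x00" * 160)[:4096]  # 4 KB page

compressed = zlib.compress(page, 6)
restored = zlib.decompress(compressed)
ratio = len(compressed) / len(page)  # far below 10% for this page
```

The open problem mentioned above remains, though: a compressed page has a variable on-disk length, which conflicts with fixed-offset random access.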

Cheers

Kal
Aug 2, 2014 at 2:12 PM
Hello,

No, it's only data.bs. This comes from the block size with empty blocks and was the expected result (see the 5th post):
https://brightstardb.codeplex.com/discussions/551598

Block 100 at pos 0 len -1 !?
BrightstarDB Information: 0 : FilePage SetData : 100 - len : -1 - Pos : 0
Block 101 at pos 0 len -1 !?
BrightstarDB Information: 0 : FilePage SetData : 101 - len : -1 - Pos : 0
Block 103 at pos 0 len -1 !?
Block 103 at pos 1948 len 8
Block 103 at pos 1956 len 8
Block 103 at pos 1964 len 8
BrightstarDB Information: 0 : FilePage SetData : 103 - len : -1 - Pos : 0
BrightstarDB Information: 0 : FilePage SetData : 103 - len : 8 - Pos : 1948
BrightstarDB Information: 0 : FilePage SetData : 103 - len : 8 - Pos : 1956
BrightstarDB Information: 0 : FilePage SetData : 103 - len : 8 - Pos : 1964
To solve this, you need a block management system with a dynamic block length, a block index and a free block list ...
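A toy sketch of such a block manager, with a block index mapping ids to (offset, length) and a first-fit free list (purely illustrative, not the BrightstarDB page store):

```python
# Toy block manager: variable-length blocks, a block index and a free
# list that reuses holes. A real store would also split oversized holes
# and merge adjacent ones.
class BlockStore:
    def __init__(self):
        self.data = bytearray()   # backing file, modelled as a buffer
        self.index = {}           # block id -> (offset, length)
        self.free = []            # reusable holes as (offset, length)
        self.next_id = 0

    def write(self, payload: bytes) -> int:
        for i, (off, length) in enumerate(self.free):
            if length >= len(payload):   # first hole that fits
                del self.free[i]
                break
        else:
            off = len(self.data)
            self.data.extend(b"\x00" * len(payload))
        self.data[off:off + len(payload)] = payload
        block_id = self.next_id
        self.next_id += 1
        self.index[block_id] = (off, len(payload))
        return block_id

    def delete(self, block_id: int) -> None:
        # Deleting a block just records its extent as reusable space.
        self.free.append(self.index.pop(block_id))

    def read(self, block_id: int) -> bytes:
        off, length = self.index[block_id]
        return bytes(self.data[off:off + length])
```

With fixed-size blocks, a mostly empty block still costs a full block on disk; with this scheme, deleted space is reused by later writes.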

It's not important for me, but it's a nice comparison.

Best regards
Aug 2, 2014 at 6:56 PM
Hello,

I have found an issue, maybe in dotNetRDF.

The SPARQL looks good.
BrightstarDB Information: 0 : 19.34.09.180 (10) Query CONSTRUCT { 
  ?Mat ?Mat_p ?Mat_o . 
  ?Mat <http://db/sel> "Mat" . 
}
WHERE 
{ 
  {SELECT ?Mat WHERE
   { 
     ?Mat <http://db/material/nummer> "123Stk"^^<http://www.w3.org/2001/XMLSchema#string> . 
     ?Mat <http://db/material/isMaster> true . 
     MINUS { ?Mat <http://db/material/status> 9  . } 
     { ?Mat a <http://db/Material> . } 
   }
  LIMIT 1 }
  ?Mat ?Mat_p ?Mat_o . 
}
Why does the last line occur 1000 times (once for each material)?
BrightstarDB Information: 0 : 19.34.09.180 (10) Match :  | @predicate - http://db/material/nummer | @object - 123Stk | @graphId - http://db/graph
BrightstarDB Information: 0 : 19.34.09.183 (10) Read : <http://id/4e7767cd-4acf-459c-8405-fb8542eaf10d> <http://db/material/nummer> 123Stk^^http://www.w3.org/2001/XMLSchema#string@
BrightstarDB Information: 0 : 19.34.09.197 (10) Match :  | @subject - http://id/4e7767cd-4acf-459c-8405-fb8542eaf10d | @predicate - http://db/material/isMaster | @object - true | @graphId - http://db/graph
BrightstarDB Information: 0 : 19.34.09.200 (10) Read : <http://id/4e7767cd-4acf-459c-8405-fb8542eaf10d> <http://db/material/isMaster> true^^http://www.w3.org/2001/XMLSchema#boolean@
BrightstarDB Information: 0 : 19.34.09.201 (10) Match :  | @predicate - http://db/material/status | @object - 9 | @graphId - http://db/graph
BrightstarDB Information: 0 : 19.34.09.203 (10) Read : <http://id/21d4ba95-3519-4713-bf83-65de320c3a14> <http://db/material/status> 9^^http://www.w3.org/2001/XMLSchema#integer@
BrightstarDB Information: 0 : 19.34.09.203 (10) Read : <http://id/6182450b-06a3-4920-af1e-224650f5f417> <http://db/material/status> 9^^http://www.w3.org/2001/XMLSchema#integer@
BrightstarDB Information: 0 : 19.34.09.203 (10) Read : <http://id/bb8d860b-b339-47e5-a0a6-2b5186278e9d> <http://db/material/status> 9^^http://www.w3.org/2001/XMLSchema#integer@
BrightstarDB Information: 0 : 19.34.09.203 (10) Read : <http://id/0fa424f8-de74-4573-854e-fa1316716d67> <http://db/material/status> 9^^http://www.w3.org/2001/XMLSchema#integer@
Aug 4, 2014 at 9:48 AM
CyborgDE wrote:
Hello,

No, it's only data.bs. This comes from the block size with empty blocks and was the expected result (see the 5th post):
https://brightstardb.codeplex.com/discussions/551598

Block 100 at pos 0 len -1 !?
BrightstarDB Information: 0 : FilePage SetData : 100 - len : -1 - Pos : 0
This means: copy all the data from the source buffer into the page, starting at position 0. We are not actually trying to copy -1 bytes ;-)

To solve this, you need a block management system with a dynamic block length, a block index and a free block list ...

Hmmm... that assumes that the BTree index is not being sufficiently utilized. Sure, when the tree is almost empty some pages are going to be under-utilized, but we try not to split nodes until they are properly full, so I can't see where there is scope for much improvement. I'm happy to be proved wrong though.
We did at one stage have a page index to dynamically reassign page ids to data-file offsets; it turned out to be a performance bottleneck, so we switched to the strategy of using statically defined, offset-based page ids. This does have implications for the size of append-only stores (we need to update more pages for an insert because we can't redirect child pointers), but it has no implication for the size of the rewrite store.
It's not important for me, but it's a nice comparison.

In general our approach has been to assume that storage is cheap, and that if we can get better performance at the price of more MB on disk, that is acceptable, even on mobile devices. That said, if we could squeeze more data into individual pages, that could help both disk usage and performance.
Cheers

Kal
Aug 5, 2014 at 8:42 AM
CyborgDE wrote:
Why does the last line occur 1000 times (once for each material)?
BrightstarDB Information: 0 : 19.34.09.180 (10) Match :  | @predicate - http://db/material/nummer | @object - 123Stk | @graphId - http://db/graph
BrightstarDB Information: 0 : 19.34.09.183 (10) Read : <http://id/4e7767cd-4acf-459c-8405-fb8542eaf10d> <http://db/material/nummer> 123Stk^^http://www.w3.org/2001/XMLSchema#string@
BrightstarDB Information: 0 : 19.34.09.197 (10) Match :  | @subject - http://id/4e7767cd-4acf-459c-8405-fb8542eaf10d | @predicate - http://db/material/isMaster | @object - true | @graphId - http://db/graph
BrightstarDB Information: 0 : 19.34.09.200 (10) Read : <http://id/4e7767cd-4acf-459c-8405-fb8542eaf10d> <http://db/material/isMaster> true^^http://www.w3.org/2001/XMLSchema#boolean@
BrightstarDB Information: 0 : 19.34.09.201 (10) Match :  | @predicate - http://db/material/status | @object - 9 | @graphId - http://db/graph
BrightstarDB Information: 0 : 19.34.09.203 (10) Read : <http://id/21d4ba95-3519-4713-bf83-65de320c3a14> <http://db/material/status> 9^^http://www.w3.org/2001/XMLSchema#integer@
BrightstarDB Information: 0 : 19.34.09.203 (10) Read : <http://id/6182450b-06a3-4920-af1e-224650f5f417> <http://db/material/status> 9^^http://www.w3.org/2001/XMLSchema#integer@
BrightstarDB Information: 0 : 19.34.09.203 (10) Read : <http://id/bb8d860b-b339-47e5-a0a6-2b5186278e9d> <http://db/material/status> 9^^http://www.w3.org/2001/XMLSchema#integer@
BrightstarDB Information: 0 : 19.34.09.203 (10) Read : <http://id/0fa424f8-de74-4573-854e-fa1316716d67> <http://db/material/status> 9^^http://www.w3.org/2001/XMLSchema#integer@
dotNetRDF uses index joins wherever possible, so it takes the bindings for ?Mat from the LHS of the MINUS and substitutes them into the RHS, which leads to many specific index scans rather than a single, less specific index scan.

Depending on the dataset this may not be the best strategy, particularly when accessing the dataset requires going to disk and the query thus becomes IO-bound rather than memory/compute-bound.
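The difference between the two strategies can be sketched with a toy in-memory index (hypothetical data; not dotNetRDF's implementation):

```python
# Toy index keyed on (subject, predicate) -> list of objects.
spo = {
    ("m1", "status"): ["9"],
    ("m2", "status"): ["1"],
    ("m3", "status"): ["9"],
}

def index_join(lhs_bindings, predicate):
    """Index join: one index probe per LHS binding. Fast when the LHS is
    small, but 1000 bindings mean 1000 probes."""
    probes, out = 0, []
    for s in lhs_bindings:
        probes += 1
        for o in spo.get((s, predicate), []):
            out.append((s, o))
    return out, probes

def scan_join(lhs_bindings, predicate):
    """Scan join: one pass over the whole pattern, hash-filtered by the
    LHS bindings. The same answer, but a single (larger) index read."""
    wanted = set(lhs_bindings)
    out = []
    for (s, p), objs in spo.items():
        if p == predicate and s in wanted:
            out.extend((s, o) for o in objs)
    return out, 1
```

Both joins produce the same result; which one is cheaper depends on the binding count and on whether each probe hits disk.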
Aug 5, 2014 at 8:46 AM
CyborgDE wrote:
BrightstarDB Information: 0 : 13.35.01.289 (9) Match :  | @predicate - http://db/bestandsArt/isWA | @object - true | @graphId - http://db/graph
BrightstarDB Information: 0 : 13.35.01.292 (9) Match :  | @subject - http://id/86695718-d384-4c10-94e9-883514d590b2 | @predicate - http://db/bestandsArt/isMaster | @object - true | @graphId - http://db/graph
BrightstarDB Information: 0 : 13.35.01.294 (9) Match :  | @subject - http://id/86695718-d384-4c10-94e9-883514d590b2 | @predicate - http://www.w3.org/1999/02/22-rdf-syntax-ns#type | @object - http://db/BestandsArt | @graphId - http://db/graph
BrightstarDB Information: 0 : 13.35.01.297 (9) Match :  | @subject - http://id/86695718-d384-4c10-94e9-883514d590b2 | @graphId - http://db/graph
I'm not sure what I can do to make this better. We need a good query optimizer and a better query execution unit ...
BrightstarDB could choose to override BGP execution if it wanted to, and thus swap out the index-join approach that dotNetRDF uses for an alternative approach (I think it even did this in the past), but on the whole index joins are far more efficient than other kinds of joins, hence they stick with the default strategy AFAIK.
Aug 5, 2014 at 11:13 AM
Edited Aug 5, 2014 at 2:16 PM
Hello,

In my first test with BrightstarDB I found out that the query builder uses FILTER, and dotNetRDF does not use the index when FILTER is used (Issue 116).
It was a hard job, because I had to understand everything and also perform some optimizations myself. I changed the query builder to use WHERE patterns so that dotNetRDF can use the index; this works well.

But there are still some issues:
  • With dotNetRDF 1.6 the queries do not work.
  • The index is a hash, so there is no way to do searches like "starts with", "between", "greater than", ...
  • dotNetRDF has no index-based query optimization (and of course it cannot have one without index statistics).
  • The store size is 300 times greater than the ORM it is meant to replace (I expected 5 times greater). This is a known issue (the fixed block size) and is being worked on, but the store is still in development, so I cannot use it now.
  • The ORM solution is twice as fast (my hope was that BrightstarDB would be faster).
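The hash-index limitation in the second point can be sketched like this (toy data): a hash answers only exact-match lookups, while a sorted index also supports range searches via binary search.

```python
# Sketch: hash index vs. sorted index for range queries. Toy data.
import bisect

numbers = ["10001", "10002", "22777", "30000", "30001"]
hash_index = {n: i for i, n in enumerate(numbers)}  # exact match only
sorted_index = sorted(numbers)                      # range-capable

def between(lo, hi):
    """Range query on the sorted index via two binary searches."""
    left = bisect.bisect_left(sorted_index, lo)
    right = bisect.bisect_right(sorted_index, hi)
    return sorted_index[left:right]

# "between 20000 and 30000" is a slice of the sorted index; the hash
# index could only answer it by scanning every key.
```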
I have spent four weeks of development time; now I must earn money ...

Compared to the first post, the test now takes 247 seconds, but I have to optimize the queries manually.

Most of the problems are solved with a SQLite store (later maybe a SQL store; some customers want to have SQL Server).

I pass the primary SQL statement as a triple to the store, get the triples back, and let dotNetRDF do the rest.