
Index

Jul 2, 2014 at 4:19 AM
Hello,

I have added 40000 actors to the sample project.

If I query the actors on the Name field, the query takes about 10 seconds and returns 50 records:
ctx.Actors.Where(q => q.Name.Equals("Rababer"))
Is there no index, or how can I specify an index for this field?

Thank you.

Best regards
Coordinator
Jul 3, 2014 at 8:28 PM
Hi,

BrightstarDB has B-tree indexes covering all the triples in the store. At the moment there isn't any way to index specific fields separately; we just index everything.

That said, the performance you are getting is pretty poor - it sounds like something is fundamentally wrong with either the query execution plan or the btree page caching. That's the bad news :) The good news is that we were discussing next steps recently and have decided that our plan is to focus on performance (especially query performance) in the 1.8 milestone - I'll definitely add this case to our performance test cases.

In the meantime, it may be worth checking whether changing the BrightstarDB.PageCacheSize and/or BrightstarDB.ResourceCacheLimit values improves performance for you (see http://brightstardb.readthedocs.org/en/latest/Running_BrightstarDB/#additional-configuration-options).

Cheers

Kal
Jul 4, 2014 at 1:00 PM
Hello,

thank you.

I forgot to mention that I'm working on Windows 8.1 with the portable library.

I'm not sure what to do now: switch to SQLite, which is much faster at both inserts and reads, or trust that you will find the problem?
I want to build a WMS as a Windows Store app for Windows 8.1 (the base DB structure and main functions are ready).
I have reached the limits of relational databases and am looking for other solutions. A single query may easily return 100,000 objects. These are loaded by an ORM into a class tree, which I then flatten into a list of objects (the tree is defined by the main object).

I ran some tests with DatabaseBenchmark 2.0.2, and the results for BrightstarDB are not good:

SQLite
Write rec/sec: 5741
Read rec/sec: 228311
DB size: 22 MB

BrightstarDB
Write rec/sec: 1456 (I have some ideas to optimize this)
Read rec/sec: 2815 (a factor of about 80!? Is this the same problem?)
DB size: 1100 MB (wow, far too big; there is too much text in the database, it could be smaller)

I like your solution, but reads are too slow and the database is too big.
Maybe I can spend a few hours on optimization...

Some additional points:

Also, I could not call SaveChanges() a second time.
After a couple of hours I found a workaround in the file AppendOnlyFilePageStore.cs, near line 175:
                        lock (_stream)
                        {
                            //DE 3: if the stream reports zero length but a non-zero position (stale after a save), open a fresh input stream
                            if (_stream.Length == 0 && _stream.Position > 0) {
                                _stream = _peristenceManager.GetInputStream(_path);
                            }

                            page = new FilePage(_stream, pageId, _pageSize);
                            if (_backgroundPageWriter != null)
                            {
                                _backgroundPageWriter.ResetTimestamp(pageId);
                            }
#if DEBUG_PAGESTORE
                            Logging.LogDebug("Load {0} {1}", pageId, BitConverter.ToInt32(page.Data, 0));
#endif
                        }
After this change (the three lines after "//DE 3"), I can call SaveChanges() as many times as I like.

And for performance reasons, this version of TryDequeue in ConcurrentQueue.cs may be better:
        public bool TryDequeue(out T item)
        {
            // lock (_queue) already requires _queue to be non-null
            lock (_queue)
            {
                try
                {
                    //DE 4: check Count instead of letting Dequeue() throw on an
                    // empty queue; exceptions are expensive on a hot path
                    if (_queue.Count == 0)
                    {
                        item = default(T);
                        return false;
                    }
                    item = _queue.Dequeue();
                    return true;
                }
                catch (InvalidOperationException)
                {
                    item = default(T);
                    return false;
                }
            }
        }
I think you could get much better performance if you removed the string parsing...


Best regards,
Coordinator
Jul 5, 2014 at 6:49 PM
Hi,

Thanks for posting the benchmarks and for the suggestions for fixes / improvements. I hope you will consider sticking with BrightstarDB and help us to improve it!

A few comments:

1) If you want to limit database size then use the rewriteable store. It will be much more compact after repeated updates. See http://brightstardb.readthedocs.org/en/latest/Store_Persistence_Types/ for more information.

2) There is a transaction log file that we keep (transactions.bs) which is actually just a list of all the triples added/deleted in NTriples format. It is really there to allow the rebuild of a store by replaying transactions. However, it does take up a lot of space because it is not compressed in any way. I wonder if for applications like yours it might be a good idea to allow you to disable the transaction log. This would do two things: (a) more than halve the amount of space used by the store and (b) eliminate one of the file writes that takes place during an update. I'll have to take a look into the code, but I think it's not a big change to make it possible to turn off the transaction log (maybe as an option when you create the store, so you can create it without transaction logging enabled).

3) The string parsing is primarily RDF parsing because BrightstarDB is actually an RDF triple store. So the records are converted into triples and sent to the store - this does mean that there is a serialization/deserialization overhead, but I think that this is something we are stuck with. However it does mean that parsing and serialization are both areas where we need to do some careful examination to see if we can make it faster.
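To illustrate why an uncompressed NTriples log grows so quickly, here is a minimal C# sketch; it is not BrightstarDB's actual log writer. The hypothetical helper NTriplesSketch.FormatTypedLiteralTriple writes one triple with a typed-literal object as a single N-Triples line, with simplified escaping that only covers backslash, double quote, CR, LF and tab. Note that every line repeats the full subject, predicate and datatype IRIs.

```csharp
using System;
using System.Text;

// Hypothetical helper, not BrightstarDB's transaction log code: formats one
// triple whose object is a typed literal as a single N-Triples line.
public static class NTriplesSketch
{
    public static string FormatTypedLiteralTriple(
        string subjectIri, string predicateIri, string lexicalValue, string datatypeIri)
    {
        // Every line carries the full subject, predicate and datatype IRIs.
        return "<" + subjectIri + "> <" + predicateIri + "> \"" + EscapeLiteral(lexicalValue)
             + "\"^^<" + datatypeIri + "> .";
    }

    // Simplified escaping: only backslash, double quote, CR, LF and tab.
    private static string EscapeLiteral(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '"': sb.Append("\\\""); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

Formatting the "Rababer" name triple from this thread (with the genid subject that Polaris returns further down) gives a line of roughly 190 characters to record a 7-character name, which goes some way to explaining the space taken by the log.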

Cheers

Kal
Jul 7, 2014 at 6:44 AM
Hello,

Thank you for your answer.

I have run some tests.

It seems that the query time is spent in dotNetRDF, so I tried the current dotNetRDF source.
With the current source, a simple query with a single result takes about 12 seconds. Wow...

I tested 1.05: the same.

I tested 1.04: it behaves like the library (10 ms to 400 ms for a simple query with one result row).
With the query from above, dotNetRDF calls Filter.evaluate (in Filter.cs). The filter gathers all records (roughly 10 seconds) and only afterwards selects the ones that match.
This is bad. Right now I have a small data set of about 40,000 records; what will happen with 4,000,000 records?!

Version 1.05 shows the same behavior with "LIMIT 1": it gathers all records and then takes the first one.

Maybe the SPARQL you generate could avoid this.

The current query:
CONSTRUCT { 
  ?f ?f_p ?f_o . 
  ?f <http://www.brightstardb.com/.well-known/model/selectVariable> "f" . 
}
WHERE 
{ 
  {SELECT ?f WHERE
   { 
     ?f a <http://brightstardb.com/namespaces/default/Film> . 
     ?f <http://brightstardb.com/namespaces/default/name> ?v0 . 
     FILTER(?v0 = "Star Wars"^^<http://www.w3.org/2001/XMLSchema#string>)  . 
   }
  LIMIT 1 }
  ?f ?f_p ?f_o . 
}
Best regards
Coordinator
Jul 7, 2014 at 8:14 AM
Edited Jul 7, 2014 at 8:16 AM
Hi,

Thanks for tracking this down. This is a really good case where we could definitely use some query optimisation. As you noted, the query processor gathers all of the possible matches for ?v0 (the names of all films) and then iterates through them applying the filter; this is not at all efficient and effectively skips perfectly usable indexes. However, the FILTER step could be replaced by a lookup, because the following SPARQL query is equivalent:
CONSTRUCT { 
  ?f ?f_p ?f_o . 
  ?f <http://www.brightstardb.com/.well-known/model/selectVariable> "f" . 
}
WHERE 
{ 
  {SELECT ?f WHERE
   { 
     ?f a <http://brightstardb.com/namespaces/default/Film> . 
     ?f <http://brightstardb.com/namespaces/default/name> "Star Wars"^^<http://www.w3.org/2001/XMLSchema#string> . 
   }
  LIMIT 1 }
  ?f ?f_p ?f_o . 
}
In this case we only find the ?f that has the name "Star Wars" using an index lookup; the FILTER step is gone entirely. Of course that won't work with all FILTER statements, but a direct equality comparison with no other boolean operators should be a straight replacement, and as we don't need ?v0 for any other purpose we can directly replace the variable (if we still needed it elsewhere, we could add a Bind() step to ensure the value is kept). There should be some code in the LINQ provider that applies this optimization, but my guess is that when I changed the codebase for eager loading I somehow made it so that this branch no longer gets executed. In any case, I think the best place to make this optimization is in the SPARQL processing code.
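To make the difference between the two plans concrete, here is a minimal C# sketch using a hypothetical in-memory TripleTable rather than BrightstarDB's or dotNetRDF's real classes. FilterScan mimics the FILTER plan (bind the value for every triple with the predicate, then test each binding), while IndexLookup mimics the rewritten pattern with the object bound. The sorted dictionary keyed on (predicate, object) is only a stand-in for a B-tree over triples in predicate-object order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory triple table; not BrightstarDB's storage format.
public sealed class TripleTable
{
    private readonly List<(string S, string P, string O)> _triples =
        new List<(string S, string P, string O)>();

    // Ordinal ordering on (predicate, object).
    private static readonly IComparer<(string P, string O)> PoOrder =
        Comparer<(string P, string O)>.Create((a, b) =>
        {
            int c = string.CompareOrdinal(a.P, b.P);
            return c != 0 ? c : string.CompareOrdinal(a.O, b.O);
        });

    // Ordered index (predicate, object) -> subjects, standing in for a B-tree.
    private readonly SortedDictionary<(string P, string O), List<string>> _poIndex =
        new SortedDictionary<(string P, string O), List<string>>(PoOrder);

    // Number of object values examined by the most recent query.
    public int ValuesExamined { get; private set; }

    public void Add(string s, string p, string o)
    {
        _triples.Add((s, p, o));
        if (!_poIndex.TryGetValue((p, o), out var subjects))
        {
            subjects = new List<string>();
            _poIndex.Add((p, o), subjects);
        }
        subjects.Add(s);
    }

    // Like "?x <p> ?v . FILTER(?v = value)": visit every triple with
    // predicate p, then test each bound value against the filter.
    public List<string> FilterScan(string p, string value)
    {
        ValuesExamined = 0;
        var result = new List<string>();
        foreach (var t in _triples)
        {
            if (t.P != p) continue;
            ValuesExamined++;
            if (t.O == value) result.Add(t.S);
        }
        return result;
    }

    // Like "?x <p> value": a single lookup on the bound (p, value) key.
    public List<string> IndexLookup(string p, string value)
    {
        ValuesExamined = 1;
        return _poIndex.TryGetValue((p, value), out var subjects)
            ? new List<string>(subjects)
            : new List<string>();
    }
}
```

With 1000 name triples, FilterScan examines all 1000 values while IndexLookup performs one keyed lookup; both return the same subjects, which is why the rewrite is safe for a plain equality filter.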

I've logged this as an issue (https://github.com/BrightstarDB/BrightstarDB/issues/116)
Jul 7, 2014 at 9:04 AM
Edited Jul 7, 2014 at 10:05 AM
Hello,

Yes, using Polaris I found the following:

The query:
PREFIX b: <http://brightstardb.com/namespaces/default/>
SELECT ?name
WHERE
{
  ?x b:name ?name FILTER regex(?name, "Rababer"^^<http://www.w3.org/2001/XMLSchema#string>)
}
takes about 1567 ms and returns 38 rows,

and the query:
PREFIX b: <http://brightstardb.com/namespaces/default/>
SELECT ?name
WHERE
{
  ?x b:name ?name.
  ?x b:name "Rababer"^^<http://www.w3.org/2001/XMLSchema#string>
}
takes about 17 ms with 38 rows.

That looks usable!
Coordinator
Jul 7, 2014 at 9:35 AM
Interesting! I would have expected worse performance, because you still have a triple pattern that pulls in the names of all ?x. What happens if you do this instead:
PREFIX b: <http://brightstardb.com/namespaces/default/>
SELECT ?x WHERE {
  ?x b:name "Rababer"^^<http://www.w3.org/2001/XMLSchema#string>
}
(note that this selects ?x rather than the name, since we already know that the name has to be "Rababer")

Still, it is good to confirm that making that change to the query optimization step should bring a significant speedup to the query that is causing you problems!
Jul 7, 2014 at 10:02 AM
This takes about 7 ms and returns 38 rows, but with the ID as the result column instead of the name:
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="x" />
  </head>
  <results>
    <result>
      <binding name="x">
        <uri>http://www.brightstardb.com/.well-known/genid/ba6828c8-6206-426a-996e-a04fd56cbe63</uri>
      </binding>
    </result>
    <!-- remaining results omitted -->
  </results>
</sparql>

For comparison, the result of the previous query:
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name" />
  </head>
  <results>
    <result>
      <binding name="name">
        <literal datatype="http://www.w3.org/2001/XMLSchema#string">Rababer</literal>
      </binding>
    </result>
    <!-- remaining results omitted -->
  </results>
</sparql>
Jul 8, 2014 at 8:13 AM
Edited Jul 8, 2014 at 8:21 AM
I have changed your source.

The first query:
CONSTRUCT { 
  ?a ?a_p ?a_o . 
  ?a <http://www.brightstardb.com/.well-known/model/selectVariable> "a" . 
}
WHERE 
{ 
  {SELECT ?a WHERE
   { 
     ?a a <http://brightstardb.com/namespaces/default/Actor> . 
     ?a <http://brightstardb.com/namespaces/default/name> "Harrison Ford"^^<http://www.w3.org/2001/XMLSchema#string> . 
   }
  LIMIT 1 }
  ?a ?a_p ?a_o . 
}
now takes 63 ms instead of 124 ms

and the query
CONSTRUCT { 
  ?q ?q_p ?q_o . 
  ?q <http://www.brightstardb.com/.well-known/model/selectVariable> "q" . 
}
WHERE 
{ 
  {SELECT ?q WHERE
   { 
     ?q a <http://brightstardb.com/namespaces/default/Actor> . 
     ?q <http://brightstardb.com/namespaces/default/name> "Rababer"^^<http://www.w3.org/2001/XMLSchema#string> . 
   }
  }
  ?q ?q_p ?q_o . 
}
now takes 4410 ms instead of 9800 ms.

With the new version of dotNetRDF, every query takes about 11 seconds...
Jul 8, 2014 at 8:46 AM
Edited Jul 8, 2014 at 8:50 AM
I have changed the query a little bit:
CONSTRUCT { 
  ?q ?q_p ?q_o . 
  ?q <http://www.brightstardb.com/.well-known/model/selectVariable> "q" . 
}
WHERE 
{ 
  {SELECT ?q WHERE
   { 
     ?q <http://brightstardb.com/namespaces/default/name> "Rababer"^^<http://www.w3.org/2001/XMLSchema#string> . 
     ?q a <http://brightstardb.com/namespaces/default/Actor> . 
   }
  }
  ?q ?q_p ?q_o . 
}
This now takes 51 ms!?

The new version of dotNetRDF is still not working ...
Jul 8, 2014 at 1:07 PM
Hello,

Next speedup, when saving 100 actors:

From:
Function name   Total CPU (%)   Self CPU (%)   Total CPU (ms)   Self CPU (ms)   Module
 + BrightstarDB.Portable.Compatibility.Array::ConstrainedCopy   2.75 %  0.20 %  454 33  BrightstarDB.Portable.dll

Job completed in 80.0042
To:
Function name   Total CPU (%)   Self CPU (%)   Total CPU (ms)   Self CPU (ms)   Module
 - BrightstarDB.Portable.Compatibility.Array::ConstrainedCopy   0.01 %  0.00 %  1   0   BrightstarDB.Portable.dll

Job completed in 51.0039
In Array.cs you use the following code:
    public static void ConstrainedCopy(System.Array source, int srcOffset, System.Array destination, int destOffset, int count)
    {
        // Element-by-element copy; GetValue/SetValue box every value-type element.
        for (int i = 0; i < count; i++)
        {
            destination.SetValue(source.GetValue(srcOffset + i), destOffset + i);
        }
    }
I changed this to:
    public static void ConstrainedCopy(System.Array source, int srcOffset, System.Array destination, int destOffset, int count)
    {
        // Back up the destination range so it can be restored if the copy fails,
        // emulating the all-or-nothing behavior of System.Array.ConstrainedCopy.
        System.Array sav = System.Array.CreateInstance(destination.GetType().GetElementType(), count);
        System.Array.Copy(destination, destOffset, sav, 0, count);
        try
        {
            System.Array.Copy(source, srcOffset, destination, destOffset, count);
        }
        catch (System.Exception)
        {
            System.Array.Copy(sav, 0, destination, destOffset, count);
            throw; // rethrow without resetting the stack trace
        }
    }
In my test, this brought the time down from 10 seconds to 7 seconds.

Best regards
Jul 8, 2014 at 1:43 PM
Hello,

A 5% speedup in SaveChanges:
Change in the Write function in FilePage.cs (only seek when the stream is not already at the write offset):
                    long ret = _modified;
+                    if (outputStream.Position != (long)_writeOffset) {
                        outputStream.Seek((long)_writeOffset, SeekOrigin.Begin);
+                    }
                    outputStream.Write(_data, 0, _pageSize);
Jul 8, 2014 at 2:48 PM
Edited Jul 8, 2014 at 3:21 PM
Hello,

Please don't poll for the job status.

From:
Function name   Total CPU (%)   Self CPU (%)   Total CPU (ms)   Self CPU (ms)   Module
 - BrightstarDB.Client.EmbeddedDataObjectStore::DoSaveChanges   34.38 % 1.57 %  3698    169 BrightstarDB.Portable.dll
To:
Function name   Total CPU (%)   Self CPU (%)   Total CPU (ms)   Self CPU (ms)   Module
 - BrightstarDB.Client.EmbeddedDataObjectStore::DoSaveChanges   5.92 %  0.02 %  279 1   BrightstarDB.Portable.dll
EmbeddedDataObjectStore.cs
            var status = _serverCore.GetJobStatus(_storeName, jobId.ToString());
+            status.WaitEvent.WaitOne();
            while (!(status.JobStatus == JobStatus.CompletedOk || status.JobStatus == JobStatus.TransactionError))
            {
                // wait for completion.
#if !PORTABLE
                Thread.Sleep(5);
#endif
                status = _serverCore.GetJobStatus(_storeName, jobId.ToString());
            }
JobExecutionStatus.cs
    internal class JobExecutionStatus
    {
        ....

+        //DE
+        /// <summary>
+        /// WaitEvent 
+        /// </summary>
+        public AutoResetEvent WaitEvent { get; set; }
    }
ServerCore.cs:
        private StoreWorker GetStoreWorker(string storeName)
        {
            lock (_stores)
            {
+                //DE 3
+                StoreWorker result = null;
+                if (_stores.TryGetValue(_baseLocation + "\\" + storeName, out result)) {
+                    return result;
+                }
-                //if (_stores.ContainsKey(_baseLocation + "\\" + storeName))
-                //{
-                //    return _stores[_baseLocation + "\\" + storeName];
-                //}
                if (!DoesStoreExist(storeName))
                {
                    throw new NoSuchStoreException(storeName);
                }
                return CreateStoreWorker(storeName);
            }
        }

        public Guid Export(string fileName, string graphUri, RdfFormat exportFormat, string jobLabel = null)
        {
            Logging.LogDebug("Export {0}, {1}, {2}", fileName, graphUri, exportFormat.DefaultExtension);
            var jobId = Guid.NewGuid();
            var exportJob = new ExportJob(jobId, jobLabel, this, fileName, graphUri, exportFormat);
            _jobExecutionStatus.TryAdd(jobId.ToString(),
                                       new JobExecutionStatus
                                           {
                                               JobId = jobId,
                                               JobStatus = JobStatus.Started,
                                               Queued = DateTime.UtcNow,
                                               Started = DateTime.UtcNow,
                                               Label = jobLabel,
+                                               //DE
+                                               WaitEvent = new AutoResetEvent(false)
                                            });
            exportJob.Run((id, ex) =>
                              {
                                  JobExecutionStatus jobExecutionStatus;
                                  if (_jobExecutionStatus.TryGetValue(id.ToString(), out jobExecutionStatus))
                                  {
                                      jobExecutionStatus.Information = "Export failed";
                                      jobExecutionStatus.ExceptionDetail = GetExceptionDetail(ex);
                                      jobExecutionStatus.JobStatus = JobStatus.TransactionError;
                                      jobExecutionStatus.Ended = DateTime.UtcNow;

+                                      //DE
+                                      jobExecutionStatus.WaitEvent.Set();
                                  }
                              },
                          id =>
                              {
                                  JobExecutionStatus jobExecutionStatus;
                                  if (_jobExecutionStatus.TryGetValue(id.ToString(), out jobExecutionStatus))
                                  {
                                      jobExecutionStatus.Information = "Export completed";
                                      jobExecutionStatus.JobStatus = JobStatus.CompletedOk;
                                      jobExecutionStatus.Ended = DateTime.UtcNow;
                                        
+                                      //DE
+                                      jobExecutionStatus.WaitEvent.Set();
                                  }
                              });
            return jobId;
        }
        public void QueueJob(Job job, bool incrementTransactionCount = true)
        {
            ....
                        new JobExecutionStatus
                            {
                                JobId = job.JobId,
                                JobStatus = JobStatus.Pending,
                                Queued = DateTime.UtcNow,
                                Label = job.Label,
+                                //DE
+                                WaitEvent = new AutoResetEvent(false)
                            }))
        private void ProcessJobs(object state)
        {
            ...
                                jobExecutionStatus.Information = "Job Completed";
                                jobExecutionStatus.Ended = DateTime.UtcNow;
                                jobExecutionStatus.JobStatus = JobStatus.CompletedOk;
+                                //DE
+                                jobExecutionStatus.WaitEvent.Set();
                            }
                            catch (Exception ex)
                            {
                                Logging.LogError(BrightstarEventId.JobProcessingError,
                                                 "Error Processing Transaction {0}",
                                                 ex);
                                jobExecutionStatus.Information = job.ErrorMessage ?? "Job Error";
                                jobExecutionStatus.Ended = DateTime.UtcNow;
                                jobExecutionStatus.ExceptionDetail = GetExceptionDetail(ex);
                                jobExecutionStatus.JobStatus = JobStatus.TransactionError;
+                                //DE
+                                jobExecutionStatus.WaitEvent.Set();
                            }
                            finally
With all these modifications I can save 1000 actors in 3 seconds instead of 10 seconds.
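The idea behind these changes is to replace a polling loop with a completion signal that the job worker sets once. A minimal C# sketch of that pattern follows, with a hypothetical JobSignal class standing in for JobExecutionStatus. It uses ManualResetEventSlim instead of the AutoResetEvent in the diff above; that substitution is a design choice for the sketch, not BrightstarDB code.

```csharp
using System;
using System.Threading;

// Hypothetical, stripped-down stand-in for JobExecutionStatus: the worker
// signals completion once instead of the client re-querying the status.
public sealed class JobSignal : IDisposable
{
    // Manual-reset: stays signalled after Set(), so late or multiple waiters
    // are all released (an AutoResetEvent releases one waiter per Set()).
    private readonly ManualResetEventSlim _done = new ManualResetEventSlim(false);

    public bool Succeeded { get; private set; }

    // Called by the job worker: publish the final state first, then signal.
    public void Complete(bool succeeded)
    {
        Succeeded = succeeded;
        _done.Set();
    }

    // Called by the client instead of a status-polling loop.
    // Returns false if the timeout expires before the job completes.
    public bool WaitForCompletion(TimeSpan timeout) => _done.Wait(timeout);

    public void Dispose() => _done.Dispose();
}
```

A manual-reset event stays signalled after Set(), so a caller that starts waiting after the job has already finished, or several callers waiting on the same job, all return. With an AutoResetEvent each Set() releases only one WaitOne(), so a second waiter on the same job would block until its timeout.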
Coordinator
Jul 8, 2014 at 9:06 PM
Great work, thank you! I've committed the code changes you posted to the develop branch, pretty much without alteration.

I'm going to review the change to the query structure. It would make sense for us to generate SPARQL that executes more efficiently with BrightstarDB, but the real fix is to implement a proper query planning strategy. However, if that looks like it's going to take a while to implement, at least moving the type triple pattern to the end of the query seems like a good first step.
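As a rough illustration of the kind of query planning discussed here, the following C# sketch reorders the triple patterns of a query before evaluation. TriplePattern and PatternOrderingSketch are hypothetical names, not part of BrightstarDB or dotNetRDF. OrderByHeuristic puts patterns with more bound terms first and, among patterns with equally many bound terms, moves rdf:type patterns to the end (the first step suggested above). OrderByEstimate orders by match counts supplied by the caller; a real planner would derive these from store statistics and would also account for variables bound by earlier patterns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical representation of one triple pattern; terms starting with '?'
// are variables, anything else counts as a bound term.
public sealed class TriplePattern
{
    public TriplePattern(string s, string p, string o) { S = s; P = p; O = o; }

    public string S { get; }
    public string P { get; }
    public string O { get; }

    public int BoundTermCount =>
        new[] { S, P, O }.Count(term => !term.StartsWith("?", StringComparison.Ordinal));
}

public static class PatternOrderingSketch
{
    public const string RdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Rule-based ordering: more bound terms first; among patterns with the same
    // number of bound terms, rdf:type patterns go last (LINQ ordering is stable).
    public static List<TriplePattern> OrderByHeuristic(IEnumerable<TriplePattern> patterns) =>
        patterns.OrderByDescending(tp => tp.BoundTermCount)
                .ThenBy(tp => tp.P == RdfType ? 1 : 0)
                .ToList();

    // Statistics-based ordering: fewest estimated matches first. The estimate
    // function is an assumption supplied by the caller.
    public static List<TriplePattern> OrderByEstimate(
        IEnumerable<TriplePattern> patterns, Func<TriplePattern, long> estimatedMatches) =>
        patterns.OrderBy(estimatedMatches).ToList();
}
```

With the numbers seen in this thread (38 actors named "Rababer" out of about 40,000 actors), both orderings evaluate the name pattern before the type pattern, which corresponds to the manual swap that took the query from 4410 ms to 51 ms.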
Jul 10, 2014 at 3:46 AM
I have only moved the equality comparison from the FILTER into the WHERE clause.
Normally an index can also handle greater-than, less-than, between and starts-with lookups. I have not yet found a way to do that in SPARQL.
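SPARQL 1.1 can express these conditions, but only as FILTER expressions, for example FILTER(?name >= "A" && ?name < "B") for a range or FILTER(STRSTARTS(?name, "Rab")) for a prefix. Whether they are answered from an index or by scanning every binding depends on the query engine. The C# sketch below shows why an ordered index can serve them directly. SortedValueIndex is a hypothetical class that keeps the values of one predicate in an ordinally sorted set, standing in for a B-tree keyed on the object value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ordered index over the values of one predicate (for example
// actor names); a stand-in for a B-tree keyed on the object value.
public sealed class SortedValueIndex
{
    private readonly SortedSet<string> _values = new SortedSet<string>(StringComparer.Ordinal);

    public void Add(string value) => _values.Add(value);

    // Range scan: every value v with lower <= v <= upper (ordinal order).
    public IEnumerable<string> Between(string lower, string upper)
    {
        if (string.CompareOrdinal(lower, upper) > 0)
        {
            return Enumerable.Empty<string>(); // empty range; GetViewBetween would throw
        }
        return _values.GetViewBetween(lower, upper).ToList();
    }

    // Prefix scan: in ordinal order all values starting with the prefix are
    // contiguous, so start at the first value >= prefix and stop at the first
    // value that no longer matches.
    public IEnumerable<string> StartsWith(string prefix)
    {
        var result = new List<string>();
        if (_values.Count == 0 || string.CompareOrdinal(prefix, _values.Max) > 0)
        {
            return result;
        }
        foreach (string v in _values.GetViewBetween(prefix, _values.Max))
        {
            if (!v.StartsWith(prefix, StringComparison.Ordinal))
            {
                break;
            }
            result.Add(v);
        }
        return result;
    }
}
```

Both methods only walk the part of the ordered set that can match; a greater-than or less-than condition is simply a range with one open end.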