Handling blobs and inheritance/specialization

Sep 18, 2014 at 8:26 PM

For my project, I am investigating the possibility of using NoSQL. I already have MongoDB and RavenDB in view, but B* is also very promising. Still, there are two topics I have to address.
1) MongoDB has GridFS, and RavenDB has support for attachments. What is your proposal for handling such BLOBs?
2) As the model in B* is defined using interfaces, and there is no meaning of inheritance between interfaces in C# (although the syntax is there), how can one implement specialization, or at least an efficient dynamic map-reduce-like query?
To see what I am looking for:
In my model I have an abstract base class (W) that contains the basic properties of a workflow (such as start date, creator, title) and another abstract base class for a workflow step (S). Using these, I will define many concrete workflows (W1 will use S1, S2 and S3 - W2 will use S4, S5, S6). W1 and W2 will extend W, S1...S6 will extend S. As the users create different workflow instances, the data store will contain many instances of W1, W2 and S1...S6 respectively.
And I need to be able to get all W1, W2...W100 instances separately (those would be the collections), but also to get all of them as W - without enumerating and unioning them one by one.
I have read about the Become<T>() method, but as far as I can see, that's something different.
I have also seen OfType<T> in Linq, but I can't see how to use it, since T can't extend another type as it is an interface.
So: how to implement IS-A and AS for the model within B* EF on query level?

Thank you for your answers,
Sep 19, 2014 at 10:52 AM

Thanks for taking a look at BrightstarDB!

On your first question, BrightstarDB handles binary data by storing it as base-64 encoded strings. Long strings (larger than ~64 chars) get stored in a resource index, so all your blobs would probably end up in there. The data can be retrieved just like any other data - you can use SPARQL queries and deal with base-64 encoded strings, or you can use our entity framework (declare properties of type byte[]). We don't currently support the direct notion of attachments or an underlying file system, but the basic binary data support should give you the platform you need to build that sort of thing if it is a requirement. What size of attachments are you envisaging - is this something like attaching word processor documents to a workflow task? While you could store lots of binary data in BrightstarDB, that might not be the best way to go - it might be better for your application to manage a file store and just keep references to the files in BrightstarDB. I don't know for sure, but I guess that is what Raven does with attachments. GridFS is a nice feature of MongoDB when you really want to keep the files synchronized with the data (and take advantage of the replication features of the datastore). It would be nice to have that sort of support in BrightstarDB, but we need to get to the stage of supporting replication first! :)
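
To make the "file store plus reference" idea concrete, here is a minimal sketch in plain C#. It is not part of BrightstarDB: AttachmentStore, Save, Load and Base64Length are hypothetical names made up for illustration, and the content-hash naming scheme is only one possible choice. The string returned by Save is what an entity would keep instead of the bytes themselves.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; not a BrightstarDB API.
public static class AttachmentStore
{
    // Length of the base-64 text for a payload of byteCount bytes:
    // every group of 3 input bytes becomes 4 characters (last group padded).
    public static long Base64Length(long byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Writes the content to a file named after its SHA-256 hash and returns
    // that name. The caller stores the returned key as an ordinary string.
    public static string Save(string rootDirectory, byte[] content)
    {
        Directory.CreateDirectory(rootDirectory);
        using (var sha = SHA256.Create())
        {
            string key = BitConverter.ToString(sha.ComputeHash(content))
                .Replace("-", "")
                .ToLowerInvariant();
            File.WriteAllBytes(Path.Combine(rootDirectory, key), content);
            return key;
        }
    }

    // Reads the content back using the key returned by Save.
    public static byte[] Load(string rootDirectory, string key)
    {
        return File.ReadAllBytes(Path.Combine(rootDirectory, key));
    }
}
```

Base64Length shows why inline storage gets expensive for larger files: the encoded text is roughly a third larger than the raw bytes. Naming files by content hash means identical attachments are written only once, but a real application would still have to handle deletion and keep the file store consistent with the data store itself.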

With regard to your second question - I'm not quite clear whether you have two or three levels to the model. It sounds like you have a base class for all workflows (W) and that would have properties like startTime, creator etc. Then what I'm not sure of is whether W1 is a class or an instance - it sounds like you want it to be a class that would extend W with additional properties (e.g. approvedBy, totalValue). If that is the case then the instances would be instances of W1. This sounds like it is then just a question of defining a base interface for W, an interface for your concrete workflow W1 and then creating instances of W1. If you did that with the entity framework you would end up with a context object that has a collection of W instances (all workflows) and a collection of W1 instances and a collection of W2 instances. You can use these to effectively query all workflows using their basic properties (e.g. find all workflows started after a given date) or a specific type of workflow (e.g. find all W1 workflows with a totalValue > 500). To list all workflows as W instances you would simply enumerate the W collection (or depending on your use case, page through them using OrderBy, Skip and Take).
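
The shape of such a model can be sketched in plain C#. This is an illustration under stated assumptions, not BrightstarDB code: the interface and class names are invented, the entity framework attributes and the generated context are left out, and an in-memory list queried with LINQ to Objects stands in for the context's collection of W instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Base interface (W): properties shared by every workflow.
public interface IWorkflow
{
    string Title { get; }
    DateTime StartTime { get; }
}

// Specialised interface (W1): inherits the members of IWorkflow and adds its own.
public interface IPurchaseWorkflow : IWorkflow
{
    decimal TotalValue { get; }
}

// A second specialisation (W2).
public interface ILeaveWorkflow : IWorkflow
{
    int DaysRequested { get; }
}

// Hand-written classes standing in for whatever the framework would generate.
public class PurchaseWorkflow : IPurchaseWorkflow
{
    public string Title { get; set; }
    public DateTime StartTime { get; set; }
    public decimal TotalValue { get; set; }
}

public class LeaveWorkflow : ILeaveWorkflow
{
    public string Title { get; set; }
    public DateTime StartTime { get; set; }
    public int DaysRequested { get; set; }
}

public static class WorkflowQueries
{
    // Sample data: every item is usable as IWorkflow, whatever its concrete type.
    public static List<IWorkflow> Sample()
    {
        return new List<IWorkflow>
        {
            new PurchaseWorkflow { Title = "Laptops", StartTime = new DateTime(2014, 9, 1), TotalValue = 1200m },
            new PurchaseWorkflow { Title = "Pens", StartTime = new DateTime(2014, 9, 10), TotalValue = 40m },
            new LeaveWorkflow { Title = "Holiday", StartTime = new DateTime(2014, 9, 15), DaysRequested = 5 },
        };
    }

    // Query across all workflow types using only the base properties.
    public static IEnumerable<IWorkflow> StartedAfter(IEnumerable<IWorkflow> all, DateTime cutoff)
    {
        return all.Where(w => w.StartTime > cutoff).OrderBy(w => w.StartTime);
    }

    // Narrow to one specialisation, then filter on its own property.
    public static IEnumerable<IPurchaseWorkflow> PurchasesOver(IEnumerable<IWorkflow> all, decimal minimum)
    {
        return all.OfType<IPurchaseWorkflow>().Where(p => p.TotalValue > minimum);
    }
}
```

In the C# type system, anything implementing IPurchaseWorkflow is also an IWorkflow, so the same object can appear both in the general list and in the specialised query. Whether a particular LINQ provider translates OfType into a store query is a separate matter that this sketch does not address.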

Some workflow systems I have used in the past make a distinction between the workflow template (a description of the workflow, its steps and the dependencies between them) and the workflow instance (a running instance of the workflow template) - the instance then simply keeps a reference to the template it is an instance of and tracks its own state. Making this sort of distinction in your model might be more flexible in the long term than relying on a type hierarchy. Of course that really depends on your application requirements!
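
As a rough sketch of that template/instance split (plain C#, with hypothetical class and member names, and a simple linear list of steps assumed in place of real dependencies between steps):

```csharp
using System;
using System.Collections.Generic;

// Describes a kind of workflow: its name and its ordered steps.
public class WorkflowTemplate
{
    public string Name { get; private set; }
    public IReadOnlyList<string> Steps { get; private set; }

    public WorkflowTemplate(string name, params string[] steps)
    {
        Name = name;
        Steps = steps;
    }
}

// A running workflow: refers to its template and tracks only its own state.
public class WorkflowInstance
{
    public WorkflowTemplate Template { get; private set; }
    public string Creator { get; private set; }
    public DateTime StartTime { get; private set; }
    public int CurrentStepIndex { get; private set; }

    public WorkflowInstance(WorkflowTemplate template, string creator, DateTime startTime)
    {
        Template = template;
        Creator = creator;
        StartTime = startTime;
    }

    public bool IsComplete
    {
        get { return CurrentStepIndex >= Template.Steps.Count; }
    }

    // Name of the step currently in progress, or null once the workflow is complete.
    public string CurrentStep
    {
        get { return IsComplete ? null : Template.Steps[CurrentStepIndex]; }
    }

    // Moves on to the next step of the template.
    public void CompleteCurrentStep()
    {
        if (IsComplete)
        {
            throw new InvalidOperationException("Workflow is already complete.");
        }
        CurrentStepIndex++;
    }
}
```

With this shape, adding a new kind of workflow means adding a new template object rather than a new type, and all running workflows live in a single collection that can be filtered by template.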

Hope this helps


Sep 19, 2014 at 6:48 PM
Edited Sep 19, 2014 at 6:49 PM

Thank you for your answer.
1) OK, I got it. Base64-encoding megabytes of data is not very efficient. So, until there is an optimized solution for that, I have to consider using the file system. By the way, I intend to store files no larger than 5MB.
2) You got it right: W is a base class, W1 would extend it. W1 is the concrete template, instances of W1 are the running ones. I haven't experimented with it, but I can't imagine how that model can work in B*.
As you wrote, I define the interface of W, and separately the interface of W1. But when I create an instance of W1, how will that also be an instance of W? If they were classes, something like "class W1: W {}" would do it. But "interface W1: W {}" has no actual meaning. In RavenDB this inheritance is transparent; in MongoDB I can use map-reduce.
The only thing I can imagine is creating two separate objects: one for W and one for W1, with a discriminator in W (telling me that there is actually a running W1 workflow), and linking the instance of W to the instance of W1 and vice versa. But this way I need to declare the link simply as "object", and the discriminator as a string or as a field of type "Type". That's not ACID :(
Or have I misunderstood something in your reply?

Thank you anyway.