Discovering objects in C++

snack · 2017-03-31 18:34:12

Hi, I'm facing the following problem: I have a consumer algorithm which can fetch data from multiple producers. I need a way for the consumer to discover all the available producers, i.e. to retrieve a reference to each of them. I usually solve this kind of issue by creating a singleton manager class which holds references to all the producers; consumers can query the manager to get a list of producers. However, I have read that singletons can cause trouble in tests, among other issues they might introduce. So I'm wondering whether there is a right way to solve this kind of problem. I searched the web but found few results and much confusion, so I need some guidance.
Thanks in advance

mpan · 2017-03-31 23:40:34

It doesn’t matter if singletons cause problems in tests. They may, but it’s unimportant here. Singletons used that way are anti-patterns and a design error.

What you want is a directory that will deliver you a list of producers. Two common ways of doing that are having a central service or using dependency injection.

== Central service ==
The central service may be anything accessible by your program, ranging from process-level state shared by the whole code (quite easy) to a separate process which you talk to via IPC or the network (considerably more complex). Which solution you want depends on your actual needs. It’s a kind of aspect-oriented programming, in which you make the choice of producers orthogonal to the code that uses them.

A good example of this approach is logging frameworks; take Apache log4cxx. It uses a utility class, LogManager, to obtain the actual instances of Logger, and it’s up to LogManager’s static methods to determine how to provide one. Note that currently a process-level singleton is used as the backend¹, but that’s an implementation detail in which you, as the programmer who wants to log a message, are not interested at all. LogManager abstracts that issue away. In the future it may be a non-singleton, a singleton at a different level or whatever else the developers of the library invent.

Typical solutions here would be:

A directory obtained from a utility class, log4cxx-style. Be warned: the directory itself is not a singleton and corresponds to the loggers, not to LogManager, in log4cxx! You obtain it by identifier from your utility class, which allows for easy reconfiguration in the future. A pseudocode example of how you would use it:
At the setup location:
```
directory = Directories.getDirectory("default");
directory.addProducer(yourProducerA);
directory.addProducer(yourProducerB);
```
At the use location:
```
directory = Directories.getDirectory("default");
for (auto producer: directory.getProducers()) {
    // … do something with the producer …
}
```
If it suits your scenario, the setup may just as well be read from a configuration file or any other source you want.
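The utility-class variant above could look roughly like this in C++. This is only a minimal sketch under assumptions: `Directories` and `Directory` follow the pseudocode, while the `Producer` interface, `ConstantProducer`, `consumeAll` and `demo` are hypothetical names made up for illustration. The registry is not thread-safe as written, and the directory stores non-owning pointers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical producer interface; the thread does not define one.
struct Producer {
    virtual ~Producer() = default;
    virtual int produce() = 0;
};

// Hypothetical producer that always yields the same value.
struct ConstantProducer : Producer {
    explicit ConstantProducer(int v) : value(v) {}
    int produce() override { return value; }
    int value;
};

// An ordinary class: any number of directories may exist.
// It stores non-owning pointers, so producers must outlive their use.
class Directory {
public:
    void addProducer(Producer& p) { producers_.push_back(&p); }
    const std::vector<Producer*>& getProducers() const { return producers_; }

private:
    std::vector<Producer*> producers_;
};

// Utility class with static access, in the spirit of log4cxx's LogManager.
class Directories {
public:
    // Returns the directory stored under `name`, creating it on first use.
    static Directory& getDirectory(const std::string& name) {
        static std::map<std::string, Directory> registry;
        return registry[name];  // references into std::map stay valid
    }
};

// Use location: sums what every producer in the named directory yields.
int consumeAll(const std::string& directoryName) {
    int total = 0;
    for (Producer* p : Directories::getDirectory(directoryName).getProducers()) {
        total += p->produce();
    }
    return total;
}

// Setup and use share nothing but the directory name.
void demo() {
    static ConstantProducer a(1), b(2);  // static so they outlive the call
    Directory& directory = Directories::getDirectory("demo");
    directory.addProducer(a);
    directory.addProducer(b);
    assert(consumeAll("demo") == 3);
}
```

Because consumers only know the name, pointing them at a different directory (for example in a test) means changing one string rather than the consumer code.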
A contextual directory. Each thread (or call stack, to be precise) has its own directory assigned. In the thread’s context you may set a new directory or manipulate the existing one, and the methods called afterwards will see that particular directory of producers. This, however, comes with two traps:
1. Since it’s stack-based, you must ensure that any modification of the context is reverted later, regardless of how you leave the block of code: be it linear execution, a return from a function or an exception. That is easily achieved in C++ using RAII, but you have to implement it. So you must be able to do something like this:
```
// … blah blah
{
    auto reverter = DirectoryCtx.begin();
    // … possibly modify the context
    callToSomethingThatUsesDirectory();
}
// … blah blah
```
  No matter what happens, when the execution arrives at “}”, the context has to be reverted to the state it was in before entering that block.
2. Sometimes you want contexts that span object graphs rather than threads. For example, you create an object, push it to some queue and you don’t even know which thread will pick it up. In such a scenario the solution typically applied is to take the creating thread’s context and store it in the object (and that has to be very explicitly documented!); the object then ignores the context of the calling thread and uses its own instead. Not a hard thing but, again, easy to miss.
Pseudocode for using that approach:
At the setup location (usually the initial thread, before any other one is started):
```
directory = DirectoryCtx.getDirectory();
directory.addProducer(yourProducerA);
directory.addProducer(yourProducerB);
```
At some other location which modifies the context:
```
{
    auto reverter = DirectoryCtx.begin();
    auto directory = DirectoryCtx.getDirectory();
    directory.addProducer(customProducerC);
    doSomethingThatUsesProducers();
} // … and here `customProducerC` is no longer in the context’s directory!
```
At the location that uses producers:
```
directory = DirectoryCtx.getDirectory();
for (auto producer: directory.getProducers()) {
    // … do something with the producer …
}
```
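A minimal C++ sketch of the contextual directory and its RAII guard, under simplifying assumptions: a directory is reduced to a list of producer names, the per-thread state lives in a `thread_local` variable, and the guard snapshots the whole directory by copying it. `Reverter`, `countProducers` and `demo` are hypothetical names; only `DirectoryCtx`, `begin()` and `getDirectory()` come from the pseudocode. The guard is assumed to be destroyed on the thread that created it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplification: a directory is just a list of producer names.
using Directory = std::vector<std::string>;

class DirectoryCtx {
    // One directory per thread, created on first use in that thread.
    static Directory& current() {
        thread_local Directory dir;
        return dir;
    }

public:
    // RAII guard: snapshots the calling thread's directory and puts the
    // snapshot back when the guard is destroyed.
    class Reverter {
    public:
        Reverter() : saved_(current()), active_(true) {}
        Reverter(Reverter&& other) noexcept
            : saved_(std::move(other.saved_)), active_(other.active_) {
            other.active_ = false;  // the moved-from guard must not restore
        }
        Reverter(const Reverter&) = delete;
        Reverter& operator=(const Reverter&) = delete;
        Reverter& operator=(Reverter&&) = delete;
        ~Reverter() {
            if (active_) current() = std::move(saved_);
        }

    private:
        Directory saved_;
        bool active_;
    };

    static Reverter begin() { return Reverter(); }
    static Directory& getDirectory() { return current(); }
};

// Code deep in the call stack reads the context without knowing who set it up.
std::size_t countProducers() { return DirectoryCtx::getDirectory().size(); }

std::size_t demo() {
    Directory& dir = DirectoryCtx::getDirectory();
    dir.clear();
    dir.push_back("producerA");
    dir.push_back("producerB");
    try {
        auto reverter = DirectoryCtx::begin();
        DirectoryCtx::getDirectory().push_back("customProducerC");
        assert(countProducers() == 3);
        throw std::runtime_error("leave the block through an exception");
    } catch (const std::runtime_error&) {
        // By now the guard's destructor has restored the directory.
    }
    return countProducers();  // customProducerC is gone again
}
```

Copying the directory on every `begin()` keeps the sketch short; a real implementation might instead keep a stack of directories or record only the changes.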
A separate process, possibly even running on another machine. That’s the most complex and demanding solution; used without thinking, it is a performance killer, but it lets you share state between processes or even machines. In this solution you have a server which accepts, via IPC, requests to provide a directory of producers for a given name. The calling process receives that directory and uses it. A simple usage scenario would be:
Setup:
```
directoryLink = Directories.connectTo("your_unique_connection_string_a_la_database_conn_str");
directoryLink.addProducer(yourProducerA);
directoryLink.addProducer(yourProducerB);
```
Usage:
```
directory = /*obtain directory link*/;
for (auto producer: directory.getProducers()) {
    // … do something with the producer …
}
```
If you’re paying attention, you have noticed a catch: at the location where you use the producers, you need to obtain the previously set-up directory link. You could, of course, propagate it to each object you create, but in practice that is impossible. And if you don’t, we’re back to square one. The solution here is to apply one of the previously described methods to obtain the link’s identifier, which can easily be configured at any time. So the final solution would be:
Setup:
```
string connString = getAppConf("directory");
directoryLink = Directories.connectTo(connString);
directoryLink.addProducer(yourProducerA);
directoryLink.addProducer(yourProducerB);
```
Usage:
```
string connString = getAppConf("directory");
directory = Directories.connectTo(connString);
for (auto producer: directory.getProducers()) {
    // … do something with the producer …
}
```
Notice that the connection string can be stored in your application’s settings the same way as, for example, a database connection string is stored.
That solution is, however, useful only if you have some way to transfer producers between processes. This isn’t always the case.
While it gives enormous flexibility and is a very elegant solution, I would strongly advise against using that idea until you actually need a directory shared by multiple processes. The code is complex, it has a number of traps, it may require interprocess resource management, it is terrible to debug, it is easy to turn into a performance bottleneck, ad hoc implementations are security abominations, and so on. Better to stick with the earlier two.

== Dependency injection ==
In that case you have a field in your object that is a pointer to a producers directory. However, it is not set explicitly by the code that creates the object. Instead, the object asks a dependency-injection library/framework to provide the directory. The library/framework lives in a plane that is perpendicular to your code and is highly customizable.

Beautiful, right? Yes. Unless you have to do that in C++. First of all, for a long time there was no sane, elegant method of using DI in C++ due to the language’s limitations. Now, with later versions of C++, it is possible. It’s still hard and not very pleasant, but doable. However, I have heard of no mature libraries that provide the feature for C++. There are some experimental ones, like boost-experimental/di, but that’s all.

Therefore, while dependency injection is a viable solution to your problem in the general case, it doesn’t seem very appealing in the C++ world.

== On singletons ==
Despite their very bad reputation, singletons are not wrong in themselves. They’re just heavily abused, often to hide the existence of a global variable: some people add singletons to make others stop complaining about globals, because now they can say “hey, look, no access to a global variable, it has the class keyword and therefore it is OOP”. No, not really.

Singletons may be the right solution. They have a very specific meaning, which determines whether something can be a singleton. The condition is simple: if something, by its inherent nature, can have exactly one instance and will never, ever be able to have more, it is a singleton. It’s not about sharing state; it’s about preventing the existence of more than one instance. Such cases are very rare. A good example of a mistaken singleton is a computer display. In the 80s you could imagine that displays were singletons, because computers had one display, right? In 2017 that doesn’t seem so true anymore.
____
¹ And that’s the actual valid use case for a singleton. Valid, yet still not required.

Last edited by mpan (2017-03-31 23:44:51)

snack · 2017-04-03 10:36:13

Wow mpan, your reply took me a long time to read and (partially) understand. I had to catch up on some concepts since I'm not a computer scientist, just a physicist playing at software development and trying to do it the right way (if such a thing exists...). I appreciate your long and detailed answer very much.

Now, back on topic: I chose a singleton for my "producer locator" for various reasons. These are (most important first):
1) it's the only solution I knew
2) my users (physicists) are familiar with this pattern, so they will use code with singletons. They are also very unwilling to learn new patterns.
3) a single instance of the producer locator is what I think I need: all producers must be retrievable by consumers from the same location service; I don't want producers to be registered with different locator instances, so that consumers would have to query all of them to find a given producer. Using a global, non-singleton locator would not prevent some dumb user from creating another locator and registering his producer with it, making it impossible to retrieve for consumers that query only the global locator.
4) using a singleton I can simplify the creation of concrete producers by registering them to the locator in the constructor of the producer base class:

#include "ProducerLocator.h"
class IProducer {
public:
  IProducer() {
    // Every producer registers itself with the global locator on construction.
    ProducerLocator::Instance().Register(this);
  }

  ...

};

so that my users creating a concrete producer class won't need to explicitly register it to the locator (simplifying code writing is very important when you deal with developers not devoted to writing good code and learning APIs; usually physicists want to get physics results, and for them writing code is just something to be done quick and dirty. Not to mention the generally low coding skills and knowledge).

That given, I am interested in some of your proposed solutions, especially the dependency injection. I found that in C++ one possible way to implement it is by using the constructor, so that I would write:

class ConcreteConsumer {
public:
  // The locator is passed in by whoever creates the consumer.
  ConcreteConsumer(ProducerLocator &prodLocator, <other arguments>) {
    ...
  }
};

This would be an interesting approach: it would make the dependency explicit and also make tests easier to write. However, I have some doubts:
1) when instantiating a ConcreteConsumer, how do I get the reference to the service locator to pass to the constructor? The only thing I came up with is creating a ConsumerFactory which creates the locator and injects it into every ConcreteConsumer it creates. Then in my consumer tests I could create a mock locator and pass it to the consumers. Do you see any drawback or benefit in this approach?
2) how do I register a producer with the locator? If the locator is managed by the ConsumerFactory as above, I could think of a ProducerFactory having access to the locator inside ConsumerFactory in some way, but this looks like very poor encapsulation and a dependency problem.
3) generally speaking, do you think it's worth complicating the code to get rid of the locator singleton? It works for me and simplifies users' code (e.g. users do not need to define a consumer constructor taking a reference to the locator service, store it in an internal member etc.), but I fear it might cause some nasty issue which I am not able to foresee at this point.
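One possible way to picture the constructor-injection idea from the first two doubts is sketched below. It is an illustration under assumptions, not something proposed in the thread: instead of a ConsumerFactory owning the locator, a single piece of setup code (`runApplication` here) creates the locator and hands it to both sides. The `IProducerLocator` interface, `FixedProducer`, `registerProducer`, `producers`, `read`, `sum` and `runApplication` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical producer interface.
struct IProducer {
    virtual ~IProducer() = default;
    virtual double read() = 0;
};

// Abstract locator, so that tests can substitute their own implementation.
struct IProducerLocator {
    virtual ~IProducerLocator() = default;
    virtual void registerProducer(IProducer& p) = 0;
    virtual std::vector<IProducer*> producers() const = 0;
};

// Plain, non-singleton locator holding non-owning pointers.
class ProducerLocator : public IProducerLocator {
public:
    void registerProducer(IProducer& p) override { producers_.push_back(&p); }
    std::vector<IProducer*> producers() const override { return producers_; }

private:
    std::vector<IProducer*> producers_;
};

// The dependency is explicit: a consumer cannot exist without a locator.
class ConcreteConsumer {
public:
    explicit ConcreteConsumer(IProducerLocator& locator) : locator_(locator) {}

    double sum() const {
        double total = 0.0;
        for (IProducer* p : locator_.producers()) total += p->read();
        return total;
    }

private:
    IProducerLocator& locator_;
};

// Hypothetical producer returning a fixed value.
struct FixedProducer : IProducer {
    explicit FixedProducer(double v) : value(v) {}
    double read() override { return value; }
    double value;
};

// The one place that creates the locator and wires both sides to it.
double runApplication() {
    ProducerLocator locator;
    FixedProducer a(1.5), b(2.5);
    locator.registerProducer(a);  // registration is explicit, not in a constructor
    locator.registerProducer(b);
    assert(locator.producers().size() == 2);
    ConcreteConsumer consumer(locator);
    return consumer.sum();
}
```

In a test, the same consumer can be constructed with a locator that contains only the producers the test cares about, or with another class derived from `IProducerLocator`.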

Thanks again for your support.

mpan · 2017-04-08 10:18:29

Sorry for the slow answer; somehow I missed the thread.

Before I go further, a note: I am a developer and I see things as a developer. I take into account the future costs of a solution: that includes extending it, software maintenance, mixing it with other code (probably not written by me) et cetera. If you’re looking for a “just works” solution that takes little time to implement, what I said may not be what you’re looking for.

snack wrote:

my users (physicists) are familiar with this pattern so they will use a code with singletons.

The thing is, that is not a singleton. It’s just a global variable hidden inside a class to imitate an object-oriented solution.

snack wrote:

I don't want producers to be registered to different locator instances so that consumers have to query all of them to find a given producer

They query only a single instance: the one that they’re configured to query. There is actually no way to “query all”, as there is no way to find all of them. That would require another directory ;). It’s not your problem if a user registers a producer in the wrong directory. It’s also not very probable, given that there will be some “default” one which users are asked to use in a typical scenario and which is provided in the examples.

snack wrote:

using a singleton I can simplify the creation of concrete producers by registering them to the locator in the constructor of the producer base class

A constructor’s job is to initialize an object and only that. It should never, ever do anything else. In particular, it’s unacceptable for a constructor to manipulate some global state of an application. But see my note at the beginning: if what you want is a solution that will not break instantly and where, at least for some time, no one will notice something is wrong, you may go for that. It will bite back later.

As for dependency injection: see how existing [experimental] libraries and some code from dev blogs attempt to do that. It will be a true PITA in C++, however. That’s why I mentioned that solution last and kept that part very short. It would be my last choice for that language. It is usually not worth abandoning any other [working] solution for it.

snack · 2017-04-10 13:06:39

I'm not a developer, but I try to act like one because I understand that if professionals do something in a certain way, there's a very high probability that it's the right way. That's why I asked for help on this topic, and your advice is very helpful and welcome.
I think I understood your point about singletons and global variables: effectively, what I meant to use was a global variable. After thinking and researching on the web, I think I will try to implement the dependency injection pattern in my producer and consumer classes via a dedicated method like SetDirectory() implemented in the respective base classes. The framework will take care of instantiating the directory object and injecting it into newly constructed producers/consumers through the object factories. I still have to iron out some details, but I think it could work and let me expunge any singleton and global variable from my design.
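The plan described above (a SetDirectory() method in the base classes, called by factories that the framework owns) might be sketched like this. Everything except the name `SetDirectory` is an assumption made for illustration: `Framework`, `create`, `Directory`, `ProducerBase`, `ConsumerBase`, `MyProducer`, `MyConsumer`, `visibleProducers` and `demo` are hypothetical. The directory holds non-owning pointers, so the sketch assumes producers are not destroyed while consumers still use the directory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

class ProducerBase;

// Plain, non-singleton directory of producers (non-owning pointers).
class Directory {
public:
    void add(ProducerBase& p) { producers_.push_back(&p); }
    std::size_t size() const { return producers_.size(); }

private:
    std::vector<ProducerBase*> producers_;
};

class ProducerBase {
public:
    virtual ~ProducerBase() = default;
    // Called by the factory after the object is fully constructed, so the
    // constructor itself never touches shared state.
    void SetDirectory(Directory& d) {
        directory_ = &d;
        d.add(*this);
    }

protected:
    Directory* directory_ = nullptr;
};

class ConsumerBase {
public:
    virtual ~ConsumerBase() = default;
    void SetDirectory(Directory& d) { directory_ = &d; }
    std::size_t visibleProducers() const {
        return directory_ ? directory_->size() : 0;
    }

protected:
    Directory* directory_ = nullptr;
};

// The "framework": owns the directory and injects it into whatever it creates.
class Framework {
public:
    template <class T, class... Args>
    std::unique_ptr<T> create(Args&&... args) {
        std::unique_ptr<T> obj(new T(std::forward<Args>(args)...));
        obj->SetDirectory(directory_);
        return obj;
    }

private:
    Directory directory_;
};

// User-written classes need no constructor arguments for the directory.
struct MyProducer : ProducerBase {};
struct MyConsumer : ConsumerBase {};

std::size_t demo() {
    Framework framework;
    auto p1 = framework.create<MyProducer>();
    auto p2 = framework.create<MyProducer>();
    auto consumer = framework.create<MyConsumer>();
    assert(consumer->visibleProducers() == 2);
    return consumer->visibleProducers();
}
```

Users still write nothing beyond their own class, which was the motivation for registering in the base-class constructor, but two `Framework` objects (for example one per test) no longer share any state.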
Thanks again for your advice.

Arch Linux
