In recent years, biologically inspired methods such as evolutionary algorithms have been increasingly employed to solve and analyze complex computational problems. BELBIC (Brain Emotional Learning Based Intelligent Controller) is one such controller. Proposed by Caro Lucas, Danial Shahmirzadi and Nima Sheikholeslami, it adopts the network model developed by Moren and Balkenius to mimic those parts of the brain which are known to produce emotion (namely, the amygdala, orbitofrontal cortex, thalamus and sensory input cortex).
Emotions and learning
Traditionally, the study of learning in biological systems overlooked its lesser known counterparts: motivation and emotion. However, these phenomena cannot be separated. Motivation is the drive that causes any system to do anything; without it, there is no reason to act. Emotions indicate how successful a course of action has been and whether another set of actions should have been taken instead, so they provide constant feedback to the learning system. Learning, on the other hand, guarantees that the motivational and emotional subsystems are able to adapt to constantly changing conditions.
Thus, in the study of biological organisms, emotions have risen to prominence as an integral part of any biologically inspired system. But how does any living organism benefit from its emotions? It is crucial to answer this question as we increasingly attempt to employ biologically inspired methods in solving computational problems.
Every creature has innate abilities that accommodate its survival in the world. It can identify food, shelter, partners, and danger. But these “simple mappings between stimuli and reactions will not be enough to keep the organisms from encountering problems.” For example, if a given animal knows that its predator has qualities A, B and C, it will flee from all creatures that have those qualities, and thus waste much of its energy and resources on non-existent danger.
We cannot expect evolution to provide more advanced algorithms for assessing danger, because the predator is also evolving at the same speed. Thus, biological systems need to be equipped with the ability to learn. This learning and re-learning mechanism allows them to adapt to highly complex and advanced situations.
To learn effectively, every learning organism needs an evaluation of the current situation and also feedback on how beneficial the results of learning were. For the most part, these evaluation mechanisms are built-in. And so we encounter a new problem: whereas creatures take appropriate measures in real time based on their evaluations, these built-in evaluation procedures are developed in evolutionary time. Yet all creatures need to learn new evaluation techniques within their lifetime, just as they learn the proper reactions.
This is where the ability to condition emotional reactions comes into play. Biological organisms associate innate emotional stimuli with other stimuli they encounter in the world and thus give them an emotional significance when needed. These evaluations can be monitored to operate at very specific times, specific places or when accompanied by other specific stimuli.
There is another reason why these observations are so significant and that is the creation of artificial systems. These systems do not evolve over time but are designed with certain abilities from the start. Thus, their adaptability must be built-in.
A model is a simplified description of a phenomenon. It brings to life some aspects of this phenomenon while overlooking others. Which aspects are kept in the model and which are overlooked depends greatly on the topic of study. Thus, the nature of a model depends on the purpose the investigator has in mind. A computational model is one which can be mathematically analyzed, tested and simulated using computer systems.
In mammals, emotional responses are processed in a part of the brain called the limbic system which lies in the cerebral cortex. The main components of the limbic system are the amygdala, orbitofrontal cortex, thalamus and the sensory cortex.
The amygdala is an almond-shaped area positioned so that it can communicate with all other cortices within the limbic system. The primary affective conditioning of the system occurs within the amygdala. That is, the association between a stimulus and its emotional consequence takes place in this region.
It has been suggested that learning takes place in two fundamental steps. First, a particular stimulus is correlated with an emotional response. This stimulus can be any of an endless number of phenomena, from observing a face to detecting a scent or hearing a noise. Second, this emotional consequence shapes an association between the stimulus and the response. This analysis is quite influential, in part because it was one of the first to suggest that emotions play a key part in learning. In more recent studies, it has been shown that the association between a stimulus and its emotional consequence takes place in the amygdala. “In this region, highly analyzed stimulus representations in the cortex are associated with an emotional value. Therefore, emotions are properties of stimuli.”
The task of the amygdala is thus to assign a primary emotional value to each stimulus that has been paired with a primary reinforcer, where the reinforcer is the reward or punishment that the mammal receives. This task is aided by the orbitofrontal cortex. “In terms of learning theory, the amygdala appears to handle the presentation of primary reinforcement, while the orbitofrontal cortex is involved in the detection of omission of reinforcement.”
The first thing we notice in the computational model developed by Moren and Balkenius is that quite a number of interacting learning systems exist in the brain that deal with emotional learning. The components of the computational model are:
- Th : Thalamus
- CX : Sensory Cortex
- A : Input structures in the amygdala
- E : Output structures in the amygdala
- O : Orbitofrontal Cortex
- Rew/Pun : External signals identifying the presentation of reward and punishment
- CR/UR : Conditioned response/unconditioned response
- V : Associative strength from cortical representation to the amygdala that is changed by learning
- W : Inhibitory connection from orbitofrontal cortex to the amygdala that is changed during learning
The sensory input enters the model through the thalamus, Th, and the signal is then analyzed in the cortical area, CX. In biological systems, the sensory cortex operates by distributing the incoming signals appropriately between the amygdala and the orbitofrontal cortex. This sensory representation in CX is then sent to the amygdala, A, through the pathway V.
This is the main pathway for learning in this model. Reward and punishment enter the amygdala and strengthen the connection V between the cortical representation and the amygdala. If a similar representation is activated in the cortex at a later stage, E becomes activated and produces an emotional response.
O, the orbitofrontal cortex, operates based on the difference between the perceived (i.e. expected) reward/punishment and the reward/punishment actually received. The perceived reward/punishment is the one that has been developed in the brain over time using learning mechanisms, and it reaches the orbitofrontal cortex via the sensory cortex and the amygdala. The received reward/punishment, on the other hand, comes from the outside world and is the actual reward/punishment that the organism has just obtained. If these two are identical, the output is the same as always through E. If not, the orbitofrontal cortex inhibits and restrains the emotional response to make way for further learning. The path W is therefore only activated in such conditions.
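The interplay of the two pathways described above can be sketched in a few lines of Python. This is a minimal illustration, not the published model: the update rules, the learning rates `alpha` and `beta`, and the names `EmotionalLearner` and `run_trials` are assumptions chosen for the example, and the direct thalamic input is left out. The sketch pairs a stimulus with a reward for a number of trials and then omits the reward.

```python
class EmotionalLearner:
    """Toy amygdala/orbitofrontal learner. The update rules are assumptions."""

    def __init__(self, n_inputs, alpha=0.2, beta=0.2):
        self.V = [0.0] * n_inputs  # associative strengths, cortex -> amygdala
        self.W = [0.0] * n_inputs  # inhibitory strengths, orbitofrontal cortex
        self.alpha = alpha         # amygdala learning rate (assumed value)
        self.beta = beta           # orbitofrontal learning rate (assumed value)

    def step(self, stimulus, reward):
        """Return the emotional output E for one trial, then update V and W."""
        a_sum = sum(s * v for s, v in zip(stimulus, self.V))  # amygdala activity
        o_sum = sum(s * w for s, w in zip(stimulus, self.W))  # orbitofrontal activity
        e_out = a_sum - o_sum                                 # inhibited output E

        # Amygdala: V only grows, and only while the reward exceeds a_sum.
        gain = max(0.0, reward - a_sum)
        self.V = [v + self.alpha * s * gain for s, v in zip(stimulus, self.V)]

        # Orbitofrontal cortex: W changes only when output and reward differ.
        mismatch = e_out - reward
        self.W = [w + self.beta * s * mismatch for s, w in zip(stimulus, self.W)]
        return e_out


def run_trials(learner, stimulus, reward, n_trials):
    """Present the same stimulus/reward pair repeatedly; collect the outputs."""
    return [learner.step(stimulus, reward) for _ in range(n_trials)]


learner = EmotionalLearner(n_inputs=1)
acquisition = run_trials(learner, [1.0], 1.0, 100)  # stimulus paired with reward
extinction = run_trials(learner, [1.0], 0.0, 100)   # reward now omitted
print(acquisition[-1], extinction[-1], learner.V[0], learner.W[0])
```

Under these assumed rules the output rises toward the reward level during the first phase. Once the reward is omitted, V keeps its learned value while W grows until the output is suppressed, which matches the division of labour described above: the amygdala retains the association and the orbitofrontal cortex inhibits the response.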
In most industrial processes that contain complex nonlinearities, control algorithms rely on linearized models of the process. One reason is that such linear models can be developed from process test data using straightforward methods.
However, if the process is highly complex and nonlinear, and subject to frequent disturbances, a nonlinear model will be required. Biologically motivated intelligent controllers have been increasingly employed in these situations. Amongst them, fuzzy logic, neural networks and genetic algorithms are some of the most widely employed tools in control applications with highly complex, nonlinear settings.
BELBIC is one such nonlinear controller: a neuromorphic controller that uses the computational learning model described above to produce the control action. The model is employed much like an algorithm in these control engineering applications. In these new approaches, intelligence is not given to the system from the outside but is actually acquired by the system itself.
This simple model has been employed as a feedback controller in control design problems. One rationale for this use in control engineering is a belief held by many experts in the field that there has been too much focus on fully rational deliberative approaches, whereas in many real-world circumstances only bounded rationality is available. Factors like computational complexity, multiplicity of objectives and prevalence of uncertainty lead to a desire for more ad-hoc, rule-of-thumb approaches. Emotional decision making is highly capable of addressing these issues because it is neither fully cognitive nor fully behavioral.
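As a rough illustration of the feedback-controller use, the sketch below closes a loop around a hypothetical first-order plant, dy/dt = -y + u. Everything specific here is an assumption made for the example rather than a published BELBIC design: the sensory input is taken to be the tracking error, the reward signal is taken to be `k_reward` times that error, and the update rules, learning rates and the function names `belbic_step` and `simulate` are illustrative.

```python
def belbic_step(weights, sensory, reward, alpha=0.5, beta=0.5):
    """One update for a single sensory input; returns (control action, new weights).

    weights is a pair (v, w): v is the amygdala gain, w the orbitofrontal gain.
    The update rules and learning rates are assumptions for this sketch.
    """
    v, w = weights
    a = sensory * v          # amygdala activity
    o = sensory * w          # orbitofrontal (inhibitory) activity
    u = a - o                # emotional output, used as the control action
    v += alpha * sensory * max(0.0, reward - a)  # amygdala gain only grows
    w += beta * sensory * (u - reward)           # corrects output/reward mismatch
    return u, (v, w)


def simulate(setpoint=1.0, k_reward=4.0, dt=0.01, n_steps=5000):
    """Closed loop around the hypothetical plant dy/dt = -y + u (Euler steps)."""
    y = 0.0
    weights = (0.0, 0.0)     # the controller starts with no learned gains
    outputs = []
    for _ in range(n_steps):
        error = setpoint - y
        reward = k_reward * error  # assumed reward design, not from the source
        u, weights = belbic_step(weights, error, reward)
        y += dt * (-y + u)         # advance the plant by one time step
        outputs.append(y)
    return outputs, weights


outputs, (v, w) = simulate()
print(outputs[-1], v - w)
```

With this choice of reward, the net learned gain `v - w` approaches `k_reward`, so the loop ends up behaving like a proportional controller and settles with the steady-state offset such a controller has. A different reward or sensory signal would shape a different control law; the point of the sketch is only that the gains are acquired during operation rather than set in advance.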
BELBIC, which is a model-free controller, suffers from the same drawback as all intelligent model-free controllers: it cannot be applied to unstable systems or systems with an unstable equilibrium point. This is a natural result of the trial-and-error manner of the learning procedure, i.e. exploration for finding the appropriate control signals can lead to instability. By integrating imitative learning and fuzzy inference systems, BELBIC has been generalized so that it is capable of controlling unstable systems.