Information transmission and recovery in neural communications channels

Biological neural communications channels transport environmental information from sensors through chains of active dynamical neurons to neural centers for decisions and actions to achieve required functions. These kinds of communications channels are able to create information and to transfer information from one time scale to another because of the intrinsic nonlinear dynamics of the component neurons. We discuss a very simple neural information channel composed of sensory input in the form of a spike train that arrives at a model neuron, then moves through a realistic synapse to a second neuron where the information in the initial sensory signal is read. Our model neurons are four-dimensional generalizations of the Hindmarsh-Rose neuron, and we use a model of chemical synapse derived from first-order kinetics. The four-dimensional model neuron has a rich variety of dynamical behaviors, including periodic bursting, chaotic bursting, continuous spiking, and multistability. We show that, for many of these regimes, the parameters of the chemical synapse can be tuned so that information about the stimulus that is unreadable at the first neuron in the channel can be recovered by the dynamical activity of the synapse and the second neuron. Information creation by nonlinear dynamical systems that allow chaotic oscillations is familiar in their autonomous oscillations. It is associated with the instabilities that lead to positive Lyapunov exponents in their dynamical behavior. Our results indicate how nonlinear neurons acting as input/output systems along a communications channel can recover information apparently "lost" in earlier junctions on the channel. Our measure of information transmission is the average mutual information between elements, and because the channel is active and nonlinear, the average mutual information between the sensory source and the final neuron may be greater than the average mutual information at an earlier neuron in the channel. This behavior is strikingly different from the passive role communications channels usually play, and the "data processing theorem" of conventional communications theory is violated by these neural channels. Our calculations indicate that neurons can reinforce reliable transmission along a chain even when the synapses and the neurons are not completely reliable components. This phenomenon is generic in parameter space, robust in the presence of noise, and independent of the discretization process. Our results suggest a framework in which one might understand the apparent design complexity of neural information transduction networks. If networks with many dynamical neurons can recover information not apparent at various waystations in the communications channel, such networks may be more robust to noisy signals, may be more capable of communicating many types of encoded sensory neural information, and may be the appropriate design for components, neurons and synapses, which can be individually imprecise, inaccurate "devices."


I. INTRODUCTION
The transmission of sensory information from the environment to decision centers through neural communications channels requires a high degree of reliability and sensitivity from networks of heterogeneous, often inaccurate, sometimes unreliable components. The properties of the channel itself, assuming the sensor is accurate, must be richer than conventional channels studied in engineering applications. Those channels are passive and, when of high quality, can relay inputs accurately to a receiver.
Neural communications channels are composed of dynamically active elements capable of complex autonomous oscillations. Individually, nonlinear neurons can create information in a way that is familiar in the study of nonlinear dynamics [1]. The process of "information creation" is intimately associated with the instabilities that allow chaotic behavior of these nonlinear systems: two states of the system, indistinguishable because only finite-resolution observations can occur, may through the action of the instabilities of the nonlinear dynamics find themselves in the future widely separated in the state space, and thus distinguishable. Information about different states that was unavailable at one time may become available at a later time. However, it is important to recall that this "new" information is only about the neuron itself.
In this paper, we examine the role of this aspect of nonlinear systems when they are part of a communications chain. Our interest is in information transmission channels that model the actions of realistic neurons and realistic synaptic connections among them. We show that information that may be unavailable, "lost," or hidden below observational resolution at one waystation on the neural chain may be recovered at a later waystation and thus become useful again. Our discussion of the transmission properties of active neural channels is phrased in the context of an idealized channel composed of one neuron N1 that receives information in the form of a spike train and passes this on, modulated by its own dynamics, through a realistic synapse to a second neuron N2. When the synapse and the receiver neuron are properly tuned, information that is hidden at N1 is again available at N2. In a quantitative fashion, we show that the average mutual information between the sensory signal sequence and N2 can be larger than the average mutual information between the sensory signal sequence and N1. Further, this information recovery is quite robust in the parameter space of the neurons and the synapse, and it persists when noise is added to both the incoming sensory signal and the output of the synaptic connection.
While the model is made much simpler than a realistic neural channel, it serves to illustrate in a concrete way the role of nonlinear oscillations of communications channel components. In this paper, we focus on neural communications channels and particulars of biological neural information encoding, but we anticipate that the lessons of this paper may prove of value to the design of other, more familiar, information transmission channels.
In establishing our model, we must delve a bit into issues associated with the manner in which information from sensory systems is encoded in a neural system. Understanding neural codes is a major issue in neuroscience [2]. It is generally agreed that the natural framework in which to quantify the communication process between neurons is information theory [3-5], which is a powerful tool for the analysis of input-output relations, and has proven to be useful in measuring the efficiency and reliability of several neural systems [6-9]. To calculate the usual information measures, such as entropy and mutual information, minimal assumptions about the nature of the neural code are required. In principle, neural signals are continuous functions of time, but they can only transmit a finite amount of information because of the bounded accuracy of biological systems and the unavoidable noisy environment. Only some features or events of the signal are transmitted and carry the relevant information. For most neurons, the generation of an action potential (or spike) is the most important event in its behavior, and it is generally agreed that the action potential is the fundamental unit of information for the nervous system. Traditionally, it is argued that other details of the sensory signal, such as the particular waveforms of spikes, are not relevant, and that the information that is not carried by the spike train is lost.
On the other hand, it appears that there is not a unique neural code [10]. Although the idea of a "rate code" has been widely accepted since the seminal work of Adrian [11], there is strong experimental evidence that in certain neural systems the precise timing of the spikes also plays a significant role in the communication process [12,13]. Moreover, it is not clear at all if the information processing is always performed by single neurons or if population coding is needed. It is possible that both types of coding are present in neural systems. In that case, different neural subsystems can have different codes. The neural communications channel must be flexible enough to accommodate this variety in a reliable manner.
In addition, some neurons produce spikes in bursts, and this can also be seen as a special code. For example, there is some evidence that in the hippocampus, relevant information is carried by bursts rather than single spikes [14], and more generally, burst firing is an efficient and reliable way to propagate impulses in neural networks with low connection strengths. Further, in some situations the details of the spiking within a burst can also be relevant.
Different codes yield different numerical values for our information measures, and for some complex cases, as with bursting neurons, the choice of neural code, say the timing of bursts or the timing of spikes, may determine the amount of information that is conveyed by the neural signal.
On the other hand, it appears that even when one calculates information measures free from any assumption about what are the relevant features of the signals, there is a critical dependence on the time resolution of the discretization [15] used in representing the stream of action potentials. Ideal information measures are achieved only in the limit of infinitely accurate time resolution and infinitely long signals. Clearly realistic biological systems must cope with the absence of this idealized situation. Our model results suggest mechanisms that may be utilized for this purpose.
In this paper, we investigate how information transfer in an active information channel depends on the coding assumptions and how different neural codes can interact in simple neural models. We study a system of two spiking-bursting neurons with unidirectional coupling. The intrinsic dynamics and the entropy generation along a chain of such neurons were reported in [16]. For this system, we will show that any realistic measurement with finite time resolution can lead to the striking result that information "lost" after the first neuron responds to a spike train input can be recovered after the second neuron acts on the output of the first neuron. The mechanism for this result has been indicated above. One consequence of this work is that familiar results of information theory for passive channels need careful examination in their application to neural information transport. We will explicitly indicate an example of this below.
We speculate that nature takes advantage of this ability to recover hidden information in order to develop reliable neural information transmission systems necessarily built with unreliable synaptic connections among inaccurate components.
This paper is organized as follows: In the next section, we review some results from information theory relevant for our work. In Sec. III, we present our models for bursting neurons and synaptic connections. Section IV is devoted to the results of our simulations and the calculation of the information measures. Our results and conclusions are discussed in the final Sec. V. An appendix contains some technical details about evaluating information theoretic quantities in our systems.

A. Representing neural signals
In this paper, we are concerned with information transfer among neurons and the role of neurons as active dynamical elements of a network in achieving reliable communication of information from sensory input to processing or decision centers. It is assumed here that all relevant information is contained in the time course of the membrane potential $x(t)$ of the neurons. This we take as the "neural signal" of the component neurons in our networks. Chemical signaling among neurons may follow some of the patterns we discuss, but we do not consider it here.
In principle, neural signals are continuous in time and should be treated by continuous information theory [17]. However, we employ a discrete treatment since (a) continuous treatment of complex signals with unknown distributions cannot be implemented in practice, (b) in a real environment signals cannot be transmitted and decoded with infinite accuracy, and (c) there is much experimental evidence that in most neurons all the relevant information is carried by action potentials. Statement (c) suggests we can use a discrete amplitude coding that records the presence or absence of action potentials and disregards details of the action potential waveforms. For example, we might use a binary code indicating whether there is an action potential (1) or not (0). A continuous coding in time is still possible, but we adopt a further discretization in time based on (a) and (b).
We need to specify a particular rule to translate the membrane potential as a function of time $x(t)$ into a discrete sequence of symbols $(s_1, s_2, s_3, \ldots)$. The symbols (or words) occur at some definite time $t_i$, but can typically contain information about the past.
Concretely, we take windows of length $T$ in the time series and divide them into $L$ bins of duration $\Delta t = T/L$. We assign the value 1 or 0 to each bin according to the occurrence or nonoccurrence of some event; for example, an action potential in that bin. Thus, if we have a time series of length $N\Delta t$, we obtain $N - L + 1$ words of $L$ bits, counting overlapping windows, each word spanning a window of length $T$. We use the term "event" instead of action potential because for more complex signals we can choose different types of events, as we will see in the next section. For a particular time series, our rules, comprising the choice of an event and of the quantities $T$ and $\Delta t$, define a certain coding space in which the usual information measures, such as entropy and mutual information, are computed. In this paper, we will inquire how these information measures depend on the choice of such a coding space.
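As an illustration, the word construction described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used for the simulations reported here: the function names `events_to_bins` and `bins_to_words`, the use of NumPy, and the convention that the first bin of a window is the most significant bit are all hypothetical choices made for the example.

```python
import numpy as np

def events_to_bins(event_times, t_start, n_bins, dt):
    """Return a 0/1 array in which bin k is 1 if at least one event
    falls in [t_start + k*dt, t_start + (k+1)*dt)."""
    bins = np.zeros(n_bins, dtype=np.uint8)
    times = np.asarray(event_times, dtype=float)
    idx = np.floor((times - t_start) / dt).astype(int)
    idx = idx[(idx >= 0) & (idx < n_bins)]   # drop events outside the record
    bins[idx] = 1
    return bins

def bins_to_words(bins, L):
    """Slide a window of L bins one bin at a time and pack each window
    into an integer word (first bin = most significant bit).
    A series of N bins yields N - L + 1 overlapping words."""
    bins = np.asarray(bins, dtype=np.int64)
    if len(bins) < L:
        return np.empty(0, dtype=np.int64)
    weights = 1 << np.arange(L - 1, -1, -1, dtype=np.int64)
    windows = np.lib.stride_tricks.sliding_window_view(bins, L)
    return windows @ weights

# Four hypothetical event times, six bins of unit duration, words of L = 3 bits.
bins = events_to_bins([0.5, 2.2, 2.7, 5.1], t_start=0.0, n_bins=6, dt=1.0)
words = bins_to_words(bins, L=3)
print(bins, words)
```

Two events falling in the same bin are indistinguishable after binning, which is one way in which a finite time resolution discards detail of the signal.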
Our information source is taken to be a synaptic input spike train translated into the discrete space of stimuli $s_i$. The information channel is composed of a neuron that receives the spike train, a synaptic connection, and a receiver that is another neuron. The receiver neuron membrane potential is translated to the space of responses $r_j$ using our encoding rules.

B. Ideas from information theory: A summary of required results
We review now some ideas from information theory. The entropy, in bits, associated with a given sequence of stimuli is given by

$$H(S) = -\sum_{s_i} p(s_i) \log_2 p(s_i), \tag{1}$$

where $p(s_i)$ is the probability of occurrence of the word or symbol $s_i$. The symbol alphabet is all possible words, here all binary numbers of $L$ bits, and we sum over that alphabet. An analogous expression with $p(r_j)$, the probability of occurrence of a particular response sequence, is used to calculate the entropy of the receiver neuron:

$$H(R) = -\sum_{r_j} p(r_j) \log_2 p(r_j). \tag{2}$$

The conditional entropy for the response sequence $R$, given a stimulus sequence $S$, is

$$H(R|S) = -\sum_{s_i} p(s_i) \sum_{r_j} p(r_j|s_i) \log_2 p(r_j|s_i), \tag{3}$$

where $p(r_j|s_i)$ is the conditional probability of occurrence of the word $r_j$ in the output system given that the word $s_i$ occurs in the input. This entropy is also called noise entropy [6] because it quantifies the variability of the response for a fixed stimulus. In a symmetric manner, we define the conditional entropy of the stimulus when the response is known as

$$H(S|R) = -\sum_{r_j} p(r_j) \sum_{s_i} p(s_i|r_j) \log_2 p(s_i|r_j). \tag{4}$$

Here, $p(s_i|r_j)$ is the conditional probability for the word $s_i$ in the stimulus, when the response is known to be $r_j$. $H(S|R)$ is also called stimulus equivocation [18], since it quantifies the uncertainty about the stimulus sequence that remains after the response sequence has been seen. Both conditional entropies are non-negative, and $H(S|R) \le H(S)$ and $H(R|S) \le H(R)$, because the observation of the response (stimulus) cannot increase the uncertainty about the stimulus (response).
The essential quantity in evaluating a communications channel is the average mutual information between the stimulus sequence and the response sequence, $I(S,R) = I(R,S)$. It answers the question: On average how much, in bits, do we know about the stimulus sequence, having observed the response sequence, or vice versa? It is the critical measure of the ability to recover information encoded in the stimulus from observations of the response. It admits many equivalent forms [18]. We use two that involve the conditional entropies:

$$I(R,S) = H(S) - H(S|R), \tag{5}$$

$$I(R,S) = H(R) - H(R|S). \tag{6}$$

This quantity is also non-negative, as follows from the inequalities of the previous paragraph.
The expressions (5) and (6) admit two slightly different interpretations. In the first, $H(S)$ represents the maximum information that could be encoded and $H(S|R)$ can be interpreted as the information lost in the communication process. In the second, $H(R)$ corresponds to the maximal information that could be received and $H(R|S)$ can be seen as the part of this information that is independent of the stimulus.
We will also make use of the normalized average mutual information, which quantifies the efficiency of the information transmission:

$$E(R,S) = \frac{I(R,S)}{H(S)}. \tag{7}$$

This is dimensionless and, since $H(S)$ is the maximum amount of information that can be encoded, $0 \le E(R,S) \le 1$. $E(R,S) = 0$ means the stimulus and the response system are independent, and all incoming information is lost in the channel. $E(R,S) = 1$ means there is perfect matching between stimulus and response, so all information is preserved in transmission.
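The quantities of this subsection can be estimated from two aligned sequences of words by simple counting. The following Python sketch is illustrative only; the function names are hypothetical, the probabilities are plain relative frequencies with no finite-sample correction, and the conditional entropy is obtained from the chain rule $H(S|R) = H(S,R) - H(R)$, which is equivalent to the definition given above.

```python
from collections import Counter
import math

def entropy(words):
    """Plug-in entropy in bits: H = -sum p log2 p, with p = count / M."""
    m = len(words)
    counts = Counter(words)
    return -sum((c / m) * math.log2(c / m) for c in counts.values())

def conditional_entropy(given, target):
    """H(target | given) = H(given, target) - H(given), in bits."""
    return entropy(list(zip(given, target))) - entropy(list(given))

def mutual_information(s, r):
    """Average mutual information I(R,S) = H(S) - H(S|R), in bits."""
    return entropy(list(s)) - conditional_entropy(r, s)

def efficiency(s, r):
    """Normalized average mutual information E(R,S) = I(R,S) / H(S)."""
    return mutual_information(s, r) / entropy(list(s))

# Toy example: four equiprobable stimulus words; the response keeps only
# the parity of each word, so half of the two bits survives.
s = [0, 1, 2, 3] * 25
r = [v % 2 for v in s]
print(mutual_information(s, r), efficiency(s, r))
```

In the toy example the response resolves the stimulus only up to parity, so one of the two bits of stimulus entropy is transmitted and the efficiency is one half. Evaluating `entropy(r) - conditional_entropy(s, r)` gives the same mutual information through the second of the two forms.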
The last item from information theory that we will find useful for our work is the so-called "data processing inequality" theorem [6]. If we have a communication chain in which the stimulus S is transmitted first to a receiver with response sequence R1 and then this response sequence is transmitted in turn to a second receiver with response sequence R2, the theorem states that

$$I(S,R2) \le I(S,R1). \tag{8}$$

This result has a clear intuitive meaning: Information not present at the intermediate waystation along the communications chain and not seen in the sequence R1 cannot be recovered further along in the response sequence R2. Information lost cannot be recovered.
In our work this last result plays an important role. We will study a model of a neural processing chain, as displayed in Fig. 1, where the neuron elements along the chain are active nonlinear systems able to create information when running autonomously [1]. In our model, a synaptic input S is injected into the first neuron N1. This is connected through an excitatory chemical synapse CH to a second neuron N2. The synaptic input is the stimulus sequence S for our system, and the membrane potentials of the neurons are the responses R1 and R2. We calculate the conditional entropies and average mutual informations among these three stages: $I(S,R1)$, $I(S,R2)$, and $I(R1,R2)$. Now we turn to some details of the models adopted for our bursting neurons, for the synaptic input sequence, and for the chemical synapse.
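As a point of comparison for the data processing inequality, the following Python sketch evaluates the average mutual information along a purely passive chain: a uniform binary source passed through two memoryless stages that each flip their input with a fixed probability. The flip probabilities (0.1 and 0.2) and all function names are hypothetical and chosen only for illustration; this is a textbook passive channel, not the neural model studied in this paper.

```python
import numpy as np

def mutual_information_from_joint(joint):
    """I(X,Y) in bits from a joint probability table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)    # marginal of X, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)    # marginal of Y, shape (1, m)
    mask = joint > 0                         # skip zero-probability cells
    ratio = joint[mask] / (px @ py)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def flip_channel(eps):
    """Transition matrix p(out | in) of a memoryless binary stage that
    flips its input with probability eps."""
    return np.array([[1.0 - eps, eps], [eps, 1.0 - eps]])

p_s = np.array([0.5, 0.5])        # uniform binary stimulus
stage1 = flip_channel(0.1)        # S  -> R1
stage2 = flip_channel(0.2)        # R1 -> R2

joint_s_r1 = p_s[:, None] * stage1
joint_s_r2 = p_s[:, None] * (stage1 @ stage2)   # Markov chain S -> R1 -> R2

i_s_r1 = mutual_information_from_joint(joint_s_r1)
i_s_r2 = mutual_information_from_joint(joint_s_r2)
print(i_s_r1, i_s_r2)
```

Because R2 depends on S only through R1 in this passive chain, the second value cannot exceed the first, whatever flip probabilities are chosen.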

A. Neuron model
We work with a four-dimensional model of a spiking-bursting neuron. This is an extension of the Hindmarsh-Rose model of thalamocortical neurons [19], and it was developed to reproduce the observed complex behavior of isolated neurons from the stomatogastric ganglion of the California spiny lobster. The model contains the intracellular membrane voltage $x(t)$ and several currents represented as polynomials among the dynamical variables in the vector field of the differential equations. The polynomial form came from an attempt [19] to simplify the complicated current-voltage relationships of Hodgkin-Huxley conductance-based models by providing accurate polynomial representations of these current-voltage relations within the limited dynamical range of neural activity. The equations [20], Eqs. (9)-(12), contain parameters $g$, $h$, $l$, $\mu$, and $\nu$, chosen to be $g = 0.0278$, $h = 1.605$, $l = 1.619$, $\mu = 0.00215$, and $\nu = 0.0009$. We use these parameter values for our model neurons throughout this paper. $J_{dc}$ will be varied as required. The dynamical variable $x(t)$ represents the membrane potential, $y(t)$ is a "fast" recovery current, and $z(t)$ and $w(t)$ are two slow adaptation variables ($\nu < \mu \ll 1$). $w(t)$ represents very slow exchange of intracellular calcium between the cytoplasm and the endoplasmic reticulum. $J_{dc}$ corresponds to an injected dc current and will be our main control parameter. $J(t)$ represents the synaptic input for the neuron. As in other models of bursting neural activity, this model requires the combination of slow $(z,w)$ and fast $(x,y)$ subsystems. The fast subsystem alternates between quiescent and periodic behavior as the variables $z$ and $w$ change, giving rise to the bursting behavior.
The isolated neuron, with $J(t) = 0$, displays a wide variety of dynamical behaviors controlled by the parameter $J_{dc}$. For the model parameters given above we observe (a) quiescent membrane voltage for $J_{dc} < 0.73$ and bistability in state space near a subcritical Hopf bifurcation at $J_{dc} = 0.82$; (b) periodic bursting for $0.82 < J_{dc} < 3.0$; (c) chaotic bursting for $3.0 < J_{dc} < 3.25$; and (d) continuous spiking for $J_{dc} > 3.25$.
This last regime is very interesting. It models an excitable cell near the boundary of chaotic bursting, since a small perturbation can induce short bursting sequences ͓21͔. This excitable regime will be the main region that we will explore in the next section.
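For readers who wish to experiment with a model of this kind, the following Python sketch integrates a four-variable Hindmarsh-Rose-type neuron with the parameter values quoted above. The functional form of the right-hand side is an assumption: it follows a commonly cited four-dimensional extension of the Hindmarsh-Rose equations with unit and integer polynomial coefficients, and it is not guaranteed to coincide term by term with Eqs. (9)-(12). The fixed-step Runge-Kutta integrator, the step size, and the initial condition are likewise illustrative choices.

```python
import numpy as np

# Parameter values quoted in the text.
G, H, L_PAR, MU, NU = 0.0278, 1.605, 1.619, 0.00215, 0.0009

def neuron_rhs(state, j_dc, j_syn=0.0):
    """Right-hand side of an ASSUMED four-dimensional Hindmarsh-Rose-type
    model: x fast voltage, y fast recovery, z and w slow adaptation."""
    x, y, z, w = state
    dx = y + 3.0 * x**2 - x**3 - z + j_dc + j_syn   # assumed polynomial form
    dy = 1.0 - 5.0 * x**2 - y - G * w               # assumed polynomial form
    dz = MU * (-z + 4.0 * (x + H))                  # slow variable
    dw = NU * (-w + 3.0 * (y + L_PAR))              # very slow variable
    return np.array([dx, dy, dz, dw])

def integrate(state0, j_dc, dt, n_steps):
    """Fixed-step fourth-order Runge-Kutta integration, no synaptic input."""
    state = np.asarray(state0, dtype=float)
    traj = np.empty((n_steps + 1, 4))
    traj[0] = state
    for k in range(n_steps):
        k1 = neuron_rhs(state, j_dc)
        k2 = neuron_rhs(state + 0.5 * dt * k1, j_dc)
        k3 = neuron_rhs(state + 0.5 * dt * k2, j_dc)
        k4 = neuron_rhs(state + dt * k3, j_dc)
        state = state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        traj[k + 1] = state
    return traj

# Short run with the dc current used later in the text (J_dc = 3.4).
traj = integrate([0.0, 0.0, 0.0, 0.0], j_dc=3.4, dt=0.01, n_steps=2000)
print(traj[:, 0].min(), traj[:, 0].max())
```

A run this short only exercises the fast subsystem; with $\mu$ and $\nu$ of order $10^{-3}$, the slow variables need integration times of several thousand time units before the regimes listed above can be distinguished.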
FIG. 1. Schematic diagram of our model of a neural information transmission channel. The synaptic stimulus current sequence S is injected into the bursting neuron N1 that is unidirectionally coupled to a second bursting neuron N2 through an excitatory chemical synapse CH. Both neurons are modeled by the four-equation model (9)-(12). The synaptic input and the chemical synapse are determined by Eqs. (13) and (14)-(15), respectively. We analyze the average mutual information between the stimulus and the response neurons, $I(S,R1)$ and $I(S,R2)$, as well as $I(R1,R2)$.

B. Stimulus model
We represent a stimulus input as a train of spikes arriving at N1 [Eq. (13)], with amplitude $J_0$, firing times $t_i$, and a characteristic decay time. $\Theta(x)$ is the Heaviside function: $\Theta(x) = 1$ for $x > 0$, and $\Theta(x) = 0$ for $x < 0$. We will use inhibitory input ($J_0 < 0$). The firing times are drawn from a fixed distribution of interspike intervals with no dependence on the past firing times. We will use some standard interspike interval histograms (ISIH), such as exponential decays, as distributions from which to draw the interspike intervals, but we will also discuss cases where we used an ISIH with a bimodal distribution. The main question we ask of our model neural communications channel is how the information content of the stimulus spike train is represented in the response sequence R1 at N1 and in the response sequence R2 at N2.
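A stimulus of this kind can be sketched in Python as follows. The pulse shape used here, a Heaviside step at each firing time followed by an exponential decay, is an assumption consistent with the description above but not necessarily identical to Eq. (13). The amplitude and decay time are the values quoted later in the text ($J_0 = -0.05$, decay time 10), while the mean interspike interval of 50 and the function names are hypothetical.

```python
import numpy as np

def draw_firing_times(rng, mean_isi, t_end):
    """Draw independent interspike intervals from an exponential
    distribution and accumulate them into firing times below t_end."""
    times = []
    t = rng.exponential(mean_isi)
    while t < t_end:
        times.append(t)
        t += rng.exponential(mean_isi)
    return np.array(times)

def stimulus_current(t, firing_times, j0, decay_time):
    """Sum of pulses of amplitude j0, each switched on just after its
    firing time (step taken as 0 at zero lag) and decaying
    exponentially. The pulse shape is an ASSUMED form."""
    t = np.asarray(t, dtype=float)[:, None]
    lag = t - np.asarray(firing_times, dtype=float)[None, :]
    # Clip negative lags so the unused branch of np.where cannot overflow.
    decay = np.exp(-np.clip(lag, 0.0, None) / decay_time)
    pulses = np.where(lag > 0.0, decay, 0.0)
    return j0 * pulses.sum(axis=1)

rng = np.random.default_rng(0)
firing_times = draw_firing_times(rng, mean_isi=50.0, t_end=1000.0)
t_grid = np.arange(0.0, 1000.0, 0.5)
j1 = stimulus_current(t_grid, firing_times, j0=-0.05, decay_time=10.0)
print(len(firing_times), j1.min())
```

Replacing `rng.exponential` by a draw from a two-component mixture would give a bimodal interspike interval histogram of the kind mentioned above.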

C. Synapse model
For the chemical synapse we adopt a simple model derived from first-order kinetics [22], but also incorporating dynamics in the neurotransmitter concentration $n(t)$. An action potential from N1 rising above threshold, $x(t) > x_{th}$, stimulates the release of neurotransmitters with concentration $n(t)$ in the synaptic cleft. The neurotransmitters bind to ligand-gated cation channels, increasing the conductance in the postsynaptic membrane of the receiving neuron N2.
When all the channels are open, the conductance reaches its maximum value $g_0$. The conductance is an increasing function of the neurotransmitter concentration $n(t)$, saturating when $n(t) \gg n_0$.
We represent the simple kinetics of the neurotransmitter by Eq. (14), where $\alpha$ is a loss rate for the neurotransmitter and $\Theta(x)$ is the Heaviside function. In response to this release of neurotransmitter, we model the postsynaptic current going into neuron N2 by Eq. (15). The parameters $x_{th}$ and $x_{rev}$ are thresholds for the neurotransmitter release and the reversal current, respectively. We use a sigmoidal function for the saturating conductance of N2; a steepness parameter and $n_0$ determine the steepness and the midpoint of this saturation, respectively.
Our model neural information transmission system consists of nine dynamical variables: $(x_k(t), y_k(t), z_k(t), w_k(t))$, $k = 1, 2$, and $n(t)$. One current, the stimulus $J_1(t)$, is specified, and the other current $J_2(t)$ is determined by $x_1(t)$ and $n(t)$. The full system of equations comprises Eqs. (9)-(12) for N1 driven by $J_1(t)$, Eqs. (14) and (15) for the chemical synapse, and Eqs. (9)-(12) for N2 driven by $J_2(t)$. In Fig. 2 we display a short section of a time series showing the stimulus $J_1(t)$ and the membrane potentials $x_1(t)$ and $x_2(t)$. Both neurons, absent inputs, are placed in the continuous-spiking excitable regime. The inhibitory synaptic spike input induces hyperpolarizations; it lowers the value of $x_1(t)$. This stimulates hyperpolarizations in N2 through the excitatory chemical synapse. The coincidence of spikes in the input with hyperpolarizations in N1 depends on $J_0$ as well as the past history.
After hyperpolarization by the stimulus, the spiking is very fast and N1 is less able to hyperpolarize. This results in a refractory period for the occurrence of another hyperpolarization. This refractory period will set a limit to the maximum information transfer in the bursting coding space. As we will see in the next section, the match between spikes in the input and hyperpolarizations in the neurons can be closely related to the information transfer through our model channel.

IV. NEURAL CODES AND INFORMATION TRANSMISSION
Now we turn to the calculation of the various average mutual information values indicated in Fig. 1. We investigate the information connection between our spike train stimulus and each response, $I(S,R1)$ and $I(S,R2)$, as well as the information connection between the two response locations, $I(R1,R2)$. Our interest lies in how these quantities depend on the choice of the coding space, on the time resolution $\Delta t$, on the word size $L$, on the stimulus ensemble, on the region in parameter space where we operate our response neurons, and on the level of disturbance of the transmission by additive noise in $J_1(t)$ and $J_2(t)$.
To evaluate the entropies and average mutual informations for our communications system, we need to calculate the probabilities of occurrence $p(s_i)$, $p(r_j)$, and $p(s_i, r_j)$ of binary words for a particular choice of coding space. This space is defined by our choice for an event in the neural signal (for example, one hyperpolarization or one spike or both), the time resolution $\Delta t$, and the word size $L$. From the synaptic input $J_1(t)$ we can evaluate $p(s_i)$. The frequencies of occurrence of events in the response neurons come from encoding $x_1(t)$ and $x_2(t)$. We evaluate these by integrating our nine-degree-of-freedom dynamical system and then counting the number of appearances of the possible coding words. In the limit of an infinite number of samples, $p(r_j) = n(r_j)/M$, where $n(r_j)$ is the number of observations of the word $r_j$ in the output and $M$ is the total number of samples. For finite integration times we always underestimate the real value of the entropy, since we are neglecting most of the small terms in the sum of Eq. (1). A correction term and an estimation of the error were derived in [23,24]. The formulas for these corrections are discussed in the Appendix. Most simulations were stopped when the estimated error was less than one percent of the observed entropy value.
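The underestimate of the entropy at finite sample size is easy to reproduce numerically. The Python sketch below draws words uniformly from an alphabet of all 8-bit words, for which the true entropy is 8 bits, and evaluates the counting estimate for increasing numbers of samples. It uses the uncorrected estimator only; the correction terms of [23,24] are not implemented here, and the alphabet size, sample sizes, and function name are hypothetical.

```python
import numpy as np

def plugin_entropy(samples):
    """Entropy in bits from relative frequencies p = n / M of the
    observed words, with no finite-sample correction."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
n_words = 2 ** 8   # all 8-bit words, equiprobable: true entropy is 8 bits

estimates = {}
for m in (100, 1000, 100000):
    samples = rng.integers(0, n_words, size=m)
    estimates[m] = plugin_entropy(samples)
    print(m, estimates[m])
```

With only 100 samples at most 100 distinct words can be observed, so the estimate cannot exceed $\log_2 100 \approx 6.64$ bits however the samples fall; the estimate approaches 8 bits from below as the number of samples grows well beyond the alphabet size.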
As we explained in Sec. II, we construct our binary words looking at the neural signal through a window in time with $L$ bins of size $\Delta t$ and setting $s_i = 1$ if there is an event in that bin and $s_i = 0$ otherwise. We will call $M$ the total number of words used in the simulation. For computational reasons we are limited to windows $L \le 16$. But even with this limitation the number of all possible words in the product space (input-output) where we compute the conditional entropies is big enough ($M > 10^7$) to require extensive calculation. With our spike train synaptic input $J_1(t)$ an event is always taken to be the occurrence of a peak or, for $J_0 < 0$, a valley. We will work with two different coding spaces for the neural responses $x_1(t)$ and $x_2(t)$: bursting coding space (abbreviated BCS), in which a hyperpolarization is the event in the time series, and spiking coding space (abbreviated SCS), in which the event is the occurrence of a spike.
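One simple way to extract the two kinds of events from a sampled membrane potential is by threshold crossings, as in the Python sketch below. The detection rule is an assumption, since the text does not specify how events are located: a spike is taken to be an upward crossing of a high level and a hyperpolarization a downward crossing of a low level. The levels (1.0 and -1.0), the synthetic trace, and the function names are hypothetical.

```python
import numpy as np

def upward_crossings(x, level):
    """Indices k at which the trace rises through `level`,
    i.e. x[k-1] < level <= x[k]."""
    x = np.asarray(x, dtype=float)
    return np.flatnonzero((x[:-1] < level) & (x[1:] >= level)) + 1

def downward_crossings(x, level):
    """Indices k at which the trace falls through `level`,
    i.e. x[k-1] > level >= x[k]."""
    x = np.asarray(x, dtype=float)
    return np.flatnonzero((x[:-1] > level) & (x[1:] <= level)) + 1

# Synthetic trace: two spikes, one hyperpolarization, one more spike.
x = np.array([0.0, 1.5, 0.0, 1.5, 0.0, -1.5, -1.5, 0.0, 1.5, 0.0])

spike_events = upward_crossings(x, level=1.0)              # SCS events
hyperpolarization_events = downward_crossings(x, level=-1.0)  # BCS events
print(spike_events, hyperpolarization_events)
```

The same trace therefore yields three events in the spiking coding space and a single event in the bursting coding space, which is why the two codings can assign different information content to the same neural signal.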
The main neuron operating region explored will be the continuous spiking regime near the boundary of the chaotic bursting region; this is the excitable region. In this regime, the spikes of $J_1(t)$ can induce hyperpolarizations in the neurons (see Fig. 2). In this region, the BCS is more natural, and we begin with that. Subsequently we investigate SCS with the neurons in the same parameter region.
In the following subsections we will show how a straightforward calculation of the information measures for these two coding spaces can lead to some striking results.

A. Bursting coding space
If we place N1 in the continuous spiking regime, the spikes of the synaptic input can induce some hyperpolarizations, provided that the system is not too far from the bursting regime at $J_{dc1} \approx 3.25$. The continuous spiking of the first neuron stimulates an almost constant rate of neurotransmitter release $n(t)$. This leads to continuous excitation of the second neuron through the chemical synapse. We can choose a value of $J_{dc2}$ such that the second neuron is in the continuous spiking regime as long as it receives the excitation from the first neuron. When N1 undergoes a hyperpolarization, $n(t)$ decreases and so does the synaptic excitation in the second neuron. This perturbation can induce, in turn, a hyperpolarization in the second neuron. In this way, an event in $J_1(t)$ can induce an event (hyperpolarization) in N1 and following that induce an event in N2.
We have described, in a very simplified way, how information can be transferred from the input sequence S through two consecutive sections of our channel: from the stimulus to the output sequence R1 of N1, and from there to the output sequence R2 of N2. From the description of the previous paragraph it might appear that we can only observe an event in N2 if there is an event in N1. From the conventional point of view of information theory, if the transmission from the source of a pattern of spikes in S "failed" in the first part of the channel, this cannot be recovered in the second part of the channel and the data processing inequality [Eq. (8)] holds for our information transmission chain.
However, we are dealing with dynamical systems, and we no longer have a passive information channel. When the transmission of a spike from the input fails to be sensed at N1, the first neuron has no hyperpolarization or equivalently no event in the BCS. However, the rate of spiking in N1 is slightly modified, as can be seen in Fig. 2. This information is lost for the BCS form of coding, but preserved in the dynamics of N1. It can be utilized downstream to induce a hyperpolarization in N2, leading to recovery of the "lost" information.

FIG. 2. Time series of the synaptic input $J_1(t)$, the membrane potential of the first neuron $x_1(t)$, and the membrane potential of the second neuron $x_2(t)$, for the model system governed by Eqs. (9)-(15). Parameter values for each neuron, for the excitatory chemical synapse, and for $J_1(t)$ are given in the text.
Actually, the chemical synapses are highly sensitive to variations in the spiking rate since they are basically integrators with a nonlinear saturating function. We can place our synaptic threshold $n_0$ so that N2 can detect these small variations in the spiking rate of N1, and undergo hyperpolarization. In the series displayed in Fig. 2, we chose the parameters in order to have a sensitive second neuron, able to recover these "lost" events. The parameter values we used are $J_0 = -0.05$ (inhibitory) and a decay time of 10 for the sensory input; $x_{th} = -1$ and $\alpha = 0.05$ for the neurotransmitter dynamics; and $g_0 = 0.1$, $x_{rev} = 3$, a sigmoid steepness of 50, and $n_0 = 4$ for the postsynaptic current $J_2(t)$. The dc currents placing N1 and N2 in the excitable region are $J_{dc1} = 3.4$ and $J_{dc2} = 3.4$. With these parameter values and no input signal, $J_1(t) = 0$, both neurons were spiking continuously.
As can be seen in Fig. 2, most of the input spikes in $J_1(t)$ are lost in the BCS of N1, $x_1(t)$, but almost all of them are recovered in the BCS of N2, $x_2(t)$. This leads us to the striking result that, for this coding space, the data processing inequality (8) no longer holds. The information "lost" in the first part of our channel is recovered in the second part, and hence the average mutual information I(S,R1) between the stimulus and N1 is less than the average mutual information between the stimulus and N2, I(S,R2). So we find that

$$I^{BCS}(S,R1) < I^{BCS}(S,R2), \qquad (16)$$

which contradicts the data processing inequality (8). We call this result the information recovery inequality. By using the superscript BCS in Eq. (16) we stress that this result holds in the bursting coding space and with the dynamical scenario described above. Below we analyze the robustness of the information recovery inequality as a function of the injected dc currents and show that it holds over broad operating regimes for the neurons.
From the point of view of the BCS there is some hidden information in $x_1(t)$ related to the stimulus that can be revealed by means of the dynamical behavior of the chemical synapse and N2. As we pointed out, this "lost" information is likely to be coded in the interspike intervals inside the bursts. We will address this issue in Sec. IV B.
The information recovery inequality (16) is not a consequence of finite time resolution, because the lost events in N1 cannot be recovered in the BCS for any window size or any time resolution. It is our selection of what constitutes an event in defining the BCS that is responsible for the violation of the data processing inequality. To verify this, we have calculated the normalized average mutual informations E(S,R1) and E(S,R2) for a wide range of values of $L$ and $\Delta t$. These results are displayed in Fig. 3. The information recovery inequality (16) holds for all the values explored.
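The calculation just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used for Fig. 3: event times are binned with resolution `dt`, grouped into non-overlapping words of `L` bits, and compared in simultaneous windows; the normalization by the stimulus entropy H(S) stands in for Eq. (7). The function names and the short event lists are hypothetical.

```python
from collections import Counter
from math import log2

def binarize(event_times, dt, t_total):
    """Bit k is 1 if at least one event falls in [k*dt, (k+1)*dt)."""
    n_bins = int(t_total // dt)
    bits = [0] * n_bins
    for t in event_times:
        k = int(t // dt)
        if 0 <= k < n_bins:
            bits[k] = 1
    return bits

def words(bits, L):
    """Cut the bit sequence into non-overlapping words of L bits."""
    return [tuple(bits[i:i + L]) for i in range(0, len(bits) - L + 1, L)]

def entropy(samples):
    """Plug-in entropy in bits from empirical word frequencies."""
    counts = Counter(samples)
    m = len(samples)
    return -sum(c / m * log2(c / m) for c in counts.values())

def normalized_ami(s_words, r_words):
    """I(S,R)/H(S); dividing by H(S) is an assumption made for this sketch."""
    n = min(len(s_words), len(r_words))
    s, r = s_words[:n], r_words[:n]
    h_s = entropy(s)
    i_sr = h_s + entropy(r) - entropy(list(zip(s, r)))
    return i_sr / h_s if h_s > 0 else 0.0

# Hypothetical event times: one response keeps every event, the other drops most.
stimulus = [50, 430, 900, 1210, 1700, 2050, 2600, 2950, 3500, 3980]
response_full = list(stimulus)
response_partial = stimulus[::3]

dt, L, t_total = 40, 10, 4000
s_words = words(binarize(stimulus, dt, t_total), L)
e_full = normalized_ami(s_words, words(binarize(response_full, dt, t_total), L))
e_partial = normalized_ami(s_words, words(binarize(response_partial, dt, t_total), L))
print(e_full, e_partial)
```

With so few words the estimates are strongly biased, so the numbers printed here only show the mechanics; the long integration times and bias corrections discussed in the appendix are what make such estimates meaningful.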
As one can see in Fig. 3(a), the normalized average mutual informations between the signal and the two neurons, E(S,R1) and E(S,R2), are only weakly dependent on $\Delta t$.
There is a small maximum at $\Delta t \approx 40$, which corresponds to the optimum resolution in time for the hyperpolarization events. E(R1,R2) is more sensitive at high $\Delta t$, where resolution is degraded.
On the other hand, E(S,R1) and E(S,R2) monotonically increase with the window size, approaching a constant asymptotic value. To study this dependence we fixed $\Delta t$ at the optimal time resolution $\Delta t = 40$. Using a least-squares fit to the form $E_\infty - E_0 e^{-L/L_0}$, we found $E_\infty(S,R1) = 0.149 \pm 0.001$ and $E_\infty(S,R2) = 0.528 \pm 0.002$.
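A fit of this form can be sketched with the standard library alone. The approach below is one possible choice rather than the one used for the values quoted above: for each trial $L_0$ on a grid the model is linear in $E_\infty$ and $E_0$, so those two are solved in closed form and the $L_0$ with the smallest squared error is kept. The data are synthetic, generated from hypothetical parameters.

```python
from math import exp

def fit_saturating_exponential(L_values, E_values, L0_grid):
    """Least-squares fit of E(L) = E_inf - E0 * exp(-L / L0).

    For a fixed L0 the model is linear in (E_inf, E0), so they follow from
    ordinary linear regression; L0 itself is found by a grid scan."""
    n = len(L_values)
    e_mean = sum(E_values) / n
    best = None
    for L0 in L0_grid:
        g = [-exp(-L / L0) for L in L_values]          # regressor multiplying E0
        g_mean = sum(g) / n
        var_g = sum((gi - g_mean) ** 2 for gi in g)
        if var_g == 0.0:
            continue
        E0 = sum((gi - g_mean) * (ei - e_mean)
                 for gi, ei in zip(g, E_values)) / var_g
        E_inf = e_mean - E0 * g_mean
        sse = sum((E_inf + E0 * gi - ei) ** 2 for gi, ei in zip(g, E_values))
        if best is None or sse < best[0]:
            best = (sse, E_inf, E0, L0)
    return best[1], best[2], best[3]

# Synthetic, noise-free data from hypothetical parameters (not the paper's data).
true_E_inf, true_E0, true_L0 = 0.5, 0.3, 4.0
L_values = [4, 6, 8, 10, 12, 14, 16]
E_values = [true_E_inf - true_E0 * exp(-L / true_L0) for L in L_values]

L0_grid = [0.5 + 0.05 * k for k in range(391)]          # 0.5 ... 20.0
E_inf_fit, E0_fit, L0_fit = fit_saturating_exponential(L_values, E_values, L0_grid)
print(E_inf_fit, E0_fit, L0_fit)
```

The grid scan avoids any dependence on an external optimizer; its resolution in $L_0$ limits the accuracy of the fit, and uncertainties such as those quoted above would need a separate error analysis.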
In what follows, the calculated values of normalized average mutual information are reported at fixed $L$ and $\Delta t$, and they are unbiased, unless we explicitly state otherwise. All values will be taken at $\Delta t = 40$, the optimal value, and by extrapolation at $L \to \infty$. We also use the bias corrections derived in the appendix. The $L \to \infty$ extrapolation cannot be applied to the entropies, as they grow with $L$.
It appears that the average mutual information [even the normalized version (7)] is also stimulus-dependent. The efficiency of the information channel depends on the statistics of the input. Thus, we also need to specify the properties of the input we are using. The results derived above were obtained using a signal with maximal entropy for a specified average interspike interval. As shown in [2], this corresponds to an exponential probability distribution in the number of spikes within a window. The mean interspike interval was approximately equal to the mean bursting period of the system near the chaotic region; here this is about 400 time units.

Now we want to study a richer input. We assume that the incoming signal originates from a renewal process, that is, the stimulus is completely determined by its interspike time interval probability distribution or histogram, called an ISIH. There are many different characteristic ISIHs in real neurons. To illustrate how changing the ISIH affects the communication process, we examined a bimodal distribution for spike intervals in $J_1(t)$. In our simulations, the interspike times $t_i - t_{i-1}$ were drawn from a two-peaked distribution in which $t_1$ and $t_2$ are characteristic times, each peak has its own decay time, $c_{12}$ is a parameter controlling the relative height of the peaks, and $W_0$ is an overall normalization factor. We used $t_1 = 200$, $t_2 = 600$, decay times of 70 and 200 for the first and second peaks, respectively, and $c_{12} = 0.76$. In Fig. 4, we display the ISIH of the synaptic input and also the computed histograms of time intervals between events for each neuron output. This figure illustrates in a very clear way the recovery of the "lost" information. The bimodal structure is completely lost in the interspike interval distribution at N1 (dashed line) and recovered accurately in the interspike interval distribution at N2 (dotted line). The left peak at N2 is somewhat decreased from its value in the stimulus because of the refractory effect that we mentioned earlier.
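A renewal-process stimulus of this kind can be sketched as below. The functional form of the bimodal distribution is not reproduced here, so the sketch substitutes an assumed form: a mixture of two Gaussian peaks centered at the two characteristic times, with the decay times used as widths and $c_{12}$ treated as the weight of the first peak. Both the mixture form and that reading of $c_{12}$ are assumptions made only for illustration.

```python
import random

def sample_isi(rng, t1=200.0, t2=600.0, w1=70.0, w2=200.0, c12=0.76):
    """Draw one interspike interval from an ASSUMED two-Gaussian mixture.

    c12 is interpreted here as the probability of the first peak; negative
    draws are rejected so that every interval is positive."""
    while True:
        if rng.random() < c12:
            interval = rng.gauss(t1, w1)
        else:
            interval = rng.gauss(t2, w2)
        if interval > 0.0:
            return interval

def renewal_spike_times(n_spikes, seed=0):
    """Spike times of a renewal process: successive intervals are independent
    draws from the same interval distribution."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_spikes):
        t += sample_isi(rng)
        times.append(t)
    return times

spike_times = renewal_spike_times(20000, seed=1)
intervals = [b - a for a, b in zip([0.0] + spike_times[:-1], spike_times)]
mean_interval = sum(intervals) / len(intervals)
print(mean_interval)
```

Because the intervals are independent, the interval histogram fully specifies the stimulus, which is the defining property of the renewal process assumed in the text.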
Note that the histograms of the interevent times for the neurons display a multipeaked structure. This structure arises because the bursts always contain an integer number of spikes. Even when the interspike intervals are variable, they follow a rigid sequence.
It is still possible that our information recovery inequality result (16) depends on the particular choice of stimulus ensemble, as is true for most information measures. We first used a maximum entropy stimulus that has maximum potential information to transfer and a mean information rate that was "tuned" to give a maximum rate of information transfer. This means that the average interspike interval of the stimulus is greater than the refractory period of the neurons but not big enough to lower the rate of information transfer. It is unlikely that stimulus signals in nature all have maximum entropy, so we also investigated a richer input signal with two characteristic times and verified that the information recovery inequality still holds.

B. Spiking coding space
We suggested in Sec. IV A that the "lost" information in the BCS of N1 could be stored in modulations of the spike rate within a burst and then recovered at N2 through the dynamical action of the nonlinear neurons. A careful inspection of the time series strongly supports this hypothesis. One might expect then that, studying the SCS, where each event corresponds to one action potential, we would recover the data processing inequality (8). In this subsection, we will show that this is not the case. The information recovery inequality (16) holds for large parameter regions even when we use the SCS.
We keep the same parameter values noted in Sec. IV A. Since we are now dealing with a significantly smaller typical time interval between events, we need a high time resolution. This requires $\Delta t$ less than the minimum interspike interval. We also need to increase the word size because in the input signal the events are still slow. In all calculations in the SCS we adopt a word size of 16 bits. This, in turn, increases the integration time, as explained in the appendix.
We first explore the dependence of the information measures on $\Delta t$. In Fig. 5, we show the normalized average mutual information values E(S,R1) and E(S,R2) in the SCS. For some intermediate values of $\Delta t$ we recover the data processing inequality (8). However, for small $\Delta t$ we have the same result as in the BCS: recovery of "lost" information, expressed by the information recovery inequality (16). The crossing of the E(S,R) values results because spiking after a hyperpolarization is faster in N2 than in N1. Since we covered a wide range of $\Delta t$, there is some intermediate region where the time resolution is high enough to resolve all the spikes in N1 but less able to resolve the spikes in N2.

FIG. 4. [...] For the neurons, we calculated the histogram of times between events, here hyperpolarizations, in the parameter region described in Sec. IV A. The bimodal structure is lost in the first neuron (dashed line), but almost completely recovered in the second one (dotted line). Histograms were obtained from a simulation with a total integration time of $t_{tot} = 2 \times 10^9$. There were over $5 \times 10^5$ events in the stimulus, $10^5$ events in R1, and about $5 \times 10^5$ events in R2.
For $\Delta t$ less than the minimum interspike interval of N2, namely, $\Delta t = 3$, we still find that N2 can recover the information "lost" by N1. This means that using the SCS we also have the information recovery inequality

$$I^{SCS}(S,R1) < I^{SCS}(S,R2). \qquad (18)$$

We will discuss a possible explanation of these results in Sec. V.

C. Robustness of the results in parameter space
So far, we have explained how the chemical synapse can be tuned to convert small variations in the spiking rate of N1 to hyperpolarizations in N2. It may seem that this was a fortunate choice of parameter values, and that one might be unlikely to observe this phenomenon in a real environment. In this section, we want to explore the robustness of our results in parameter space and with additive noise in the synaptic currents $J_i(t)$. We explore here only variations in the external dc currents $J_{dc1}$ and $J_{dc2}$, because the mode of operation of the individual autonomous neurons depends so strongly on these currents.
We first calculate the average mutual information values I(S,R1) and I(S,R2) using the BCS as a function of $J_{dc1}$ and $J_{dc2}$ for a fixed sensory input $J_1(t)$ with interspike intervals drawn from a bimodal distribution, $\Delta t = 40$, and $L = 10$. In order to clarify our results we define the ratio

$$E = \frac{I(S,R2)}{I(S,R1)}, \qquad (19)$$

which is the relative transmission efficiency as seen at R1 and at R2 for sensory sequences S. When $E > 1$ we satisfy the information recovery inequality (16), while the data processing inequality (8) is associated with $E < 1$.
We display a contour plot of $E(J_{dc1}, J_{dc2})$ in Fig. 6. There is a wide region where $E > 1$. Thus, the system satisfies the information recovery inequality generically, not as a special circumstance. It is remarkable that the information recovery inequality (16) holds for quite different dynamical behaviors of N1 and N2. The region labeled A in Fig. 6 corresponds to the excitable regime studied in the previous two sections. Here, E(S,R2) reaches its optimal value. The region B is associated with periodic bursting when there is no input. In region C the relative efficiency diverges, because the average mutual information between the stimulus and the first neuron goes to zero. In this regime, N1 is spiking continuously with no apparent hyperpolarizations; nevertheless, some hyperpolarizations related to the stimulus are observed in N2. This is an extreme case of the phenomenon that we are describing.
As an example of the hidden information for a different region in parameter space, we display in Fig. 7 the time series of the stimulus and the two neuron signals for our system with parameter values $J_{dc1} = 2.2$ and $J_{dc2} = 3.7$. This corresponds to region B in Fig. 6. In the absence of input, N1 and N2 are synchronized in the periodic bursting regime. When a small inhibitory input sequence is added, $J_0 = -0.05$, the periodic bursting in N1 is almost unchanged; only the spiking rate is slightly modified. By contrast, the behavior of N2 is dramatically altered. Some normal hyperpolarizations are missing, and the spiking rate shows strong modulation. In this regime, the efficiency is lower than that corresponding to the continuous spiking regime explored in the previous sections. However, the normalized average mutual information in the BCS between the stimulus and N2, E(S,R2) = 0.018, is still greater than that between the stimulus sequence and N1, E(S,R1) = 0.009. Even when the information transmission is not really very good, the information recovery inequality can hold.

FIG. 6. Contour plot of the relative efficiency $E$ defined by Eq. (19) as a function of $(J_{dc1}, J_{dc2})$. The nonshadowed region corresponds to $E > 1$, where the information recovery inequality is satisfied. Of special interest are the regions labeled: (A) continuous spiking, explored in Sec. IV A and IV B; (B) periodic bursting; and (C) periodic spiking (see text).
FIG. 7. Time series of the synaptic input $J_1(t)$, the membrane potential of the first neuron $x_1(t)$, and the membrane potential of the second neuron $x_2(t)$, for the model system governed by Eqs. (9)-(15) in the periodic bursting region described in Sec. IV C (region B in Fig. 6).
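As a worked check of the relative efficiency, one can form the ratio of the normalized values quoted in the text. This assumes that the normalization in Eq. (7) is the same for R1 and R2, so that it cancels in the ratio. The second line uses the asymptotic values obtained in Sec. IV A with the maximum entropy stimulus, so it is not a point on the contour plot of Fig. 6, which was computed with the bimodal stimulus at $L = 10$.

```latex
% Region B (periodic bursting), values quoted in Sec. IV C:
\frac{E(S,R2)}{E(S,R1)} = \frac{0.018}{0.009} = 2 > 1
% Continuous spiking regime, asymptotic values from the fit in Sec. IV A:
\frac{E_\infty(S,R2)}{E_\infty(S,R1)} = \frac{0.528}{0.149} \approx 3.54 > 1
```

In both cases the ratio exceeds one, which is the condition associated with the information recovery inequality (16).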

D. Robustness in the presence of synaptic noise
Now we explore how the information measures change as we add some noise to the system. In standard information theory one normally expects that the average mutual information will decrease when noise is added. We will show that this is not always the case for a dynamical information transmission channel.
We add Gaussian white noise with zero mean and variance $D$ to the synaptic currents $J_1(t)$ and $J_2(t)$. We will study both the continuous spiking and the periodic bursting regimes, fixing $\Delta t = 40$ and $L = 10$ in the BCS.
In the continuous spiking region we previously explored, $J_{dc1} = 3.4$, $J_{dc2} = 3.4$, the efficiency values E(S,R1) and E(S,R2) are quite high. In particular, E(S,R2) is limited by the refractory time only. Therefore, the addition of noise cannot further increase this efficiency. In Fig. 8(a) we show the dependence of E(S,R1) and E(S,R2) on the noise level. Although in this case the noise inhibits the communication process, the information recovery inequality (16) still holds for low and intermediate noise levels. Note that each curve has a resonant peak for a noise amplitude of $D \approx 0.01$.
Next we look at the periodic bursting region where $J_{dc1} = 2.2$ and $J_{dc2} = 3.7$. The efficiency values are quite low here. In this case, the addition of noise enhances the transmission of information, as shown in Fig. 8(b). We observe the same resonant peaks as before, but now these peaks correspond to an optimal noise level for the transmission of the signal.
Other nonlinear processes have shown enhanced information transmission characteristics in the presence of noise. In [25], the authors observed this in the case of a nonperiodic stochastic resonance. The resonant peaks in the mutual information as a function of the noise level were studied in [26,27].
Our situation is different. In the case of stochastic resonance the effect occurs because of noise, while here noise may enhance an existing effect. The enhancement is associated with the stimulation of nonlinear instabilities in N1 and in N2 beyond those acting when noise is absent in the $J_{dc1}$, $J_{dc2}$ regime where periodic bursting is seen. In that regime limit cycle behavior is quite stable and N2 cannot operate long enough to enhance information "hidden" at N1. The Lyapunov exponents of the processes in this $J_{dc1}$, $J_{dc2}$ regime are too small. Noise, however, may increase the effectiveness of the required instabilities, and as we see, it often does. We expect this stochastic enhancement of channel capacity by noise to hold only for an intermediate range of noise levels. If the noise is too small, the induced instability is not large enough. If the noise is too large, the signal is swamped in the usual manner and channel capacity is lowered.

V. DISCUSSION
Information transmission in neural circuits involves active nonlinear elements. These can create information when they operate autonomously [1] and, as we have shown in this paper, recover it when they act as input/output elements as part of an information transmission system. In our models we have explicitly demonstrated that the average mutual information between an input sequence, called S here, and a response sequence R2 at a neuron N2 downstream of another response neuron N1 can be greater than the average mutual information between the stimulus sequence and the response sequence R1 at N1. This is expressed in our information recovery inequality, which can only hold when the communications channel, here the unidirectional sequence stimulus → N1 → N2, contains active elements. In any passive communications channel, the corresponding inequality, called the data processing inequality, is precisely the opposite.

This is not fundamentally a paper about information theory. The model system in which we calculate average mutual information between stimulus and response uses realistic, though simplified, models of the neurons, of the excitatory chemical synapse between neurons, and of the input sensory sequence of spike trains. We have held to a connection with the underlying biological questions by assuming that only the arrival times of the spikes in the sensory signals are important and only the membrane voltage variations of the neural dynamics matter.
Within this framework we have had to make a number of specific choices about how we represent or encode the stimulus and response sequences, and by no means have we exhausted the possibilities. Our results demonstrate, however, that with two quite realistic choices of finite coding spaces for representing the sequences there are regions of neuron and coding parameter space where the information recovery inequality holds: information apparently unavailable to a reader of the neural code at one location along the information transmission line is recovered further along the transmission line.

FIG. 8. Normalized average mutual information E(S,R1) (continuous line) and E(S,R2) (dotted line) as a function of the variance $D$ of white noise added to the synaptic currents $J_1(t)$ and $J_2(t)$ (see Sec. IV C). (a) Noise-inhibited information transfer in the continuous spiking region, $J_{dc1} = 3.4$ and $J_{dc2} = 3.4$. (b) Noise-enhanced information transfer in the periodic bursting region, $J_{dc1} = 2.2$ and $J_{dc2} = 3.7$.
We argue that both the reduced representation of the stimulus in the finite coding space and the active properties of neurons are responsible for the information recovery inequality. By selecting a particular feature of only one dynamical variable of the neuron and discretizing in time, we neglect other degrees of freedom and allow information to be hidden in the first response neuron. On the other hand, the unstable trajectories of the second response neuron allow the further recovery of the hidden information by means of the mechanism explained above.
The same mechanism has been shown to lead to the possibility of enhanced performance in the presence of channel noise, a behavior completely different from that of the passive channels we are familiar with. We are not certain of the biological relevance of this feature of active transmission channels, but it may prove of interest in the design of useful communications channels in other contexts.
It is also apparent that in a wide region of parameter space (see Fig. 6) our chain of two dynamically active neurons is a better information transducer than the first neuron alone. In the neural communications system we explored, the efficiency can be very high even when one of the components is a poor transducer. This observation is of crucial importance because it is known that in real neural systems synapses are often unreliable. We hypothesize that nature may take advantage of the active properties of neurons as described here to develop reliable neural networks with unreliable synapses and inaccurate component neurons. Indeed, our model suggests a framework in which to understand why biological information systems do not consist of a single, reliable information transducing neuron that performs all required tasks well.
It is worth noting that even when we are working with nonlinear neural systems that can be chaotic, we are not studying the time asymptotic behavior on any attractor of the autonomous system, because of the highly complex time-dependent input which, mathematically, does not permit attractors. Instead the nonlinear information transmission network is continuously exploring different transients. The amount of information that can be stored in these transients is considerably greater than the information carried by the autonomous attractors alone, given finite time and amplitude resolution. For a single model neuron of the type we have used in this paper, these transient trajectories live in a four-dimensional state space. There is certainly enough "room" in that space to store the "hidden" information that may be recovered by unstable actions of the nonlinear network elements. It is also feasible that natural systems use these transients to enrich their behavior [28]. Some experimental methods to calculate information transfer, such as the "direct method" [15], often neglect this, repeating the same stimulus many times.
Our results open the interesting question of whether classic average mutual information is an adequate measure of the interdependence of variables in active dynamical systems. While we do not have an answer to this matter, we recall that average mutual information is not able to distinguish between two signals that are directly connected and two signals with a common input [29,30]. Average mutual information contains neither dynamical nor directional information. We are working with nonlinear oscillating systems with many degrees of freedom, and with the usual application of information theoretic ideas we are only observing some features of a single variable. In order to have a complete description of information processing in active networks, it may be that a new approach that takes into account all the intrinsic dynamics is needed.
Neurons acting as active dynamical systems are not just information transducers. They can enrich their input signals and communicate on different time scales. Neural systems are nonequilibrium systems; they can make use of their unstable trajectories to encode information throughout their entire available state space. That they can show information creation and recovery, expressed quantitatively by our information recovery inequality, in distinction to properties established for passive communications channels, should not surprise us [31]. Instead these aspects of nonlinear activity should provide an interesting framework for understanding the rich properties of realistic neural networks.

ACKNOWLEDGMENTS
M.C.E. was supported by UBA, Fundación Antorchas, and FOMEC. Partial support for this work came from the U.S. Department of Energy, Office of Science, through Grants Nos. DE-FG03-90ER14138 and DE-FG03-96ER14592, and from the U.S. Army Research Office under a MURI Contract. We thank Pablo Varona, Reynaldo Pinto, Ramon Huerta, Ya. Sinai, Lev Tsimring, and Allen Selverston for many useful discussions on the topics considered here. This manuscript was completed while H.D.I.A. was a visitor at the University of Western Australia; he thanks Alistair Mees for hospitality at UWA.

APPENDIX
The most straightforward way to estimate the information measures of Eqs. (1)-(7) is to use empirical probabilities, such as $q(r_j) = n(r_j)/M$, where $n(r_j)$ is the number of times that the word $r_j$ was observed and $M$ is the total number of samples. These estimates are affected by random numerical error but also by a systematic error or bias. The bias can be estimated from a numerical experiment and written as a series expansion in inverse powers of $M$ [23]. Here we report just the leading correction term for the entropy and average mutual information. Let us write $H_M(R)$ and $I_M(R,S)$ for the biased estimates, using $M$ samples, of the actual entropy $H(R)$ and average mutual information $I(R,S)$. The lowest order corrections are then expressed in terms of quantities $C$, which represent the number of relevant words with finite probability in the various sampling spaces. Since the actual probabilities are unknown, we estimated the number of relevant words by the number of different observed words.

We calculate these approximate bias terms as a function of $M$. In most cases, the numerators of the bias terms relax to their $M \to \infty$ value very rapidly. Then we can estimate how many samples are needed in order to reach any desired accuracy. For medium-sized words, $L < 16$, the simulations were stopped when the bias was smaller than one percent of the estimated information measure. For words with 16 or more bits the bias terms become more important but remain less than ten percent.

The random errors or, at the lowest order, the variance of the observed information measures can also be estimated from numerical experiments. The formulas are derived in [24]; for the entropy the result reads

$$\sigma_H = \sqrt{\frac{1}{M} \sum_{j=1}^{C} \left[\log_2 q(r_j) + H_M\right]^2 q(r_j) \left[1 - q(r_j)\right]}, \qquad (A3)$$

where the $q$'s are the estimated probabilities. For example, $q(s_i, r_j)$ is the number of times that the word $s_i$ in the stimulus was followed by the word $r_j$ in the output, divided by the total number of samples.
For the integration time used in our work these errors were even smaller than those associated with the bias.
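The estimation procedure of this appendix can be sketched as follows. The explicit bias expressions are not reproduced above, so the code uses the standard leading-order form, $(C-1)/(2M\ln 2)$ bits added to each plug-in entropy, with $C$ taken as the number of distinct observed words; this form is an assumption here, and the coefficients actually used may differ. The random error follows the expression (A3). All function names are hypothetical.

```python
from collections import Counter
from math import log, log2, sqrt

def plugin_entropy(samples):
    """Plug-in entropy H_M in bits, with the estimated probabilities q."""
    counts = Counter(samples)
    m = len(samples)
    q = [c / m for c in counts.values()]
    h_m = -sum(p * log2(p) for p in q)
    return h_m, q, m

def corrected_entropy(samples):
    """Return (bias-corrected entropy, bias term, random error).

    The bias term (C - 1) / (2 M ln 2) is the ASSUMED standard leading-order
    correction, with C estimated by the number of distinct observed words.
    The random error implements the expression labeled (A3)."""
    h_m, q, m = plugin_entropy(samples)
    c = len(q)
    bias = (c - 1) / (2.0 * m * log(2))
    sigma = sqrt(sum((log2(p) + h_m) ** 2 * p * (1.0 - p) for p in q) / m)
    return h_m + bias, bias, sigma

def corrected_mutual_information(s_samples, r_samples):
    """I = H(S) + H(R) - H(S,R), each entropy corrected as above.
    Assumes both sequences have the same length."""
    pairs = list(zip(s_samples, r_samples))
    h_s = corrected_entropy(s_samples)[0]
    h_r = corrected_entropy(r_samples)[0]
    h_sr = corrected_entropy(pairs)[0]
    return h_s + h_r - h_sr

# Toy data: two equally frequent words, and a response identical to the stimulus.
toy = ['a'] * 50 + ['b'] * 50
h_corr, h_bias, h_sigma = corrected_entropy(toy)
s_toy = [0, 1] * 50
i_corr = corrected_mutual_information(s_toy, s_toy)
print(h_corr, h_bias, h_sigma, i_corr)
```

Tracking the bias term as the number of samples grows gives a simple stopping rule of the kind described above, for instance ending a simulation once the bias falls below a chosen fraction of the estimated information measure.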