Since the time of Darwin, biologists
have wondered whether birdsong and music may serve similar purposes or
have the same evolutionary precursors. Most attempts to compare song
with music have focused on the qualities of the sounds themselves, such
as melody and rhythm. Song is a signal, however, and as such its meaning
is tied inextricably to the response of the receiver. Imaging studies
in humans have revealed that hearing music induces neural responses in
the mesolimbic reward pathway. In this study, we tested whether the
homologous pathway responds in songbirds exposed to conspecific song. We
played male song to laboratory-housed white-throated sparrows and
immunolabeled the immediate early gene product Egr-1 in each region of
the reward pathway that has a clear or putative homologue in humans. We
found that the responses, and how well they mirrored those of humans
listening to music, depended on sex and endocrine state. In females with
breeding-typical plasma levels of estradiol, all of the regions of the
mesolimbic reward pathway that respond to music in humans responded to
song. In males, we saw responses in the amygdala but not the nucleus
accumbens, a pattern similar to that reported in humans listening to
unpleasant music. The shared responses in the evolutionarily ancient
mesolimbic reward system suggest that birdsong and music engage the same
neuroaffective mechanisms in the intended listeners.
Introduction
Ornithologists and musicians alike have long
contemplated whether the song of birds might somehow be classified as
“music.” The question can be approached from a variety of angles, each
of which produces a somewhat different answer. Researchers have asked,
for example, whether birdsong and music share evolutionary precursors or
functions (Darwin, 1871; Catchpole and Slater, 1995; Miller, 2001), tonal variation or rhythm (Dobson and Lemon, 1977; Slater, 2001; Baptista and Keister, 2005; Araya-Salas, 2012), or organization (Marler, 2001), and whether, like music, birdsong is creative (Marler, 2001; Hartshorne, 2008).
Whether any particular species of songbird has music-like song depends
on the parameter measured and the type of analysis employed.
Birdsong, hereafter referred to as song, is a signal;
it has a sender and a receiver. Ultimately, a signal’s effect on the
receiver, not its structure, dictates its meaning and function (reviewed
by Scott-Phillips, 2008).
When comparing song and music, it may therefore be informative to ask
about the receiver’s response and subjective experience. Human listeners
find music rewarding; they will approach it and work to hear it.
Songbirds of many species likewise show a phonotaxic response to
conspecific song. Female pied flycatchers (Ficedula albicollis) and European starlings (Sturnus vulgaris) approach and enter nest boxes containing speakers playing male song (Eriksson and Wallin, 1986; Gentner and Hulse, 2000), and female zebra finches (Taeniopygia guttata) will peck a key to hear male song (Riebel, 2000). Young male zebra finches who are learning to sing will also peck to hear song (Adret, 1993), but in general, a phonotaxic effect of song is less pronounced in male songbirds than in females (Dobson and Petrinovich, 1973; Stevenson-Hinde and Roper, 1975).
Measuring behavioral responses is but one way to
assess the effects of a signal on the receiver. Over the past decade,
neuroimaging studies have identified at least 20 different brain regions
that show altered BOLD or PET responses during music listening. Some of
the most commonly reported responses, particularly to music that is
pleasurable to the listener, are those of the mesolimbic reward system.
This system consists of the ventral tegmental area (VTA) and its
dopaminergic projections to several regions of the forebrain, for
example the nucleus accumbens (nAc) in the ventral striatum.
Release of
dopamine in nAc occurs at precisely the time that intensely pleasurable
autonomic responses, or “chills,” are experienced during music listening
(Salimpoor et al., 2011). Although dopamine release may not itself cause the experience of reward, it indicates that the stimulus is associated with reward (reviewed by Wise, 2004).
Also included in the reward system are the dorsal striatum (e.g.,
caudate nucleus in humans), the heavily interconnected amygdala and
hippocampus (Hp), and the prefrontal cortex. Each of these regions has
been shown in multiple human imaging studies to respond to music with
BOLD or PET responses (Blood and Zatorre, 2001; Koelsch et al., 2006; Mitterschiffthaler et al., 2007; Montag et al., 2011; Pereira et al., 2011; Salimpoor et al., 2011).
In this study we looked for neural responses to song
in the avian homologues of music-responsive brain regions. Functional
MRI can be used in songbirds listening to song (Van Meir et al., 2005; Boumans et al., 2007),
but to date those analyses have focused primarily on the major auditory
areas. The nAc and other areas known to respond to music in humans are
difficult to study using this technique in songbirds, primarily because
of their small size. Neural responses to stimuli can be more readily
studied in birds by mapping the expression of immediate early genes
(IEGs) such as Fos and Egr-1. In such studies, a stimulus is presented
to an animal, and the brain is harvested 60–90 min later. The protein
products of IEGs can then be labeled in fixed brain sections using
immunohistochemistry, which provides cellular resolution.
Dubbed the
“genomic action potential” (Clayton, 2000),
the IEG response indicates that a neuron has begun to respond to a
stimulus with new protein synthesis related to synaptic remodeling.
Although the IEG and BOLD responses arise from different underlying
mechanisms, there is good agreement between results obtained
by both methods (Lazovic et al., 2005; Stark et al., 2006). In songbirds, for example, hearing song induces robust Egr-1 and BOLD responses in the auditory forebrain (Mello et al., 1992; Gentner et al., 2001; Van Meir et al., 2005; Boumans et al., 2007).
Egr-1 is particularly useful in the study of reward because it appears
to play an active role in the reward process. In rodents, Egr-1 is
induced in the reward pathway by drugs such as methamphetamine,
morphine, nicotine, or cocaine (reviewed by Girault et al., 2007).
Blockade of Egr-1 prevents conditioned behavioral responses to these
drugs, suggesting that Egr-1 not only marks neuronal responses to reward
but is required for the acquisition of reward-reinforced behaviors.
In this study, we used Egr-1 as a marker to map and
quantify neural responses in the mesolimbic reward system in male and
female white-throated sparrows (Zonotrichia albicollis) listening to conspecific male song. This species sings a particularly musical-sounding song (Saunders, 1959) with heavy use of whistles with a sustained pitch (Dobson and Lemon, 1977). During the non-breeding season, song is used by both sexes to establish and maintain dominance relationships (reviewed by Maney and Goodson, 2011).
During the breeding season, however, the message contained in song
differs for male and female listeners. A female listening to male song
is almost certainly being courted, whereas a male is being challenged by
a territory holder or intruder. Song is therefore expected to have a
more positive valence for females than for males. We predicted that
neural responses to song in the females would resemble those of humans
listening to liked music, whereas the pattern in the males would not.
The valence of song may also be affected by endocrine state. In Zonotrichia
sparrows, females give a courtship display in response to song only
when their plasma estradiol (E2) reaches breeding-typical levels (Moore, 1983; Maney et al., 2009). Males respond to song by singing back, and are more likely to do so if their testosterone (T) levels are elevated (Maney et al., 2009).
Because the function of song, and behavioral responses to it, vary
according to endocrine state, we manipulated plasma E2 in females and T
in males to examine the effects on neural responses in the
reward pathway. Following these manipulations, we exposed the birds to
conspecific male song and quantified the expression of Egr-1 throughout
the mesolimbic reward pathway. Because E2 treatment was expected to
increase the valence of song, we predicted that responses would be
greater in the E2-treated females than in untreated, non-breeding
females. T treatment was expected to lower the valence of an already
negative stimulus, so we predicted little or no effect of T treatment on
the magnitude of mesolimbic reward responses in males.
Materials and Methods
Animals
All research was conducted in accordance with National
Institutes of Health (NIH) principles of animal care, federal and
state laws, and university guidelines. Twenty-three white-throated
sparrows of each sex were captured in mist nets during fall migration
and housed initially in mixed-sex aviaries at the animal care facility
at Emory University. The sex of the animals was confirmed via PCR analysis of a blood sample (Griffiths et al., 1998). Birds were housed under a short day length (10L:14D) for at least 4 months (Maney et al., 2007, 2008). The day length remained the same throughout the study to prevent gonadal recrudescence and elevation of endogenous E2 and T.
Hormonal Manipulation
Before the start of each experiment, birds were moved
to individual cages (15″ × 15″ × 17″) inside walk-in sound-attenuating
booths (Industrial Acoustics, Bronx, NY, USA). On the day of transfer,
each bird received one subcutaneous silastic capsule (ID 1.47 mm, OD
1.96 mm, Dow Corning, Midland, MI, USA) sealed at both ends with A-100-S
Type A medical adhesive (Factor 2, Lakeside, AZ, USA). Females received
12 mm capsules that were either empty (n = 11) or filled with 17β-estradiol (n = 12; Steraloids, Newport, RI, USA). Males received 15 mm capsules that were either empty (n = 11) or filled with T (n = 12; Steraloids). These doses elevate E2 and T to breeding-typical levels in this species (Maney et al., 2008, 2009; Sanford et al., 2010),
and in this sample the E2 capsules stimulated the E2-dependent courtship behavior known as
copulation solicitation display (CSD). After receiving the capsules,
birds were housed in single-sex groups of 4–6 per booth for 7–9 days.
All booths were identical.
Stimulus Presentation
On the afternoon prior to stimulus presentation, each
bird was isolated by placing its cage inside an empty sound-attenuating
booth equipped with microphone, speaker, and video camera. The stimulus
playback began at 1 h after lights-on the following morning and was
delivered via the speaker located inside the booth. The type of
stimulus (song or tones, see below) was balanced across treatment groups
for both males and females such that six hormone-treated and six
blank-treated birds heard song, and six hormone-treated and five
blank-treated birds heard tones. The stimuli were presented at a peak
level of 70 dB measured at the bird’s cage (Maney et al., 2008).
The stimulus presentation was followed by 18 min of silence. Video
recordings of all birds were made during the stimulus presentation. For
the females, we counted copulation solicitation events, defined as tail
lifts, wing quivers, or vocalizations characteristic of CSD (see Maney et al., 2003). For the males, we counted full and partial songs (see Maney et al., 2009).
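The group sizes and timing in this section can be cross-checked with simple arithmetic. The Python sketch below only restates numbers given in the text (the stimulus allocation above, 14 singers heard for 3 min each under Songs, and brain collection 60 min after stimulus onset under Histology); the variable names are illustrative and are not part of the original analysis.

```python
# Per-sex allocation of stimulus type across capsule treatments, as stated above.
allocation = {
    ("hormone", "song"): 6,
    ("blank", "song"): 6,
    ("hormone", "tones"): 6,
    ("blank", "tones"): 5,
}


def group_total(treatment):
    """Number of birds of one sex that received a given capsule treatment."""
    return sum(n for (trt, _stim), n in allocation.items() if trt == treatment)


hormone_n = group_total("hormone")  # compare with the hormone-filled capsule n
blank_n = group_total("blank")      # compare with the empty capsule n
birds_per_sex = hormone_n + blank_n

# Timeline relative to stimulus onset, in minutes.
stimulus_min = 14 * 3   # 14 singers, each heard for 3 min
silence_min = 18        # silence following the stimulus
harvest_min = stimulus_min + silence_min

print(hormone_n, blank_n, birds_per_sex, harvest_min)  # 12 11 23 60
```

The totals agree with the capsule groups (n = 12 hormone-filled, n = 11 empty) and the 23 birds of each sex, and the 42-min stimulus plus 18 min of silence ends at the 60-min collection time.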
Sound Stimuli
Songs
White-throated sparrow songs obtained from the Borror
Laboratory of Bioacoustics birdsong database were converted to AIFF
format and background noise was removed. The recordings were edited so
that a song was heard every 15 s, which mimics a natural song rate.
Sequences of songs were then spliced together so that the identity of
the singer changed to a novel male every 3 min. Presenting a variety of
songs helps overcome habituation to the stimulus (Stripling et al., 1997).
Each bird within a treatment condition (hormone or blank) heard 14
different singers, in a unique order determined by a balanced Latin
square, for a total stimulus duration of 42 min.
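One standard way to build a balanced Latin square for an even number of conditions is a Williams design: every singer appears once in each serial position, and each singer is immediately followed by every other singer exactly once across the rows. The text does not say which construction was used, so the Python sketch below is an assumption, not the authors' procedure. It also computes the song counts implied by the timing above (one song every 15 s, a new singer every 3 min).

```python
def balanced_latin_square(n):
    """Williams-design balanced Latin square for an even number of conditions.

    Row r is one presentation order (condition indices 0..n-1) for one bird.
    """
    if n % 2:
        raise ValueError("this sketch handles an even number of conditions only")
    # First row alternates from the low and high ends: 0, 1, n-1, 2, n-2, ...
    first, lo, hi = [0], 1, n - 1
    while len(first) < n:
        if len(first) % 2 == 1:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
    # Each later row adds r (mod n) to every entry of the first row.
    return [[(x + r) % n for x in first] for r in range(n)]


SINGERS = 14
SONG_INTERVAL_S = 15
SECONDS_PER_SINGER = 3 * 60

square = balanced_latin_square(SINGERS)
songs_per_singer = SECONDS_PER_SINGER // SONG_INTERVAL_S  # 12
total_songs = songs_per_singer * SINGERS                   # 168
total_minutes = SINGERS * SECONDS_PER_SINGER // 60         # 42

print(square[0])
print(songs_per_singer, total_songs, total_minutes)
```

With six song-exposed birds per treatment group and sex, each bird would presumably receive its own row; how rows were assigned to birds is not stated in the text.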
Tones
For each of the 14 recordings of males singing, the
frequency of each whistle (note) in one song was measured using
AudioXplorer (Arizona Software, San Francisco, CA, USA). Songs usually
contained five distinct frequencies. For each song, eight sinusoidal
tones were generated at these frequencies and arranged in a random order
200 ms apart, resulting in a tone sequence that matched the song in
duration, the average number of onsets and offsets, and total sound
energy at each frequency. Tone sequences were spliced together as for
the song stimuli, with 15 s of silence between sequences, in an
order determined by a balanced Latin square.
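A tone sequence of this kind can be sketched as follows. The text does not give tone durations, sample rate, onset and offset ramps, or amplitude settings, so this Python sketch assumes that each tone copies the frequency and duration of one measured note, treats "200 ms apart" as 200 ms of silence between consecutive tones, and leaves overall level (70 dB peak at the cage) to playback calibration. The note measurements in the example are hypothetical.

```python
import math
import random


def tone_sequence(notes, gap_s=0.2, sample_rate=44100, seed=None):
    """Return mono samples in [-1, 1] for shuffled pure tones separated by silence.

    notes: list of (frequency_hz, duration_s) pairs, one per measured whistle
    (hypothetical input format). Each tone reuses one note's frequency and
    duration, so the time spent at each frequency matches the source song.
    """
    rng = random.Random(seed)
    order = list(notes)
    rng.shuffle(order)  # random order of tones
    gap = [0.0] * round(gap_s * sample_rate)  # assumed 200 ms of silence
    samples = []
    for i, (freq, dur) in enumerate(order):
        n = round(dur * sample_rate)
        samples.extend(
            math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)
        )
        if i < len(order) - 1:  # no trailing gap after the last tone
            samples.extend(gap)
    return samples


# Hypothetical measurements from one song: five distinct frequencies, eight notes.
example_notes = [(3200, 0.8), (4300, 0.4), (4300, 0.4), (3600, 0.3),
                 (3600, 0.3), (3900, 0.3), (3900, 0.3), (4100, 0.3)]
seq = tone_sequence(example_notes, seed=0)
print(f"{len(seq) / 44100:.2f} s")
```

A stimulus intended for playback would likely also need short onset and offset ramps to avoid clicks; none are applied here.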
Histology
Sixty minutes after the onset of the stimulus
presentation, birds were deeply anaesthetized with isoflurane (Abbott
Laboratories, North Chicago, IL, USA) and decapitated. Ovaries were
inspected to confirm a regressed state. Brains were harvested, fixed,
and sectioned at 50 μm as previously described (Maney et al., 2003, 2007).
Every third 50-μm section was incubated with an antibody against Egr-1
(cat# sc189; Santa Cruz Biotechnology, Santa Cruz, CA, USA), which was
subsequently labeled using a biotinylated secondary antibody and
avidin-biotin complex (Vector, Burlingame, CA, USA). The specificity of
this antibody has been validated in this species via preadsorption studies (Saab et al., 2010). Labeling was visualized using diaminobenzidine enhanced with nickel (Maney et al., 2003, 2007). Sections were mounted onto gelatin-coated slides, dehydrated, and coverslipped in DPX (Sigma, St. Louis, MO, USA).
Quantification of Egr-1 Immunoreactivity
Examples of Egr-1 labeling are shown in Figure 1.
We sampled from within the avian homologues of the nAc, caudate
nucleus, Hp, medial amygdala, and VTA. We also sampled within an area
that has been proposed as an avian homologue of the prefrontal cortex and that
receives a strong dopaminergic projection (Mogensen and Divac, 1982; Waldmann and Güntürkün, 1993). The names and abbreviations of each region of interest (ROI) and their human homologues are given in Table 1.
Egr-1 immunoreactivity (ir) was quantified in six sections, 150 μm
apart, in the VTA and in three sections in each of the other regions.
Egr-1-ir was quantified in these regions on one side of the brain,
chosen at random except when that region was damaged on one side due to
folding or tearing of the section; in these cases the intact side was
chosen. Images were acquired with a 4× (nAc and TnA) or 10× objective
(all other regions) using a Leica DFC480 camera attached to a Zeiss
Axioskop microscope. Illumination settings on the microscope were identical
for every image.
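The sampling scheme implies simple distances: with every third 50-μm section labeled, adjacent labeled sections are 150 μm apart, so six VTA sections span 750 μm from first to last and three sections span 300 μm. The Python sketch below restates that arithmetic and the hemisphere rule. The function names are hypothetical, and because the text does not say how a region damaged on both sides was handled, the sketch simply returns None in that case.

```python
import random

SECTION_THICKNESS_UM = 50
SERIES_INTERVAL = 3  # every third section was immunolabeled for Egr-1


def sampled_span_um(n_sections):
    """Distance from the first to the last of n consecutive labeled sections."""
    spacing_um = SECTION_THICKNESS_UM * SERIES_INTERVAL  # 150 um
    return (n_sections - 1) * spacing_um


def choose_side(left_intact, right_intact, rng=random):
    """Pick the hemisphere to quantify for one region (hypothetical helper).

    Random when both sides are intact; otherwise the intact side.
    """
    if left_intact and right_intact:
        return rng.choice(["left", "right"])
    if left_intact:
        return "left"
    if right_intact:
        return "right"
    return None  # both sides damaged: handling not described in the source


print(sampled_span_um(6), sampled_span_um(3))  # 750 300
```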