Since the time of Darwin, biologists 
have wondered whether birdsong and music may serve similar purposes or 
have the same evolutionary precursors. Most attempts to compare song 
with music have focused on the qualities of the sounds themselves, such 
as melody and rhythm. Song is a signal, however, and as such its meaning
 is tied inextricably to the response of the receiver. Imaging studies 
in humans have revealed that hearing music induces neural responses in 
the mesolimbic reward pathway. In this study, we tested whether the 
homologous pathway responds in songbirds exposed to conspecific song. We
 played male song to laboratory-housed white-throated sparrows, and 
immunolabeled the immediate early gene product Egr-1 in each region of 
the reward pathway that has a clear or putative homologue in humans. We 
found that the responses, and how well they mirrored those of humans 
listening to music, depended on sex and endocrine state. In females with
 breeding-typical plasma levels of estradiol, all of the regions of the 
mesolimbic reward pathway that respond to music in humans responded to 
song. In males, we saw responses in the amygdala but not the nucleus 
accumbens – similar to the pattern reported in humans listening to 
unpleasant music. The shared responses in the evolutionarily ancient 
mesolimbic reward system suggest that birdsong and music engage the same
 neuroaffective mechanisms in the intended listeners.
   
   Introduction
Ornithologists and musicians alike have long 
contemplated whether the song of birds might somehow be classified as 
“music.” The question can be approached from a variety of angles, each 
of which produces a somewhat different answer. Researchers have asked, 
for example, whether birdsong and music share evolutionary precursors or
 functions (Darwin, 1871; Catchpole and Slater, 1995; Miller, 2001), tonal variation or rhythm (Dobson and Lemon, 1977; Slater, 2001; Baptista and Keister, 2005; Araya-Salas, 2012), or organization (Marler, 2001), and whether, like music, birdsong is creative (Marler, 2001; Hartshorne, 2008).
 Whether any particular species of songbird has music-like song depends 
on the parameter measured and the type of analysis employed.
   
Birdsong, hereafter referred to as song, is a signal;
 it has a sender and a receiver. Ultimately, a signal’s effect on the 
receiver, not its structure, dictates its meaning and function (reviewed
 by Scott-Phillips, 2008).
 When comparing song and music, it may therefore be informative to ask 
about the receiver’s response and subjective experience. Human listeners
 find music rewarding; they will approach it and work to hear it. 
Songbirds of many species likewise show a phonotaxic response to 
conspecific song. Female pied flycatchers (Ficedula hypoleuca) and European starlings (Sturnus vulgaris) approach and enter nest boxes containing speakers playing male song (Eriksson and Wallin, 1986; Gentner and Hulse, 2000), and female zebra finches (Taeniopygia guttata) will peck a key to hear male song (Riebel, 2000). Young male zebra finches who are learning to sing will also peck to hear song (Adret, 1993), but in general, a phonotaxic effect of song is less pronounced in male songbirds than in females (Dobson and Petrinovich, 1973; Stevenson-Hinde and Roper, 1975).
Measuring behavioral responses is but one way to 
assess the effects of a signal on the receiver. Over the past decade, 
neuroimaging studies have identified at least 20 different brain regions
 that show altered BOLD or PET responses during music listening. Some of
 the most commonly reported responses, particularly to music that is 
pleasurable to the listener, are those of the mesolimbic reward system. 
This system consists of the ventral tegmental area (VTA) and its 
dopaminergic projections to several regions of the forebrain, for 
example the nucleus accumbens (nAc) in the ventral striatum. 
Release of 
dopamine in nAc occurs at precisely the time that intensely pleasurable 
autonomic responses, or “chills,” are experienced during music listening
 (Salimpoor et al., 2011). Although the release may not itself cause the experience of reward, it indicates that the stimulus is associated with reward (reviewed by Wise, 2004).
 Also included in the reward system are the dorsal striatum (e.g., 
caudate nucleus in humans), the heavily interconnected amygdala and 
hippocampus (Hp), and the prefrontal cortex. Each of these regions has 
been shown in multiple human imaging studies to respond to music with 
BOLD or PET responses (Blood and Zatorre, 2001; Koelsch et al., 2006; Mitterschiffthaler et al., 2007; Montag et al., 2011; Pereira et al., 2011; Salimpoor et al., 2011).
   
In this study we looked for neural responses to song 
in the avian homologues of music-responsive brain regions. Functional 
MRI can be used in songbirds listening to song (Van Meir et al., 2005; Boumans et al., 2007),
 but to date those analyses have focused primarily on the major auditory
 areas. The nAc and other areas known to respond to music in humans are 
difficult to study using this technique in songbirds, primarily because 
of their small size. Neural responses to stimuli can be more readily 
studied in birds by mapping the expression of immediate early genes 
(IEGs) such as Fos and Egr-1. In such studies, a stimulus is presented 
to an animal and the brain harvested 60–90 min later. The protein 
products of IEGs can then be labeled in fixed brain sections using 
immunohistochemistry, which provides cellular resolution. 
Dubbed the 
“genomic action potential” (Clayton, 2000),
 the IEG response indicates that a neuron has begun to respond to a 
stimulus with new protein synthesis related to synaptic remodeling. 
Although the IEG and BOLD responses make use of different underlying 
molecular mechanisms, there is good agreement between results obtained 
by the two methods (Lazovic et al., 2005; Stark et al., 2006). In songbirds, for example, hearing song induces robust Egr-1 and BOLD responses in the auditory forebrain (Mello et al., 1992; Gentner et al., 2001; Van Meir et al., 2005; Boumans et al., 2007).
 Egr-1 is particularly useful in the study of reward because it appears 
to play an active role in the reward process. In rodents, Egr-1 is 
induced in the reward pathway by drugs such as methamphetamine, 
morphine, nicotine, or cocaine (reviewed by Girault et al., 2007). 
Blockade of Egr-1 prevents conditioned behavioral responses to these 
drugs, suggesting that Egr-1 not only marks neuronal responses to reward
 but is required for the acquisition of reward-reinforced behaviors.
   
In this study, we used Egr-1 as a marker to map and 
quantify neural responses in the mesolimbic reward system in male and 
female white-throated sparrows (Zonotrichia albicollis) listening to conspecific male song. This species sings a particularly musical-sounding song (Saunders, 1959) with heavy use of whistles with a sustained pitch (Dobson and Lemon, 1977). During the non-breeding season, song is used by both sexes to establish and maintain dominance relationships (reviewed by Maney and Goodson, 2011).
 During the breeding season, however, the message contained in song 
differs for male and female listeners. A female listening to male song 
is almost certainly being courted, whereas a male is being challenged by
 a territory holder or intruder. Song is therefore expected to have a 
more positive valence for females than for males. We predicted that 
neural responses to song in the females would resemble those of humans 
listening to liked music, whereas the pattern in the males would not.
   
The valence of song may be affected also by endocrine state. In Zonotrichia
 sparrows, females give a courtship display in response to song only 
when their plasma estradiol (E2) reaches breeding-typical levels (Moore, 1983; Maney et al., 2009). Males respond to song by singing back, and are more likely to do so if their testosterone (T) levels are elevated (Maney et al., 2009).
 Because the function of song, and behavioral responses to it, vary 
according to endocrine state, we manipulated plasma E2 in females and T 
in males in order to look at the effects on neural responses in the 
reward pathway. Following these manipulations, we exposed the birds to 
conspecific male song and quantified the expression of Egr-1 throughout 
the mesolimbic reward pathway. Because E2 treatment was expected to 
increase the valence of song, we predicted that responses would be 
greater in the E2-treated females than in untreated, non-breeding 
females. T-treatment was expected to lower the valence of an already 
negative stimulus, so we predicted little or no effect of T-treatment on
 the magnitude of mesolimbic reward responses in males.
   Materials and Methods
Animals
All research was conducted in accordance with National
 Institutes of Health (NIH) principles of animal care, federal and 
state laws, and university guidelines. Twenty-three white-throated 
sparrows of each sex were captured in mist nets during fall migration 
and housed initially in mixed-sex aviaries at the animal care facility 
at Emory University. The sex of the animals was confirmed via PCR analysis of a blood sample (Griffiths et al., 1998). Birds were housed under a short day length (10L:14D) for at least 4 months (Maney et al., 2007, 2008). The day length remained the same throughout the study to prevent gonadal recrudescence and elevation of endogenous E2 and T.
   Hormonal Manipulation
Before the start of each experiment, birds were moved 
to individual cages (15″ × 15″ × 17″) inside walk-in sound-attenuating 
booths (Industrial Acoustics, Bronx, NY, USA). On the day of transfer, 
each bird received one subcutaneous silastic capsule (ID 1.47 mm, OD 
1.96 mm, Dow Corning, Midland, MI, USA) sealed at both ends with A-100-S
 Type A medical adhesive (Factor 2, Lakeside, AZ, USA). Females received
 12 mm capsules that were either empty (n = 11) or filled with 17β-estradiol (n = 12; Steraloids, Newport, RI, USA). Males received 15 mm capsules that were either empty (n = 11) or filled with T (n = 12; Steraloids). These doses elevate E2 and T to breeding-typical levels in this species (Maney et al., 2008, 2009; Sanford et al., 2010)
 and stimulated the E2-dependent courtship behavior known as copulation 
solicitation display (CSD) in this sample. After receiving the capsules,
 birds were housed in single-sex groups of 4–6 per booth for 7–9 days. 
All booths were identical.
   Stimulus Presentation
On the afternoon prior to stimulus presentation, each 
bird was isolated by placing its cage inside an empty sound-attenuating 
booth equipped with microphone, speaker, and video camera. The stimulus 
playback began 1 h after lights-on the following morning and was 
delivered via the speaker located inside the booth. The type of 
stimulus (song or tones, see below) was balanced across treatment groups
 for both males and females such that six hormone-treated and six 
blank-treated birds heard song, and six hormone-treated and five 
blank-treated birds heard tones. The stimuli were presented at a peak 
level of 70 dB measured at the bird’s cage (Maney et al., 2008).
 The stimulus presentation was followed by 18 min of silence. Video 
recordings of all birds were made during the stimulus presentation. For 
the females, we counted copulation solicitation events, defined as tail 
lifts, wing quivers, or vocalizations characteristic of CSD (see Maney et al., 2003). For the males, we counted full and partial songs (see Maney et al., 2009).
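
The allocation of birds to stimulus conditions can be tallied with a short sketch. The cell sizes are taken from the text above; the names `CELLS` and `design_table` are hypothetical labels introduced only for this illustration, and the timing constants restate the 42 min playback and 18 min silence reported in this section.

```python
from collections import Counter

# Cell sizes per sex as stated in the text: (capsule, stimulus) -> n birds.
CELLS = {
    ("hormone", "song"): 6,
    ("blank", "song"): 6,
    ("hormone", "tones"): 6,
    ("blank", "tones"): 5,
}

# Timing as reported: playback followed by silence.
PLAYBACK_MIN = 42
SILENCE_MIN = 18


def design_table(sexes=("female", "male")):
    """Expand the per-sex cell sizes into one (sex, capsule, stimulus) row per bird."""
    return [
        (sex, capsule, stimulus)
        for sex in sexes
        for (capsule, stimulus), n in CELLS.items()
        for _ in range(n)
    ]


rows = design_table()
per_sex = Counter(sex for sex, _, _ in rows)
per_capsule = Counter((sex, capsule) for sex, capsule, _ in rows)
minutes_from_onset = PLAYBACK_MIN + SILENCE_MIN
```

Under these numbers each sex contributes 23 birds (12 hormone-treated, 11 blank), and the silent period ends 60 min after stimulus onset.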
   Sound Stimuli
Songs
White-throated sparrow songs obtained from the Borror 
Laboratory of Bioacoustics birdsong database were converted to AIFF 
format and background noise was removed. The recordings were edited so 
that a song was heard every 15 s, which mimics a natural song rate. 
Sequences of songs were then spliced together so that the identity of 
the singer changed to a novel male every 3 min. Presenting a variety of 
songs helps overcome habituation to the stimulus (Stripling et al., 1997).
 Each bird within a treatment condition (hormone or blank) heard 14 
different singers, in a unique order determined by a balanced Latin 
square, for a total stimulus duration of 42 min.
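
One way to generate such orders is sketched below. The text does not say which construction was used; the Williams design shown here is an assumption, chosen because for an even number of items it yields a Latin square in which every singer immediately follows every other singer exactly once across rows. The function names `williams_square` and `song_onsets` are hypothetical.

```python
def williams_square(n):
    """Return an n x n balanced Latin square (Williams design); n must be even."""
    if n % 2:
        raise ValueError("this construction requires an even n")
    # First row interleaves low and high labels: 0, 1, n-1, 2, n-2, ...
    first = [0]
    lo, hi = 1, n - 1
    for k in range(1, n):
        if k % 2 == 1:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
    # Each later row shifts every label by one more, modulo n.
    return [[(x + r) % n for x in first] for r in range(n)]


def song_onsets(order, block_s=180, interval_s=15):
    """Return (onset in seconds, singer id) for every song in one playback.

    Each singer occupies one 3 min block, with a song starting every 15 s.
    """
    onsets = []
    for block, singer in enumerate(order):
        for k in range(block_s // interval_s):
            onsets.append((block * block_s + k * interval_s, singer))
    return onsets


square = williams_square(14)        # one candidate order per row
schedule = song_onsets(square[0])   # playback schedule for the first order
total_min = 14 * 180 / 60           # total stimulus duration in minutes
```

With 14 singers, 3 min blocks, and a 15 s song interval, each order contains 12 songs per singer and lasts 42 min, matching the duration given above. Rows of the square could then be assigned to individual birds so that no two birds in a condition share an order.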
   Tones
For each of the 14 recordings of males singing, the 
frequency of each whistle (note) in one song was measured using 
AudioXplorer (Arizona Software, San Francisco, CA, USA). Songs usually 
contained five distinct frequencies. For each song, eight sinusoidal 
tones were generated at these frequencies and arranged in a random order
 200 ms apart, resulting in a tone sequence that matched the song in 
duration, the average number of onsets and offsets, and total sound 
energy at each frequency. Tone sequences were spliced together as for 
the song stimuli, with 15 s of silence between each sequence, in an 
order determined by a balanced Latin square.
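
The construction of one tone sequence can be sketched as follows. This is a minimal illustration, not the procedure actually used: the tone duration, sample rate, and example frequencies are placeholder assumptions, "200 ms apart" is interpreted here as a 200 ms silent gap between tones, and the sketch makes no attempt to match total sound energy at each frequency. The function name `tone_sequence` is hypothetical.

```python
import math
import random


def tone_sequence(freqs_hz, n_tones=8, tone_s=0.25, gap_s=0.2, rate=44100, seed=0):
    """Build a random-order sequence of sine tones separated by silent gaps.

    freqs_hz: frequencies measured from one song (placeholder values here).
    tone_s and rate are assumptions; gap_s follows the 200 ms spacing in the text.
    Returns the chosen frequency order and the waveform as a list of floats.
    """
    rng = random.Random(seed)
    # Cycle through the measured frequencies so each is used, then shuffle.
    picks = [freqs_hz[i % len(freqs_hz)] for i in range(n_tones)]
    rng.shuffle(picks)

    n_tone = round(tone_s * rate)
    n_gap = round(gap_s * rate)
    samples = []
    for i, f in enumerate(picks):
        samples.extend(math.sin(2 * math.pi * f * t / rate) for t in range(n_tone))
        if i < n_tones - 1:
            samples.extend([0.0] * n_gap)  # silence between consecutive tones
    return picks, samples


# Placeholder frequencies (Hz) standing in for the whistles of one song.
example_freqs = [3000.0, 3400.0, 3800.0, 4200.0, 4600.0]
picks, samples = tone_sequence(example_freqs)
```

In practice the tone duration would be set so that the sequence matches the duration of the source song, as described above.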
   Histology
Sixty min following the onset of the stimulus 
presentation, birds were deeply anaesthetized with isoflurane (Abbott 
Laboratories, North Chicago, IL, USA) and decapitated. Ovaries were 
inspected to confirm a regressed state. Brains were harvested, fixed, 
and sectioned at 50 μm as previously described (Maney et al., 2003, 2007).
 Every third 50-μm section was incubated with an antibody against Egr-1 
(cat# sc189; Santa Cruz Biotechnology, Santa Cruz, CA, USA), which was 
subsequently labeled using a biotinylated secondary antibody and 
avidin-biotin complex (Vector, Burlingame, CA, USA). The specificity of 
this antibody has been validated in this species via preadsorption studies (Saab et al., 2010). Labeling was visualized using diaminobenzidine enhanced with nickel (Maney et al., 2003, 2007). Sections were mounted onto gelatin-coated slides, dehydrated, and coverslipped in DPX (Sigma, St. Louis, MO, USA).
   Quantification of Egr-1 Immunoreactivity
Examples of Egr-1 labeling are shown in Figure 1.
 We sampled from within the avian homologues of the nAc, caudate 
nucleus, Hp, medial amygdala, and VTA. We also sampled within an area 
proposed as an avian homologue of the prefrontal cortex that 
receives a strong dopaminergic projection (Mogensen and Divac, 1982; Waldmann and Güntürkün, 1993). The names and abbreviations of each region of interest (ROI) and their human homologues are given in Table 1.
 Egr-1 immunoreactivity (ir) was quantified in six sections, 150 μm 
apart, in the VTA and in three sections in each of the other regions. 
Egr-1-ir was quantified in these regions on one side of the brain, 
chosen at random except when that region was damaged on one side due to 
folding or tearing of the section; in these cases the intact side was 
chosen. Images were acquired with a 4× (nAc and TnA) or 10× objective 
(all other regions) using a Leica DFC480 camera attached to a Zeiss 
Axioskop microscope. The microscope light level was held constant 
across all images.
