Birdsong: is it music to their ears?

Since the time of Darwin, biologists have wondered whether birdsong and music may serve similar purposes or have the same evolutionary precursors. Most attempts to compare song with music have focused on the qualities of the sounds themselves, such as melody and rhythm. Song is a signal, however, and as such its meaning is tied inextricably to the response of the receiver. Imaging studies in humans have revealed that hearing music induces neural responses in the mesolimbic reward pathway. In this study, we tested whether the homologous pathway responds in songbirds exposed to conspecific song. We played male song to laboratory-housed white-throated sparrows, and immunolabeled the immediate early gene product Egr-1 in each region of the reward pathway that has a clear or putative homologue in humans. We found that the responses, and how well they mirrored those of humans listening to music, depended on sex and endocrine state. In females with breeding-typical plasma levels of estradiol, all of the regions of the mesolimbic reward pathway that respond to music in humans responded to song. In males, we saw responses in the amygdala but not the nucleus accumbens – similar to the pattern reported in humans listening to unpleasant music. The shared responses in the evolutionarily ancient mesolimbic reward system suggest that birdsong and music engage the same neuroaffective mechanisms in the intended listeners.

Introduction

Ornithologists and musicians alike have long contemplated whether the song of birds might somehow be classified as “music.” The question can be approached from a variety of angles, each of which produces a somewhat different answer. Researchers have asked, for example, whether birdsong and music share evolutionary precursors or functions (Darwin, 1871; Catchpole and Slater, 1995; Miller, 2001), tonal variation or rhythm (Dobson and Lemon, 1977; Slater, 2001; Baptista and Keister, 2005; Araya-Salas, 2012), or organization (Marler, 2001), and whether, like music, birdsong is creative (Marler, 2001; Hartshorne, 2008). Whether any particular species of songbird has music-like song depends on the parameter measured and the type of analysis employed.

Birdsong, hereafter referred to as song, is a signal; it has a sender and a receiver. Ultimately, a signal’s effect on the receiver, not its structure, dictates its meaning and function (reviewed by Scott-Phillips, 2008). When comparing song and music, it may therefore be informative to ask about the receiver’s response and subjective experience. Human listeners find music rewarding; they will approach it and work to hear it. Songbirds of many species likewise show a phonotaxic response to conspecific song. Female pied flycatchers (Ficedula albicollis) and European starlings (Sturnus vulgaris) approach and enter nest boxes containing speakers playing male song (Eriksson and Wallin, 1986; Gentner and Hulse, 2000), and female zebra finches (Taeniopygia guttata) will peck a key to hear male song (Riebel, 2000). Young male zebra finches who are learning to sing will also peck to hear song (Adret, 1993), but in general, a phonotaxic effect of song is less pronounced in male songbirds than in females (Dobson and Petrinovich, 1973; Stevenson-Hinde and Roper, 1975).

Measuring behavioral responses is but one way to assess the effects of a signal on the receiver. Over the past decade, neuroimaging studies have identified at least 20 different brain regions that show altered BOLD or PET responses during music listening. Some of the most commonly reported responses, particularly to music that is pleasurable to the listener, are those of the mesolimbic reward system. This system consists of the ventral tegmental area (VTA) and its dopaminergic projections to several regions of the forebrain, for example the nucleus accumbens (nAc) in the ventral striatum.

Release of dopamine in nAc occurs at precisely the time that intensely pleasurable autonomic responses, or “chills,” are experienced during music listening (Salimpoor et al., 2011). Although the release may not itself cause the experience of reward, it indicates that the stimulus is associated with reward (reviewed by Wise, 2004). Also included in the reward system are the dorsal striatum (e.g., caudate nucleus in humans), the heavily interconnected amygdala and hippocampus (Hp), and the prefrontal cortex. Each of these regions has been shown in multiple human imaging studies to respond to music with BOLD or PET responses (Blood and Zatorre, 2001; Koelsch et al., 2006; Mitterschiffthaler et al., 2007; Montag et al., 2011; Pereira et al., 2011; Salimpoor et al., 2011).

In this study we looked for neural responses to song in the avian homologues of music-responsive brain regions. Functional MRI can be used in songbirds listening to song (Van Meir et al., 2005; Boumans et al., 2007), but to date those analyses have focused primarily on the major auditory areas. The nAc and other areas known to respond to music in humans are difficult to study using this technique in songbirds, primarily because of their small size. Neural responses to stimuli can be more readily studied in birds by mapping the expression of immediate early genes (IEGs) such as Fos and Egr-1. In such studies, a stimulus is presented to an animal and the brain harvested 60–90 min later. The protein products of IEGs can then be labeled in fixed brain sections using immunohistochemistry, which provides cellular resolution.

Dubbed the “genomic action potential” (Clayton, 2000), the IEG response indicates that a neuron has begun to respond to a stimulus with new protein synthesis related to synaptic remodeling. Although the IEG and BOLD responses make use of different underlying molecular mechanisms, there is good agreement between results obtained by both methods (Lazovic et al., 2005; Stark et al., 2006). In songbirds, for example, hearing song induces robust Egr-1 and BOLD responses in the auditory forebrain (Mello et al., 1992; Gentner et al., 2001; Van Meir et al., 2005; Boumans et al., 2007). Egr-1 is particularly useful in the study of reward because it appears to play an active role in the reward process. In rodents, Egr-1 is induced in the reward pathway by drugs such as methamphetamine, morphine, nicotine, or cocaine (reviewed by Girault et al., 2007).

Blockade of Egr-1 prevents conditioned behavioral responses to these drugs, suggesting that Egr-1 not only marks neuronal responses to reward but is required for the acquisition of reward-reinforced behaviors.

In this study, we used Egr-1 as a marker to map and quantify neural responses in the mesolimbic reward system in male and female white-throated sparrows (Zonotrichia albicollis) listening to conspecific male song. This species sings a particularly musical-sounding song (Saunders, 1959) with heavy use of whistles with a sustained pitch (Dobson and Lemon, 1977). During the non-breeding season, song is used by both sexes to establish and maintain dominance relationships (reviewed by Maney and Goodson, 2011). During the breeding season, however, the message contained in song differs for male and female listeners. A female listening to male song is almost certainly being courted, whereas a male is being challenged by a territory holder or intruder. Song is therefore expected to have a more positive valence for females than for males. We predicted that neural responses to song in the females would resemble those of humans listening to liked music, whereas the pattern in the males would not.

The valence of song may be affected also by endocrine state. In Zonotrichia sparrows, females give a courtship display in response to song only when their plasma estradiol (E2) reaches breeding-typical levels (Moore, 1983; Maney et al., 2009). Males respond to song by singing back, and are more likely to do so if their testosterone (T) levels are elevated (Maney et al., 2009). Because the function of song, and behavioral responses to it, vary according to endocrine state, we manipulated plasma E2 in females and T in males in order to look at the effects on neural responses in the reward pathway. Following these manipulations, we exposed the birds to conspecific male song and quantified the expression of Egr-1 throughout the mesolimbic reward pathway. Because E2 treatment was expected to increase the valence of song, we predicted that responses would be greater in the E2-treated females than in untreated, non-breeding females. T-treatment was expected to lower the valence of an already negative stimulus, so we predicted little or no effect of T-treatment on the magnitude of mesolimbic reward responses in males.

Materials and Methods

Animals

All research was conducted in accordance with National Institutes of Health (NIH) principles of animal care, federal and state laws, and university guidelines. Twenty-three white-throated sparrows of each sex were captured in mist nets during fall migration and housed initially in mixed-sex aviaries at the animal care facility at Emory University. The sex of the animals was confirmed via PCR analysis of a blood sample (Griffiths et al., 1998). Birds were housed under a short day length (10L:14D) for at least 4 months (Maney et al., 2007, 2008). The day length remained the same throughout the study to prevent gonadal recrudescence and elevation of endogenous E2 and T.

Hormonal Manipulation

Before the start of each experiment, birds were moved to individual cages (15″ × 15″ × 17″) inside walk-in sound-attenuating booths (Industrial Acoustics, Bronx, NY, USA). On the day of transfer, each bird received one subcutaneous silastic capsule (ID 1.47 mm, OD 1.96 mm, Dow Corning, Midland, MI, USA) sealed at both ends with A-100-S Type A medical adhesive (Factor 2, Lakeside, AZ, USA). Females received 12 mm capsules that were either empty (n = 11) or filled with 17β-estradiol (n = 12; Steraloids, Newport, RI, USA). Males received 15 mm capsules that were either empty (n = 11) or filled with T (n = 12; Steraloids). These doses elevate E2 and T to breeding-typical levels in this species (Maney et al., 2008, 2009; Sanford et al., 2010) and stimulated the E2-dependent courtship behavior known as copulation solicitation display (CSD) in this sample. After receiving the capsules, birds were housed in single-sex groups of 4–6 per booth for 7–9 days. All booths were identical.

Stimulus Presentation

On the afternoon prior to stimulus presentation, each bird was isolated by placing its cage inside an empty sound-attenuating booth equipped with microphone, speaker, and video camera. The stimulus playback began at 1 h after lights-on the following morning and was delivered via the speaker located inside the booth. The type of stimulus (song or tones, see below) was balanced across treatment groups for both males and females such that six hormone-treated and six blank-treated birds heard song, and six hormone-treated and five blank-treated birds heard tones. The stimuli were presented at a peak level of 70 dB measured at the bird’s cage (Maney et al., 2008). The stimulus presentation was followed by 18 min of silence. Video recordings of all birds were made during the stimulus presentation. For the females, we counted copulation solicitation events, defined as tail lifts, wing quivers, or vocalizations characteristic of CSD (see Maney et al., 2003). For the males, we counted full and partial songs (see Maney et al., 2009).

Sound Stimuli

Songs

White-throated sparrow songs obtained from the Borror Laboratory of Bioacoustics birdsong database were converted to AIFF format and background noise was removed. The recordings were edited so that a song was heard every 15 s, which mimics a natural song rate. Sequences of songs were then spliced together so that the identity of the singer changed to a novel male every 3 min. Presenting a variety of songs helps overcome habituation to the stimulus (Stripling et al., 1997). Each bird within a treatment condition (hormone or blank) heard 14 different singers, in a unique order determined by a balanced Latin square, for a total stimulus duration of 42 min.
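The ordering scheme described above can be sketched in code. This is a minimal illustration only: it assumes a Williams-type construction for the balanced Latin square, which the text does not specify, and the function name `williams_square` and the singer indices 0 to 13 are hypothetical.

```python
def williams_square(n):
    """Return an n x n balanced Latin square (Williams-type design).

    Assumption: this simple construction works for even n only.
    Entries are item indices 0..n-1 (here, hypothetical singer IDs).
    """
    if n % 2 != 0:
        raise ValueError("this simple construction needs an even n")
    # First row zig-zags between low and high indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Every later row shifts each entry by the row index, modulo n.
    return [[(x + shift) % n for x in first] for shift in range(n)]


N_SINGERS = 14        # from the text: 14 different singers
MIN_PER_SINGER = 3    # from the text: singer identity changes every 3 min

orders = williams_square(N_SINGERS)
total_minutes = N_SINGERS * MIN_PER_SINGER
```

Each row of `orders` could serve as one bird's singer order. In a square built this way, every singer appears once in each serial position and each singer is immediately followed by every other singer exactly once, and `total_minutes` matches the stated 42 min.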

Tones

For each of the 14 recordings of males singing, the frequency of each whistle (note) in one song was measured using AudioXplorer (Arizona Software, San Francisco, CA, USA). Songs usually contained five distinct frequencies. For each song, eight sinusoidal tones were generated at these frequencies and arranged in a random order 200 ms apart, resulting in a tone sequence that matched the song in duration, the average number of onsets and offsets, and total sound energy at each frequency. Tone sequences were spliced together as for the song stimuli, with 15 s of silence between each sequence, in an order determined by a balanced Latin square.
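The construction of a single tone sequence can be illustrated roughly as follows. This is a sketch under several assumptions that the text does not state: the tone duration, the example frequencies, the sample rate, and the reading of "200 ms apart" as a silent gap between tones are all hypothetical, as is the function name `tone_sequence`. It does not attempt to match sound energy to a real song.

```python
import math
import random


def tone_sequence(freqs_hz, tone_s, gap_s=0.2, rate=44100, seed=None):
    """Shuffle the tones, then synthesize them with silent gaps in between.

    freqs_hz : one frequency per tone (hypothetical values in this sketch)
    tone_s   : duration of each tone in seconds (assumed, not from the text)
    gap_s    : silence between tones; 0.2 s is one reading of "200 ms apart"
    Returns (order, samples): the shuffled frequencies and the waveform.
    """
    rng = random.Random(seed)
    order = list(freqs_hz)
    rng.shuffle(order)                      # random tone order
    n_tone = int(round(tone_s * rate))      # samples per tone
    n_gap = int(round(gap_s * rate))        # samples per silent gap
    samples = []
    for i, freq in enumerate(order):
        # Plain sinusoid at full scale; no onset or offset ramp is applied.
        samples.extend(
            math.sin(2 * math.pi * freq * t / rate) for t in range(n_tone)
        )
        if i < len(order) - 1:
            samples.extend([0.0] * n_gap)   # gap after every tone but the last
    return order, samples


# Hypothetical example: eight tones drawn from five distinct frequencies.
example_freqs = [3000, 3000, 3400, 3400, 3800, 4200, 4200, 4600]
order, samples = tone_sequence(example_freqs, tone_s=0.25, rate=8000, seed=1)
```

A production stimulus would also need amplitude ramps and level calibration, which this sketch leaves out.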

Histology

Sixty min following the onset of the stimulus presentation, birds were deeply anaesthetized with isoflurane (Abbott Laboratories, North Chicago, IL, USA) and decapitated. Ovaries were inspected to confirm a regressed state. Brains were harvested, fixed, and sectioned at 50 μm as previously described (Maney et al., 2003, 2007). Every third 50-μm section was incubated with an antibody against Egr-1 (cat# sc189; Santa Cruz Biotechnology, Santa Cruz, CA, USA), which was subsequently labeled using a biotinylated secondary antibody and avidin-biotin complex (Vector, Burlingame, CA, USA). The specificity of this antibody has been validated in this species via preadsorption studies (Saab et al., 2010). Labeling was visualized using diaminobenzidine enhanced with nickel (Maney et al., 2003, 2007). Sections were mounted onto gelatin-coated slides, dehydrated, and coverslipped in DPX (Sigma, St. Louis, MO, USA).
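The timing and sampling figures in these methods can be cross-checked with a few lines of arithmetic. The sketch below uses only values stated in the text; the variable and function names are hypothetical.

```python
# Stimulus timeline (values from the text).
N_SINGERS = 14          # singers per playback
MIN_PER_SINGER = 3      # minutes before the singer changes
SILENCE_MIN = 18        # silence after the stimulus

stimulus_min = N_SINGERS * MIN_PER_SINGER      # length of the playback
harvest_min = stimulus_min + SILENCE_MIN       # onset-to-harvest interval

# Section sampling (values from the text).
SECTION_UM = 50         # section thickness in micrometers
EVERY_NTH = 3           # every third section was immunolabeled

spacing_um = SECTION_UM * EVERY_NTH            # distance between labeled sections


def span_um(n_sections):
    """Distance from the first to the last of n consecutive labeled sections."""
    return (n_sections - 1) * spacing_um
```

Here `harvest_min` evaluates to 60, consistent with collecting brains sixty minutes after stimulus onset, and `spacing_um` evaluates to 150, the spacing between consecutive labeled sections.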

Quantification of Egr-1 Immunoreactivity

Examples of Egr-1 labeling are shown in Figure 1. We sampled from within the avian homologues of the nAc, caudate nucleus, Hp, medial amygdala, and VTA. We also sampled within an area proposed as an avian homologue of the prefrontal cortex and which receives a strong dopaminergic projection (Mogensen and Divac, 1982; Waldmann and Güntürkün, 1993). The names and abbreviations of each region of interest (ROI) and their human homologues are given in Table 1. Egr-1 immunoreactivity (ir) was quantified in six sections, 150 μm apart, in the VTA and in three sections in each of the other regions. Egr-1-ir was quantified in these regions on one side of the brain, chosen at random except when that region was damaged on one side due to folding or tearing of the section; in these cases the intact side was chosen. Images were acquired with a 4× (nAc and TnA) or 10× objective (all other regions) using a Leica DFC480 camera attached to a Zeiss Axioskop microscope. The light level on the microscope was set exactly the same for each picture.

Sarah E. Earp and Donna L. Maney

Department of Psychology, Emory University, Atlanta, GA, USA

Frontiers

January 7, 2013