Forensic Phonetics: Issues in speaker identification evidence
Andrew Butcher (2002)
Centre for Human Communication Research
Flinders Medical Research Institute
Flinders University, Adelaide, Australia
The field of forensic phonetics has developed over the last 20 years or so and embraces a number of areas involving analysis of the recorded human voice. The area in which expert opinion is most frequently sought is that of speaker identification – the question of whether two or more recordings of speech (from suspect and perpetrator) are from the same speaker. Automated analysis (in which Australia is a world leader) is only possible where recording conditions are identical. In the most frequently encountered real-world forensic situation, comparison is required between a police interview recording and recordings made via telephone intercepts or listening devices. This necessitates a complex procedure, involving auditory and acoustic comparison of both linguistic and non-linguistic features of the speech samples in order to build up a profile of the speaker. The most commonly used measures are average fundamental frequency and the first and second formant frequencies of vowels. Much work is still needed to develop appropriate statistical procedures for the evaluation of phonetic evidence. This means estimating the probability of finding the observed differences between samples from the same speaker and the probability of finding those same differences between samples from two different speakers. Thus there needs to be an acceptance that the outcome will not be an absolute identification or exclusion of the suspect. By itself, your voice is not a complete giveaway.
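The two probabilities described above are commonly combined as a ratio. The following is only an illustrative sketch, not the procedure of the article: it assumes that the difference in average fundamental frequency between two samples is normally distributed around zero, with a small spread for same-speaker pairs and a larger spread for different-speaker pairs. The function names and the spread values (5 Hz and 20 Hz) are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(diff_hz, sd_same, sd_diff):
    """Ratio of p(observed difference | same speaker)
    to p(observed difference | different speakers).

    Assumption: both distributions are zero-mean normals; the spreads
    would have to be estimated from reference data in real casework.
    """
    p_same = normal_pdf(diff_hz, 0.0, sd_same)
    p_diff = normal_pdf(diff_hz, 0.0, sd_diff)
    return p_same / p_diff

# Hypothetical spreads: 5 Hz within a speaker, 20 Hz between speakers.
lr_small = likelihood_ratio(0.0, sd_same=5.0, sd_diff=20.0)
lr_large = likelihood_ratio(30.0, sd_same=5.0, sd_diff=20.0)
print(lr_small, lr_large)
```

A ratio above 1 lends some support to the same-speaker hypothesis and a ratio below 1 to the different-speaker hypothesis; in neither case is the result an absolute identification or exclusion.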
Effects of voice disguise on speaking fundamental frequency
Hermann J. Künzel (2000)
Department of Phonetics, University of Marburg
Patterns of voice disguise in forensic cases involving speaker identification or speaker profiling may contain clues to features of the undisguised voice of a speaker. In a longitudinal and synchronous study, 100 subjects were asked to read a text on five occasions during a period of six months, first using their normal voices, and subsequently with two out of three modes of voice disguise: (1) raising fundamental frequency, (2) lowering fundamental frequency, (3) denasalization by firmly pinching their nose. The focus of this investigation is on fundamental frequency (F0). Results show that most subjects were in fact able to change their F0 consistently according to the mode of disguise they had selected. However, there were differences between the sexes with regard to their preference for disguise modes as well as to the individual articulatory ‘strategies’ which they employed to implement them. Results corroborate experience with forensic casework, that is, they show that there is a constant relation between the F0 of a speaker’s natural speech behaviour and the kind of disguise he will use in an incriminating phone call. Speakers with higher-than-average F0 tend to increase their F0 levels. This process may or may not involve register changes from modal voice to falsetto. Speakers with lower-than-average F0 prefer to disguise their voices by lowering F0 even more and often end up with permanently creaky voice. The latter trend can be observed much more clearly in males. Females are generally more reluctant to make drastic changes to their fundamental frequency patterns.
KEYWORDS speaker identification, voice disguise, fundamental frequency, synchronous aspects, longitudinal aspects
Issues in transcription: factors affecting the reliability of transcripts as evidence in legal cases
Helen Fraser (2003)
School of Languages Cultures and Linguistics, University of New England
This article considers the reliability of transcripts used as evidence in court, especially transcripts of poor recordings. Background information about human speech and speech perception is presented, and the implications of this information for the use of transcripts of different kinds in legal contexts are considered. Finally, recommendations are made to allow judgement of the reliability of existing transcripts, to ensure that newly created transcripts are reliable, and to ensure that transcripts are presented to a jury appropriately.
KEYWORDS transcription, forensic phonetics, human speech perception, transcript reliability
A recent voice parade
Francis Nolan (2003)
University of Cambridge
An account is given of a case in which a voice parade contributed significantly to prosecution evidence. A witness had overheard his landlord arranging for another younger man to set fire to a house (where a fire later that night resulted in a woman’s death), and claimed to know the voice. A voice parade was constructed using composite samples from this suspect’s interview tapes, and, as foils, composite samples from police interviews with similar young men from the London Asian community. The witness identified the man from the voice parade, and also recognized him in a visual parade. This, together with other evidence, resulted in both men being convicted. The paper outlines the problems involved in picking foils from the interview tapes supplied by the police, discusses the format and conduct of the resulting parade including the question asked of the witness, and summarizes challenges in court to the fairness of the parade. In conclusion, ways are suggested in which the procedure might be streamlined and its reliability improved.
KEYWORDS Forensic speech analysis, earwitness identification, line-up parade, voice description.
Digital audio recording analysis: the Electric Network Frequency (ENF) Criterion
Catalin Grigoras (2005)
National Institute of Forensic Expertise, Bucharest, Romania
This article reports on the Electric Network Frequency Criterion as a means of assessing the integrity of digital audio evidence. A brief description is given of phenomena that determine ENF variations. In most situations, visual inspection of spectrograms and comparison with an ENF database are enough to reach a non-authenticity opinion. A more detailed investigation, in the time domain, requires measurements and analyses over short time windows. The stability of the ENF over geographical distances has been established by comparison of synchronized recordings made at different locations on the same network. A real case is presented, in which the ENF Criterion was used to investigate an audio file created with a secret surveillance system. By applying the ENF Criterion in forensic audio analysis, one can determine whether and where a digital recording has been edited, establish whether it was made at the time claimed, and identify the time and date of the recording operation.
KEYWORDS Electrical Network Frequency Criterion, forensic audio, forensic acoustics, audio authentication, digital audio recordings
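The short-window measurement mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's method: it simply searches, window by window, for the frequency near the nominal mains value that carries the most energy. The nominal value of 50 Hz, the 10-second window, the search range of ±0.1 Hz, and the function names `estimate_enf` and `enf_track` are all assumptions made for the illustration, and the input is a synthetic hum rather than a real recording.

```python
import math

def estimate_enf(samples, sample_rate, nominal_hz=50.0, span_hz=0.1, step_hz=0.01):
    """Return the candidate frequency near nominal_hz with the most energy.

    Evaluates a single-frequency Fourier sum on a fine grid of candidates
    (a brute-force search, chosen for clarity rather than speed).
    """
    best_freq, best_power = nominal_hz, -1.0
    steps = int(round(2 * span_hz / step_hz))
    for k in range(steps + 1):
        freq = nominal_hz - span_hz + k * step_hz
        w = 2.0 * math.pi * freq / sample_rate
        re = sum(s * math.cos(w * n) for n, s in enumerate(samples))
        im = sum(s * math.sin(w * n) for n, s in enumerate(samples))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq

def enf_track(samples, sample_rate, window_s=10.0):
    """Estimate the mains frequency in consecutive non-overlapping windows."""
    size = int(window_s * sample_rate)
    return [estimate_enf(samples[i:i + size], sample_rate)
            for i in range(0, len(samples) - size + 1, size)]

# Synthetic hum: 10 s at 50.02 Hz followed by 10 s at 49.97 Hz (made-up values).
rate = 400
hum = [math.sin(2 * math.pi * 50.02 * n / rate) for n in range(10 * rate)]
hum += [math.sin(2 * math.pi * 49.97 * n / rate) for n in range(10 * rate)]
track = enf_track(hum, rate)
print([round(f, 2) for f in track])
```

A track obtained in this way from a questioned recording could then be compared with a reference ENF database, as the abstract describes; a real implementation would also need band-pass filtering and a more efficient frequency estimator.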
GSM interference cancellation for forensic audio: a report on work in progress
Philip Harrison (2001)
J P French Associates, England
A central aspect of forensic phonetic casework concerns the transcription of noisy recordings. An increasing problem in this area of work is the contamination of recordings with interference caused by radio transmissions from GSM mobile phones. Transmitting phones emit short-duration radio-frequency pulses at a rate of 217 Hz. The induced interference signal contains the 217 Hz fundamental and a large number of harmonics that overlap the frequency range of speech, and therefore severely degrade speech intelligibility. Listener fatigue is increased by the harsh sound of the interference, and overall the transcription of such audio samples is problematic. This paper describes the ongoing development of a filter to assist the forensic phonetician in carrying out the transcription of such contaminated recordings.
KEYWORDS forensic audio, transcription, GSM interference, adaptive filter
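To give a sense of how many interference components fall within the speech range, the harmonics of the 217 Hz pulse rate can simply be enumerated. This sketch is an illustration only and says nothing about the filter design reported in the paper; the band limits of 300 Hz and 3400 Hz (a conventional telephone bandwidth) and the function name `harmonics_in_band` are assumptions.

```python
def harmonics_in_band(fundamental_hz, low_hz, high_hz):
    """List the integer multiples of fundamental_hz lying within [low_hz, high_hz]."""
    harmonics = []
    k = 1
    while k * fundamental_hz <= high_hz:
        if k * fundamental_hz >= low_hz:
            harmonics.append(k * fundamental_hz)
        k += 1
    return harmonics

# Assumed band: 300-3400 Hz; the 217 Hz pulse rate is taken from the abstract.
gsm_harmonics = harmonics_in_band(217, 300, 3400)
print(len(gsm_harmonics), gsm_harmonics[0], gsm_harmonics[-1])
```

Under these assumptions, 14 harmonics (from 434 Hz to 3255 Hz) lie inside the band, which suggests why a single notch filter would not be sufficient.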
The ‘Mobile Phone Effect’ on Vowel Formants
Catherine Byrne* and Paul Foulkes** (2004)
*University of Sheffield, **University of York
This study analyses the effect of mobile phone transmission on vowel formant frequencies, based on the study presented by Künzel (2001). Six male and six female speakers read a short passage into a mobile phone. Two simultaneous recordings were made, one at the far end of the phone line and the other via a microphone directly in front of the speaker. Measurements of F1, F2 and F3 were taken from between 15 and 25 stressed vowels per speaker in both sets of recordings. Due to the filtering effect of the phone transmission, F1 frequencies for most vowels were found to be higher than their counterparts in the direct recordings. The overall effect of the mobile phone on F1 frequencies was considerably greater than the landline telephone effect found by Künzel (2001): on average the F1 values in the mobile condition were 29 per cent higher than in the direct condition. On the whole F2 measures were not significantly affected, in line with Künzel’s findings. F3 frequencies were also generally unaffected by the mobile phone transmission. Exceptions were found, however, particularly for individual speakers with relatively high F3s. In these cases the mobile recordings tended to yield significantly lower values. The consequences of measurement errors arising from the different recording conditions are discussed with reference to forensic speaker identification.
KEYWORDS formant analysis, mobile phone transmission, forensic speaker identification