Mediamatic Magazine vol.7#1 Remko Scha 1 Jan 1992

Virtual Voices (1)


The new digital media technologies, which are now being developed, are often imitative technologies. Future generations may end up viewing the twentieth century as the century of abstraction, and the now imminent turn of the century as the moment of a return to a mimetic aesthetics. The imitation of nature is once again a widely pursued artistic ideal. Sometimes this concerns the imitation, not of the way things look, but of the processes that constitute life, body, or mind.


Vaucanson’s Digesting Duck - Virtual Voices (1) - published in Mediamatic Magazine Vol. 7#1 (1992)


Wolfgang von Kempelen's “speaking machine”

Mimesis is then called artificial life, robotics, or artificial intelligence. But in other cases, it concerns the classical ideal of the perfect simulation of the surface of things. Then it is called ray tracing, paintbox, digital photography, virtual reality.

Music exists between the poles of mathematical abstraction and pure physics. Imitation is not an issue there, one might think, but nothing could be further from the truth. After the failure of 'real' electronic music, which used sinusoids, square waves, noise and modulators to build sound sculptures that people don't particularly want to listen to, there is now an avalanche of digital electronic technologies that simulate the sounds of conventional instruments in great detail, and make them accessible to keyboards and computers with MIDI interfaces.

Artificial speech synthesis is an imitative technology that is closely connected with music. However, its relation to language also lends this medium an entirely unique character. This article explores the history, the techniques and the aesthetics of this medium.


Language is a matter of symbols. The conceptualization and abstraction of human experience.
Music is a matter of physics – not so much because music is usually realized by means of sound, but rather because precisely the structural properties of music (such as metre, rhythm, harmony, melody) are based on physical phenomena.

Between language and sound: speech. Between mind and matter: the voice.

Roland Barthes: "Listen to a Russian bass (...): something is there, manifest and persistent (you only hear that), which is past (or previous to) the meaning of the words, of their form (the litany), of the melisma, and even of the style of the performance: something which is directly the singer's body, brought by one and the same movement to your ear from the depth of the body's cavities, the muscles, the membranes, the cartilage, and from the depths of the Slavonic language, as if a single skin lined the performer's inner flesh and the music he sings."

1 Roland Barthes 'Le Grain de la Voix', in: L'obvie et l'obtus, Paris 1982 (English translation: 'The Grain of the Voice', in: The Responsibility of Forms. Critical essays on Music, Art and Representation. New York 1985, pp. 269/270)

To Copy / To Fake

Within the technology of voice imitation, two approaches are usually distinguished: the genetic approach and the gennematic one. The genetic method imitates the physiological processes that generate speech sounds in the human body. The gennematic method is based on the analysis of the speech sounds themselves, and reconstructs these sounds without considering the way in which the human body produces them.
The speaking machines of the eighteenth century were based on the genetic principle: the hardware of the larynx and the oral cavity was reconstructed in a stylized way. If such an imitation is faithful enough, the sounds it generates resemble the sounds of human speech.
In the twentieth century, we see an entirely different approach: digital technology which calculates the shapes of sound signals and then uses loudspeakers to make them audible. The voice is no longer imitated, but its output is faked. The algorithm computes signals that evoke the image of a physical process that never occurred.
The eighteenth-century automaton is a mechanical body, a piece of clockwork claiming the qualities of life. In twentieth-century computer simulation, the mechanics is abstract, the machine dissolves into mathematics. The body has disappeared.
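The gennematic idea, computing the shape of a signal without modelling the body that would produce it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rate, the bell-shaped spectral envelope, and the values in AH_FORMANTS are hypothetical choices made for this sketch, not a description of any actual synthesizer.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)

def envelope(freq_hz, formants):
    """Spectral envelope: a sum of bell-shaped peaks, one per formant."""
    return sum(
        math.exp(-((freq_hz - centre) / width) ** 2)
        for centre, width in formants
    )

def synthesize(f0_hz, formants, duration_s=0.05):
    """Compute a vowel-like waveform as a weighted sum of harmonics of f0_hz."""
    n_samples = round(SAMPLE_RATE * duration_s)
    # Use every harmonic that stays below half the sample rate.
    n_harmonics = int((SAMPLE_RATE / 2) // f0_hz)
    weights = [envelope(k * f0_hz, formants) for k in range(1, n_harmonics + 1)]
    signal = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        signal.append(sum(
            w * math.sin(2 * math.pi * k * f0_hz * t)
            for k, w in enumerate(weights, start=1)
        ))
    # Scale so that the largest sample has magnitude 1.
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]

# Hypothetical (centre, width) pairs in Hz, loosely suggesting an "ah" sound.
AH_FORMANTS = [(700.0, 100.0), (1200.0, 120.0), (2600.0, 200.0)]
samples = synthesize(120.0, AH_FORMANTS)
```

The resulting list of numbers would still have to be sent to a loudspeaker. Nothing in the computation refers to lungs, vocal cords, or a mouth: only the shape of the output is specified.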

To copy

The impulse of classical sculpture was not representation, but imitation. A life-size, coloured, three-dimensional model is not a model, but a copy. The master sculptors of classical mythology even managed to duplicate the human body in sculptures that not only showed a perfect likeness but could also speak and move naturally. In Chinese and Germanic mythology, carpenters and silversmiths displayed similar skills, building treacherously seductive female automata.

The essential step on the road from myth to technology was taken in the seventeenth century. The idea that living organisms function according to the laws of physics, and could in principle be simulated by means of mechanical constructions, is then no longer a vague, alarming suspicion, but a scientific hypothesis. In the early seventeenth century, Descartes presented the thesis that animals are in fact machines.

Thomas Hobbes: Nature, the art by which God hath made and governs the world, is by the art of man, as in many other things, in this also imitated, that it can make an artificial animal. For seeing life is but a motion of limbs, the beginning whereof is in the principal part within; why may we not say that all automata (engines that move themselves by springs and wheels as doth a watch) have an artificial life? For what is the heart but a spring, and the nerves but so many strings; and the joints but so many wheels giving motion to the whole body, such as was intended by the artificer?

2 Thomas Hobbes Leviathan 1651 (Harmondsworth, Middlesex 1968)

In the seventeenth and eighteenth centuries, the construction of automata which imitate bodily functions of man or animal was extremely popular: the development of clockwork technology had made it possible to realize much better imitations than before; and the theories of the Cartesians lent a philosophical interest to such enterprises. Thus, there were dolls that could walk or talk, write letters, or play the flute; birds that could flap their wings, tweet, eat, drink and shit. There is a curious similarity between this kind of automaton building and present-day Artificial Intelligence. In those days, too, the capacities of the most advanced technologies of the moment were exploited with the goal of imitating the outward appearance of certain aspects of human behaviour; then, too, this resulted in products which aroused everybody's interest, because they could be regarded as technological experiments, as biological models, as philosophical existence proofs, as art, or as entertainment.

This is illustrated by the career of Jacques de Vaucanson, one of the best-known automaton builders of the eighteenth century. His automata were amusing and astonishing exhibits at popular fairs, while their mechanisms were published in learned scientific articles - by the designer himself, and also by Diderot and D'Alembert in their Encyclopédie. The production of the automata also generated interesting technological spin-offs. Eventually, Vaucanson became an innovative organizer in the textile industry, and built the most advanced silk-spinning factory of that time. He used the technology of his automatic flute player for the design of the first programmable loom, which would later become the basis for Jacquard's work.

L'Homme Machine

Human beings are emphatically excluded from Descartes' reasoning about the mechanical character of animals. He links mechanism with the absence of emotions and the absence of consciousness. Apparently he has no difficulty in viewing animals in this way, but that people could be machines is more of a problem.

The Cartesian philosopher Cordemoy formulates the argument against the mechanizability of humans in terms of the idea of an automatic speech machine: ...although I see clearly that a purely mechanical apparatus could utter a few words, I know at the same time that the springs which distribute the air or open the tubes that let out the voices display a certain order between each other, which they could never change. So that, from the moment the first voice sounds, the voices that usually follow must necessarily follow as well – that is, if the machine still has sufficient air. Contrarily, the words which I hear being uttered by bodies such as mine, are rarely pronounced in the same order.

3 G. de Cordemoy Discours physique de la parole Paris, 1666

From this point of view, the richness of language is connected with the typically human capacity of free will, which is intrinsically incompatible with the rigidity of a clockwork. To account for free will, Descartes provides the - otherwise mechanical - human body with an interface to the immortal soul. He situates this interface in the pineal gland – a small gland, whose function was unknown at the time, located in the epithalamus near the centre of the brain.
A hundred years after Descartes, the idea that human beings are machines too was explicitly defended after all. In L'Homme machine, La Mettrie argues that "all capacities of the soul are to such an extent dependent on the right organization of the brain and the entire body, that apparently they are nothing but this organization itself." In this theory, there is a material identity between body and soul; therefore, the necessity of a mind-body interface has disappeared.
Between man and animal, there are only differences in degree. The Cartesian arguments concerning the mechanical character of animals now apply directly to human beings, but at the same time their meaning changes profoundly. Mechanism no longer implies non-consciousness: consciousness itself is mechanical.

4 Julien Offray de la Mettrie: L'Homme machine. Leyden 1748

If the more ambitious automaton builders of this period had based their research programmes on La Mettrie's viewpoint, they would have invented something like today's Artificial Intelligence: a discipline aimed at creating actual demonstrations of mechanized mental processes. But clockwork technology was not suitable for such a purpose. Therefore, this step could not be made until the middle of the twentieth century, when the electronic computer became available.
This is also why Cordemoy's argument was so plausible in his day. No one could foresee the virtually unlimited switching flexibility that would be introduced by von Neumann's stored-program computer. Software is mechanics capable of dynamic reconfiguration. Programs are virtual clockworks with self-modifying and self-extending capacities. Although these capacities are limited by the finiteness of the hardware on which the programs are implemented, in practice we can often ignore these limitations. Compared to a clockwork, the computer realizes a qualitatively superior complexity and flexibility. With this invention behind us, we must now forever view the limits of the mechanizable as unknown and open-ended.

Talking Heads

The first serious speech machines were developed by eighteenth-century automaton builders who were engaged in mechanical simulation of the bodily functions of man and beast. At this time, the sound of speech was not yet viewed as a phenomenon which could be analyzed and reconstructed. Speech simulation was imitation of the act of speaking. Artificial bodies were created, which could blow out air and thereby make the air vibrate; the fidelity of the artificial speech generated in this way depended on the accuracy with which the relevant features of the human body were reproduced.
Like human beings, these machines had 'vocal cords', which vibrated when air was forced through them. The precise functioning of the human vocal cords was not yet known at this time. To imitate them, the machine-builders used the principle of the harmonium: an air tube is closed off by a flexible metal tongue, which moves under pressure to let the air through and is consequently set into vibration. The tongue was often covered in leather to dampen the high tones slightly.
As with a reed organ, this vibration was then conveyed to the air via a resonance chamber – which in this case was made to resemble the human mouth as much as possible. Depending on the exact shape of the resonance chamber (that is, the position of the mouth), various vowels could be generated. Depending on the way in which the air stream was started or stopped, or obstructed by constricting the outlet, various consonants could be formed.
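The arrangement described above, a vibrating reed feeding a resonance chamber whose shape determines the vowel, can be caricatured numerically. The following Python sketch is an analogy rather than a model of any historical machine: the reed is replaced by a train of pulses, each resonance of the chamber by a simple two-pole filter, and the "mouth positions" AH and EE are hypothetical values chosen only for illustration.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)

def pulse_train(f0_hz, n_samples):
    """Stand-in for the reed: one unit pulse per period of f0_hz."""
    period = round(SAMPLE_RATE / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, centre_hz, bandwidth_hz):
    """Two-pole resonant filter: stand-in for one resonance of the chamber."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / SAMPLE_RATE)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def voice(f0_hz, cavity, n_samples=1600):
    """Send the reed signal through each resonance of the cavity in turn."""
    signal = pulse_train(f0_hz, n_samples)
    for centre_hz, bandwidth_hz in cavity:
        signal = resonator(signal, centre_hz, bandwidth_hz)
    return signal

def strength_at(signal, freq_hz):
    """Magnitude of the signal's component at freq_hz (single-bin Fourier sum)."""
    w = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return math.hypot(re, im)

# Hypothetical "mouth positions": (centre, bandwidth) pairs in Hz.
AH = [(700.0, 80.0), (1200.0, 90.0)]
EE = [(300.0, 60.0), (2300.0, 100.0)]

ah = voice(100.0, AH)
ee = voice(100.0, EE)
```

The same reed signal drives both calls; only the cavity differs. Comparing strength_at(ah, 700.0) with strength_at(ee, 700.0) shows how reshaping the chamber shifts which frequencies are emphasised, which is the sense in which the position of the mouth selects the vowel.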


Abbé Mical's Têtes Parlantes

As Cordemoy had argued already, independently functioning machines of this kind could only deliver a limited repertoire of texts. Because speech simulation proved far from easy, in practice this came down to rather small numbers of words or sentences, which would be hardwired into the machine. For this reason, speech machines were often designed as instruments instead – machines which could generate all the sounds that are needed to pronounce any given text, but which could only pronounce an actual text if operated by a technical expert who determined which sounds were produced at which moment. Descartes' solution, one might say: a mechanical machine driven by human consciousness; a body controlled by a mind.
In 1778, for example, Wolfgang von Kempelen designed a machine that directly imitated the functioning of the oral cavity. The operator squeezes a pair of bellows to press the air, via 'vocal cords', into a resonance chamber, which he modulates with both hands. The various vowels are created by changing the shape of the resonance chamber with one hand; the consonants are produced as the other hand opens or closes this chamber in various ways.

A lung and vocal-cord prosthesis, which makes it possible to use the hands as a mouth. Technological perversion of speech.
The 'vowel organ'.

5 This is a relatively recent machine (built at the Institute of Phonetic Sciences of the University of Amsterdam), but its method of operation definitely belongs to the eighteenth-century tradition.

The same principle, but in this case it is a carrousel of different sound cavities; a fan of vowels. A laboratory instrument operated by a technician by means of switches, wheels, foot pedals.
As a result of the technician's actions the vibrating air is sent to one resonance chamber or another and such a chamber is opened or closed in a variety of ways. Thus, by a succession of separate interventions, the technician realizes, one by one, the phonetic elements of the language expression to be pronounced.
Human speech is a continuous process. In this mechanical simulation, there is no such continuity. What we hear is phonology: the discrete combinatorics of linguistics.

Joop van Brakel on the 'vowel organ': language shattered into meaningless fragments. Slapstick, merriment, music. Language regressing to animal sounds. Cackling, bleating, barking. (There once was a time when all speech was song.)

The vowel organ has ingenious 'artificial vocal cords'. A hollow cylinder with a slit in it continually rotates within another hollow cylinder, also with a slit in it. The result: a slit-shaped opening is opened and closed continually. The air is pressed through this opening. If we use this technique to set the air in motion without providing a connection to an 'artificial oral cavity' in which the air can resonate, what we hear is a fart. Is that the sound that underlies all speaking?
Other speech machines create an even greater separation between the operating technician and the material production of sound: they insert a keyboard interface. Abbé Mical's Têtes Parlantes (1783) and Joseph Faber's Euphonia (1840) belong in this category. Speech machines for entertainment. The designer also acted as operating technician and as variety artist, ventriloquist: he put a puppet on stage and tried to create the illusion that it really spoke.
Here, the laboratory apparatus has become a musical instrument, with an interface which enables the virtuoso performer to add natural dynamics and timing to the mechanical speech utterances, and to compensate as much as possible for the limitations of technology.
Thus, one of Mical's contemporaries writes about the Têtes Parlantes: With a little practice and agility, we will be able to speak with the fingers as with the tongue, and we will be able to give the language of the heads the speed, the calm, and in short all the qualities that a language can possess which is not animated by passions.


Louis Poyet, Kratzenstein's resonators

6 Avec un peu d'habitude et d'habileté, on pourra parler avec les doigts comme avec la langue, et on pourra donner au langage des têtes la rapidité, le repos et toute la physionomie enfin que peut avoir une langue qui n'est point animée par les passions. From a letter by Antoine de Rivarol, 1783 (Oeuvres complètes de Rivarol, Part III, Paris 1808, p. 207). See: Jens-Peter Köster, Historische Entwicklung von Syntheseapparaten zur Erzeugung statischer und vokalartiger Signale nebst Untersuchungen zur Synthese deutscher Vokale (Historical development of synthesis machines for generating static and vowel-like signals and research into the synthesis of German vowels), Hamburg 1973, p. 85. On p. 95, Köster also quotes another part of this letter: If these heads were multiplied in Europe, they would raise terror in all those Swiss and Gascon language teachers, whose influence has infected all countries and who disfigure our language for the peoples who love it. Köster comments: Here lie the roots of the use of technological tools in foreign language teaching.

On the keyboard of the Têtes Parlantes, you present a text as you would play a musical score on a piano.

For the continuation of this text, see: Virtual Voices (2)