Connectionism
From:
Brainwise, pp. 293-302, by Patricia Churchland
Mindware, ch. 4, by Andy Clark
and other sources
I. Introduction
A. As we have seen, eliminative materialism is so called because it seeks to eliminate folk psychology
B. folk psychology is closely linked to the sentential view of how knowledge is represented
C. although many philosophers agreed that there were problems with the sentential view of knowledge, they were unwilling to give it up without an alternative concept of representation
D. Churchland thinks that an alternative view arose in the 1980s: connectionism
E. connectionism makes use of artificial neural networks that are supposed to be based on the actual structure of the brain, although actual brains and neurons are much more complicated than artificial neural nets and connectionist units (MW 62)
II. Neuroscience Background (from other sources)
A. Structure of Neurons
1. each has a (more or less) long output fiber called an "axon"
2. the axon branches out at its end to make synaptic connections with the dendrites of other neurons
B. Function of neuron
1. each neuron receives inputs from many others, which are either
a. excitatory
b. inhibitory
2. the level of activation of each neuron is a function of:
a. the number of connections
b. their weights (size)
c. whether each is excitatory or inhibitory
d. strength of the signal
3. the output along the axon is a series of spikes, at a rate ranging from 0 to 200 hertz
III. Artificial Neural Networks
A. the way networks encode information is quite different from the way traditional symbol systems do it (MW 66)
1. knowledge not represented in declarative statements
2. rather, in the structure of connections among units and their weights (MW 66)
3. one and the same connection can be part of a large number of different representations – for example, the representations of a black house cat and a black panther
4. this is called “distributed” processing because the information is not located at one point but distributed over the pattern of synaptic connections and weights
B. Artificial neural networks simulate natural neurons with artificial ones, as shown e.g. in Figure 7.16 in BW, p. 301
1. for simplicity's sake, incoming signals are connected directly to the cell body rather than to dendrites
2. the contribution of each input is simply its strength times its synaptic weight
C. Artificial neural networks transform vectors in a way that is believed to be similar to the way in which the nervous system does
1. there are two levels of vector-to-vector transformation (see fig. 7.16 in BW, p. 301 again)
a. first level: inputs are connected to "hidden units"
b. second level: hidden units are connected to outputs
2. the distribution of synaptic weights is what determines the vector transformation function
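The two levels of vector-to-vector transformation described in C can be sketched in a few lines. This is only an illustration, not the network in BW fig. 7.16: the layer sizes (3 inputs, 2 hidden units, 1 output), the weight values, and the sigmoid "squashing" function are all assumptions made for the example. Negative weights stand in for inhibitory connections, positive weights for excitatory ones.

```python
import math

def sigmoid(x):
    # Assumed squashing function; the outline does not name one.
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, inputs):
    # Each unit adds up (input strength * synaptic weight) over all of its
    # incoming connections, then squashes the total into the range (0, 1).
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# Hypothetical weights: one row per receiving unit.
w_hidden = [[0.5, -1.0, 0.25],   # hidden unit 1
            [-0.75, 0.5, 1.0]]   # hidden unit 2
w_output = [[1.5, -2.0]]         # the single output unit

def transform(input_vector):
    hidden_vector = layer(w_hidden, input_vector)   # first level
    return layer(w_output, hidden_vector)           # second level

output_vector = transform([1.0, 0.0, 1.0])
print(output_vector)
```

Changing any entry in w_hidden or w_output changes the function that transform computes, which is the sense in which the distribution of weights determines the transformation.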
IV. A Relatively Simple Example
A. elsewhere, Paul Churchland describes a neural network computer that uses sonar echoes to distinguish submarine mines from rocks
B. the problem is that there is a large variety of rocks and mines, with a large variety of hard-to-distinguish echoes
C. to solve it:
1. we record sonar echoes from known mines and rocks of various types
2. we then put each recorded echo through a spectral analyzer, which measures the energy at, say, 13 sample frequencies
3. we then introduce a neural network with 13 input units, 7 (e.g.) hidden units, and 2 output units (one for rocks, one for mines), with a total of 105 synapses (7x13 + 2x7)
4. at the beginning, we do not know what weights to assign the synapses, so we distribute them randomly
5. we then "train up the network" by adjusting these weights to give us the correct output for the recorded echoes of the known mines and rocks
a. Using an algorithm for learning through the back-propagation of error
b. The computer, working outside the modeled network, calculates the difference between the vector we obtained and the vector we wanted
c. and then uses these results to calculate the necessary changes in the weights
D. eventually, the system converges on a configuration of synaptic weights that will allow the system to distinguish mine echoes from rock echoes in a fairly reliable manner – about 90% of the time
1. we can think of the learning rule as pushing the configuration of synaptic weights towards an error minimum in an n-dimensional space – see fig 7.16a, BW p. 301
2. once the system is trained up, we can then test it with new echoes
E. when the sonar system is being trained up, we can think of it as partitioning a 7-dimensional space (for 7 hidden units) into two regions: one for rocks, one for mines – see figure 7.16c, BW p. 301
1. there will be a central region in each space for prototypical mines and rocks -- echoes in those regions will return an output vector close to <1,0> or <0,1>
2. those near the dividing surface are ambiguous and return output vectors like <.6,.4>
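The training procedure in C and D can be sketched as follows, using the 13-7-2 layout from the outline. Everything else is an assumption made for illustration: the "echoes" are invented numbers (not real sonar data), the units use a sigmoid, there are no bias units, the learning rate is 0.5, and <1,0> is taken to mean "mine" and <0,1> "rock". The sketch shows the error shrinking as the weights are adjusted; it makes no claim about the 90% figure reported for the real system.

```python
import math
import random

N_IN, N_HID, N_OUT = 13, 7, 2                # layer sizes from the outline
N_SYNAPSES = N_IN * N_HID + N_HID * N_OUT    # 7x13 + 2x7

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hid, w_out, x):
    hid = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    out = [sigmoid(sum(w * h for w, h in zip(row, hid))) for row in w_out]
    return hid, out

def train_step(w_hid, w_out, x, target, rate=0.5):
    hid, out = forward(w_hid, w_out, x)
    # Output error: (wanted - obtained), scaled by the slope of the sigmoid.
    d_out = [(t - o) * o * (1 - o) for t, o in zip(target, out)]
    # Back-propagate: each hidden unit's share of the output error.
    d_hid = [h * (1 - h) * sum(d_out[k] * w_out[k][j] for k in range(N_OUT))
             for j, h in enumerate(hid)]
    # Nudge every weight in the direction that reduces the error.
    for k in range(N_OUT):
        for j in range(N_HID):
            w_out[k][j] += rate * d_out[k] * hid[j]
    for j in range(N_HID):
        for i in range(N_IN):
            w_hid[j][i] += rate * d_hid[j] * x[i]

def mean_error(w_hid, w_out, data):
    total = 0.0
    for x, target in data:
        _, out = forward(w_hid, w_out, x)
        total += sum((t - o) ** 2 for t, o in zip(target, out))
    return total / len(data)

def fake_echo(rng, is_mine):
    # Invented data: "mines" get more energy in the low bands, "rocks" in the high bands.
    return [(0.8 if (i < 6) == is_mine else 0.2) + rng.uniform(-0.1, 0.1)
            for i in range(N_IN)]

rng = random.Random(0)
MINE, ROCK = [1.0, 0.0], [0.0, 1.0]          # assumed coding of the two output units
data = ([(fake_echo(rng, True), MINE) for _ in range(10)]
        + [(fake_echo(rng, False), ROCK) for _ in range(10)])

# Start from random weights, since the right values are not known in advance.
w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HID)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(N_HID)] for _ in range(N_OUT)]

error_before = mean_error(w_hid, w_out, data)
for _ in range(200):                         # repeated passes over the known echoes
    for x, target in data:
        train_step(w_hid, w_out, x, target)
error_after = mean_error(w_hid, w_out, data)
print(N_SYNAPSES, error_before, error_after)
```

Note that train_step sits outside the network itself, in the way the outline describes: it compares the vector obtained with the vector wanted and uses the difference to change the weights.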
V. NETtalk
A. provides a good example of a system with more than a binary output
B. takes 7-letter printed segments as inputs, gives vector codings of phonemes for outputs
1. attached to a sound synthesizer, yields pretty good English after it's trained up
2. even though it is not given any of the usual rules of English phonetics
3. see MW 63ff for details
C. some results (box 4.3, MW pp. 69-70)
1. partitioned its hidden-unit vector space into 79 sub-spaces, corresponding to 79 phonemes in spoken English
a. it’s important to keep in mind, however, that the hidden-unit vector space partitions do not play any role in the system's computations
b. that is, the system only works on the synaptic weights; there is nothing in the system that "sees" this vector space
2. by successive pairing by similarity and averaging, Sejnowski and Rosenberg showed that these 79 phonemes ultimately cluster into 2 groups: one for consonants, one for vowels
D. there are limitations:
1. it doesn't understand what it's reading, of course
2. the system has trouble with words when their correct pronunciation depends on meaning or grammar – e.g., “John read the book but he would not read it to his little sister.”
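The "successive pairing by similarity and averaging" mentioned in C.2 can be illustrated with a toy version. This is a sketch under stated assumptions, not Sejnowski and Rosenberg's actual analysis: the six phoneme labels and their three-component hidden-unit vectors are invented, similarity is measured by Euclidean distance, and a merged pair is replaced by the plain average of the two vectors.

```python
import math

def cluster_by_pairing(vectors, n_groups=2):
    # Each cluster is (member labels, centre vector).
    clusters = [((label,), list(vec)) for label, vec in vectors.items()]
    while len(clusters) > n_groups:
        # Find the two most similar clusters (smallest distance between centres).
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda p: math.dist(clusters[p[0]][1], clusters[p[1]][1]),
        )
        members = clusters[i][0] + clusters[j][0]
        centre = [(x + y) / 2 for x, y in zip(clusters[i][1], clusters[j][1])]
        # Replace the pair with its average; delete j first because j > i.
        del clusters[j]
        del clusters[i]
        clusters.append((members, centre))
    return [set(members) for members, _ in clusters]

# Invented hidden-unit activation vectors, one per phoneme.
toy_vectors = {
    "a": [0.9, 0.1, 0.2],
    "e": [0.8, 0.2, 0.1],
    "o": [0.9, 0.2, 0.3],
    "b": [0.1, 0.9, 0.8],
    "p": [0.2, 0.8, 0.9],
    "s": [0.1, 0.7, 0.6],
}

groups = cluster_by_pairing(toy_vectors)
print(groups)
```

With these made-up vectors the two surviving groups are the vowels and the consonants. The grouping is something the analyst computes after the fact; as noted in C.1, nothing in the network itself uses it.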
VI. Face net (BW 293 ff. See figure 7.12, p. 294)
A. Analyzes photographs of people’s faces
1. not known exactly how we humans do this
2. but this artificial network shows how a neural network might recognize faces
B. 3 layers (BW 294-95)
1. input: 64 x 64 pixels or 4,096 input units
2. 80 hidden (BW 294-95)
3. 8 output
C. when trained up on 64 photos of 11 different faces and 13 photos of non-faces, it can:
1. distinguish faces from non-faces
2. distinguish male faces from female
3. respond with the “name” for any of the faces it was trained up on
4. achieved 100 percent accuracy on 11 images in the training set; on the same faces under different lighting, at different angles, etc., it achieved 98 percent (BW 297)
5. on new faces, achieved 100 percent on face/nonface, 81 percent on gender
D. hidden units each correspond to a prototypical face (BW 298)
E. boundaries between regions in hidden unit vector spaces are actually rather fuzzy (BW 300; cf. fig. 7.15, p. 299)
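To give a sense of the scale of the three layers listed in B, the sketch below flattens a 64 x 64 image into an input vector and counts the weights the network would need. It assumes that every unit in one layer is connected to every unit in the next, which the outline does not state; the blank image is a placeholder, not data from the study.

```python
INPUT_SIDE = 64     # image is 64 x 64 pixels
N_HIDDEN = 80
N_OUTPUT = 8

def flatten(image):
    # Turn the grid of pixel brightnesses into one long input vector.
    return [pixel for row in image for pixel in row]

blank_image = [[0.0] * INPUT_SIDE for _ in range(INPUT_SIDE)]   # placeholder image
input_vector = flatten(blank_image)

# Assumed full connectivity between adjacent layers.
n_weights = len(input_vector) * N_HIDDEN + N_HIDDEN * N_OUTPUT
print(len(input_vector), n_weights)
```

On that assumption the face net has several thousand times as many adjustable weights as the 105-synapse sonar network in section IV.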
VII. Second generation connectionism (MW 68 ff)
A. more recent connectionist systems add a temporal dimension, so that sequences and motions can be recognized
B. Elman’s network, for example (MW 70)
1. exposed to grammatically proper sequences of words
2. given a partial sequence, its job is then to predict the next word (MW 70)
3. learns to recognize nouns vs. verbs, animate vs. inanimate, foods, breakable objects (MW 70-71)
C. Clark also briefly discusses third-generation or dynamic connectionism, which adds more neurobiologically realistic features to the simple units and weights, including: (MW 72)
1. special purpose units
2. more complex connections
3. time delays, etc. (q.v.)
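One way to add the temporal dimension described in A and B is a simple recurrent network of the kind usually associated with Elman, in which the hidden units receive the current input together with a copy of their own previous activations. The outline does not spell out this mechanism, so the sketch below is an illustration only: the three-word vocabulary, the weight values, the sigmoid, and the all-zero starting context are assumptions, and no training or next-word prediction is shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def srn_step(w_in, w_context, x, context):
    # Each hidden unit sees the current word AND the previous hidden state.
    return [
        sigmoid(sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * c for w, c in zip(w_context[j], context)))
        for j in range(len(w_in))
    ]

# Hypothetical one-word-per-unit coding of a tiny vocabulary.
WORDS = {"boy": [1, 0, 0], "dog": [0, 1, 0], "chases": [0, 0, 1]}

# Hypothetical, untrained weights: 3 input units and 2 hidden units.
w_in = [[0.5, -0.5, 1.0], [-1.0, 0.5, 0.25]]
w_context = [[0.8, -0.4], [0.3, 0.9]]

def run(sentence):
    context = [0.0, 0.0]                 # assumed starting state
    for word in sentence:
        context = srn_step(w_in, w_context, WORDS[word], context)
    return context

state_a = run(["boy", "chases"])
state_b = run(["dog", "chases"])
print(state_a, state_b)
```

The last word is the same in both sequences, yet the two hidden states differ, because each state carries a trace of what came before. That trace is what lets a trained network of this kind base its prediction on the sequence so far rather than on the current word alone.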
VIII. Discussion (MW 73)
A. connectionism and mental causation
1. folk psychology assumes that functionally discrete, meaningful states play a role in the causation of other mental states and behavior. Or, in simpler language, that individual beliefs can function as discrete causes of specific actions (MW 73-74)
2. but this is not the way that connectionist networks work, because information is not stored in the same way (MW 74)
3. Example: the network that is trained to give yes/no answers to questions like “dogs have fur,” “fish have fur,” etc.
a. in a distributed network, many of the same connections and weights will be involved in answering both questions (MW 74)
b. so the answers to these questions are distributed over the whole network
c. and there’s a sense in which your knowledge that fish have gills causes your answer to the question whether dogs have fur – causal holism (MW 74-75)
d. and if we compare this 16 proposition network to a 17 proposition network that shares the belief that dogs have fur, this commonality between them will be invisible at the level of connections and their weights (MW 75)
4. three possible responses to this issue:
a. the incompatibility with folk psychology is merely apparent and further research will reveal the brain analogues to the beliefs and desires of folk psychology
b. the folk are not all that committed to the propositional attitudes anyway
c. folk psychology is wrong and should be rejected
5. Clark adds that we should not be focused on the level of connections and weights. There may be higher-level descriptions of connectionist networks, such as the way in which different NETtalk networks, with different distributions of synaptic weights, all distinguish vowels and consonants in the same way (MW 75)
6. and as for the causal holism worry, it is not the case that every connection is involved in every answer to a question
B. Systematicity (MW 76)
1. critics of connectionism like Fodor and Pylyshyn argue that connectionism cannot explain the systematicity of thought
a. the systematicity of thought is explained by analogy with the systematicity of language
1.) if you can say John loves Mary, you can also say Mary loves John
2.) the reason we can do this is that language is made of separate parts that can be rearranged according to the rules of syntax and grammar (MW 76)
3.) but we can also think John loves Mary and Mary loves John (MW 77)
4.) so thoughts must be made up of parts in the same way as language
b. but connectionism lacks this sort of inner structure (MW 76)
c. so connectionism is false (MW 76)
2. Clark describes two replies to this argument (MW 77)
a. whether there are connectionist models that can explain systematicity is an empirical question. People are currently working on this problem (MW 77-78)
b. downplay the importance of systematicity
1.) the systematicity of human thought may derive from the structure of human language
2.) Fodor and Pylyshyn hold that not only human but non-human thought is systematic – and they think we can explain this in terms of animals being symbol-processors (MW 78-79)
3.) but it is not clear to Clark that, say, a lion that can think “I want to eat that puppy” can also think “That puppy wants to eat me.” (MW 79)
C. biological reality: although any model must simplify in order to explain, this simplicity raises three problems for connectionism: (MW 79)
1. the use of artificial problems, and the choice of input and output representations (MW 79)
a. although the systems learned to solve their problems, what they learned depended on inputs that experimenters chose to give them
b. systems were given tasks like producing past tense of verbs
c. or like balancing blocks on a beam
1.) output required no real motor action
2.) inputs of weights and distances also artificial (MW 79-80)
d. one might argue that in science one must always simplify (MW 80)
e. but the worry is that these simplifications might obscure how real organisms solve problems in real environments
2. artificial neural nets are relatively small compared to the brain, and are designed to tackle distinct problems
a. in biological systems, the same network is involved in many tasks
b. solutions that work well for small systems do not always scale up:
1.) e.g. speech recognition networks have trouble with several people speaking at once
2.) one solution might be networks of networks, with smaller networks each trained up, say, for children’s voices, female voices, male voices (MW 81)
3. most artificial neural nets are not connected with the details of real neuroscience research
a. real neural systems have properties not found in artificial neural nets, such as the way in which the diffusion of a gas or chemical over a wide area can affect response (MW 81)
b. also, more attention needs to be paid to the different structures in the brain and what they are for
D. ultimately, Clark thinks that AI needs to go beyond connectionism and consider such things as bodily action, the use of artifacts, interaction with the environment, and external symbol structures (MW 82)