“Language” is discovered but “languages” are invented

July 23, 2018

Say I speak only English and you speak only Japanese. We meet and we

  1. have the desire to communicate, and
  2. attempt to communicate by speech

We hear only gibberish. We cannot decode the sounds we hear to discern any meanings. We do not have a shared language. But our communication is not doomed to failure. What we do share is

  1. that we both have language,
  2. the inferred knowledge that each of us does have a specific language,
  3. the knowledge that we are lacking an agreed vocabulary of signals (sounds, symbols, …) representing meanings and an agreed structure for combining these signals when we transmit and receive them from each other.

We have both already discovered language. What we lack is a shared language. With time and application and given that we each know that the other is both aware of, and capable of language, we can invent a shared vocabulary and an acceptable common grammar. We can invent a particular “Jinglish” for our communications.

That two or more brains can communicate if they have a shared system for the encoding of meanings into signals, which signals can then be transmitted and received and decoded into their meanings, is not an invention but a discovery.

The subsequent development of a specific agreed-upon system – a specific language – is then invention. English and Japanese and Braille are invented. Hieroglyphs and alphabets and emojis are invented. Paintings on cave walls, impressions on clay tablets, writing on papyrus or palm leaves or on paper, are all inventions. They are invented to implement communication because it has been discovered that meanings can be communicated by transmitting and receiving signals.

When children “acquire language”, as they do even without any instruction, they do so by absorbing it from their surroundings. Japanese surroundings produce a Japanese-speaking child, not one speaking French. A child acquiring language represents a voyage of discovery – not one of invention. It is actually a voyage of many discoveries; of the possibility of communication, of the ability and the need to communicate, of converting meanings into intelligible signals, of decoding signals and of the specific language it is surrounded by. It is the discovery that sounds can be generated and that some sounds can become speech. The child’s need or desire to communicate is no doubt enabled by its genes. Its ability to produce sounds or gestures or other signals to represent meanings is also governed by its biology and its genes. It is the physiology of the bodies we inhabit which allows speech and whistles and gestures but the limitations of our physiology prevent us from generating or sensing or using infra-sound or ultra-sound. Bluetooth capability is not embedded in our bodies but we can, and do, manufacture adjuncts to our bodies which are Bluetooth enabled.

The specific comes first and then leads to the general. “Languages” is to “language” as the special theory of relativity is to the general theory. As Euclid’s geometry leads to general geometries. It is the invention of specific languages which leads to the general definition of the concept of language.

Language has been called the greatest human invention. But it is a discovery and not an invention. It is what makes us human, it has been said. But that is far too homocentric (anthropocentric) a view. Language exists not because humans exist, but because brains desirous of communicating exist. On Earth it happens to be humans. It is not necessary that the communicating brains be of humans, or of individuals of the same species, or even that the brains be contained in living entities.

With dissimilar brains (whether of individuals of different species or of humans and AIs) it is not language in general that is the problem. The challenge lies in finding a specific, shared set of signals that can be generated, transmitted and received, and a specific language (vocabulary and grammar) which can then be used. Limitations are set not by the concept of language but by

  1. the capability of the brains to generate meanings,
  2. the codification of meanings into signals, and
  3. the capability of generating, transmitting and receiving the signals

To invent and share a specific language with dogs or horses, the challenge is first in generating signals which can be received by the animals and second in receiving and decoding the signals they generate. Maybe if we used pseudo-tails with our dogs and pseudo-ears with our horses to send signals we might have a higher level of success. And when we meet our nearest aliens who “speak” to each other in bursts of X-rays we should not assume that they are backward because they don’t speak English.

Language: A shared system whereby two or more brains can communicate by the encoding of meanings into signals, which signals can then be transmitted and received and decoded back into their meanings.
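The definition can be illustrated with a toy sketch in Python. The codebook, the signal names and the function names are all invented for illustration; the only point is that meanings survive the journey exactly where the encoding is shared.

```python
# Toy model of the definition: meanings -> signals -> meanings.
# The codebooks and all names here are illustrative assumptions.

SHARED_CODEBOOK = {
    "tree": "sig-01",
    "water": "sig-02",
    "danger": "sig-03",
}

def encode(meanings, codebook):
    """Sender: turn each meaning into its agreed signal."""
    return [codebook[m] for m in meanings]

def decode(signals, codebook):
    """Receiver: invert the codebook; unknown signals stay undecoded (None)."""
    reverse = {signal: meaning for meaning, signal in codebook.items()}
    return [reverse.get(s) for s in signals]

message = ["danger", "water"]
signals = encode(message, SHARED_CODEBOOK)

# With the shared codebook the meanings survive the round trip.
round_trip = decode(signals, SHARED_CODEBOOK)

# A receiver holding a different codebook hears only gibberish.
OTHER_CODEBOOK = {"ki": "sig-91", "mizu": "sig-92"}
gibberish = decode(signals, OTHER_CODEBOOK)
```

Here `round_trip` recovers the original meanings, while `gibberish` contains nothing decodable. Both parties "have language" in the general sense; only the first pair has a shared one.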



Language transcends its encoded signals

July 19, 2018

My phone “talks” to my desktop computer. It can also “speak” with other devices with which it is “paired” (portable speakers, my lawn mower and my house security system). Coupled devices send and receive short-wavelength UHF radio waves in the ISM band (Bluetooth) to communicate. They follow rules (a vocabulary and a grammar) which specify the “meaning” of the bursts of radio waves they send and detect. I cannot detect any of these signals with my senses. I am neither aware of the communication taking place nor can I enter the conversation except through a compatible device within my control and with which I can communicate using a system which is within the range of my sensory capabilities (touch, vision, sound).

Does the system of signals being used by the Bluetooth devices for their communications constitute a language?

There is a vast discourse, starting from ancient times, on the definition and the purpose and the philosophy of language. The Encyclopedia Britannica puts it thus.

Many definitions of language have been proposed. Henry Sweet, an English phonetician and language scholar, stated: “Language is the expression of ideas by means of speech-sounds combined into words. Words are combined into sentences, this combination answering to that of ideas into thoughts.” The American linguists Bernard Bloch and George L. Trager formulated the following definition: “A language is a system of arbitrary vocal symbols by means of which a social group cooperates.” Any succinct definition of language makes a number of presuppositions and begs a number of questions. The first, for example, puts excessive weight on “thought,” and the second uses “arbitrary” in a specialized, though legitimate, way.

I find that much of the discussion is homocentric and tends to equate language with speech and writing. This I think is incorrect. I have therefore come to my own characterisation of what constitutes a language:

I find it is not necessary to specify that language is confined to human brains. It is claimed that the difference between human and animal communication is that human language is unrestricted.

EB again – “Human beings are unrestricted in what they can communicate; no area of experience is accepted as necessarily incommunicable, though it may be necessary to adapt one’s language in order to cope with new discoveries or new modes of thought. Animal communication systems are by contrast very tightly circumscribed in what may be communicated”. 

But this is unsatisfactory. Human thought is not in fact unlimited. It is limited by the very finite capability of the human brain. What a brain cannot perceive it cannot think about. What it cannot think about, it cannot communicate. Furthermore, the agreed-upon system restricts the meanings that can be transmitted and received. (A communication in French is of limited value to someone who knows little French. It is the lowest common level of shared encoding in the system which sets the constraint.)
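A rough way to picture the "lowest common level" constraint is as the overlap of two vocabularies. The sketch below is only an illustration; the function name and the tiny French word lists are made up for the purpose.

```python
# Illustrative sketch: only words present in BOTH vocabularies get through.
# The vocabularies below are invented examples, not data.

def communicable(sender_vocab, receiver_vocab, message):
    """Return the words of the message that both parties share."""
    shared = set(sender_vocab) & set(receiver_vocab)
    return [word for word in message if word in shared]

fluent_speaker = {"bonjour", "arbre", "liberté", "crépuscule"}
beginner = {"bonjour", "arbre"}

message = ["bonjour", "crépuscule", "arbre"]
understood = communicable(fluent_speaker, beginner, message)
```

However rich the sender's French, `understood` holds only what the beginner can also decode.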

I also find the debate on language and thought, and language and philosophy, to be very often circular. It may be simplistic but I observe that the logic we perceive to exist in the universe is the same logic we embed in all our languages (including mathematics). We cannot then use language to prove or disprove the logic that is within it.

As in Gödel’s Incompleteness theorems: “The first incompleteness theorem states that in any consistent formal system F within which a certain amount of arithmetic can be carried out, there are statements of the language of F which can neither be proved nor disproved in F. According to the second incompleteness theorem, such a formal system cannot prove that the system itself is consistent (assuming it is indeed consistent).”

Which I paraphrase to be that “in a language embedded with a logic, that language can neither prove nor disprove the logic that lies within it”.

I observe that we have more thoughts and emotions and perceptions than we have language for. We perceive more colours than any language we invent can describe. Which convinces me that thought precedes language. Moreover, it is the logic we perceive around us that we then build into the languages we invent. It cannot be, I think, that language circumscribes thought. It is our thoughts, generated by our perceptions of what is around us, that circumscribe the languages we invent.

Our senses come into play first in determining the meanings we wish to communicate. They then determine the shared system of encoding meanings into signals capable of being generated and detected. Our perception of a tree (vision/brain) is encoded into a particular sound (“tree”) which is generated (vocal cords) and detected and decoded by somebody else (aural/brain) and understood – according to the shared system of encoding – to mean a tree. The choice of encoding system is arbitrary but is primarily a matter of convenience. We use vision, sound and touch as a matter of convenience. We do not use olfactory signals because we cannot – at will – generate as great a range of smells as of sound. Besides, vision and sound can transmit signals across much greater distances than smells can. Sound can be transmitted in the dark. We do not have the capability in our bodies of generating or detecting radio waves or X-rays or infra-red radiation as encoded signals of meaning except through the use of specialised instruments manufactured for the purpose. But if we had the same organs as bats do, we could use ultrasound signals in our languages. Our senses enable a convenient encoding of meanings into signals. Equally the limitations of our senses restrict the range of signals that we can generate and/or detect.

So my Bluetooth devices do communicate with each other but the range of meanings they can transmit or receive is heavily circumscribed. They have not the freedom to express meanings which have not been predefined. They cannot initiate a conversation but can follow an instruction to do so. They do not have language.

But what is clear is that while language is a shared, agreed-upon system for encoding meanings into signals for the purpose of communication, language transcends its signals. While human language is mainly manifested as speech and writing, we also use sign-language and Braille and songs and music and art and dance within our languages. Photography and video are now part of the encoding we use in our languages. If we had organs for radio transmission and reception, we would no doubt have a word for “tree” but it would be expressed as a burst of radio-waves rather than a pressure wave or an image of a tree. Language is the system of conveying meanings where speech and writing and hand-signals are just specific forms of encoding. Language is a system which transcends the encoded signals it uses.


Justice is just a derived concept

April 20, 2018

Many of our fundamental concepts are not in fact fundamental. They are entirely dependent upon and derive from the negation of other concepts. We are all prisoners of our genes, our bodies, our beginnings and our planet. As a concept “freedom” is meaningless without first defining what captivity means. The concept of freedom is not self-sufficient and derives from some concept of captivity which must come first. Similarly, justice derives from a definition of injustice.  Fighting for justice is a misnomer since it always consists of fighting against some injustice. Equality by itself is almost meaningless. It first requires a definition of inequality. Even in the language of mathematics an equality relies on a prior definition of inequality. Bright opposes dark and each relies on and derives from the other.

Other concepts can live on their own and are not merely negations of some other concepts. Even though they lie on the same scale and may oppose each other they refer to some separate norm as a reference and can live independent lives. Happiness has its own scale (as does unhappiness). The concept of beauty does not require the definition of ugly. Liking and disliking and love and hate can all live on their own. Rich and poor lie on the same scale but each refers to a norm and so they are not dependent upon each other. Rich describes a surplus relative to some norm and poor is a deficiency. Wealth and poverty refer to a norm but not necessarily to the other.


“In triplicate” is being forgotten

April 14, 2018

More than half the world now does not know what a “carbon copy” means.

“Cut and paste” has been used for a very long time with manuscripts but really took off after the advent of the photo-copier.

Seven years ago I posted about the origins of “in triplicate”. At that time a Google search for “triplicate forms” generated over 3.5 million hits. This morning it generated fewer than 2 million.

Why “in triplicate”? – one for me, one for you and one for Rome

I have a vague recollection that I was once told that it was connected to the use of “carbon paper”  where the quality of the writing was insufficient after the second carbon (third copy). The word “triplicate” is said to have a 15th century origin in Middle English and comes from Latin (triplicatus). There is also a suggestion that pharmacists and their predecessors required 3 copies of everything but I am not clear as to why.

But my preferred story is that the Romans are responsible. It is not inconceivable that Roman administrators in their far-flung empire outposts first started doing things in triplicate.

 “One for me, one for you and one for Rome”.

Or it could just be the mystic, magical power of the number 3!!


The now is ever, never

December 18, 2017

The now, of course, can not, has not, does not and will not ever exist.

Then can refer to the past or the future but never to the now – which does not exist.


Adages updated: The pen is mightier after the sword

August 16, 2017

The wisdom of yesteryear is not necessarily wisdom today.


There is a cognitive limit (the Wordsmith number) to the number of words you can know?

August 5, 2017

Most people know around 20,000 – 35,000 words (in any language). Extremely gifted people – very rarely – may approach a vocabulary of 60,000 words. Even multi-lingual people seem to have a total vocabulary not exceeding the limits of mono-lingual people. Twenty years ago when I lived in Japan, my English conversations included many words which I no longer have in my active memory. Similarly Chinese, Hindi, Tamil and German words that I once used regularly as part of my social conversations in English, are no longer in my active memory.

But why does each of us know so few words of all the words that are available?

It cannot be memory capacity in the brain that sets the limit. My hypothesis is that just like there seems to be a cognitive limit to the number of significant social connections a person can maintain (the Dunbar number – averaging around 150 with a minimum of around 50 and a maximum of perhaps 250), there is a cognitive limit (the Wordsmith Number) to the size of the active vocabulary that a person can maintain. (I note that the number of Facebook friends or Twitter followers does not represent significant social relationships.)

The more you read the bigger your vocabulary. The more you write the more likely you are to have a larger vocabulary. The more diverse your social connections the larger vocabulary you need and have. But yet, each of us knows only a fraction of the active words available in any language. The active words in a language form only a fraction of the total words in that language. And the total words in a language are a tiny fraction of all the words that could be formed by an alphabet and a set of rules.

In any language, the rules of grammar together with about 2,000 base words would be sufficient to get by.  In any language a degree of proficiency would have been achieved with a vocabulary of around 10,000 base words. Over 20,000 words would be considered a high level of fluency.

The number of words needed to enable most communication needs is thus not so large. Equally, knowing words that are not used is pointless. Words that others don’t know are of no great use either. Yet, we have all at some time complained of “not having the words to express our feelings”. We are often “lost for words”. Our eyes can distinguish shades of colour for which there are no specific words. But we use adjectives and combine words to express emotions or shades of colour rather than invent specific words for just that shade or that emotion.

In any alphabet where the length of a word is not restricted, there is an infinite variety of ways of combining letters to form words. In practice most languages have working vocabularies of a few hundred thousand words, and even if all possible variations and forms, past and present, are counted, the vocabulary may be around one million words. The Oxford English Dictionary has around 177,000 words as being in current usage and another 50,000 as obsolete. Similarly German has around 150,000 words as being in current use and Swedish has around 125,000. However current usage is not the whole story. Current usage is only a part of the total number of words available in a language where the total number depends on the age of the language. It is said that Japanese has around 100,000 active words in a total vocabulary of around 500,000. The OED estimates the total number of words in English to be around 750,000. Other estimates put the total English vocabulary at just over one million words.
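To put numbers on how small actual vocabularies are next to what an alphabet allows, here is a rough sketch in Python. It assumes a 26-letter alphabet and ignores every rule of spelling or pronunciation, so it gives an upper bound on letter strings rather than a count of plausible words; the function name is my own.

```python
# Count every string of 1..max_length letters over an alphabet.
# Assumption: no spelling or pronunciation rules, so this is an upper bound.

def count_possible_words(alphabet_size, max_length):
    """Sum of alphabet_size**n for n = 1 .. max_length."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))

up_to_five_letters = count_possible_words(26, 5)
print(up_to_five_letters)  # 12356630
```

Even when limited to five letters, the alphabet permits over twelve million strings, which is already more than ten times the largest estimate of the total English vocabulary given above.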


According to the Global Language Monitor’s (GLM) “English Language WordClock,” there are 1,005,366 words in the English language. …… The Google/Harvard Study of the Current Number of Words in the English Language also arrived at a similar number — 1,022,000 (a difference of .0121%) ……… The Oxford English Dictionaries (OED) comes up with an estimate of 750,000, when counting only distinct senses and excluding variants.

The number of words that any person knows in a language is also not so easily determined. I would generalise to say that all modern languages have each around 100 – 200 thousand active words with a total vocabulary depending upon the age of the language and ranging from 300,000 to about 1 million. But, in most extant languages today, any single individual generally has a personal vocabulary which is only around 10 – 20% of the active words (or 2 – 5% of the total number of words) available in that language. An exceptionally gifted person might come up to around 30% of active words (or less than 10% of the total number of words). Depending on how words are defined, Shakespeare is thought to have had command over about 8% of all the English words of that time but only used about 4% in all his writings. In modern times James Joyce is thought to have had an extraordinarily large personal vocabulary and perhaps it was even a little more than 10% of the total number of English words. Ulysses alone – by one count – contains a larger vocabulary than all of Shakespeare’s works.
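As a back-of-envelope check, the percentages can be turned into word counts. The sketch below uses only the round figures from the paragraph above (they are estimates, not measurements), and the helper name is my own.

```python
# Back-of-envelope check of the ranges quoted above.
# All inputs are the essay's round figures, not measured data.

def share(word_count, percent):
    """Whole-number percentage of a word count."""
    return word_count * percent // 100

ACTIVE_WORDS = (100_000, 200_000)     # active words in a modern language
TOTAL_WORDS = (300_000, 1_000_000)    # total vocabulary, by age of language

# 10 - 20% of the active words
personal_from_active = (share(ACTIVE_WORDS[0], 10), share(ACTIVE_WORDS[1], 20))

# 2 - 5% of the total words
personal_from_total = (share(TOTAL_WORDS[0], 2), share(TOTAL_WORDS[1], 5))

print(personal_from_active)  # (10000, 40000)
print(personal_from_total)   # (6000, 50000)
```

Both ranges comfortably bracket the 20,000 – 35,000 words that most people are said to know.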

According to lexicographer and Shakespeare scholar David Crystal, the entire English vocabulary in the Elizabethan period consisted of about 150,000 words. ……… Crystal believes that Shakespeare had a vocabulary of about 20,000 words (13.5% of the known lexicon). Compare that to the size of the vocabulary of the average modern person (high school-level education) that is 30,000 to 40,000 words (about 6% of the 600,000 words defined in the Oxford English Dictionary). Other lexicographers estimate that Shakespeare’s vocabulary ranged from 18,000 to 25,000 words.

….. In their 1976 study, “Estimating the Number of Unseen Species: How Many Words Did Shakespeare Know,” statisticians Bradley Efron (Stanford University) and Ronald Thisted (University of Chicago) used word-frequency analyses to predict more accurately Shakespeare’s actual vocabulary, including the words he used in his writing (active or manifest vocabulary) and the words he knew but didn’t use in his writing (passive or latent vocabulary). Efron and Thisted turned to the Harvard Concordance and the 31,534 different words from a grand total of 884,647 words, including repetitions. …….. Thus to calculate Shakespeare’s total working vocabulary, we add 31,534 different words found in his writings to the 35,000 words he probably knew, to arrive at an estimate of 66,534 words.

Taking only current words in English as an example (< 200,000), most individuals considered fluent would have between 25,000 – 40,000 words in their personal vocabularies. (There may be the extremely rare person with a personal vocabulary approaching 60,000 words, though that is doubtful. But there is surely nobody with a personal vocabulary greater than that.) Even for those who are multilingual, the sum of the words they command across all their languages seems to be no different from that of those who are monolingual.


the rate and pace of development of the bilinguals’ lexical knowledge were similar to those of monolingual children. In addition, the total vocabulary count of these children (taking into account both languages) was not different to that of the monolinguals, but their single language vocabularies were somewhat smaller. So we have known for some time that bilingual children do have as many words as their monolingual counterparts when both languages are taken into account but maybe not so when one examines only one language.

Why this apparent limit to the number of words one can know?


My hypothesis is that there is a stable level – the Wordsmith Number – which the brain establishes. It is a cognitive limit to the size of the active vocabulary that a person can maintain. It is established by the manner in which the brain learns, stores and retrieves active and passive words. It is a dynamic level and varies as our activities change (reading, writing, speaking, diversity of social relationships ..). Words that are not active are shunted out of active memory. Only in very rare circumstances is a Wordsmith Number greater than about 30,000 established.


Known, unknown and unknowable

July 22, 2017

Donald Rumsfeld was often the butt of cheap jokes after this quote. In reality, Rumsfeld was absolutely spot on and close to philosophic.

Starting from where Rumsfeld left off we come to the distinction between the knowable and the unknowable

These are things we don’t know that we don’t know. There are knowable unknowns. That is to say, there are things that we could know but we don’t know which we don’t know. But there are also unknowable unknowns. There are things we cannot know that we don’t know that we can never know. 

a la Rumsfeld

I am coming to the conclusion that the sum of all human cognition lacks some of the dimensions of the universe. It may be increasing with time, but human cognition is limited. The expanding universe may be infinite or it may be boundless. For human cognition to grasp the universe is then like trying to measure an infinite length with a ruler of finite length, or of trying to measure some unknown parameter with a ruler marked in inches. Those measurements will never reach a conclusion.


Logic is discovered, language is invented

July 9, 2017

Logic is inherent in the universe. It is not a creation of man and is not dependent on observation or what kind of brain perceives the universe.

The laws of logic are taken to be unchanging over space and time. Logic now, is as logic was, and as logic will always be. Logic here, is as logic is there and everywhere.

Language, however, is invented. All languages (including mathematics or chemical notation or Boolean algebra or …..) must have a structure which is compliant with the logic of the universe it is used to describe. We perceive a logic in the universe and express it through the inbuilt logic of our language(s). We use the one to describe the other and they are both the same.

How not?



June 29, 2017

It is one of the worst feelings one can experience. To have reality intrude rudely on illusions one has cherished.

And the worst of the worst is when it is another person who is the disillusionment. When somebody turns out to be not quite what they seem to be.


