
The WordCorr Project



Overall

Objectives
Significance
Relation to Long-Term Goals
Relation to Present Knowledge

Objectives: Comparing languages systematically is the most accurate way to firm up our understanding of language diversity and what it tells us about relationships among peoples. Yet in this century some endangered languages and cultures will disappear forever. Making accurate comparisons on a broad scale before the data pass out of reach calls for bringing linguistics and information technology together, so that individual scholars can accomplish more than a whole lifetime of work with traditional data management practices now allows.

The WordCorr project explores one way to build that partnership with technology while also enhancing collaboration among linguists. The "advances in language-information technology, such as documentation and comparison of linguistic diversity" that the National Science Foundation's Information Technology Research program calls for will affect how linguists organize and conduct comparative research in the future, and will at the same time broaden the possibilities for education in linguistic science.

During the first year, the focus will be on building and testing a standalone version of WordCorr for linguists working in the field. In the second year, the team-oriented Internet version will be developed and tested.


Significance: The Principal Investigator has had contact with enough comparative endeavors to respect the enormous amount of work done by dedicated practitioners, and also to know something about the gaps that solid research still needs to fill and the guesswork it still needs to replace. He explored quantification of comparative results in Grimes and Agard 1959. He was Consulting Editor for the Ethnologue from 1974 to 2000 (Grimes 1995b), compiler of the Ethnologue Language Family Index for 1993, 1996, and 2000, one of the Language Identification Editors for the 1992 and 2002 editions of the Oxford International Encyclopedia of Linguistics, a contributor to the Comparative Austronesian Dictionary (1995c), and a member of the Linguistic Society of America's Committee on Endangered Languages and their Preservation from 1997 to 2001.

As he observed the snail's pace at which comparative research often proceeds, he had the idea of separating the analytical judgments that linguists usually make quite rapidly from the meticulous bookkeeping they must do to track all the crisscrossing implications of those judgments. From that distinction he designed an experimental set of relational data structures for an information technology application with enough capacity that teams of scholars anywhere in the world can use the Internet and standard data management techniques to tackle language families of any size. Computing time grows linearly with the number of speech varieties being compared. Having such a research infrastructure will do several things:

Replace conjectures about language relationships with demonstrations of relationship backed by evidence.
Increase the rate at which teams of linguists can trace and document language relationships, including those of endangered languages.
Allow conflicting hypotheses about how language families may have developed to be tested simultaneously without confusion.
Make research results available to scholars and to the public at large as soon as investigators reach closure on their analysis, if they choose to circulate them at that time.
Enrich the dialogue within linguistic science by allowing collaborating groups of comparative linguists to share information and discuss it collegially via the Internet.
Help teachers of graduate and undergraduate linguistics teach the principles and practices of comparative linguistics.
Attract smart high school and college students into linguistics by helping them discover online how interesting language is.
Allow informed citizens to discover for themselves the intricacy and design of languages they might have been taught to regard as "inferior."
Contribute materially to the shared data archives currently being developed by the worldwide linguistics community.


Relation to long-term goals: The Principal Investigator's interest in language comparison began in 1954, when he used field data to tabulate the most regular correspondences between Huichol and Cora, neighboring Uto-Aztecan languages of Mexico. Through personal contact with comparative linguists such as Morris Swadesh, Robert E. Longacre, Charles F. Hockett, Frederick B. Agard, and later the Austronesian Circle at the University of Hawai'i, he watched the field develop through the latter half of the twentieth century. He also branched out into investigating inherent intelligibility among speech varieties (1974), though his major scholarly interests focused on discourse and the lexicon.

Beginning in 1960, as one of the first linguists to use computers in field work, he became aware of their potential for managing the complexities of comparative linguistic data. He initiated a project to gather word lists that linguists had collected but never gotten around to processing; it had the greatest success in Africa but also obtained lists from Asia and the Americas. This eventually became the core of the Cornell-SIL-Hawai'i collection.

But collecting data is only a step toward science. Some linguists contributed word lists because they realized they had little hope of exploiting their own data in their lifetime: tabulating everything took too long for them to begin formulating generalizations. The Principal Investigator could not help them with tabulation at that stage either. Later, however, he was able to define data structures that automate the frustrating parts of the process, letting linguists focus on the comparisons rather than on finding mislaid file slips or recreating forgotten hypotheses.

The real long-term goal is to demonstrate what the linguistic relationships are within all the world's language families. Whether all languages can be integrated into a single family, as some think, or whether the evidence fades out well short of that, as others believe, depends on many scholars doing a great deal of very detailed yet creative work, preferably in much less time than the three centuries since the Dutch started pointing out regular differences among Malay varieties in the East Indies. Once WordCorr becomes available as an international vehicle for team-based research, the unthinkable just might become doable.


Relation to present knowledge: Comparative and historical linguists look at much more than comparative phonology; they also examine evidence for morphological, syntactic, and semantic change. Nevertheless, comparative phonology is where most scholars begin, and some spend most of their time on it. That is because analytical techniques are most precise there, training begins there, its results are the easiest to explain to nonspecialists, and arguments based on detailed handling of masses of phonological data are easier to assess than arguments from the other areas. This project is concerned directly with data handling for comparative phonology.

In that context, the computer application supports the best practices of comparative linguistics: extensive, detailed tabulation of sets of correspondences among phonological segments, and of the relationship of each set to nearby sets. Without a fairly sophisticated computational tool to manage that kind of complexity, it is easy to lose sight of part of the data. This project helps practicing linguists concentrate on the patterns that explain the data.
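To make the idea of a correspondence set concrete, here is a minimal Python sketch that tabulates such sets from toy data. It assumes that each word has already been segmented and aligned position by position across varieties, which in practice is itself one of the analytical judgments a linguist makes. The variety labels A, B, and C, the glosses and forms, and the function name `tabulate_correspondences` are hypothetical illustrations, not WordCorr's actual data model.

```python
from collections import defaultdict

# Hypothetical toy data (not real language data): each gloss maps to the
# forms of three varieties, A, B, and C, already segmented and aligned
# position by position. "-" marks a position where a variety has no segment.
ALIGNED = {
    "water": {"A": ["p", "a", "t"], "B": ["f", "a", "t"], "C": ["p", "o", "t"]},
    "fire": {"A": ["p", "i", "k"], "B": ["f", "i", "k"], "C": ["p", "i", "-"]},
    "stone": {"A": ["t", "a", "p"], "B": ["t", "a", "f"], "C": ["t", "o", "p"]},
}


def tabulate_correspondences(aligned, varieties):
    """Group aligned segments into correspondence sets.

    Returns a dict mapping each set (a tuple holding one segment per
    variety, in the order of `varieties`) to the list of (gloss, position)
    citations that attest it, in the order they were encountered.
    """
    sets = defaultdict(list)
    for gloss, forms in aligned.items():
        # Every variety's form must have the same aligned length.
        lengths = {len(forms[v]) for v in varieties}
        if len(lengths) != 1:
            raise ValueError(f"forms for {gloss!r} are not aligned to one length")
        for pos in range(lengths.pop()):
            key = tuple(forms[v][pos] for v in varieties)
            sets[key].append((gloss, pos))
    return dict(sets)


if __name__ == "__main__":
    table = tabulate_correspondences(ALIGNED, ["A", "B", "C"])
    # Print the best-attested sets first.
    for key, citations in sorted(table.items(), key=lambda kv: -len(kv[1])):
        print(" : ".join(key), len(citations), citations)
```

In this toy data the set p : f : p is attested three times (initially in "water" and "fire", finally in "stone"), while k : k : - is attested only once. Keeping every citation attached to its set is the kind of bookkeeping the text describes: a linguist can then compare a sparse set with nearby sets to judge whether it is a separate correspondence or a conditioned variant of another.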


For problems or questions regarding this website, contact khamasak@users.sourceforge.net.