Jewish World Review
http://www.jewishworldreview.com | (KRT) A new computer program can usually determine the sex of an author by detecting subtle differences in the words men and women prefer to use.
For instance, female writers tend to choose grammatical terms that apply to personal relationships, such as "for" and "with," more frequently than men do.
"Women have a more interactive style," said Shlomo Argamon, a computer scientist at the Illinois Institute of Technology in Chicago who developed the program. "They want to create a relationship between the writer and the reader."
Men, on the other hand, use more numbers, adjectives and determiners (words such as "the," "this" and "that"), apparently because they care more than women do about conveying specific information.
Argamon said the intent of male writers often was to say: "Here's something I want to tell you about, and here are some things about it."
Women, he found, write the pronoun "she" more often than men do, although both sexes use "he" about equally.
Argamon said it wasn't clear what psychological or sociological differences between men and women might explain their different writing styles. "It's a subject for further research," he said.
Other experts, such as Deborah Tannen, a linguistics professor at Georgetown University in Washington, have popularized the idea that men and women have different communications styles. But Argamon's work is the first to show such distinctions in writing.
"This is surprising, since, unlike conversation, writing a book or an article does not involve direct social interaction," he said.
Argamon claimed his program correctly determined the sex of the author in 80 percent of the works it checked. One it missed was A.S. Byatt's best-selling novel, "Possession." The computer said it was written by a man; Byatt is a woman. Conversely, Michael Frayn's science fiction tale, "A Landing on the Sun," was misidentified as the work of a woman.
Argamon's gender program is part of a much broader technique called "stylometry," which analyzes styles not only of writing, but also of music, graphics, art and architecture.
A practical application of stylometry, he said, would be to identify writers of anonymous communications, such as the Unabomber, on the basis of their writings. The Unabomber, whose 17-year terrorism spree ended in 1995, was identified as Theodore Kaczynski only after his brother, David, compared the 35,000-word manifesto with Kaczynski's known writings.
Similarly, Donald Foster, a professor of literature at Vassar College in Poughkeepsie, N.Y., unmasked political columnist Joe Klein as the anonymous author of the popular Clinton-era novel "Primary Colors." Without using a computer, Foster laboriously compared the style of the book with Klein's other writings.
Boulder, Colo., prosecutors hired Foster in 1998 in an effort to identify the writer of the ransom note in the unsolved JonBenet Ramsey murder case. He reportedly determined that a woman wrote the note, but authorities refused to confirm that.
For years, scholars have debated whether William Shakespeare wrote a 17th-century play called "The Two Noble Kinsmen."
"These computer techniques may eventually be able to provide us with answers to these kinds of questions," Argamon said.
To carry out his project, Argamon and colleagues analyzed the texts of 566 British books and articles, both fiction and nonfiction, taken from a huge computer database known as the British National Corpus.
From that mass of almost 20 million words, a computer program called WINNOW extracted 1,081 distinctive "features," such as prepositions, pronouns and adjective phrases. It checked the use of different verb forms such as "go" and "going." It even counted punctuation marks such as dashes and exclamation marks.
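For readers curious what counting such "features" looks like in practice, here is a minimal sketch, not the researchers' actual code. It assumes a tiny, hypothetical feature list (a dozen function words plus dashes and exclamation marks) and measures each one as a share of the words in a text; the study's 1,081 features also covered grammatical patterns that would require a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical, illustrative feature list; NOT the study's 1,081 features.
FUNCTION_WORDS = ["the", "a", "as", "that", "one", "she", "he",
                  "for", "with", "in", "and", "not"]
PUNCTUATION = ["-", "!"]


def extract_features(text: str) -> dict[str, float]:
    """Return each feature's count divided by the number of words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)  # avoid dividing by zero on empty input
    counts = Counter(words)
    features = {f"word:{w}": counts[w] / total for w in FUNCTION_WORDS}
    for mark in PUNCTUATION:
        # Punctuation is counted in the raw text, then scaled by word count.
        features[f"punct:{mark}"] = text.count(mark) / total
    return features


print(extract_features("She went with him, and the dog!"))
```

Using relative rather than raw counts keeps a 300-page novel and a short article on the same scale.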
After running repeatedly through these features, the computer winnowed the list down to 128 significant contrasts. The results showed that the words favored most heavily by men were what grammarians call determinative words such as "the," "a," "as," "that" and "one." Female writers favored "she" and relationship words such as "for," "with," "in," "and" and "not."
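The article doesn't say exactly how WINNOW pruned the list. The sketch below shows a textbook "balanced Winnow" learner, a multiplicative-update method after which the program appears to be named. It is offered as an illustration under stated assumptions and not as the researchers' exact variant. Each feature keeps two weights, one per class (+1 and -1 here, standing in for the two author groups). Whenever the program guesses wrong, it multiplies the weights up or down. Ranking features by the gap between their two weights is one assumed way to shortlist the most telling contrasts; `top_features` and its `k` parameter are hypothetical names.

```python
def score(w_pos, w_neg, x):
    """Weighted vote: positive favors class +1, negative favors class -1."""
    return sum((p - n) * xi for p, n, xi in zip(w_pos, w_neg, x))


def predict(w_pos, w_neg, x):
    return 1 if score(w_pos, w_neg, x) > 0 else -1


def train_balanced_winnow(examples, n_features, alpha=2.0, epochs=10):
    """examples: list of (x, y), x = non-negative feature values, y in {+1, -1}."""
    w_pos = [1.0] * n_features
    w_neg = [1.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if predict(w_pos, w_neg, x) != y:
                mistakes += 1
                for i, xi in enumerate(x):
                    # Multiplicative update: boost weights pointing to the true
                    # class, shrink the opposing ones, in proportion to xi.
                    w_pos[i] *= alpha ** (y * xi)
                    w_neg[i] *= alpha ** (-y * xi)
        if mistakes == 0:  # every training text classified correctly
            break
    return w_pos, w_neg


def top_features(w_pos, w_neg, k):
    """Hypothetical shortlist: indices of the k features with the largest weight gap."""
    return sorted(range(len(w_pos)),
                  key=lambda i: abs(w_pos[i] - w_neg[i]), reverse=True)[:k]


# Toy data: feature 0 signals class +1, feature 1 signals class -1, feature 2 is noise.
examples = [([1.0, 0.0, 0.5], 1), ([0.0, 1.0, 0.5], -1),
            ([1.0, 0.0, 0.0], 1), ([0.0, 1.0, 0.0], -1)]
w_pos, w_neg = train_balanced_winnow(examples, 3)
print(top_features(w_pos, w_neg, 2))
```

On this toy data the noise feature's two weights end up equal, so only the two informative features make the shortlist, loosely mirroring how 1,081 candidates were narrowed to 128.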
When Argamon then tested his program on other texts, it succeeded 80 percent of the time in identifying the sex of an anonymous writer.
Argamon and fellow researchers Moshe Koppel and Anat Shimoni published a report on their work in the April edition of the journal Literary and Linguistic Computing.
"This paper has presented convincing evidence of a difference in male and female writing styles in modern English books and articles," Argamon concluded. "Such a difference is sufficiently pronounced that it can be exploited for automated text classification with accuracy of approximately 80 percent (and higher in some cases)."
To see the list of 566 books and 1,081 "features" used in this study, go to: