Automatic Control of Simple Language in Web Pages

Published: Jun 27, 2007

At the 10th ICCHP conference, Constantin Jenge, Sven Hartrumpf, Hermann Helbig and Rainer Osswald of the Intelligent Information and Communication Systems (IICS) group, Department of Computer Science, Fern University, Hagen, Germany, presented their paper on the automatic control of simple language in Web pages.


 

The paper highlighted the need for an assistive technology that helps simplify the language of Web documents so that they are accessible to people with cognitive or reading disabilities. The team presented the architecture of a tool that automatically tests Web pages against the Web Content Accessibility Guidelines (WCAG), with the aim of removing accessibility barriers for people with a range of reading disabilities.

The team has developed a catalog of criteria for linguistic accessibility. The criteria are psycho-linguistically motivated and can be tested with current natural language processing technology. They are organized into five levels of linguistic description: Morphology, Lexicology, Syntax, Semantics and Discourse.

Morphological compounds can be long and complex, which makes them hard to read. Abbreviations and acronyms make a text unclear. They should be explained the first time they appear and also in a glossary. In HTML documents, they should be marked up with the abbr and acronym elements.
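The markup recommendation for abbreviations lends itself to a simple automated check. The following is a minimal sketch, not the team's implementation: it assumes that an abbreviation is any run of two or more capital letters, and it only reports abbreviations that are never wrapped in an abbr or acronym element anywhere in the page. The names AbbrChecker and unmarked_abbreviations are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: an "abbreviation" is a run of two or more capital letters.
ABBR_RE = re.compile(r"\b[A-Z]{2,}\b")


class AbbrChecker(HTMLParser):
    """Collects abbreviations seen inside and outside <abbr>/<acronym>."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # nesting depth inside <abbr> or <acronym>
        self.marked = set()     # abbreviations seen inside such an element
        self.unmarked = set()   # abbreviations seen in ordinary text

    def handle_starttag(self, tag, attrs):
        if tag in ("abbr", "acronym"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("abbr", "acronym") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        found = ABBR_RE.findall(data)
        if self.depth > 0:
            self.marked.update(found)
        else:
            self.unmarked.update(found)


def unmarked_abbreviations(html):
    """Return abbreviations that are never marked up anywhere in the page."""
    checker = AbbrChecker()
    checker.feed(html)
    checker.close()
    return sorted(checker.unmarked - checker.marked)


if __name__ == "__main__":
    page = (
        '<p><abbr title="Web Content Accessibility Guidelines">WCAG</abbr> '
        "applies to HTML pages. WCAG is explained above.</p>"
    )
    print(unmarked_abbreviations(page))
```

A stricter check would also verify that the first occurrence of each abbreviation is the marked one and that a title attribute with the expansion is present.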

Lexical ambiguity should be avoided as far as possible; where it cannot be avoided, the intended meaning should be easy to identify. Expressions from an unusual register or a specific domain should be kept to a minimum. If they are used, definitions or explanations need to be provided, depending on the intended audience. Second-language users tend to have limited knowledge of idioms, fixed phrases and support verb constructions, so these should be used sparingly. At the syntactic level, too many groups of nouns, adjectives or adverbs increase complexity. They should be kept to a minimum to reduce the density of information.

A text unit may contain many propositions or conceptual entities, so its semantic complexity has to be constrained. Reducing reference ambiguity and reference distance makes a text clearer. Discourse coherence is closely related to readability. Sentence connectors, theme structure and the proper use of pronouns go a long way toward improving readability. De-Lite, an authoring and evaluation tool that checks and highlights readability problems on webpages, detects pronouns and noun phrases that do not have antecedents.

The tool relies on numerical indicators that are defined over linguistic units. These indicators record counts such as the number of word tokens, word types, lemmas, propositions and concepts in a sentence. They were chosen so that they depend as little as possible on the particular form of syntactic and semantic representation. Currently, 50 indicators have been devised.
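To make the idea of per-sentence numerical indicators concrete, here is a minimal sketch that computes three surface-level counts: word tokens, word types and the length of the longest word (a rough proxy for long compounds). It is an illustration under simplifying assumptions and not the tool's actual indicator set. Lemmas, propositions and concepts require real linguistic analysis and are left out, and the sentence splitter and the names SentenceIndicators, split_sentences and indicators are hypothetical.

```python
import re
from dataclasses import dataclass

# Words are runs of letters, optionally joined by hyphens (Unicode-aware).
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


@dataclass
class SentenceIndicators:
    tokens: int        # number of word tokens in the sentence
    types: int         # number of distinct word forms (case-insensitive)
    longest_word: int  # length in characters of the longest word


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def indicators(sentence):
    """Compute surface-level indicators for a single sentence."""
    words = [w.lower() for w in WORD_RE.findall(sentence)]
    return SentenceIndicators(
        tokens=len(words),
        types=len(set(words)),
        longest_word=max((len(w) for w in words), default=0),
    )


if __name__ == "__main__":
    text = "The cat saw the other cat. It ran away!"
    for sentence in split_sentences(text):
        print(sentence, indicators(sentence))
```

Each count could then be compared with an adjustable threshold to flag sentences that are likely to be hard to read.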

The team from Fern University plans to carry out a thorough evaluation of the tool and to extend it from German, the language it currently supports, to English. A facility for adjusting indicator parameters automatically will also be designed. In addition, the system is being developed so that it adapts itself automatically to specific domains.


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
