Abstract
The YPA project is building a system to make the information in classified directories more accessible. BT's Yellow Pages®1 provides an example of a classified database for which this work would be useful.
There are two reasons for doing this: (i) directories like Yellow Pages contain much useful but hard-to-access information, especially in the free text of semi-display advertisements; (ii) more generally, the project is a demonstrator for the exploitation of semi-structured data, that is, data that is less systematic than database entries or logical clauses, but more systematic than free text because it has been marked up, for display or some other purpose.
Accessing the directory's source data file requires both natural language processing (to soften the interface to the system and, separately, to analyse natural-language-like constructs in the data) and information retrieval techniques, which are assisted by shallow knowledge. Deep world knowledge is impractical.
The project seeks to get maximum effect from conveniently simplified approximations of standard natural language processing and knowledge representation. The paper gives an overview of the system, and illustrates its style with points about how the source data file is analysed. The YPA requires further development, but already demonstrates the effectiveness of shallow processing applied to semi-structured data.