Professor Martin Weisser Lectured on Advanced Corpus Linguistics through Simple XML (Tools)

time: 2014-06-13 16:31:49
 
     
    Prof. Weisser first introduced XML (Extensible Markup Language), which can be used to mark up linguistic information and shares similarities with SQL databases, SGML and HTML, though SGML and HTML serve different purposes. An XML element consists of a "head tag" (start tag) and a "tail tag" (end tag), each enclosed in a pair of angle brackets. The head tag can include attribute names, such as ID, together with their attribute values. One advantage of XML is that it is customizable, so tag sets can be designed to meet a variety of linguistic annotation needs. Current tools for corpus annotation and transformation are quite hard to learn and error-prone. Therefore, Prof. Weisser proposed a "simple XML" approach, which has the following benefits: it (1) uses fewer nested elements; (2) relegates more information to attributes; (3) avoids excessive meta-data in headers; (4) keeps enclosing tags and text separate; (5) improves readability; and (6) improves editability. Prof. Weisser then briefly introduced seven corpus software tools, freely available on his personal website. They cover "XML tagging and concordancing", "speech act annotation and research", "text feature extraction", and "phonetic transcription".
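
    As a rough illustration of the tagging described above, the sketch below builds a small fragment in the spirit of the "simple XML" approach and reads it with Python's standard xml.etree.ElementTree module. The element and attribute names (dialogue, turn, n, speaker, speech_act) and the sample sentences are invented for this example and are not taken from the lecture or from Prof. Weisser's tools. Nesting is kept shallow, the annotation sits in attributes, and each tag is on its own line, separate from the text it encloses.

```python
import xml.etree.ElementTree as ET

# Hypothetical "simple XML" fragment. All element names, attribute names
# and sentences are illustrative assumptions, not the lecture's own format.
SAMPLE = """<dialogue id="d01">
<turn n="1" speaker="A" speech_act="greet">
hello there
</turn>
<turn n="2" speaker="B" speech_act="request">
could you open the window
</turn>
</dialogue>"""


def read_turns(xml_text):
    """Return the document id and a list of turns with their attributes."""
    root = ET.fromstring(xml_text)
    turns = []
    for turn in root.iter("turn"):
        turns.append({
            "n": int(turn.get("n")),
            "speaker": turn.get("speaker"),
            "speech_act": turn.get("speech_act"),
            # Tags and text are on separate lines, so trim the line breaks.
            "text": (turn.text or "").strip(),
        })
    return root.get("id"), turns


doc_id, turns = read_turns(SAMPLE)
for t in turns:
    print(t["n"], t["speaker"], t["speech_act"], t["text"])
```

    Because the annotation is stored in attributes rather than in further nested elements, each unit of text remains a single line that can be read and corrected in an ordinary text editor.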