Arboreal MWN

Documentation

Language Analysis

Arboreal MWN v0.8 (02/08/2013)

In this documentation, unless otherwise noted, "Arboreal" refers to "Arboreal MWN."

Document Info

To analyze an XML file with Arboreal, open the file in the Arboreal editor. Then click the "Document Info" button in the Arboreal toolbar.

[Figure: Show Document Info]

A wizard opens that lets you define document-specific information. On the first page of the wizard, you can specify the following:

  • Title:
    You can enter a title.
  • Semantic units:
    Define how the semantic units of your XML document are tagged (e.g. "s" if your semantic units are sentences). This can be a comma-separated list. Check "Respect namespaces" and add namespace prefixes if you want Arboreal to use only tags of a specific namespace (e.g. "arch:s").
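To make the "Semantic units" setting concrete, here is a minimal Python sketch of how such a setting could be matched against a document. It is not Arboreal's implementation: the helper `find_semantic_units`, the sample markup and the namespace URI `http://example.org/arch` are hypothetical. The sketch assumes that without "Respect namespaces" only local tag names are compared, and that with it a prefix such as "arch" is resolved to its namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the "arch" namespace URI is a placeholder.
SAMPLE = """<doc xmlns:arch="http://example.org/arch">
  <p>
    <arch:s>First sentence.</arch:s>
    <s>Unprefixed sentence.</s>
    <arch:s>Second sentence.</arch:s>
  </p>
</doc>"""


def local_name(tag):
    """Strip ElementTree's '{namespace-uri}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_semantic_units(root, unit_spec, namespaces=None, respect_namespaces=False):
    """Return elements, in document order, whose tag matches unit_spec.

    unit_spec is a comma-separated list such as "s" or "s, seg".
    With respect_namespaces=True, prefixed entries such as "arch:s" are
    resolved through the prefix -> URI map in namespaces (an assumption).
    """
    namespaces = namespaces or {}
    wanted = set()
    for entry in unit_spec.split(","):
        entry = entry.strip()
        if respect_namespaces and ":" in entry:
            prefix, name = entry.split(":", 1)
            wanted.add("{%s}%s" % (namespaces[prefix], name))
        else:
            # Without namespace handling, only the local name counts.
            wanted.add(entry.split(":")[-1])
    return [
        elem
        for elem in root.iter()
        if (elem.tag if respect_namespaces else local_name(elem.tag)) in wanted
    ]
```

With `root = ET.fromstring(SAMPLE)`, the spec "s" matches all three sentences, while "arch:s" with `respect_namespaces=True` and `{"arch": "http://example.org/arch"}` matches only the two prefixed ones.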

Click "Next" to go to the next page. Here you can specify how Arboreal should determine the language of a word. Based on your document, Arboreal will try to guess the values of these fields.

  • Language definition by attribute:
    Here you can specify which attributes define the language of a tag's contents (e.g. "lang"). This can be a comma-separated list. Check "Respect namespaces" and add namespace prefixes if you want Arboreal to use only attributes of a specific namespace (e.g. "xml:lang").
  • Language definition by tag:
    This field specifies which tag defines the overall language of the document (e.g. "lang"). This can be a comma-separated list. Check "Respect namespaces" and add namespace prefixes if you want Arboreal to use only tags of a specific namespace (e.g. "xml:lang").
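The two settings on this page work together when deciding which language a piece of text is in. One plausible reading, sketched below in Python, is that the nearest enclosing language attribute wins and the document-wide language tag is the fallback. This resolution order, the function `resolve_language` and the sample markup are assumptions for illustration, not Arboreal's documented behavior.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: <lang> holds the document-wide language.
SAMPLE = """<doc>
  <lang>en</lang>
  <p xml:lang="it">Nel mezzo <foreign lang="la">in medias res</foreign></p>
  <p>Plain text</p>
</doc>"""


def resolve_language(root, elem, lang_attrs=("lang", XML_LANG), lang_tags=("lang",)):
    """Return the language code that applies to elem, or None.

    Assumed order (not confirmed for Arboreal):
    1. Walk from elem up to the root; the first matching attribute wins.
    2. Otherwise use the text of the first matching language tag.
    """
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = elem
    while node is not None:
        for attr in lang_attrs:
            value = node.get(attr)
            if value:
                return value
        node = parents.get(node)
    # Fall back to the document-wide language tag.
    for tag in lang_tags:
        found = root.find(".//" + tag)
        if found is not None and found.text:
            return found.text.strip()
    return None
```

Under these assumptions, the `<foreign>` element resolves to "la", the first `<p>` to "it", and the second `<p>` falls back to the document language "en".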

Click "Next" to get to the last page. Here you can specify what language the document is in. Arboreal will try to guess the language based on the values you've entered on the previous page. Click "Finish" to store the data you've entered.

Analyze terms

You can send your document to the MPIWG language analysis service. If the analysis succeeds, Arboreal can show you the lemmata of your words, and you can create term lists based on these lemmata. Depending on the size of your document, however, the analysis might take a while.
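A term list based on lemmata groups the different word forms that share a lemma. The Python sketch below illustrates only that data structure. The result format of the MPIWG service is not described here, so the `lemma_of` argument stands in for it as a hypothetical word-form-to-lemma dictionary, and `build_term_list` is an illustrative name.

```python
from collections import defaultdict


def build_term_list(words, lemma_of):
    """Group word forms under their lemma, keeping first-seen order.

    words: tokens in document order.
    lemma_of: hypothetical analysis result mapping lower-cased word forms
              to lemmata; forms without an entry are grouped under None.
    """
    terms = defaultdict(list)
    for word in words:
        lemma = lemma_of.get(word.lower())
        # Record each distinct word form only once per lemma.
        if word not in terms[lemma]:
            terms[lemma].append(word)
    return dict(terms)
```

For example, with `{"ran": "run", "runs": "run", "run": "run"}` the tokens "The dog ran runs run ran" yield one "run" entry listing "ran", "runs" and "run", plus an unanalyzed group holding "The" and "dog".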

To start the analysis click on the "Analyze Text" button (penguin button) in the Arboreal toolbar.

[Figure: Analyze Text]

A wizard opens. First, select the nodes you want to analyze (all children of the selected nodes are analyzed automatically as well) and click "Next". The next page asks how Arboreal should determine the language of a node's contents. If you already specified this in the Document Info wizard, the previously entered values appear here. The same is true for the following page, where you define the overall language of the document. Click "Finish" to start the analysis.
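Because the children of selected nodes are analyzed as well, selecting both a node and one of its descendants presumably should not analyze the descendant twice. A minimal Python sketch of such a selection step, assuming the selected nodes are ElementTree elements, could look like this; `collect_text` is a hypothetical helper, not part of Arboreal.

```python
import xml.etree.ElementTree as ET


def collect_text(selected):
    """Return the full text (including all descendants) of each selected node.

    A selected node that lies inside another selected node is skipped,
    because its text is already part of its ancestor's text.
    """
    nested = set()
    for node in selected:
        for desc in node.iter():
            if desc is not node:
                nested.add(id(desc))
    return ["".join(node.itertext()) for node in selected if id(node) not in nested]


if __name__ == "__main__":
    root = ET.fromstring("<doc><p>Alpha <b>beta</b></p><p>gamma</p></doc>")
    first, second = root.findall("p")
    # The <b> element is covered by its parent <p>, so it is not repeated.
    print(collect_text([first.find("b"), first, second]))  # ['Alpha beta', 'gamma']
```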

The analysis runs in the background. To see its current progress, click the icon in the bottom right corner of Arboreal.

[Figure: See progress]

A popup message lets you know when the analysis is done. To see the results, go to an analyzed node in the Arboreal editor. Select the node's text contents and click a word in the contents view of the Arboreal editor. If the word could be analyzed, the "Word Info" view at the bottom of Arboreal shows the information that was retrieved about it.

[Figure: Word Info view]