

Project Achievements

Integrated Resource Domain

Work Package Leader : MPI

Objectives

  1. Selection of the set of resources to be included in the integrated metadata domain ;
  2. Identification of the need to adapt the IMDI metadata set and the supporting tools ;
  3. Organization of the creation of metadata descriptions ;
  4. Quality check of the created metadata and their integration into one browsable and searchable distributed domain ;
  5. Creation of a demonstrator portal and description of the procedures.

Achievements

-  The new IMDI version was again demonstrated and explained at various meetings
-  The controlled vocabularies were adapted according to the needs
-  The IMDI Editor is being debugged and the code for handling profiles has been refactored
-  The Access Rights Management system has been finished and is in operation
-  The IMDI-OLAC bridge was finished
-  Checking the correctness of the delivered metadata and developing validation scripts
-  Integrating the different metadata contributions

Reports

D2.2a Integrated Metadata Domain
D2.2b Integrated Metadata Domain
D2.3 Portal Establishment

For more information : http://www.mpi.nl/INTERA




Standardized Descriptions

Work Package Leader : LORIA

Objectives

  1. All terms used in INTERA as elements and controlled vocabularies will be defined following the ISO norms and included in open terminology repositories.
  2. All structural and semantic relations will be defined as RDF schemas and be included in RDF repositories.
  3. All descriptions, metadata tool menus, etc. will be localized in the languages covered by the partners and data centers, plus Spanish.
  4. All this work will be guided by the work in the corresponding standardization bodies.

Achievements

-  An ISO TC37/SC4 Metadata requirements document was worked out
-  An Editors Team was built and the requirements document was distributed
-  A mapping definition between the new OLAC and IMDI versions was done and a schema for different mapping types was worked out
-  The formalised IMDI specifications were entered in the ISO repository by LORIA, then checked and corrected
-  Description of the non-hierarchical relations to be contained in the ISO data category repository
-  Localisation of the IMDI set in 8 languages.

Reports

D3.1 INTERA Definitions Report
D3.2 Localization Report




Repository Linking

Work Package Leader : USAAR

Objectives

  1. Definition of the LREP protocol ;
  2. Development of the components at both repositories to implement the LREP protocol ;
  3. Definition and implementation of a remote execution scenario, i.e. the extension of the descriptions in the tool repository and the adaptation of the IMDI browser ;
  4. Adaptation of the selected tools so that they can be started immediately within the integrated domain. The result of this work will be a version of the integrated metadata domain in which it is possible to start an LREP exchange from the browser and then immediately start tools.

Achievements

-  Definition and specification of the LREP protocol
-  Adaptation of the browser according to the specifications of LREP
-  Introduction of Unique Resource Identifiers
-  Selection, installation and test of Handle System
-  Creation of an XML schema for the function thesaurus
-  Programming of a converter to turn the function thesaurus into an XML document
-  Adaptation of the to read the function thesaurus from the XML file.

Reports

D4.1 LREP Definition Report
D4.2 LREP Components (Demonstration)
D4.3 Remote Execution Scenario
D4.4 Tool Adaptation Report




Resource Production

Work Package Leader : ILSP

Objectives

  1. Production of new multilingual resources (parallel corpora and terminologies) for the less widely spoken languages, including the Balkan ones,
  2. Investigation of user needs to specify the domain(s) of interest to the eContent professionals,
  3. Adaptation and extension of existing standards and specifications for multilingual resources,
  4. Elaboration of a commercially attractive model for the language resources production business.

Achievements

Creation of the following parallel corpora

-  Greek - English parallel corpora (4 MWs, i.e. million words, in total; 2 MWs per language)
-  Slovene - English parallel corpora (4 MWs in total, 2 MWs per language)
-  Serbian - English parallel corpora (2 MWs in total, 1 MW per language)
-  Bulgarian - English parallel corpora (2 MWs in total, 1 MW per language).

Total : 12 MWs

Processing of corpora

-  alignment (TMX - Translation Memory eXchange format)
-  structural annotation (XCES format)
-  linguistic annotation, i.e. PoS Tagging and lemmatization (XCES format)
-  metadata descriptions (IMDI Metadata Elements for Session Descriptions v3.0.4)
-  validation :
   -  formal (conformance to the specifications) : automatic
   -  content (correct assignment of descriptive features) : human
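The alignment and formal validation steps above can be illustrated with a small sketch. It is not the project's own tooling: it assumes TMX 1.4-style markup (`<tu>` translation units, each holding one `<tuv xml:lang="...">` per language with a `<seg>` inside), and the helpers `read_tmx_pairs` and `check_tmx`, as well as the sample document, are hypothetical. The first helper reads aligned segment pairs; the second performs the kind of automatic conformance check listed above by flagging units that lack one of the expected languages, while content checks remain a human task.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical Greek-English TMX fragment, for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="el" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="el"><seg>Καλημέρα.</seg></tuv>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="el"><seg>Ευχαριστώ.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tuv_lang(tuv):
    """Language code of a <tuv>; TMX 1.4 uses xml:lang, older files a plain 'lang'."""
    return (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()


def read_tmx_pairs(tmx_text, src="el", tgt="en"):
    """Return (source, target) segment pairs for units that contain both languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline tags inside <seg>.
                segs[tuv_lang(tuv)] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


def check_tmx(tmx_text, langs=("el", "en")):
    """Formal check: return 1-based indices of units missing an expected language."""
    root = ET.fromstring(tmx_text)
    problems = []
    for i, tu in enumerate(root.iter("tu"), start=1):
        found = {tuv_lang(tuv) for tuv in tu.findall("tuv")}
        if not set(langs) <= found:
            problems.append(i)
    return problems


if __name__ == "__main__":
    print(read_tmx_pairs(SAMPLE_TMX))
    print(check_tmx(SAMPLE_TMX))
```

Run on the sample, the sketch returns the single complete Greek-English pair and flags unit 2, which has no English segment. Language codes with region subtags (for example `en-GB`) are not normalized here, so a real check would need to decide how to match them.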

Extraction of multilingual terms from the corpora produced

-  multilingual terminology in the following domains :
   -  law : Bulgarian, English, Greek, Serbian, Slovene
   -  education : Bulgarian, English, Greek, Serbian
   -  health : English, Greek, Serbian
   -  tourism : English, Greek
   -  environment : English, Greek.

Processing of terms

-  automatic extraction of terms
-  human validation of automatic extraction
-  coding of terminological entries (TMF standard).

Reports

D5.2 Resource Production Report
D5.3 Resource Production Methodology Report

For more information : www.ilsp.gr/intera




Evaluation

Work Package Leader : ILC

Objectives

  1. Describe a testing scenario and a quality assessment strategy for the essential INTERA deliverables,
  2. Form a user group whose core consists of users of the participating data centers,
  3. Carry out the evaluation process.

Achievements

-  Test Scenario
Finalisation of the global evaluation strategy

-  Evaluation Criteria
A broad set of evaluation criteria has been defined, leaving room to select the criteria that will actually be used for evaluation. This strategy responds to the need to best suit the latest developments in INTERA's results.

-  Creation of a User Forum
The user group, which will act as the final evaluators of the project, is being continuously updated. A preliminary list of contacts has been circulated among the partners and is being populated.

For more information : http://www.ilc.cnr.it




Dissemination and Exploitation

Work Package Leader : ELDA

Objectives

  1. Integration of INTERA in a general standardizing context by continuous liaison and close cooperation with internationally established standardization bodies,
  2. Dissemination of INTERA results and exploitation strategies to the wide communities of major archive builders, distributors of language resources, industrials in the field of Language Engineering, and potential implementers of the technology developed in INTERA,
  3. Continuous promotion of the INTERA domain by publications and presentations,
  4. Organization of at least one dedicated workshop for an international audience.

Achievements

Organisation of the following international workshops dedicated to INTERA :

The INTERA-ISO preparation workshop was organised in Frankfurt/Hahn (22nd March 2004) and was dedicated to issues related to standards, metadata and data categories for lexical information. Participants came from Saarbrücken (USAAR and DFKI), Nijmegen and Pisa (for the INTERA part), France (INRIA, representing LORIA), Hamburg, Berlin, Sheffield, Barcelona and the USA.

A joint INTERA-ISO workshop, in which the INTERA partners took part, was also organised at the LREC 2004 conference. Its topic was a Registry of Linguistic Data Categories within an Integrated Language Resource Repository Area (INTERA), and some of the INTERA partners served on the Workshop Programme Committee. The INTERA partners contributed to the discussion with a paper that, starting from the ISLE-MILE Lexical Classes, presents Data Categories for lexicons with a view to the content interoperability of such resources. The paper also proposes an RDF schema for the categories of the semantic and syntax-semantic linking layers and shows a concrete instantiation of some RDF lexical objects. The resulting set of Data Categories complies with the goals of ISO TC37/SC4.

An INTERA workshop (INTERA panel) was organised at the LREC 2004 Conference (24-31 May 2004) and focused on Standards for Data Categories (see http://www.lrec-conf.org/lrec2004/doc/ws/prg_Intera.pdf). LREC 2004 is one of the major events in Language Technologies; the conference gathered over 900 people from all over the world.

Presentation of papers :

INTERA project participants presented several papers dedicated to INTERA at the LREC 2004 conference as well as at other events :

-  Daan Broeder, Thierry Declerck, Laurent Romary, Marcus Uneson, Sven Strömqvist, Peter Wittenburg : A Large Metadata Domain of Language Resources, LREC2004 Conference, Lisbon, May 2004
-  Peter Wittenburg, Heidi Johnson, Markus Buchhorn, Hennie Brugman, Daan Broeder : Architecture for Distributed Language Resource Management and Archiving, LREC2004 Conference, Lisbon, May 2004
-  Peter Wittenburg, Greg Gulrajani, Daan Broeder, Marcus Uneson : Cross-Disciplinary Integration of Metadata Descriptions, LREC2004 Conference, Lisbon, May 2004
-  Daan Broeder, Peter Wittenburg, Onno Crasborn : Using Profiles for IMDI Metadata Creation, LREC2004 Conference, Lisbon, May 2004
-  Peter Wittenburg, Hennie Brugman, Daan Broeder, Albert Russel : XML-Based Language Archiving, LREC 2004 Workshop on XML-based richly annotated corpora, LREC2004 Conference, Lisbon, May 2004
-  Daan Broeder, Hennie Brugman, Nelleke Oostdijk, Peter Wittenburg : Towards Dynamic Corpora, LREC Workshop on Compiling and Processing Spoken Corpora, LREC2004 Conference, Lisbon, May 2004
-  Daan Broeder, Maria Nava, Thierry Declerck : INTERA - a Distributed Domain of Metadata Resources, Workshop on a Registry of Linguistic Data Categories within an Integrated Language Resources Repository Area, LREC2004 Conference, Lisbon, May 2004
-  Peter Wittenburg : The IMDI Metadata Concept, Workshop on Building the LR&E Roadmap : Joint COCOSDA and ICCWLRE Meeting, LREC2004 Conference, Lisbon, May 2004
-  Thierry Declerck, Nancy Ide, Key-Sun Choi, Laurent Romary (Eds) : A Registry of Linguistic Data Categories within an Integrated Language Resources Repository Area (INTERA), LREC2004 Conference, Lisbon, May 2004
-  Kiyong Lee, Lou Burnard, Laurent Romary, Eric de la Clergerie, Thierry Declerck, Syd Bauman, Harry Bunt, Lionel Clément, Tomaz Erjavec, Azim Roussanaly, Claude Roux : Towards an International Standard on Feature Structure Representation, LREC2004 Conference, Lisbon, May 2004
-  Thierry Declerck, Paul Buitelaar, Nicoletta Calzolari, Alessandro Lenci : Towards A Language Infrastructure for the Semantic Web, LREC2004 Conference, Lisbon, May 2004
-  Ulrike Mosel, Peter Wittenburg : The DOBES Programme and its Contribution to Documentation and Revitalization, Dialogue on Language Diversity, Sustainability and Peace, Universal Forum of Cultures, Barcelona, May 2004

Promotion of INTERA at international events

Moreover, the INTERA partners organised and/or participated in the following events, where they promoted the INTERA project :

-  Meeting of the Interparlamentary Group Netherlands/Belgium, Nijmegen, January
-  International Sign Language Meeting, Nijmegen, January 2004 (organiser)
-  ISO TC37/SC4 Meeting, Korea, February 2004
-  E-Content Info Day : INTERA presented as reference Action Line 2.2 project, Bari, March 2004
-  UNESCO Training Course on Digital Archiving, Vilnius, March 2004
-  Lexikon Workshop, Hahn, March 2004 (co-organiser)
-  Endangered Languages Training Course, Nijmegen, May 2004 (organiser)
-  Lingua Pax Conference, Barcelona, May 2004
-  LREC Conference 2004, Lisbon, May 2004
-  Workshop on XML-based richly annotated corpora, Lisbon, May 2004 (co-organiser)
-  Workshop on Compiling and Processing Spoken Corpora, Lisbon, May 2004
-  Workshop on a Registry of Linguistic Data Categories within an Integrated Language Resources Repository Area, May 2004
-  COCOSDA and ICCWLRE Meeting, Lisbon, May 2004
-  ISO TC37/SC4 Meeting, Lisbon, May 2004
-  Meeting on Bilingual Databases, Utrecht, June 2004

At all of these events, the INTERA objectives and the IMDI infrastructure were presented. During this period much time was spent on "evangelisation", since it was considered very important to convince researchers in the area of Language Resources to create metadata and make their resources visible. At the ISO meetings, the standardization of metadata for annotated resources and lexica was discussed.

In general, IMDI, the INTERA standard for the metadata description of language resources (LR), can be regarded as one of the two relevant concepts in this area, and it is widely used world-wide. At LREC there were other talks by people applying this technology. The attached world map gives an impression of where IMDI is currently used.

In addition, a proposal to further integrate Language Resource centres, and thereby their language resources, was prepared together with the University of London, the University of Lund and the Dutch Institute for Lexicology. The proposal, coordinated by the MPI, received a very high rating in the evaluation process, and metadata is one of its major pillars.

For more information : www.elda.org/intera





