Project Achievements
Integrated Resource Domain
Work Package Leader : MPI
Objectives
- Selection of the set of resources to be included in the integrated metadata domain ;
- Identification of the needs to adapt the IMDI metadata set and the supporting tools ;
- Organization of the creation of metadata descriptions ;
- Quality check of the created metadata and their integration into one browsable and searchable distributed domain ;
- Creation of a demonstrator portal and description of the procedures.
Achievements
The new IMDI version was again demonstrated and explained at various meetings
The controlled vocabularies were adapted according to the needs
The IMDI Editor is being debugged and the code for handling profiles was refactored
The Access Rights Management system has been finished and is in operation
The IMDI-OLAC bridge was finished
Checking the correctness of the delivered metadata and developing validation scripts
Integrating the different metadata contributions
Reports
D2.2a Integrated Metadata Domain
D2.2b Integrated Metadata Domain
D2.3 Portal Establishment
For more information : http://www.mpi.nl/INTERA
Standardized Descriptions
Work Package Leader : LORIA
Objectives
- All terms used in INTERA as elements and controlled vocabularies will be defined following the ISO norms and included in open terminology repositories.
- All structural and semantic relations will be defined as RDF schemas and be included in RDF repositories.
- All descriptions, metadata tool menus, etc. will be localized in the languages covered by the partners and data centers, plus Spanish.
- All this work will be guided by the work in the corresponding standardization bodies.
Achievements
An ISO TC37/SC4 Metadata requirements document was worked out
An Editors Team was built and the requirements document was distributed
A mapping definition between the new OLAC and IMDI versions was done and a schema for different mapping types was worked out
The formalised IMDI specifications were entered into the ISO repository by LORIA and were checked and corrected
Description of the relations other than hierarchical ones to be contained in the ISO data category repository
Localisation of the IMDI set in 8 languages.
Reports
D3.1 INTERA Definitions Report
D3.2 Localization Report
Repository Linking
Work Package Leader : USAAR
Objectives
- Definition of the LREP protocol ;
- Development of the components at both repositories to implement the LREP protocol ;
- Definition and implementation of a remote execution scenario, i.e. the extension of the descriptions in the tool repository and the adaptation of the IMDI browser ;
- Adaptation of the selected tools so that they can be started immediately within the integrated domain. The result of this work will be a version of the integrated metadata domain where it is possible to start an LREP exchange from the browser and then immediately start tools.
Achievements
Definition of the specification of the LREP protocol
Adaptation of the browser according to the specifications of LREP
Introduction of Unique Resource Identifiers
Selection, installation and test of Handle System
Creation of an XML schema for the function thesaurus
Programming of a converter that turns the function thesaurus into an XML document
Adaptation of the to read the function thesaurus from the XML file.
Reports
D4.1 LREP Definition Report
D4.2 LREP Components (Demonstration)
D4.3 Remote Execution Scenario
D4.4 Tool Adaptation Report
Resource Production
Work Package Leader : ILSP
Objectives
- Production of new multilingual resources (parallel corpora and terminologies) for the less widely spoken languages, including the Balkan ones,
- Investigation of user needs to specify the domain(s) of interest to the eContent professionals,
- Adaptation and extension of existing standards and specifications for multilingual resources,
- Elaboration of a commercially attractive model for the language resources production business.
Achievements
Creation of the following parallel corpora
Greek - English parallel corpora (4 MWs (Million words) in total, 2 MWs per language)
Slovene - English parallel corpora (4 MWs in total, 2 MWs per language)
Serbian - English parallel corpora (2 MWs in total, 1 MW per language)
Bulgarian - English parallel corpora (2 MWs in total, 1 MW per language).
Total : 12 MWs
Processing of corpora
alignment (TMX - Translation Memory eXchange format)
structural annotation (XCES format)
linguistic annotation, i.e. PoS Tagging and lemmatization (XCES format)
metadata descriptions (IMDI Metadata Elements for Session Descriptions v3.0.4)
validation
formal (conformance to the specifications) : automatic
content (correct assignment of descriptive features) : human
Extraction of multilingual terms from the corpora produced
multilingual terminology in the :
domain of law : Bulgarian, English, Greek, Serbian, Slovene
domain of education : Bulgarian, English, Greek, Serbian
domain of health : English, Greek, Serbian
domain of tourism : English, Greek
domain of environment : English, Greek.
Processing of terms
automatic extraction of terms
human validation of automatic extraction
coding of terminological entries (TMF standard).
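The corpus sizes reported above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the function names are assumptions made for the example and are not part of any INTERA tool. The figures are the ones listed in the achievements, with each pair split evenly between its two languages.

```python
# Corpus sizes in millions of words (MW), copied from the achievements list.
# The data structure and helper names are hypothetical, for illustration only.
corpora_mw = {
    ("Greek", "English"): 4,
    ("Slovene", "English"): 4,
    ("Serbian", "English"): 2,
    ("Bulgarian", "English"): 2,
}


def per_language_mw(pair_total_mw):
    """Size of one side of a parallel corpus, assuming an even split."""
    return pair_total_mw / 2


def total_mw(corpora):
    """Sum of the sizes of all parallel corpora."""
    return sum(corpora.values())


if __name__ == "__main__":
    for (source, target), size in corpora_mw.items():
        print(f"{source} - {target}: {size} MW in total, "
              f"{per_language_mw(size):g} MW per language")
    print(f"Total: {total_mw(corpora_mw)} MW")
```

The sum over the four language pairs reproduces the stated total of 12 MWs.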
Reports
D5.2 Resource Production Report
D5.3 Resource Production Methodology Report
For more information :
www.ilsp.gr/intera
Evaluation
Work Package Leader : ILC
Objectives
- Describe a testing scenario and a quality assessment strategy for the essential INTERA deliverables,
- Form a user group whose core consists of users of the participating data centers,
- Carry out the evaluation process.
Achievements
Test Scenario
Finalisation of the global evaluation strategy
Evaluation Criteria
A wide set of evaluation criteria has been defined, leaving room to select the criteria that will actually be used for evaluation. This strategy makes it possible to adapt the evaluation to the latest developments of Intera’s results.
Creation of a User Forum
The user group which will act as final evaluators of the project is being continuously updated. The preliminary list of contacts has been circulated among the partners and is being populated.
For more information : http://www.ilc.cnr.it
Dissemination and Exploitation
Work Package Leader : ELDA
Objectives
- Integration of INTERA in a general standardizing context by continuous liaison and close cooperation with internationally established standardization bodies,
- Dissemination of INTERA results and exploitation strategies to the wide communities of major archive builders, distributors of language resources, industrial players in the field of Language Engineering, and potential implementers of the technology developed in INTERA,
- Continuous promotion of the INTERA domain by publications and presentations,
- Organization of at least one dedicated workshop for an international audience.
Achievements
Organisation of the following international workshops dedicated to Intera :
An INTERA-ISO preparation workshop was organised in Frankfurt/Hahn (22nd March 2004) and was dedicated to issues related to standards, metadata and data categories for lexical information. Participants came from Saarbruecken (USAAR and DFKI), Nijmegen, Pisa (for the INTERA part), France (INRIA, representing Loria), Hamburg, Berlin, Sheffield, Barcelona and the USA.
A joint INTERA-ISO Workshop was also organised at the LREC 2004 conference, in which the Intera partners took part. The workshop’s topic was a Registry of Linguistic Data Categories within an Integrated Language Resource Repository Area (INTERA). In addition, some of the Intera partners were part of the Workshop Programme Committee.
Intera partners contributed to the discussion by presenting a paper that, starting from the ISLE-MILE Lexical Classes, presents Data Categories for lexicons with a view to the content interoperability of such resources. Concretely, the work also formulates a proposal for an RDF schema of the categories of the semantic and syntax-semantic linking layers and shows a concrete instantiation of some RDF lexical objects. The set of Data Categories developed complies with the goal of ISO TC37/SC4.
An INTERA workshop (INTERA panel) was organised at the LREC 2004 Conference (24-31 May 2004) and focused on Standards for Data Categories (see http://www.lrec-conf.org/lrec2004/doc/ws/prg_Intera.pdf). LREC 2004 is one of the major events in Language Technologies. The conference gathered over 900 people from all over the world.
Presentation of papers :
INTERA project participants presented several papers dedicated to Intera at the LREC 2004 conference as well as at other events :
Daan Broeder, Thierry Declerck, Laurent Romary, Marcus Uneson, Sven Strömqvist, Peter Wittenburg : A Large Metadata Domain of Language Resources, LREC2004 Conference, Lisbon, May 2004
Peter Wittenburg, Heidi Johnson, Markus Buchhorn, Hennie Brugman, Daan Broeder : Architecture for Distributed Language Resource Management and Archiving, LREC2004 Conference, Lisbon, May 2004
Peter Wittenburg, Greg Gulrajani, Daan Broeder, Marcus Uneson : Cross-Disciplinary Integration of Metadata Descriptions, LREC2004 Conference, Lisbon, May 2004
Daan Broeder, Peter Wittenburg, Onno Crasborn : Using Profiles for IMDI Metadata Creation, LREC2004 Conference, Lisbon, May 2004
Peter Wittenburg, Hennie Brugman, Daan Broeder, Albert Russel : XML-Based Language Archiving, LREC 2004 Workshop on XML-based richly annotated corpora, LREC2004 Conference, Lisbon, May 2004
Daan Broeder, Hennie Brugman, Nelleke Oostdijk, Peter Wittenburg : Towards Dynamic Corpora, LREC Workshop on Compiling and Processing Spoken Corpora, LREC2004 Conference, Lisbon, May 2004
Daan Broeder, Maria Nava, Thierry Declerck : INTERA - a Distributed Domain of Metadata Resources, Workshop on a Registry of Linguistic Data Categories within an Integrated Language Resources Repository Area, LREC2004 Conference, Lisbon, May 2004
Peter Wittenburg : The IMDI Metadata Concept, Workshop on Building the LR&E Roadmap : Joint COCOSDA and ICCWLRE Meeting, LREC2004 Conference, Lisbon, May 2004
Thierry Declerck, Nancy Ide, Key-Sun Choi, Laurent Romary (Eds) : A Registry of Linguistic Data Categories within an Integrated Language Resources Repository Area (INTERA), LREC2004 Conference, Lisbon, May 2004
Kiyong Lee, Lou Burnard, Laurent Romary, Eric de la Clergerie, Thierry Declerck, Syd Bauman, Harry Bunt, Lionel Clément, Tomaz Erjavec, Azim Roussanaly, Claude Roux : Towards an International Standard on Feature Structure Representation, LREC2004 Conference, Lisbon, May 2004
Thierry Declerck, Paul Buitelaar, Nicoletta Calzolari, Alessandro Lenci : Towards A Language Infrastructure for the Semantic Web, LREC2004 Conference, Lisbon, May 2004
Ulrike Mosel, Peter Wittenburg : The DOBES Programme and its Contribution to Documentation and Revitalization, Dialogue on Language Diversity, Sustainability and Peace, Universal Forum of Cultures, Barcelona, May 2004
Promotion of Intera at international events
Moreover, the Intera partners organised and/or participated in the following events, where they promoted the Intera project :
Meeting of the Interparliamentary Group Netherlands/Belgium, Nijmegen, January
International Sign Language Meeting, Nijmegen, January 2004 (organiser)
ISO TC37/SC4 Meeting Korea, February 2004
E-Content Info Day : INTERA presented as reference Action Line 2.2 project, Bari, March 2004
UNESCO Training Course on Digital Archiving, Vilnius, March 2004
Lexikon Workshop, Hahn, March 2004 (co-organiser)
Endangered Languages Training Course, Nijmegen, May 2004 (organiser)
Lingua Pax Conference, Barcelona, May 2004
LREC Conference 2004, Lisbon, May 2004
Workshop on XML-based richly annotated corpora, Lisbon, May 2004 (co-organiser)
Workshop on Compiling and Processing Spoken Corpora, Lisbon, May 2004
Workshop on a Registry of Linguistic Data Categories within an Integrated Language Resources Repository Area, May 2004
COCOSDA and ICCWLRE Meeting, Lisbon, May 2004
ISO TC37/SC4 Meeting, Lisbon, May 2004
Meeting on Bilingual Databases, Utrecht, June 2004
At all the events mentioned, the INTERA intentions and the IMDI infrastructure were described. In this period much time was spent on "evangelisation", since it was seen as very important to convince many researchers in the area of Language Resources to create metadata and make their resources visible. In the ISO meetings the standardization of metadata for annotated resources and lexica was discussed.
In general we can say that the INTERA standard for metadata description of LR (IMDI) is one of the two relevant concepts and that it is widely used world-wide. At LREC there were other talks from people applying this technology. The attached world map gives an impression of where IMDI is currently used.
Further, a proposal was worked out together with the University of London, the University of Lund and the Dutch Institute for Lexicology to further integrate Language Resource centres and thereby language resources. The proposal, in which the MPI acts as co-ordinator, received a very high rating in the evaluation process, and one of its major pillars is metadata.
For more information :
www.elda.org/intera