Additional resources

In order to carry out the BARR track we will release the BARR corpus, a manually labeled collection of Spanish medical abstracts constructed using a customized version of AnnotateIt as well as the Markyt annotation system. The BARR corpus is structured into training, development and test sets, each consisting of a total of 1000 abstracts with their corresponding <Abbreviation, Definition> offset annotations, produced by an annotation team of three domain experts.

For evaluation purposes, participating teams have to recognize short form-long form pairs that co-occur within a sentence. Each pair consists of:

  1. Short Forms (SFs): a shorter term that denotes a longer word or phrase.
  2. Long Forms (LFs): the corresponding definition found in the same sentence as the SF.
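
To make the idea of an offset-annotated pair concrete, here is a minimal Python sketch of how one SF-LF pair could be held in memory as character offsets. It does not reproduce the BARR release format, which is not described here: the `Span` and `AbbreviationPair` classes, the `find_span` and `span_text` helpers, and the example sentence are all hypothetical and invented for illustration. Only the SF "ATM" and the LF "articulación temporomandibular" come from the annotated example on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Character offsets into a text; text[start:end] is the mention."""
    start: int
    end: int


@dataclass(frozen=True)
class AbbreviationPair:
    """Hypothetical container for one <Abbreviation, Definition> pair."""
    short_form: Span
    long_form: Span


def find_span(text: str, mention: str) -> Span:
    """Return the offsets of the first occurrence of `mention` in `text`."""
    start = text.index(mention)  # raises ValueError if the mention is absent
    return Span(start, start + len(mention))


def span_text(text: str, span: Span) -> str:
    """Recover the annotated string from its offsets."""
    return text[span.start:span.end]


# Invented sentence, for illustration only (not taken from the BARR corpus).
sentence = "La articulación temporomandibular (ATM) es una articulación compleja."

pair = AbbreviationPair(
    short_form=find_span(sentence, "ATM"),
    long_form=find_span(sentence, "articulación temporomandibular"),
)

print(span_text(sentence, pair.short_form))
print(span_text(sentence, pair.long_form))
```

Storing offsets rather than the strings themselves keeps each annotation tied to one specific mention, which matters when the same short form appears several times in an abstract.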

The BARR background set, a large collection of unlabeled medical abstracts written primarily in Spanish, will be released together with the BARR corpus. Moreover, several lexical resources and a collection of pointers to existing tools will be released as well.

Here's an example of an annotated abstract. A short form (ATM) is annotated, together with the long form (articulación temporomandibular). We can find the short form again in the abstract, so we mark it as "Multiple":

