News about CEDAR

CEDAR Offers Support for CDEs from caDSR

We are pleased to report that CEDAR template creators can now import from over 60,000 of NCI’s caDSR Common Data Elements (CDEs) to build new Fields in CEDAR. Using CEDAR’s search, browsing, and viewing services, template builders can easily build a form based partly or entirely on CDEs from caDSR.

Over the last 18 months, CEDAR developers have collaborated with the NCI to adapt CEDAR capabilities to the unique characteristics of CDEs. By representing these CDEs as CEDAR Fields, we have made them fully accessible to CEDAR users. CEDAR already handled many of the specialized features that are found in the caDSR templates, and the CEDAR team added some features to support particular CDE workflows.

Most attributes from the data elements in the CDE browser can be represented directly in CEDAR, especially the attributes used in creating templates.

User Applications

CEDAR’s easy-to-use system for working with metadata templates and reusable components works with the imported CDEs in several ways. A CEDAR Template creator may import CDE representations into a Template using CEDAR’s Template Designer import process, thereby building a metadata form based partly or entirely on CDE content from caDSR.

All the resource discovery and lookup features in CEDAR also work for CDEs. So you can use CEDAR’s search, browsing, and viewing services to find and review CDE content in the system. And CEDAR’s REST APIs also work for the CDEs in CEDAR, which means users can remotely discover and download CDE content.

CEDAR users can even build new Fields by copying any of the CDE-based Fields that are already in CEDAR. The copy, which at this point is a generic CEDAR Field, inherits the original Field’s other values, such as the label, description, and help tip, and the user can modify it however he or she wants. (Eventually CEDAR might support re-submission of CDEs to a CDE repository, but this is not offered at this time.)

About Common Data Elements

CDEs offer precise specifications of questions, including the set of allowable answers to each question. Generally following the ISO/IEC 11179 standard, CDEs are described in great detail, including information about their development history. CDEs are increasingly being adopted to improve standardization and interoperability. However, while CDEs can provide a strong conceptual foundation for interoperation, there are no widely recognized serialization or interchange formats for describing and exchanging their definitions.

CDE registries can help standardize the way CDEs are collected, stored, transferred, and reported. One of the largest CDE registries has been developed by the U.S. National Cancer Institute (NCI) with the goal of facilitating multidisciplinary, multi-institutional cancer research. This registry is called the Cancer Data Standards Repository (caDSR), and it contains over 60,000 CDEs that cover many aspects of cancer research. The U.S. National Institutes of Health (NIH) is also developing a multidisciplinary registry that aims to unify the range of biomedical CDEs that have been produced by a variety of NIH and other organizations (https://cde.nlm.nih.gov).

How CEDAR Adopts CDEs from caDSR

To make existing CDEs more readily accessible to form builders, we extended our Web-based CEDAR metadata management platform to provide a core representation of CDEs suitable for specifying questions in a metadata acquisition system. We do not manage the entire CDE specification, which contains a comprehensive implementation of the ISO/IEC 11179 standard; instead, we focus on the core functionality that specifies the questions and the values used to answer them.

By importing the XML-defined CDEs from the caDSR system into JSON Schema-defined fields in CEDAR, we made these specifications available to any CEDAR user. We run the conversion process automatically to keep the CEDAR CDEs up-to-date with respect to the source content.
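To make the conversion step more concrete, here is a minimal Python sketch of turning one XML-defined CDE into a JSON Schema-style field. It rests on loudly stated assumptions: the XML element names (PUBLICID, PREFERREDNAME, PermissibleValues, and so on), the field keys, the sample CDE, and the value-set IRI are all hypothetical placeholders loosely modeled on caDSR exports and CEDAR's template model, not the real formats. It illustrates one design idea: the question becomes a field, while the permissible values are pulled out as a separate, versioned value set that the field only references.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified CDE export; the element names are
# illustrative placeholders, not the real caDSR XML schema.
CDE_XML = """
<DataElement>
  <PUBLICID>0000000</PUBLICID>
  <VERSION>1.0</VERSION>
  <PREFERREDNAME>Patient Gender Code</PREFERREDNAME>
  <PREFERREDDEFINITION>The gender of the patient.</PREFERREDDEFINITION>
  <PermissibleValues>
    <Value>Female</Value>
    <Value>Male</Value>
    <Value>Unknown</Value>
  </PermissibleValues>
</DataElement>
"""

# Placeholder IRI standing in for a versioned value set stored elsewhere.
VALUE_SET_IRI = "https://example.org/valuesets/0000000/v1.0"


def cde_to_field(xml_text, value_set_iri):
    """Convert one simplified CDE into a JSON Schema-style field dict.

    Returns the field and the list of permissible values, which would be
    published separately as a value set rather than inlined in the field.
    """
    root = ET.fromstring(xml_text.strip())
    values = [v.text for v in root.iter("Value")]
    field = {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "title": root.findtext("PREFERREDNAME"),
        "description": root.findtext("PREFERREDDEFINITION"),
        # Hypothetical key names, loosely modeled on CEDAR's field format.
        "properties": {"@value": {"type": ["string", "null"]}},
        "_valueConstraints": {
            "valueSets": [{"uri": value_set_iri, "numTerms": len(values)}]
        },
        # Provenance back to the source CDE so re-runs can detect new versions.
        "_sourceCde": {
            "publicId": root.findtext("PUBLICID"),
            "version": root.findtext("VERSION"),
        },
    }
    return field, values


field, allowed_values = cde_to_field(CDE_XML, VALUE_SET_IRI)
print(json.dumps(field, indent=2))
```

Keeping the source public ID and version on each field is one possible way to support the automatic re-runs mentioned above: a later conversion could compare versions and regenerate only the fields whose source CDE has changed.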

CEDAR captures the CDE’s field information (top), and puts the value set information into BioPortal, a repository of vocabularies and ontologies. The field specification of the uploaded CEDAR caDSR CDE (top right) references the versioned value set in BioPortal (bottom right).

Additional Information

You can find more information about CEDAR and its use of CDEs in the following resources.

CEDAR in the GO FAIR Funder Study

FAIR Funder Implementation Study: life cycle with founding members

After providing contributions to the GO FAIR project over the last 18 months, CEDAR will be a significant participant in GO FAIR’s FAIR Funder Implementation Study.

This collaborative project will demonstrate a new level of integrated and FAIR metadata, making data projects funded by research agencies demonstrably more Findable, Accessible, Interoperable, and Reusable. As one of the founding collaborators, CEDAR has played a significant role in defining, describing, and implementing services that will improve metadata collection for funded research.

CEDAR’s Role

The CEDAR project provides a way for funders to specify what metadata they want to collect as part of the research life cycle. This can include not just logistical metadata that applicants may provide to describe their proposed project (title, investigators, summary, costs, duration), but also metadata describing how they will manage and document their research products (their Data Management Plan or Data Stewardship Plan) and pointers to those products once they have been released. Grantees will be able to specify this metadata in simple forms with clear instructions throughout the execution of the grant, so that funders and other potential users can find and reuse the described data products.

Furthermore, in the FAIR Funder Implementation Study, the supplied metadata can be evaluated to see whether it meets criteria for FAIRness, for example whether it uses persistent unique identifiers that can be resolved. Grantees and funders can rely on automated evaluation systems that obtain the metadata, assess it, and issue reports to the grantees and funders of the described projects. This enables grantees to easily provide demonstrably FAIR metadata and data, while community members can see, understand, and reuse the best practices the metadata represents.
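The identifier criterion can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the logic of any evaluator used in the study: it only checks whether selected metadata values are syntactically DOIs or ORCID iDs and, if so, rewrites them as doi.org or orcid.org resolver URLs. The field names, the sample record, and the choice of just two identifier schemes are hypothetical; a real FAIRness evaluation would also dereference each URL over HTTP and cover many more schemes.

```python
import re

# Illustrative patterns for two common persistent-identifier schemes.
# Syntax checks only: ORCID checksums and HTTP resolution are not tested here.
PID_PATTERNS = {
    "doi": (
        re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE),
        "https://doi.org/{}",
    ),
    "orcid": (
        re.compile(r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$"),
        "https://orcid.org/{}",
    ),
}


def check_identifier(value):
    """Return (scheme, resolver_url) if value looks like a known PID, else None."""
    for scheme, (pattern, template) in PID_PATTERNS.items():
        match = pattern.match(value.strip())
        if match:
            return scheme, template.format(match.group(1))
    return None


def fairness_report(metadata, id_fields):
    """Map each identifier field to a resolver URL, or to a failure message."""
    report = {}
    for name in id_fields:
        value = metadata.get(name)
        result = check_identifier(value) if isinstance(value, str) else None
        report[name] = result[1] if result else "FAIL: no recognizable persistent identifier"
    return report


# Hypothetical grant metadata record (the ORCID iD is ORCID's published example).
grant_metadata = {
    "publication": "doi:10.1093/database/baz059",
    "principal_investigator": "0000-0002-1825-0097",
    "dataset": "my-dataset-v2",
}
report = fairness_report(grant_metadata, ["publication", "principal_investigator", "dataset"])
print(report)
```

Normalizing every identifier to a resolver URL first makes the later step, actually resolving it, a uniform HTTP request regardless of how the grantee originally typed the identifier.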

Coming Soon

View of FAIR Funder template in outline form

In earlier workshops on funder metadata, the CEDAR team helped funders define a basic set of metadata fields that describe products throughout the funded life cycle. In upcoming Metadata for Machines (M4M) workshops, this simple example will be enhanced and customized to meet the needs of the funders who are early adopters of the GO FAIR methods. The FAIR Funder Implementation Study will demonstrate the CEDAR template’s application in creating metadata throughout the life cycle, including evaluating the resulting metadata for FAIRness with external evaluation software.

Going beyond the CEDAR demonstrations, other founding systems like the Data Stewardship Wizard and Castor will demonstrate their own ability to perform metadata capture and reuse within the Implementation Study, and will demonstrate interoperation with CEDAR using common specifications to exchange templates and metadata. Meanwhile, templates and components that are useful for others will be registered in FAIRsharing so that they can be easily found and evaluated for reuse.

Publication: CEDAR offers metadata recommendations from mined rules

Marcos Martínez-Romero and his co-authors have published a new paper describing CEDAR’s updated implementation of intelligent authoring. The new methods use rule mining to generate recommendations based on metadata previously entered in the CEDAR system, and offer users only the most likely recommendations given previous metadata entries for the template.

Suggested values seen by users
The intelligent authoring metadata recommendations take into account rules derived from previously entered metadata with the same values.
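To give a feel for how rules mined from earlier entries can rank suggestions, here is a deliberately simplified Python sketch. It is not the published method, which is described in the paper cited below and also draws on ontologies; this sketch mines only single-antecedent rules of the form "field A = x implies field B = y", keeps rules that meet minimum support and confidence thresholds, and ranks candidate values for a target field by the confidence of matching rules. The field names, sample entries, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_rules(instances, min_support=2, min_confidence=0.5):
    """Mine single-antecedent rules (field_a=value_a) -> (field_b=value_b)."""
    item_counts = Counter()  # how often each (field, value) pair occurs
    pair_counts = Counter()  # how often two (field, value) pairs co-occur
    for inst in instances:
        items = [(f, v) for f, v in inst.items() if v]  # skip empty fields
        item_counts.update(items)
        for a, b in permutations(items, 2):
            pair_counts[(a, b)] += 1
    rules = []
    for (a, b), support in pair_counts.items():
        confidence = support / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


def recommend(rules, partial, target_field):
    """Rank values for target_field using rules whose antecedent matches partial."""
    scores = {}
    for (fa, va), (fb, vb), _support, confidence in rules:
        if fb == target_field and partial.get(fa) == va:
            scores[vb] = max(scores.get(vb, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical previously entered metadata for one template.
previous_entries = [
    {"organism": "Homo sapiens", "tissue": "liver", "platform": "Illumina HiSeq"},
    {"organism": "Homo sapiens", "tissue": "liver", "platform": "Illumina HiSeq"},
    {"organism": "Homo sapiens", "tissue": "brain", "platform": "Illumina HiSeq"},
    {"organism": "Mus musculus", "tissue": "brain", "platform": "Affymetrix"},
    {"organism": "Mus musculus", "tissue": "brain", "platform": "Affymetrix"},
]
rules = mine_rules(previous_entries)
print(recommend(rules, {"organism": "Mus musculus"}, "tissue"))
```

The support threshold is what keeps one-off co-occurrences (here, the single human brain entry) from turning into suggestions; ranking by confidence alone is the simplest choice and could be replaced by a combined score.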

You can find instructions for setting up this capability in the CEDAR User Manual page Understanding the Suggestion System.

Martínez-Romero M, O’Connor MJ, Egyedi AL, Willrett D, Hardi J, Graybeal J, Musen MA. Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases. Database, Volume 2019, 10 June 2019. https://doi.org/10.1093/database/baz059.