The User Guide contains all our documents and materials about the ontology, including details for requesting new ontology terms. If these materials do not answer your questions, please do not hesitate to contact us.

Background

Knowledge Repositories

Like most scientific disciplines, the field of biology is ever expanding and advancing. New research methods are continually developed and a never-ending stream of scientific publications continues to grow. As our scientific knowledge and data associated with such knowledge have increased, mechanisms for representing and storing such information have become essential.

With the growth of the internet & commensurate gains in storage capacity, biological databases have become important repositories of our knowledge. They allow the worldwide community of researchers to access, search, sort, and share a vast amount of information in a way not previously possible. Biological databases can store not only data but also knowledge learned through scientific analysis of that data. For example, a database might store primary information such as a protein’s amino acid sequence, but it might also contain information that describes that protein’s function, such as “protein X performs some biological function Y”. The information stored by biological databases is immense, interconnected, and complex.

Ontologies for People & Computers

To facilitate efficient storage, search, and retrieval of such information, it is important to represent information in a coherent, consistent, and logical fashion. Such information must be interpretable by computers, but it must also be intuitive and coherent to people who want to access the information. Ontologies allow both people and computers to understand and organize information in a logical and structured fashion. For this reason, scientific concepts in biological databases are represented by ontologies and other controlled vocabularies.

The Evidence & Conclusion Ontology

The Evidence & Conclusion Ontology (ECO) is one such ontology. ECO is used to represent scientific evidence in biological research. Documenting the evidence that supports a scientific assertion, such as a protein annotation, is fundamental to the scientific method: it allows us to say why we believe what we believe to be true. It also gives us a practical means of applying quality control measures when importing data into databases. We can also draw inferences about our confidence in a conclusion by looking at its associated evidence.

ECO describes evidence arising from laboratory experiments, computational methods, curator inferences, and other means. ECO is currently used by dozens of databases, software tools, and other applications, including many of the largest biological and genomic databases in the world, to provide structure and context for documenting evidence in scientific research. Using ECO allows users to query, manage, and interpret data in ways not previously possible. The broader goal of ECO, however, is to enable summary descriptions of evidence types for a wide range of scientific disciplines. The possibility of expanding ECO to represent evidence in disciplines as diverse as anthropology, biodiversity, psychology, and clinical research is being explored.

ECO development is facilitated by National Science Foundation DBI award number 1458400.

Rules for New Terms

Succinct term name (label) in the form "X evidence"

Examples are "microscopy evidence" or "motif discovery evidence". Many ECO experimental evidence types can be thought of as recorded information derived from a process (e.g. performing an assay) that has inputs of various components (e.g. machines, instruments, reagents, samples) and has outputs of various data types (e.g. images, tables, figures).

Aristotelian definition, in the form of "B is an A that C's"

In other words, consider exactly what characteristics (C) make this term (B) a more specific subtype of the parent term (A). Consider also what distinguishes this term from other proposed or existing terms sharing its parent. For example, the term "traceable author statement" is a subclass (child term) of "author statement". The definition of "traceable author statement" is "An author statement that is based on a cited reference". A related term with the same parent, "non-traceable author statement", is defined as "An author statement that is not associated with results presented or a cited reference." As subclasses of the same parent, these terms share common attributes, i.e. they are both author statements, but they each have distinguishing characteristics that are specific and non-overlapping, as well.
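The "B is an A that C's" pattern can be sketched in a few lines of Python. This is only an illustration, not part of any ECO tooling: the function name `aristotelian_definition` and its parameters are hypothetical, and the two definitions it builds are the ones quoted above.

```python
def aristotelian_definition(parent: str, differentia: str) -> str:
    """Compose 'A(n) <parent> that <differentia>.' from a parent label (A)
    and the distinguishing characteristic (C) of the new term (B)."""
    # Choose the article by first letter; a rough rule that is enough for a sketch.
    article = "An" if parent[0].lower() in "aeiou" else "A"
    return f"{article} {parent} that {differentia.rstrip('.')}."


# Two sibling terms share the parent but differ in their distinguishing characteristic.
traceable = aristotelian_definition(
    "author statement", "is based on a cited reference"
)
non_traceable = aristotelian_definition(
    "author statement",
    "is not associated with results presented or a cited reference",
)

print(traceable)
print(non_traceable)
```

Writing the parent and the distinguishing characteristic as separate pieces makes it easier to see whether two sibling terms really differ and do not overlap.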

Definitions should be succinct and fit into one sentence

However, they should contain as much information as is required to capture the meaning of the term. Additional comments, examples, or usage notes should be provided in a separate comments section and not merged into the definition. This will prevent unnecessarily restricting the meaning of terms so that they can only be used by a specific database (usually the term creator's database).

Suggest a parent term, either existing or new

As you research your term's definition, make a note of its most appropriate parent term. In general, ECO aims to follow a single inheritance hierarchy, where each term has one parent. If you feel that more than one parent is appropriate or necessary, note this in your request, and we can create logical definitions that allow the additional relationships to be inferred.

Provide a reference for the definition, preferably in the form of a PubMed ID

Preferred references are papers that are free (i.e. no paywall) and easy to access for most people.

Additional information can be shared

Writing clear, detailed, succinct definitions requires full knowledge of a subject. There are many types of additional information that can help us write better definitions & comments and clarify subtleties in meaning. These include:

  • Additional references or hyperlinks to articles
  • Personal notes on ECO term use in curation
  • Examples of ECO terms in use by other resources
  • Personal notes about the techniques (experimental or computational) described by a requested term
  • Knowledge about terms similar to ECO terms at other ontologies, databases, and other resources
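The rules above can be collected into a simple pre-submission checklist. The sketch below is an assumption-laden illustration, not an official ECO submission format: the `TermRequest` fields, the `check_request` function, and the single-sentence heuristic are all hypothetical, and the reference value is a placeholder rather than a real PubMed ID.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class TermRequest:
    """Hypothetical container for the pieces of a new-term request."""
    label: str
    definition: str
    parent: str = ""
    reference: str = ""  # ideally a PubMed ID
    comments: List[str] = field(default_factory=list)  # notes, examples, links


def check_request(req: TermRequest) -> List[str]:
    """Return a list of problems; an empty list means the basic rules are met."""
    problems = []
    definition = req.definition.strip()
    if not req.label.strip():
        problems.append("missing label")
    if not definition:
        problems.append("missing definition")
    elif re.search(r"[.!?]\s+\S", definition):
        # Rough heuristic: sentence-ending punctuation followed by more text.
        # Abbreviations such as "e.g." will trigger it too.
        problems.append("definition should be a single sentence")
    if not req.parent.strip():
        problems.append("no parent term suggested")
    elif req.parent.lower() not in definition.lower():
        problems.append("definition does not name the parent term")
    if not req.reference.strip():
        problems.append("no reference provided")
    return problems


good = TermRequest(
    label="traceable author statement",
    definition="An author statement that is based on a cited reference.",
    parent="author statement",
    reference="PMID:placeholder",  # placeholder, not a real identifier
)
# A deliberately poor, made-up request for comparison.
poor = TermRequest(
    label="microscopy evidence",
    definition="Evidence from microscopes. Used by our database.",
)

print(check_request(good))
print(check_request(poor))
```

Usage notes and database-specific remarks belong in `comments` rather than in `definition`, which mirrors the rule that a definition should not be restricted to one database's use.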

History

Origin of ECO

The Evidence Ontology (ECO) was initially created by founders of the Gene Ontology (GO). In tandem with the earliest GO evidence codes, some 100 terms were added to ECO. Today, there are nearly 300 ECO terms, and the approximately 20 GO evidence codes are now mapped to a subset of these ECO terms. The GO uses these evidence terms in the process of annotation, whereby an attribute is assigned to a gene product. GO annotation consists of assigning a molecular function, biological process, or cellular component to a gene product. This information is stored inside a so-called gene association file (GAF), along with a reference, an evidence code, and several other types of information.

For the GO, the purpose of storing evidence is to let anyone who wants to know why a particular biological attribute was assigned to a particular gene product see which methods were employed. That is, an annotator assigned a particular characteristic to a gene product after a particular methodology was used and an inference was drawn. Incidentally, this is why nearly all the terms in ECO at one point began with the text "inferred from" followed by the name of a particular research method or result.

    [Figure: Some Evidence Ontology terms, with a subset of Gene Ontology terms mapped to ECO shown in blue.]

Early Revisions

In late 2010, an effort began to address inconsistencies in ECO, which until then had been developed in an ad hoc fashion. The updates to the ontology included:

  • Clarifying what ECO represents & addressing multiple meanings of the word "evidence"
  • Defining the main root class "evidence"
  • General structure improvement & moving inappropriately classed terms to appropriate subclasses
  • Reviewing term names (labels), correcting misspellings, and generally improving syntax
  • Creating new terms requested by users
  • Making obsolete any terms that are not evidence, such as "not recorded"

It was established that evidence is "a type of information that is used to support an assertion", where an assertion is a statement of fact about a thing. This critical definition enabled ECO developers to distinguish between evidence and assertion method, both of which had been represented as types of evidence in ECO. The latter is not actually a type of evidence, but rather the way in which an assertion (a statement about a thing, an annotation, et cetera) is associated with the thing that the assertion describes. The Gene Ontology term "inferred from electronic annotation" was renamed "automatic assertion". This term was moved to become a subclass of a newly created term, "assertion method", which is defined as "a means by which a statement is made about an entity". Assertion method distinguishes between human- and computer-based association of statements with the things they describe. Evidence still describes the underlying support for that statement. The two classes can be combined to make complex statements that describe both evidence and assertion method.

Harmonizing with OBI

In 2011, the Evidence Ontology began collaborating with the Ontology for Biomedical Investigations (OBI) in order to better integrate the two ontologies. OBI is particularly well suited to describing instrumentation and research protocols, and it is generally more expressive and detailed than ECO. Many users of ECO want a simple representation of evidence, given their particular database needs, but they often use complex workflows involving multiple methodologies to generate the evidence that supports their conclusions. This presents an ontological problem of sorts, because representing multiple methods within one ontology term is often confusing, especially when the methods are incorporated inconsistently. Fortunately, complex workflows can be modeled in OBI and represented as simpler concepts that can be imported into ECO. Work on associating these two ontologies is ongoing.