
Installation

This project uses pipenv for installation. Some IDEs, such as PyCharm, also have direct support for pipenv.

> pipenv install linkml

Language Features

  • Polymorphism/Inheritance, see is_a

  • Abstract and Mixin classes

  • Control JSON-LD mappings to URIs via prefix declarations

  • Ability to refine the meaning of a slot in the context of a particular class via slot usage

Examples

LinkML can be used as a modeling language in its own right, or it can be compiled to other schema/modeling languages.

We will use the following simple schema for illustrative purposes:

id: http://example.org/sample/organization
name: organization

types:
  yearCount:
    base: int
    uri: xsd:int
  string:
    base: str
    uri: xsd:string

classes:

  organization:
    slots:
      - id
      - name
      - has boss

  employee:
    description: A person
    slots:
      - id
      - first name
      - last name
      - aliases
      - age in years
    slot_usage:
      last name:
        required: true

  manager:
    description: An employee who manages others
    is_a: employee
    slots:
      - has employees

slots:
  id:
    description: Unique identifier of a person
    identifier: true

  name:
    description: human readable name
    range: string

  aliases:
    is_a: name
    description: An alternative name
    multivalued: true

  first name:
    is_a: name
    description: The first name of a person

  last name:
    is_a: name
    description: The last name of a person

  age in years:
    description: The age of a person if living or age of death if not
    range: yearCount

  has employees:
    range: employee
    multivalued: true
    inlined: true

  has boss:
    range: manager
    inlined: true

Note that this schema does not illustrate the more advanced datamodel features, such as those used in the Biolink Model.
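
To make the roles of is_a and slot_usage in this schema more concrete, the following sketch resolves the effective slots of a class. It is a plain-Python illustration and not LinkML's own resolution logic: the `CLASSES` dictionary and the `effective_slots` helper are hypothetical names introduced here, and the schema is transcribed by hand rather than loaded from the YAML file.

```python
# Hand-transcribed subset of the organization schema above (assumption:
# only the fields needed for this illustration are kept).
CLASSES = {
    "organization": {"slots": ["id", "name", "has boss"]},
    "employee": {
        "slots": ["id", "first name", "last name", "aliases", "age in years"],
        "slot_usage": {"last name": {"required": True}},
    },
    "manager": {"is_a": "employee", "slots": ["has employees"]},
}


def effective_slots(class_name, classes=CLASSES):
    """Return {slot name: usage overrides} for a class, following is_a."""
    cls = classes[class_name]
    resolved = {}
    parent = cls.get("is_a")
    if parent is not None:
        # Inherited slots come first, together with the parent's refinements.
        resolved.update(effective_slots(parent, classes))
    for slot in cls.get("slots", []):
        resolved.setdefault(slot, {})
    for slot, usage in cls.get("slot_usage", {}).items():
        # slot_usage refines a slot only in the context of this class.
        resolved[slot] = {**resolved.get(slot, {}), **usage}
    return resolved


manager_slots = effective_slots("manager")
print(list(manager_slots))         # employee slots followed by "has employees"
print(manager_slots["last name"])  # refinement inherited from employee
```

Under these assumptions, manager ends up with every employee slot plus has employees, and last name stays required because the refinement is declared on the parent class.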

Generators

JSON Schema is a schema language for JSON documents.

With the example organization LinkML schema, we can illustrate the autogeneration of JSON Schema output. You can run:

pipenv run gen-json-schema examples/organization.yaml

Note that any JSON that conforms to the derived JSON Schema can be converted to RDF using the derived JSON-LD context.

A JSON-LD context provides a mapping from JSON to RDF.

With the example organization LinkML schema, we can illustrate the autogeneration of JSON-LD context output. You can run:

pipenv run gen-jsonld-context examples/organization.yaml

You can control the output via prefix declarations and default_curi_maps.

Any JSON that conforms to the derived JSON Schema (see above) can be converted to RDF using this context.

You can also combine a JSON instance file with a JSON-LD context using simple code or a tool like jq:

jq -s '.[0] * .[1]' examples/organization-data.json examples/organization.context.jsonld > examples/organization-data.jsonld
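
The "simple code" alternative to jq could look like the sketch below. It mirrors what jq's `*` operator does for objects (a recursive merge in which the right-hand value wins on conflict). The function names `merge` and `combine` are hypothetical, and the small in-memory documents at the end are made-up stand-ins, not the contents of the example files.

```python
import json


def merge(left, right):
    """Recursively merge two JSON objects; values from `right` win on conflict."""
    merged = dict(left)
    for key, value in right.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def combine(data_path, context_path, out_path):
    """Write the instance data merged with the JSON-LD context to out_path."""
    with open(data_path) as f:
        data = json.load(f)
    with open(context_path) as f:
        context = json.load(f)
    with open(out_path, "w") as f:
        json.dump(merge(data, context), f, indent=2)


# Illustrative in-memory documents (hypothetical content).
data = {"id": "org:1", "name": "Example Org"}
context = {"@context": {"name": "http://example.org/sample/organization/name"}}
print(json.dumps(merge(data, context), indent=2))
```

Calling `combine("examples/organization-data.json", "examples/organization.context.jsonld", "examples/organization-data.jsonld")` would then play the same role as the jq command above.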

The JSON-LD file generated above can be converted to other RDF serialization formats such as N-Triples. For example, we can use Apache Jena as follows:

riot examples/organization-data.jsonld > examples/organization-data.nt

With the example organization LinkML schema, we can illustrate the autogeneration of Python dataclass output. You can run:

pipenv run gen-py-classes examples/organization.yaml > examples/organization.py

Python Dataclass for organization schema

@dataclass
class Organization(YAMLRoot):
    _inherited_slots: ClassVar[List[str]] = []

    class_class_uri: ClassVar[URIRef] = URIRef("http://example.org/sample/organization/Organization")
    class_class_curie: ClassVar[str] = None
    class_name: ClassVar[str] = "organization"
    class_model_uri: ClassVar[URIRef] = URIRef("http://example.org/sample/organization/Organization")

    id: Union[str, OrganizationId]
    name: Optional[str] = None
    has_boss: Optional[Union[dict, "Manager"]] = None

    def __post_init__(self, **kwargs: Dict[str, Any]):
        if self.id is None:
            raise ValueError(f"id must be supplied")
        if not isinstance(self.id, OrganizationId):
            self.id = OrganizationId(self.id)
        if self.has_boss is not None and not isinstance(self.has_boss, Manager):
            self.has_boss = Manager(self.has_boss)
        super().__post_init__(**kwargs)

For more details see PythonGenNotes.
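
The coercion pattern in `__post_init__` can be tried out without the generated module using the reduced, self-contained sketch below. It is not the generated code: `OrganizationId` and `Manager` are simplified stand-ins defined here, the `YAMLRoot` base class is omitted, and the nested dictionary is unpacked with `**`, whereas the generated class passes it to `Manager` directly.

```python
from dataclasses import dataclass
from typing import Optional, Union


class OrganizationId(str):
    """Stand-in for the generated identifier type (a str subclass here)."""


@dataclass
class Manager:
    """Heavily reduced stand-in for the generated Manager class."""
    id: str


@dataclass
class Organization:
    id: Union[str, OrganizationId]
    name: Optional[str] = None
    has_boss: Optional[Union[dict, Manager]] = None

    def __post_init__(self):
        if self.id is None:
            raise ValueError("id must be supplied")
        # Coerce plain values into their typed counterparts.
        if not isinstance(self.id, OrganizationId):
            self.id = OrganizationId(self.id)
        if isinstance(self.has_boss, dict):
            self.has_boss = Manager(**self.has_boss)


# Plain strings and dicts (for example, parsed from JSON) are accepted
# and normalized on construction.
org = Organization(id="org:1", name="Example Org", has_boss={"id": "emp:7"})
print(type(org.id).__name__, org.has_boss)
```

The point of the pattern is that callers can pass loosely typed input while the rest of the code can rely on `id` and `has_boss` having their declared types.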

The Python object can be directly serialized as RDF.

ShEx, short for Shape Expressions Language, is a modeling language for RDF files.

With the example organization LinkML schema, we can illustrate the autogeneration of ShEx output. You can run:

pipenv run gen-shex examples/organization.yaml > examples/organization.shex

ShEx output for organization schema

BASE <http://example.org/sample/organization/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd1: <http://example.org/UNKNOWN/xsd/>


<YearCount> xsd1:int

<String> xsd1:string

<Employee>  (
    CLOSED {
       (  $<Employee_tes> (  <first_name> @<String> ? ;
             <last_name> @<String> ;
             <aliases> @<String> * ;
             <age_in_years> @<YearCount> ?
          ) ;
          rdf:type [ <Employee> ]
       )
    } OR @<Manager>
)

<Manager> CLOSED {
    (  $<Manager_tes> (  &<Employee_tes> ;
          rdf:type [ <Employee> ] ? ;
          <has_employees> @<Employee> *
       ) ;
       rdf:type [ <Manager> ]
    )
}

<Organization> CLOSED {
    (  $<Organization_tes> (  <name> @<String> ? ;
          <has_boss> @<Manager> ?
       ) ;
       rdf:type [ <Organization> ]
    )
}
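
The cardinality markers in the output above (`?`, `*`, or none) appear to follow from the required and multivalued settings of each slot. The sketch below encodes that reading. It is inferred from this one example and is not the logic of gen-shex itself; the helper names are hypothetical, and the `+` case (required and multivalued) does not occur in the example and is assumed from standard ShEx cardinality syntax.

```python
def shex_cardinality(required=False, multivalued=False):
    """Map slot settings to a ShEx cardinality marker."""
    if multivalued:
        return "+" if required else "*"  # "+" is assumed, not shown above
    return "" if required else "?"


def triple_constraint(slot_name, range_name, required=False, multivalued=False):
    """Render one slot as a ShEx triple constraint, as in the output above."""
    predicate = slot_name.replace(" ", "_")
    marker = shex_cardinality(required, multivalued)
    return f"<{predicate}> @<{range_name}> {marker}".rstrip()


print(triple_constraint("first name", "String"))
print(triple_constraint("last name", "String", required=True))
print(triple_constraint("aliases", "String", multivalued=True))
```

With the slot settings from the example schema, these calls reproduce the first_name, last_name, and aliases constraints of the Employee shape.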

Web Ontology Language (OWL) is a modeling language used to author ontologies.

With the example organization LinkML schema, we can illustrate the autogeneration of OWL output. You can run:

pipenv run gen-owl examples/organization.yaml > examples/organization.owl.ttl

OWL output for organization schema

<http://example.org/sample/organization/Organization> a owl:Class,
        meta:ClassDefinition ;
    rdfs:label "organization" ;
    rdfs:subClassOf [ a owl:Restriction ;
            owl:onClass <http://example.org/sample/organization/String> ;
            owl:onProperty <http://example.org/sample/organization/id> ;
            owl:qualifiedCardinality 1 ],
        [ a owl:Restriction ;
            owl:maxQualifiedCardinality 1 ;
            owl:onClass <http://example.org/sample/organization/String> ;
            owl:onProperty <http://example.org/sample/organization/name> ],
        [ a owl:Restriction ;
            owl:maxQualifiedCardinality 1 ;
            owl:onClass <http://example.org/sample/organization/Manager> ;
            owl:onProperty <http://example.org/sample/organization/has_boss> ] .

Generating Markdown documentation

The command below generates a Markdown document for every class and slot in the model. These documents can be used in a static site, for example GitHub Pages.

pipenv run gen-markdown examples/organization.yaml -d examples/organization-docs/

Specification

See specification. Also see the semantics folder for an experimental specification in terms of first-order logic (FOL).

FAQ

Why invent our own YAML and not use JSON Schema, SQL, UML, ProtoBuf, OWL, etc.?

Each of these is tied to a particular formalism: JSON Schema to trees, OWL to open-world logic. There are various impedance mismatches in converting between these. The goal was to develop something simple and more general that is not tied to any one serialization format or set of assumptions.

There are other projects with similar goals, for example schema_salad. It may be possible to align with these.

Why not use X as the base datamodel?

Here X may be bioschemas, some upper ontology (BioTop), UMLS metathesaurus, bio*, and various other attempts to model all of biology in an object model.

Currently, as far as we know, there is no existing reference datamodel that is flexible enough to be used here.

Developers Notes

A GitHub action is set up to automatically release the package to PyPI. When the package is ready for a new release, create a GitHub release. The version should be in the vX.X.X format, following the semantic versioning specification.

After the release is created, the GitHub action is triggered to publish to PyPI. The release version is used to create the PyPI package.

If the PyPI release fails, make fixes, delete the GitHub release, and create the release again with the same version.
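
Before creating a release, the vX.X.X tag format can be checked with a short script such as the sketch below. It is not part of the project's release tooling: `parse_release_tag` is a hypothetical helper, and the pattern deliberately accepts only the plain three-number form, not the pre-release or build suffixes that semantic versioning also allows.

```python
import re

# v followed by MAJOR.MINOR.PATCH, each without leading zeros.
TAG_PATTERN = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_release_tag(tag):
    """Return (major, minor, patch) for a vX.X.X tag, or raise ValueError."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"release tag {tag!r} is not in vX.X.X format")
    return tuple(int(part) for part in match.groups())


print(parse_release_tag("v1.2.3"))
```

A tag such as `1.2.3` (missing the leading v) or `v1.2` raises a ValueError, which makes the mistake visible before the release is published.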

History

This framework used to be called BiolinkML. LinkML replaces BiolinkML. For assistance in migration, see Migration.md.