Developer & Data Documentation

The Raras Knowledge Graph

An open, FAIR-compliant knowledge graph of rare diseases — the largest in Latin America. It links diseases, phenotypes, genes, medications, clinical trials, research and Brazilian public-health (SUS) data through open standards, and serves it as Linked Data anyone can query, download and reuse. Think of it as a “Wikidata for rare diseases”: persistent identifiers, machine-readable provenance, and a public domain (CC0) core.

  • 243,504 RDF triples
  • 27,814 entities
  • 10,468 diseases
  • 11,652 phenotypes

Overview

The graph integrates Orphanet, the Human Phenotype Ontology (HPO), OMIM, MONDO, ClinVar, Open Targets, PubMed and ClinicalTrials.gov with Brazilian public-health data (DATASUS, CEAF, SIGTAP, PNTN, PCDT). Every disease carries Portuguese clinical content and crosswalks to international ontologies, so the data is usable both for Brazilian care pathways and for global federation (ERN, EJP RD, GA4GH, NCATS Translator).

Raras-originated triples are dedicated to the public domain under CC0 1.0; upstream sources keep their own open licenses (see Data sources & licensing). Full license terms are on the license page.

Quick start

No authentication, no API key. Every endpoint is public and CORS-enabled.

1 — Query with SPARQL

curl
curl -G https://raras.org/api/sparql \
  --data-urlencode 'query=SELECT ?orphaCode ?name WHERE {
    ?d a rnc:Disease ;
       rnp:orphaCode ?orphaCode ;
       rdfs:label ?name .
  } LIMIT 5' \
  -H 'Accept: application/sparql-results+json'
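
The same call can be made from a script. The following is a minimal Python sketch, using only the standard library, that builds a request equivalent to the curl command above. The helper name `build_sparql_request` is illustrative and not part of any Raras client library; the request is constructed but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "https://raras.org/api/sparql"  # endpoint from the quick start


def build_sparql_request(query: str) -> Request:
    """Build (but do not send) a GET request carrying the query as ?query=..."""
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Same query as the curl example; the rnc:/rnp:/rdfs: prefixes are pre-declared.
QUERY = """SELECT ?orphaCode ?name WHERE {
  ?d a rnc:Disease ;
     rnp:orphaCode ?orphaCode ;
     rdfs:label ?name .
} LIMIT 5"""

request = build_sparql_request(QUERY)
# To execute it: urllib.request.urlopen(request, timeout=20).read()
```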

2 — Get one disease as RDF

curl
curl 'https://raras.org/api/rdf?type=Disease&format=turtle' | head -40

3 — Query with GraphQL

curl
curl https://raras.org/api/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query":"{ disease(orphaCode:\"166\"){ name mondoId genes{ symbol } } }"}'

Data model

The graph has 8 entity classes and 45 properties. Classes are aligned with the Biolink Model, schema.org and OBO ontologies, so the same vocabulary works across SPARQL, GraphQL and TRAPI. Every term dereferences to its own RDF definition — see /class/Disease and /property/orphaCode, or browse the full vocabulary at /ontology.

Entity classes

| Class | Count | Description | Aligned with |
|---|---|---|---|
| rnc:Disease | 10,468 | A rare disease | biolink:Disease, schema:MedicalCondition |
| rnc:Phenotype | 11,652 | An HPO phenotypic feature | biolink:PhenotypicFeature |
| rnc:Gene | 5,571 | A human gene | biolink:Gene |
| rnc:Medication | 123 | A treatment / drug | biolink:ChemicalEntity |
| rnc:ClinicalTrial | 505 | A ClinicalTrials.gov study | |
| rnc:Paper | 5,539 | A curated research paper | biolink:Publication |
| rnc:ReferenceCenter | 81 | A CNES-coded treatment center | |
| rnc:Community | 10,471 | A patient support community | |

Disease properties

| Property | Range | Meaning |
|---|---|---|
| rnp:orphaCode | string | Orphanet code (primary identifier) |
| rnp:mondoId | string | MONDO Disease Ontology ID |
| rnp:omimId | string | OMIM ID (identifier only) |
| rnp:icd10Code | string | CID-10 (Brazilian ICD-10) |
| rnp:prevalenceClass | string | Orphanet prevalence class |
| rnp:inheritance | string | Mode(s) of inheritance |
| rnp:ageOfOnset | string | Age(s) of onset (HPO-aligned) |
| rnp:clinicalDescription | string | Clinical description (PT-BR) |
| rnp:activeTrialCount | integer | Number of active clinical trials |
| rnp:paperCount | integer | Number of curated papers |
| rnp:susCoverageScore | float | Brazilian SUS coverage score |
| rnp:wikidataId | string | Wikidata QID |
| rdfs:label / skos:prefLabel | lang string | Disease name (PT-BR / EN) |
| owl:sameAs | IRI | Cross-references (Orphanet, MONDO, OMIM, Wikidata) |

Relationships

| Predicate | Target | Meaning |
|---|---|---|
| rnp:hasPhenotype | Phenotype | Disease presents this HPO phenotype (frequency-annotated) |
| rnp:associatedGene | Gene | Disease is associated with this gene |
| rnp:hasMedication | Medication | Treated with this medication |
| rnp:hasSUSMedication | Medication | Covered by Brazilian SUS (CEAF) |
| rnp:hasTrial | ClinicalTrial | Has an associated clinical trial |
| rnp:hasPaper | Paper | Has a curated research paper |
| rnp:hasReferenceCenter | ReferenceCenter | Treated at this reference center |

Biolink aliases (e.g. biolink:has_phenotype, biolink:treats) are accepted as synonyms for TRAPI/Translator compatibility.

Identifiers & cross-references

Like Wikidata, every entity has a persistent, dereferenceable identifier. Resources live under https://raras.org/id/ and resolve to RDF via content negotiation (HTTP 303 to HTML, or Turtle / JSON-LD with the matching Accept header).

| Type prefix | Entity | Example RARAS ID |
|---|---|---|
| D | Disease | D00166 |
| C | ReferenceCenter | C00081 |
| P | Protocol | P00012 |
| A | Association | A00007 |
| G | Community | G00451 |
| M | Medication | M00123 |
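
As an illustration of how the type prefixes can be used client-side, here is a minimal Python sketch that maps a RARAS ID to its entity type and to an IRI under https://raras.org/id/. The ID shape (one letter followed by five digits) is inferred from the examples in the table, and the IRI layout (base plus ID) is an assumption; `describe_raras_id` is a hypothetical helper, not part of a published client.

```python
import re

# Type prefixes from the table above.
TYPE_PREFIXES = {
    "D": "Disease",
    "C": "ReferenceCenter",
    "P": "Protocol",
    "A": "Association",
    "G": "Community",
    "M": "Medication",
}
ID_BASE = "https://raras.org/id/"

# Assumption: one uppercase letter plus five digits, as in D00166.
_ID_PATTERN = re.compile(r"^([A-Z])(\d{5})$")


def describe_raras_id(raras_id: str) -> tuple[str, str]:
    """Return (entity type, assumed IRI) for an ID, or raise ValueError."""
    match = _ID_PATTERN.match(raras_id)
    if match is None or match.group(1) not in TYPE_PREFIXES:
        raise ValueError(f"not a recognised RARAS ID: {raras_id!r}")
    return TYPE_PREFIXES[match.group(1)], ID_BASE + raras_id
```

For example, `describe_raras_id("D00166")` yields the type `Disease` together with the IRI to dereference.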

Each disease emits owl:sameAs links to external authorities, enabling round-trip integration with the wider Linked Open Data cloud:

  • Orphanet: http://www.orpha.net/ORDO/Orphanet_{code}
  • MONDO: http://purl.obolibrary.org/obo/MONDO_{id}
  • OMIM: https://omim.org/entry/{id}
  • HPO: http://purl.obolibrary.org/obo/HP_{id}
  • Wikidata: http://www.wikidata.org/entity/{QID}
  • HGNC (genes): https://www.genenames.org/.../HGNC:{id}

Auditability & the Disease Twin

The graph is not a one-off import. Every one of the 10,468 diseases is continuously maintained by an autonomous “Disease Twin” agent that verifies facts against official sources, keeps the data fresh, discovers new sources, and proposes research hypotheses. This is what makes the dataset auditable: every fact is traceable to a source and a verification timestamp.

Continuous verification

Atomic claims (FActScore-style) are checked against official URLs ranked by authority (DOU > bvsms > gov.br > ANVISA). Each claim is graded by confidence, and stale claims are demoted.

Freshness SLA

A control-plane coverage matrix tracks every dimension. All 10,468 diseases are re-checked within 30 days; gene data within 180 days. Populated data stays 100% within SLA.
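
The SLA windows amount to a simple date comparison. The sketch below assumes that "within N days" is inclusive and uses hypothetical dimension names (`disease`, `gene`); it is not the control plane's actual implementation.

```python
from datetime import date, timedelta

# SLA windows stated above; the dictionary keys are illustrative names.
SLA_DAYS = {"disease": 30, "gene": 180}


def within_sla(dimension: str, last_checked: date, today: date) -> bool:
    """True if the last check is no older than the SLA window (inclusive)."""
    return today - last_checked <= timedelta(days=SLA_DAYS[dimension])
```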

Source discovery (Scout)

An agent autonomously finds specialized sources the fixed pipeline misses — disease registries, ERNs, locus-specific variant DBs — and verifies each URL before trusting it.

Hypothesis generation

A co-scientist layer (generate → debate → evolve) proposes novel, testable gaps per disease, grounded in the twin’s current knowledge.

Each release is stamped with provenance (dcterms:publisher, prov:wasAttributedTo) and a unique fingerprint (dcterms:hasVersion), so any copy stays traceable. See the license page for attribution terms, and the live Disease Twin transparency dashboard for real-time coverage and freshness across all diseases.

Access methods

The same data is exposed through multiple open protocols — pick the one that fits your use case.

| Endpoint | Protocol | Best for |
|---|---|---|
| /api/sparql | SPARQL 1.1 | Researchers, semantic web, federation |
| /api/rdf | RDF / Linked Data | Ontology ingestion, triple stores |
| /api/graphql | GraphQL | App & frontend developers |
| /api/graph/public | JSON | Network visualization |
| /api/downloads | Bulk dump | Full-graph download (N-Triples, CSV) |
| /.well-known/void | VOID / DCAT | Dataset metadata & discovery |
| /api/beacon | GA4GH Beacon v2 | Variant / disease discovery |
| /api/phenopackets | GA4GH Phenopackets v2 | Clinical phenotype exchange |

SPARQL support

Supported:

  • Query forms: SELECT (with DISTINCT and the aggregates COUNT, SUM, AVG, MIN, MAX), ASK, CONSTRUCT, DESCRIBE
  • Patterns and modifiers: OPTIONAL, FILTER, ORDER BY, GROUP BY, LIMIT, OFFSET
  • FILTER operators and functions: =, !=, <, >, CONTAINS, STRSTARTS, STRENDS, REGEX, BOUND

Not yet supported (returns HTTP 501): UNION, MINUS, SERVICE, subqueries, property paths.

Rate limit: 120 requests/min per IP, with a 15s query timeout. The bare endpoint (no query) returns a machine-readable service description.
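
A client can screen queries for some of the unsupported constructs before sending them, instead of waiting for an HTTP 501. The sketch below is a rough keyword heuristic, not a SPARQL parser: it detects UNION, MINUS and SERVICE only, and does not attempt to recognise subqueries or property paths. The function name is illustrative.

```python
import re

# Keywords the documentation lists as not yet supported (HTTP 501).
UNSUPPORTED_KEYWORDS = ("UNION", "MINUS", "SERVICE")


def find_unsupported_keywords(query: str) -> list[str]:
    """Return the unsupported keywords that appear in the query text."""
    # Blank out string literals, IRIs and comments so that words inside
    # them are not mistaken for SPARQL keywords.
    text = re.sub(r'"[^"\n]*"', '""', query)      # double-quoted literals
    text = re.sub(r"'[^'\n]*'", "''", text)       # single-quoted literals
    text = re.sub(r"<[^<>\s]*>", "<>", text)      # IRIs (may contain '#')
    text = re.sub(r"#[^\n]*", "", text)           # comments
    return [
        keyword
        for keyword in UNSUPPORTED_KEYWORDS
        if re.search(rf"\b{keyword}\b", text, flags=re.IGNORECASE)
    ]
```

An empty list means none of the three keywords was found; it does not guarantee that the endpoint will accept the query.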

Namespaces

Default prefixes are pre-declared on the SPARQL endpoint, so you can use them without writing PREFIX declarations.

| Prefix | IRI |
|---|---|
| rnc: | https://raras.org/class/ |
| rnp: | https://raras.org/property/ |
| rn: | https://raras.org/resource/ |
| raras: | https://raras.org/id/ |
| orphanet: | http://www.orpha.net/ORDO/Orphanet_ |
| mondo: | http://purl.obolibrary.org/obo/MONDO_ |
| hp: | http://purl.obolibrary.org/obo/HP_ |
| omim: | https://omim.org/entry/ |
| wd: | http://www.wikidata.org/entity/ |
| rdfs:, owl:, skos:, dcterms:, void:, schema: | standard W3C / metadata vocabularies |
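
Outside the endpoint, for example when processing a bulk dump, prefixed names have to be expanded by hand. A minimal Python sketch using the Raras-specific prefixes from the table (`expand_curie` is an illustrative name; the standard W3C prefixes are left out for brevity):

```python
# Prefix-to-IRI mapping copied from the namespace table.
PREFIXES = {
    "rnc": "https://raras.org/class/",
    "rnp": "https://raras.org/property/",
    "rn": "https://raras.org/resource/",
    "raras": "https://raras.org/id/",
    "orphanet": "http://www.orpha.net/ORDO/Orphanet_",
    "mondo": "http://purl.obolibrary.org/obo/MONDO_",
    "hp": "http://purl.obolibrary.org/obo/HP_",
    "omim": "https://omim.org/entry/",
    "wd": "http://www.wikidata.org/entity/",
}


def expand_curie(curie: str) -> str:
    """Expand a prefixed name such as 'rnp:orphaCode' into a full IRI."""
    prefix, separator, local_name = curie.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local_name
```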

SPARQL examples

Diseases linked to a specific HPO phenotype

SELECT ?orphaCode ?name WHERE {
  ?d a rnc:Disease ;
     rnp:orphaCode ?orphaCode ;
     rdfs:label ?name ;
     rnp:hasPhenotype ?p .
  ?p rnp:hpoId "HP:0001250" .
}

Count diseases with active clinical trials

SELECT (COUNT(?d) AS ?count) WHERE {
  ?d a rnc:Disease ;
     rnp:activeTrialCount ?n .
  FILTER(?n > 0)
}

CONSTRUCT RDF for one disease

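CONSTRUCT {
  ?d rdfs:label ?name ;
     rnp:orphaCode "166" ;
     rnp:mondoId ?mondoId .
} WHERE {
  # The code is a constant, so the template repeats it instead of using
  # an ?orphaCode variable that the WHERE clause would never bind.
  ?d a rnc:Disease ;
     rnp:orphaCode "166" ;
     rdfs:label ?name .
  OPTIONAL { ?d rnp:mondoId ?mondoId }
}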

Check existence (ASK)

ASK { ?d rnp:orphaCode "166" }
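
Results of SELECT and ASK queries can be flattened into plain Python values. The sketch below assumes the endpoint follows the W3C SPARQL 1.1 Query Results JSON Format, as the application/sparql-results+json media type suggests. The sample payloads are illustrative placeholders, not real Raras output.

```python
import json
from typing import Any


def parse_sparql_json(payload: str) -> Any:
    """Return a bool for ASK results, or a list of {variable: value} dicts."""
    document = json.loads(payload)
    if "boolean" in document:  # ASK responses carry a single boolean
        return bool(document["boolean"])
    rows = []
    for binding in document["results"]["bindings"]:
        # Each bound variable looks like {"type": "literal", "value": "..."};
        # unbound variables (e.g. from OPTIONAL) are simply absent.
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


# Illustrative payloads only; "Example disease" is a placeholder label.
SAMPLE_SELECT = json.dumps({
    "head": {"vars": ["orphaCode", "name"]},
    "results": {"bindings": [
        {"orphaCode": {"type": "literal", "value": "166"},
         "name": {"type": "literal", "value": "Example disease"}},
    ]},
})
SAMPLE_ASK = '{"head": {}, "boolean": true}'
```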

GraphQL

The GraphQL endpoint at /api/graphql has introspection enabled. Key queries: searchDiseases, disease(orphaCode), diseaseByMondo, diseasesByPhenotypes (differential-diagnosis ranking), similarDiseases (vector similarity), searchPhenotypes, referenceCenters, stats and serviceInfo.

GraphQL
{
  disease(orphaCode: "166") {
    name
    mondoId
    cid10
    phenotypes(limit: 5) { phenotype { hpoId name } frequency }
    genes { symbol hgncId }
    susCoverage { coverageScore ceafMedicationCount }
  }
}
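
From a script, the query is usually sent as a JSON POST body with the Orphanet code passed as a variable. In this Python sketch the `String!` variable type is an assumption (the examples pass the code as a quoted string); the schema's actual type can be confirmed through introspection. The request is built but not sent.

```python
import json
from urllib.request import Request

GRAPHQL_ENDPOINT = "https://raras.org/api/graphql"

# Assumption: orphaCode is declared as a String argument in the schema.
DISEASE_QUERY = """
query Disease($code: String!) {
  disease(orphaCode: $code) {
    name
    mondoId
    genes { symbol hgncId }
  }
}
"""


def graphql_request(orpha_code: str) -> Request:
    """Build (but do not send) a POST request for one disease."""
    body = json.dumps({"query": DISEASE_QUERY, "variables": {"code": orpha_code}})
    return Request(
        GRAPHQL_ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the code as a variable avoids hand-escaping quotes inside the query string, which the curl example has to do.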

Formats & content negotiation

RDF endpoints negotiate by Accept header or a ?format= query parameter:

| Format | Accept header | ?format= |
|---|---|---|
| Turtle | text/turtle | turtle |
| JSON-LD | application/ld+json | jsonld |
| N-Triples | application/n-triples | nt |
| SPARQL Results JSON | application/sparql-results+json | (default for SELECT) |
| CSV / TSV | text/csv, text/tab-separated-values | |
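
Both negotiation styles can be wrapped in one helper. The Python sketch below builds a request against /api/rdf using either the Accept header or the ?format= parameter; the `type` parameter comes from the quick-start example, and `rdf_request` is an illustrative name. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

RDF_ENDPOINT = "https://raras.org/api/rdf"

# ?format= value mapped to its media type, from the table above.
FORMATS = {
    "turtle": "text/turtle",
    "jsonld": "application/ld+json",
    "nt": "application/n-triples",
}


def rdf_request(entity_type: str, fmt: str = "turtle",
                use_query_param: bool = False) -> Request:
    """Request RDF in the given format via Accept header or ?format=."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {"type": entity_type}
    headers = {}
    if use_query_param:
        params["format"] = fmt            # e.g. ...?type=Disease&format=nt
    else:
        headers["Accept"] = FORMATS[fmt]  # e.g. Accept: text/turtle
    return Request(f"{RDF_ENDPOINT}?{urlencode(params)}", headers=headers)
```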

Data sources & licensing

Per-partition licensing follows the Wikidata / Open Targets convention: the Raras-originated layer is CC0, upstream sources keep their licenses. The combined dataset / bulk dump is distributed under CC-BY 4.0 (it embeds CC-BY sources, so attribution is required), while Raras-originated triples remain CC0. These terms are emitted machine-readably in the VOID descriptor as void:subset + dcterms:license. See the license page for details.

| Partition | License |
|---|---|
| Raras-originated (IDs, crosswalks, PT translations, SUS integration) | CC0 1.0 |
| Orphanet (nomenclature, cross-refs) | CC BY 4.0 |
| Human Phenotype Ontology (HPO) | CC BY 4.0 |
| MONDO | CC BY 4.0 |
| Wikidata | CC0 1.0 |
| Open Targets | CC0 1.0 |
| ClinVar | Public domain |
| PubMed (metadata) | Public domain |
| OMIM | Identifiers only, no text redistributed |
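
The per-partition rule can be expressed as a lookup: a reuse needs attribution as soon as it includes any CC BY partition. The Python sketch below encodes the table with hypothetical partition keys; OMIM is omitted because only identifiers are included. The VOID descriptor remains the authoritative machine-readable source for these terms.

```python
# Licenses per partition, from the table above; the keys are illustrative.
PARTITION_LICENSES = {
    "raras": "CC0 1.0",
    "orphanet": "CC BY 4.0",
    "hpo": "CC BY 4.0",
    "mondo": "CC BY 4.0",
    "wikidata": "CC0 1.0",
    "open_targets": "CC0 1.0",
    "clinvar": "Public domain",
    "pubmed": "Public domain",
}


def attribution_required(partitions: list[str]) -> bool:
    """True if any selected partition is licensed under CC BY."""
    return any(PARTITION_LICENSES[name].startswith("CC BY") for name in partitions)
```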

Rate limits & versioning

  • Rate limit: 120 requests/min per IP on the SPARQL endpoint; 15s query timeout.
  • CORS: all read endpoints send Access-Control-Allow-Origin: *.
  • Versioning: dataset version and last-modified date are published in the VOID descriptor (dcterms:modified).
  • Read-only: SPARQL UPDATE (INSERT/DELETE/LOAD) is rejected with HTTP 403.
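
Batch clients can stay under the limit by spacing requests on their side. The following is a minimal single-threaded Python sketch that enforces a minimum gap of 60 / 120 = 0.5 seconds between calls; the `Throttle` class is illustrative, and the clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time


class Throttle:
    """Space calls at least 60 / per_minute seconds apart (client side)."""

    def __init__(self, per_minute: int = 120,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request slot; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Calling `throttle.wait()` before each SPARQL request keeps a sequential client at or below 120 requests per minute; several processes sharing one IP would still need to coordinate.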

How to cite

Attribution required. The combined dataset is CC-BY 4.0 — credit Raras (https://raras.org) when you reuse or redistribute it. Provenance and a per-release fingerprint are stamped into the data (dcterms:publisher, dcterms:hasVersion).

When reusing the dataset, please cite:

Raras — Brazilian Rare Disease Knowledge Graph (RarasNet).
https://raras.org — Raras-originated triples CC0 1.0; combined dataset CC-BY 4.0.
Upstream sources retain their own licenses (see /license).

Questions, federation requests or data corrections: [email protected]. Full license terms: /license.