The objective of the HCM project is to integrate and explore clinical and biological data in order to develop a patient characterization and prognostic system for hypertrophic cardiomyopathy (HCM).
HCM is a relatively common genetic myocardial disorder and the most frequent cause of sudden cardiac death in young people and athletes. It is characterized by variable clinical presentation and onset, as well as by genetic heterogeneity, with 640 known mutations in more than 20 genes. Although a single mutation is sufficient for a positive diagnosis, the severity of HCM can differ between two individuals, even direct relatives, because the same mutation may follow a benign course in one individual and result in sudden cardiac death in another. Given these characteristics, identifying genotype-phenotype correlations is of great importance, in particular developing models that associate the presence of certain mutations with the resulting physical traits.
The first step towards this objective is to integrate the genotype and phenotype data needed to characterize HCM patients. This data originates from activities performed by both medical doctors and molecular biologists. Figure 1 represents these activities and the data elements generated in each of them.
The genotype data corresponds to the presence of known HCM-associated mutations in the patients' genomes, while the phenotype data corresponds to the clinical elements on which clinicians rely to reach a diagnosis. The latter normally include the results of clinical examinations (e.g. electrocardiogram, echocardiogram) and the clinical history of the individual (e.g. age at diagnosis, sudden deaths in the family).
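As a minimal sketch of what integrating the two data types involves, assuming each source can be exported as simple rows keyed by a shared patient identifier, the Python fragment below merges genotype rows (mutations detected per patient) with phenotype rows (clinical elements) into one record per patient. The `PatientRecord` class, the `integrate` function, the field names and all example values are hypothetical and do not reflect the project's actual schema or data.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Integrated view of one patient (hypothetical schema, not the project's)."""
    patient_id: str
    mutations: set[str] = field(default_factory=set)            # genotype data
    phenotype: dict[str, object] = field(default_factory=dict)  # clinical elements


def integrate(genotype_rows, phenotype_rows):
    """Merge genotype and phenotype rows on a shared patient identifier.

    genotype_rows:  iterable of (patient_id, mutation) from the molecular biology lab
    phenotype_rows: iterable of (patient_id, element, value) from the clinic
    """
    records: dict[str, PatientRecord] = {}

    def record_for(pid: str) -> PatientRecord:
        # Create the integrated record the first time a patient appears in either source.
        if pid not in records:
            records[pid] = PatientRecord(pid)
        return records[pid]

    for pid, mutation in genotype_rows:
        record_for(pid).mutations.add(mutation)
    for pid, element, value in phenotype_rows:
        record_for(pid).phenotype[element] = value
    return records


# Invented example rows; variant labels are placeholders, not real HCM mutations.
genotype = [("P1", "MYH7:variant_A"), ("P1", "MYBPC3:variant_B"), ("P2", "MYBPC3:variant_B")]
phenotype = [
    ("P1", "age_at_diagnosis", 17),
    ("P1", "family_sudden_death", True),
    ("P2", "age_at_diagnosis", 45),
    ("P3", "age_at_diagnosis", 30),  # P3 has clinical data but no genetic test yet
]
patients = integrate(genotype, phenotype)
```

Keeping patients who appear in only one source (P3 here) makes gaps between clinical and laboratory data visible instead of silently dropping them.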
The approach we propose is based on Semantic Web technologies, which have previously been identified as suitable for integrating heterogeneous data because they make it possible to integrate, share and reuse data in an application- and domain-independent manner. We are currently developing a semantic model in the Web Ontology Language (OWL). This model contains the concepts underlying the data that characterize HCM, together with the relationships between those concepts, and is composed of three modules:
- HCM Clinical Evaluation: comprises concepts associated with administrative data and with the clinical data elements necessary for diagnosis (the phenotype data).
- Genotype Analysis: contains concepts associated with the genetic testing of biological samples (the genotype data).
- Medical Classifications: an auxiliary module containing medical standards for characterizing clinical elements such as patient symptoms.
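To give a concrete picture of the modular structure, the following sketch emits a skeletal OWL model in Turtle syntax, with one ontology per module. The base IRI (`http://example.org/hcm`), the class and property names, and the choice to have HCM Clinical Evaluation import the other two modules are all assumptions for illustration; the actual model may be organized differently.

```python
# Hypothetical base IRI; example.org is reserved for documentation use.
BASE = "http://example.org/hcm"

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    f"@prefix hcm: <{BASE}#> .\n"
)

# Module -> (illustrative classes, modules it imports). All names are assumptions.
MODULES = {
    "clinical": (["Patient", "Electrocardiogram", "Echocardiogram", "ClinicalHistory"],
                 ["genotype", "classifications"]),
    "genotype": (["BiologicalSample", "GeneticTest", "Mutation"], []),
    "classifications": (["Symptom"], []),
}

# Illustrative object properties linking the modules: (name, domain, range).
PROPERTIES = [
    ("hasGeneticTest", "Patient", "GeneticTest"),
    ("detectsMutation", "GeneticTest", "Mutation"),
    ("presentsSymptom", "Patient", "Symptom"),
]


def to_turtle() -> str:
    """Serialize the module skeleton as Turtle text."""
    lines = [PREFIXES]
    for module, (classes, imports) in MODULES.items():
        iri = f"<{BASE}/{module}>"
        lines.append(f"{iri} a owl:Ontology .")
        for imported in imports:
            lines.append(f"{iri} owl:imports <{BASE}/{imported}> .")
        for cls in classes:
            # rdfs:isDefinedBy records which module owns each class.
            lines.append(f"hcm:{cls} a owl:Class ; rdfs:isDefinedBy {iri} .")
        lines.append("")
    # Cross-module properties are declared at the end for brevity.
    for name, domain, range_ in PROPERTIES:
        lines.append(f"hcm:{name} a owl:ObjectProperty ; "
                     f"rdfs:domain hcm:{domain} ; rdfs:range hcm:{range_} .")
    return "\n".join(lines)


print(to_turtle())
```

Splitting the model this way lets the auxiliary classification module be reused without pulling in the clinical or genetic concepts.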
A first browsable version of the model is available at https://sites.google.com/site/hcmsemanticmodel/home-1.
The second step of the HCM project is to analyze the integrated data in order to infer the genotype-phenotype correlations referred to above. This analysis will combine powerful data mining techniques, such as support vector machines, with less powerful but more interpretable techniques, such as decision trees. This dual approach is intended to provide accurate results without sacrificing their interpretability.
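The interpretability argument for decision trees can be illustrated with a small ID3-style learner (entropy and information gain over binary features such as the presence of a mutation), whose result flattens directly into readable IF/THEN rules. This is a teaching sketch under stated assumptions: the feature names, outcome labels and four toy records are invented, the support vector machine side is not shown, and a real analysis would rely on an established library and proper validation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_feature(rows, labels, features):
    """Return the binary feature with the highest information gain, or None."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in features:
        gain = base
        for value in (True, False):
            subset = [y for r, y in zip(rows, labels) if r[f] == value]
            if subset:
                gain -= len(subset) / len(labels) * entropy(subset)
        if gain > best_gain:
            best, best_gain = f, gain
    return best


def grow(rows, labels, features, depth=0, max_depth=3):
    """Grow an ID3-style tree: leaves are labels, nodes are (feature, branches)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth or not features:
        return majority
    f = best_feature(rows, labels, features)
    if f is None:  # no split reduces entropy
        return majority
    remaining = [g for g in features if g != f]
    branches = {}
    for value in (True, False):
        idx = [i for i, r in enumerate(rows) if r[f] == value]
        branches[value] = (grow([rows[i] for i in idx], [labels[i] for i in idx],
                                remaining, depth + 1, max_depth) if idx else majority)
    return (f, branches)


def rules(tree, conditions=()):
    """Flatten the tree into human-readable IF ... THEN ... rules."""
    if not isinstance(tree, tuple):
        return [f"IF {' AND '.join(conditions) or 'always'} THEN {tree}"]
    f, branches = tree
    out = []
    for value, subtree in branches.items():
        out += rules(subtree, conditions + (f"{f}={value}",))
    return out


# Invented toy records: binary features and a two-class outcome.
rows = [
    {"mut_A": True, "mut_B": False, "family_sd": True},
    {"mut_A": True, "mut_B": True, "family_sd": False},
    {"mut_A": False, "mut_B": True, "family_sd": False},
    {"mut_A": False, "mut_B": False, "family_sd": False},
]
labels = ["high_risk", "high_risk", "low_risk", "low_risk"]

tree = grow(rows, labels, ["mut_A", "mut_B", "family_sd"])
for rule in rules(tree):
    print(rule)
# IF mut_A=True THEN high_risk
# IF mut_A=False THEN low_risk
```

An SVM trained on the same features might predict more accurately, but it would not yield rules of this form, which is the trade-off the dual approach is meant to balance.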
The identified correlation patterns will be included in the semantic model and ultimately used in the HCM characterization system: when a new patient's data is entered, the system will provide the medical doctor with the possible disease outcome for that particular patient.
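One way to picture this final step, assuming each correlation pattern is stored as a set of required feature values with an associated outcome and a support count, is simple rule matching against the new patient's integrated record. The `CorrelationRule` format, its `support` field and all names and values below are hypothetical; in the project the patterns are meant to live in the semantic model, which this plain-Python stand-in does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationRule:
    """Hypothetical genotype-phenotype pattern: all conditions imply an outcome."""
    conditions: frozenset[tuple[str, object]]  # (feature, required value) pairs
    outcome: str
    support: int  # illustrative count of training patients matching the pattern


def possible_outcomes(patient: dict, rules: list[CorrelationRule]) -> list[str]:
    """Return the distinct outcomes of all matching rules, best-supported first."""
    matching = [
        r for r in rules
        if all(patient.get(feature) == value for feature, value in r.conditions)
    ]
    matching.sort(key=lambda r: r.support, reverse=True)
    # dict.fromkeys removes duplicate outcomes while preserving order.
    return list(dict.fromkeys(r.outcome for r in matching))


# Invented rules and patient record for illustration only.
rules = [
    CorrelationRule(frozenset({("mut_A", True)}), "high_risk", 12),
    CorrelationRule(frozenset({("mut_A", False), ("family_sd", False)}), "low_risk", 30),
    CorrelationRule(frozenset({("mut_B", True)}), "low_risk", 20),
]
new_patient = {"mut_A": True, "mut_B": True, "family_sd": False}
print(possible_outcomes(new_patient, rules))
```

Returning every matching outcome, ranked by support, rather than a single verdict leaves the final judgment with the medical doctor when patterns disagree.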
- Francisco Couto (research advisor)
- Ana Teresa Freitas (INESC-ID / IST) (research advisor)
- Catia M. Machado
- Alexandra R. Fernandes (Universidade Lusófona de Humanidades e Tecnologias & Centro de Química Estrutural / IST) (Molecular biology expert)
- Susana Santos (Universidade Lusófona de Humanidades e Tecnologias & Centro de Química Estrutural / IST) (Molecular biology expert)
- Nuno Cardim (Hospital da Luz) (Medical expert)
- Period: 1-Jan-2010 to 31-Dec-2012
- SFRH/BD/65257/2009, Doctoral research scholarship for Catia M. Machado
Catia M. Machado, Ana T. Freitas, Francisco Couto. Enrichment analysis applied to disease prognosis. In: M. Boeker, H. Herre, R. Hoehndorf, F. Loebe (Eds.), 4th Workshop on Ontologies in Biomedicine and Life Sciences (OBML), September 2012.
Catia M. Machado, Francisco Couto, Alexandra R. Fernandes, Susana Santos, Ana T. Freitas. Toward a Translational Medicine Approach for Hypertrophic Cardiomyopathy. 3rd International Conference on Information Technology in Bio- and Medical Informatics (ITBAM 2012), Springer-Verlag GmbH Berlin Heidelberg, 2012.
Catia M. Machado, Francisco Couto, Alexandra R. Fernandes, Susana Santos, Nuno Cardim, Ana T. Freitas. Semantic Characterization of Hypertrophic Cardiomyopathy Disease. First Workshop on Knowledge Engineering, Discovery and Dissemination in Health (KEDDH10), held in conjunction with the IEEE International Conference on Bioinformatics & Biomedicine (BIBM 2010), 2010.
Catia M. Machado, Francisco Couto, Alexandra R. Fernandes, Susana Santos, Nuno Cardim, Ana T. Freitas. Unraveling Hypertrophic Cardiomyopathy Variability. ERCIM News 82, Special Theme: Computational Biology, 2010.