Deciphering next-generation pharmacogenomics: an information technology perspective

Summary
In the post-genomic era, the rapid evolution of high-throughput genotyping technologies and the increased pace of production of genetic research data are continually prompting the development of appropriate informatics tools, systems and databases as we attempt to cope with the flood of incoming genetic information. Alongside new technologies that serve to enhance data connectivity, emerging information systems should contribute to the creation of a powerful knowledge environment for genotype-to-phenotype information in the context of translational medicine. In the area of pharmacogenomics and personalized medicine, it has become evident that database applications providing important information on the occurrence and consequences of gene variants involved in pharmacokinetics, pharmacodynamics, drug efficacy and drug toxicity will become an integral tool for researchers and medical practitioners alike. At the same time, two fundamental issues are inextricably linked to current developments, namely data sharing and data protection. Here, we discuss high-throughput and next-generation sequencing technology and its impact on pharmacogenomics research. In addition, we present advances and challenges in the field of pharmacogenomics information systems which have in turn triggered the development of an integrated electronic 'pharmacogenomics assistant'. The system is designed to provide personalized drug recommendations based on linked genotype-to-phenotype pharmacogenomics data, as well as to support biomedical researchers in the identification of pharmacogenomics-related gene variants. The provisioned services are tuned in the framework of a single-access pharmacogenomics portal.

Introduction
in clinical practice. However, it is clear that the growth of genomic knowledge has not led to a corresponding increase in clinical implementation. The concept of genomic or personalized medicine (GPM), the tailoring of medical treatment to the individual characteristics, needs and preferences of each patient, is not new. Indeed, it is pertinent to note the words of Hippocrates: 'It is far more important to know what person the disease has than what disease the person has'. The discovery, made back in 1956, that the genetic basis for the selective toxicity of both fava beans and the antimalarial drug primaquine is a deficiency of the metabolic enzyme glucose-6-phosphate dehydrogenase [1] presents one of the earliest illustrations of the principle of personalized medicine.
Nowadays, it is almost axiomatic that the continuing advances in genomic research will revolutionize the way in which GPM is performed. Building on these advances, pharmacogenetics and pharmacogenomics (PGx in short) are currently driving discovery, analysis and interpretation in the context of research into the genetic basis of inter-individual variation in drug response [2]. As an integral part of GPM, PGx targets the delineation of the relationship between genomic variation/gene expression and drug efficacy and/or toxicity [3]. To date, several genes, referred to as pharmacogenes, are known to play a role in the absorption, distribution, metabolism, excretion and toxicity (ADMET) of several drugs. The most important ADMET genes can be grouped into four main categories: modifiers, phase-I and phase-II metabolism enzymes, and transporters. PharmaADME (http://www.pharmaadme.org/; an industry-initiated effort that aims to provide a core list of evidence-based drug metabolizing ADMET genetic biomarkers) catalogues 32 core (e.g. ABCB1, ATP-binding transporter), 267 extended (e.g. AHR, aryl hydrocarbon receptor) and 73 related (not directly involved in metabolism, e.g. CTSK, cathepsin K) pharmacogenes, as well as 187 core genetic biomarkers (e.g. CYP1A1: c.2453C>A).
Moreover, in relation to oncology, the US National Cancer Institute (NCI) has announced a set of priorities that include treatment response and adverse outcomes associated with chemotherapeutic agents and other medications used to treat cancer (via the Trans-NCI Pharmacogenomics and Pharmacoepidemiology Working Group (PPWG); http://epi.grants.cancer.gov/pharm/ppwg.html). The recommendation for the corresponding research and development (R&D) agenda is directed towards: (i) supporting the routine collection of germline and tumour biospecimens from clinical trials and population-based studies; (ii) developing, and supporting the identification of, clinical, socio-demographic, lifestyle and genomic markers related to treatment response and/or adverse events; (iii) incorporating PGx markers into clinical trials; and (iv) addressing the ethical, legal, social, biospecimen and data-sharing implications of PGx research [4].
The rise of next-generation sequencing technology has created unprecedented opportunities to analyse whole genomes [5]. This approach promises to be extremely useful in PGx because, unlike conventional medium- or even high-throughput genetic screening approaches such as microarray-based assays (e.g. AmpliChip CYP450 (http://molecular.roche.com/assays/Pages/AmpliChipCYP450Test.aspx), Roche Molecular Diagnostics, Basel, Switzerland; and DMET Plus (www.affymetrix.com/estore/browse/level_three_category_and_children.jsp?category=35791&categoryIdClicked=35791&expand=-true&parent=35923), Affymetrix, Santa Clara, CA, USA), it allows the acquisition of a full picture with respect to individual ADMET gene variants. This is important because it is very likely that each individual harbours rare and/or novel variants of functional significance in well-established pharmacogenes, which may render that individual a poor metabolizer, a hypermetabolizer or a non-responder to certain drugs, and which may go undetected when using a genetic screening assay.
Over the last few years, genome-wide association studies (GWAS) have been the main enablers of PGx research, with a track record of novel and interesting findings [6]. A search in PubMed (April 2014) for PGx-related papers shows that the proportion of papers matching 'pharmacogenomics' OR 'pharmacogenetics' (PGx) that also match 'genome-wide' (GW) has risen fourfold over the last 10 years, from about 3.0% in 2004 to more than 11% in 2012 (figure 1; fraction (PGx and GW)/PGx), whereas about 2% of GW-related papers involve pharmacogenomic questions (figure 1; fraction (PGx and GW)/GW); similar results have been reported by Gurwitz & McLeod [7]. However, most variants that impact drug response still remain to be identified [8]. As GWAS may not identify all risk biomarkers, the identification of unknown (possibly rare) variants from whole-genome sequencing studies may provide indicative associations between specific genotypes and adverse drug reactions. Whole-exome and/or whole-genome sequencing can now be easily performed using several commercially available or proprietary platforms, to analyse genome variation comprehensively and with a high degree of accuracy, at a reasonable cost compared with the recent past [9].
The paper is organized as follows: in §§2 and 3 we discuss recent advancements in PGx whole-genome sequencing, in §4 we highlight the current shortcomings of PGx data integration, while in subsequent sections (§§5-7) we describe the design principles and the development of an integrated PGx electronic assistant, as a potential approach to address the problem of PGx information management and overload, and the delivery of personalized PGx translation services.

Personalized PGx profiling using whole-genome sequencing
Recently, whole-genome sequence analysis of almost 500 individuals identified a very large number of rare, potentially functional genomic variants in ADMET genes, which would not have been identified had a conventional high-throughput genetic screening approach been employed. Mizzi et al. [10] showed that the number of ADMET-related genomic variants identified by whole-genome sequencing was significantly higher than the number that would have been identified had the DMET Plus assay, the most comprehensive genotyping platform for pharmacogenomic biomarkers available to date [11], been used. These authors reported the identification of 408 964 genomic variants in ADMET-related genes, of which almost 10% (38 636) attained population frequencies of more than 20%. On average, 17 733 variants were found per individual in these 231 ADMET-related genes, compared with an anticipated 250 variants in the same genes had the DMET assay been employed. Interestingly, 16 487 novel (not annotated in dbSNP) variants were identified within exons and regulatory regions, of which 861 attained frequencies of over 1%, and are likely to be functionally significant. The latter finding underlines the fact that a result from a currently available pharmacogenetic screening assay may not be fully indicative of a patient's ability to respond to certain drugs and as such should be interpreted with a degree of caution.
In a similar vein, Paré-Brunet et al. [12] resequenced an almost 700 kb DNA sequence including 23 vascular endothelial growth factor pathway genes that play a central role in physiology, pathophysiology and drug treatment in angiogenesis, and reported 3558 genetic variants of which 449 were novel. Similar applications of next-generation sequencing could also be envisaged for germline cancer variation discovery with possible PGx implementation to individualize cancer treatment [13].
In the light of the plummeting cost of whole-genome sequencing and the gradual increase in data accuracy, one would envisage that comprehensive pharmacogenomic testing could be readily applicable in a clinical setting [14]. By applying whole-genome sequence analysis to two unrelated family members suffering from atrial fibrillation and presenting with differential response rates to anticoagulation treatment, Mizzi et al. [10] were able to delineate the basis of this differential response. In particular, whole-genome analysis in these family members not only revealed CYP2C9 variants as the basis of the inter-individual response to acenocoumarol treatment, but was also able to predict the outcome of an alternative anticoagulation treatment using clopidogrel. Similar findings in genes involved in the metabolism of anti-cancer drugs [15] further demonstrate the potential applicability of this approach for pharmacogenomic testing in a clinical setting in the not too distant future.

Customized whole-pharmacogenome resequencing
Although whole-genome sequence PGx analysis is still in its infancy, one might envisage that the ultimate pharmacogenomic test would involve at the very least the resequencing of the ADMET-related pharmacogenes, particularly those that have been acknowledged to be credible pharmacogenomic biomarkers by regulatory agencies. Among the FDA-approved drugs that carry PGx information in their labels, 25% are metabolized by cytochrome CYP2D6 and their rates of metabolism can vary; for example, one meta-analysis demonstrated a 50% reduction in the average dose for most tricyclic antidepressants in patients who are CYP2D6 poor metabolizers (CYP2D6 *3/*3) [16]. Figure 2 shows the distribution of these drugs between various target diseases, with oncology and psychiatry dominating. However, these drug labels do not always provide specific guidelines based on the relevant genetic information (e.g. in relation to putative adverse drug reactions), or recommendations about what actions should be taken [17]. The most challenging and perhaps most crucial part of such an approach would be the accurate target enrichment of those ADMET-related pharmacogenes, followed by whole-pharmacogenome resequencing. Such an approach would be dynamic in the sense that it would allow enrichment and/or modification of the pharmacogene panel. Its main disadvantage would be the omission of important variants in modifier genes involved in drug metabolism. However, it would definitely capture novel and putatively deleterious variants in the known pharmacogenes.
For such an approach to be viable and readily applicable in clinical practice, it should be accompanied by the necessary genome informatics platforms to enable accurate analysis of the resulting pharmacogenome resequencing data, to address the secure storage of the sheer amount of genomic information produced, and ideally to provide a meaningful and clinician-friendly report that can be readily exploited in the clinic. To cope with this challenge, innovative approaches that derive meaningful insights and knowledge from large and complex PGx resources need to be developed and thoroughly tested. The scope of the task is twofold: firstly, to facilitate and enhance the identification and evidence-based documentation of (existing or newly discovered) PGx gene/variant-drug-phenotype associations and, secondly, to translate and transfer well-documented PGx knowledge into clinical implementation with the aim of both rationalizing and individualizing therapy. The following section focuses on this particular challenge.

Towards pharmacogenomic data integration
As previously mentioned, a crucial component of personalized medicine is the individualization of drug therapy. Understanding the complex interactions and detailed characterization of the functional variants of individual ADMET-related genes and drugs is needed to demonstrate clinical utility. In other words, associating gene variants with specific drug responses in individual patients improves clinical decision-making by informing the adjustment of the dosage or the selection of a different drug [19]. True individualization of therapy, however, which would maximize drug efficacy and minimize toxicity, would need to consider genetic and phenotypic data, as well as any environmental factors that could influence the response to treatment, in the context of the specific individual concerned. Therefore, the design and development of information systems that are continually updatable by inclusion of newly generated pharmacogenomic information, that are able to disseminate knowledge in the form of guidelines, and that are capable of linking the results of pharmacogenomic tests to recommendations for therapeutic interventions with the aim of supporting drug-prescribing decision-makers is a prerequisite for incorporating PGx into routine clinical practice.
Once the Internet became an indispensable tool for biomedical researchers, genomic information overload was inevitable. There are numerous websites and biological databases, which often create confusion for users in terms of which might be the most appropriate to investigate a given biological question. The current battery of genome databases, particularly those directly related to PGx, is limited in number. Table 1 summarizes the resources directly (knowledge bases and genetic testing) or indirectly (databases about gene variants, genotype-to-phenotype associations and drug/chemo-related databanks) related to PGx.
The main problem regarding the exploitation of PGx knowledge and its utilization in clinical practice relates to the heterogeneity and low degree of connectivity between different PGx resources. In most cases, the amount of raw data is so overwhelming that PGx biomedical researchers and stakeholders are often at a loss to know how to make sense of it, rendering them unable to capture all that is known and being discovered regarding genetic variation and its correlation with variable drug response [22]. The challenge to design integrated Web information systems to interconnect and federate diverse PGx information resources into a single portal is a formidable one [23].

Integrated pharmacogenomic assistant services
The PGx information overload challenge calls for specialized informatics services for the interpretation and integration of the increasingly large amounts of molecular and clinical data. Such a multifaceted endeavour entails both translational and clinical bioinformatics approaches that would, on the one hand, offer analytical and interpretational methods to optimize the transformation of increasingly voluminous biomedical data into proactive, predictive, preventive and participatory ('4Ps') medicine [24], and on the other, enable the clinical application of discovery-driven bioinformatics methods to understand molecular mechanisms and prompt the search for potential therapies for human diseases [25]. In this respect, the R&D agenda aims to create and deliver an electronic PGx assistant platform to act as the PGx bench-to-bedside enabling medium. The fundamental components that together underpin the novelty of such a platform revolve around its ability to provide translation services which will in turn link genotypic to phenotypic (metabolizer status) information as a valuable tool both to clinicians, by supporting them in making informed decisions based on state of the art PGx data, and to biomedical researchers, by providing a single place where information can be found to facilitate an understanding of inter-individual differences in drug efficacy, toxicity and pharmacokinetics (PK), as well as driving the discovery of new PGx variants.
Figure 2. Disease distribution of PGx labelled FDA-approved drugs ('other' includes: haematology, dermatology, transplantation, urology, analgesia, anaesthesiology, antidotal therapy, autoimmune diseases, metabolic disorders and pulmonary diseases).
Table 1. Web-based resources directly or indirectly related to pharmacogenomics (see also [20,21]).
rsob.royalsocietypublishing.org Open Biol. 4: 140071
In this respect, the goal is to provide a 'one-stop shop' Web-based platform to ease the processing, assimilation and sharing of PGx knowledge and facilitate the aggregation of different PGx stakeholders' perspectives. The platform should take advantage of, and be designed around, interoperable and flexible bioinformatics and advanced information processing components that are able to serve two major PGx tasks: (i) to offer personalized diagnostics based on reliable genomic/genetic evidence and (ii) to reduce healthcare costs by increasing drug efficacy and minimizing adverse drug reactions.
To develop such a system, one would first need to determine its functional requirements from the user's perspective. Such requirements would include: (i) retrieval of PGx information regarding ADMET genes, their respective variants and drugs; (ii) a format that is readily updatable with information on newly discovered pharmacogenomic variants; and (iii) the capability to receive personalized recommendations based on personalized PGx profiles. These constitute the main requirements for an electronic PGx assistant. However, more detailed requirements relate to specific users' roles. In particular, four different types of potential users may be identified in terms of their likely needs and roles: (i) the individual/patient: any user who provides single nucleotide polymorphism (SNP) genotype profiles with the aim of receiving corresponding clinical annotations and personalized PGx recommendations (as assessed and validated by healthcare professionals); (ii) the medical professional: any healthcare professional (physician, geneticist, etc.) who needs to infer the phenotypic status of individuals (based on their genotype profiles, and by reference to 'look-up' genotype-phenotype translational tables), to review and supervise an individual patient's personalized recommendations, assess them and decide upon ensuing therapeutic protocols and treatment options; (iii) the submitter: any biomedical researcher who discovers and identifies a new gene variant and its putative PGx associations; the submitter can either validate the findings and enrich the system's database, or request a (local) version of the PGx database to work with; and (iv) the administrator: any user with administrative privileges responsible for maintaining and upgrading the electronic PGx assistant's database server (backups, versioning, restoration, etc.), managing application tools and services, assigning and authorizing user roles and privileges, and providing appropriate security and privacy-preserving services.
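The four user roles described above can be read as a simple role-to-permission mapping. The following is a minimal sketch under that assumption; the action names (e.g. `SUBMIT_GENOTYPE_PROFILE`) and the exact assignment of actions to roles are illustrative choices, not part of the described system.

```python
from enum import Enum, auto


class Role(Enum):
    PATIENT = auto()
    MEDICAL_PROFESSIONAL = auto()
    SUBMITTER = auto()
    ADMINISTRATOR = auto()


class Action(Enum):
    # Hypothetical action names, chosen to mirror the role descriptions.
    SUBMIT_GENOTYPE_PROFILE = auto()
    VIEW_OWN_RECOMMENDATIONS = auto()
    INFER_PHENOTYPE = auto()
    REVIEW_RECOMMENDATIONS = auto()
    SUBMIT_VARIANT = auto()
    REQUEST_LOCAL_DATABASE = auto()
    MANAGE_USER_ROLES = auto()
    MAINTAIN_DATABASE = auto()


# Assumed mapping: each role gets only the actions its description mentions.
PERMISSIONS = {
    Role.PATIENT: {
        Action.SUBMIT_GENOTYPE_PROFILE,
        Action.VIEW_OWN_RECOMMENDATIONS,
    },
    Role.MEDICAL_PROFESSIONAL: {
        Action.INFER_PHENOTYPE,
        Action.REVIEW_RECOMMENDATIONS,
    },
    Role.SUBMITTER: {
        Action.SUBMIT_VARIANT,
        Action.REQUEST_LOCAL_DATABASE,
    },
    Role.ADMINISTRATOR: {
        Action.MANAGE_USER_ROLES,
        Action.MAINTAIN_DATABASE,
    },
}


def is_allowed(role: Role, action: Action) -> bool:
    """Return True if the given role may perform the given action."""
    return action in PERMISSIONS.get(role, set())
```

Keeping the mapping in one table makes it straightforward for an administrator role to audit or extend privileges without touching the checking logic.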

Towards an electronic PGx assistant
As a next step, we describe the development of the most crucial components (data model and personal information management) of an electronic PGx assistant, designed on the basis of the aforementioned functional requirements. Figure 3 outlines the multi-layer reference architecture, including the basic components of the proposed electronic PGx assistant platform. First, several external data sources are leveraged to extract and integrate pharmacogenomic information. To this end, we adopted the notion of a Data Warehouse (DW) star schema (figure 4), which is commonplace in the design of business intelligence and decision support systems, as the basic data model and the most appropriate to encompass the different database requirements. The DW is centred around the gene/variant-drug-phenotype-recommendation concept, embodied in the fact table which, in turn, references the dimensional tables around it, corresponding to the entities of: (i) gene, (ii) drug, (iii) diplotype, (iv) phenotype and (v) clinical annotations, guidelines and recommendations. These entities correspond to the dimensions whose keys act as foreign keys in the fact table. Different types of data extraction tools (e.g. APIs, Web services, JSON/XML or text parsers, etc.) are used in order to fetch and transform data from the various heterogeneous data sources (PharmGKB, dbSNP, Affymetrix annotations, PubMed, etc.) into the central DW, following an extract-transform-load process. Standard ontologies and nomenclatures are used in an effort to uniformly represent the various PGx data and information items (e.g. gene-variant nomenclatures, gene ontology/GO, ICD for disease classification and encoding, etc.).
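To make the star-schema idea concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. The table and attribute names follow the figure 4 caption; the surrogate key columns (`GeneID`, `DrugID`, etc.), the omission of the genericRecommendation table, the placeholder recommendation text and the choice of amitriptyline as the example drug are assumptions made purely for illustration.

```python
import sqlite3

# Dimension tables surround one fact table whose composite primary key
# is built from the foreign keys of the dimensions.
SCHEMA = """
CREATE TABLE gene (
    GeneID INTEGER PRIMARY KEY,
    GeneName TEXT NOT NULL,
    Chromosome TEXT,
    PharmGKBAccessionID TEXT
);
CREATE TABLE drug (
    DrugID INTEGER PRIMARY KEY,
    DrugName TEXT NOT NULL,
    PharmGKBAccessionID TEXT
);
CREATE TABLE allele (
    AlleleID INTEGER PRIMARY KEY,
    AlleleName TEXT NOT NULL
);
CREATE TABLE diplotype (
    DiplotypeID INTEGER PRIMARY KEY,
    Allele1ID INTEGER NOT NULL REFERENCES allele(AlleleID),
    Allele2ID INTEGER NOT NULL REFERENCES allele(AlleleID)
);
CREATE TABLE phenotype (
    PhenotypeID INTEGER PRIMARY KEY,
    MetabolStatus TEXT NOT NULL
);
CREATE TABLE personalRecommendation (
    RecommendationID INTEGER PRIMARY KEY,
    Recommendation TEXT,
    Implication TEXT,
    RelatedGenotype TEXT,
    ActivityScore REAL
);
CREATE TABLE factTable (
    GeneID INTEGER NOT NULL REFERENCES gene(GeneID),
    DrugID INTEGER NOT NULL REFERENCES drug(DrugID),
    DiplotypeID INTEGER NOT NULL REFERENCES diplotype(DiplotypeID),
    PhenotypeID INTEGER NOT NULL REFERENCES phenotype(PhenotypeID),
    RecommendationID INTEGER NOT NULL
        REFERENCES personalRecommendation(RecommendationID),
    PRIMARY KEY (GeneID, DrugID, DiplotypeID, PhenotypeID, RecommendationID)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory warehouse holding one illustrative fact row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Accession IDs are left NULL and the recommendation is a placeholder.
    conn.execute("INSERT INTO gene VALUES (1, 'CYP2D6', '22', NULL)")
    conn.execute("INSERT INTO drug VALUES (1, 'amitriptyline', NULL)")
    conn.execute("INSERT INTO allele VALUES (1, '*3')")
    conn.execute("INSERT INTO diplotype VALUES (1, 1, 1)")
    conn.execute("INSERT INTO phenotype VALUES (1, 'Poor Metabolizer')")
    conn.execute(
        "INSERT INTO personalRecommendation "
        "VALUES (1, 'PLACEHOLDER recommendation text', NULL, NULL, NULL)"
    )
    conn.execute("INSERT INTO factTable VALUES (1, 1, 1, 1, 1)")
    return conn


def lookup(conn, gene, drug, allele1, allele2):
    """Join the fact table to its dimensions for one gene-drug-diplotype."""
    return conn.execute(
        """
        SELECT p.MetabolStatus, r.Recommendation
        FROM factTable f
        JOIN gene g ON g.GeneID = f.GeneID
        JOIN drug d ON d.DrugID = f.DrugID
        JOIN diplotype dp ON dp.DiplotypeID = f.DiplotypeID
        JOIN allele a1 ON a1.AlleleID = dp.Allele1ID
        JOIN allele a2 ON a2.AlleleID = dp.Allele2ID
        JOIN phenotype p ON p.PhenotypeID = f.PhenotypeID
        JOIN personalRecommendation r
            ON r.RecommendationID = f.RecommendationID
        WHERE g.GeneName = ? AND d.DrugName = ?
          AND a1.AlleleName = ? AND a2.AlleleName = ?
        """,
        (gene, drug, allele1, allele2),
    ).fetchone()
```

A query such as `lookup(build_demo_db(), "CYP2D6", "amitriptyline", "*3", "*3")` then resolves a diplotype to its metabolizer status and recommendation in a single join over the fact table, which is the main appeal of the star layout for this kind of look-up.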
With regard to the management of individuals' genotype/SNP profiles, an electronic healthcare record (EHR) solution has been adopted. To this end, state-of-the-art guidelines and data models related to genetic tests and their interpretation have been employed (e.g. the HL7/CDA2 guide for genetic testing reports [26]). Figure 4 depicts an outline architecture for the integration of genotype data from the genetic laboratory to the EHR. Standard ontologies and data models could be used for the representation of genotype profiles (e.g. the Genome Variation Format (GVF), www.sequenceontology.org/resources/gvf.html, and LOINC, https://loinc.org) [27]. Linking genotype data to EHRs is crucial for the translation and transfer of PGx knowledge into the clinic. The approach is both cost-effective and time-efficient as there is no need to actively recruit and gather samples from a study population: cases and controls are readily available and consistently identified from EHRs and the linked genetic samples. The eMERGE (http://emerge.mc.vanderbilt.edu/) consortium has already exploited this alternative with very interesting results [28,29].
Regarding the accumulation, storage and management of individuals' genotype profiles, and taking into consideration the current debate about genetic tests and translational research [30], a number of key ethical issues are raised, viz. public genetic awareness and genomic literacy, physicians' knowledge of genomics, handling of genomic information in and beyond the clinic, and online direct-to-consumer (DTC) (pharmaco)genomics, with the associated arguments being highly polarized [31]. In this respect, all the relevant ethical, privacy-preserving and security issues should be addressed and the corresponding safeguards implemented, with special effort devoted to surveys and the assessment of guidelines, in order to critically appraise the impact of genetics and PGx on society and increase the level of awareness of the general public, healthcare professionals and biomedical researchers with respect to PGx and personalized medicine.
Figure 4. The basic star-schema data model of the electronic PGx assistant. gene: any gene that is known to be associated with drug metabolism (defining attributes: GeneName, Chromosome and the PharmGKB code, PharmGKBAccessionID); drug: any PGx drug (defining attributes: DrugName and the PharmGKB code, PharmGKBAccessionID); factTable: the table that joins genes, drugs, diplotypes, phenotypes and recommendations, whose primary (composite) key is a combination of the foreign keys of the other tables; diplotype: the diplotype as defined by the two-allele combination, e.g. *1/*2 (defining attributes: the two allele IDs, Allele1ID and Allele2ID); allele: any star-allele haplotype, e.g. *1, *2, *1B (defining attribute: AlleleName); genericRecommendation: a recommendation for a specific gene-drug combination (defining attributes: Summary and Annotation); personalRecommendation: specific recommendations when the diplotype is known (defining attributes: Recommendation, Implication, RelatedGenotype, e.g. an individual carrying two loss-of-function alleles, and ActivityScore); phenotype: the different types of metabolizer status, e.g. Intermediate Metabolizer, Extensive Metabolizer (defining attribute: MetabolStatus).
From the researcher's perspective, the proposed PGx assistant will enhance PGx research by facilitating the discovery of new PGx variants. Methods for discovering genetic factors in drug response, including GWAS, expression analysis and even whole-genome resequencing, already exist. However, more sophisticated knowledge-based tools to assign meaning to novel variants are required. The proposed PGx assistant brings reporting and analytical services to the end user, through a simple and user-friendly interface, and supports research by revealing hidden relations between genes, variants and drugs, thereby driving the discovery of candidate genomic regions of interest. Moreover, as knowledge about drug-gene and drug-drug interactions accumulates, the proposed freely available system, which is additionally coupled with advanced literature mining features and updatable components, becomes even more beneficial to the research community and society.

Personalized pharmacogenomic translation services
Surely the most important notion in the proposed PGx assistant platform is the idea of personalization. The inclusion of a personalized PGx translation component in the platform is founded on the assumption that clinical high-throughput and pre-emptive genotyping will eventually become common practice and clinicians will increasingly have patients' genotypes available before a prescription is written [32]. The personalized translation component aims to serve: (i) the automated matching of patients' genotype profiles with established and/or newly discovered gene variants/alleles, based on the customization of an elaborate allele-matching algorithm [33]; (ii) the inference of respective phenotypes (e.g. metabolizer profiles); and (iii) the delivery of relevant and updated clinical annotations and (drug) dosing recommendations. The component will be founded upon the harmonization of PGx haplotype/translation tables from the DMET Plus assay and the PharmGKB knowledge base (www.pharmgkb.org), but will also provide services for updating the haplotype/translation tables with newly discovered and validated gene variants/alleles. In order to produce accurate personalized recommendations, the focus should be placed on the relationship between a variant and its related gene, drug(s) and phenotype(s). For example, let us suppose that a patient with bipolar disorder undergoes a genetic test that reveals an A/C genotype for the variant rs2032582, which corresponds to nucleotide position 7:87160618 in the ABCB1 gene. How can this information be translated into clinical knowledge? The medical professional should be well aware that although the majority (70-80%) of patients with bipolar disorder respond well to lithium, a significant proportion will present with patterns of partial or non-responsiveness.
Before prescribing this drug, the physician enters the patient's encrypted genotype data into the proposed PGx assistant, which then identifies that the specific variant is related to lithium response. More specifically, the PGx assistant brings the following information to the medical professional's attention: 'Patients with the AC genotype and depression may have an increased risk of suicidal ideation when treated with clomipramine, lithium, ..., or venlafaxine as compared to patients with the CC genotype' (relevant text from PharmGKB). At the same time, the PGx assistant displays three new studies related to genetic variants associated with lithium response and provides links to the respective sources of information. By considering the patient's family and medical history, the medical professional may then decide to provide an alternative treatment and closely monitor this patient. The medical professional may also prescribe a new genetic test based on the findings of the recommended articles in the literature that associate additional genetic variants with response to lithium.
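The translation steps in this scenario (matching a genotype call, inferring a phenotype and retrieving the relevant annotation) can be sketched as plain table look-ups. This is an illustrative simplification and not the allele-matching algorithm of [33]: the function names and the in-memory tables are hypothetical, the annotation entry reuses the PharmGKB text quoted above (with its elision, so only the three drugs named there are listed), and the single diplotype entry reuses the CYP2D6 *3/*3 poor-metabolizer example mentioned earlier in the paper.

```python
from typing import Dict, List, Optional


def normalize_genotype(call: str) -> str:
    """Order-independent form of a single-nucleotide call: 'C/A' -> 'AC'."""
    alleles = [base for base in call.upper() if base in "ACGT"]
    return "".join(sorted(alleles))


# Hypothetical annotation table with one entry taken from the example above.
VARIANT_ANNOTATIONS = [
    {
        "rsid": "rs2032582",
        "gene": "ABCB1",
        "genotype": "AC",
        "drugs": {"clomipramine", "lithium", "venlafaxine"},
        "text": (
            "Patients with the AC genotype and depression may have an "
            "increased risk of suicidal ideation when treated with "
            "clomipramine, lithium, ..., or venlafaxine as compared to "
            "patients with the CC genotype"
        ),
    },
]

# Hypothetical genotype-to-phenotype 'look-up' table with a single entry.
DIPLOTYPE_PHENOTYPES = {
    ("CYP2D6", ("*3", "*3")): "Poor Metabolizer",
}


def infer_phenotype(gene: str, allele1: str, allele2: str) -> Optional[str]:
    """Return the metabolizer status for a diplotype, if it is tabulated."""
    key = (gene, tuple(sorted((allele1, allele2))))
    return DIPLOTYPE_PHENOTYPES.get(key)


def annotations_for(profile: Dict[str, str], drug: str) -> List[dict]:
    """Return annotations matching the patient's calls for a given drug."""
    hits = []
    for entry in VARIANT_ANNOTATIONS:
        call = profile.get(entry["rsid"])
        if call is None:
            continue  # variant not genotyped for this patient
        if normalize_genotype(call) == entry["genotype"] and drug in entry["drugs"]:
            hits.append(entry)
    return hits
```

For instance, `annotations_for({"rs2032582": "A/C"}, "lithium")` returns the ABCB1 entry, which the assistant could then display to the medical professional alongside links to the supporting literature. Untabulated diplotypes return `None` rather than a default phenotype, so that missing knowledge is never presented as a clinical inference.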
A related protocol, called PG4KDS, is employed at St Jude Children's Research Hospital (http://www.stjude.org/pg4kds). The purpose of the protocol is to selectively migrate microarray-based genotypes for clinically relevant genes into each patient's electronic medical record pre-emptively. By leveraging 'look up' translation tables created by the Translational Pharmacogenetics Project (TPP) [34], a PGRN-led initiative with the goal of operationalizing the work of the Clinical Pharmacogenetics Implementation Consortium (CPIC) [35] by translating widely accepted actionable PGx discoveries into real-world clinical practice, they assigned phenotypes to each unique CYP2D6 or TPMT diplotype based on assessments of functional allele activity [36].

Encouraging pharmacogenomic data sharing
The advances in bioinformatics required to annotate human genomic variants and to place them in public data repositories have not kept pace with data discovery. The continued deposition of such data in the public domain is essential to maximize both their scientific and clinical utility. However, rewards for data sharing are few, representing a serious practical impediment to data submission. As such, incentivizing individual researchers or research groups to submit their newly acquired and unpublished mutation/variation data to public repositories or knowledge bases in return for appropriate credit and attribution is of the utmost importance.
In 2008, the scientific journal Nature Genetics introduced the concept of 'microattribution' as an alternative reward system for scientific data contributions. The principle of microattribution is '... to produce a publication workflow that is open to all journals and that draws on the expertise of all those with a stake in understanding variation at a particular region in the human genome' [37]. Microattribution comprises two main components, namely the Public Genome Browser, to display the actual number of database entries and related articles that would be contributed and recorded, based on an individual researcher's unique identity (e.g. Open Researcher and Contributor ID consortium, http://orcid.org; ResearcherID, http://www.researcherid.com, etc.), and the microattribution analysis article, which would summarize the features of all variome data contributions, such as phenotypes, clinical findings, allele frequencies and so on [38].
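The bookkeeping behind such a contribution display can be sketched in a few lines of Python: each database entry carries the submitter's unique identifier, and entries are tallied per identifier. The `Submission` record, the `count_contributions` function and the identifier and variant values are hypothetical placeholders, not the schema of any existing microattribution resource.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    contributor_id: str  # ORCID-style identifier (placeholder values below)
    variant: str         # free-text variant label (hypothetical)


def count_contributions(submissions):
    """Return the number of database entries credited to each contributor."""
    return Counter(s.contributor_id for s in submissions)


# Placeholder data: three entries submitted by two contributors.
SUBMISSIONS = [
    Submission("0000-0000-0000-0001", "variant A"),
    Submission("0000-0000-0000-0002", "variant B"),
    Submission("0000-0000-0000-0001", "variant C"),
]

if __name__ == "__main__":
    for contributor, n_entries in count_contributions(SUBMISSIONS).most_common():
        print(contributor, n_entries)
```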
The first demonstration of microattribution working in practice was achieved by Giardine et al. [39] using HbVar, the globin-gene locus-specific mutation database, as a model, followed by the InSiGHT locus-specific database much later [40]. This approach has also been implemented in the field of clinical genetics [41], documenting the clinical features of almost 40 000 cystic fibrosis patients and their underlying CFTR gene variants. Lastly, microattribution has also been implemented in the FINDbase database (http://www.findbase.org), a worldwide national genetic database documenting causative mutations and pharmacogenomic biomarkers [42], followed by the Pharmacogenomics for Every Nation Initiative consortium (http://www.pgeni.org [43]).
In all of the above cases, not only did the overall contributions from individual scientists increase compared with the situation prior to the microattribution call but, most importantly, a number of useful conclusions were drawn in every case in which microattribution was implemented. These conclusions were derived from, for example, variant clustering, clinical phenotype comparisons and/or comparisons of pharmacogenomic variant allele frequencies. Such conclusions would not have been possible without such an approach, further demonstrating the value of the immediate contribution and sharing of novel genome variants even though they would not warrant classical narrative publication on their own.
In the context of human PGx variation data sharing, a sensible approach to incentivize free data sharing would be to base the whole process around one or more pre-existing and freely available high-quality centralized databases or database journals, possibly coupled with the regular publication of microattribution-type articles (possibly online only) in PGx journals, so that the individual contribution of the data submitters in a consortium would be recognized by their co-authorship. Such an approach would in turn further stimulate researchers to submit data to a central repository.
A significant hurdle that needs to be overcome is the self-sustainability of such a large centralized database, which could possibly be addressed by partnering with a major publishing group, along the lines of a database-journal-like model [38]. One should also bear in mind that the variation data to be generated from such pharmacogene resequencing would be 'raw' uncurated data. As they differ from well-curated data, they would have to be handled differently, e.g. tagged as uncurated data, since, for instance, they might correspond to benign rather than pharmacogene-disabling variants (see also [10]). The level of data confidence should also be specified in a clear way, e.g. raw (and perhaps false positive) data versus curated data of unknown significance, with an in silico prediction or even with a clear genotype-phenotype correlation.
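One way to make such confidence levels explicit is to attach an ordered tag to every variant record, so that downstream users can filter by the minimum level they are willing to accept. The Python sketch below assumes four tiers taken from the wording above; the tier names, the `VariantRecord` fields, the `at_least` helper and the example records are hypothetical and do not correspond to an existing database schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    """Ordered confidence tiers, lowest first (tier names are assumptions)."""
    RAW = 0                  # uncurated, possibly a false positive
    CURATED_UNKNOWN = 1      # curated, significance unknown
    IN_SILICO_PREDICTED = 2  # supported by an in silico prediction
    GENOTYPE_PHENOTYPE = 3   # clear genotype-phenotype correlation


@dataclass(frozen=True)
class VariantRecord:
    variant_id: str          # placeholder identifier
    gene: str
    confidence: Confidence


def at_least(records, minimum):
    """Keep only records whose confidence tier is at or above `minimum`."""
    return [r for r in records if r.confidence >= minimum]


# Placeholder records for illustration only.
RECORDS = [
    VariantRecord("var-001", "GENE_A", Confidence.RAW),
    VariantRecord("var-002", "GENE_A", Confidence.IN_SILICO_PREDICTED),
    VariantRecord("var-003", "GENE_B", Confidence.GENOTYPE_PHENOTYPE),
]

if __name__ == "__main__":
    for record in at_least(RECORDS, Confidence.IN_SILICO_PREDICTED):
        print(record.variant_id, record.confidence.name)
```

An ordered enumeration is used rather than free-text labels so that "this level or better" can be expressed as a simple comparison.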

Translating PGx knowledge into clinical decision-making: the next great leap
From the above, it is clear that once next-generation re-sequencing-based PGx testing becomes widely available, it will require a substantial effort to translate this genomic information into clinically meaningful guidelines. In real-life situations, PGx clinical scenarios are truly complex and often, if not always, pose significant dilemmas to medical professionals regarding the selection of a treatment modality. This complexity does not occur because of our inability to correlate genomic with clinical variables; indeed, genomics research has already revealed and produced (and continues to produce) a plethora of valuable pharmacogenomic associations and knowledge. Rather, it arises mainly from the large translation gap in moving pharmacogenomic (as with the other -omics) scientific discoveries towards successful innovations. This gap occurs because of the lack of a 'systems orientation' to innovation that conceptualizes knowledge-based PGx innovation as an ecosystem of communicating 'innovation actors' (pharmacology, pharmacogenomics, molecular biology and genetics researchers) and 'innovation narrators' [44], an ecosystem to be realized by the (currently) missing 'intermediate medium' that facilitates communication and supports bench-to-bedside translation endeavours, by harnessing knowledge from basic PGx to produce treatment options for patients [45].
In such a setting, it is imperative to adopt a multidisciplinary approach based on a portfolio of interoperating translational or clinical biomedical informatics components and their alignment with contemporary information engineering and processing approaches. Such approaches should aim to devise: (i) a PGx knowledge assimilator that seamlessly (i.e. based on standard semantics and data-models) links diverse PGx knowledge sources and (ii) knowledge-extraction services able to identify useful genotype-to-phenotype associations and knowledge from these sources. Moreover, the identified PGx genotype-to-phenotype associations should be explored in relation to their PK and pharmacodynamic (PD) background. Such exploration could be served by the elaboration of appropriate PK/PD simulation models that help to assess the covariance of PGx associations in virtually devised populations, e.g. following the approach of the SimCYP (www.simcyp.com/) and NONMEM (www.iconplc.com/technology/products/nonmem) commercial virtual simulation packages, as well as using free open-source PK modelling software tools such as the 'PKreport' (cran.r-project.org/web/packages/PKreport) and 'WFN' (wfn.sourceforge.net/wfnxpose.htm) R-packages. In addition, and based on the coupling of Web 2.0 and social-networking technology, it would be essential to facilitate and support the collaboration needs of the engaged actors and to fill in the missing communication medium between the diverse PGx knowledge sources, the simulated PGx genotype-to-phenotype associations and the PGx actors.
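To make the idea of testing a genotype-to-phenotype association in a virtual population concrete, the following Python sketch simulates drug exposure under a one-compartment intravenous bolus model, in which clearance depends on a metabolizer phenotype. It is a toy illustration only and does not reproduce SimCYP, NONMEM, PKreport or WFN. The clearance multipliers, dose, typical clearance, variability and all function names are assumptions made for this sketch.

```python
import math
import random
import statistics

# Hypothetical clearance multipliers per metabolizer phenotype (assumptions).
CLEARANCE_FACTOR = {"normal": 1.0, "intermediate": 0.6, "poor": 0.3}


def concentration(dose_mg, clearance_l_h, volume_l, time_h):
    """Plasma concentration (mg/L) after an IV bolus, one-compartment model."""
    k_el = clearance_l_h / volume_l  # first-order elimination rate constant (1/h)
    return (dose_mg / volume_l) * math.exp(-k_el * time_h)


def auc(dose_mg, clearance_l_h):
    """Total exposure (mg*h/L) after an IV bolus: AUC = dose / clearance."""
    return dose_mg / clearance_l_h


def simulate_population(n_per_group, dose_mg=100.0, typical_cl=10.0,
                        sigma=0.2, seed=1):
    """Return the mean AUC per phenotype group in a virtual population."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    mean_auc = {}
    for phenotype, factor in CLEARANCE_FACTOR.items():
        exposures = []
        for _ in range(n_per_group):
            # Log-normal between-subject variability around the group clearance.
            clearance = typical_cl * factor * math.exp(rng.gauss(0.0, sigma))
            exposures.append(auc(dose_mg, clearance))
        mean_auc[phenotype] = statistics.mean(exposures)
    return mean_auc


if __name__ == "__main__":
    for phenotype, value in simulate_population(200).items():
        print(f"{phenotype}: mean AUC = {value:.1f} mg*h/L")
```

Under these assumptions, groups with lower clearance show proportionally higher exposure, which is the kind of genotype-dependent difference a population simulator would quantify before an association is considered for clinical follow-up.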
To accommodate these needs, one should incorporate: (i) the linkage and seamless integration of established PGx resources (e.g. PharmGKB, CPIC, etc.), literature and other genomic databases (PubMed, dbSNP, dbGAP, ClinVar, FINDbase, etc.), to be based on the elaboration and operationalization of standard (pharmaco)genomic/clinical ontologies and data-models; (ii) literature mining/natural language processing, to extract putative disease-drug-gene/variant-phenotype associations from PGx resources and the published literature; (iii) a virtual population pharmacokinetic simulator, to test putative variant-phenotype associations and assess relevant genotype-to-phenotype covariance statistics in virtual populations; and (iv) a collaborative recommender environment, to enable communication and collaboration between PGx actors towards the formation, validation and evidential assessment of such associations. Furthermore, two additional components and respective services are required to align and harmonize such a platform with a bench-to-bedside orientation and its utilization in a clinical decision-making setting. First, an Electronic Healthcare Genotype component, readily compatible with the general EHR, would service the management of patients' genotype profiles. Second, a Phenotype-to-Genotype-translation component would service the automated matching of patients' genotype profiles with established and/or newly discovered gene/variant alleles, the inference of the respective phenotypes (e.g. metabolizer profiles), and the delivery of up-to-date relevant clinical annotations and respective (drug) dosing guidelines. Finally, a portal will be required as a single-access-point PGx environment that embraces the aforementioned components and services.
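The automated matching step can be pictured as a look-up of a patient's genotype profile against a store of clinical annotations, filtered by the drug under consideration. The Python sketch below is a minimal illustration under stated assumptions: the `ClinicalAnnotation` record, the `match_annotations` function, the variant identifiers, drug names and annotation texts are all hypothetical placeholders and do not reflect the data model of PharmGKB, CPIC or any EHR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalAnnotation:
    variant_id: str   # placeholder variant identifier
    genotype: str     # unphased genotype, e.g. "AC"
    drugs: frozenset  # drugs to which the annotation applies
    note: str         # annotation text shown to the clinician


# Placeholder knowledge base (all entries are invented for illustration).
KNOWLEDGE_BASE = [
    ClinicalAnnotation("rs0000001", "AC", frozenset({"drug_x", "drug_y"}),
                       "placeholder note: possible altered response"),
    ClinicalAnnotation("rs0000002", "TT", frozenset({"drug_z"}),
                       "placeholder note: consider dose adjustment"),
]


def _normalize(genotype):
    """Treat 'CA' and 'AC' as the same unphased genotype."""
    return "".join(sorted(genotype.upper()))


def match_annotations(patient_genotypes, drug, knowledge_base=KNOWLEDGE_BASE):
    """Return annotations matching the patient's genotypes for the given drug."""
    hits = []
    for annotation in knowledge_base:
        observed = patient_genotypes.get(annotation.variant_id)
        if observed is None:
            continue  # variant not genotyped in this patient
        same_genotype = _normalize(observed) == _normalize(annotation.genotype)
        if same_genotype and drug in annotation.drugs:
            hits.append(annotation)
    return hits


if __name__ == "__main__":
    patient = {"rs0000001": "CA", "rs0000002": "CT"}
    for hit in match_annotations(patient, "drug_x"):
        print(hit.variant_id, hit.genotype, "-", hit.note)
```

In a real system the knowledge base would be refreshed from curated resources, so the same patient profile could yield new annotations over time without being re-genotyped.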
Such a system, once operable, would facilitate the integration and translation of PGx knowledge into the clinical decision-making process and bring clinic-based genomic medicine closer to a reality. To this end, one would also need to circumvent additional fundamental hurdles, namely (i) ensuring that all the necessary consents are provided by the patients, (ii) safeguarding sensitive personal data to avoid the inappropriate leaking of genetic information which may lead to stigmatization and (iii) enhancing the genetics awareness and genetics education of healthcare professionals. Related to these issues is the increase in 'DTC' genetic testing which, quite apart from its controversial status, has gained a lot of attention in recent years and has already produced interesting results [46]. As these topics lie outside the scope of this article and have been discussed elsewhere [47], they will not be further discussed here.

Conclusion and future perspectives
The post-genomic revolution, characterized by the rise of massively parallel whole-genome and exome sequencing, has led to the correlation of specific genomic variants with disease predisposition and other clinical features, including response to some of the most commonly prescribed drugs. As personalized drug treatment and genomic medicine get closer to becoming a reality, the use of whole-genome sequencing that spans all ethnicities and covers all possible genetic alterations is the most useful approach [48]. Recent evidence, though limited at the present time, confirms that whole-genome sequencing can reveal a relatively large number of unique (or rare) pharmacogenomic markers that would otherwise go undetected by conventional genetic screening methods.
An important aspect of next-generation sequencing technology that would be critical for its early adoption in the clinic is its cost-effectiveness. In other words, it becomes clear that establishing a comprehensive personalized pharmacogenomic profile using whole-genome sequencing (currently 3000 US$ and decreasing), which would include almost all of the germline and de novo genomic variants needed to manage all current and future treatment modalities, would be cost-effective when compared with the cost of testing for a single marker or several markers in a few pharmacogenes (from 300 US$ up to 1500 US$, respectively). At present, setting up a (centralized) whole-genome sequencing facility and translating pharmacogenomic data for clinicians are two of the most important hurdles to be overcome, but sample outsourcing for data analysis and interpretation might be the answer to surmounting these obstacles using an economy-of-scale model. Ultimately, as the costs and cost-effectiveness of pharmacogenomic testing using whole-genome sequencing are well documented, it should only be a matter of time until reimbursement of pharmacogenomic testing is adopted by national insurance bodies.
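The cost comparison can be made explicit with a simple break-even calculation based on the figures quoted above. The Python sketch below computes how many separate single-marker or multi-marker tests a patient would need over time before their combined cost reaches that of one whole-genome sequence. It is a deliberately simplified assumption: it ignores data interpretation, storage and re-analysis costs, and the function name `break_even_tests` is introduced here for illustration.

```python
import math

# Figures quoted in the text (US$).
WGS_COST_USD = 3000
SINGLE_MARKER_COST_USD = 300
MARKER_PANEL_COST_USD = 1500


def break_even_tests(wgs_cost, per_test_cost):
    """Smallest number of separate tests whose total cost reaches the WGS cost."""
    return math.ceil(wgs_cost / per_test_cost)


if __name__ == "__main__":
    print("single-marker tests:",
          break_even_tests(WGS_COST_USD, SINGLE_MARKER_COST_USD))
    print("multi-marker tests:",
          break_even_tests(WGS_COST_USD, MARKER_PANEL_COST_USD))
```

With the quoted prices, ten single-marker tests or two multi-marker tests cost as much as one whole-genome sequence, and the break-even point moves lower as sequencing costs fall.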
In light of the above, the design and development of advanced informatics solutions that help to fill the gap between PGx research findings and clinical practice emerges as a major need. Here, we presented the operational requirements and design specifications of an electronic PGx assistant that aims to act as the medium between the various PGx communities (biomedical researchers, geneticists, healthcare providers and PGx regulatory bodies), equipping them with innovative services that enable PGx research findings to reach clinical implementation. The orchestration of the provisioned PGx assistant's services in an integrated platform empowers PGx communities to grasp, assess and maximize the use of relevant biomedical and molecular PGx knowledge. Finally, the implementation of PGx assistant services should address and provide feasible solutions to challenges related to the PGx annotation of whole genomes [49] that concern: the accuracy of PGx markers across the genome; the ambiguity of gene variants and PGx markers in relevant literature references; the effect of multiple gene variants and PGx markers on individual phenotypes; the combined effects of variants on multiple drugs; as well as the limited body of clear guidelines and recommendations.
Acknowledgements. This publication is encouraged by the Genomic Medicine Alliance (http://www.genomicmedicinealliance.org) Genome Informatics working group. The authors declare no conflict of interests.

Endnotes
1 These terms are often used interchangeably. However, pharmacogenomics refers to the collective impact that many individuals' genetic variations have on the response to medication; by contrast, pharmacogenetics refers to the impact of one specific variation, or the impact of variation in one specific gene, on the response to medication.
2