LimTox

LimTox Documentation

How to use LimTox. A search guide.

Sources

Users can select the data source(s) to use when performing a search in LimTox. Currently, the following sources are available:

  • All: Select “All” to include all the source types described below.
  • Pubmed: Entities are searched only in the PubMed articles in the system. Currently, 122,997,116 PubMed articles are indexed.
  • FullText: Select FullText to search a large set of full-text articles that were used to construct the dataset.
  • NDA: Search for entities in NDA articles. Currently, 759,621 NDA articles are indexed.
  • EPAR: Search for entities in EPAR articles. Currently, 630,507 EPAR articles are indexed.
  • Abstracts: Search for compounds in abstracts, or run free-text searches against abstracts.

Figure 1. The data sources available in LimTox for entity searches are highlighted in the image.

Free-text search (search by keywords)

Documents in LimTox are indexed, so free-text searches can be performed using any keyword or set of keywords. The system returns all the documents matching the keyword(s), sorted by SVM score; the results can then be re-sorted by other scores using the Sorting by Score functionality.

  • Any: The free-text search is performed against all the documents in the system.
  • With Compounds: The free-text search is restricted to documents in which chemical compounds are co-mentioned.
  • With CYPs: The free-text search is restricted to documents that contain mentions of cytochromes.
  • With Markers: The free-text search is restricted to documents that contain mentions of markers.

Figure 2. Example of the relevant options for performing a free-text search in LimTox.
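
The scoped search described above can be sketched as a keyword filter followed by a sort on the SVM score. This is a minimal illustration, not LimTox's implementation: the `Document` record, the scope names and the all-keywords-must-match (AND) semantics are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical document record; LimTox's internal schema is not documented here.
@dataclass
class Document:
    doc_id: str
    text: str
    svm_score: float
    compounds: set = field(default_factory=set)
    cyps: set = field(default_factory=set)
    markers: set = field(default_factory=set)

# Each search scope is a filter a document must pass ("Any" accepts everything).
SCOPES = {
    "any": lambda d: True,
    "with_compounds": lambda d: bool(d.compounds),
    "with_cyps": lambda d: bool(d.cyps),
    "with_markers": lambda d: bool(d.markers),
}

def free_text_search(docs, keywords, scope="any"):
    """Return the documents in `scope` that contain every keyword
    (case-insensitive substring match), highest SVM score first."""
    terms = [k.lower() for k in keywords]
    in_scope = SCOPES[scope]
    hits = [d for d in docs
            if in_scope(d) and all(t in d.text.lower() for t in terms)]
    return sorted(hits, key=lambda d: d.svm_score, reverse=True)

docs = [
    Document("d1", "Overdose caused liver injury", 1.2, compounds={"drug_x"}),
    Document("d2", "Liver injury after CYP2E1 induction", 0.4, cyps={"CYP2E1"}),
    Document("d3", "Kidney injury in rats", 2.0),
]
print([d.doc_id for d in free_text_search(docs, ["liver", "injury"])])  # ['d1', 'd2']
print([d.doc_id for d in free_text_search(docs, ["liver"], "with_cyps")])  # ['d2']
```

Sorting on the SVM score mirrors the default ordering; re-sorting by another score would only change the `key` function.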

Chemical compounds search

You can search for documents containing a specific chemical compound. Typically, you search by compound name or by an identifier (ChemIDplus, ChEBI, CAS Registry Number, DrugBank, KEGG Compound, KEGG Drug, MeSH), but SMILES and InChI strings are also accepted.

  • Compound name: Searches by compound name. If a compound with that name is found, a query expansion is performed to find its aliases, and documents containing those aliases are also included in the results.
  • Chemical identifiers: Use any of the following chemical identifiers to retrieve documents for the compound with that identifier: ChemIDplus, ChEBI, CAS Registry Number, DrugBank, KEGG Compound, KEGG Drug and MeSH.
  • SMILES: Search using a SMILES (Simplified Molecular Input Line Entry Specification) string.
  • InChI: Search using an IUPAC International Chemical Identifier (InChI) string.
  • Any (free-text search): Free-text search against all the documents indexed in the system (filtered by source).
  • With CYPs: Free-text search restricted to documents with cytochrome co-mentions.
  • With Markers: Free-text search restricted to documents with marker co-mentions.
  • Term Relations: Free-text search restricted to documents that contain compound-term relations.

Figure 3. Relevant options when performing a chemical compound search using LimTox.
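
In LimTox you pick the query type explicitly, but the identifier formats listed above are distinct enough to tell apart by shape. The hypothetical helper below guesses the type of a compound query. It validates the CAS Registry Number check digit (the weighted digit sum modulo 10). It covers only a simplified subset of each format: MeSH supplementary concept IDs and ChemIDplus-specific IDs are not handled, and SMILES strings fall through to "name".

```python
import re

def cas_checksum_ok(cas: str) -> bool:
    """Validate the check digit (last digit) of a CAS Registry Number."""
    digits = cas.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Weight the remaining digits 1, 2, 3, ... from right to left.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check

# Simplified identifier shapes, tried in order.
PATTERNS = [
    ("chebi", re.compile(r"CHEBI:\d+", re.IGNORECASE)),
    ("drugbank", re.compile(r"DB\d{5}")),
    ("kegg_compound", re.compile(r"C\d{5}")),
    ("kegg_drug", re.compile(r"D\d{5}")),
    ("mesh", re.compile(r"D\d{6}(?:\d{3})?")),  # MeSH descriptor IDs only
]

def guess_query_type(query: str) -> str:
    """Guess which kind of compound query a string is (hypothetical helper)."""
    q = query.strip()
    if q.startswith("InChI="):
        return "inchi"
    if re.fullmatch(r"\d{2,7}-\d{2}-\d", q) and cas_checksum_ok(q):
        return "cas"
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(q):
            return kind
    # Names and SMILES cannot be reliably told apart by shape alone.
    return "name"

print(guess_query_type("103-90-2"))  # cas
print(guess_query_type("103-90-3"))  # name (check digit fails)
```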

Cytochromes search

You can search for cytochromes in the system using CYP names, CYP UniProt accessions or CYP nomenclature. Some free-text searches involving CYPs are also available from here.

  • CYPs name/symbol: Search documents by cytochrome P450 gene/protein name or symbol.
  • CYPs Uniprot Accession: Search using the primary UniProt accession number of the cytochrome P450 gene/protein.
  • CYPs Nomenclature: Search using the name of the cytochrome P450 gene according to the standard hierarchical gene nomenclature (see http://www.cypalleles.ki.se).
  • Any (free-text search): A free-text search is performed against all the documents in the system. This is equivalent to the general free-text search.
  • With Compounds: Free-text search restricted to documents that contain chemical compound co-mentions.
  • CYPs-Chemical Relations: Free-text search restricted to documents in which relations between CYPs and chemical compounds have been found.

Figure 4. Relevant information to consider when searching for cytochrome P450s in LimTox.
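
The hierarchical CYP nomenclature reads left to right as family number, subfamily letter and gene number, with an optional star allele (for example CYP2C19*2). The sketch below is a simplified, hypothetical parser for that convention. It ignores pseudogene suffixes and other edge cases. It shows how names can be grouped by family and subfamily.

```python
import re
from typing import NamedTuple, Optional

class CypName(NamedTuple):
    family: int
    subfamily: str
    gene: int
    allele: Optional[str]

# CYP + family number + subfamily letter(s) + gene number, optional *allele.
CYP_RE = re.compile(r"CYP(\d+)([A-Z]+)(\d+)(?:\*(\w+))?")

def parse_cyp(name: str) -> Optional[CypName]:
    """Split a CYP gene/allele name into its hierarchical parts, or None."""
    m = CYP_RE.fullmatch(name.strip().upper())
    if m is None:
        return None
    family, subfamily, gene, allele = m.groups()
    return CypName(int(family), subfamily, int(gene), allele)

def same_subfamily(a: str, b: str) -> bool:
    """True if both names parse and share family and subfamily."""
    pa, pb = parse_cyp(a), parse_cyp(b)
    return (pa is not None and pb is not None
            and (pa.family, pa.subfamily) == (pb.family, pb.subfamily))

print(parse_cyp("CYP2C19*2"))  # CypName(family=2, subfamily='C', gene=19, allele='2')
```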

Markers search

You can search for liver toxicity markers in the LimTox sources using a marker name/symbol or a marker identifier as the query.

  • Marker name/symbol: Name or symbol of the biochemical liver test marker. It corresponds to one of 17 enzymes and chemical compounds assayed as biochemical markers of adverse liver reactions (e.g. ALT, AST, SDH, GGT, 5'-NT, LAP, SGPT, ...).
  • Marker Identifier: Marker UniProt accession number or PubChem Compound ID (e.g. P00441, P04040, P07195, P07203, ...).
  • Any (free-text search): A free-text search is performed against all the documents in the system. This is equivalent to the general free-text search.
  • With Compounds: Free-text search restricted to documents containing compound co-mentions.
  • Marker-chemical relations: Free-text search restricted to documents in which markers and chemicals are related.

Figure 5. Relevant information when performing a markers search in LimTox.
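
A marker identifier is either a UniProt accession or a PubChem compound ID, and the two are easy to tell apart. Below is a minimal sketch using the accession format published by UniProt; the helper name `marker_id_kind` is hypothetical and not part of LimTox.

```python
import re

# UniProt accession format (6- or 10-character accessions).
UNIPROT_AC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# PubChem compound IDs (CIDs) are positive integers.
PUBCHEM_CID = re.compile(r"[1-9][0-9]*")

def marker_id_kind(identifier: str) -> str:
    """Classify a marker identifier as 'uniprot', 'pubchem_cid' or 'unknown'."""
    s = identifier.strip()
    if UNIPROT_AC.fullmatch(s):
        return "uniprot"
    if PUBCHEM_CID.fullmatch(s):
        return "pubchem_cid"
    return "unknown"

print([marker_id_kind(x) for x in ["P00441", "2244", "ALT"]])
# ['uniprot', 'pubchem_cid', 'unknown']
```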

Genes search

You can search the documents indexed in LimTox for genes relevant to liver toxicity, using gene names or Entrez Gene IDs as queries. Gene searches can only be run against all the documents or against the documents that contain chemical compound co-mentions.

  • Gene name: Search by gene name.
  • Entrez Gene ID: Search by Entrez Gene ID.

Search different toxic endpoints

You can search for keywords related not only to hepatotoxicity but also to other toxic endpoints:

  • Hepatotoxicity: Search keywords for the hepatotoxicity endpoint.
  • Nephrotoxicity: Search keywords for the nephrotoxicity endpoint.
  • Cardiotoxicity: Search keywords for the cardiotoxicity endpoint.
  • Thyroid toxicity: Search keywords for the thyrotoxicity endpoint.
  • Phospholipidosis: Search keywords for the phospholipidosis endpoint.

How to use LimTox. Understanding results.

Results Statistics

Just above the results, a simple statistics table is shown, containing:

  • Total Mentions: Total number of mentions returned by the system.
  • Total number of mentions displayed: Number of mentions actually displayed, which can be smaller than the total number of mentions found in LimTox. The following values are calculated over the displayed mentions only.
  • Maximum Score: The maximum value of the score used to sort the results.
  • Minimum Score: The minimum value of the score used to sort the results.
  • Mean score: The mean value of the score used to sort the results.
  • Median score: The median value of the score used to sort the results.
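
The statistics table can be reproduced from the sort scores of the mentions. The table computes its values over the displayed mentions. This sketch assumes that the displayed mentions are the top `display_limit` by score. The cap and the function name are hypothetical.

```python
import statistics

def result_stats(scores, display_limit):
    """Summary table for a result list (hypothetical helper).

    `scores` holds the sort score of every mention found; only the top
    `display_limit` mentions are treated as displayed, and the score
    statistics are computed over those displayed mentions.
    """
    ranked = sorted(scores, reverse=True)
    shown = ranked[:display_limit]
    stats = {"total_mentions": len(ranked), "displayed_mentions": len(shown)}
    if not shown:
        empty = dict.fromkeys(["max_score", "min_score", "mean_score", "median_score"])
        return {**stats, **empty}
    stats.update(
        max_score=max(shown),
        min_score=min(shown),
        mean_score=statistics.mean(shown),
        median_score=statistics.median(shown),
    )
    return stats

print(result_stats([2.5, -1.0, 0.5, 1.5], display_limit=3))
```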

View Interaction Network

Opens the interaction network of the results: a graph showing the compound relations stored in LimTox. It is only available for chemical compound searches that are not free-text searches.
The chemical compound used as the query is displayed at the center of the graph as the primary node of the network. Each edge represents a relation between the compound and the entity it is related to. Clicking an edge displays basic information: the number of relations established (Weight) and the types of relations the compound has with that entity, each followed by the number of relations of that type found in the documents.
The relation types include Association, Comention, Induction, Inhibition, Up, etc.
The higher the number of established relations, the more intense the color of the edge.
Clicking a compound node centers the interaction network on that compound, showing its relations in the system.
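
The edge information described above (Weight plus a count per relation type) can be derived by aggregating relation records. Here is a minimal sketch. It assumes relations come as hypothetical (compound, entity, relation type) triples and that edge color intensity scales linearly with weight. Neither assumption is LimTox's documented behavior.

```python
from collections import Counter, defaultdict

def build_edges(query, relations):
    """Aggregate (compound, entity, relation_type) triples into edges.

    Returns {entity: (weight, Counter of relation types)} for the edges
    that start at the query compound; weight is the total relation count.
    """
    by_entity = defaultdict(Counter)
    for compound, entity, rel_type in relations:
        if compound == query:
            by_entity[entity][rel_type] += 1
    return {e: (sum(c.values()), c) for e, c in by_entity.items()}

def edge_intensity(weight, max_weight):
    """Map a weight to a 0..1 color intensity: more relations, stronger color."""
    return weight / max_weight if max_weight else 0.0

relations = [  # toy data with placeholder compound names
    ("drug_x", "CYP2E1", "Induction"),
    ("drug_x", "CYP2E1", "Comention"),
    ("drug_x", "CYP2E1", "Comention"),
    ("drug_x", "ALT", "Association"),
    ("drug_y", "CYP2C9", "Inhibition"),
]
edges = build_edges("drug_x", relations)
top = max(w for w, _ in edges.values())
for entity, (weight, types) in edges.items():
    print(entity, weight, dict(types), round(edge_intensity(weight, top), 2))
```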

Results Scores

LimTox makes extensive use of SVMLight to recognize and annotate relationships between chemical compounds and adverse toxicological events. Additional information about the SVM scores can be found at http://svmlight.joachims.org.

  • SVM: Support Vector Machine score. Score of a binary linear-kernel SVM classifier (SVMLight). Features: word unigrams, stop-word filtering, balanced training set. The higher the score, the more related the text is to the topic. A positive score means the text was classified as relevant to adverse hepatobiliary events; a negative score means it was classified as non-relevant. See: http://svmlight.joachims.org
  • SVM Confidence: Support Vector Machine confidence score. Confidence score of a binary linear-kernel SVM classifier (scikit-learn). Features: word 1- to 4-grams, stop-word filtering, tf-idf weights, balanced training set. The higher the score, the more related the text is to the topic. A positive score means the text was classified as relevant to adverse hepatobiliary events; a negative score means it was classified as non-relevant. (See: http://scikit-learn.org and its decision_function method.)
  • Pattern: Number of adverse hepatobiliary event text patterns detected.
  • Term: Number of adverse hepatobiliary event terms/phrases detected.
  • Rule: A score of 0.01 corresponds to sentences that only mention a term or phrase related to the hepatobiliary system. Scores > 0.01 correspond to sentences that also contain adverse, toxic or disease events. The rule score is a heuristic score that takes into account (1) the number of co-occurrences of hepatobiliary terms and adverse effect terms in a sentence, (2) their relative order within the sentence and (3) their distance, measured as the number of word tokens between them.
  • Other toxicological endpoints:
    • Nephro: Nephrotoxicity score. Score of a binary linear-kernel SVM classifier (SVMLight). Features: word unigrams, stop-word filtering, balanced training set. The higher the score, the more related the text is to the topic. A positive score means the text was classified as relevant to adverse nephrologic events; a negative score means it was classified as non-relevant. See: http://svmlight.joachims.org
    • Cardio: Cardiotoxicity score. Score of a binary linear-kernel SVM classifier (SVMLight). Features: word unigrams, stop-word filtering, balanced training set. The higher the score, the more related the text is to the topic. A positive score means the text was classified as relevant to adverse cardiological events; a negative score means it was classified as non-relevant. See: http://svmlight.joachims.org
    • Thyro: Thyrotoxicity score. Score of a binary linear-kernel SVM classifier (SVMLight). Features: word unigrams, stop-word filtering, balanced training set. The higher the score, the more related the text is to the topic. A positive score means the text was classified as relevant to adverse thyroid events; a negative score means it was classified as non-relevant.
    • Phospho: Phospholipidosis score. Score of a binary linear-kernel SVM classifier (SVMLight). Features: word unigrams, stop-word filtering, balanced training set. The higher the score, the more related the text is to the topic. A positive score means the text was classified as relevant to adverse phospholipidic events; a negative score means it was classified as non-relevant.
  • Toxicology: Match found with a PubMed Boolean MeSH query for Toxicology.
  • Biomarker: Match found with a PubMed Boolean MeSH query for Biomarker.
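
The SVM-based scores above are the decision values of a linear classifier: a weighted sum of the text's features plus a bias. The sign of that value gives the relevant/non-relevant call. The sketch below shows only this scoring step and assumes binary word-unigram features with a small stop-word list. The weights, bias and stop words are made up for illustration; they are not LimTox's trained SVMLight models.

```python
import re

# Made-up model parameters for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "was", "with", "after"}
WEIGHTS = {"hepatotoxicity": 1.5, "liver": 0.8, "injury": 0.6,
           "kidney": -0.7, "rats": -0.1}
BIAS = -0.5

def unigrams(text):
    """Lower-cased word unigrams with stop words removed (binary features)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}

def svm_score(text, weights=WEIGHTS, bias=BIAS):
    """Linear-kernel decision value w . x + b over binary unigram features."""
    return sum(weights.get(t, 0.0) for t in unigrams(text)) + bias

def label(score):
    # Positive: relevant; negative: non-relevant (0 treated as non-relevant here).
    return "relevant" if score > 0 else "non-relevant"

for text in ["Hepatotoxicity and liver injury after overdose", "Kidney damage in rats"]:
    s = svm_score(text)
    print(f"{s:+.2f} {label(s)}")
```

The same scoring form applies to the Nephro, Cardio, Thyro and Phospho scores; only the trained weights differ.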

Additional Information

Some additional information is displayed depending on the type of entity used as the query.

Basic information related to the CYP used as the query:

  • Name: Cytochrome P450 name of the query entity.
  • UniprotID: UniProt ID and a link to the UniProt database entry for the query.
  • Type: Type of cytochrome used.
  • Tax: Taxonomy ID and a link to the Taxonomy Browser.
  • Canonical: Canonical name of the query cytochrome.

Entity Mentions Highlight

In the results table, entity mentions are shown within the sentence in which they appear and are highlighted by type: what you searched, compounds, cytochromes, markers, terms and species. Curated evidence is indicated by:

Color Key for Scores

To improve the readability of the results, the scores are highlighted with a color key that gives you an idea, at a glance, of their significance. Scores greater than 1 are shown in green, negative scores in red, and scores between 0 and 1 are not colored. In addition, the intensity of the green or red reflects how high or low the score is, following this key:

Color key for scores: a gradient ranging from -6 (red) to +7 (green).
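
The color rule above can be expressed as a small mapping function. This is a minimal sketch: the thresholds (greater than 1 for green, below 0 for red, 0 to 1 uncolored) come from the text. The linear intensity scale that saturates at the ends of the key (-6 and +7) is an assumption, and `score_color` is a hypothetical name, not LimTox's rendering code.

```python
def score_color(score, lo=-6.0, hi=7.0):
    """Return (color, intensity in 0..1) for a score.

    > 1 -> green, < 0 -> red, 0..1 -> uncolored. Intensity is assumed to
    grow linearly with |score| and saturate at the ends of the key.
    """
    if score > 1:
        return "green", min(score / hi, 1.0)
    if score < 0:
        return "red", min(score / lo, 1.0)
    return None, 0.0

print(score_color(3.5), score_color(-3), score_color(0.5))
# ('green', 0.5) ('red', 0.5) (None, 0.0)
```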

LimTox retrieves the entities in the documents and tries to provide as much information as possible about them. For compound entities, it provides as many chemical database identifiers as possible, each linked to the original database so you can explore further, e.g. chemIdPlus, chebi, cas Registry Number, inChi, drugBank, keggCompound, keggDrug, Mesh.

Download Results

The results can be downloaded in two file formats: CSV and PDF.

Sorting by Score

While viewing the results of a query, you can re-sort all the documents by any of the scores (SVM, Confidence, Pattern, Term, Rule, Nephro, Cardio, Toxicology, Biomarker…). Clicking the name of a score at the top of the results table opens a new page with the documents sorted by that score.
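
The same re-sorting can be done offline on a CSV download. This is a sketch only: the column names used here (`doc_id`, `SVM`, `Rule`) are assumptions, since the layout of the exported file is not documented in this guide.

```python
import csv
import io

def sort_by_score(csv_text, score):
    """Re-sort exported result rows by one score column, highest first.

    Rows with an empty or missing value for that score are placed last.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def key(row):
        value = row.get(score) or ""
        return float(value) if value else float("-inf")

    return sorted(rows, key=key, reverse=True)

# Hypothetical export with assumed column names.
CSV = "doc_id,SVM,Rule\nd1,0.4,0.05\nd2,1.9,0.01\nd3,-0.3,\n"
print([r["doc_id"] for r in sort_by_score(CSV, "SVM")])   # ['d2', 'd1', 'd3']
print([r["doc_id"] for r in sort_by_score(CSV, "Rule")])  # ['d1', 'd2', 'd3']
```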

Simple Curation of Results

Compound, cytochrome and marker mentions can be curated at the sentence level from the results table. Curation consists of confirming (checkbox) or rejecting (crossbox) the association between the entity and the hepatotoxic effect in the sentence. Each relation keeps a total count: a checkbox adds 1 (+1) to the count, while a crossbox subtracts 1 (-1).
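
The curation count is a running tally per relation. A minimal sketch follows. Keying each relation by a hypothetical (entity, sentence_id) pair is an assumption for illustration, not LimTox's storage model.

```python
from collections import defaultdict

class CurationLog:
    """Running curation total per (entity, sentence) relation (hypothetical)."""

    def __init__(self):
        self._totals = defaultdict(int)

    def confirm(self, entity, sentence_id):
        self._totals[(entity, sentence_id)] += 1  # checkbox: +1

    def reject(self, entity, sentence_id):
        self._totals[(entity, sentence_id)] -= 1  # crossbox: -1

    def total(self, entity, sentence_id):
        return self._totals.get((entity, sentence_id), 0)

log = CurationLog()
log.confirm("CYP2E1", "s1")
log.confirm("CYP2E1", "s1")
log.reject("CYP2E1", "s1")
print(log.total("CYP2E1", "s1"))  # 1
```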

Understanding LimTox. Query Expansion

A query expansion (QE) is performed for each entity search. The process depends on the type of entity being searched, but in general it widens the results by adding aliases and/or linguistic variants.

  • Compounds QE: When searching for compounds by name, once the compound entity is retrieved, a second query looks for compounds with the same identifiers as the retrieved entity. Equivalent compounds sharing at least one identifier with the main query compound are therefore added and treated as the same compound.
  • Cytochromes QE: Once the searched cytochrome entity is found, other cytochromes with the same entityId or canonical name are looked up and added to the final query, since they are considered the same cytochrome.
  • Markers QE: Once the searched marker entity is found, markers with the same entityId are added to the final query. Therefore, the aliases of this marker also appear in the results.
  • Keywords QE: The normalised term is retrieved, and all the terms sharing the same normalised term are added to the query.
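
Two of the expansions above can be sketched over in-memory lookup tables. The first is compound QE, which adds compounds sharing at least one identifier with the matched compound. The second is keyword QE, which adds terms sharing the same normalised term. The table layouts and function names are hypothetical, and the records are toy data.

```python
def expand_compound(name, compounds):
    """Compounds QE: `compounds` maps name -> set of identifiers.

    Returns every compound sharing at least one identifier with `name`
    (including `name` itself), or an empty set if `name` is unknown.
    """
    if name not in compounds:
        return set()
    ids = compounds[name]
    return {other for other, other_ids in compounds.items() if ids & other_ids}

def expand_keyword(term, normalise):
    """Keywords QE: `normalise` maps surface term -> normalised term."""
    target = normalise.get(term, term)
    return {t for t, n in normalise.items() if n == target} | {term}

compounds = {  # toy records
    "paracetamol": {"CAS:103-90-2"},
    "acetaminophen": {"CAS:103-90-2", "DrugBank:DB00316"},
    "ibuprofen": {"CAS:15687-27-1"},
}
normalise = {
    "hepatotoxicity": "hepatotoxicity",
    "liver toxicity": "hepatotoxicity",
    "hepatic toxicity": "hepatotoxicity",
    "nephrotoxicity": "nephrotoxicity",
}
print(sorted(expand_compound("paracetamol", compounds)))
# ['acetaminophen', 'paracetamol']
print(sorted(expand_keyword("liver toxicity", normalise)))
# ['hepatic toxicity', 'hepatotoxicity', 'liver toxicity']
```

The expansion is one hop: compounds are added only if they share an identifier with the query compound itself, matching the description above.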