import { HTMLTable } from "@blueprintjs/core";
import { Col, Container, Row } from "react-grid-system";
import workflow from "../assets/workflow.png";

export default function UsefulInfo() {
    return (
        <Container style={{marginTop: "20px", marginBottom: "40px"}}>
            <Row>
                <Col lg={8}>
                    <h2>Selection of publications</h2>
                    <p>Currently the database includes information extracted only from
                    publications in which the incorporation of the ncAA into the protein was
                    proven by mass spectrometry (MS).</p>
                    <p>To retrieve these publications, we used the “Publish or Perish”
                    software developed by <a rel="noreferrer" target="_blank"
                    href="https://harzing.com/resources/publish-or-perish">AW
                    Harzing</a> to search Google Scholar for terms typical of studies
                    involving ncAAs incorporated into proteins and MS as the validation method
                    (Figure 2). “Publish or Perish” allows setting keywords and saving the
                    bibliographic data of the papers as comma-separated values files, which
                    can later be used for downstream analysis. As many different keywords are
                    used in the field of Genetic Code Expansion, we conducted the search
                    based on “unnatural”, “non canonical”, and “non standard” amino
                    acids.</p>
                    <p>
                    These are the keywords we used in the search:
                    </p>
                    <div style={{textAlign: "left", marginBottom: "30px", marginTop: "30px"}}>
                        <div>
                            <p style={{display: "block"}}>
                                <b>Table 1.</b> Keyword combinations used to find publications on ncAA incorporation into proteins proven by mass spectrometry. Date of search: 22.09.2023. Each keyword search additionally included the term “tRNA Synthetase”. “unnatural amino acid”, “non canonical amino acid”, and “non standard amino acid” are represented by UAA, ncAA, and nsAA, respectively. Variant spellings of "non canonical" and "non standard" (e.g. "noncanonical", "non-standard") were not searched separately because the "Publish or Perish" software already matches the blank against a hyphen or no blank during the search. Without removing duplicates, 3355 hits were gathered. For all mass spectrometry search terms except “Mass spectrometry”, the term “Mass spectrometry” was excluded with “-"mass spectrometry"” to limit the number of duplicates.
                            </p>
                        </div>
                        <div style={{width: "100%"}}>
                            <HTMLTable striped style={{width: "100%"}}>
                                <thead>
                                    <tr>
                                        <th>Mass spectrometry search term</th>
                                        <th>UAA</th>
                                        <th>ncAA</th>
                                        <th>nsAA</th>
                                    </tr>
                                </thead>
                                <tbody>
                                    <tr>
                                        <td>“mass spectrometry”</td>
                                        <td>1960</td>
                                        <td>752</td>
                                        <td>112</td>
                                    </tr>
                                    <tr>
                                        <td>“Electrospray ionization”</td>
                                        <td>50</td>
                                        <td>8</td>
                                        <td>0</td>
                                    </tr>
                                    <tr>
                                        <td>“Electrospray ionisation”</td>
                                        <td>6</td>
                                        <td>1</td>
                                        <td>0</td>
                                    </tr>
                                    <tr>
                                        <td>“MALDI”</td>
                                        <td>102</td>
                                        <td>36</td>
                                        <td>5</td>
                                    </tr>
                                    <tr>
                                        <td>“MS/MS”</td>
                                        <td>74</td>
                                        <td>16</td>
                                        <td>0</td>
                                    </tr>
                                    <tr>
                                        <td>“LC MS”</td>
                                        <td>145</td>
                                        <td>37</td>
                                        <td>9</td>
                                    </tr>
                                    <tr>
                                        <td>“GC MS”</td>
                                        <td>20</td>
                                        <td>4</td>
                                        <td>0</td>
                                    </tr>
                                    <tr>
                                        <td>“HPLC MS”</td>
                                        <td>16</td>
                                        <td>2</td>
                                        <td>0</td>
                                    </tr>
                                </tbody>
                            </HTMLTable>
                        </div>
                    </div>
                    <h2>Retrieved data</h2>
                    <p>
                    From each publication, we retrieved the following information:
                    </p>
                    <ol>
                        <li>abbreviation and name of ncAA;</li>
                        <li>name, organism of origin and natural substrate of the aaRS used
                        to incorporate the ncAA;</li>
                        <li>mutations applied to the aaRS and name of mutated aaRS (if
                        applicable);</li>
                        <li>name of tRNA used for the incorporation. We assign
                        each tRNA a name composed of three parts: a) abbreviation of the organism
                        from which it was derived; b) tRNA; c) AA naturally transported by the
                        tRNA. For example, “Bs-tRNA Tyr” indicates a tRNA that naturally transports
                        tyrosine and is found in <em>Bacillus subtilis</em>;</li>
                        <li>anticodon recognized by the tRNA;</li>
                        <li>modifications made to the tRNA (if any);</li>
                        <li>protein in which the ncAA was incorporated;</li>
                        <li>position of incorporation (if given);</li>
                        <li>organism in which incorporation was tested (if given);</li>
                        <li>application for the ncAA (if given);</li>
                        <li>original publication in APA citation style;</li>
                        <li>DOI link to the publication;</li>
                        <li>sequence of the aaRS (if not given in the publication, the
                        sequence was retrieved from repositories such as <a rel="noreferrer" target="_blank"
                        href="https://www.addgene.org/">Addgene</a>);</li>
                        <li>sequence of the tRNA (if not given in the publication, the
                        sequence was retrieved from other databases or repositories, e.g. <a rel="noreferrer" target="_blank"
                        href="https://www.ncbi.nlm.nih.gov/genbank/">GenBank</a> or <a rel="noreferrer" target="_blank"
                        href="https://www.addgene.org/">Addgene</a>);</li>
                        <li>simplified molecular-input line-entry system (SMILES) string and IUPAC name of the
                        ncAA. To obtain these, we uploaded images of the ncAAs into the “Optical
                        Structure Recognition Application” (<a rel="noreferrer" target="_blank"
                        href="https://sourceforge.net/p/osra/wiki/Home/">OSRA</a>)
                        software and compared the generated SMILES with the entries of the <a rel="noreferrer" target="_blank"
                        href="https://pubchem.ncbi.nlm.nih.gov/">PubChem</a> database.
                        Note that some of the ncAAs could not be found on PubChem,
                        but we have nevertheless provided the SMILES and the chemical formulas
                        for these;</li>
                        <li>AA closest in structure to the ncAA;</li>
                        <li>comments (if applicable). These include, for example, Addgene
                        links, information on flanking sequences on the tRNA, or specific
                        conditions necessary for the incorporation of the ncAA.</li>
                    </ol>
                    <div style={{textAlign: "center", marginBottom: "30px", marginTop: "30px"}}>
                        <div style={{width: "100%"}}>
                            <img src={workflow} alt="Scheme of the workflow behind iNClusive" style={{width: "70%", maxWidth: "1000px"}}/>
                        </div>
                        <div>
                            <p style={{display: "block"}}>
                            <b>Figure 2.</b> Scheme of the workflow behind iNClusive.
                            </p>
                        </div>
                    </div>
                    <h2>What is an entry in iNClusive?</h2>
                    <p>Typically, an entry contains information about one protein modified
                    with one ncAA using one aaRS/tRNA pair, as described in one publication
                    (with incorporation validated by MS, as mentioned earlier). Thus:</p>
                    <ul>
                        <li>If a single ncAA has been incorporated into multiple proteins,
                        each of these will be a distinct entry;</li>
                        <li>If the same aaRS/tRNA pair and protein have been
                        used with different ncAAs, each combination will be a distinct
                        entry;</li>
                        <li>If the ncAA and the protein are identical, but the experimental
                        system is different, each will be a distinct entry.</li>
                    </ul>
                    <p>However…</p>
                    <p>If the <strong>same protein</strong> has been modified <strong>at
                    different positions</strong>, these modifications are grouped together
                    in <strong>one entry</strong>. We indicate positions with the one-letter code of
                    the AA followed by the position number, e.g. Y12, R34, S56. An asterisk (*)
                    in front of the position (e.g. *78) signifies that the publication did not
                    specify the amino acid at that position.</p>
                    <p><strong>In our opinion, this way of organizing the data reflects the
                    diversity and complexity of the field described by scientific
                    publications.</strong></p>
                    <h2>Data visualization and download</h2>
                    <p>You can explore the content online or export the dataset as a
                    single file in which the entries are given as comma-separated values (CSV).
                    Use the search tool to easily find and download specific information. As
                    the database is quite extensive and not all columns can be shown on
                    the screen at the same time, filters can be set up
                    under "Filter data". With these filters it is possible to search
                    either all or only certain columns for the desired terms. Several
                    filters can also be applied at once, for instance to find out how often
                    a certain ncAA has been incorporated into proteins using a certain aaRS.
                    It is also possible to hide certain columns using the "Show/Hide
                    columns" function.</p>
                    <h2>Disclaimer</h2>
                    <p>Unfortunately, some entries in the database are incomplete (marked as
                    “not available”) because the data were not provided in the publications,
                    and it was not possible for us to unambiguously assign the missing
                    information using other sources (e.g. GenBank, citations in the
                    publication). Moreover, if a typo was present in the original
                    publication, the name of the ncAA might be inaccurately reported in
                    iNClusive. Finally, publications in which mass spectrometry was wrongly
                    indicated as “mass spectroscopy” have not been considered.</p>
                </Col>
            </Row>
        </Container>
    )
}