{"id":29735,"date":"2024-08-23T13:26:41","date_gmt":"2024-08-23T11:26:41","guid":{"rendered":"https:\/\/fgu.antstudio.dev\/?post_type=vyzkumny-projekt&#038;p=29735"},"modified":"2025-12-15T10:47:34","modified_gmt":"2025-12-15T09:47:34","slug":"chemoinformatics-resources","status":"publish","type":"vyzkumny-projekt","link":"https:\/\/fgu.cas.cz\/en\/research-project\/chemoinformatics-resources\/","title":{"rendered":"Bioinformatics"},"content":{"rendered":"<div>\n<h2>Short-term internships:<\/h2>\n<p>Several software tools are currently being developed in the laboratory. We offer short-term internships (summer or semester-long) for bioinformatics students with programming skills (Python, C#, or Matlab) who are interested in biological or chemical projects. If you are interested, please send an email. These projects currently include:<\/p>\n<table>\n<tbody>\n<tr>\n<td><img decoding=\"async\" class=\"aligncenter wp-image-51870 \" src=\"https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/vilifog-logo-150x150.jpg\" alt=\"python+matlab\" width=\"72\" height=\"72\" title=\"\" srcset=\"https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/vilifog-logo-150x150.jpg 150w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/vilifog-logo-300x300.jpg 300w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/vilifog-logo.jpg 356w\" sizes=\"(max-width: 72px) 100vw, 72px\" \/><\/td>\n<td><strong>ViLiFOG<\/strong> &#8211; Virtual Lipidome Fluxomics Object Generator (Python+Matlab)<\/td>\n<\/tr>\n<tr>\n<td><img decoding=\"async\" class=\"aligncenter wp-image-51874 size-medium\" src=\"https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/nginx-docker-logo-300x300.jpg\" alt=\"-\" width=\"72\" height=\"72\" title=\"\" srcset=\"https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/nginx-docker-logo-300x300.jpg 300w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/nginx-docker-logo-1024x1024.jpg 1024w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/nginx-docker-logo-150x150.jpg 
150w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/nginx-docker-logo-768x769.jpg 768w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/nginx-docker-logo-1534x1536.jpg 1534w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/nginx-docker-logo-2045x2048.jpg 2045w\" sizes=\"(max-width: 72px) 100vw, 72px\" \/><\/td>\n<td>VM for hosting bioinfo web apps (linux+nginx+docker&#8230;)<\/td>\n<\/tr>\n<tr>\n<td><img decoding=\"async\" class=\"aligncenter wp-image-51876 size-medium\" src=\"https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/lora-in-r-296x300.jpg\" alt=\"-\" width=\"72\" height=\"72\" title=\"\" srcset=\"https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/lora-in-r-296x300.jpg 296w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/lora-in-r-150x150.jpg 150w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/lora-in-r-768x777.jpg 768w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/lora-in-r.jpg 980w\" sizes=\"(max-width: 72px) 100vw, 72px\" \/><\/td>\n<td><a href=\"https:\/\/lora.metabolomics.fgu.cas.cz\">LORA implementation in R\/CLI\/Shiny<\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Shared and published resources:<\/p>\n<table>\n<tbody>\n<tr>\n<td><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/avatars.githubusercontent.com\/u\/130560605?s=200&amp;v=4\" width=\"20\" height=\"20\" alt=\"-\" title=\"\"> GitHub<\/td>\n<td><a href=\"https:\/\/github.com\/IPHYS-Bioinformatics\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/IPHYS-Bioinformatics<\/a><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<hr \/>\n<p>We run local instances of several services focused on Metabolomics, Lipidomics, Statistical analysis and Chemometrics. 
As the hardware resources are limited, these tools are available only within the campus LAN.<\/p>\n<\/div>\n<p>All services run in separate Docker containers:<\/p>\n<h2>Contents:<\/h2>\n<ol>\n<li><a href=\"https:\/\/www.fgu.cas.cz\/articles\/876-chemoinformatics-resources#Metaboanalyst\">MetaboAnalyst &#8211; statistical, functional and integrative analysis of metabolomics data<\/a><\/li>\n<li><a href=\"https:\/\/www.fgu.cas.cz\/articles\/876-chemoinformatics-resources#ChemRICH\">ChemRICH &#8211; Chemical Similarity Enrichment Analysis for Metabolomics<\/a><\/li>\n<li><a href=\"https:\/\/www.fgu.cas.cz\/articles\/876-chemoinformatics-resources#CRediT\">CRediT (Contributor Roles Taxonomy) generator<\/a><\/li>\n<li><a href=\"https:\/\/www.fgu.cas.cz\/articles\/876-chemoinformatics-resources#MS-DIAL\">MS-DIAL data processing benchmark<\/a><\/li>\n<\/ol>\n<hr \/>\n<h2><a name=\"Metaboanalyst\"><\/a>MetaboAnalyst &#8211; statistical, functional and integrative analysis of metabolomics data<\/h2>\n<ul>\n<li>Local version based on v4.93, slightly modified to run on Ubuntu 20.04 LTS.<\/li>\n<li>A few notes on a local standalone <a href=\"https:\/\/www.fgu.cas.cz\/articles\/835-local-installation-of-metaboanalyst\">installation<\/a><\/li>\n<li>Official service website: <a href=\"https:\/\/www.metaboanalyst.ca\/\" target=\"_blank\" rel=\"noopener\">https:\/\/www.metaboanalyst.ca<\/a><\/li>\n<\/ul>\n<p>MetaboAnalyst in a container&#8230; 
from <a href=\"https:\/\/github.com\/xia-lab\/MetaboAnalyst_Docker\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/xia-lab\/MetaboAnalyst_Docker<\/a><\/p>\n<pre class=\"prettyprint lang-bsh prettyprinted\"># modify the Dockerfile\r\n# prefer Java 8 (Oracle flavor)\r\n...\r\n\r\nENV METABOANALYST_VERSION 4.93\r\nENV METABOANALYST_LINK https:\/\/www.dropbox.com\/s\/9xo4yy3gzqsvyj9\/MetaboAnalyst-4.93.war?dl=0\r\nENV METABOANALYST_FILE_NAME MetaboAnalyst.war\r\n...\r\n<\/pre>\n<pre class=\"prettyprint lang-bsh prettyprinted\"># Build the image from the Dockerfile\r\ndocker build -t metab_docker .\r\n\r\n# Run the container in interactive mode\r\ndocker run -ti --rm --name METAB_DOCKER -p 8080:8080 metab_docker\r\n\r\n# Inside the container: run the R script that loads libraries etc.\r\nRscript \/metab4script.R\r\n\r\n# Inside the container: deploy the Java application with Payara Micro\r\njava -jar \/opt\/payara\/payara-micro.jar --deploymentDir \/opt\/payara\/deployments\r\n\r\n# MetaboAnalyst should then be running at http:\/\/localhost:8080\/MetaboAnalyst\/\r\n<\/pre>\n<h2><a name=\"ChemRICH\"><\/a>ChemRICH &#8211; Chemical Similarity Enrichment Analysis for Metabolomics<\/h2>\n<ul>\n<li>Local version based on the latest git version, modified to run on Ubuntu 20.04 LTS (R 4.0+)<\/li>\n<li>Corrected output formats, fixed compatibility issues, and added an SVG export option<\/li>\n<li>Official service website: <a href=\"http:\/\/chemrich.fiehnlab.ucdavis.edu\/\" target=\"_blank\" rel=\"noopener\">http:\/\/chemrich.fiehnlab.ucdavis.edu<\/a><\/li>\n<\/ul>\n<pre class=\"prettyprint lang-bsh prettyprinted\"># Dockerfile\r\nFROM opencpu\/base\r\n# MAINTAINER is deprecated; use a label instead\r\nLABEL maintainer=\"OK\"\r\nLABEL Description=\"ChemRICH 0.1.1 container\"\r\n...\r\n# prepare Ubuntu for compilations as needed\r\n...\r\n# set up Java (the JDK includes the JRE) and set PATH\r\nRUN apt-get -y install openjdk-11-jdk\r\n# check where Java is installed\r\n#RUN update-java-alternatives -l\r\n#RUN java -version\r\nENV JAVA_HOME=\"\/usr\/lib\/jvm\/java-1.11.0-openjdk-amd64\"\r\nENV PATH 
$JAVA_HOME\/bin:$PATH\r\n\r\n# configure Java for R\r\nRUN R CMD javareconf\r\nRUN R -e \"install.packages('rJava', repos='http:\/\/cran.rstudio.com\/')\"\r\n\r\n# install all R packages via an R script; this can take a few hours\r\n# XLConnect works with Java 8 through Java 11, not with newer versions\r\nCOPY install_package.R \/install_package.R\r\nRUN Rscript \/install_package.R\r\n\r\n# these packages require special attention\r\n# RCurl needs re-installation if a ....rcurl.so... error appears\r\nRUN R -e \"install.packages('devtools', repos='http:\/\/cran.rstudio.com\/')\"\r\nRUN R -e \"install.packages('RCurl', repos='http:\/\/cran.rstudio.com\/')\"\r\nRUN R -e \"install.packages('unix', repos='http:\/\/cran.rstudio.com\/')\"\r\n\r\n# install the package from the local source tarball\r\nCOPY ChemRICH_0.1.1.tar.gz \/ChemRICH_0.1.1.tar.gz\r\nRUN R -e \"install.packages('\/ChemRICH_0.1.1.tar.gz', repos = NULL)\"\r\n\r\n# opencpu needs more time for POST and more memory; upload modified configurations (\"timelimit.post\": 900, etc.)\r\nCOPY defaults.conf \/usr\/local\/lib\/R\/site-library\/opencpu\/config\/defaults.conf\r\nCOPY server.conf \/etc\/opencpu\/server.conf\r\n\r\n# make sure Java can be found by rApache and other daemons that do not look in the R ldpaths\r\n# otherwise an rJava loading error will appear\r\nRUN echo \"\/usr\/lib\/jvm\/java-1.11.0-openjdk-amd64\/lib\/server\/\" &gt; \/etc\/ld.so.conf.d\/rJava.conf\r\nRUN \/sbin\/ldconfig\r\n\r\n# add an R script in case it is needed in an interactive session\r\nCOPY run-opencpu-server.R \/run-opencpu-server.R\r\n\r\n# start the service\r\nCMD service cron start &amp;&amp; \/usr\/lib\/rstudio-server\/bin\/rserver &amp;&amp; apachectl -DFOREGROUND\r\n\r\n# just in case\r\n# ENTRYPOINT [\"\/bin\/bash\"]\r\n<\/pre>\n<p>Build and run the container<\/p>\n<pre class=\"prettyprint lang-bsh prettyprinted\">docker run -t -p 80:80 -p 8004:8004 opencpu\/rstudio\r\n\r\n# help: https:\/\/hub.docker.com\/r\/opencpu\/rstudio\r\n# help: 
https:\/\/opencpu.github.io\/server-manual\/opencpu-server.pdf\r\n\r\n# browse\r\n# http:\/\/localhost:8004\/ocpu\/library\/ChemRICH\/www\/\r\n# http:\/\/localhost:8004\/ocpu\/info\r\n<\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-26788 aligncenter\" src=\"https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/chemrich-1024x483.jpg\" alt=\"-\" width=\"978\" height=\"461\" title=\"\" srcset=\"https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/chemrich-1024x483.jpg 1024w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/chemrich-300x142.jpg 300w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/chemrich-768x362.jpg 768w, https:\/\/fgu.cas.cz\/wp-content\/uploads\/2024\/08\/chemrich.jpg 1250w\" sizes=\"(max-width: 978px) 100vw, 978px\" \/><\/p>\n<hr \/>\n<h2><a name=\"CRediT\"><\/a>CRediT (Contributor Roles Taxonomy) generator<\/h2>\n<p>This web <a href=\"https:\/\/credit.metabolomics.fgu.cas.cz\/\">generator<\/a> helps summarize the contributions of individual authors and prepare the Author contributions paragraph for a scientific journal.<\/p>\n<hr \/>\n<h2><a name=\"MS-DIAL\"><\/a>MS-DIAL data processing benchmark<\/h2>\n<p>The purpose of this test was to select appropriate hardware for LC-MS data processing using <a href=\"http:\/\/prime.psc.riken.jp\/compms\/msdial\/main.html\" target=\"_blank\" rel=\"noopener\">MS-DIAL 4.20<\/a>.<\/p>\n<h3>Input data<\/h3>\n<ul>\n<li>Metabolomics profiling using the <a href=\"https:\/\/www.fgu.cas.cz\/en\/departments\/metabolomics\">LIMeX<sup>5D<\/sup> workflow<\/a>. 
Only the HILICn part was selected.<\/li>\n<li>Q Exactive Plus; R17,500; 714 *.abf files, each ~5 MB<\/li>\n<li>Time windows of 6 minutes were selected for data processing.<\/li>\n<li>Custom MSP metabolomics library, 1.5 GB<\/li>\n<\/ul>\n<h3>HW resources<\/h3>\n<div class=\"row\">\n<div>\n<p>Machine A &#8211; old office computer<\/p>\n<ul>\n<li>Intel Core i3&nbsp;4130 (2 cores\/4 threads)<\/li>\n<li>DDR3&nbsp;8 GB<\/li>\n<li>SSD 6 Gb\/s<\/li>\n<li>Windows 10 Pro 64-bit<\/li>\n<\/ul>\n<\/div>\n<div>\n<p>Machine B &#8211; Workstation<\/p>\n<ul>\n<li>AMD Ryzen 7&nbsp;7200X (8 cores\/16 threads)<\/li>\n<li>DDR4&nbsp;64 GB<\/li>\n<li>NVMe 4x 8.0 GT\/s<\/li>\n<li>Windows 10 Pro 64-bit<\/li>\n<\/ul>\n<\/div>\n<div>\n<p>Machine C &#8211; Virtual machine (OpenStack)<sup>#<\/sup><\/p>\n<ul>\n<li>virtual Intel Xeon-2296 (16 vCPUs)<\/li>\n<li>64 GB<\/li>\n<li>QEMU HDD<\/li>\n<li>Windows Server 2019<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h3>Results<\/h3>\n<table>\n<thead>\n<tr>\n<th>Machine<\/th>\n<th>MS-DIAL threads<\/th>\n<th>Intensity threshold<\/th>\n<th>Library loading&#8230;<\/th>\n<th>Peak detection&#8230;<\/th>\n<th>Alignment&#8230;<\/th>\n<th>Total time<\/th>\n<th>Peak spots<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>A<\/td>\n<td>4<\/td>\n<td>20,000 cps<\/td>\n<td>20 min<\/td>\n<td>90 min<\/td>\n<td>260 min<\/td>\n<td>6.2 hrs<\/td>\n<td>23,840<\/td>\n<\/tr>\n<tr>\n<td>B<\/td>\n<td>4<\/td>\n<td>20,000 cps<\/td>\n<td>20 min<\/td>\n<td>34 min<\/td>\n<td>37 min<\/td>\n<td>1.5 hrs<\/td>\n<td>23,840<\/td>\n<\/tr>\n<tr>\n<td>B<\/td>\n<td>8<\/td>\n<td>20,000 cps<\/td>\n<td>20 min<\/td>\n<td>27 min<\/td>\n<td>42 min<\/td>\n<td>1.5 hrs<\/td>\n<td>23,840<\/td>\n<\/tr>\n<tr>\n<td>B<\/td>\n<td>16<\/td>\n<td>20,000 cps<\/td>\n<td>20 min<\/td>\n<td>23 min<\/td>\n<td>40 min<\/td>\n<td>1.4 hrs<\/td>\n<td>23,840<\/td>\n<\/tr>\n<tr>\n<td>B<\/td>\n<td>16<\/td>\n<td>20,000 cps<\/td>\n<td>1 min*<\/td>\n<td>23 min<\/td>\n<td>36 min<\/td>\n<td>1.0 
hrs<\/td>\n<td>23,840<\/td>\n<\/tr>\n<tr>\n<td>C<\/td>\n<td>16<\/td>\n<td>20,000 cps<\/td>\n<td>23 min<\/td>\n<td>14 min<\/td>\n<td>46 min<\/td>\n<td>1.4 hrs<\/td>\n<td>23,840<\/td>\n<\/tr>\n<tr>\n<td>C<\/td>\n<td>16<\/td>\n<td>20,000 cps<\/td>\n<td>1 min*<\/td>\n<td>14 min<\/td>\n<td>45 min<\/td>\n<td>1.0 hrs<\/td>\n<td>23,840<\/td>\n<\/tr>\n<tr>\n<td>C<\/td>\n<td>16<\/td>\n<td>1,000 cps<\/td>\n<td>23 min<\/td>\n<td>30 min<\/td>\n<td>920 min<\/td>\n<td>16.2 hrs<\/td>\n<td>27,026<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<ul>\n<li><em>Processor:<\/em> More threads mean faster data processing, mainly in the middle part (peak detection). Newer CPUs beat older ones. The number of available threads per physical CPU is not equivalent to the number of vCPUs, as it depends on the virtualization.<\/li>\n<li><em>Memory:<\/em> Peak memory usage was ~25 GB during the alignment phase. The final alignmentResult_2020_6_6_22_21_59.EIC.aef file was ~14 GB. At least 1.5 GB of RAM per thread was needed. When RAM was limited, memory caching to disk prolonged the processing significantly (not shown; the run was aborted after 6 hrs).<\/li>\n<li><em>Disk:<\/em> No major difference was observed between the SSD, the NVMe drive, and the emulated virtual drive. The final part of data merging ran at a write speed of ~35 Mb\/s.<\/li>\n<li>If no Threadripper workstation is available, virtualization could help. The performance of physical machine B and virtual machine C was comparable. However, B was faster in parts where direct hardware access was needed &#8211; and C was faster in parallel processing thanks to its 16 dedicated vCPUs. A common rule of thumb is that 1 vCPU = 1 physical CPU core. However, this is not entirely correct, as a vCPU is made up of time slots across all available physical cores, so in general 1 vCPU is actually more powerful than a single core, especially if the physical CPUs have 8 cores. 
Therefore, the AMD workstation with 8 cores kept pace with 16 vCPUs.<\/li>\n<li>When the peak picking threshold was set to a very low level (1,000 cps), only ~3,200 more spots were found at the cost of an extra 14.8 hours. Most of the extra spots were unreliable and within the noise level.<\/li>\n<li>* The initial library loading takes a constant amount of time. However, once the library is processed and serialized, an *.msp2 binary file is created. This file can be reused in an independent project as an MSP library that loads within a minute. Therefore, the minimal analysis time was 1.0 hr.<\/li>\n<li><sup>#<\/sup> Computational resources were supplied by the project &#8220;e-Infrastruktura CZ&#8221; (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.<\/li>\n<\/ul>\n<h3>Conclusion<\/h3>\n<p>A dedicated physical workstation with many CPU cores (Threadripper or better) is the best option for processing large datasets with large libraries using MS-DIAL software.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Short-term internships: Several software tools are currently being developed in the laboratory. We offer short-term internships (summer or semester-long) for bioinformatics students with programming skills (Python, C#, or Matlab) who are interested in biological or chemical projects. If you are interested, please send an email. 
These projects currently include: ViLiFOG &#8211; Virtual Lipidome Fluxomics Object [&hellip;]<\/p>\n","protected":false},"author":1,"template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"oddeleni":[161],"poskytovatel":[],"stav-projektu":[209],"class_list":["post-29735","vyzkumny-projekt","type-vyzkumny-projekt","status-publish","hentry","oddeleni-metabolism-of-bioactive-lipids","stav-projektu-current-projects"],"acf":[],"_links":{"self":[{"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/vyzkumny-projekt\/29735","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/vyzkumny-projekt"}],"about":[{"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/types\/vyzkumny-projekt"}],"author":[{"embeddable":true,"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/users\/1"}],"version-history":[{"count":0,"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/vyzkumny-projekt\/29735\/revisions"}],"wp:attachment":[{"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/media?parent=29735"}],"wp:term":[{"taxonomy":"oddeleni","embeddable":true,"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/oddeleni?post=29735"},{"taxonomy":"poskytovatel","embeddable":true,"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/poskytovatel?post=29735"},{"taxonomy":"stav-projektu","embeddable":true,"href":"https:\/\/fgu.cas.cz\/en\/wp-json\/wp\/v2\/stav-projektu?post=29735"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}