Medical Information Processing

Staff: Jörn Ostermann, Bodo Rosenhahn, Jan Voges, Yeremia G. Adhisantoso

Introduction

At the Institut für Informationsverarbeitung (TNT), we develop methods for processing and analyzing DNA sequencing data. Advances in high-throughput sequencing technologies have the potential to make sequencing data part of everyday practice in many fields. However, the IT costs of storing, transmitting, and processing large volumes of sequencing data now considerably exceed the cost of the sequencing itself. Our work aims to make these data usable at scale, for example to enable their widespread use in personalized medicine.

Current Research Topics

Compression of Sequencing Data

In DNA sequencing, the sequence to be read is first fragmented. The fragments are then amplified and finally read out by a sequencing machine. All known sequencing technologies are inherently error-prone, so each base that is read out is assigned a quality value. The read-out fragments are called reads and are stored together with their quality values in FASTQ files. Further processing steps include aligning the reads, with the goal of reconstructing the underlying DNA sequence, and identifying structural variants in the sequenced material.
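The reads and quality values described above are typically stored as four-line FASTQ records (header, bases, separator, quality string). The following is a minimal Python sketch, not one of our tools, that parses such records and decodes the quality string into Phred scores. It assumes the common Phred+33 ASCII encoding and unwrapped records; the names `FastqRecord`, `parse_fastq`, and `error_probability` are hypothetical.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str             # read identifier (header line without the leading '@')
    sequence: str         # called bases, e.g. "ACGT"
    qualities: List[int]  # one Phred quality score per base


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (four lines per record).

    Assumes the common Phred+33 ASCII encoding of quality values.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # tolerate blank lines between or after records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality string lengths differ")
        yield FastqRecord(
            name=header[1:].rstrip("\r\n"),
            sequence=sequence,
            qualities=[ord(symbol) - offset for symbol in quality],
        )


def error_probability(phred: int) -> float:
    """Convert a Phred score Q into the base-call error probability 10^(-Q/10)."""
    return 10 ** (-phred / 10)


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nII#5\n"
    for record in parse_fastq(io.StringIO(example)):
        print(record.name, record.sequence, record.qualities)  # read1 ACGT [40, 40, 2, 20]
```

A Phred score of 20, for instance, corresponds to an estimated error probability of 1 %, which is why quality values carry information that downstream tools use.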

In our work, we focus in particular on compression methods for aligned reads and on the transparent lossy compression of quality values.
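One reason aligned reads compress well is that, once a read is mapped, most of its bases can be predicted from the reference, so only the mapping position and the differences need to be stored. The sketch below illustrates this idea under a strong simplifying assumption: ungapped alignments with substitutions only (no insertions or deletions). It is not the representation used in MPEG-G or in our tools, and `AlignedRead`, `encode_read`, and `decode_read` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedRead:
    position: int                      # 0-based mapping position on the reference
    length: int                        # number of bases in the read
    mismatches: List[Tuple[int, str]]  # (offset within the read, read base) where it differs


def encode_read(read: str, reference: str, position: int) -> AlignedRead:
    """Keep only the mapping position and the bases that differ from the reference.

    Assumes an ungapped alignment: substitutions only, no insertions or deletions.
    """
    window = reference[position:position + len(read)]
    if position < 0 or len(window) != len(read):
        raise ValueError("read does not fit on the reference at this position")
    mismatches = [
        (offset, base)
        for offset, (base, ref_base) in enumerate(zip(read, window))
        if base != ref_base
    ]
    return AlignedRead(position, len(read), mismatches)


def decode_read(encoded: AlignedRead, reference: str) -> str:
    """Rebuild the read by copying the reference window and applying the substitutions."""
    bases = list(reference[encoded.position:encoded.position + encoded.length])
    for offset, base in encoded.mismatches:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    encoded = encode_read("GTACCTTA", reference, position=2)
    print(encoded)  # AlignedRead(position=2, length=8, mismatches=[(4, 'C')])
    assert decode_read(encoded, reference) == "GTACCTTA"
```

Here an eight-base read shrinks to one position, one length, and a single substitution; in a real codec these fields would then be passed to an entropy coder.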

MPEG-G

The MPEG-G standard series is the first ISO/IEC project for the storage and transmission of sequencing data. Substantial parts of our work have been incorporated into MPEG-G.

Methods Used

Sequence alignment, lossy compression, machine learning, entropy coding
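Two of the methods listed above, lossy compression and entropy coding, interact directly for quality values: coarser values have lower entropy, so an entropy coder (for example an arithmetic coder) needs fewer bits per symbol. The following minimal sketch assumes plain uniform scalar quantization of Phred scores and a zeroth-order entropy estimate. It is a swapped-in textbook technique, not the adaptive scheme of CALQ, and `quantize` and `empirical_entropy` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence


def quantize(qualities: Sequence[int], step: int) -> List[int]:
    """Uniform scalar quantization: map each score to the midpoint of its bin of width `step`.

    The reconstruction error is at most step // 2 for every score.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    return [(q // step) * step + step // 2 for q in qualities]


def empirical_entropy(symbols: Sequence[int]) -> float:
    """Zeroth-order empirical entropy in bits per symbol.

    It bounds the average code length of a memoryless entropy coder driven by these frequencies.
    """
    if not symbols:
        return 0.0
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    qualities = [40, 38, 37, 40, 12, 35, 40, 39, 2, 33]  # illustrative values, not real data
    coarse = quantize(qualities, step=10)
    print(coarse)  # [45, 35, 35, 45, 15, 35, 45, 35, 5, 35]
    print(f"{empirical_entropy(qualities):.2f} -> {empirical_entropy(coarse):.2f} bits/symbol")  # 2.85 -> 1.69 bits/symbol
```

Whether such a loss of precision is acceptable depends on its effect on downstream analyses; a uniform quantizer ignores that entirely and only shows the rate side of the trade-off.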

References

[1] Ibrahim Numanagic, James K Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, Claudio Alberti, Marco Mattavelli, S Cenk Sahinalp. Comparison of high-throughput sequencing data compression tools. Nature Methods 13(12), pp. 1005–1008, 2016.

[2] Jan Voges, Jörn Ostermann, Mikel Hernaez. CALQ: compression of quality values of aligned sequencing data. Bioinformatics 34(10), pp. 1650–1658, 2018.

[3] Claudio Alberti, Noah Daniels, Mikel Hernaez, Jan Voges, Rachel L Goldfeder, Ana A Hernandez-Lopez, Marco Mattavelli, Bonnie Berger. An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values. 2016 Data Compression Conference (DCC), pp. 221–230, Snowbird, UT (US), 2016.

[4] Claudio Alberti, Tom Paridaens, Jan Voges, Daniel Naro, Junaid J. Ahmad, Massimo Ravasi, Daniele Renzi, Giorgio Zoia, Idoia Ochoa, Marco Mattavelli, Jaime Delgado, Mikel Hernaez. An introduction to MPEG-G, the new ISO standard for genomic information representation. bioRxiv preprint, 2018.

  • Conference Contributions
    • Idoia Ochoa, Hongyi Li, Florian Baumgarte, Charles Hergenrother, Jan Voges, Mikel Hernaez
      AliCo: a new efficient representation for SAM files
      2019 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 93-102, Snowbird, UT (US), March 2019
    • Tom Paridaens, Jan Voges, Mikel Hernaez, Jan Fostier, Jörn Ostermann
      GABAC: an arithmetic coding solution for genomic data
      27th Conference on Intelligent Systems for Molecular Biology (ISMB) and 18th European Conference on Computational Biology (ECCB) 2019, International Society for Computational Biology (ISCB), Vol. 8, p. 1463 (poster), Basel (CH), July 2019
    • Jan Voges
      Optimization Strategy for MPEG-G Compliant Entropy Encoding
      Contributions 5th ITG/VDE Summer School Video Compression and Processing (SVCP), University of Konstanz, pp. 228-255, Konstanz (DE), June 2019, edited by Dietmar Saupe, André Kaup, Jens-Rainer Ohm
    • Brian E Bliss, Joshua M Allen, Saurabh Baheti, Matthew A Bockol, Shubham Chandak, Jaime Delgado, Jan Fostier, Josep L Gelpi, Steven N Hart, Mikel Hernaez Arrazola, Matthew E Hudson, Michael T Kalmbach, Eric W Klee, Liudmila S Mainzer, Fabian Müntefering, Daniel Naro, Idoia Ochoa-Alvarez, Jörn Ostermann, Tom Paridaens, Christian A Ross, Jan Voges, Eric D Wieben, Mingyu Yang, Tsachy Weissman, Mathieu Wiepert
      Genie: an MPEG-G conformant software to compress genomic data
      International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), (poster), Denver, CO (US), November 2019
    • Jan Voges, Ali Fotouhi, Jörn Ostermann, M. Oguzhan Külekci
      A Two-Level Scheme for Quality Score Compression
      Proceedings of the 10th International Conference on Bioinformatics and Computational Biology (BICOB 2018), International Society for Computers and their Applications (ISCA), pp. 161-167, Las Vegas, NV (US), March 2018, edited by Hisham Al-Mubaid, Qin Ding, Oliver Eulenstein
    • Ana A Hernandez-Lopez, Jan Voges, Claudio Alberti, Marco Mattavelli, Jörn Ostermann
      Lossy Compression of Quality Scores in Differential Gene Expression: A First Assessment and Impact Analysis
      2018 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 167-176, Snowbird, UT (US), March 2018
    • Jan Voges
      MPEG-G: The Standard for Genomic Information Representation
      Proceedings of the 4th Summer School on Video Compression and Processing (SVCP) 2018, Leibniz Universität Hannover, Institut für Informationsverarbeitung, pp. 7-8, Hannover (DE), July 2018, edited by Jan Voges
    • Ana A Hernandez-Lopez, Jan Voges, Claudio Alberti, Marco Mattavelli, Jörn Ostermann
      Differential Gene Expression with Lossy Compression of Quality Scores in RNA-Seq Data
      2017 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), p. 444 (poster), Snowbird, UT (US), April 2017
    • Jan Voges, Jörn Ostermann, Mikel Hernaez
      CALQ: compression of quality values of aligned sequencing data
      Joint 25th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 16th European Conference on Computational Biology (ECCB) 2017, International Society for Computational Biology (ISCB), Vol. 6, p. 1382 (poster), Prague (CZ), August 2017
    • Jan Voges, Jörn Ostermann
      MPEG-G: The Emerging Standard for Genomic Data
      Poster abstracts of the 25th German Conference on Bioinformatics, PeerJ, Vol. 5, p. 2 (poster), Tübingen (DE), September 2017
    • Claudio Alberti, Noah Daniels, Mikel Hernaez, Jan Voges, Rachel L Goldfeder, Ana A Hernandez-Lopez, Marco Mattavelli, Bonnie Berger
      An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values
      2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 221-230, Snowbird, UT (US), April 2016
    • Jan Voges, Marco Munderloh, Jörn Ostermann
      Predictive Coding of Aligned Next-Generation Sequencing Data
      2016 Data Compression Conference (DCC), IEEE Computer Society Conference Publishing Services (CPS), pp. 241-250, Snowbird, UT (US), April 2016
    • Erik Soltow, Bodo Rosenhahn
      Automatic Pose Estimation Using Contour Information from X-Ray Images
      Image and Video Technology – PSIVT 2015 Workshops, Springer International Publishing, March 2016, edited by Fay Huang and Akihiro Sugimoto
    • Erik Soltow, Christof Hurschler, Bodo Rosenhahn
      Geometric bone models for marker-less RSA in total knee arthroplasty: a proof of concept
      4th International RSA Meeting, May 2015
    • Oliver Müller, Sabine Donner, Tobias Klinder, Ivonne Bartsch, Alexander Krüger, Alexander Heisterkamp, Bodo Rosenhahn
      Compensating motion artifacts of 3D in vivo SD-OCT scans
      Medical Image Computing and Computer Assisted Intervention (MICCAI), Vol. 7510, pp. 198-205, October 2012, edited by Nicholas Ayache, Hervé Delingette, Polina Golland, Kensaku Mori
    • Stojan Maleschlijski, Laura Leal-Taixé, Sebastian Weiße, Alessio Di Fino, Nicholas Aldred, A. S. Clare, G. Hernán Sendra, Bodo Rosenhahn, Axel Rosenhahn
      A stereoscopic approach for three dimensional tracking of marine biofouling microorganisms
      Microscopic Image Analysis with Applications in Biology (MIAAB), Heidelberg, Germany, September 2011
    • Oliver Müller, Sabine Donner, Tobias Klinder, Ralf Dragon, Ivonne Bartsch, Frank Witte, Alexander Krüger, Alexander Heisterkamp, Bodo Rosenhahn
      Model Based 3D Segmentation and OCT Image Undistortion of Percutaneous Implants
      Medical Image Computing and Computer-Assisted Intervention – MICCAI 2011 14th International Conference, Lecture Notes in Computer Science (LNCS), Springer Berlin / Heidelberg, Vol. 6893, pp. 454-462, September 2011, edited by Gabor Fichtinger, Anne Martel, Terry Peters
    • Arne Ehlers, Florian Baumann, Ralf Spindler, Birgit Glasmacher, Bodo Rosenhahn
      PCA Enhanced Training Data for Adaboost
      Computer Analysis of Images and Patterns - 14th International Conference, CAIP 2011, Seville, Spain, August 29-31, 2011, Proceedings, Part I, Springer, Vol. 6854, pp. 410-419, August 2011
    • Laura Leal-Taixé, Matthias Heydt, Sebastian Weiße, Axel Rosenhahn, Bodo Rosenhahn
      Classification of swimming microorganisms motion patterns in 4D digital in-line holography data
      32nd Annual Symposium of the German Association for Pattern Recognition (DAGM), Springer, Vol. 6376, pp. 283-292, 2010
    • Tobias Klinder, Cristian Lorenz, Jörn Ostermann
      Free-Breathing Intra- and Intersubject Respiratory Motion Capturing, Modeling, and Prediction
      SPIE 2009, SPIE Medical Imaging, Orlando, Florida, USA, February 2009
    • Matthias Ehm, Tobias Klinder, Reinhard Kneser, Cristian Lorenz
      Automated Vertebra Identification in CT images
      SPIE 2009, SPIE Medical Imaging, Orlando, Florida, USA, February 2009
    • Laura Leal-Taixé, Ahmet U. Coskun, Bodo Rosenhahn, Dana H. Brooks
      Automatic segmentation of arteries in multi-stain histology images
      World Congress on Medical Physics and Biomedical Engineering, Munich (Germany), September 7th-12th, 2009
    • Laura Leal-Taixé, Matthias Heydt, Axel Rosenhahn, Bodo Rosenhahn
      Automatic tracking of swimming microorganisms in 4D digital in-line holography data
      IEEE Workshop on Motion and Video Computing (WMVC), Snowbird, Utah, USA, December 2009
    • Tobias Klinder, Cristian Lorenz, Jens von Berg, Steffen Renisch, Thomas Blaffert, Jörn Ostermann
      4DCT Image-Based Lung Motion Field Extraction and Analysis
      SPIE 2008, SPIE Medical Imaging, San Diego, California, USA, February 2008
    • Thomas Blaffert, Hans Barschdorff, Jens von Berg, Sebastian Dries, Astrid Franz, Tobias Klinder, Cristian Lorenz, Steffen Renisch, Rafael Wiemker
      Lung Lobe Modeling and Segmentation with Individualized Surface Meshes
      SPIE 2008, SPIE Medical Imaging, San Diego, California, USA, February 2008
    • Torbjörn Vik, Sven Kabus, Jens von Berg, Konstantin Ens, Sebastian Dries, Tobias Klinder, Cristian Lorenz
      Validation and Comparison of Registration Methods for Free-Breathing 4D Lung-CT
      SPIE 2008, SPIE Medical Imaging, San Diego, California, USA, February 2008
    • Tobias Klinder, Cristian Lorenz, Jörn Ostermann
      Respiratory Motion Modeling and Estimation
      The First Annual Workshop on Pulmonary Image Analysis, MICCAI 2008, Medical Image Computing and Computer Assisted Intervention, New York, USA, September 2008
    • Jalda Dworzak, Hans Lamecker, Jens von Berg, Tobias Klinder, Cristian Lorenz, Dagmar Kainmüller, Heiko Seim, Hans-Christian Hege, Stefan Zachow
      Towards Model-based 3-D Reconstruction of the Human Rib Cage from Radiographs
      CURAC 2008, Computer- und Roboterassistierte Chirurgie e.V., September 2008
    • Udo van Stevendaal, Tobias Klinder, Cristian Lorenz, Thomas Köhler
      Breathing-Motion Correction for Helical CT
      IEEE NSS/MIC, IEEE Nuclear Science Symposium and Medical Imaging Conference, Dresden, Germany, October 2008
    • Astrid Franz, Robin Wolz, Tobias Klinder, Cristian Lorenz, Hans Barschdorf, Thomas Blaffert, Sebastian Dries, Steffen Renisch
      Simultaneous Model-Based Segmentation of Multiple Objects
      BVM 2008, Bildverarbeitung für die Medizin, Berlin, Germany, April 2008
    • Tobias Klinder, Robin Wolz, Cristian Lorenz, Astrid Franz, Jörn Ostermann
      Spine Segmentation Using Articulated Shape Models
      MICCAI 2008, Medical Image Computing and Computer Assisted Intervention, Springer, New York, USA, September 2008
    • Tobias Klinder, Cristian Lorenz, Jens von Berg, Sebastian Dries, Thomas Bülow, Jörn Ostermann
      Automated Model-Based Rib Cage Segmentation and Labeling in CT images
      MICCAI 2007, Medical Image Computing and Computer Assisted Intervention, Springer, Brisbane, Australia, October 2007
    • Tobias Klinder, Cristian Lorenz, Jens von Berg
      Geometrical Rib-Cage Modeling, Detection, and Segmentation
      CARS 2007, Computer Assisted Radiology and Surgery, Berlin, Germany, June 2007
  • Journals
    • Jan Voges, Tom Paridaens, Fabian Müntefering, Liudmila S Mainzer, Brian Bliss, Mingyu Yang, Idoia Ochoa, Jan Fostier, Jörn Ostermann, Mikel Hernaez
      GABAC: an arithmetic coding solution for genomic data
      Bioinformatics, Oxford University Press, Vol. 36, No. 7, pp. 2275-2277, December 2019, edited by John Hancock
    • Jan Voges, Jörn Ostermann, Mikel Hernaez
      CALQ: compression of quality values of aligned sequencing data
      Bioinformatics, Oxford University Press, Vol. 34, No. 10, pp. 1650-1658, May 2018, edited by Bonnie Berger
    • Jan Voges, Ali Fotouhi, Jörn Ostermann, M. Oguzhan Külekci
      A Two-level Scheme for Quality Score Compression
      Journal of Computational Biology, Mary Ann Liebert, Inc., Vol. 25, No. 10, October 2018
    • Claudio Alberti, Tom Paridaens, Jan Voges, Daniel Naro, Junaid J. Ahmad, Massimo Ravasi, Daniele Renzi, Giorgio Zoia, Idoia Ochoa, Marco Mattavelli, Jaime Delgado, Mikel Hernaez
      An introduction to MPEG-G, the new ISO standard for genomic information representation
      bioRxiv, Cold Spring Harbor Laboratory, September 2018
    • Ibrahim Numanagic, James K Bonfield, Faraz Hach, Jan Voges, Jörn Ostermann, Claudio Alberti, Marco Mattavelli, S Cenk Sahinalp
      Comparison of high-throughput sequencing data compression tools
      Nature Methods, Nature Publishing Group, Vol. 13, No. 12, pp. 1005-1008, October 2016
    • Sabine Donner, Oliver Müller, Frank Witte, Ivonne Bartsch, Elmar Willbold, Tammo Ripken, Alexander Heisterkamp, Bodo Rosenhahn, Alexander Krüger
      In situ optical coherence tomography of percutaneous implant-tissue interfaces in a murine model
      Biomedical Engineering/Biomedizinische Technik, De Gruyter, pp. 1-9, Karlsruhe, May 2013, edited by Olaf Dössel
    • S. Maleschlijski, G. H. Sendra, A. Di Fino, L. Leal-Taixé, I. Thome, A. Terfort, N. Aldred, M. Grunze, A. S. Clare, B. Rosenhahn, A. Rosenhahn
      Three dimensional tracking of exploratory behavior of barnacle cyprids using stereoscopy
      Biointerphases: Journal for the Quantitative Biological Interface Data, Springer, August 2012
    • Ralf Spindler, Bodo Rosenhahn, Nicola Hofmann, Birgit Glasmacher
      Video analysis of osmotic cell response during cryopreservation
      Cryobiology, Elsevier, February 2012
    • Tobias Klinder, Jörn Ostermann, Matthias Ehm, Astrid Franz, Reinhard Kneser, Cristian Lorenz
      Automated Model-Based Vertebra Detection, Identification, and Segmentation in CT Images
      Medical Image Analysis, Elsevier, Vol. 13, pp. 471-482, 2009
  • Book Chapters
    • Laura Leal-Taixé, Matthias Heydt, Axel Rosenhahn, Bodo Rosenhahn
      Understanding what we cannot see: automatic analysis of 4D digital in-line holographic microscopy data
      Video Processing and Computational Video, Springer, July 2011, edited by D. Cremers, M.A. Magnor, M.R. Oswald, L. Zelnik-Manor