
Methods for the Analysis of 3D Structural Fragments in Biomolecules

Ph.D. Thesis by RNDr. David Sehnal, Ph.D.

Supervisor: prof. RNDr. Ludek Matyska, CSc.   Consultant: RNDr. Radka Svobodová Vařeková, Ph.D.


Structural data for biomolecules have become increasingly available in recent years. However, beyond access to these data, life science researchers need advanced tools for performing sophisticated analyses and extracting meaningful information. My work focuses on one particular type of object that is very important for life science and has a straightforward representation in computer science: the biomacromolecular structural fragment.

I concentrate on all steps of the theoretical analysis of these fragments, i.e., their definition and detection, comparison, characterization, and validation. Specifically, I developed the molecular language PatternQuery for the straightforward and general definition of biomacromolecular fragments. I then introduced an approach for detecting fragments described by this language and implemented it in the PatternQuery server, which can be used to efficiently search large structural databases. I further employed the PatternQuery language as a basis for other life science tools that deal with structural fragments. In a tool called SiteBinder, I designed and implemented a methodology for comparing large sets of biomacromolecular fragments. Additionally, I created a software tool called MOLE 2, which detects channels and pores in biomacromolecules and calculates their important characteristics. In parallel, I developed an approach for the fast and reliable calculation of partial atomic charges and implemented it in the web application AtomicChargeCalculator. Last but not least, I designed and implemented the validation tools MotiveValidator and ValidatorDB, which provide information about errors in biomacromolecular fragments.

All the tools can be used separately; however, the thesis also presents a framework for their integration, in the form of an extension of the PatternQuery language, which streamlines and unifies the analysis process. Moreover, the basic ideas behind the PatternQuery language can be generalized to methods for analyzing structural biomolecular data other than structural fragments. The thesis is composed of two parts: the first provides the theoretical background, describing the principles and algorithms used; the second consists of six journal publications about the developed tools and their life science applications.

The full text and reviews of the thesis can be found at https://is.muni.cz/auth/th/140435/fi_d/?lang=en.

Papers

Selected papers that are included in, or are the direct result of, the Ph.D. work

* Denotes the first author(s).
A total of 10 journal publications, cited 92 times (Web of Science, as of January 13, 2016).
The full list of papers, talks, and posters can be found at https://is.muni.cz/osoba/david.sehnal?lang=en#publikace.

Software

Developed as part of the Ph.D. work

WebChemistry

WebChemistry is a set of freely available tools for in silico analysis of structural bioinformatics data. It is useful for complex analysis of structural patterns in biomacromolecules (such as binding sites, catalytic sites, specific protein or nucleic acid sequences, or channels). The tools, described in more detail below, give the user the ability to detect, validate, compare, and characterize any pattern of interest.

A theoretical description of the algorithms can be found in the text of the thesis. Case studies showing the usage of the individual applications are available in the articles listed above and/or on the WebChemistry website.

PatternQuery

PatternQuery is a web service that enables the user to effectively define, extract, and analyze structural patterns in biomolecular complexes using a simple query language. Such analysis is useful not only for the structural and functional assignment of uncharacterized or newly determined proteins, but also as a key step in the rational design and engineering of novel functional sites and in comparative protein structural analyses.
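
To make the idea of a structural pattern concrete, the Python sketch below extracts one simple kind of fragment: a seed residue (for example a ligand) together with every residue that has at least one atom within a distance cutoff of it. This is not PatternQuery syntax or its implementation. The Atom record, the residue identifier format, and the function extract_fragment are hypothetical, and the brute-force distance scan is only for illustration; searching large databases efficiently would call for a spatial index.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    residue: str                      # residue identifier, e.g. "HEM 201" (hypothetical format)
    name: str                         # atom name
    xyz: tuple[float, float, float]   # Cartesian coordinates in angstroms


def extract_fragment(atoms: list[Atom], seed: str, cutoff: float) -> list[str]:
    """Return the seed residue plus every residue that has at least one atom
    within `cutoff` angstroms of any seed atom (brute-force O(n*m) scan)."""
    seed_xyz = [a.xyz for a in atoms if a.residue == seed]
    if not seed_xyz:
        raise ValueError(f"residue {seed!r} not found")
    hits = {seed}
    for a in atoms:
        # Skip residues already in the fragment; test the rest against all seed atoms.
        if a.residue not in hits and any(dist(a.xyz, s) <= cutoff for s in seed_xyz):
            hits.add(a.residue)
    return sorted(hits)
```

For example, with a ligand atom at the origin, an alanine atom 3 Å away, and a glycine atom 10 Å away, a 4 Å cutoff returns only the ligand and the alanine.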

MotiveValidator

MotiveValidator is an application designed to help the user determine whether a residue or ligand in a biomolecule or biomolecular complex is structurally complete and correctly annotated according to its model stored in the wwPDB Chemical Component Dictionary.
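
The structural-completeness part of such a check can be sketched as a set comparison between the atom names observed in a residue and those in its reference model. This is a simplified illustration, not MotiveValidator's algorithm: the function check_completeness and its dictionary-based model format are hypothetical, atoms are matched by name only, and hydrogens are skipped by default on the assumption that they are often absent from X-ray structures. Checking whether the residue is also correctly annotated would additionally require comparing its 3D structure with the model.

```python
from __future__ import annotations


def check_completeness(observed: set[str],
                       model: dict[str, str],
                       ignore_hydrogens: bool = True) -> dict[str, object]:
    """Compare observed atom names of one residue or ligand with a reference
    model given as {atom_name: element} (hypothetical format)."""
    # Atoms the model expects; hydrogens are optionally left out.
    expected = {name for name, element in model.items()
                if not (ignore_hydrogens and element == "H")}
    missing = sorted(expected - observed)
    # Atoms present in the structure but absent from the reference model.
    unexpected = sorted(observed - set(model))
    return {"complete": not missing, "missing": missing, "unexpected": unexpected}
```

Matching by name keeps the sketch short; a real validator cannot rely on names alone, because atoms may be misnamed, which is exactly the kind of annotation error such tools aim to detect.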

ValidatorDB

ValidatorDB is a database of validation reports for the entire Protein Data Bank based on the MotiveValidator tool.

SiteBinder

SiteBinder is software designed for comparing the topology and 3D structure of up to tens of thousands of biomolecular structural motifs, each containing up to hundreds of atoms.
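
A standard building block for comparing the 3D structure of two motifs whose atoms are already matched one-to-one is the root-mean-square deviation (RMSD) after optimal rigid superposition. The Python sketch below computes it with Horn's closed-form quaternion method, obtaining the largest eigenvalue of a 4x4 symmetric matrix with the Jacobi method so that no external libraries are needed. It is a generic illustration that assumes matched atoms; SiteBinder's own atom pairing and superposition procedure is described in the thesis and may differ, and the function names are hypothetical.

```python
from math import sqrt


def _centered(points):
    """Translate a point set so that its centroid is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]


def _max_eigenvalue(a, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi method)."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + sqrt(theta * theta + 1.0))
                c = 1.0 / sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(x, y):
    """RMSD between two equally long, atom-matched point sets after optimal
    rigid superposition (proper rotations only, no reflection)."""
    if len(x) != len(y) or not x:
        raise ValueError("point sets must be non-empty and of equal length")
    xs, ys = _centered(x), _centered(y)
    # Correlation matrix S[i][j] = sum over atoms of x_i * y_j.
    s = [[sum(p[i] * q[j] for p, q in zip(xs, ys)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _max_eigenvalue(n_mat)
    g = sum(v * v for p in xs for v in p) + sum(v * v for q in ys for v in q)
    # Clamp tiny negative values caused by floating-point round-off.
    return sqrt(max(0.0, (g - 2.0 * lam) / len(x)))
```

The result is sqrt((G - 2λ)/n), where G is the sum of squared norms of both centered point sets and λ is the largest eigenvalue of Horn's matrix; this avoids building the rotation matrix when only the RMSD is needed.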

MOLE

MOLE is a tool designed to analyze channels and pores in biomacromolecules. These channels and pores play significant biological roles, e.g., in molecular recognition and enzyme substrate specificity.
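
A typical channel characteristic can be illustrated independently of how the channel is found: given points along a channel centerline and the atoms lining it, the local channel radius at each point is the distance to the nearest van der Waals surface, and the bottleneck is the narrowest point along the path. The sketch below assumes the centerline is already known and that atoms are given as (center, radius) pairs; it does not reproduce MOLE's channel-detection algorithm, and the function names are hypothetical.

```python
from math import dist


def radius_profile(path, atoms):
    """Free radius at each centerline point: the distance to the nearest atom
    surface, i.e. the minimum over atoms of |p - center| - vdW radius."""
    return [min(dist(p, center) - r for center, r in atoms) for p in path]


def bottleneck(path, atoms):
    """Return (index, radius) of the narrowest point along the path."""
    profile = radius_profile(path, atoms)
    i = min(range(len(profile)), key=profile.__getitem__)
    return i, profile[i]
```

A negative radius at some point would mean the assumed centerline passes through an atom, so the path is not an open channel there.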

AtomicChargeCalculator

AtomicChargeCalculator is a complex yet intuitive utility for the calculation, visualization, and analysis of atomic charges for small drug-like molecules as well as for large biomolecular complexes. Empirical atomic charges that respond to changes in molecular conformation are calculated via an efficient implementation of the well-established Electronegativity Equalization Method (EEM).
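
In its usual textbook form, EEM assumes that after charge redistribution every atom i has the same electronegativity χ, with χ_i = A_i + B_i q_i + κ Σ_{j≠i} q_j / R_ij, where A_i and B_i are empirical atom parameters, R_ij are interatomic distances, and κ is a scaling constant. Together with the condition that the charges sum to the total molecular charge Q, this gives a linear system of n + 1 equations for the n charges and χ. The Python sketch below builds and solves that system with plain Gaussian elimination. The parameter values in the example are made up for illustration rather than taken from a published parameter set, and the sketch does not describe how AtomicChargeCalculator itself is implemented or optimized for large complexes.

```python
from math import dist


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def eem_charges(coords, a_par, b_par, kappa, total_charge=0.0):
    """Solve the EEM system. For every atom i:
        A_i + B_i q_i + kappa * sum_{j != i} q_j / R_ij = chi (same for all atoms),
    subject to sum_i q_i = total_charge. Returns (charges, chi)."""
    n = len(coords)
    mat = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for i in range(n):
        for j in range(n):
            mat[i][j] = b_par[i] if i == j else kappa / dist(coords[i], coords[j])
        mat[i][n] = -1.0        # the unknown chi moved to the left-hand side
        rhs[i] = -a_par[i]
    mat[n][:n] = [1.0] * n      # charge conservation row
    rhs[n] = total_charge
    sol = solve(mat, rhs)
    return sol[:n], sol[n]
```

For two atoms with A = (2, 3), B = (5, 5), κ = 1, and a separation of 1, this gives charges of about +0.125 and -0.125 and χ = 2.5: the atom with the lower A becomes positive, as expected for the less electronegative partner.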