Structural data for biomolecules have become increasingly available in recent years. However, beyond the availability of the data, life science researchers need advanced tools for performing sophisticated analyses and extracting meaningful information. My work focuses on one particular type of object that is very important for life science and has a straightforward representation in computer science: the biomacromolecular structural fragment.
I concentrate on all steps of the theoretical analysis of these fragments, i.e., their definition and detection, comparison, characterization, and validation. Specifically, I developed the molecular language PatternQuery for the straightforward and general definition of biomacromolecular fragments. I then introduced an approach for detecting fragments described by this language and implemented it in the PatternQuery server, which can be used to search large structural databases efficiently. I further employed the PatternQuery language as a basis for other life science tools that deal with structural fragments. In particular, in a tool called SiteBinder, I designed and implemented a methodology for the comparison of large sets of biomacromolecular fragments. Additionally, I created a software tool called MOLE 2, which detects channels and pores in biomacromolecules and calculates their important characteristics. In parallel, I developed an approach for the fast and reliable calculation of partial atomic charges and implemented it in the web application AtomicChargeCalculator. Last but not least, I designed and implemented the validation tools MotiveValidator and ValidatorDB, which provide information about errors in biomacromolecular fragments.
All the tools can be used separately. However, I also present a framework for their integration, in the form of an extension of the PatternQuery language, that streamlines and unifies the analysis process. Moreover, the basic ideas behind the PatternQuery language can be generalized to methods of analysis of structural biomolecular data other than biomolecular structural fragments. The thesis is composed of two parts: the first presents the theoretical background, describing the principles and algorithms used, and the second comprises six journal publications about the developed tools and their life science applications.