Sinhgad Technical Education Society’s, Smt. Kashibai Navale College of Pharmacy, Pune-Saswad Road, Kondhwa (Bk), Pune 411048, Maharashtra, India.
Email: rajshama1@yahoo.com
Received: 17 Dec 2014 Revised and Accepted: 05 Jan 2015
ABSTRACT
Objective: Developing 4D QSAR model for chromenones as DNA-protein kinase inhibitors with LQTAgrid program and MATLAB with PLS Tool box.
Methods: 2D structures of forty five chromenones and pyrimido [2,1-a] isoquinolin-4-ones were drawn in Marvin application and subsequently converted to 3D format. With PRODRG server coordinate and topology files were generated. Molecular dynamics simulation for 20 ps was carried out in Gromacs4.6.3. Conformation ensemble for each molecule was aligned with the most active compound 22. Interaction energy descriptors were computed by placing NH3+ probe with LQTA grid program. 4D QSAR model was built-in MATLAB workspace with PLS Toolbox.
Results: 4D QSAR model developed with PLS regression showed root mean square error 0.188741, root mean square error predicted 0.575649, prediction bias -0.0221941, R2 calculated for training set 0.834502. The model built could predict the DNA-PK inhibitory activity for analogous compounds with reasonably good predictive ability. The Leenard Jones interaction descriptors were found important in the model suggesting the importance of 3D steric features in the molecule. These facts could be exploited in designing newer DNK-PK inhibitors with improved activity.
Conclusion: 4D QSAR model was developed with good squared correlation coefficient of 0.834502. Few Leenard Jones descriptors were found contributing to the model. The 4D QSAR model could be used in designing more active DNK-PK inhibitors.
Keywords: Chromenones, 4D QSAR, DNA-Protein kinase inhibitors, LQTA grid, MATLAB, PLS regression.
INTRODUCTION
Chromenones are important class of bioactive molecules consisting of chromen-2-ones and chromen-4-ones. The flavones, isoflavones, flavanoids and coumarins have been extensively studied phyto constituents belonging to chromen-2-ones and chromen-4-ones class. Chromenones possess derivatives possess diverse pharmacological activities including antitumor [1, 2], antivascular [3], antimicrobial [4, 5], antioxidant [6], TNF-a inhibitor [7], antifungal [8], anticoagulant[9], anti spasmolytic [10, 11], estrogenic [12, 13], antiviral[14], anthelminthic [15], anti-HIV [16], antitubercular [17, 18], anti-inflammatory [19, 20], herbicidal [21], analgesic [22] and anticonvulsant [23, 24] activity. Wortmannin and LY294002 (1) (fig. 1) are the benchmark phosphatidylinositol 3-kinase (PI 3-K) inhibitors and inhibit other kinases from PI 3-K family viz. DNA-dependent protein kinase (DNA-PK).
Fig. 1: PI 3-K inhibitors
The survival of cancer cell against radio or chemotherapy due to their ability to repair DNA damage has been concerned in development of anticancer agents. DNK-PK, serine/threonine protein kinase, is the key enzyme in repair of DNA double strand break and has homology in the kinase domain of PI 3-K family. Griffin RJ et al [25] reported chromenones and pyrimido[2,1-a]isoquinolin-4-ones as DNA-PK inhibitors. This report highlights the importance of morpholine ring, presence of fused benzene ring at 6, 7 and 7, 8 positions and carbonyl oxygen of chromenone. Quantitative structure activity relationship (QSAR) studies have been an important strategy in drug design [26]. Since its inception, QSAR approaches have evolved from 2D QSAR to more complex 4D QSAR. A 4D QSAR approach proposed by Hopfinger and co-workers [27] exploits conformational flexibility of ligand and freedom of alignment in three-dimensional spaces as required in 3D QSAR. In this approach grid cell occupancy descriptors, GCODs are calculated by aligning the set of molecules in the defined grid box. GOCDs are good measure of interaction pharmacophore elements, IPEs for atom sets like polar positive, polar negative, aromatic, hydrogen bond acceptor, hydrogen bond donor. The 4D QSAR models built with this formalism can be a way of designing bioactive molecules. In a pursuit of designing newer DNA-PK and PI 3-K inhibitors, 4D-QSAR study was carried out on the chromenones and pyrimido[2,1-a]isoquinolin-4-ones from the literature [25] with 4D QSAR approach called LQTAgrid-QSAR [28] (LQTA, Theoretical and Applied Chemometrics Laboratory, State University of Campinas, Brazil). Manuscript presents the 4D QSAR methodology and model built with Partial Least Squares (PLS) chemometric method.
MATERIAL AND METHODS
Computer hardware and software
Computational work was carried out on Ubuntu 12.0 and windows XP operating system. Software used included Autodock 4.2 with mgltools 1.5.6 [29], Marvin Sketch, UCSF Chimera1.8rc, Discovery Studio 3.5 (Biova Inc.), MATLAB 7.7.0 (R2008b) (MathWorks, Inc.), PLS Toolbox version 7.51 (Eigenvector Research, Inc), Pymol version1.3 (from Schrodinger, LLC), LQTAgrid (Martins JPA et al., 2009) and Gromacs 4.6.3 (a molecular dynamics suite) [30].
Dataset
Forty-five compounds presented in table 1 have been taken from the report of Griffin RJ et al [25]. Based on the substitution pattern in these derivatives 11 compounds were selected as test set and remaining 34 compounds as training set. DNA-PK IC50 in µM was transformed to pIC50 by taking the negative logarithm of IC50 values in moles.
Table 1: Structures of compounds
Compound No. | General structure | R1 | R2 | pIC50 (mole) |
1 | A | 8-Phenyl | 5.832683 | |
2 | B | - | 6.638272 | |
3 | 6.585027 | |||
4 | 5.860121 | |||
5* | 6.443697 | |||
6 | 6.408935 | |||
7 | 5.90309 | |||
8 | 5.489455 | |||
9 | 5.288193 | |||
10* | 5.176526 | |||
11 | 5.721246 | |||
12* | E | - | 6.552842 | |
13 | A | H | 5.899629 | |
14* | A | 8-CH3 | 5.782516 | |
15 | A | 7-OCH3 | 6.022276 | |
16* | A | 8-OCH3 | 5.920819 | |
17 | A | 6-Phenyl | 5.649752 | |
18 | A | 8-Phenyl | 5.330683 | |
19 | A | 8-Phenyl | 5.793174 | |
20 | B | - | 5.496209 | |
21* | B | - | 5.03292 | |
22 | B | - | 6.721246 | |
23 | B | - | 5.383 | |
24 | B | - | 5.453457 | |
25 | C | H | 5.316953 | |
26* | C | H | 5.055024 | |
27 | C | 6-CH3 | 5.271646 | |
28 | C | 8-CH3 | 5.094204 | |
29 | C | 6-OCH3 | 5.756962 | |
30 | C | 7-OCH3 | 4.946537 | |
31* | C | 8-OCH3 | 4.960983 | |
32 | C | 6-Cl | 5.485452 | |
33* | B | - | Phenyl | 5.302771 |
34 | B | - | 5.066007 | |
35 | B | - | 5.243364 | |
36 | B | - | 5.716699 | |
37* | D | H | 5.69897 | |
38 | E | - | 5.772113 | |
39 | E | - | 5.175224 | |
40 | E | - | 5.551294 | |
41 | E | - | 5.187755 | |
42* | E | - | 5.488117 | |
43 | E | - | 5.173277 | |
44 | E | - | 5.07263 | |
45 | E | - | 4.980053 |
* Test set compounds
Computational work
Initially 2D structures of dataset compounds were drawn in Marvin application interface and after adding explicit hydrogen structures were converted in 3D. Further computational steps are depicted in fig. 2.
Fig. 2: Schematic representation of 4D-QSAR Methodology
Gromacs coordinates and topology files
The 3D structures were processed in PRODRG2 server [30] to generate coordinate and topology files of all the compounds from dataset. As the structures were not pre optimized, energy minimization option in PRODRG was chosen.
Wherever necessary, add hydrogen or hybridization patch was used during generation of coordinate and topology files. The charges generated in topology files by PRODRG server were manually replaced by gasteiger charges computed in UCSF chimera.
Fig. 3: 3 Alignment of conformers generated during MD simulation (CEP) (A) Aligned CEP of most active (reference) compound 22; (B) CEP of active compound 1 aligned with compound 22 (compound 22 shown blue colour); (C) CEP of active compound 2; (D) CEP of active compound 3; (E) CEP of active compound 12; (F) CEP of less active compound 21; (G) CEP of least active compound 30; (H) CEP of less active compound 31; (I) CEP of less active compound 45
Molecular dynamics simulation
Molecular Dynamics simulation was carried out in Gromacs 4.6.3. During MD simulation, GROMOS96 ffG43a1 force field was used in an explicit water model in a cubic box of 1A0 volume. MD simulation included heating the system at 50, 100, 200 and 350K for 20 picoseconds (ps) simulation time with 1 famto seconds (fs) step size. Long-range electrostatics and Van der waal interaction energies were calculated with a cut off radius of 1 Å with Particle Mesh Ewald (PME) method. Compound and solvent water was separately coupled. Parrinello-Rahman coupling and velocity rescaling thermostat (V-rescale) respectively controlled the pressure and temperature of the system. The system was then cooled down to 300 K. The trajectory generated was recorded at every 2 ps simulation time.
Alignment
The most active compound from dataset (Compound 22) was chosen as reference compound. Conformations of all other compounds generated during MD simulation were aligned to this reference compound. The alignments of conformers of the few most active (1, 2, 3 & 12) and least active compounds (21, 30 & 45) from dataset aligned with conformer of most active compound 22 are shown in fig. 3.
Descriptors of interaction energy
In order to generate interaction energy descriptors, grid box of size 14 x 14 x 14, large enough to accommodate the CEPs was used in LQTAgrid module. LQTAgrid uses hypothetical N-terminal of the protein as –NH3+ probe to generate the matrix of interaction energy descriptors. The electrostatic property in terms of Coulombic potential function and steric 3D property in terms of Lennard-Jones potential function was generated as a descriptor matrix of 5488 descriptors containing 2744 Leenard-Jones and 2744 Coulombic potential based descriptors.
Variable selection and model development
Descriptor matrix was refined by eliminating descriptors having correlation lower than 0.3 leaving 523 descriptors. The data set was split into training set of 34 compounds and test set of 11 compounds. Test set compounds included the highest and lowest activity as well as structural diversity. The Partial Least square model was built using PLSToolbox ((Eigenvector Research, Inc) in MATLAB workspace. The model built by using absolute values, 10 latent variables and venetian blinds with 6 splits which includes 1 sample per split was found to be the best model.
RESULTS AND DISCUSSION
4 D QSAR Model
The objective of the current work was to build the best 4D-QSAR model with good predictive abilities and subsequently to exploit the 4D QSAR model in designing new DNA-PK / PI 3-K inhibitors. The approach of MD simulation and the consequent generation of interaction energy contributions with the aid of hypothetical –NH3+ probe emulate interaction of important residues at the binding site of DNA-PK with compounds under investigation. Among many chemometric tools, multiple linear regression (MLR) and partial least square regression (PLS) are frequently used regression approaches in development of QSAR models. In current work as the number of refined descriptors was 523, MLR was not chosen. This is because the number of independent variable matrix exceeds the dependent variable vector, and generated model could have the over fitting. Partial least square regression (PLS) can be good in this situation and in current investigation PLS regression was carried out.
The data was split into 34 training set compounds and 11 test set compounds. In MATLAB workspace with PLS Toolbox, data was pre-processed by auto scaling the raw data. Venetian blind cross validation method with 5 number of sample splits with 1 sample per split was chosen. The model was built with 19 latent variables.
This PLS model showed R2 calculated for training set 0.834502, root mean square error 0.188741, root mean square error predicted 0.575649, prediction bias -0.0221941. The pIC50 predicted with the residuals are shown in table 2. The plots of predicted activity against measured activity is shown in Fig. 4. Coefficient of most contributing descriptors is given in table 3.
Table 2: The measured pIC50 values and predicted pIC50 values for test set and training set compounds along with residuals
Comp. | Measured pIC50 | Predicted pIC50 | Residuals |
1 | 5.83268 | 5.75303 | 0.07964 |
2 | 6.63827 | 6.71624 | -0.07796 |
3 | 6.58502 | 5.99735 | 0.58766 |
4 | 5.86012 | 5.97798 | -0.11786 |
5* | 6.44369 | 5.87118 | 0.57250 |
6 | 6.40893 | 6.17532 | 0.23360 |
7 | 5.90308 | 6.10552 | -0.20243 |
8 | 5.48945 | 5.62230 | -0.13285 |
9 | 5.28819 | 5.51838 | -0.23019 |
10* | 5.17652 | 5.52461 | -0.34808 |
11 | 5.72124 | 5.34853 | 0.37271 |
12* | 6.55284 | 5.37326 | 1.17957 |
13 | 5.89962 | 5.81422 | 0.08540 |
14* | 5.78251 | 5.79515 | -0.01263 |
15 | 6.02227 | 5.91948 | 0.10279 |
16* | 5.92081 | 4.58487 | 1.33594 |
17 | 5.64975 | 5.65963 | -0.00988 |
18 | 5.33068 | 5.33233 | -0.00164 |
19 | 5.79317 | 5.81486 | -0.02168 |
20 | 5.49620 | 5.88143 | -0.38522 |
21* | 5.03292 | 5.88427 | -0.85135 |
22 | 6.72124 | 6.72132 | -0.00012 |
23 | 5.38299 | 5.23753 | 0.14546 |
24 | 5.45345 | 5.55418 | -0.10072 |
25 | 5.31695 | 5.37888 | -0.06193 |
26* | 5.05502 | 5.82225 | -0.76723 |
27 | 5.27164 | 5.46657 | -0.19493 |
28 | 5.09420 | 4.94749 | 0.14671 |
29 | 5.75696 | 5.49072 | 0.26623 |
30 | 4.94653 | 5.27887 | -0.33233 |
31* | 4.96098 | 5.51927 | -0.55829 |
32 | 5.48545 | 5.41195 | 0.07350 |
33* | 5.30277 | 4.88430 | 0.41846 |
34 | 5.06600 | 5.06552 | 0.00048 |
35 | 5.24336 | 5.24761 | -0.00425 |
36 | 5.71669 | 5.71042 | 0.00626 |
37* | 5.69897 | 5.77046 | -0.07149 |
38 | 5.77211 | 5.45695 | 0.31516 |
39 | 5.17522 | 5.19812 | -0.02290 |
40 | 5.55129 | 5.50205 | 0.04924 |
41 | 5.18775 | 5.50870 | -0.32095 |
42* | 5.48811 | 5.56078 | -0.07266 |
43 | 5.17327 | 5.42157 | -0.24829 |
44 | 5.07262 | 5.07059 | 0.00203 |
45 | 4.98005 | 4.98082 | -0.00076 |
* Test set compounds
Fig. 4: Predicted pIC50 values against measured pIC50 values for training set and test set compounds (Black circles: training set compounds; red triangles: test set compounds)
Table 3: Contributions of descriptors (Variables) in final model
Variable | Coefficient | Variable | Coefficient |
0_0_0_NH3+_LJ | 34.704 | 0_0_1_ NH3+_LJ | 25.739 |
0_1_0_ NH3+_LJ | 23.749 | 0_0_2_ NH3+_LJ | 19.310 |
1_0_1_ NH3+_LJ | 17.505 | 0_1_1_ NH3+_LJ | 17.135 |
2_0_0_ NH3+_LJ | 17.087 | 0_2_0_ NH3+_LJ | 15.843 |
1_1_0_ NH3+_LJ | 15.478 | 5_0_0_ NH3+_LJ | 14.856 |
0_0_3_ NH3+_LJ | 14.603 | 1_0_2_ NH3+_LJ | 12.728 |
0_1_2_ NH3+_LJ | 12.629 | 3_0_0_ NH3+_LJ | 12.044 |
2_0_1_ NH3+_LJ | 11.742 | 6_0_0_ NH3+_LJ | 11.370 |
0_2_1_ NH3+_LJ | 10.997 | 0_0_4_ NH3+_LJ | 10.964 |
1_1_1_ NH3+_LJ | 10.563 | 0_3_0_ NH3+_LJ | 10.156 |
0_10_0_ NH3+_LJ | -10.513 | 2_0_8_ NH3+_LJ | -12.601 |
0_1_9_NH3+_LJ | -26.958 | 1_0_9_NH3+_LJ | -38.509 |
0_0_9_ NH3+_LJ | -54.750 |
The 4D-QSAR model showed that only Leenard-Jones (LJ) interaction energy descriptors are contributing to the model. LJ interaction energy descriptors showed importance of steric 3D features required in the compounds. Few LJ descriptors show negative contributions which revealed unfavourable region in the molecules which can affect the pIC50 adversely. More importantly the LJ descriptors around aromatic ring are inversely contributing to the model where as LJ descriptors around morpholine ring are favourable for the activity. This is in good agreement with the SAR proposed by Griffin RJ et al [25]. In line with these findings currently we are developing library of 2, 3 di substituted chromenones. The contributions of descriptors 0_0_0_NH3+_LJ, 0_0_1_ NH3+_LJ, 0_1_0_ NH3+_LJ, 0_0_2_ NH3+_LJ, 2_0_8_ NH3+_LJ, 0_1_9_NH3+_LJ, 1_0_9_NH3+_LJ, 0_0_9_ NH3+_LJ around most active compound 22 and least active compound 30 is shown in fig. 5.
Fig. 5: Interpretation of interaction energy descriptors. (A) Interaction energy descriptors around most active compound 22; (B) interaction energy descriptors around least active compound 30
CONCLUSION
4D-QSAR model was built using gromacs based molecular dynamics simulation on chromoneone derivatives as DNK-PK inhibitors. In the model, interaction energy descriptors were constructed on CEP of each compound. PLS regression was carried out on selected descriptor matrix of 523 descriptors. The selected model with 19 latent variables showed R2 calculated for training set 0.834502, root mean square error 0.188741, root mean square error predicted 0.575649, prediction bias -0.0221941 suggesting reasonable predictive abilities in the model developed. Few LJ interactions are important for higher activity of these compounds. As a part of future work, we are designing 2, 3 di substituted chromenones which will mainly contribute to LJ interactions.
CONFLICT OF INTERESTS
Declared None
ACKNOWLEDGEMENT
Authors are thankful to Prof. M. N. Navale, Founder President, Sinhgad Technical Education Society for constant encouragement.
REFERENCES