Reducing Square-Error of Jarvis-Patrick Algorithm for Drug Discovery

Ashraf B. El-Sisi; Hamdy M. Mousa; Mohamed G. Malhat

doi:10.17577/IJERTV3IS070727

Volume 03, Issue 07 (July 2014)


Open Access
Authors : Ashraf B. El-Sisi, Hamdy M. Mousa, Mohamed G. Malhat
Paper ID : IJERTV3IS070727
Volume & Issue : Volume 03, Issue 07 (July 2014)
Published (First Online): 21-07-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Ashraf B. El-Sisi, Hamdy M. Mousa, Mohamed G. Malhat
Computer Science Dept., Faculty of Computers and Information, Menofia University, Egypt

Abstract: Clustering algorithms play an important role in chemoinformatics, especially in the drug discovery process. Clustering methods may be hierarchical or non-hierarchical. Non-hierarchical algorithms cluster large chemical data sets faster than hierarchical algorithms. One of the most popular non-hierarchical clustering algorithms, used in many applications in the drug discovery process, is the Jarvis-Patrick algorithm. Its applications in the drug discovery process include compound selection, compound acquisition, low-throughput screening and Quantitative Structure-Activity Relationship (QSAR) analysis. Jarvis-Patrick groups compounds into a cluster based on three neighborhood conditions. These three conditions can place compounds that are not similar enough in the same cluster. Placing dissimilar compounds in the same cluster leads to poor compound selection, compound acquisition and QSAR analysis. In this paper, standard Jarvis-Patrick is modified by adding a fourth condition, which is computed only if the three standard conditions are true. This condition computes the increase in the Square Error (SE) of the cluster caused by adding a compound and compares it with an expected increase in SE to determine whether to add the compound to the cluster. The results show that our modification produces clusters with more similar compounds and is still fast.

Keywords: Chemoinformatics; Drug Discovery; Non-hierarchical Clustering; Jarvis-Patrick

INTRODUCTION

The use of clustering in chemical applications is based on the similar property and activity principle, which states that compounds with similar structures are likely to exhibit similar properties, known as the Structure-Property Relationship (SPR), and similar activities, known as the Structure-Activity Relationship (SAR) [1]. Clustering algorithms used in chemical applications must therefore group compounds that are more similar in terms of properties or activity into the same cluster based on their structure. Most clustering algorithms for chemical applications cover the area of the drug discovery process [2, 3]. Drug discovery is the process of making drugs that respond to diseases with fewer side effects. It consists of seven steps: disease selection, target hypothesis, lead compound identification, lead optimization, pre-clinical trial, clinical trial, and pharmacogenomic optimization [4].

Chemoinformatics is used in the lead compound identification and optimization steps [5]. Chemoinformatics is the application of informatics methods to solve chemical problems. It is a new discipline emerging from the storage, manipulation, processing, design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information. Chemoinformatics has become a critical part of the drug discovery process because it accelerates the process and reduces the overall cost [6, 7]. There are many applications of chemoinformatics in drug discovery, such as compound selection, compound acquisition, virtual library generation, virtual screening, QSAR analysis and Absorption, Distribution, Metabolism, Elimination, and Toxicity (ADMET) prediction [8-11]. The central tasks of most of these applications are establishing a relationship between a chemical structure and its biological activity and predicting pharmacological properties, in addition to lead finding [5, 6].

Clustering algorithms are used in most of these applications as a method of selection, diversity analysis and data reduction. Compared to the other costs of drug discovery, clustering can add significant value at minimal cost [12]. Clustering algorithms are divided into two main categories: hierarchical and non-hierarchical. Jarvis-Patrick is one of the most popular non-hierarchical clustering algorithms and has a wide range of applications in chemoinformatics because it clusters large chemical data sets quickly and is easy to implement. The standard implementation of Jarvis-Patrick may group compounds that are not similar enough into one cluster, because those compounds still satisfy the three neighborhood conditions. Adding dissimilar compounds to the same cluster increases the SE of that cluster and therefore increases the SSE (the sum of SE over all clusters) of the produced clusters. SSE is one of the quality measures used to evaluate the ability of a clustering algorithm to group more similar compounds into the same cluster.

Standard Jarvis-Patrick is modified by adding a condition that is computed only if the standard Jarvis-Patrick conditions are true. This condition determines whether to add a compound to a cluster. It computes the increase in the Square Error (SE) of the cluster caused by adding the compound and compares it with the expected increase in SE. If the increase is less than or equal to the expected increase, the compound is added to the cluster; otherwise it is not added. The results show that, with this condition, Jarvis-Patrick does not add dissimilar compounds to the same cluster and is still fast. The paper is organized as follows. Section 2 gives an overview of standard Jarvis-Patrick and its usage in chemoinformatics. Section 3 proposes our modification of Jarvis-Patrick. Section 4 compares modified Jarvis-Patrick with standard Jarvis-Patrick and discusses their implementation and experimental results. Finally, Section 5 gives the conclusion.

JARVIS-PATRICK CLUSTERING USAGE IN CHEMOINFORMATICS

Clustering methods are used in a number of disciplines such as computer science, information technology, information systems, engineering, bioinformatics and chemoinformatics. The main use of clustering methods in chemoinformatics is to group similar compounds into clusters based on the underlying distribution of the input. After grouping, the activity of a compound is predicted from the known activities of the compounds in the same cluster.

Jarvis-Patrick is one of the most popular methods and has a wide range of applications in chemoinformatics because of its ability to handle large data sets in reasonable time, its ease of implementation and the availability of an efficient commercial implementation from Daylight for handling very large data sets [13]. Jarvis-Patrick is a non-hierarchical, non-overlapping clustering method. Non-overlapping means that each compound can be in only one cluster. Non-hierarchical means that the data set is analyzed to produce a single partition of the compounds, resulting in a set of clusters.

The standard Jarvis-Patrick method proceeds in two levels [14]. In the first level, a list of the top K nearest neighbors (K is usually 16) is generated for each compound in the data set. The nearest neighbors are usually determined by the Euclidean distance for numerical descriptors and by the Tanimoto coefficient for binary descriptors [15]. In the second level, the nearest-neighbor lists are scanned to create clusters that satisfy the following three neighborhood conditions:
1. The top K nearest-neighbor list of compound i must contain compound j.
2. The top K nearest-neighbor list of compound j must contain compound i.
3. The top K nearest-neighbor lists of compounds i and j must have at least K-Min compounds in common (K-Min is set by the user and lies in the range 1 to K).
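The two levels and the three neighborhood conditions above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the class and method names (`JarvisPatrickSketch`, `nearestNeighborLists`, `satisfiesConditions`) are hypothetical, descriptors are assumed to be numerical and compared by Euclidean distance, a compound is assumed not to appear in its own neighbor list (implementations differ on this point), and the neighbor search is a brute-force scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JarvisPatrickSketch {

    /** Euclidean distance between two numerical descriptor vectors. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** First level: the indices of the K nearest neighbors of every compound. */
    static List<Set<Integer>> nearestNeighborLists(double[][] data, int k) {
        List<Set<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            final int self = i;
            List<Integer> others = new ArrayList<>();
            for (int j = 0; j < data.length; j++) {
                if (j != i) {
                    others.add(j); // assumption: a compound is not its own neighbor
                }
            }
            // Sort the other compounds by increasing distance to compound i.
            others.sort(Comparator.comparingDouble(
                    (Integer j) -> euclidean(data[self], data[j])));
            lists.add(new HashSet<>(others.subList(0, Math.min(k, others.size()))));
        }
        return lists;
    }

    /** Second level: the three neighborhood conditions for the pair (i, j). */
    static boolean satisfiesConditions(List<Set<Integer>> nn, int i, int j, int kMin) {
        if (!nn.get(i).contains(j)) {
            return false; // condition 1: j must be in the list of i
        }
        if (!nn.get(j).contains(i)) {
            return false; // condition 2: i must be in the list of j
        }
        Set<Integer> shared = new HashSet<>(nn.get(i));
        shared.retainAll(nn.get(j));
        return shared.size() >= kMin; // condition 3: at least K-Min common neighbors
    }

    public static void main(String[] args) {
        // Toy one-dimensional descriptors forming two well-separated groups.
        double[][] data = {{0.0}, {1.0}, {2.0}, {10.0}, {11.0}, {12.0}};
        List<Set<Integer>> nn = nearestNeighborLists(data, 2);
        System.out.println(satisfiesConditions(nn, 0, 1, 1)); // same group
        System.out.println(satisfiesConditions(nn, 2, 3, 1)); // different groups
    }
}
```

In this toy run the first pair passes all three conditions and the second pair fails the first condition, since compound 3 is not among the two nearest neighbors of compound 2.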
Pairs of compounds that fail any of the three conditions above are not put into the same cluster. The value of K specifies the number of a compound's neighbors to consider when counting the mutual neighbors shared with another compound. This value must be at least 2. Lower values make the algorithm finish faster, but the final set of clusters will contain many small clusters. Higher values make the algorithm take longer to finish, but may result in fewer clusters and in clusters that form longer chains. K-Min specifies the minimum number of mutual nearest neighbors that two compounds must have to be in the same cluster. This value must be at least 1 and must not exceed K. Lower values result in clusters that are compact. Higher values result in clusters that are more dispersed.

The standard implementation of Jarvis-Patrick produces a large number of singletons and clusters with large SSE.

Several modifications have been developed to overcome the singletons problem, such as:
1. A variable-length nearest-neighbor list [16]: a proximity threshold is used to determine a variable number of neighbors for each compound. All neighbors that pass the threshold test are considered neighbors of this compound. This modification prevents outliers from joining a cluster and also prevents the arbitrary splitting of large clusters that arises from the limitations imposed by fixed-length lists.
2. Re-clustering of singletons [17]: standard Jarvis-Patrick is applied iteratively to remove the singletons. The singletons are assigned to a cluster using less strict parameters than those defined by the user. The iteration is repeated until fewer than a specified percentage of singletons remain.
3. Fuzzy clustering [18]: all compounds are assigned a probability that reflects the distance of the compound from each cluster. A singleton is assigned to its nearest cluster based on a specified threshold probability. Singletons that do not exceed the threshold are regarded as outliers and remain singletons.

The applications of Jarvis-Patrick clustering in chemoinformatics are compound selection, compound acquisition and high-throughput screening. In [19], Jarvis-Patrick is used to cluster a data set of about 240,000 compounds for compound selection. Singletons are moved to the nearest non-singleton cluster. Then the centroid of each cluster is calculated, and representative compounds are selected as those closest to the centroid. In [20], Jarvis-Patrick is used to assist low-throughput screening and to support QSAR analysis by analyzing databases for efficient compound acquisition. In [17], Jarvis-Patrick is used for high-throughput screening through the selection of compounds from the corporate database. In [18], Jarvis-Patrick is used to analyze the compound database in support of high-throughput screening.

The previous modifications were developed to overcome the singletons problem. The three neighborhood conditions of Jarvis-Patrick do not guarantee that more similar compounds are grouped into the same cluster, so the produced clusters have large SSE values. In the next section, the standard Jarvis-Patrick algorithm is modified by adding a fourth condition to overcome this problem.

PROPOSED MODIFICATION ON STANDARD JARVIS-PATRICK

The standard Jarvis-Patrick is modified by adding a fourth neighborhood condition that is computed only if the three previous neighborhood conditions are true. The fourth condition computes the increase in SE of the cluster containing compound i after compound j is added to this cluster and compares it with the expected increase in SE. Consider a cluster of n compounds, each represented by a vector x(r), r = 1, ..., n. The vector of the cluster centroid, x(c), is defined as

$$x(c) = \frac{1}{n} \sum_{r=1}^{n} x(r) \qquad (1)$$

The centroid is the simple arithmetic mean of the vectors of the cluster members. The SE for a cluster is the sum of squared Euclidean distances to the centroid for all n compounds in that cluster. The SE is defined as

$$\mathrm{SE} = \sum_{r=1}^{n} \lVert x(r) - x(c) \rVert^{2} \qquad (2)$$

The SSE is the summation of SE over all m produced clusters and is defined as

$$\mathrm{SSE} = \sum_{p=1}^{m} \mathrm{SE}_{p} \qquad (3)$$

where SE_p is the SE of the p-th cluster.

The increase in SE is the difference between the SE of the cluster containing compound i after adding compound j and its SE before adding compound j. The increase in SE is defined as

$$\text{Increase in SE} = \mathrm{SE}_{\text{after adding } j} - \mathrm{SE}_{\text{before adding } j} \qquad (4)$$

The expected increase in SE is the SE of the whole data set divided by the number of compounds n and multiplied by a user-specified ratio r, where r is a value between 0 and 1 (here n denotes the number of compounds in the data set, not the size of a cluster). Small values of r ensure that more similar compounds are grouped into the same cluster. The expected increase in SE is defined as

$$\text{Expected increase in SE} = r \cdot \frac{\mathrm{SE}_{\text{data set}}}{n} \qquad (5)$$

If the increase in SE is less than or equal to the expected increase, compound j is added to the cluster containing compound i; otherwise compound j is not added to this cluster. With this modification, the algorithm produces clusters with lower SSE, because compounds that would increase SE by more than the expected increase are kept out of the cluster. Compound selection, acquisition and QSAR analysis therefore become more efficient, and the algorithm remains fast because the fourth condition is computed only if the three conditions of the standard Jarvis-Patrick algorithm are true.

IMPLEMENTATION AND EXPERIMENTAL RESULTS

The algorithms are implemented in Java on Windows 7 with an Intel Core i5 2.5 GHz processor and 4 GB of RAM. The NCI data set, one of the most popular data sets, is used for the experiments [21]. Three random subsets are taken from the NCI data set, with the number of compounds and the SE shown in Table 1.

Subset Name	Number of Compounds	SE
NCI-1	100	25.61546473
NCI-2	500	791.56501
NCI-3	1000	1838.0002

TABLE I. THREE SUBSETS OF NCI DATA SET

The BCUT descriptor is used to represent the compounds in the three subsets [22]. For each NCI subset, 4 runs are recorded with K = 16 and K-Min = 4, 8, 12 and 14. Table 2 shows K, K-Min, the Number of Clusters (NOC), the computation time in milliseconds and the SSE of the standard Jarvis-Patrick algorithm. Tables 3, 4, 5 and 6 show the same information for the modified Jarvis-Patrick algorithm where r = 1.0, 0.5, 0.1 and 0.01.

Subset Name	K	K-Min	NOC	SSE	Time in Milliseconds
NCI-1	16	4	8	13.29864	40
NCI-1	16	8	10	4.069484	20
NCI-1	16	12	28	1.49243	10
NCI-1	16	14	62	0.238635	10
NCI-2	16	4	46	44.92072	190
NCI-2	16	8	63	29.30268	140
NCI-2	16	12	200	14.63466	130
NCI-2	16	14	335	2.768861	120
NCI-3	16	4	85	43.46274	480
NCI-3	16	8	126	28.03772	420
NCI-3	16	12	387	11.34654	410
NCI-3	16	14	683	6.63129	410

TABLE II. OUTPUT OF STANDARD JARVIS-PATRICK ALGORITHM
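To make the fourth condition concrete for the runs reported here, Equations (1), (2) and (5) and the acceptance test can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the class and method names (`FourthConditionSketch`, `centroid`, `squareError`, `expectedIncrease`, `fourthCondition`) are hypothetical, the SE is recomputed from scratch rather than updated incrementally, and the two-dimensional descriptor values are toy numbers rather than BCUT descriptors. Only the NCI-1 values (SE = 25.61546473, n = 100) are taken from Table 1.

```java
import java.util.ArrayList;
import java.util.List;

public class FourthConditionSketch {

    /** Centroid of a cluster: arithmetic mean of the member vectors, Eq. (1). */
    static double[] centroid(List<double[]> cluster) {
        int dims = cluster.get(0).length;
        double[] c = new double[dims];
        for (double[] x : cluster) {
            for (int d = 0; d < dims; d++) {
                c[d] += x[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            c[d] /= cluster.size();
        }
        return c;
    }

    /** Square error: sum of squared Euclidean distances to the centroid, Eq. (2). */
    static double squareError(List<double[]> cluster) {
        double[] c = centroid(cluster);
        double se = 0.0;
        for (double[] x : cluster) {
            for (int d = 0; d < c.length; d++) {
                double diff = x[d] - c[d];
                se += diff * diff;
            }
        }
        return se;
    }

    /** Expected increase in SE, Eq. (5): r times the data-set SE divided by n. */
    static double expectedIncrease(double dataSetSe, int n, double r) {
        return r * dataSetSe / n;
    }

    /** Fourth condition: accept the candidate only if the SE increase, Eq. (4), is within the expected increase. */
    static boolean fourthCondition(List<double[]> cluster, double[] candidate, double expected) {
        double before = squareError(cluster);
        List<double[]> extended = new ArrayList<>(cluster);
        extended.add(candidate);
        double increase = squareError(extended) - before;
        return increase <= expected;
    }

    public static void main(String[] args) {
        // NCI-1 values from Table 1 with r = 0.1; the threshold is about 0.0256.
        double expected = expectedIncrease(25.61546473, 100, 0.1);

        // Toy 2-D descriptors (assumption: illustrative values only).
        List<double[]> cluster = new ArrayList<>();
        cluster.add(new double[] {0.0, 0.0});
        cluster.add(new double[] {0.1, 0.0});

        // A nearby candidate raises SE by about 0.0017 and is accepted.
        System.out.println(fourthCondition(cluster, new double[] {0.05, 0.05}, expected));
        // A distant candidate raises SE by about 1.27 and is rejected.
        System.out.println(fourthCondition(cluster, new double[] {1.0, 1.0}, expected));
    }
}
```

Because the threshold scales with r, lowering r from 0.1 to 0.01 in this sketch shrinks the threshold to about 0.0026, which makes the condition reject more candidates.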

Subset Name	K	K-Min	NOC	SSE	Time in Milliseconds
NCI-1	16	4	10	4.926679	80
NCI-1	16	8	11	3.459389	30
NCI-1	16	12	28	1.457589	30
NCI-1	16	14	62	0.232923	10
NCI-2	16	4	48	39.88919	270
NCI-2	16	8	65	24.27115	150
NCI-2	16	12	201	11.79896	140
NCI-2	16	14	335	2.768861	140
NCI-3	16	4	84	40.29349	620
NCI-3	16	8	126	23.87385	460
NCI-3	16	12	378	11.19564	450
NCI-3	16	14	659	6.579031	430

TABLE III. OUTPUT OF MODIFIED JARVIS-PATRICK ALGORITHM WHERE R = 1.0

Subset Name	K	K-Min	NOC	SSE	Time in Milliseconds
NCI-1	16	4	13	2.894775	90
NCI-1	16	8	13	2.878076	30
NCI-1	16	12	30	0.858161	20
NCI-1	16	14	62	0.232923	20
NCI-2	16	4	52	29.29845	270
NCI-2	16	8	67	16.73628	180
NCI-2	16	12	203	7.632148	150
NCI-2	16	14	336	2.705852	140
NCI-3	16	4	84	39.44875	610
NCI-3	16	8	126	23.02911	460
NCI-3	16	12	378	9.807962	430
NCI-3	16	14	659	5.191356	410

TABLE IV. OUTPUT OF MODIFIED JARVIS-PATRICK ALGORITHM WHERE R = 0.5

Subset Name	K	K-Min	NOC	SSE	Time in Milliseconds
NCI-1	16	4	25	0.897523	70
NCI-1	16	8	25	0.896463	50
NCI-1	16	12	36	0.530665	20
NCI-1	16	14	64	0.177409	10
NCI-2	16	4	61	11.57974	290
NCI-2	16	8	76	8.472676	170
NCI-2	16	12	208	3.491306	150
NCI-2	16	14	336	1.312308	140
NCI-3	16	4	91	25.03054	600
NCI-3	16	8	133	16.17223	480
NCI-3	16	12	382	5.916373	440
NCI-3	16	14	662	2.095064	440

TABLE V. OUTPUT OF MODIFIED JARVIS-PATRICK ALGORITHM WHERE R = 0.1

Subset Name	K	K-Min	NOC	SSE	Time in Milliseconds
NCI-1	16	4	70	0.031339	70
NCI-1	16	8	70	0.031339	50
NCI-1	16	12	72	0.02826	30
NCI-1	16	14	78	0.01946	10
NCI-2	16	4	117	2.293131	270
NCI-2	16	8	130	1.933292	180
NCI-2	16	12	237	0.764977	140
NCI-2	16	14	348	0.368605	140
NCI-3	16	4	171	5.603048	570
NCI-3	16	8	198	4.629283	470
NCI-3	16	12	420	1.893126	440
NCI-3	16	14	677	0.592186	430

TABLE VI. OUTPUT OF MODIFIED JARVIS-PATRICK ALGORITHM WHERE R = 0.01

Fig. 1 shows the SSE for the Standard Jarvis-Patrick (SJP) and the Modified Jarvis-Patrick (MJP) where r = 1.0, 0.5, 0.1 and 0.01 for the three subsets. As shown in Fig. 1, our approach produces clusters with SSE less than or equal to that of SJP for all subsets with K-Min = 4, 8, 12 and 14. For example, in the NCI-1 subset when K-Min = 4, SJP produces clusters with SSE equal to 13.2986 and MJP produces clusters with SSE equal to 4.9266 where r = 1.0, 2.8947 where r = 0.5, 0.8975 where r = 0.1 and 0.0313 where r = 0.01. When K-Min = 14, SJP produces clusters with SSE equal to 0.2386 and MJP produces clusters with SSE equal to 0.2329 where r = 1.0, 0.2329 where r = 0.5, 0.1774 where r = 0.1 and 0.0194 where r = 0.01. These results show that for large values of K-Min, MJP produces clusters with SSE less than or equal to that of SJP, while for small values of K-Min, MJP produces clusters with SSE less than that of SJP for all values of r.
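As a worked illustration of the SSE comparison, the relative reduction in SSE can be computed from the tabulated values for NCI-1 with K-Min = 4. The percentages below are derived from the SSE values in Tables 2, 3 and 6 and rounded; they illustrate the arithmetic and are not an additional experimental result.

```latex
\[
\frac{\mathrm{SSE}_{\mathrm{SJP}} - \mathrm{SSE}_{\mathrm{MJP},\, r = 1.0}}{\mathrm{SSE}_{\mathrm{SJP}}}
  = \frac{13.29864 - 4.926679}{13.29864} \approx 0.630 \quad (63.0\%)
\]
\[
\frac{\mathrm{SSE}_{\mathrm{SJP}} - \mathrm{SSE}_{\mathrm{MJP},\, r = 0.01}}{\mathrm{SSE}_{\mathrm{SJP}}}
  = \frac{13.29864 - 0.031339}{13.29864} \approx 0.998 \quad (99.8\%)
\]
```

The same calculation can be repeated for any other row of the tables to compare a given subset, K-Min and r against SJP.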

Figure 1. SSE of SJP and MJP for three subsets where r = 1.0, 0.5, 0.1 and 0.01: (a) NCI-1, (b) NCI-2, (c) NCI-3. Each panel plots the sum of square error against the value of K-Min (4, 8, 12, 14) for SJP and for MJP with r = 1.0, 0.5, 0.1 and 0.01.

Fig. 2 shows the number of clusters generated by SJP and MJP where r = 1.0, 0.5, 0.1 and 0.01 for the three subsets. As shown in Fig. 2, the number of clusters generated by our approach is larger than or equal to the number of clusters generated by SJP for all subsets with K-Min = 4, 8, 12 and 14. For example, in the NCI-1 subset when K-Min = 4, SJP produces 8 clusters and MJP produces 10 clusters where r = 1.0, 13 clusters where r = 0.5, 25 clusters where r = 0.1 and 70 clusters where r = 0.01. When K-Min = 14, SJP produces 62 clusters and MJP produces 62 clusters where r = 1.0, 62 clusters where r = 0.5, 64 clusters where r = 0.1 and 78 clusters where r = 0.01. These results show that for large values of K-Min, MJP and SJP produce a similar number of clusters, while for small values of K-Min, MJP produces more clusters than SJP for all values of r.

Figure 2. Number of Clusters of SJP and MJP for three subsets where r = 1.0, 0.5, 0.1 and 0.01: (a) NCI-1, (b) NCI-2, (c) NCI-3. Each panel plots the number of clusters against the value of K-Min (4, 8, 12, 14) for SJP and for MJP with r = 1.0, 0.5, 0.1 and 0.01.

Fig. 3 shows the time required in milliseconds for SJP and MJP where r = 1.0, 0.5, 0.1 and 0.01 for the three subsets. As shown in Fig. 3, the time required by our approach is larger than or equal to the time required by SJP for all subsets with K-Min = 4, 8, 12 and 14. For example, in the NCI-1 subset when K-Min = 4, SJP takes 60 milliseconds and MJP takes 60 milliseconds where r = 1.0, 90 milliseconds where r = 0.5, 70 milliseconds where r = 0.1 and 70 milliseconds where r = 0.01. When K-Min = 14, SJP takes 10 milliseconds and MJP takes 10 milliseconds where r = 1.0, 20 milliseconds where r = 0.5, 10 milliseconds where r = 0.1 and 10 milliseconds where r = 0.01. These results show that for large values of K-Min, MJP and SJP take similar computation time, while for small values of K-Min, MJP takes more time than SJP for all values of r. The increase in time for MJP represents the overhead needed to process the fourth condition.

Figure 3. Time Required for SJP and MJP for three subsets where r = 1.0, 0.5, 0.1 and 0.01: (a) NCI-1, (b) NCI-2, (c) NCI-3. Each panel plots the time in milliseconds against the value of K-Min (4, 8, 12, 14) for SJP and for MJP with r = 1.0, 0.5, 0.1 and 0.01.

From Figures 1-3, our approach reduces the SSE of the resulting clusters compared with SJP. This reduction is most evident for small values of K-Min, because small values of K-Min give the fourth condition more opportunity to be invoked. The percentage of SSE reduction depends on the value of r: the smaller the value of r, the more the SSE is reduced. In order to reduce SSE, more clusters are generated. These extra clusters contain compounds that do not satisfy the fourth condition. The increase in time needed by our approach is the overhead of applying the fourth condition.

CONCLUSION

The demands of clustering data sets of several million compounds with high-dimensional representations led to the widespread adoption of a few inherently efficient and optimally implemented methods. Jarvis-Patrick is one of the most popular clustering methods and has many applications in chemoinformatics, such as compound selection, compound acquisition, lead finding and QSAR analysis. In this paper, standard Jarvis-Patrick is modified in order to group more similar compounds into the same cluster and to avoid adding compounds that would increase the SSE. The results show that our modification produces clusters with less SSE than standard Jarvis-Patrick. Compound selection, acquisition and QSAR analysis will therefore be more efficient, while Jarvis-Patrick remains fast. In future work, the modified Jarvis-Patrick will be applied to large chemical data sets and compared with the Ward clustering algorithm.

REFERENCES

[1] Geoffrey M. Downs and Peter Willett, "The Use of Similarity and Clustering Techniques for the Prediction of Molecular Properties," in Applied Multivariate Analysis in SAR and Environmental Studies, J. Devillers and W. Karcher, Eds.: Springer Netherlands, 1991, vol. 2, pp. 247-279.
[2] J. Nouwen and B. Hansen, "An Investigation of Clustering as a Tool in Quantitative Structure-Activity Relationships (QSARS)," SAR and QSAR in Environmental Research, vol. 4, no. 1, pp. 1-10, 1995, PMID: 22091841.
[3] Kenny B. Lipkowitz and Donald B. Boyd, Reviews in Computational Chemistry. New York, NY, USA: John Wiley & Sons, Inc., 2002.
[4] Thomas Engel, "Basic Overview of Chemoinformatics," Journal of Chemical Information and Modeling, vol. 46, no. 6, pp. 2267-2277, 2006, PMID: 17125169.
[5] Gyorgy M. Keseru and Gergely M. Makara, "Hit discovery and hit-to-lead approaches," Drug Discovery Today, vol. 11, no. 15, pp. 741-748, 2006.
[6] Charu C. Aggarwal and Haixun Wang, Managing and Mining Graph Data, 1st ed.: Springer Publishing Company, Incorporated, 2010.
[7] Andrew R. Leach and Valerie J. Gillet, An Introduction to Chemoinformatics: Springer Publishing Company, Incorporated, 2007.
[8] Jeremy L. Jenkins, Andreas Bender, and John W. Davies, "In silico target fishing: Predicting biological targets from chemical structure," Drug Discovery Today: Technologies, vol. 3, no. 4, pp. 413-421, 2006.
[9] Meenakshi Mishra, Hongliang Fei, and Jun Huan, "Computational Prediction of Toxicity," Int. J. Data Min. Bioinformatics, vol. 8, no. 3, pp. 338-348, 2013.
[10] Christian Korn and Stefan Balbach, "Compound selection for development – Is salt formation the ultimate answer? Experiences with an extended concept of the "100 mg approach"," European Journal of Pharmaceutical Sciences, vol. 57, pp. 257-263, 2014, Special Issue on 7th International Symposium on Microdialysis, edited by William Couet and Hartmut Derendorf.
[11] Vishnu J. Gaikwad, "Application of Chemoinformatics for Innovative Drug Discovery," International Journal of Chemical Sciences and Applications, vol. 1, no. 1, pp. 16-24, 2010.
[12] S Kavi Priya and M Lingaraj, "Performance analysis of data clustering in rapid medical development," International Journal of Engineering Research and Science & Technology, vol. 2, no. 2, pp. 115-122, 2013.
[13] Daylight. [Online]. http://www.daylight.com/
[14] R. A. Jarvis and Edward A. Patrick, "Clustering Using a Similarity Measure Based on Shared Near Neighbors," Computers, IEEE Transactions on, vol. C-22, no. 11, pp. 1025-1034, 1973.
[15] Peter Willett, John M. Barnard, and Geoffrey M. Downs, "Chemical Similarity Searching," Journal of Chemical Information and Computer Sciences, vol. 38, no. 6, pp. 983-996, 1998.
[16] Robert D. Brown and Yvonne C. Martin, "Use of Structure-Activity Data To Compare Structure-Based Clustering Methods and Descriptors for Use in Compound Selection," Journal of Chemical Information and Computer Sciences, vol. 36, no. 3, pp. 572-584, 1996.
[17] Paul R. Menard, Richard A. Lewis, and Jonathan S. Mason, "Rational Screening Set Design and Compound Selection: Cascaded Clustering," Journal of Chemical Information and Computer Sciences, vol. 38, no. 3, pp. 497-505, 1998.
[18] Thompson N. Doman, John M. Cibulskis, Michael J. Cibulskis, Patrick Dale McCray, and Dale P. Spangler, "Algorithm5: A Technique for Fuzzy Similarity Clustering of Chemical Inventories," Journal of Chemical Information and Computer Sciences, vol. 36, no. 6, pp. 1195-1204, 1996.
[19] Peter Willett, Vivienne Winterman, and David Bawden, "Implementation of nonhierarchic cluster analysis methods in chemical information systems: selection of compounds for biological testing and clustering of substructure search output," Journal of Chemical Information and Computer Sciences, vol. 26, no. 3, pp. 109-118, 1986.
[20] Malcolm J. McGregor and Peter V. Pallai, "Clustering of Large Databases of Compounds: Using the MDL "Keys" as Structural Descriptors," Journal of Chemical Information and Computer Sciences, vol. 37, no. 3, pp. 443-448, 1997.
[21] NCI Data set. [Online]. http://cactus.nci.nih.gov/download/nci/.
[22] BCUT Descriptor. [Online]. http://sourceforge.net/projects/cdk/.
