
FULL PAPER
International Journal of Recent Trends in Engineering, Vol 2, No. 1, November 2009
2009 ACEEE, DOI: 01.IJRTET.02.01.281

EVISTA – Interactive Visual Clustering System

K. Thangavel (1), P. Alagambigai (2)
(1) Department of Computer Science, Periyar University, Salem, Tamilnadu, India
(2) Department of Computer Applications, Easwari Engineering College, Chennai, Tamilnadu, India

Abstract: Due to the enormous increase in data, exploring and analyzing it is increasingly important but difficult to achieve. Information visualization and visual data mining can help to deal with this. Visual data exploration has high potential, and many applications such as fraud detection and data mining will use information visualization technology for improved data analysis. The advantage of visual data exploration is that the user is directly involved in the data mining process. A large number of information visualization techniques have been developed over the last decade to support the exploration of large data sets. VISTA is an interactive visual cluster rendering system which invites the human into the clustering process, but it has some limitations in identifying the cluster distribution and in human-computer interaction. In this paper, we propose an Enhanced VISTA (EVISTA) which addresses these drawbacks. EVISTA improves the visualization in two ways. First, it uses weighted vector normalization instead of max-min normalization, which improves the data visualization such that the user can understand the underlying pattern without human intervention. Second, it completely eliminates the use of α tuning, which reduces the complexity of visual distance computation and eases the human-computer interaction. The experimental results show that EVISTA explores the underlying pattern of the dataset effectively and greatly reduces the user operation burden.

Index Terms: Clustering, EVISTA, Human-computer interaction, Information visualization, Visual data mining.

I. INTRODUCTION

Data visualization is essential for understanding the concept of multidimensional spaces [5]. It allows the user to explore the data in different ways at different levels of abstraction to find the right level of detail. Therefore techniques are most useful if they are highly interactive, permit direct manipulation and include a rapid response time. Visualization is defined by Ware as "a graphical representation of data or concepts" which is either an "internal construct of the mind" or an "external artifact supporting decision making". Visualization provides valuable assistance to the human by representing information visually. This assistance may be called cognitive support. Visualization can provide cognitive support through a number of mechanisms: grouping related information for easy search and access, representing large volumes of data in a small space, imposing structure on data and tasks to reduce time complexity, and allowing interactive exploration through manipulation of parameter values [11].

Visualization techniques could enhance the current knowledge and data discovery methods by increasing the user involvement in the interactive process. More recently there has been much discussion of visualization for data mining. Visual data mining can be viewed as an integration of data visualization and data mining [5, 15]. Considering visualization as a supporting technology in data mining, four possible approaches are stated in [1]. The first approach is the use of visualization techniques to present the results obtained from mining the data in the database. The second approach is applying data mining techniques to visualization by capturing essential semantics visually. The third approach is to use visualization techniques to complement the data mining techniques. The fourth approach uses visualization techniques to steer the mining process.

In general, visualization can be used to explore data, to confirm a hypothesis or to manipulate a view. Exploratory visualization creates a dynamic scenario in which interaction is critical. The user does not necessarily know what he or she is looking for, can search for structures or trends, and is attempting to arrive at some hypothesis. In confirmatory visualization, the system parameters are often predetermined and the visualization tools are used to confirm or refute the hypothesis. Manipulative visualization focuses on refining the visualization to optimize the presentation. Visualization has been categorized into two major areas: i) scientific visualization, which focuses primarily on physical data such as the human body, and ii) information visualization, which focuses on abstract nonphysical data such as text, hierarchies and statistical data. Data mining techniques are primarily oriented toward information visualization [4]. Both scientific visualization and information visualization create graphical models and visual representations from data that support direct user interaction for exploring and acquiring insight into useful information embedded in the underlying data [10, 15]. Even though visualization techniques have advantages over automatic methods, they bring up some specific problems such as limited visibility, visual bias due to the mapping of the dataset to a 2D/3D representation, and the need for easy-to-use visual interface operations and reliable human-computer interaction. In most visualization methods the human-computer interaction costs more than automated methods [9].

In general, visual data mining is different from scientific visualization and has the following characteristics: a wide range of users, a wide choice of visualization techniques, and an important dialog function. The users of scientific visualization are scientists and engineers who can tolerate some difficulty in using the system, whereas a visual data mining system must be usable widely and easily by general users [16]. Considering this issue, this paper proposes a novel information visualization technique called the enhanced visual clustering system (EVISTA), an extended version of VISTA [8]. VISTA is a dynamic data visualization model which invites the human into the clustering process. Even though VISTA has proved to be an efficient interactive visual cluster rendering system, it requires complete user interaction throughout the clustering process. When the number of dimensions increases, the human-computer interaction becomes tedious. EVISTA is designed to provide an efficient data visualization such that the user is able to understand the underlying pattern of the given data set without human intervention.

The rest of the paper is organized as follows: Section 2 reviews the related works in the domain of information visualization. Section 3 deals with EVISTA. Section 4 discusses the experimental analysis. Section 5 concludes the paper.

II. RELATED WORKS

Various efforts have been made to visualize multidimensional datasets [2, 10, 11, 13]. The early research on general plot-based data visualization is Grand Tour and Projection Pursuit [2]. The purpose of the Grand Tour and Projection Pursuit is to guide the user to find the interesting projections. L. Yang [2] utilizes the Grand Tour technique to show projections of datasets in an animation. The dimensions are projected to coordinates in a 3D space. However, when the 3D space is shown on a 2D screen, some axes may be overlapped by other axes, which makes it hard to perform direct interactions on dimensions.

Star coordinate [7] is an interactive visualization model which treats dimensions uniformly, in which data are represented coarsely by simple and more space-efficient points, which results in less cluttered visualization for large datasets.

Interactive visual clustering (IVC) [10] combines spring-embedded graph layout techniques with user interaction and constrained clustering.

VISTA [8, 9] is a recent visualization model that utilizes the star coordinate system and provides a similar mapping function. There are two types of cluster rendering in the VISTA model: the former is unguided rendering and the latter is guided rendering.

III. ENHANCED VISUAL CLUSTERING SYSTEM

Enhanced VISTA (EVISTA) is an information visualization framework that employs improved data visualization and reveals the hidden patterns in complex high-dimensional data sets without human intervention. The EVISTA model is designed based on the star coordinates.

The star coordinate system is a traditional multivariate data visualization technique in which the k axes are defined by an origin O = (x, y) and k coordinates S_1, S_2, ..., S_k that represent the k dimensions in 2D space. The k coordinates are equidistantly distributed on the circumference of the circle C, where the unit vectors are

$$S_i = \left(\cos\frac{2\pi i}{k},\ \sin\frac{2\pi i}{k}\right), \quad i = 1, 2, \ldots, k$$

and the 2D point Q(x, y) is obtained by

$$Q_x = \sum_{i=1}^{k} x_i' \cos\frac{2\pi i}{k}, \qquad Q_y = \sum_{i=1}^{k} x_i' \sin\frac{2\pi i}{k}$$

Here $x_i'$ is computed from the weighted term $wt\,x_i$, where $x_i$ represents the given data object and $x_i'$ represents the normalized data value based on weighted vector normalization.

EVISTA employs the design of the VISTA visual cluster rendering proposed by Keke Chen and L. Liu [8], which provides an intuitive way to visualize clusters with interactive feedback to encourage domain experts to participate in the clustering revision and cluster validation process. It allows the user to interactively observe potential clusters in a series of continuously changing visualizations through α. More importantly, it can include algorithmic clustering results and serve as an effective validation and refinement tool for irregularly shaped clusters [9]. The VISTA system has two unique features. First, it implements a linear and reliable visualization model to interactively visualize multidimensional datasets in a 2D star-coordinate space. Second, it provides a rich set of user-friendly interactive rendering operations, allowing users to validate and refine the cluster structure based on their visual experience as well as their domain knowledge.

The VISTA visualization model consists of two linear mappings: max-min normalization followed by α-mapping. Equation (5) represents the max-min normalization, which is used to normalize the columns in the datasets so as to eliminate the dominating effect of large-valued columns:

$$v' = \frac{2\,(v - \min)}{\max - \min} - 1 \tag{5}$$

where v is the original value and v' is the normalized value. The α-mapping maps k-dimensional points onto the two-dimensional visual space with the convenience of visual parameter tuning.

The proposed visualization model EVISTA utilizes weighted vector normalization, which is performed on rows instead of columns, such that the visualization model defines the reliable position of Q(x, y). EVISTA completely eliminates the usage of α tuning, since α-mapping is tedious when the number of dimensions is high, and each change in the α values requires a fresh visual distance computation. As the number of dimensions increases, the visual distance computation becomes more time-consuming. Similar effects may occur when the number of data objects increases. This makes the human-computer interaction ineffective and affects the applicability of VISTA.

IV. EXPERIMENTAL ANALYSIS

To illustrate the efficiency of the proposed visualization, empirical analyses are conducted on a number of benchmark data sets available in the UCI machine learning data repository. The performance of EVISTA is compared against the VISTA system and the automatic clustering algorithm K-Means. The experiments in VISTA are conducted by setting the α value to 1. The detailed information of the data sets is shown in Table I.

TABLE I. DETAILS OF DATASETS
Column headings recovered: Attributes, Classes. Recoverable row entries: 10, 2, 699.

[...] of clusters is very important in cluster analysis, because clustering methods tend to generate clusterings even for fairly homogeneous datasets. The quality of clusters obtained through visual clustering is measured in terms of three classical methods proposed in [3]. The Rand index and Jaccard coefficient validations are based on the agreement between clustering results and the "ground truth". The classical validity measures are heavily related to the geometry or density nature of clusters, and they do not work well for arbitrarily shaped clusters [8]. In such cases, visual perception plays an important role in deciding the right clusters.

B. Results and Discussion

Iris Data: The Iris dataset is a benchmark dataset widely used in pattern recognition and clustering. It is formed by 150 four-dimensional instances of three classes of plants classified according to the sepal length and width and the petal length and width. The iris dataset consists of three clusters with equal distribution. One cluster is linearly separable from the other two; the latter two are not exactly linearly separable from each other. Figure 1 shows the initial visualization of the iris dataset in the VISTA model, where we observe the possibility of three clusters. The figure shows that one cluster is completely separated from the other two, whereas the remaining two overlap. After performing interactive visual clustering with suitable α tuning, the visual boundaries between the clusters become clearer. Figure 2 shows the visualization of the iris dataset after α tuning. As specified in the literature on the iris dataset, the two clusters are not linearly separable. In VISTA this could be observed after the fine tuning of α. A small region consisting of overlapping data points is also observed. More importantly, the separation of the two clusters is found to be difficult for users.

Figure 1. Visualization of Iris Dataset using VISTA system
Figure 2. Visualization of Iris Dataset after α-tuning using VISTA system

In VISTA, domain knowledge plays a vital role in finding the optimum number of clusters. In general, domain knowledge in the form of labeled items obtained by traditional automatic clustering algorithms such as K-Means can be incorporated into the visual clustering process. A user without domain knowledge may fail to find the optimum clusters, since α tuning changes the data point distribution. Most automated clustering algorithms require the number of clusters to be specified in advance, which may not coincide with the real cluster distribution of the dataset. This increases the complexity of the clustering process. EVISTA reduces the complexity of clustering by eliminating the usage of α. Figure 3 shows the iris dataset visualization based on EVISTA.

Figure 3. Visualization of Iris Dataset using EVISTA system

From the results, it is observed that one cluster is completely separated from the others and the visual boundaries between the other two clusters are clearly identified. It is also noticed that only two data points overlap. Since EVISTA has no α tuning, the repeated visual distance computation is completely eliminated, which reduces the time complexity. EVISTA does not require domain knowledge in any form, which eases the human-computer interaction, and it visualizes the pattern of the given dataset without human intervention.

Australian Data: The Australian dataset concerns credit card applications. This dataset is interesting because there is a good mix of attributes: continuous, nominal with small numbers of values, and nominal with larger numbers of values. This data set also has missing values. A suitable statistics-based computation is applied to find the missing values. It has two classes. The class distribution is 44.5% for class A and 55.5% for class B.

Figure 4 shows the visualization of the Australian data set in VISTA, where possibly one single cluster is observed. During α tuning, the user is able to identify the two clusters. If the α tuning is not performed carefully, the user may get a different pattern, which may lead to confusion. Figure 5 shows the process of α tuning, where a four-cluster distribution is observed. This leads to poor cluster quality. In such a case, domain knowledge is the only aid to identify the optimum number of clusters. Figure 6 shows the cluster distribution using EVISTA, where two potential clusters are observed. Since α tuning is not included in the EVISTA model, the cluster distribution can be clearly visualized. Even when the user does not have enough domain knowledge in any form, such as the number of clusters or the cluster distribution, the visualization model EVISTA suitably identifies the optimum number of clusters.

Figure 4. Visualization of Australian Dataset using VISTA system
Figure 5. Visualization of Australian Dataset using VISTA system with α-tuning
Figure 6. Visualization of Australian Dataset using EVISTA

Pima Data: The Pima dataset is an Indian Diabetes Database with 768 data objects. It has two classes with a class distribution of 500 and 268. It consists of attributes such as number of times pregnant, plasma glucose concentration, diastolic blood pressure (mm Hg), triceps skin fold thickness (mm), diabetes pedigree function, etc. Figure 7 shows the VISTA visualization of the Pima Indian dataset. When the Pima dataset is visualized using VISTA, one possible cluster is observed. Even suitable α tuning does not distinguish the clusters, and the boundary regions of the two clusters are not identified. In contrast, the EVISTA visualization of the Pima dataset clearly shows two potential clusters. From Figure 8 it is observed that the Pima dataset contains two potential clusters, and a few data objects are scattered around the potential area. Since EVISTA does not require α tuning, the user may find it very flexible in finding the underlying pattern of the dataset without human intervention. With suitable geometric transformations such as scaling and rotation, the user may be able to observe the cluster distribution according to their visual perception.

Figure 7. Visualization of Pima Dataset using VISTA system
Figure 8. Visualization of Pima Dataset using EVISTA system

C. Comparative Analysis

This part of the section compares the results of EVISTA with VISTA and the centroid-based automatic clustering algorithm K-Means. In EVISTA the cluster labeling is performed using free-hand drawing. The area with potential data points is covered by a convex hull, and the data points in the convex hull are labeled as one single cluster. The cluster results, evaluated based on the Rand index and Jaccard coefficient, are shown in Table II and Table III. The results of VISTA are obtained by conducting the experiments over several runs, and their average is taken for the experimental analysis.

TABLE II. COMPARISON OF EVISTA WITH VISTA AND K-MEANS BASED ON RAND INDEX
Column headings recovered: K-Means; Visual Clustering (Without α tuning, With α tuning). Recoverable entries: 50.58; Australian: 63.46, 68.00.

TABLE III. COMPARISON OF EVISTA WITH VISTA AND K-MEANS BASED ON JACCARD COEFFICIENT
Column headings recovered: K-Means; Visual Clustering (Without α tuning, With α tuning). Recoverable entries: 45.84; Australian: 48.82.

Figure 9. Comparison based on Rand Index
Figure 10. Comparison based on Jaccard coefficients

V. CONCLUSION

With the development of data collection technology, effective data visualization models are required to understand the pattern of multidimensional and multivariate data. In this paper Enhanced VISTA is proposed to improve data visualization. EVISTA is designed with weighted vector normalization, which improves the data exploration. The elimination of α tuning in the visualization process reduces the complexity of human-computer interaction. More importantly, EVISTA does not require domain knowledge in any form, which improves its applicability. The experimental results show that EVISTA efficiently identifies the cluster distribution and reduces the complexity of the visual distance computation. Specifically, it eases the human-computer interaction.

ACKNOWLEDGMENT

First author expresses his thanks to University Grants Commission for financial support (F-No. 34-105/2008, SR).

REFERENCES

[1] Bhavani Thuraisingham, "Data Mining: Technologies, Techniques, Tools and Trends", CRC Press, London, New York, Washington, 1999.
[2] Cook, D.R., Buja, A., Cabrea, J., and Harley, H., "Grand Tour and Projection Pursuit", J. Computational and Graphical Statistics, v23, 1995.
[3] Daxin Jiang, Chun Tang, Aidong Zhang, "Cluster analysis for gene expression data: a survey", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, 2004.
[4] Daniel, Keim, A., and Hans-Peter, "Visualization Techniques for Mining Large Databases: A Comparison", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, 1996.
[5] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, August 2000, ISBN 1-55860-489-8.
[6] A. K. Jain, M. N. Murty and P. J. Flynn, "Data clustering: A Review", ACM Computing Surveys, 1999.
[7] E. Kandogan, "Visualizing Multi-dimensional Clusters, Trends and Outliers using Star Co-ordinates", Proc. of ACM KDD, 2001.
[8] Keke Chen and L. Liu, "VISTA: Validating and Refining Clusters via Visualization", Information Visualization, Vol. 3, 4.
[9] Keke Chen and L. Liu, "iVIBRATE: Interactive Visualization-Based Framework for Clustering Large Datasets", ACM Transactions on Information Systems, Vol. 24, April 2006.
[10] Marie desJardins, James MacGlashan, Julia Ferraioli, "Interactive visual clustering", Intelligent User Interfaces 2007.
[11] Melanie Tory and Torsten Moller, "Human Factors in Visualization Research", IEEE Transactions on Visualization and Computer Graphics, 10(1), 2004.
[12] Pang-ning Tan, Michael Steinbach and Vipin Kumar, "Introduction to Data Mining", Pearson Addison Wesley, Boston, 2006.
[13] O. Sourina, D. Liu, "Visual interactive 3-dimensional clustering with implicit functions", Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, Volume 1, 1-3 Dec 2004, pp. 382-386.
[14] Thangavel K. and Ashok Kumar D., "Optimization of code book in Vector Quantization", International Journal Annals of Operations Research, Vol. 143, No. 1, 317-325, 2006.
[15] Ye N., "The Hand Book of Data Mining", Lawrence Erlbaum Associates, Publishers, Mahwah, New Jersey, 2003.
[16] Zhen Liu, Shinichi Kamohara, Minyi Guo, "A Scheme of Interactive Data Mining Support System in Parallel and Distributed Environment", ISPA 2003, LNCS 2745, Springer-Verlag, pp. 263-272, 2003.
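
The quantities the paper relies on (max-min normalization of columns, the star-coordinate mapping of a k-dimensional row to a 2D point, and the pair-counting Rand index and Jaccard coefficient used for validation) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the normalization is assumed to take the usual form that maps each column to [-1, 1], the axes are assumed to sit at angles 2πi/k, and EVISTA's weighted vector normalization is not reproduced because its definition is not recoverable from the text.

```python
import math
from itertools import combinations


def max_min_normalize(column):
    """Scale one column to [-1, 1]: v' = 2 (v - min) / (max - min) - 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0] * len(column)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


def star_project(row):
    """Map one normalized k-dimensional row to a 2D point (Qx, Qy).

    Axis i (1-based) is assumed to point along (cos(2*pi*i/k), sin(2*pi*i/k)).
    """
    k = len(row)
    qx = sum(x * math.cos(2.0 * math.pi * i / k) for i, x in enumerate(row, start=1))
    qy = sum(x * math.sin(2.0 * math.pi * i / k) for i, x in enumerate(row, start=1))
    return qx, qy


def pair_counts(truth, predicted):
    """Count object pairs as (same/same, same/diff, diff/same, diff/diff)."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        if same_truth and same_pred:
            ss += 1
        elif same_truth:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(truth, predicted):
    """Fraction of pairs on which the two labelings agree."""
    ss, sd, ds, dd = pair_counts(truth, predicted)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_coefficient(truth, predicted):
    """Pairs grouped together in both labelings over pairs grouped in either."""
    ss, sd, ds, _ = pair_counts(truth, predicted)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


if __name__ == "__main__":
    # Toy four-attribute measurements, one list per column (illustrative only).
    columns = [[5.1, 7.0, 6.3], [3.5, 3.2, 3.3], [1.4, 4.7, 6.0], [0.2, 1.4, 2.5]]
    normalized = [max_min_normalize(c) for c in columns]
    for row in zip(*normalized):
        print(star_project(row))
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
    print(jaccard_coefficient([0, 0, 1, 1], [0, 0, 1, 2]))
```

Both validity measures here return fractions in [0, 1]; the tables in the paper appear to report them as percentages.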


How Breakfast Happens in the Café Eric Laurier ABSTRACT. In this article I present an ethnographic study of 'breakfast in the café', to begin to document the orderly properties of an emergent timespace. In so doing, the aim is to provide a description of the local production of timespace and a consideration of a change to the daily rhythm of city life. Harold Garfinkel and David


LEARNING FROM PRACTICE Dapagliflozin: Clinical practice compared with pre-registration trial data ANDREW P MCGOVERN1-3, NINA DUTTA1, NEIL MUNRO1-4, KENNETH WATTERS1,2,4, MICHAEL FEHER1,2,4 Abbreviations and acronyms Background: Dapagliflozin is the first sodium-glucose co-transporter 2 (SGLT2) inhibitor to be approved in Europe