Data Mining

Authors: Mehmed Kantardzic

15.7 REVIEW QUESTIONS AND PROBLEMS

1.
Explain the power of n-dimensional visualization as a data-mining technique. What are the phases of data mining supported by data visualization?

2.
What fundamental experiences in human perception should be built into effective visualization tools?

3.
Discuss the differences between scientific visualization and information visualization.

4.
The following is the data set X:

Although the following visualization techniques are not explained in enough detail in this book, use your knowledge from earlier studies of statistics and other courses to create 2-D presentations.

(a)
Show a bar chart for the variable A.

(b)
Show a histogram for the variable B.

(c)
Show a line chart for the variable B.

(d)
Show a pie chart for the variable A.

(e)
Show a scatter plot for the variables A and B.

5.
Explain the concept of a data cube and where it is used for visualization of large data sets.

6.
Use examples to discuss the differences between icon-based and pixel-oriented visualization techniques.

7.
Given 7-D samples

(a)
make a graphical representation of samples using the parallel-coordinates technique;

(b)
are there any outliers in the given data set?

8.
Derive formulas for radial visualization of

(a)
3-D samples

(b)
8-D samples

(c)
Using the formulas derived in (a), represent the samples (2, 8, 3) and (8, 0, 0).

(d)
Using the formulas derived in (b), represent the samples (2, 8, 3, 0, 7, 0, 0, 0) and (8, 8, 0, 0, 0, 0, 0, 0).

9.
Implement a software tool supporting a radial-visualization technique.

10.
Explain the requirements for full visual discovery in advanced visualization tools.

11.
Search the Web to find the basic characteristics of publicly available or commercial software tools for visualization of n-dimensional samples. Document the results of your search.
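
As a possible starting point for Problems 8 and 9, the sketch below computes the 2-D position of an n-dimensional sample under one common formulation of radial visualization, in which each dimension is a spring anchored on the unit circle and the sample sits where the spring forces balance. The function name `radial_position`, the anchor placement (dimension j at angle 2πj/n, starting on the positive x-axis), and the handling of an all-zero sample are assumptions made for illustration; they are not taken from the book, whose derivation may also normalize the values first.

```python
import math


def radial_position(sample):
    """Return the (x, y) equilibrium point of one n-D sample.

    Assumptions (illustrative, not the book's notation):
    - dimension j is anchored at angle 2*pi*j/n on the unit circle;
    - each non-negative coordinate acts as the stiffness of the spring
      pulling the point toward that dimension's anchor.
    """
    if any(v < 0 for v in sample):
        raise ValueError("coordinates must be non-negative; normalize first")

    n = len(sample)
    total = sum(sample)
    if total == 0:
        # No spring pulls at all; placing the point at the center is a convention.
        return (0.0, 0.0)

    # Equilibrium is the stiffness-weighted average of the anchor positions.
    x = sum(v * math.cos(2 * math.pi * j / n) for j, v in enumerate(sample)) / total
    y = sum(v * math.sin(2 * math.pi * j / n) for j, v in enumerate(sample)) / total
    return (x, y)


if __name__ == "__main__":
    for s in [(2, 8, 3), (8, 0, 0), (8, 8, 0, 0, 0, 0, 0, 0)]:
        print(s, radial_position(s))
```

Under these assumptions, a sample with a single nonzero coordinate lands exactly on that dimension's anchor, and multiplying a sample by a positive constant does not move it. The second property is one reason different samples can overlap in a radial display.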

15.8 REFERENCES FOR FURTHER STUDY

Draper, G. M., Y. Livnat, R. F. Riesenfeld, A Survey of Radial Methods for Information Visualization, IEEE Transactions on Visualization and Computer Graphics, Vol. 15, No. 5, 2009, pp. 759–776.

Radial visualization, or the practice of displaying data in a circular or elliptical pattern, is an increasingly common technique in information visualization research. In spite of its prevalence, little work has been done to study this visualization paradigm as a methodology in its own right. We provide a historical review of radial visualization, tracing it to its roots in centuries-old statistical graphics. We then identify the types of problem domains to which modern radial visualization techniques have been applied. A taxonomy for radial visualization is proposed in the form of seven design patterns encompassing nearly all recent works in this area. From an analysis of these patterns, we distill a series of design considerations that system builders can use to create new visualizations that address aspects of the design space that have not yet been explored. It is hoped that our taxonomy will provide a framework for facilitating discourse among researchers and stimulate the development of additional theories and systems involving radial visualization as a distinct design metaphor.

Fayyad, U., G. G. Grinstein, A. Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, San Francisco, CA, 2002.

Leading researchers from the fields of data mining, data visualization, and statistics present findings organized around topics introduced in two recent international knowledge-discovery and data-mining workshops. The book introduces the concepts and components of visualization, details current efforts to include visualization and user interaction in data mining, and explores the potential for further synthesis of data-mining algorithms and data-visualization techniques.

Ferreira de Oliveira, M. C., H. Levkowitz, From Visual Data Exploration to Visual Data Mining: A Survey, IEEE Transactions on Visualization and Computer Graphics, Vol. 9, No. 3, 2003, pp. 378–394.

The authors survey work on the different uses of graphical mapping and interaction techniques for visual data mining of large data sets represented as table data. Basic terminology related to data mining, data sets, and visualization is introduced. Previous work on information visualization is reviewed in light of different categorizations of techniques and systems. The role of interaction techniques is discussed, in addition to work addressing the question of selecting and evaluating visualization techniques. The authors review some representative work on the use of information visualization techniques (IVT) in the context of mining data. This includes both visual-data exploration and visually expressing the outcome of specific mining algorithms. They also review recent innovative approaches that attempt to integrate visualization into the data-mining and knowledge-discovery (DM/KDD) process, using it to enhance user interaction and comprehension.

Gallagher, R. S., Computer Visualization: Graphics Techniques for Scientific and Engineering Analysis, CRC Press, Boca Raton, 1995.

This book is a complete reference on computer-graphic techniques for scientific and engineering visualization. It explains the basic methods applied in different fields to support an understanding of complex, volumetric, multidimensional, and time-dependent data. The practical computational aspects of visualization such as user interface, database architecture, and interaction with a model are also analyzed.

Spence, R., Information Visualization, Addison Wesley, Harlow, England, 2001.

This is the first fully integrated book on the emerging discipline of information visualization. Its emphasis is on real-world examples and applications of computer-generated interactive information visualization. The author also explains how these methods for visualizing information support rapid learning and accurate decision making.

Tufte, E. R., Beautiful Evidence, 2nd edition, Graphics Press, LLC, Cheshire, CT, 2007.

Beautiful Evidence is a masterpiece from a pioneer in the field of data visualization. It is not often an iconoclast comes along, trashes the old ways, and replaces them with an irresistible new interpretation. By teasing out the sublime from the seemingly mundane world of charts, graphs, and tables, Tufte has proven to a generation of graphic designers that great thinking begets great presentation. In Beautiful Evidence, his fourth work on analytical design, Tufte digs more deeply into art and science to reveal very old connections between truth and beauty, all the way from Galileo to Google.

APPENDIX A

This summary of some recognized journals, conferences, blog sites, data-mining tools, and data sets is provided to help readers communicate with other users of data-mining technology and receive information about trends and new applications in the field. It could be especially useful for students who are starting to work in data mining and trying to find appropriate information or solve current class-oriented tasks. This list is not intended to endorse any specific Web site, and the reader should be aware that it is only a small sample of the resources available on the Internet.

A.1 DATA-MINING JOURNALS

1.
Data Mining and Knowledge Discovery (DMKD)

http://www.kluweronline.com/issn/1384-5810/

DMKD is a premier technical publication in the Knowledge Discovery and Data Mining (KDD) field, providing a resource collecting common relevant methods and techniques and a forum for unifying the diverse constituent research communities. The journal publishes original technical papers in both the research and practice of data mining and knowledge discovery, surveys and tutorials of important areas and techniques, and detailed descriptions of significant applications. The scope of DMKD includes (1) theory and foundational issues including data and knowledge representation, uncertainty management, algorithmic complexity, and statistics over massive data sets; (2) data mining methods such as classification, clustering, probabilistic modeling, prediction and estimation, dependency analysis, search, and optimization; (3) algorithms for spatial, textual, and multimedia data mining, scalability to large databases, parallel and distributed data-mining techniques, and automated discovery agents; (4) knowledge discovery process including data preprocessing, evaluating, consolidating, and explaining discovered knowledge, data and knowledge visualization, and interactive data exploration and discovery; and (5) application issues such as application case studies, data-mining systems and tools, details of successes and failures of KDD, resource/knowledge discovery on the Web, and privacy and security.

2.
IEEE Transactions on Knowledge and Data Engineering (TKDE)

http://www.computer.org/tkde/

The IEEE TKDE is an archival journal published monthly. The information published in this journal is designed to inform researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data-engineering area. We are interested in well-defined theoretical results and empirical studies that have potential impact on the acquisition, management, storage, and graceful degeneration of knowledge and data, as well as in provision of knowledge and data services. Specific topics include, but are not limited to, (1) artificial intelligence (AI) techniques, including speech, voice, graphics, images, and documents; (2) knowledge and data-engineering tools and techniques; (3) parallel and distributed processing; (4) real-time distributed processing; (5) system architectures, integration, and modeling; (6) database design, modeling, and management; (7) query design and implementation languages; (8) distributed database control; (9) algorithms for data and knowledge management; (10) performance evaluation of algorithms and systems; (11) data-communications aspects; (12) system applications and experience; (13) knowledge-based and expert systems; and (14) integrity, security, and fault tolerance.

3.
Knowledge and Information Systems (KAIS)

http://www.cs.uvm.edu/~kais/

KAIS is a peer-reviewed archival journal published by Springer. It provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. The journal focuses on knowledge systems and advanced information systems, including their theoretical foundations, infrastructure, enabling technologies, and emerging applications. In addition to archival papers, the journal also publishes significant ongoing research in the form of short papers and very short papers on “visions and directions.”

4.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

http://computer.org/tpami/

IEEE TPAMI is a scholarly archival journal published monthly. Its editorial board strives to present the most important research results in areas within TPAMI’s scope. This includes all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence. Areas such as machine learning, search techniques, document and handwriting analysis, medical-image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.

5.
Machine Learning

http://www.kluweronline.com/issn/0885-6125/

Machine Learning is an international forum for research on computational approaches to learning. The journal publishes articles reporting substantive results on a wide range of learning methods applied to a variety of learning problems. It features papers that describe research on problems and methods, applications research, and issues of research methodology. Papers making claims about learning problems or methods provide solid support via empirical studies, theoretical analysis, or comparison to psychological phenomena. Application papers show the process of applying learning methods to solve important applications problems. Research methodology papers improve how machine-learning research is conducted. All papers describe the supporting evidence in ways that can be verified or replicated by other researchers. The papers also detail the learning component clearly and discuss assumptions regarding knowledge representation and the performance task.

6.
Journal of Machine Learning Research (JMLR)

http://jmlr.csail.mit.edu

The JMLR provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. JMLR has a commitment to rigorous yet rapid reviewing. JMLR provides a venue for papers on machine learning featuring new algorithms with empirical, theoretical, psychological, or biological justification; experimental and/or theoretical studies yielding new insights into the design and behavior of learning in intelligent systems; accounts of applications of existing techniques that shed light on the strengths and weaknesses of the methods; formalization of new learning tasks (e.g., in the context of new applications) and of methods for assessing performance on those tasks; development of new analytical frameworks that advance theoretical studies of practical-learning methods; computational models of data from natural learning systems at the behavioral or neural level; or extremely well-written surveys of existing work.

7.
ACM Transactions on Knowledge Discovery from Data (TKDD)

http://tkdd.cs.uiuc.edu/index.html

The ACM TKDD addresses a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include scalable and effective algorithms for data mining and data warehousing; mining data streams; mining multimedia data; mining high-dimensional data; mining text, Web, and semi-structured data; mining spatial and temporal data; data mining for community generation, social-network analysis, and graph structured data; security and privacy issues in data mining; visual, interactive, and online data mining; preprocessing and postprocessing for data mining; robust and scalable statistical methods; data-mining languages; foundations of data mining; KDD framework and process; and novel applications and infrastructures exploiting data-mining technology.

8.
Journal of Intelligent Information Systems (JIIS)

http://www.springerlink.com/content/0925-9902

The JIIS: Integrating Artificial Intelligence and Database Technologies fosters and presents research and development results focused on the integration of AI and database technologies to create next-generation information systems, that is, intelligent information systems. JIIS provides a forum wherein academics, researchers, and practitioners may publish high-quality, original, and state-of-the-art papers describing theoretical aspects, systems architectures, analysis and design tools and techniques, and implementation experiences in intelligent information systems. Articles published in JIIS include research papers, invited papers, meeting, workshop and conference announcements and reports, survey and tutorial articles, and book reviews. Topics include foundations and principles of data, information, and knowledge models; and methodologies for IIS analysis, design, implementation, validation, maintenance, and evolution.

9.
Statistical Analysis and Data Mining

http://www.amstat.org/publications/sadm.cfm

Statistical Analysis and Data Mining addresses the broad area of data analysis, including data-mining algorithms, statistical approaches, and practical applications. Topics include problems involving massive and complex data sets, solutions using innovative data-mining algorithms and/or novel statistical approaches, and the objective evaluation of analyses and solutions. Of special interest are articles that describe analytical techniques and discuss their application to real problems in such a way that they are accessible and beneficial to domain experts across science, engineering, and commerce.

10.
Intelligent Data Analysis

http://www.iospress.nl/html/1088467x.php

Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of AI techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to) all areas of data visualization, data preprocessing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and postprocessing. In particular, we prefer papers that discuss development of new AI-related data analysis architectures, methodologies, and techniques and their applications to various domains. Papers published in this journal are geared heavily toward applications, with an anticipated split of 70% of the papers published being application-oriented research, and the remaining 30% containing more theoretical research.

A.2 DATA-MINING CONFERENCES

1.
SIAM International Conference on Data Mining, SDM

http://www.siam.org/meetings/

This conference provides a venue for researchers who are addressing the problem of extracting knowledge from large data sets, which requires the use of sophisticated, high-performance, and principled analysis techniques and algorithms based on sound theoretical and statistical foundations. It also provides an ideal setting for graduate students and others new to the field to learn about cutting-edge research by hearing outstanding invited speakers and attending presentations and tutorials (included with conference registration). A set of focused workshops is also held at the conference. The proceedings of the conference are published in archival form, and are also made available on the SIAM Web site.

2.
The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)

http://sigkdd.org/conferences.php

The annual ACM SIGKDD conference is the premier international forum for data-mining researchers and practitioners from academia, industry, and government to share their ideas, research results, and experiences. It features keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, and demonstrations. Authors can submit their original work to either the SIGKDD Research track or the SIGKDD Industry/Government track. The Research track accepts papers on all aspects of knowledge discovery and data mining overlapping with topics from machine learning, statistics, databases, and pattern recognition. Papers are expected to describe innovative ideas and solutions that are rigorously evaluated and well presented. The Industry/Government track highlights challenges, lessons, concerns, and research issues arising out of deploying applications of KDD technology. The focus is on promoting the exchange of ideas between researchers and practitioners of data mining.

3.
IEEE International Conference on Data Mining (ICDM)

http://www.cs.uvm.edu/~icdm/

The IEEE ICDM has established itself as the world’s premier research conference in data mining. The conference provides a leading forum for presentation of original research results, as well as exchange and dissemination of innovative, practical development experiences. The conference covers all aspects of data mining, including algorithms, software and systems, and applications. In addition, ICDM draws researchers and application developers from a wide range of data mining-related areas such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems, and high-performance computing. By promoting novel, high-quality research findings, and innovative solutions to challenging data-mining problems, the conference seeks to continuously advance the state-of-the-art in data mining. Besides the technical program, the conference will feature workshops, tutorials, panels, and the ICDM data-mining contest.
