Data Mining

Authors: Mehmed Kantardzic

4. International Conference on Machine Learning and Applications (ICMLA)

The conference aims to bring together researchers working in machine learning and its applications. It will cover both theoretical and experimental research results. Submission of papers describing machine-learning applications in fields such as medicine, biology, industry, manufacturing, security, education, virtual environments, game playing, and problem solving is strongly encouraged.

5. The World Congress in Computer Science Computer Engineering and Applied Computing (WORLDCOMP)

http://www.world-academy-of-science.org/

WORLDCOMP is the largest annual gathering of researchers in computer science, computer engineering, and applied computing. It assembles a spectrum of affiliated research conferences, workshops, and symposiums into a coordinated research meeting held in a common place at a common time. This model facilitates communication among researchers in different fields of computer science and computer engineering. WORLDCOMP is composed of more than 20 major conferences, each with its own proceedings. All conference proceedings/books are considered for inclusion in major database indexes that are designed to provide easy access to the current literature of the sciences (for example, DBLP, ISI Thomson Scientific, and IEE INSPEC).

6. IADIS European Conference on Data Mining (ECDM)

http://www.datamining-conf.org/

The ECDM gathers researchers and application developers from a wide range of data mining-related areas such as statistics, computational intelligence, pattern recognition, databases, and visualization. It aims to advance the state of the art in the data-mining field and its various real-world applications, and it will provide opportunities for technical collaboration among data-mining and machine-learning researchers around the globe.

7. Neural Information Processing Systems Conference (NIPS)

http://nips.cc/

The NIPS Foundation is a nonprofit corporation whose purpose is to foster the exchange of research on neural information-processing systems in their biological, technological, mathematical, and theoretical aspects. Neural information processing is a field that benefits from a combined view of biological, physical, mathematical, and computational sciences.

The primary focus of the NIPS Foundation is the presentation of a continuing series of professional meetings known as the Neural Information Processing Systems Conference, held over the years at various locations in the United States and Canada.

The NIPS Conference features a single-track program, with contributions from a large number of intellectual communities. Presentation topics include algorithms and architectures; applications; brain imaging; cognitive science and AI; control and reinforcement learning; emerging technologies; learning theory; neuroscience; speech and signal processing; and visual processing.

8. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)

http://www.ecmlpkdd.org/

The ECML PKDD is one of the leading academic conferences on machine learning and knowledge discovery, held in Europe every year.

ECML PKDD is a merger of two European conferences, ECML and PKDD. In 2008 the conferences were merged into one conference, and the division into traditional ECML topics and traditional PKDD topics was removed.

9. Association for the Advancement of Artificial Intelligence (AAAI) Conference

http://www.aaai.org/

Founded in 1979, the AAAI, formerly the American Association for Artificial Intelligence, is a nonprofit scientific society devoted to advancing the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines. AAAI also aims to increase public understanding of AI, improve the teaching and training of AI practitioners, and provide guidance for research planners and funders concerning the importance and potential of current AI developments and future directions.

Major AAAI activities include organizing and sponsoring conferences, symposia, and workshops, publishing a quarterly magazine for all members, publishing books, proceedings, and reports, and awarding grants, scholarships, and other honors. The purpose of the AAAI conference is to promote research in AI and scientific exchange among AI researchers, practitioners, scientists, and engineers in related disciplines.

10. International Conference on Very Large Data Bases (VLDB)

http://www.vldb.org/

VLDB Endowment Inc. is a nonprofit organization incorporated in the United States for the sole purpose of promoting and exchanging scholarly work in databases and related fields throughout the world. Since 1992, the Endowment has published a quarterly journal, the VLDB Journal, for disseminating archival research results; it has become one of the most successful journals in the database area. The VLDB Journal is published in collaboration with Springer-Verlag. The Endowment cooperates closely with ACM SIGMOD on various activities.

The VLDB conference is a premier annual international forum for data management and database researchers, vendors, practitioners, application developers, and users. The conference features research talks, tutorials, demonstrations, and workshops. It covers current issues in data management, database, and information systems research. Data management and databases remain among the main technological cornerstones of emerging applications of the twenty-first century.

A.3 DATA-MINING FORUMS/BLOGS

1. KDnuggets Forums

http://www.kdnuggets.com/phpBB/index.php

Good resource for sharing experience and asking questions.

2. Data Mining

http://dataminingwarehousing.blogspot.com/

This blog is helpful for data-mining beginners. It presents basic data-mining concepts with examples and applications.

3. Data Mining and Predictive Analytics

http://abbottanalytics.blogspot.com/

The posts on this blog cover topics related to data mining and predictive analytics from the perspectives of both research and industry.

4. AI, Data Mining, Machine Learning, and Other things

http://blog.markus-breitenbach.com/

This blog discusses machine learning with emphasis on AI and statistics.

5. Geeking with Greg

http://glinden.blogspot.com

This blog focuses on the topic of personalization and related research.

6. Data Miners Blog

http://blog.data-miners.com/

The posts on this blog provide industry-oriented reflections on topics from data analysis and visualization.

7. Data-Mining Research

http://www.dataminingblog.com/

This blog provides a venue for exchanging ideas and comments about data-mining techniques and applications.

8. Data Wrangling

http://www.datawrangling.com/

This blog provides across-the-board posts on news and technology related to machine learning and data mining.

9. Intelligent Machines

http://www.damienfrancois.be/blog/

This blog is dedicated to artificial intelligence and machine learning, and focuses on applications in business, science, and everyday life.

10. Mininglabs

http://www.mininglabs.com/

This blog was established by a group of independent French researchers in the fields of data mining, data analysis, and data visualization. They are mostly interested in analyzing data coming from the Internet at large (Web, peer-to-peer networks).

11. Machine Learning (Theory)

http://hunch.net/

A blog dedicated to the various aspects of machine learning theory and applications.

A.4 DATA SETS

This section describes a number of freely available data sets ready for use in data-mining algorithms. We selected a few examples for students who are starting to learn data mining and would like to practice traditional data-mining tasks. A majority of these data sets are hosted on the UCI Machine Learning Repository. For more data sets, look up this repository at http://archive.ics.uci.edu/ml/index.html.

A.4.1 Classification

Iris Data Set.

http://archive.ics.uci.edu/ml/datasets/Iris

The Iris Data Set is a small data set often used in machine learning and data mining. It includes 150 data points, each belonging to one of three kinds of iris. The task is to learn to classify an iris based on four measurements. This data set was used by R. A. Fisher in 1936 as an example for discriminant analysis.
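
As a rough illustration of the classification task, the sketch below parses a few rows in the comma-separated layout of the repository's iris.data file (four measurements followed by a class label) and labels a new flower with a one-nearest-neighbor rule. The choice of nearest-neighbor classification, the helper names load_rows and nearest_neighbor, and the query measurements are assumptions made for this example; the data set itself does not prescribe a method, and only six of the 150 rows are shown.

```python
import csv
import io
import math

# A few rows in the assumed layout of iris.data:
# sepal length, sepal width, petal length, petal width (cm), class label.
# Only six rows are included, purely for illustration.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def load_rows(text):
    """Parse CSV text into (measurements, label) pairs, skipping blank lines."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue
        *values, label = record
        rows.append(([float(v) for v in values], label))
    return rows


def nearest_neighbor(train, query):
    """Return the label of the training row closest to query (Euclidean distance)."""
    _, label = min(train, key=lambda row: math.dist(row[0], query))
    return label


train = load_rows(SAMPLE)
# Hypothetical measurements for a flower to classify.
prediction = nearest_neighbor(train, [5.0, 3.4, 1.5, 0.2])
print(prediction)
```

With the full file, the same functions could be applied by reading the downloaded text instead of SAMPLE and holding out some rows for testing.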

Adult Data Set.

http://archive.ics.uci.edu/ml/datasets/Adult

The Adult Data Set contains 48,842 samples extracted from the U.S. Census. The task is to classify individuals as having an income that does or does not exceed $50,000/year based on factors such as age, education, race, sex, and native country.

Breast Cancer Wisconsin (Diagnostic) Data Set.

http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

This data set consists of a number of measurements taken over a “digitized image of a fine needle aspirate (FNA) of a breast mass.” There are 569 samples. The task is to classify each data point as benign or malignant.

A.4.2 Clustering

Bag of Words Data Set.

http://archive.ics.uci.edu/ml/datasets/Bag+of+Words

Word counts have been extracted from five document sources: Enron Emails, NIPS full papers, KOS blog entries, NYTimes news articles and Pubmed abstracts. The task is to cluster the documents used in this data set based on the word counts found. One may compare the output clusters with the sources from which each document came.
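
A minimal sketch of how such word counts might be prepared for clustering follows. It assumes the sparse "docword" layout described on the repository page (three header lines giving the number of documents, the vocabulary size, and the number of nonzero counts, followed by "docID wordID count" triples); the miniature file, the function names, and the use of cosine similarity as the document-similarity measure are all assumptions for illustration, not part of the data set.

```python
import math
from collections import defaultdict

# Hypothetical miniature file in the assumed docword layout:
# line 1 = number of documents, line 2 = vocabulary size,
# line 3 = number of nonzero counts, then "docID wordID count" triples.
SAMPLE_DOCWORD = """\
3
4
5
1 1 2
1 3 1
2 2 4
3 1 1
3 4 3
"""


def parse_docword(text):
    """Return (num_docs, vocab_size, {doc_id: {word_id: count}})."""
    lines = [line for line in text.splitlines() if line.strip()]
    num_docs, vocab_size, nnz = (int(value) for value in lines[:3])
    docs = defaultdict(dict)
    for line in lines[3:]:
        doc_id, word_id, count = (int(field) for field in line.split())
        docs[doc_id][word_id] = count
    if sum(len(words) for words in docs.values()) != nnz:
        raise ValueError("number of triples does not match the header")
    return num_docs, vocab_size, dict(docs)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


num_docs, vocab_size, docs = parse_docword(SAMPLE_DOCWORD)
print(cosine_similarity(docs[1], docs[3]))
```

A clustering algorithm could then group documents whose pairwise similarity is high, and the resulting clusters could be compared with the five sources.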

US Census Data (1990) Data Set.

http://archive.ics.uci.edu/ml/datasets/US+Census+Data+%281990%29

This data set is a one percent sample from the 1990 Public Use Microdata Samples (PUMS). It contains 2,458,285 records and 68 attributes.

A.4.3 Regression

Auto MPG Data Set.

http://archive.ics.uci.edu/ml/datasets/Auto+MPG

This data set provides a number of attributes of cars that can be used to attempt to predict the “city-cycle fuel consumption in miles per gallon.” There are 398 data points and eight attributes.

Computer Hardware Data Set.

http://archive.ics.uci.edu/ml/datasets/Computer+Hardware

This data set provides a number of CPU attributes that can be used to predict relative CPU performance. It contains 209 data points and 10 attributes.

A.4.4 Web Mining

Anonymous Microsoft Web Data.

http://archive.ics.uci.edu/ml/datasets/Anonymous+Microsoft+Web+Data

This data set contains page visits for a number of anonymous users who visited www.microsoft.com. The task is to predict future categories of pages a user will visit based on the Web pages previously visited.

KDD Cup 2000.

http://www.sigkdd.org

This Web site contains five tasks used in KDD Cup, a data-mining competition run yearly. KDD Cup 2000 uses clickstream and purchase data obtained from Gazelle.com. Gazelle.com sold legwear and legcare products and closed its online store that same year. This Web site provides links to papers and posters of the winners of the various tasks and outlines their effective methods. Additionally, the description of the tasks provides great insight into original approaches to using data mining with clickstream data.

A.4.5 Text Mining

Reuters-21578 Text Categorization Collection.

http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html

This is a collection of news articles that appeared on Reuters newswire in 1987. All of the news articles have been categorized. The categorization provides opportunities to test text classification or clustering methodologies.

20 Newsgroups.

http://people.csail.mit.edu/jrennie/20Newsgroups/

The 20 Newsgroups data set contains 20,000 newsgroup documents. These documents are divided nearly evenly among 20 different newsgroups. Similar to the Reuters collection, this data set provides opportunities for text classification and clustering.

A.4.6 Time Series

Dodgers Loop Sensor Data Set.

http://archive.ics.uci.edu/ml/datasets/Dodgers+Loop+Sensor

This data set provides the number of cars counted by a sensor every 5 min over 25 weeks. The sensor was located on the Glendale on-ramp for the 101 North Freeway in Los Angeles. The goal of this data was to “predict the presence of a baseball game at Dodgers stadium.”

Australia Gun Deaths.

http://robjhyndman.com/TSDL/crime.html

These data give the yearly death rates in Australia for gun-related and non-gun-related homicides and suicides for the years 1915–2004.

A.4.7 Data for Association Rule Mining

BMS-POS.

http://www.sigkdd.org/kddcup

This data set gives the category for each product purchased from a large electronics retailer. It covers several years' worth of point-of-sale data. This data set contains 515,597 transactions and 1,657 distinct items.

BMS-WebView1.

http://www.sigkdd.org/kddcup

This data set contains several months of clickstream sessions for Gazelle.com. A transaction is defined in this data set as the detail pages viewed per session. This data set contains 59,602 transactions and 497 distinct items.
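
To make the notion of a transaction concrete, the sketch below computes the two basic measures of association rule mining, support and confidence, over a handful of sessions. The sessions and page names are hypothetical and are not taken from BMS-WebView1; the function names are likewise chosen only for this example.

```python
# Hypothetical sessions: each transaction is the set of detail pages viewed.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for transaction in transactions if itemset <= transaction)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing antecedent that also contain consequent."""
    covered = count(antecedent, transactions)
    if covered == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / covered


print(support({"A", "B"}, TRANSACTIONS))
print(confidence({"A"}, {"B"}, TRANSACTIONS))
```

An association rule miner would search for all itemsets whose support exceeds a chosen threshold and then report the rules whose confidence is high enough; on data sets of the size described above, that search needs a more efficient strategy than this direct counting.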
