Data Mining

Authors: Mehmed Kantardzic

A.5 COMMERCIALLY AND PUBLICLY AVAILABLE TOOLS

This summary of some publicly available and commercial data-mining products is provided to help readers better understand what software tools can be found on the market and what their features are. It is not intended to endorse or critique any specific product. Potential users will need to decide for themselves the suitability of each product for their specific applications and data-mining environments. The summary is primarily intended as a starting point from which users can obtain more information. New products appear on the market constantly, and hence this list is by no means comprehensive. Because these changes are very frequent, the author suggests two Web sites for information about the latest tools and their performance: http://www.kdnuggets.com and http://www.knowledgestorm.com.

A.5.1 Free Software

DataLab

  • Publisher: Epina Software Labs (www.lohninger.com/datalab/en_home.html)
  • DataLab is a complete and powerful data-mining tool with a unique data-exploration process, a focus on marketing, and interoperability with SAS. There is a public version for students.

DBMiner

  • Publisher: Simon Fraser University (http://ddm.cs.sfu.ca)
  • DBMiner is a publicly available tool for data mining. It is a multiple-strategy tool and it supports methodologies such as clustering, association rules, summarization, and visualization. DBMiner uses Microsoft SQL Server 7.0 Plato and runs on different Windows platforms.

GenIQ Model

  • Publisher: DM STAT-1 Consulting (www.geniqmodel.com)
  • GenIQ Model uses machine learning for regression tasks: it automatically performs variable selection and new-variable construction, and then specifies the model equation to “optimize the decile table.”
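
The “decile table” that GenIQ Model optimizes is a gains table commonly used in direct marketing: records are ranked by model score, split into ten equal-sized groups, and each group's response rate and cumulative lift are reported. The sketch below only illustrates how such a table is computed; it does not reproduce GenIQ's own algorithm, and the function name decile_table and the input format (parallel lists of scores and 0/1 responses) are assumptions for illustration.

```python
from typing import List, Tuple


def decile_table(scores: List[float], responses: List[int]) -> List[Tuple[int, int, float, float]]:
    """Return (decile, size, response_rate, cumulative_lift) rows, best decile first.

    Hypothetical helper: records are ranked by model score (highest first) and
    split into ten groups of as-equal-as-possible size. Cumulative lift compares
    the response rate of deciles 1..d with the overall response rate.
    """
    if len(scores) != len(responses) or len(scores) < 10:
        raise ValueError("need matching lists with at least 10 records")
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(responses) / n
    rows = []
    cum_n = cum_hits = 0
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        hits = sum(r for _, r in group)
        cum_n += len(group)
        cum_hits += hits
        lift = (cum_hits / cum_n) / overall_rate if overall_rate else 0.0
        rows.append((d + 1, len(group), hits / len(group), lift))
    return rows
```

With 20 records whose two highest-scored cases are the only responders, the first decile holds both responders, so its cumulative lift is 10 (a 100% response rate against an overall rate of 10%). A model that "optimizes the decile table" tries to push responders into the top deciles in this sense.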

NETMAP

  • Publisher: http://sourceforge.net/projects/netmap
  • NETMAP is a general-purpose information-visualization tool. It is most effective for large, qualitative, text-based data sets. It runs on Unix workstations.

RapidMiner

  • Publisher: Rapid-I (http://rapid-i.com)
  • Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The company concentrates on automatic intelligent analyses on a large scale, that is, for large amounts of structured data such as database systems and unstructured data such as texts. The open-source data-mining specialist Rapid-I enables other companies to use leading-edge technologies for data mining and business intelligence. Discovering and leveraging unused business intelligence in existing data enables better-informed decisions and allows for process optimization.

SIPINA

  • Publisher: http://eric.univ-lyon2.fr/~ricco/sipina.html
  • Sipina-W is publicly available software that includes different traditional data-mining techniques such as CART, Elisee, ID3, and C4.5, as well as some new methods for generating decision trees.

SNNS

  • Publisher: University of Stuttgart (http://www.nada.kth.se/~orre/snns-manual/)
  • SNNS is publicly available software. It is a simulation environment for research on and application of artificial neural networks. The environment is available on Unix and Windows platforms.

TiMBL

  • Publisher: Tilburg University (http://ilk.uvt.nl/timbl/)
  • TiMBL is publicly available software. It includes several memory-based learning techniques for discrete data. A representation of the training set is explicitly stored in memory, and new cases are classified by extrapolation from the most similar cases.
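
The memory-based idea described for TiMBL (store the training set, then classify a new case from its most similar stored cases) can be sketched as a k-nearest-neighbor classifier over discrete features. This is a minimal illustration, not TiMBL's implementation: the overlap distance (number of mismatching feature values), plain majority voting, and the names MemoryBasedClassifier and overlap_distance are assumptions, and TiMBL itself offers additional distance metrics and feature weighting.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Instance = Tuple[str, ...]


def overlap_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count feature positions where two discrete instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class MemoryBasedClassifier:
    """Hypothetical sketch: keeps every training case and votes among the k nearest."""

    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.memory: List[Tuple[Instance, str]] = []

    def fit(self, instances: List[Instance], labels: List[str]) -> "MemoryBasedClassifier":
        # "Training" is simply storing the examples verbatim in memory.
        self.memory = list(zip(instances, labels))
        return self

    def predict(self, query: Instance) -> str:
        # Rank stored cases by similarity to the query, then take a majority vote.
        ranked = sorted(self.memory, key=lambda item: overlap_distance(item[0], query))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]
```

Because all work happens at prediction time, adding new cases is trivial, while classifying requires scanning memory; practical memory-based learners therefore rely on indexing structures that this sketch omits.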

TOOLDIAG

  • Publisher: http://sites.google.com/site/tooldiag/Home
  • TOOLDIAG is a publicly available tool for data mining. It consists of several programs in C for statistical pattern recognition of multivariate numeric data. The tool is primarily oriented toward classification problems.

Weka

  • Publisher: University of Waikato (http://www.cs.waikato.ac.nz/ml/weka/)
  • Weka is a software environment that integrates several machine-learning tools within a common framework and a uniform GUI. Classification and summarization are the main data-mining tasks supported by the Weka system.

Web Utilization Miner (WUM)

  • Publisher: http://hypknowsys.sourceforge.net/
  • WUM 6.0 is a publicly available integrated environment for Web-log preparation, querying, and visualization of summarized activities on a Web site.

A.5.2 Commercial Software WITH Trial Version

Alice d’Isoft

  • Vendor: Isoft (www.alice-soft.com)
  • ISoft provides a complete range of tools and services dedicated to analytical CRM, behavioral analysis, data modeling and analysis, data mining, and data morphing.

ANGOSS’ suite

  • Vendor: Angoss Software Corp. (www.angoss.com)
  • ANGOSS’ suite consists of KnowledgeSTUDIO® and KnowledgeSEEKER®. KnowledgeSTUDIO® is an advanced data-mining and predictive analytics suite for all phases of the model development and deployment cycle (profiling, exploration, modeling, implementation, scoring, validation, monitoring, and building scorecards), all in a high-performance visual environment. KnowledgeSTUDIO is widely used by marketing, sales, and risk analysts, and it provides business users and expert analysts alike with a powerful, scalable, and complete data-mining solution. KnowledgeSEEKER® is a single-strategy desktop or client/server tool relying on a tree-based methodology for data mining. It provides a convenient GUI for building models and exploring data. It also allows users to export the discovered data model as text, an SQL query, or a Prolog program. It runs on Windows and Unix platforms and accepts data from a variety of sources.

Autoclass III

  • Vendor: www.openchannelsoftware.com/projects/AUTOCLASS_III/
  • Autoclass III is an unsupervised Bayesian classification system for independent data. It seeks a maximum posterior probability to provide a simple approach to problems such as classification, clustering, and general mixture separation. It works on Unix platforms.
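
The mixture-separation task that Autoclass III addresses can be illustrated with expectation-maximization (EM) for a two-component, one-dimensional Gaussian mixture: each point receives a posterior probability of belonging to each class, and the class parameters are re-estimated from those soft assignments. This is a simplified maximum-likelihood stand-in, not AutoClass's method; its full Bayesian approach also places priors on the parameters and searches over the number of classes, which this sketch does not do. The function name em_two_gaussians and the fixed choice of two classes are assumptions.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs: List[float], iters: int = 100) -> Tuple[List[float], List[float], List[float]]:
    """Fit a two-class 1-D Gaussian mixture by EM; return (weights, means, variances)."""
    # Hypothetical initialization: one class at each extreme of the data.
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [1.0, 1.0]
    for _ in range(iters):
        # E-step: posterior probability of each class for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # avoid dividing by zero if both densities underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each class from its soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing class
    return weights, means, variances
```

The soft assignments in the E-step are what make this a probabilistic classification: a point halfway between two classes is shared between them rather than forced into one.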

BayesiaLab

  • Vendor: Bayesia (www.bayesia.com)
  • BayesiaLab is a complete and powerful data-mining tool based on Bayesian networks, including data preparation, missing-value imputation, data and variable clustering, and unsupervised and supervised learning.

Data Applied

  • Vendor: Data Applied (http://data-applied.com)
  • Data Applied offers a comprehensive suite of Web-based data-mining techniques, an XML Web API, and rich data visualizations.

DataEngine

  • Vendor: MIT GmbH (www.dataengine.de)
  • DataEngine is a multiple-strategy data-mining tool for data modeling, combining conventional data-analysis methods with fuzzy technology, neural networks, and advanced statistical techniques. It works on the Windows platform.

Evolver™

  • Vendor: Palisade Corp. (www.palisade.com)
  • Evolver is a single-strategy tool. It uses genetic-algorithm technology to solve complex optimization problems. This tool runs on all Windows platforms and works with data stored in Microsoft Excel spreadsheets.
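
Evolver's genetic-algorithm approach can be illustrated with a minimal GA that maximizes a fitness function over fixed-length bit strings. The operators here (tournament selection, one-point crossover, bit-flip mutation, and keeping the best solution seen so far), the function name genetic_maximize, and all parameter values are assumptions chosen for illustration; the text does not describe Evolver's internal operators, and its Excel integration is not modeled.

```python
import random
from typing import Callable, List


def genetic_maximize(fitness: Callable[[List[int]], float], n_bits: int,
                     pop_size: int = 30, generations: int = 60,
                     mutation_rate: float = 0.02, seed: int = 0) -> List[int]:
    """Hypothetical GA sketch: evolve bit strings and return the fittest one found."""
    if n_bits < 2:
        raise ValueError("one-point crossover needs at least 2 bits")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament() -> List[int]:
        # Binary tournament: the fitter of two randomly chosen individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips independently with small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)  # never lose the best solution so far
    return best
```

For example, genetic_maximize(sum, 20) searches for the all-ones string (the classic OneMax problem). Real problems substitute a fitness function that decodes the bits into the decision variables being optimized.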

GhostMiner System

  • Vendor: FQS Poland (www.fqs.pl)
  • GhostMiner is a complete data-mining suite that includes k-nearest neighbors, neural nets, decision trees, neurofuzzy methods, SVM, PCA, clustering, and visualization.

KXEN Analytic

  • Vendor: KXEN Inc. (www.kxen.com)
  • KXEN (Knowledge eXtraction ENgines) provides Vapnik SVM (Support Vector Machines) tools, including data preparation, segmentation, time series, and SVM classifiers.

NeuroSolutions

  • Vendor: NeuroDimension Inc. (www.neurosolutions.com)
  • NeuroSolutions combines a modular, icon-based network design interface with an implementation of advanced learning procedures, such as recurrent backpropagation and backpropagation through time, and it solves data-mining problems such as classification, prediction, and function approximation. Some other notable features include C++ source code generation, customized components through DLLs, a comprehensive macro language, and Visual Basic accessibility through OLE Automation. The tool runs on all Windows platforms.

Oracle Data Mining

  • Vendor: Oracle (www.oracle.com)
  • Oracle Data Mining (ODM), an option to Oracle Database 11g Enterprise Edition, enables customers to produce actionable predictive information and build integrated business intelligence applications. Using data-mining functionality embedded in Oracle Database 11g, customers can find patterns and insights hidden in their data. Application developers can quickly automate the discovery and distribution of new business intelligence (predictions, patterns, and discoveries) throughout their organization.

Optimus RP

  • Vendor: Golden Helix Inc. (www.goldenhelix.com)
  • Optimus RP uses Formal Inference-based Recursive Modeling (recursive partitioning based on dynamic programming) to find complex relationships in data and to build highly accurate predictive and segmentation models.

Partek Software

  • Vendor: Partek Inc. (www.partek.com)
  • Partek Software is a multiple-strategy data-mining product. It is based on several methodologies including statistical techniques, neural networks, fuzzy logic, genetic algorithms, and data visualization. It runs on UNIX platforms.

Rialto™

  • Vendor: Exeura (www.exeura.com)
  • Exeura Rialto™ provides comprehensive support for the entire data mining and analytics lifecycle at an affordable price in a single, easy-to-use tool.

Salford Predictive Miner

  • Vendor: Salford Systems (http://salford-systems.com)
  • Salford Predictive Miner (SPM) includes CART®, MARS, TreeNet, and RandomForests, as well as powerful new automation and modeling capabilities. CART® is a robust, easy-to-use decision-tree tool that automatically sifts large, complex databases, searching for and isolating significant patterns and relationships. Multivariate Adaptive Regression Splines (MARS) focuses on the development and deployment of accurate and easy-to-understand regression models. TreeNet demonstrates remarkable performance for both regression and classification and can work with data sets of varying size, from small to huge, while readily managing a large number of columns. RandomForests is best suited for the analysis of complex data structures embedded in small to moderate data sets, typically containing fewer than 10,000 rows but allowing for more than 1 million columns. For this reason, RandomForests has been enthusiastically endorsed by many biomedical and pharmaceutical researchers.

STATISTICA Data Miner

  • Vendor: Statsoft (www.statsoft.com)
  • STATISTICA Data Miner contains the most comprehensive selection of data-mining methods available on the market, including by far the most comprehensive selection of clustering techniques, neural network architectures, classification/regression trees (also called recursive partitioning methods), multivariate modeling (including MARSplines and Support Vector Machines), association and sequence analysis (an optional add-on), and many other predictive techniques; it even provides methods for advanced/true simulation and optimization of models. It also offers the largest selection of graphics and visualization procedures of any competing product, enabling effective data exploration and visual data mining.

Synapse

  • Vendor: Peltarion (www.peltarion.com)
  • Synapse is a development environment for neural networks and other adaptive systems. It supports the entire development cycle, from data import and preprocessing via model construction and training to evaluation and deployment, and it allows deployment as .NET components.

SOMine

  • Vendor: Viscovery (www.viscovery.net)
  • This single-strategy data-mining tool is based on self-organizing maps and is uniquely capable of visualizing multidimensional data. SOMine supports clustering, classification, and visualization processes. It works on all Windows platforms.
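
The self-organizing map on which SOMine is based can be sketched in a few lines: each node of a small grid holds a weight vector, every input pulls its best-matching node and, more weakly, that node's grid neighbors toward itself, and both the learning rate and the neighborhood radius shrink over time. The Gaussian neighborhood, linear decay schedules, initialization from random data points, and the names train_som and best_matching_unit are assumptions; the text does not describe Viscovery's training procedure.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def best_matching_unit(grid: List[List[Vector]], x: Vector) -> Tuple[int, int]:
    """Return grid coordinates of the node whose weights are closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(grid):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data: List[Vector], rows: int, cols: int, epochs: int = 100,
              lr0: float = 0.5, seed: int = 0) -> List[List[Vector]]:
    """Hypothetical sketch: train a rows x cols map and return its node weights."""
    rng = random.Random(seed)
    # Start each node at a copy of a randomly chosen data point.
    grid = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
        for x in rng.sample(data, len(data)):    # one shuffled pass over the data
            bi, bj = best_matching_unit(grid, x)
            for i in range(rows):
                for j in range(cols):
                    # Gaussian neighborhood measured on the grid, not in data space.
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * sigma ** 2))
                    w = grid[i][j]
                    for d in range(len(w)):
                        w[d] += lr * h * (x[d] - w[d])
    return grid
```

After training, inputs that map to nearby grid nodes are similar, which is what lets a two-dimensional map display the structure of multidimensional data and support clustering.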

TIBCO Spotfire® Professional

  • Vendor: TIBCO Software Inc. (www.spotfire.tibco.com)
  • TIBCO Spotfire® Professional makes it easy to build and deploy reusable analytic applications over the Web, or perform pure ad hoc analytics, driven on-the-fly by your own knowledge, intuition, and desire to answer the next question. Spotfire analytics does all this by letting you interactively query, visualize, aggregate, filter, and drill into data sets of virtually any size. Ultimately you will reach faster insights with Spotfire and bring clarity to business issues or opportunities in a way that gets all the decision makers on the same page quickly.

A.5.3 Commercial Software WITHOUT Trial Version

AdvancedMiner

  • Vendor: StatConsulting (www.statconsulting.eu)
  • AdvancedMiner is a platform for data mining and analysis, featuring a modeling interface (OOP script, latest GUI design, advanced visualization) and grid computing.

Affinium Model

  • Vendor: Unica Corp. (www.unica.com)
  • Affinium Model (from Unica) includes a valuator, profiler, response modeler, and cross-seller. Unica provides innovative marketing solutions intended to turn its customers' passion for marketing into business success. Its interactive marketing approach incorporates customer and Web analytics, centralized decision making, cross-channel execution, and integrated marketing operations. More than 1000 organizations worldwide depend on Unica.

IBM SPSS Modeler Professional