Videos tagged with Data Mining
Google Tech Talk (more info below) March 30, 2011 Presented by Raffael Marty. ABSTRACT In this two-part presentation, we will explore log analysis and log visualization. We will look at the history of log analysis, where log analysis stands today, what tools are available to process logs, what is working today, and, more importantly, what is not working in log analysis. What will the futur...
A general purpose segmentation algorithm using analytically evaluated random walks
An ideal segmentation algorithm could be applied equally to the problem of isolating organs in a medical volume or to editing a digital photograph without modifying the algorithm, changing parameters, or sacrificing segmentation quality. However, a general-purpose, multiway segmentation of objects in an image/volume remains a challenging problem. In this talk, I will describe a recently develop...
Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection
Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analys...
Correlation Search in Graph Databases
Correlation mining has had great success in many application domains because of its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking, even though graph data, especially in various scientific domains, has proliferated in recent years. In this paper, we propose a new problem of correlation mining from g...
Spooky Stuff in Metric Space
Decision trees are intelligible, but do they perform well enough that you should use them? Have SVMs replaced neural nets, or are neural nets still best for regression, and SVMs best for classification? Boosting maximizes margins similar to SVMs, but can boosting compete with SVMs? And if it does compete, is it better to boost weak models, as theory might suggest, or to boost stronger models? B...
Statistical Change Detection for Multi-Dimensional Data
This paper deals with detecting change of distribution in multi-dimensional data sets. For a given baseline data set and a set of newly observed data points, we define a statistical test called the density test for deciding if the observed data points are sampled from the underlying distribution that produced the baseline data set. We define a test statistic that is strictly distribution-free u...
Data Mining Vs. Semantic Web
This tutorial covers the field of data mining in general, talks about its possible applications (special case studies can be added on request), and elaborates on the issue of hardware accelerators for data mining. The introduction gives a formal and an informal definition (through an example), and it points to possible misunderstandings typical of the topic. The part on methods and algorithms c...
MINI: Mining Informative Non-redundant Itemsets
Lecture slides: MINI: Mining Informative Non-redundant Itemsets We discuss frequent itemset mining… Solutions? Author: Tijl de Bie, Department of Engineering Mathematics, University of Bristol
Realistic Synthetic Data for Testing Association Rule Mining Algorithms for Market Basket Databases
Lecture slides: Item occurrence distribution: real data; Item occurrence distribution: synthetic data; The Co-Zi generator. Author: Michael Zito, University of Liverpool
Finding low-entropy sets and trees from binary data
The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occurrence of the value 1 in the data. While this choice makes sense in the context of sparse binary data, it disregards potentially interesting subsets of attributes that have some other type of dependency structure. We c...