Google Tech Talk (more info below) August 9, 2011 Presented by Alon Halevy. ABSTRACT: Google hosted 100 attendees of the 2011 conference of the Association for the Advancement of Artificial Intelligence (AAAI) at our San Francisco office. The program showcased a featured talk by Director of Research Peter Norvig and a lightning talk series on an array of projects relevant to the field of artificial intelligence and its applications. About the speaker: Alon Halevy is a Research Scientist at Google. He leads structured data efforts, including Google Fusion Tables.

Want more on these topics?

Browse the archive of posts filed under Companies, Conferences

App Engine 101

Amit Agarwal, Max Lin, Gideon Mann, Siddartha Naidu

Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available on a limited sign-up basis to developers: (1) BigQuery: interactive analysis of very large data sets and (2) Prediction API: make informed predictions from your data. We will demonstrate their use and give instructions on how to get access.

For all I/O 2010 sessions, please go to http://code.google.com/events/io/2010/sessions.html

We present a novel method for generic visual categorization: the problem of identifying the object content of natural images while generalizing across variations inherent to the object class. This bag of keypoints method is based on vector quantization of affine invariant descriptors of image patches. We propose and compare two alternative implementations using different classifiers: Naïve Bayes and SVM. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We present results for simultaneously classifying several semantic visual categories. These results clearly demonstrate that the method is robust to background clutter and produces good categorization accuracy even without exploiting geometric information.
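
The pipeline described above can be sketched at toy scale. The codebook and 2-D descriptors below are invented stand-ins for real affine-invariant patch descriptors; the point is the vector-quantization step that turns an image into a codeword histogram a Naïve Bayes or SVM classifier can consume.

```python
def nearest_codeword(desc, codebook):
    """Index of the codebook entry closest to a descriptor (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda i: sum((d - c) ** 2 for d, c in zip(desc, codebook[i])))

def bag_of_keypoints(descriptors, codebook):
    """Histogram of codeword occurrences for one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1
    return hist

# Toy codebook of three "visual words" and four patch descriptors from one image.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_descriptors = [(0.1, -0.2), (0.9, 1.1), (5.2, 4.8), (4.9, 5.1)]
hist = bag_of_keypoints(image_descriptors, codebook)
print(hist)  # → [1, 1, 2]
```

The resulting histogram discards all geometric layout, which is why the representation is intrinsically invariant and robust to clutter.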

*Author: Chris Williams, University Of Edinburgh*

Want more on these topics?

Browse the archive of posts filed under Science]]>

In this paper we develop a new formulation of probabilistic relaxation labeling for the task of data classification using the theory of diffusion processes on graphs. The state space of our process is the set of nodes of a support graph, which represent potential object-label assignments. The edge weights of the support graph encode data-proximity and label-consistency information. The state vector of the diffusion process represents the object-label probabilities and evolves with time according to the Fokker-Planck equation. We show how the solution state vector can be estimated using the spectrum of the Laplacian matrix of the weighted support graph. Experiments on various data clustering tasks show the effectiveness of the new algorithm.
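
A minimal numerical sketch of the diffusion picture, with made-up edge weights: the state vector evolves by explicit Euler steps of dp/dt = -Lp, where L = D - W is the Laplacian of the weighted support graph. Probability mass flows fastest along strongly weighted (consistent) assignments.

```python
def laplacian(W):
    """Graph Laplacian L = D - W of a weighted adjacency matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def diffuse(p, W, dt=0.01, steps=1000):
    """Euler integration of dp/dt = -L p."""
    L = laplacian(W)
    n = len(p)
    for _ in range(steps):
        Lp = [sum(L[i][j] * p[j] for j in range(n)) for i in range(n)]
        p = [p[i] - dt * Lp[i] for i in range(n)]
    return p

# Assignments 0 and 1 are strongly coupled; assignment 2 only weakly.
W = [[0.0, 1.0, 0.1],
     [1.0, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
p = diffuse([1.0, 0.0, 0.0], W)
```

Total probability mass is conserved, and mass reaches the weakly coupled state 2 more slowly than the strongly coupled state 1, which is the consistency-propagation effect the formulation exploits.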

*Author: Edwin Hancock, University Of York*


The field of statistical pattern recognition is characterized by the use of feature vectors for pattern representation, while strings or, more generally, graphs prevail in structural pattern recognition. In this paper we aim at bridging the gap between feature-based and graph-based object representation. We propose a general approach for transforming graphs into n-dimensional real vector spaces by means of prototype selection and graph edit distance computation. This method gives access to the wide range of procedures based on feature vectors without losing the representational power of graphs. Through various experimental results we show that the proposed method, using graph embedding and classification in a vector space, outperforms the traditional approach based on k-nearest-neighbor classification in the graph domain.
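
The embedding itself is simple to state: each graph maps to the vector of its distances to n selected prototype graphs. The sketch below uses an invented node/edge-count dissimilarity as a cheap stand-in for real graph edit distance, which is expensive to compute.

```python
def toy_graph_distance(g1, g2):
    """Hypothetical stand-in for graph edit distance: compares node and edge counts."""
    return (abs(len(g1["nodes"]) - len(g2["nodes"]))
            + abs(len(g1["edges"]) - len(g2["edges"])))

def embed(g, prototypes, dist=toy_graph_distance):
    """Map a graph into R^n via its distances to n prototype graphs."""
    return [dist(g, p) for p in prototypes]

# Two prototypes and one graph to embed.
p1 = {"nodes": [1], "edges": []}
p2 = {"nodes": [1, 2, 3], "edges": [(1, 2), (2, 3)]}
g = {"nodes": [1, 2], "edges": [(1, 2)]}
vec = embed(g, [p1, p2])
print(vec)  # → [2, 2]
```

Once every graph lives in R^n, any feature-vector classifier applies unchanged; prototype selection governs how much structural information the coordinates preserve.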

*Author: Kaspar Riesen, University Of Bern*


Sequence classification is a significant problem that arises in many real-world applications. The purpose of a sequence classifier is to assign a class label to a given sequence, and obtaining the pattern that characterizes the sequence is usually very useful as well. In this paper, a technique to discover a pattern from a given sequence is presented, followed by a novel general method to classify the sequence. This method mainly considers the dependencies among the neighbouring elements of a sequence. The method is evaluated in a UNIX command environment, but it is general enough to be applied to other settings.
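
One concrete way to capture dependencies among neighbouring elements, sketched here with invented helper names and toy UNIX command streams: characterize each class by a bigram-frequency profile and classify a new sequence by profile overlap. This is an illustration of the idea, not the paper's actual algorithm.

```python
from collections import Counter

def bigram_profile(seq):
    """The 'pattern' of a sequence: counts of adjacent element pairs."""
    return Counter(zip(seq, seq[1:]))

def similarity(p, q):
    """Shared bigram occurrences between two profiles."""
    return sum(min(p[b], q[b]) for b in p)

def classify(seq, labelled_profiles):
    """Assign the label whose profile best matches the sequence's bigrams."""
    prof = bigram_profile(seq)
    return max(labelled_profiles,
               key=lambda lbl: similarity(prof, labelled_profiles[lbl]))

profiles = {"dev": bigram_profile(["ls", "vi", "make", "ls", "vi", "make"]),
            "admin": bigram_profile(["ps", "kill", "ps", "kill", "ps"])}
label = classify(["vi", "make", "ls"], profiles)
print(label)  # → dev
```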

*Author: José Antonio Iglesias, Carlos Iii University Madrid*


Graph data is becoming increasingly popular in, e.g., bioinformatics and text processing. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs: when a graph is represented as a binary feature vector of indicators of all possible subgraphs, the dimensionality becomes too large for usual statistical methods.
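
The representation the passage refers to can be illustrated at toy scale, with candidate substructures restricted to a handful of edge sets rather than all possible subgraphs (over which the dimensionality explodes):

```python
def indicator_vector(graph_edges, candidate_subgraphs):
    """1 if the graph contains the candidate substructure, 0 otherwise."""
    return [1 if sub <= graph_edges else 0 for sub in candidate_subgraphs]

# A graph as a set of edges, and four candidate substructures.
g = {("A", "B"), ("B", "C")}
candidates = [{("A", "B")},
              {("B", "C")},
              {("A", "B"), ("B", "C")},
              {("C", "D")}]
vec = indicator_vector(g, candidates)
print(vec)  # → [1, 1, 1, 0]
```

With all possible subgraphs as candidates the vector's length grows exponentially in graph size, which is exactly the dimensionality problem described above.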

*Author: Koji Tsuda, Max Planck Institute For Biological Cybernetics*


In this paper we study asymmetric proximity measures on directed graphs, which quantify the relationships between two nodes or two groups of nodes. The measures are useful in several graph mining tasks, including clustering, link prediction and connection subgraph discovery. Our proximity measure is based on the concept of escape probability. This way, we strive to summarize the multiple facets of node proximity, while avoiding some of the pitfalls to which alternative proximity measures are susceptible. A unique feature of the measures is accounting for the underlying directional information. We put a special emphasis on computational efficiency, and develop fast solutions that are applicable in several settings. Our experimental study shows the usefulness of our proposed direction-aware proximity method for several applications, and that our algorithms achieve a significant speedup (up to 50,000x) over straightforward implementations.
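
The escape-probability concept can be computed naively on a toy directed graph: the proximity from s to t is the probability that a random walk leaving s reaches t before returning to s. The sketch below solves the harmonic equations by fixed-point iteration; it shows the definition only, not the paper's fast algorithms.

```python
def escape_probability(adj, s, t, iters=2000):
    """P(random walk from s hits t before returning to s), by fixed-point iteration."""
    h = {v: 0.0 for v in adj}   # h[v] = P(hit t before s | walk currently at v)
    h[t] = 1.0
    for _ in range(iters):
        for v in adj:
            if v not in (s, t) and adj[v]:
                h[v] = sum(h[w] for w in adj[v]) / len(adj[v])
    # Average over the first step out of s.
    return sum(h[w] for w in adj[s]) / len(adj[s])

# From s, one path escapes to t via a; the other returns to s via b.
adj = {"s": ["a", "b"], "a": ["t"], "b": ["s"], "t": []}
ep = escape_probability(adj, "s", "t")
print(ep)  # → 0.5
```

Note the measure is direction-aware by construction: reversing an edge changes which walks can escape, so proximity from s to t generally differs from proximity from t to s.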

*Author: Hanghang Tong, Cmu*


Lecture slides:

- Learning CRFs with Hierarchical Features: An Application to Go
- The Game of Go
- Territory Prediction
- Talk Outline
- Hierarchical Patterns
- Models
- Independent Pattern-based Classifiers
- Inference and Training
- Bayesian Model Averaging
- Hierarchical Tree Models
- CRF & Pattern CRF
- Inference and Training
- Pseudolikelihood
- Local Training
- Evaluation
- Models & Algorithms
- Training Time
- Inference Time
- Performance Metrics
- Performance Tradeoffs I
- Why is Vertex Error better for CRFs?
- Why is Net Error worse for CRFs?
- Bias of Local Training
- Performance Tradeoffs II
- Conclusions

*Author: Scott Sanner, University Of Toronto*


The talk will consider ways of bounding the complexity of a graph as measured by the number of partitions satisfying certain properties. The approach adopted uses Vapnik Chervonenkis dimension techniques. An example of such a bound was given by Kleinberg et al (2004) with an application to network failure detection. We describe a new bound in the same vein that depends on the eigenvalues of the graph Laplacian. We show an application of the result to transductive learning of a graph labelling from examples.
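
A toy illustration of the transductive task the bound applies to, not of the bound itself (the graph, labels and brute-force search below are invented for exposition): given a few labelled vertices, pick the labelling of the remaining vertices that cuts as few edges as possible. The number of low-cut labellings is the kind of partition-complexity the eigenvalue result controls.

```python
from itertools import product

def cut_size(edges, labels):
    """Number of edges whose endpoints receive different labels."""
    return sum(1 for u, v in edges if labels[u] != labels[v])

def transduce(nodes, edges, known):
    """Brute-force minimum-cut labelling consistent with the known examples."""
    unknown = [v for v in nodes if v not in known]
    best = None
    for assignment in product([0, 1], repeat=len(unknown)):
        labels = dict(known, **dict(zip(unknown, assignment)))
        if best is None or cut_size(edges, labels) < cut_size(edges, best):
            best = labels
    return best

# A path a-b-c-d with the two endpoints labelled.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
labels = transduce(nodes, edges, {"a": 0, "d": 1})
print(labels)
```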

*Author: John Shawe Taylor, University Of London*


Several problems in chemistry can be formulated as classification or regression problems over molecules which, when represented by their planar structure, can be seen as labeled graphs. Several approaches have been proposed recently to define positive definite kernels over labeled graphs, paving the way to the use of powerful kernel methods in chemoinformatics. In this talk I will review some of these approaches and present relevant applications in computational chemistry.
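
One family of such kernels can be sketched directly: a walk-based kernel between labeled graphs counts matching atom-label sequences along walks of a fixed length in both graphs. The "molecules" below are toy adjacency/label pairs, and this is a simplified illustration rather than any specific published kernel.

```python
from collections import Counter

def walks(adj, labels, length):
    """All atom-label sequences along walks of `length` edges."""
    seqs = [[v] for v in adj]
    for _ in range(length):
        seqs = [w + [n] for w in seqs for n in adj[w[-1]]]
    return [tuple(labels[v] for v in w) for w in seqs]

def walk_kernel(g1, g2, length=2):
    """Count pairs of walks with identical label sequences in the two graphs."""
    c1, c2 = Counter(walks(*g1, length)), Counter(walks(*g2, length))
    return sum(c1[s] * c2[s] for s in c1)

# Toy three-atom fragments: C-C-O versus C-C-N.
g1 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
g2 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "N"})
k = walk_kernel(g1, g2)
print(k)  # → 4
```

Kernels of this shape are positive definite because they are inner products of (implicit) walk-count feature vectors, which is what makes SVMs and other kernel methods applicable to molecules.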

*Author: Jean Philippe Vert, Ecole Des Mines De Paris Paris Tech*


The discriminative learning framework is one of the most successful areas of machine learning. Methods of this paradigm, such as Boosting and Support Vector Machines, have significantly advanced the state of the art for classification by improving accuracy and by widening the applicability of machine learning methods. One of the key benefits of these methods is their ability to learn efficiently in high-dimensional feature spaces, either through implicit data representations via kernels or through explicit feature induction. Traditionally, however, these methods do not exploit dependencies between class labels when more than one label is predicted. Many real-world classification problems involve sequential, temporal or structural dependencies between multiple labels. We will investigate recent research on generalizing discriminative methods to learning in structured domains. These techniques combine the efficiency of dynamic programming with the advantages of state-of-the-art learning methods.
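
The dynamic-programming component referred to above, in minimal form: given per-position label scores and pairwise transition scores (the quantities a structured discriminative model produces), Viterbi decoding finds the highest-scoring joint labelling. The scores below are made-up numbers for illustration.

```python
def viterbi(node_scores, trans):
    """node_scores: list of {label: score}; trans: {(prev_label, label): score}."""
    best = dict(node_scores[0])   # best score of any prefix ending in each label
    back = []                     # backpointers, one dict per later position
    for scores in node_scores[1:]:
        new_best, step = {}, {}
        for cur, s in scores.items():
            prev = max(best, key=lambda l: best[l] + trans.get((l, cur), 0.0))
            new_best[cur] = best[prev] + trans.get((prev, cur), 0.0) + s
            step[cur] = prev
        best = new_best
        back.append(step)
    # Trace back from the best final label.
    path = [max(best, key=best.get)]
    for step in reversed(back):
        path.append(step[path[-1]])
    return list(reversed(path))

nodes = [{"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 1.0}, {"A": 0.0, "B": 0.5}]
trans = {("A", "A"): 0.5, ("B", "B"): 0.5}   # labels prefer to persist
path = viterbi(nodes, trans)
print(path)  # → ['A', 'B', 'B']
```

Note how the transition scores change the answer: position 3 locally prefers neither label strongly, but the persistence bonus from the B at position 2 tips it to B. That label-dependency effect is exactly what per-position classifiers miss.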

*Author: Yasemin Altun, Tti*


Graph matching plays a key role in many areas of computing, from computer vision to networks, wherever there is a need to determine correspondences between the components (vertices and edges) of two attributed structures. In recent years three new approaches to graph matching have emerged as replacements for more traditional heuristic methods:

- Least squares, where the optimal correspondence is determined by deriving the best-fitting permutation matrix between sets.
- Spectral methods, where optimal correspondences are derived via subspace projections in the graph eigenspaces.
- Graphical models, where algorithms such as the junction tree algorithm are used to infer the optimal labeling of the nodes of one graph in terms of the other, satisfying similarity constraints between vertices and edges.

In this lecture we review and compare these methods and demonstrate examples of their application to point-set and line matching.
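
The least-squares formulation can be shown at a scale where brute force is feasible: over all permutations, pick the vertex correspondence minimizing the squared mismatch between the two adjacency matrices. The spectral and graphical-model approaches above are ways of approximating this search on realistic graph sizes.

```python
from itertools import permutations

def matching_cost(A, B, perm):
    """Squared mismatch ||A - P B P^T||^2 for the correspondence i -> perm[i]."""
    n = len(A)
    return sum((A[i][j] - B[perm[i]][perm[j]]) ** 2
               for i in range(n) for j in range(n))

def best_match(A, B):
    """Brute-force least-squares matching (only viable for tiny graphs)."""
    n = len(A)
    return min(permutations(range(n)), key=lambda p: matching_cost(A, B, p))

# A path 0-1-2 and the same path with its vertices relabelled.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
B = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
perm = best_match(A, B)
```

Since B is an exact relabelling of A, the optimal permutation achieves zero cost; with attributed or noisy graphs the residual cost measures match quality.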

*Author: Terry Caelli, Nicta*


Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, research on correlation mining from graph databases is still lacking, despite the fact that graph data, especially in various scientific domains, have proliferated in recent years. In this paper, we propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient as a correlation measure to take into consideration the occurrence distributions of graphs. However, the problem poses significant challenges, since every subgraph of a graph in the database is a candidate, but the number of subgraphs is exponential. We derive two necessary conditions which set bounds on the occurrence probability of a candidate in the database. With this result, we design an efficient algorithm that operates on a much smaller projected database, and thus we are able to obtain a significantly smaller set of candidates. To further improve the efficiency, we develop three heuristic rules and apply them on the candidate set to further reduce the search space. Our extensive experiments demonstrate the effectiveness of our method on candidate reduction. The results also justify the efficiency of our algorithm in mining correlations from large real and synthetic datasets.
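
The correlation measure CGS builds on can be computed directly: Pearson's correlation between the occurrence indicator vectors of the query graph and a candidate subgraph across the database. The boolean occurrence vectors below are illustrative stand-ins for real subgraph-isomorphism tests against a graph database.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Entry i is 1 if graph i of the database contains the pattern, else 0.
occurs_q = [1, 1, 0, 1, 0]   # query graph
occurs_g = [1, 1, 0, 0, 0]   # candidate subgraph
r = pearson(occurs_q, occurs_g)
print(round(r, 3))  # → 0.667
```

Because the measure depends on the full occurrence distribution (not just co-occurrence counts), bounding a candidate's occurrence probability, as the two necessary conditions do, directly bounds its achievable correlation.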

*Author: Yiping Ke, The Hong Kong University Of Science And Technology*


The k-Nearest Neighbors algorithm can be easily adapted to classify complex objects (e.g. sets, graphs) as long as a proper dissimilarity function is given over an input space. Both the representation of the learning instances and the dissimilarity employed on that representation should be determined on the basis of domain knowledge. However, even in the presence of domain knowledge, it can be far from obvious which complex representation should be used or which dissimilarity should be applied on the chosen representation. In this paper we present a framework that allows us to combine different complex representations of a given learning problem and/or different dissimilarities defined on these representations. We build on ideas previously developed for metric learning on vectorial data. We demonstrate the utility of our method in domains in which the learning instances are represented as sets of vectors, by learning how to combine different set distance measures.
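
A sketch of the combination idea for set-of-vector instances, with hand-fixed weights standing in for the learned ones: two set dissimilarities (minimum and average pairwise distance) are mixed into a single measure that a 1-NN classifier uses.

```python
def pair_dist(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def d_min(s1, s2):
    """Minimum pairwise distance between two sets of vectors."""
    return min(pair_dist(a, b) for a in s1 for b in s2)

def d_avg(s1, s2):
    """Average pairwise distance between two sets of vectors."""
    return sum(pair_dist(a, b) for a in s1 for b in s2) / (len(s1) * len(s2))

def combined(s1, s2, w=(0.5, 0.5)):
    """Weighted combination of set dissimilarities; w would be learned."""
    return w[0] * d_min(s1, s2) + w[1] * d_avg(s1, s2)

def knn1(query, labelled_sets):
    """1-NN classification under the combined dissimilarity."""
    return min(labelled_sets, key=lambda item: combined(query, item[0]))[1]

train = [([(0.0, 0.0), (0.1, 0.1)], "near-origin"),
         ([(5.0, 5.0), (5.1, 4.9)], "far")]
label = knn1([(0.2, 0.0)], train)
print(label)  # → near-origin
```

The framework's contribution is learning the weights w from data, so that no single representation or set distance has to be chosen a priori.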

*Author: Adam Woznica, University Of Geneva*


A key problem for automating the processing of semi-structured resources is the format heterogeneity among data sources. To deal with heterogeneous semi-structured data, the correspondence between the different formats has to be established. The multiplicity and the rapid growth of information sources have motivated researchers to develop machine learning technologies to help automate these transformations.

*Author: Ludovic Denoyer, University Of Paris 6*


Network clustering (or graph partitioning) is an important task for the discovery of underlying structures in networks. Many algorithms find clusters by maximizing the number of intra-cluster edges. While such algorithms find useful and interesting structures, they tend to fail to identify and isolate two kinds of vertices that play special roles: vertices that bridge clusters (hubs) and vertices that are marginally connected to clusters (outliers). Identifying hubs is useful for applications such as viral marketing and epidemiology, since hubs are responsible for spreading ideas or disease. In contrast, outliers have little or no influence, and may be isolated as noise in the data. In this paper, we propose a novel algorithm called SCAN (Structural Clustering Algorithm for Networks), which detects clusters, hubs and outliers in networks. It clusters vertices based on a structural similarity measure. The algorithm is fast and efficient, visiting each vertex only once. An empirical evaluation of the method using both synthetic and real datasets demonstrates superior performance over other methods such as modularity-based algorithms.
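
The structural similarity at the heart of SCAN compares neighbourhoods: sigma(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| · |Γ(v)|), where Γ(x) is x's neighbourhood including x itself. The graph below is a toy: two triangles bridged by a hub. Cluster members share most of their neighbourhood; the hub shares little with either side, which is how SCAN separates it.

```python
from math import sqrt

def gamma(adj, v):
    """Closed neighbourhood of v (its neighbours plus v itself)."""
    return set(adj[v]) | {v}

def sigma(adj, u, v):
    """SCAN structural similarity between vertices u and v."""
    gu, gv = gamma(adj, u), gamma(adj, v)
    return len(gu & gv) / sqrt(len(gu) * len(gv))

# Two triangles {a,b,c} and {d,e,f} bridged by hub h.
adj = {"a": ["b", "c", "h"], "b": ["a", "c"], "c": ["a", "b"],
       "d": ["e", "f", "h"], "e": ["d", "f"], "f": ["d", "e"],
       "h": ["a", "d"]}
print(round(sigma(adj, "a", "b"), 3))  # within a cluster: high
print(round(sigma(adj, "h", "a"), 3))  # hub vs. cluster member: low
```

Thresholding sigma yields the clusters; vertices structurally similar to no cluster become hubs (if they touch several clusters) or outliers (if they touch at most one).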

*Author: Xiaowei Xu, University Of Arkansas At Little Rock*
