Videos tagged with Text Mining
Computer document processing often starts with an abstract, structural representation before entering a processing pipeline that creates the desired layout and appearance. Unfortunately, the whole system resembles a series of steps in a one-way chemical reaction, or the successive irreversible stages of creating assembler code with a compiler. This 'one-way function' behaviour is most obvio...
Supporting Casual Data-Centric Interactions on the Web
It is now mainstream thinking that the Web is not merely a web of textual documents, but that these documents contain fine-grained, structured data within (e.g., phone numbers, street addresses, job titles, and even subject/verb/object relationships). Infrastructures and user interfaces need to be built to leverage such structured data to make our use of the Web more effective. My doctoral rese...
Ontologies and Machine Learning
We address the problem of constructing a lightweight ontology from social network data. As an example, we use the social network of a mid-size research institution, derived from e-mail communication. The main contribution is an architecture consisting of five major steps that transform recorded e-mail transactions into an ontology estimating the structure of...
Text Garden
Lecture slides: Outline What is Text-Garden? Some history… …local JSI development of Text-Garden Major functionalities Functionality blocks Lexical processing Unsupervised learning Supervised learning Dimensionality reduction Named Entity Extraction Crawler & Search Engine Support for selected external sources Technical aspects How to use Text-Garden functionality? Multiplatfo...
Ontogen Software Demo
We address the problem of constructing a lightweight ontology from social network data. As an example, we use the social network of a mid-size research institution, derived from e-mail communication. The main contribution is an architecture consisting of five major steps that transform recorded e-mail transactions into an ontology estimating the structure of...
Learning Hierarchical Multi-Category Text Classification Models
Lecture slides: Hierarchical Multilabel Classification: Frequently used learning strategies for hierarchies Max-margin Structured output approach Loss functions for hierarchies Optimization problem Marginalized problem Efficient optimization Conditional Gradient-based training Experiments Microlabel prediction quality: whole tree Levelwise F1 Optimization efficiency Conclusions Author: Juho Rou...
The use of machine translation tools for cross-lingual text-mining
Lecture contents: Outline Cross-lingual text mining KCCA (Kernel Canonical Correlation Analysis) Paired training set and machine translation Experiments Experiment #1 – Information retrieval Results Experiment #2 – Classification Results Conclusions Questions? Author: Blaž Fortuna, Jozef Stefan Institute
Mixtures of Hierarchical Topics with Pachinko Allocation
The four-level pachinko allocation model (PAM) (Li & McCallum, 2006) represents correlations among topics using a DAG structure. It does not, however, represent a nested hierarchy of topics, in which some topical word distributions represent the vocabulary shared among several more specific topics. This paper presents hierarchical PAM, an enhancement that explicitly represents a t...
Text Categorization
This course will cover the principal topics involved in creating a working text categorization system. It will focus on the components of such a system and the processes required to build it, drawing on practical experience from the Scamseek project. The role of machine learning will be the center of the discussion, but the surrounding tasks of language modeling, computational linguistics and soft...
Robust Textual Inference Using Diverse Knowledge Sources
Lecture contents: Our approach Outline of this talk Sentence processing Named Entity Recognizer Parse tree post-processing Parse tree → Dependencies Representations Annotations More annotations Event nouns Outline of this talk Graph Matching Approach Graph Matching: Idea Graph Matching: Costs Digression: Phrase similarity Graph Matching: Example Outline of this talk Abductive inference Abductiv...