Using Rank Propagation and Probabilistic Counting for Link-based Spam Detection

Posted in Companies, Science on July 25, 2008


Lecture slides:

  • Content
  • What is on the Web?
  • Web spam (keywords + links)
  • Web spam (mostly keywords)
  • Search engine?
  • Fake search engine
  • Problem: “normal” pages that are spam
  • Link farms
  • Motivation
  • Metrics
  • Test collection
  • Degree-based measures
  • Degree
  • Edge reciprocity
  • Assortativity
  • Automatic classifier
  • PageRank
  • Maximum PageRank in the Host
  • Variance of PageRank
  • Variance of PageRank of in-neighbors
  • Automatic classifier
  • TrustRank
  • TrustRank score
  • TrustRank / PageRank
  • Automatic classifier
  • Truncated PageRank
  • Path-based formula for PageRank
  • General functional ranking
  • Truncated PageRank
  • Truncated PageRank(T=2) / PageRank
  • Max. change of Truncated PageRank
  • Automatic classifier
  • Counting supporters
  • Idea: count “supporters” at different distances
  • High and low-ranked pages are different
  • Probabilistic count
  • Hosts at distance 4
  • Automatic classifier
  • Conclusions
  • Summary of classifiers
  • Top 10 metrics
  • Conclusions

Author: Carlos Castillo, Yahoo! Research

Tags: Yahoo!, Science, Lectures, Computer Science, VideoLectures.Net, Network Analysis, Web Mining