Searching and Mining Open Source Code from the Web
Various data mining techniques have been applied to mine source code repositories. However, relying on only one or a few local source code repositories may not provide sufficient, relevant data samples (e.g., usages of a certain API call) for mining tasks such as code reuse and defect detection. The recent availability of code search engines allows the mining scope to be scaled to billions of lines of open source code available from the Web, and thus increases the chance of getting sufficient, relevant data samples for mining. This talk will discuss the mining opportunities and challenges based on searching open source code from the Web, and present new approaches that mine open source code searched from the Web to assist code reuse and defect detection.
Speaker: Tao Xie
Tao Xie is an Assistant Professor in the Department of Computer Science at North Carolina State University. He received his Ph.D. in Computer Science from the University of Washington in 2005. He leads the Automated Software Engineering Research Group at North Carolina State University. His research centers on two major themes: automated software testing and mining software engineering data. He has served on a number of conference program committees, including ISSTA 2008/2009, ASE 2006/2007 (Expert-Review Panel)/2008, ICST 2008, AOSD 2007, and ICSM 2007/2008. Besides doing research, he has contributed to understanding the software engineering research community by building community webs such as Software Engineering Academic Genealogy and Software Engineering Conference Map.
Google Tech Talks
June 4, 2008