Videos tagged with Big Data
Highly scaled distributed web applications are predicated on a functional network, yet organizations rarely have detailed information about the consumption and expense of network resources. This data is essential for effective denial of service detection, intrusion detection, troubleshooting, capacity planning, and traffic engineering, but the time, cost and knowledge required to acquire and an...
Surge 2010 ~ Working with Dimensional Data in a Distributed Hash Table
Recently a new class of database technologies has developed offering massively scalable distributed hash table functionality. Relative to more traditional relational database systems, these systems are simple to operate and capable of managing massive data sets. These characteristics come at a cost though: an impoverished query language that, in practice, can handle little more than exact-match...
Surge 2011 ~ Panel ~ Big Data meets Cloud
Half of the pundits claim that The Cloud is the future. The other half warn us that Big Data is coming. Real engineers know these two don't always marry easily. How do we build systems that really scale, inside or outside the cloud? What conventional approaches don't work? What new approaches are available? What must we sacrifice from using them, and what do we gain in return? How do the econom...
Surge 2011 ~ Architectures for real-time data
At Circonus we process a lot of data. We learned early on that some data can be sampled and some data cannot. The way you treat data when you "need it all" to make good sense of things is radically different from the way you must treat sampled datapoints. This presentation will walk through the architectural evolution of our system as it had to scale to billions of events per day and trillions ...
Surge 2011 ~ Hybrid data storage: finding balance.
Over the past several years Clearspring has developed custom distributed processing and storage systems for dealing with the billions of views our web products receive per day. A central part of this system is a tree-based storage structure that fills a useful middle ground between the data-model-centric view of row-oriented databases and the query-centric view more common with column-oriented o...
Surge 2011 ~ Under the Hood of a Distributed Database Service
Cloudant was born of a desire to make scalable data storage and analysis simple, both for users and administrators. This talk will focus on the technologies and processes that have proven instrumental in the pursuit of that vision. Adam will discuss local storage engines that are insensitive to system crashes and distributed databases that remain available during network partitions. He'll descr...
Surge 2011 ~ Wrestling Large Data Volumes to the Ground
Managing large volumes of data can be challenging—sometimes just getting the data requires moving mountains! Recently, the Exceptional Performance team at Yahoo! was challenged to develop a massive business intelligence system for user performance data. And of course, it had to be fast! Our proposed solution broke all records for data scaling and attempted to optimize the entire data pipeline. ...
Google I/O 2010 - BigQuery and Prediction APIs
App Engine 101. Amit Agarwal, Max Lin, Gideon Mann, Siddartha Naidu. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available on a limited sign-up basis to developers: (1) BigQuery: interactive analysis of very large data sets and (2) Prediction API: make informed predictio...
Using Spring with NoSQL Databases
Summary: Mark Pollack and Chris Richardson discuss NoSQL, using Redis, Cassandra and MongoDB as examples, and Spring Data, a project meant to provide a unified programming model for accessing NoSQL databases. Bio: Dr. Mark Pollack has been a core Spring developer since 2003 and is the founder and leader of Spring.NET. Mark is a Microsoft MVP. Chris Richardson leads the cloud development at SpringSource and wa...