BigDataCamp LA

Session Schedule

Jun 14, 2014 Room Session Name
8:00 am - 9:00 am Lobby Registration, Coffee, Bagels & Networking
9:00 am - 10:25 am Main Hall Big Data on the Bleeding Edge - More, better, faster by Lynn Langit
Lynn Langit
Lynn Langit is a specialist in Big Data technologies (SQL and NoSQL). She lives in Southern California. Lately she's worked with AWS Redshift, Azure storage and Google HR datastore in production. She also works with SQL Server, MongoDB, Neo4j and more. Lynn's published three books on SQL Server Busin...
9:00 am - 10:25 am Main Hall Evolution of the Big Data Stack by Jonathan Hsieh
Jonathan Hsieh
Jonathan Hsieh is a Team Tech Lead and Software Engineer at Cloudera. He is an HBase committer and PMC member, and a committer and founder of the Apache Flume project.
9:00 am - 10:25 am Main Hall Dr. Konstantin "Cos" Boudnik, VP at WANdisco
Konstantin Boudnik
Open Source and Big Data
9:00 am - 10:25 am Main Hall The State of Big Data in LA - Jonathan Gray, CEO & co-founder of Continuuity
Jonathan Gray
10:30 am - 12:00 pm 130 Tutorials (90 mins) - Hadoop Fundamentals by Santosh Jha of Aziksa, Ends at 12:00 PM
Santosh Jha
This session will begin with the motivation for big data and explain the components of Hadoop, including the Hadoop cluster and distributed file system. It covers how to use HDFS distributed storage, what MapReduce is, and its effect on distributed processing. In the second part of the session, each participan...
10:30 am - 11:10 am 138 Sponsored Talk - Apache Tez by Bikas Saha of Hortonworks
Bikas Saha
Abstract - Apache Tez is a modern data processing engine designed for YARN on Hadoop 2. Tez aims to provide high performance and efficiency out of the box, across the spectrum of low latency queries and heavy-weight batch processing. It provides a sophisticated topology API, advanced scheduling and...
10:30 am - 11:10 am 140 Sponsored Talk - Malware Detection Using Spark by Sungwook Yoon of MapR
Sungwook Yoon
The IT environment changes rapidly across high-tech industries. Every year we see new technology stacks advance to serve billions of people worldwide. What lurks behind this blazing pace is new, not yet thoroughly tested technology being exploited by malware writers. Target, Ebay may...
11:15 am - 11:55 am 128 Hadoop - How to Amplify and Proliferate Your Big Data Model by Tim Bezold of Datameer
Abstract - The classic data mining cycle involves steps like data understanding and evaluation that are constantly repeated when training a new model for a specific data-mining problem. When it comes to big data, algorithms are often too complex to involve the full data set in each step of the cycle...
11:15 am - 11:55 am 138 NoSQL - The Imminent Fracture in Corporate Data Architecture: Fast + Big by Scott Jarr of VoltDB
Abstract - This talk will focus on the structural transformation of the Corporate Data Architecture, highlighting the major technology components necessary to make up what will be ubiquitous in the near future. What used to be two separate functions – the application and the analytics – are beginnin...
11:15 am - 11:55 am 140 Data Science - The Future's so bright (You can barely make any predictions about it) by Timothy Shea of DataSift
Abstract - How can we prove if 2 events are related, or if they happen totally by chance? Do our observations about the past tell us anything about the future? And by the way, what kind of foundation is this to be able to Drop Science? In this session we'll look at the influence that social data fro...
12:00 pm - 1:00 pm Cafeteria Lunch
1:00 pm - 1:40 pm 128 Hadoop - Apache Tajo: An open source Big data warehouse system on Hadoop by Hyunsik Choi of Gruter
Abstract - Apache Tajo is an open source big data warehouse system on Hadoop. It became an Apache Top-level project in March 2014. Currently, it provides most SQL features, and we have recently added window functions. In this talk, I'll introduce Apache Tajo and share the curren...
1:00 pm - 1:40 pm 130 Tutorials (90 mins) - Pig: The Prequel to SQL by David Wolcott of Lynda.com, Ends at 2:30 PM
Dave Wolcott
Abstract - This will be a hands-on tutorial of rudimentary commands using utilities in the Hadoop stack. Attendees will land data in the Hadoop File System and manipulate it with Pig and then generate a report with Hive (hence the title of the session) and schedule jobs using Oozie. Alt...
1:00 pm - 1:40 pm 138 NoSQL - Aggregation Options for MongoDB by Asya Kamsky of MongoDB
Asya Kamsky
Abstract - MongoDB scales easily to store mass volumes of data. However, when it comes to making sense of it all, what options do you have? MongoDB has several native tools for processing data. This presentation will focus on the native implementation of the new Aggregation Framework and Map Reduce. This...
1:00 pm - 1:40 pm 140 Data Science at CARD.com: Starting Small with Big (Data) Dreams by Ajay Gopal of CARD.com
How much data should an early start-up collect? How do you grow your infrastructure as the data grows? How do you decide what to test and what not to? Answers to these questions are different for each startup. Ajay will share some lessons learnt over the past year at CARD.com and their big-data vision...
1:45 pm - 2:25 pm 128 NoSQL - Don’t reinvent the big data wheel! Building real-time, big data applications on Cassandra with the open-source Kiji project by Clinton Kelly of WibiData
Abstract - Kiji is an open-source platform that gives developers a head start building Big Data applications on Cassandra. Created by engineers with experience building personalized applications at companies like Google, Kiji includes modules for capturing and analyzing data, and training and a...
1:45 pm - 2:25 pm 138 Hadoop - Hive 0.13: An upgrade in Performance, Scaling, Security and Multi-tenancy by Vikram Dixit K of Hortonworks
Abstract - Over 145 developers representing 44 companies from across the Apache Hive community contributed over 390,000 lines of code to the Hive project in just 13 months, nearly doubling the Hive code base and resulting in the release of Hive 0.13. With these changes, Hive has become faster, more...
1:45 pm - 2:25 pm 140 Data Science - Data Science: Methods & Tools by Szilard Pafka of Epoch & LA Data Meetups
Abstract - This is an overview of the data science field, mixing high-level and technical material at a beginner to intermediate level. I will review the process of analyzing data, the most common set of tools for data munging, data visualization and machine learning, some of the best practices for doing da...
2:30 pm - 3:10 pm 128 Hadoop - Impala Under the Covers by Ahad Rana of Factual
Abstract - A brief exploration of the technical underpinnings of Impala, and how it can be used to provide interactive query capabilities on top of Hadoop data. Factual makes heavy use of Hadoop and other related technologies in all aspects of its daily operations. Bio - Ahad is the Director of E...
2:30 pm - 3:10 pm 130 Big Data - Hybrid Architecture for Integrated User View of Data of different Temperature and Velocity by Peyman Mohajerian of Teradata
Abstract - There are use cases where data is gathered at different velocities and there is a mixture of new and legacy mutable data sets at big data scale. How do we architect a solution that takes all these seemingly contradictory factors into account and still provides a single transparent view to th...
2:30 pm - 3:10 pm 138 Data Science - Call of Data: Navigating a virtual Warzone in Call of Duty by Dylan Rogerson of Activision
Abstract - Call of Duty is the biggest first person shooter video game franchise of all time. With millions of players active every day, Activision’s Game Analytics Team navigates massive amounts of data to better inform game design decisions. Call of Duty’s online multi-player matches take place on...
2:30 pm - 3:10 pm 140 NoSQL - Big Data on the Bleeding Edge - More, Better, Faster
Lynn Langit
What comes next? What are the most innovative developments in Big Data storage and query design? Where is the innovation, and what should you be trying out and looking at? In this talk I'll cover the latest and greatest for the Big Data world - this will include in-memory stores such as Aerospike, tr...
3:15 pm - 3:45 pm Lobby Coffee Break
3:45 pm - 4:25 pm 128 Sponsored Talk - Cloudera and Spark: Fast, Powerful Data Processing in the Enterprise Data Hub by Ben White of Cloudera
Abstract - Apache Spark is an open source, parallel data processing framework that complements Apache Hadoop to make it easy to develop fast, unified Big Data applications combining batch, streaming, and interactive analytics on all your data. Cloudera now offers commercial support for Spark with Cl...
3:45 pm - 4:25 pm 130 Big Data - From Big Data to Big Insight by Dr. Alex Liu of IBM
Abstract - Alex will present a process of using big data technologies to turn big data into big insights with real life examples in financial services and retailing. Specifically, he will discuss how big data can be used to improve predictive models so as to derive more insights for companies. Based on...
3:45 pm - 4:25 pm 138 Sponsored Talk - Big Data and Lynda.com by Subash D'Souza of Lynda.com
Abstract - lynda.com is an online learning company that helps anyone learn software, design, and business skills to achieve their personal and professional goals. The growth of users has brought a proliferation of data that can be used to make lynda.com's user experience more personalized and e...
3:45 pm - 4:25 pm 140 Sponsored Talk - Rethinking SQL for Big data – Don’t compromise on flexibility or performance by Neeraja Rentachintala of MapR
Abstract - Can I reduce the time to value for my business users on Hadoop data? How can I do SQL on semi-structured types? How do I create and manage schemas for my data when the applications are changing fast? What types of distributed systems problems do I have to solve when I move beyond tradit...
4:30 pm - 5:10 pm 128 Data Science - The Role of Data Science in Asking and Answering 'Good' Questions by Eric Kostello of Nielsen
Eric Kostello
Abstract - Engaging in data analysis very often means struggling to produce answers under less than ideal conditions. (Poor quality data, poor understanding of the issue, etc.) I suggest some strategies and approaches that I have found useful for producing analytical results that contribute value. I...
4:30 pm - 5:10 pm 130 Big Data - Introduction To Apache Storm by Joe Rossi of Trace3
Abstract - A session focused on ramping you up on what Apache Storm is, how it works and what it's capable of. We will also look at what Storm-on-YARN brings to the table and some future projects in the Apache Storm space to keep an eye on. Bio - Joe is the Engineering Lead and Big Data Architect...
4:30 pm - 5:10 pm 138 NoSQL - NoSQL on the Amazon Cloud: DynamoDB by Michael Limcaco of Amazon
Abstract - DynamoDB is a fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. Its guaranteed throughput and single-digit millisecond latency make it a great fit for gaming, ad tech, mobile...
4:30 pm - 5:10 pm 140 Hadoop - Yarns about YARN: Migrating to MapReduce v2 by Kathleen Ting of Cloudera
Kathleen Ting
The next generation of MapReduce, YARN, has widely touted job throughput and Apache Hadoop cluster utilization benefits. Less known are the pitfalls littering the migration path to YARN. Learn from our extensive field experience to avoid those pitfalls and get your YARN cluster configured right the...
5:15 pm - 5:55 pm 128 NoSQL - Computable Object Store with OpenStack Swift and ZeroVM by Adrian Otto of Rackspace
Adrian Otto
Abstract - ZeroVM combines with OpenStack Swift through a project called ZeroCloud. ZeroCloud is middleware for Swift and adds a job manager and a ZeroVM daemon installed on the storage nod...
5:15 pm - 5:55 pm 130 Big Data - Empowering Your Customers With Your Big Data by Taylor Dondich of MaxCDN
Abstract - As a service provider, your infrastructure will produce a great deal of information that can help your customers make informed decisions on how to better use your offerings. Find out how to use Big Data and present it to your customers in interesting ways to give them the confidence neede...
5:15 pm - 5:55 pm 138 Data Science - Supervised Learning for Recommendations @ Meetup by Evan Estola of Meetup.com
Abstract - Collaborative Filtering and other common recommendation algorithms are powerful techniques for some scenarios. I will cover how to design a recommendation system from the ground up using an ensemble classifier and supervised learning to avoid some of the pitfalls of collaborative filteri...
5:15 pm - 5:55 pm 140 Hadoop - Wadoop - Xtensible Security Framework for Hadoop by Vivek Shrivastava of Wipro
Vivek Shrivastava
With big data becoming a major player in the technology shift, security and governance are perceived to be a big challenge in most industry verticals, e.g. banking, finance, and insurance. Wadoop, a security framework for Hadoop, will be piloted at this conference and introduced t...
6:00 pm - 7:00 pm Cafeteria Networking Event