Workshop – Big Data: The Leading Ways to Improve Business with Data Science (Non-Technical)
Wednesday, May 19 – Livestream
Full day: 8:00am – 3:00pm PDT
Intended Audience: Managers, decision makers, practitioners, and professionals interested in a broad overview and introduction
Knowledge Level: All levels
Attendees will receive an electronic copy of the course notes and materials
Attend this management- and executive-level workshop for a conceptual overview of today’s emerging trends, capacities, and opportunities.
“Big Data” is everywhere and “data science” is all the rage. These topics are impacting every industry and institution. Big excitement about big data comes from the intersection of dramatic increases in computing power and data storage with growing streams of data coming from almost every person and process on Earth. The pressing question is how best to extract value from all this data – what should we do with it?
Working with big data and applying data science effectively depends on understanding the sources of data and the issues in storing and analyzing it:
- Where does big data come from?
- How do you manage, store, and compute on big data?
- What qualifies as “big”?
This one-day workshop reviews major big data success stories that have transformed businesses and created new markets.
Marc will cover these revealing stories in order to illustrate the key concepts, tools, and value-proven applications driving the big data revolution.
“Big data” is an open buzzword – it could be defined as any amount of data you can’t afford to handle – but the big, newfound value achieved by computing at scale is no fad.
What you will learn:
- Where does big data come from: Common sources of big data.
- What makes data big: Velocity, Variety, and Volume!
- How can we leverage it: Open tools and platforms for storing and analyzing big data.
- The new paradigm: Today’s shift from hypothesis testing to a broad exploration for correlations is a revolutionary change in the way data is explored.
- Where data science comes into play: Best practices in the field.
- Best practices for analyzing big data: Key methods in data science, predictive analytics, machine learning, and network and text analytics to analytically learn from data.
- Social Data: Finding key connections in webs of people and events.
- Applications of big data insights to business.
- Future directions in big data: bigger, bolder, and better.
Schedule:
- Workshop starts at 8:00am PDT
- AM Break from 9:30 – 9:45am PDT
- Lunch Break from 11:00 – 11:45am PDT
- PM Break from 1:15 – 1:30pm PDT
- Workshop ends at 3:00pm PDT
Marc Smith, Chief Social Scientist, Connected Action Consulting Group
Dr. Marc A. Smith is a sociologist specializing in the social organization of online communities and computer-mediated interaction. Smith leads the Connected Action consulting group. Smith co-founded the Social Media Research Foundation (http://www.smrfoundation.org/), a non-profit devoted to open tools, data, and scholarship related to social media research. He contributes to the open and free NodeXL project (http://nodexl.codeplex.com), which adds social network analysis features to the familiar Excel spreadsheet. NodeXL enables social network analysis of email, Twitter, Flickr, WWW, Facebook, and other network data sets. Along with Derek Hansen and Ben Shneiderman, he is the co-author and editor of Analyzing Social Media Networks with NodeXL: Insights from a Connected World, published by Morgan Kaufmann, a guide to mapping connections created through computer-mediated interactions. Smith has published extensively on social media, providing a map to the landscape of connected communities on the Internet.
Vladimir Barash, Director, Graphika
Vladimir Barash is Director of Graphika Labs. He received his Ph.D. from Cornell University, where he studied Information Science and wrote his thesis on the flow of rumors and virally marketed products through social networks. At Graphika, Vladimir’s research focuses on deep learning applications of network analysis, detection and deterrence of disinformation operations on networks, and causal mechanisms of large-scale social behavior.
In addition to his research duties, Vladimir has a decade’s experience working with big data, from scientific computing (MATLAB, SciPy) to parallel processing technologies (Hadoop / Hive) to data storage and pipelining (Redis, MongoDB, MySQL) at the terabyte scale. At Graphika, Vladimir has co-designed and implemented systems that process tens of millions every six hours to deliver timely information on influencers and conversation leaders in online communities tailored to client interests. Vladimir is proficient in over a dozen programming languages and frameworks and has designed production-ready systems for every stage of big data analysis, from collection to client-facing presentation via web, spreadsheet, or graphic visualization.
Vladimir has been active in the Social Media Research Foundation (SMRF) and the NodeXL project, helping build a network analysis package that brings relational data analysis at scale to the fingertips of any interested user, without requiring specialized knowledge or technical training beyond familiarity with Microsoft Excel. NodeXL has enabled users in academia, industry and the general public to analyze tens of thousands of social networks, from networks of politicians voting on bills to networks of motorcycle enthusiasts working together. As part of his work with SMRF and the NodeXL team, Vladimir has contributed a chapter on Twitter analysis to Analyzing Social Media Networks with NodeXL: Insights from a Connected World.
Vladimir’s work has received awards at the International Conference on Weblogs and Social Media and Bits on Our Minds. He has presented his research at academic and industrial campuses all over North America and Europe, including Xerox/PARC, Microsoft, Colgate University, Northeastern University, UMCP, and Oxford University (Oxford Internet Institute). He currently resides in Somerville, MA.