Workshop – Supercharging Prediction with Ensemble Models
Thursday, June 7, 2018 in Las Vegas
Full-day: 8:30am – 4:30pm
Room: Emperors II
- Practitioners: Analysts who would like to learn theoretical principles of and practical tips for how to build model ensembles.
- Technical Managers: Project leaders and managers who are responsible for developing predictive analytics solutions and want to understand the potential value and limitations of model ensembles.
Knowledge Level: Beginning to intermediate understanding of statistical methods or predictive modeling algorithms.
Once you know the basics of predictive analytics and machine learning (including data exploration, data preparation, model building, and model evaluation), what can be done to improve model accuracy? One key technique is the use of model ensembles, which combine several or even thousands of models into a single, new model score. It turns out that model ensembles are usually more accurate than any single model, and they are typically more fault tolerant than single models.
Are model ensembles an algorithm or an approach? How can one understand the influence of key variables in the ensembles? Which options affect the ensembles most? This workshop dives into the key ensemble approaches, including Bagging, Random Forests, and Stochastic Gradient Boosting. Attendees will learn best practices and will see firsthand how the various options influence ensemble models, gaining a deeper qualitative understanding of how the algorithms work and how to interpret the resulting models. Attendees will also learn how to automate the building of ensembles by changing key parameters.
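To make the ensemble idea concrete, here is a minimal sketch of bagging in plain Python. It is not taken from the workshop materials, which use SPM and KNIME. The synthetic data, the function names (`fit_stump`, `bagging`, `ensemble_predict`, `mse`), and the parameter values are all assumptions chosen for illustration. It fits one-split regression trees on bootstrap samples, averages their predictions, and loops over the number of models to show how a key parameter can be varied automatically.

```python
import random
import statistics


def fit_stump(points):
    """Fit a one-split regression tree (a stump) on (x, y) pairs."""
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean = statistics.fmean(left)
        right_mean = statistics.fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Degenerate sample with a single distinct x: predict the mean.
        mean = statistics.fmean([y for _, y in points])
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagging(points, n_models, rng):
    """Train n_models stumps, each on a bootstrap sample of the data."""
    return [
        fit_stump([rng.choice(points) for _ in points])
        for _ in range(n_models)
    ]


def ensemble_predict(models, x):
    """Combine the individual model scores into one score by averaging."""
    return statistics.fmean([model(x) for model in models])


def mse(predict, points):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return statistics.fmean([(predict(x) - y) ** 2 for x, y in points])


rng = random.Random(7)  # fixed seed so the run is repeatable


def make_data(n):
    # Hypothetical data: a noisy quadratic on [0, 1].
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 0.1)) for x in xs]


train, test = make_data(200), make_data(200)

# Sweep one key parameter (ensemble size) and record test errors.
results = {}
for n_models in (1, 10, 50):
    models = bagging(train, n_models, rng)
    ensemble_error = mse(lambda x: ensemble_predict(models, x), test)
    average_single_error = statistics.fmean([mse(m, test) for m in models])
    results[n_models] = (ensemble_error, average_single_error)
    print(n_models, ensemble_error, average_single_error)
```

For squared error, the averaged prediction can never do worse than the average error of its member models, which is one reason simple averaging is a reasonable default. How much it improves on the best single model depends on the data and on how diverse the members are.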
Participants are expected to know the principles of predictive analytics and how the most important algorithms in predictive analytics work (like decision trees, neural networks, regression, etc.).
Course Notes and Free Textbook
All data referenced in the workshop will be provided on a USB drive and will also be made available via an internet link. Electronic copies of the workshop notebook will be distributed on the USB drive when attendees arrive. All attendees will also receive a paperback copy of Dean’s book, Applied Predictive Analytics.
The key concepts covered during this workshop can be applied to many predictive analytics projects regardless of the software used. Live demonstrations using Salford Systems SPM and KNIME will be included in the workshop. Participants will receive an evaluation copy of SPM as part of the registration. KNIME is open source.
Laptops are not required for this course but are recommended for viewing the course slides and taking notes. Participants who would like to experiment with ensembles during the demonstrations may do so with the software provided.
- Software installation (if not already installed): 8:30am
- Workshop program starts at 9:00am
- Morning Coffee Break at 10:30am – 11:00am
- Lunch provided at 12:30pm – 1:15pm
- Afternoon Coffee Break at 3:00pm – 3:30pm
- End of the Workshop: 4:30pm
Coffee breaks and lunch are included.
Dean Abbott, President, Abbott Analytics
Dean Abbott is Co-Founder and Chief Data Scientist of SmarterHQ, and President of Abbott Analytics in San Diego, California. Mr. Abbott is an internationally recognized data mining and predictive analytics expert with over three decades of experience applying advanced data mining algorithms, data preparation techniques, and data visualization methods to real-world problems, including fraud detection, risk modeling, text mining, personality assessment, response modeling, survey analysis, planned giving, and predictive toxicology.
Mr. Abbott is the author of Applied Predictive Analytics (Wiley, 2014) and co-author of IBM SPSS Modeler Cookbook (Packt Publishing, 2013). He is a highly regarded and popular speaker at Predictive Analytics and Data Mining conferences and meetups, and serves on the Advisory Boards for the UC/Irvine Predictive Analytics Certificate and the UCSD Data Mining Certificate programs.
He has a B.S. in Mathematics of Computation from Rensselaer (1985) and a Master of Applied Mathematics from the University of Virginia (1987).