Data Centric Deep Learning
Learn to build, improve, and repair deep learning models with a data-centric approach. This course puts you in the shoes of a deep learning engineer and simulates the real-world challenges of improving data quality, building and testing deep learning models, and improving performance with a human in the loop. Week by week, we will develop an understanding of the critical role of data in deep learning operations, from integration tests to deep learning tooling to iterative annotation. Learn the best practices for deep learning in the real world.
Course taught by expert instructors
Senior Manager at Apple and Instructor at Stanford University
Andrew Maas is co-founder and CEO of Pointable, a platform for metrics-driven development of RAG-LLM conversational agents. He previously led teams developing data-centric deep learning approaches at Apple and was a co-founder of Roam Analytics (acquired by Parexel) -- a natural language extraction platform for healthcare. Andrew earned a PhD in computer science from Stanford University, advised by Andrew Ng and Dan Jurafsky, where his work focused on large-scale deep learning for spoken and written language. Andrew also advises machine learning startups and teaches a graduate course on spoken language processing at Stanford.
PhD Scholar at Stanford
Mike Wu is currently a fifth-year PhD student at Stanford University advised by Noah Goodman. His research spans the fields of inference algorithms, deep generative models, and unsupervised learning. Mike’s research has appeared in NeurIPS, ICLR, AISTATS, and other top ML conferences, earning two best paper awards, and his work has been featured in the New York Times. Mike previously worked as a software engineer at an AI startup called Lattice Data, and as a research engineer at Meta’s applied machine learning group. Mike and Andrew designed and taught a new version of Stanford’s CS224S: Spoken Language Processing in 2022.
Learn and apply skills with real-world projects.
Students who want to learn the infrastructure and operations behind practical deep learning for real-world applications.
Students who have taken the first two courses in the Uplimit ML foundations track.
Data scientists and research engineers looking for best practices in building and maintaining deep learning models.
Students curious about the new data-centric approach to ML and AI.
Familiarity with Python and comfort reading documentation to learn new tools. Uplimit's Python for Machine Learning course or equivalent.
Experience in basic machine learning and data science. Uplimit Introduction to Applied ML: Supervised Learning course or equivalent.
Basic web development with tools like Flask. Students do not need to be experts at building web applications.
Basic experience in deep learning, including using PyTorch. Uplimit Deep learning essentials, ML Coursera course, or equivalent.
- How to inspect and improve data quality and annotation quality.
- How to identify and remove data anomalies or outliers.
- The types of annotation errors and their effects on model performance.
- Data analysis in NLP and computer vision.
- Simulations of annotation errors and a model evaluation framework.
- Annotation analysis for (1) a bounding-box task for object detection and (2) a text span task for entity recognition.
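As a rough illustration of what simulating annotation errors can look like, the sketch below flips a fixed fraction of labels to a different class and then scores a perfect predictor against the noisy labels. The function names, the uniform flipping scheme, and the 20% noise rate are assumptions made for this example, not the course's actual framework.

```python
import random


def simulate_label_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` with a fixed fraction flipped to another class.

    Hypothetical helper: uniform random flips are only one of several
    possible annotation-error models.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        # Choose among the other classes so a flip always changes the label.
        other_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(other_classes)
    return noisy


def accuracy(predictions, targets):
    """Fraction of positions where prediction and target agree."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


# Synthetic "ground truth" for a 10-class task (made-up data).
clean_labels = [i % 10 for i in range(1000)]
noisy_labels = simulate_label_noise(clean_labels, num_classes=10, noise_rate=0.2)

# A model that is always right still looks wrong on every flipped example
# when it is evaluated against the noisy annotations.
apparent_accuracy = accuracy(clean_labels, noisy_labels)
print(f"apparent accuracy of a perfect model: {apparent_accuracy:.2f}")
```

The point of the exercise is that evaluation against noisy labels understates model quality, which is one reason annotation quality is measured before model quality.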
- Train deep learning models in two different modalities: text and images.
- How to construct reproducible end-to-end machine learning workflows.
- How to finetune small networks on top of foundation models in computer vision.
- Post-training processing (such as exporting, tracking, compression) of deep learning models for deployment.
- Best practices for continuous testing of deep learning models.
- Comfort with popular deep learning tools like Weights & Biases, ONNX, and FastAPI.
- Integration tests, regression tests, and directionality tests for model quality assurance.
- A Metaflow pipeline that chains together training, evaluation, and deployment on a benchmark dataset of handwritten digits.
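To make the testing vocabulary above a little more concrete, here is a minimal sketch of a regression test and a directionality test. The keyword-based `toy_sentiment_score` is a hypothetical stand-in for a trained model, and the test helpers are illustrative names rather than part of any particular library.

```python
import math

POSITIVE_WORDS = {"great", "good", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}


def toy_sentiment_score(text):
    """Stand-in for a trained model: returns P(positive) in (0, 1)."""
    words = text.lower().split()
    logit = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    return 1 / (1 + math.exp(-logit))


def regression_test(model, cases):
    """Check fixed examples (for instance, past failures) still pass.

    `cases` is a list of (text, expected_is_positive) pairs.
    """
    return all((model(text) > 0.5) == expected for text, expected in cases)


def directionality_test(model, text, suffix, direction):
    """Appending `suffix` should move the score up (+1) or down (-1)."""
    delta = model(text + " " + suffix) - model(text)
    return delta * direction > 0


regression_ok = regression_test(
    toy_sentiment_score,
    [("the food was great", True), ("the service was awful", False)],
)
directionality_ok = directionality_test(
    toy_sentiment_score, "the food was good", "but awful", direction=-1
)
print(regression_ok, directionality_ok)
```

In practice, checks like these would typically run in a test suite on every retrained model, so that a new version cannot silently reintroduce an old failure.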
- The role of active learning and self-learning in a deep learning framework.
- How to use unlabeled data and model uncertainty to improve performance.
- Best practices for designing web applications with embedded ML models.
- Tools to identify which examples to prioritize for labeling.
- Tools to noisily label large batches of data quickly without a third party service.
- A lightweight web application in Flask that supports human-in-the-loop labeling.
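One common way to use model uncertainty, sketched below under simplifying assumptions, is to send the highest-entropy predictions to a human labeler and to keep only very confident predictions as pseudo-labels. The example IDs, probabilities, and the 0.95 threshold are made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, k):
    """Active learning: return the k example IDs the model is least sure about."""
    ranked = sorted(predictions, key=lambda ex: entropy(predictions[ex]), reverse=True)
    return ranked[:k]


def pseudo_label(predictions, threshold=0.95):
    """Self-training: keep the argmax class only for very confident predictions."""
    labels = {}
    for example_id, probs in predictions.items():
        confidence = max(probs)
        if confidence >= threshold:
            labels[example_id] = probs.index(confidence)
    return labels


# Hypothetical softmax outputs for four unlabeled examples.
unlabeled = {
    "img_a": [0.98, 0.01, 0.01],
    "img_b": [0.34, 0.33, 0.33],
    "img_c": [0.70, 0.20, 0.10],
    "img_d": [0.50, 0.49, 0.01],
}

to_label = select_for_labeling(unlabeled, k=2)
auto_labels = pseudo_label(unlabeled)
print(to_label)
print(auto_labels)
```

Entropy is only one uncertainty measure; margin between the top two classes or disagreement across an ensemble are alternatives with the same overall shape.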
- How to identify and handle distribution shift and adversarial examples.
- The different types of distribution shift in NLP and computer vision.
- Data augmentation techniques for model robustness.
- Leverage the implemented workflows to quickly retrain and deploy a model.
- Pipeline to handle the appearance of a new label class.
- Repair models in response to adversarial examples in a visual classification task with outlier image watermarks.
- Monitoring tools to track model performance and detect distribution shifts.
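A simple monitoring check, assuming only that predicted labels are logged, is to compare the class distribution of a recent window of predictions against a reference window. The sketch below uses total variation distance with an arbitrary alert threshold of 0.1; both the metric choice and the threshold are assumptions for illustration.

```python
from collections import Counter


def class_distribution(labels, classes):
    """Relative frequency of each class in a window of predicted labels."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def shift_score(reference, live):
    """Distance between reference and live windows over the union of classes."""
    classes = sorted(set(reference) | set(live))
    return total_variation(
        class_distribution(reference, classes),
        class_distribution(live, classes),
    )


ALERT_THRESHOLD = 0.1  # hypothetical value; tune on historical data

reference_window = [0] * 50 + [1] * 50
stable_window = [0] * 48 + [1] * 52
# A previously unseen class (2) appears alongside a change in class balance.
shifted_window = [0] * 20 + [1] * 60 + [2] * 20

stable_score = shift_score(reference_window, stable_window)
shifted_score = shift_score(reference_window, shifted_window)
print(stable_score > ALERT_THRESHOLD, shifted_score > ALERT_THRESHOLD)
```

This only detects changes in the predicted label mix. Shifts in the inputs that leave predictions unchanged would need a separate check on input features or embeddings.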
A course you'll actually complete. AI-powered learning that drives results.
Transform your learning programs with personalized learning: real-time feedback, hints at just the right moment, and support for learners when they need it, driving 15x engagement.
Live courses by leading experts
Our instructors are renowned experts in AI, data, engineering, product, and business. Deep dive through always-current live sessions and round-the-clock support.
Practice on the cutting edge
Accelerate your learning with projects that mirror the work done at industry-leading tech companies. Put your skills to the test and start applying them today.
Flexible schedule for busy professionals
We know you’re busy, so we made it flexible. Attend live events or review the materials at your own pace. Our course team and global community will support you every step of the way.
Each course comes with a certificate for learners to add to their resume.
15-20x engagement compared to async courses
Support & accountability
You are never alone. We provide support throughout the course.
Get reimbursed by your company
More than half of learners get their courses and memberships reimbursed by their company.
Hundreds of companies have dedicated L&D and education budgets that have covered the costs.
Course success stories
Learn together and share experiences with other industry professionals
This course is incredibly important and useful! I believe it should be required in any data-science curriculum. We gained practical skills to tackle problems that data scientists and machine learning engineers often face when dealing with real-world messy data. I learned so much more than the course material due to the encouragement and guidance of Mike Wu!
DCDL has taken my experience with ML from modeling datasets in Colab notebooks to working in a full ML system in a codebase. We touched upon the full lifecycle of ML — from annotating and cleaning data, to model training, to evaluation and testing, deployment, and monitoring. What an incredibly insightful 4 weeks of learning!
This final course in the ML track series provided a realistic framework bridging the concepts we have covered in all 3 classes into a more productionalized format. This course has given a real insight into what a real ML backend may look like and the steps required to get there.