Diffusion & Large Vision Models Workshop

This workshop explores the evolution of computer vision from early classification models to modern generative systems powered by diffusion and large vision models. Through a mix of theory and practical insights, learners will understand how these models work and how they’re applied in real-world scenarios.

  • Overview of key milestones in computer vision, from CNNs to Vision Transformers.
  • Introduction to multimodal learning with models like CLIP that connect vision and language.
  • Deep dive into generative models: autoencoders, GANs, and diffusion models.
  • Controllability and practical applications: inpainting, segmentation, text-to-image, and video generation.

By the end, participants will gain a strong conceptual understanding of how large vision models are designed, how they generate and edit images, and how they are shaping the future of generative models. 

COURSE PREREQUISITES

  1. College-level Calculus and Linear Algebra.
  2. A basic understanding of Machine Learning concepts.
  3. Prior computer vision and deep learning knowledge is beneficial but not mandatory.

Workshop Event Details

   Date: Friday, August 15, 2025
   Time: 9 AM to 5 PM 
   Learning Format: In-Person
   Location: Stanford University Main Campus
   Credential: Certificate of Completion
   Tuition: $850.00 (Affiliates)
            $850.00 (ICME Nonprofit Collaborators)
            $1,350.00 (Non-Affiliates)

The class will be limited to 40 people. A 10% discount will be applied to groups of 5 or more from the same company. For any questions, please contact Tanya Schornack, Program Manager of Affiliates & Career Services at ICME.

WORKSHOP INSTRUCTORS

Afshine Amidi
Adjunct Lecturer, ICME
Senior Machine Learning Scientist at Netflix

Shervine Amidi, ICME Alum
Adjunct Lecturer, ICME
Senior Software Engineer at Google

WORKSHOP OUTLINE

Part 1. Foundations of Vision Models and Representation Learning

  • Introduction to computer vision.
  • Key vision tasks: classification, generation, editing.
  • Image representations and Convolutional Neural Networks (CNNs).
  • Vision Transformers: architecture, advantages, and performance.

Part 2. Multimodal Embeddings and Generative Models

  • Contrastive learning and the foundation of multimodal understanding.
  • Generative Adversarial Networks (GANs): structure, training, and limitations.
  • Diffusion models: motivation, mechanism, and recent improvements.
  • Enhancing controllability and extending to video generation.

Part 3. Applications, Challenges, and Future Directions

  • Real-world use cases: segmentation, inpainting, text-to-image, and video synthesis.
  • Discussion on scalability, controllability, and interpretability.
  • Reframing the role of large vision models in the broader AI landscape.

FREQUENTLY ASKED QUESTIONS

  1. Why is registration limited? To enhance individual engagement and hands-on learning, the workshop is held in person with a limited number of participants, and priority is given to ICME affiliate members.
  2. How do I know if my company is an ICME affiliate? See the ICME Affiliate Program page to check whether your company is an affiliate: https://icme.stanford.edu/Affiliate-Program
  3. How do I pay for the workshop? After registering, you will receive an email asking for your payment.
  4. Do you offer group discounts? We offer a 10% group discount if an organization registers at least five people. For more information, please contact Tanya Schornack at tschorna@stanford.edu.
  5. When will I receive my certificate? Expect to receive it within one week after the workshop concludes.
  6. What is the refund policy? Refunds are processed on a case-by-case basis. For credit card payments, a 2.9% fee will be deducted from the total amount. To request a refund, please contact ICME at icme-contact@stanford.edu. ICME reserves the right to decline refund requests.
  7. Do I need to bring my own computer to the workshop? Yes. Please bring your own computer, fully charged, as outlets are limited. We will send additional notifications closer to the event about how to access the class materials and the workshop website.
  8. Will lunch be provided during the workshop? Lunch will be provided for all participants during the workshop. If you have any dietary requirements, please let us know.
  9. Where will the workshop be held? The workshop is scheduled to take place on the Stanford main campus. Detailed information will be sent to you after registration.
  10. Where should I park? Visitor parking is available in designated Stanford campus areas. All visitor parking payments are contactless and managed through ParkMobile, where visitor parking can be purchased. Detailed information on the nearest parking lots will be sent to you after registration.

COURSE CANCELLATIONS

ICME reserves the right to cancel a course for any reason. If ICME cancels a course, you will automatically be granted a full refund, including the registration fee and any course fees.