AI & ML

Multimodal AI with Gemini & GPT-4V

Build AI that sees, reads, and understands

Instructor Shweta Deshpande

Duration 2 hours

Platform Live on Zoom

Level Advanced

Spots Available 10

About This Workshop

Build multimodal applications that process images, text, audio, and video together, using Gemini Pro Vision, GPT-4V, and CLIP. You will also build an AI that reads handwritten notes and generates reports from them.

About Shweta Deshpande

Applied AI Researcher at Google India. Worked on Gemini multimodal capabilities for Indian languages and scripts.

Price ₹99