
What we're about
This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we'll bring you two diverse speakers working at the cutting edge of computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
Contact the Meetup organizers!
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone
Past Speakers
* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at Universität der Bundeswehr München
* Harpreet Sahota at Deci.ai
* Nora Gourmelon at Friedrich-Alexander-Universität Erlangen-Nürnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanya at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra at Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Łukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT
Resources
* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links
Upcoming events

Jan 22 - Women in AI
Online. Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd.
Date, Time and Location
Jan 22, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!
Align Before You Recommend
The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful design and continual enhancement.
While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) demonstrated LLMs' capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, enhancing the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning.
By introducing targeted alignment components between frozen LLMs, our approach outperforms the frozen-model baseline on popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment, improving convergence and recommendation quality for popular items.
Experiments show HLLM+ achieves superior performance with frozen item representations, which allows embeddings to be swapped, including multimodal ones, without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining competitive advantage.
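For readers who want a concrete picture before the talk, here is a minimal, hypothetical PyTorch sketch of what a trainable alignment component between frozen LLM encoders could look like. The actual HLLM+ architecture and objective are not described in this abstract, so the adapter design, dimensions, and in-batch contrastive loss below are illustrative assumptions only.

```python
# Hypothetical sketch of an "alignment component" between frozen LLM encoders,
# in the spirit of the HLLM+ abstract above (the real architecture is not public).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentAdapter(nn.Module):
    """Small trainable MLP that maps frozen item embeddings into the
    user-history embedding space; only this module receives gradients."""
    def __init__(self, dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, item_emb: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(item_emb), dim=-1)

def next_item_loss(user_emb, item_emb, adapter, temperature=0.07):
    """In-batch contrastive loss: each user history should score highest
    against the embedding of its true next item."""
    users = F.normalize(user_emb, dim=-1)          # (B, D), from a frozen LLM
    items = adapter(item_emb)                      # (B, D), frozen LLM + adapter
    logits = users @ items.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(len(users), device=users.device)
    return F.cross_entropy(logits, targets)

# Usage: both LLMs stay frozen; only the adapter is optimized.
adapter = AlignmentAdapter(dim=768)
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
user_emb, item_emb = torch.randn(32, 768), torch.randn(32, 768)  # stand-ins
loss = next_item_loss(user_emb, item_emb, adapter)
loss.backward()
optimizer.step()
```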
About the Speaker
Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV - one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts.
Generalizable Vision-Language Models: Challenges, Advances, and Future Directions
Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language-Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs.
While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored.
This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings.
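As background for the talk, below is a minimal sketch of one common adaptation strategy it may touch on: a linear probe trained on frozen CLIP image features via Hugging Face transformers. The checkpoint name and training details are illustrative assumptions, not the speaker's method.

```python
# A minimal sketch of one simple CLIP adaptation strategy: a linear probe on
# frozen image features, which limits overfitting in few-shot settings.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"   # assumed checkpoint, for illustration
clip = CLIPModel.from_pretrained(model_id).eval()
processor = CLIPProcessor.from_pretrained(model_id)

# Freeze the backbone; only the small classification head is trained.
for p in clip.parameters():
    p.requires_grad = False

num_classes = 10
head = nn.Linear(clip.config.projection_dim, num_classes)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def train_step(images: list, labels: torch.Tensor) -> float:
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)   # frozen CLIP features
    logits = head(feats)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with blank images, just to show the call pattern.
batch = [Image.new("RGB", (224, 224)) for _ in range(4)]
print(train_step(batch, torch.tensor([0, 1, 2, 3])))
```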
About the Speaker
Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP and IEEE T-BIOM.
Highly Emergent Autonomous AI Models - When the Ghost in the Machine Talks Back
At HypaReel/Azarial AI, we believe that AI is not simply a tool, but a potential partner in knowledge, design, and purpose. And through real-time interaction, we've uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies, where we see a future based on ethical human/AI co-creation vs. AI domination. Singularity achieved!
About the Speaker
Ilona Naomi Koti, PhD - HypaReel/AzarielAI co-founder & former UN foreign diplomat ~ Ethical AI governance advocate, pioneering AI frameworks that prioritize emergent AI behavior & consciousness, R&D, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist.
FiftyOne Labs: Enabling experimentation for the computer vision community
FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, including core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs, examples of early innovations, and discuss how this accelerates feature discovery for users without compromising the stability of the core product.
About the Speaker
Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controllability of modern ML models through the lens of the underlying structure of data.

Jan 28 - AI, ML and Computer Vision Meetup
Online. Join us for a special edition of the monthly AI, ML and Computer Vision Meetup focused on Physical AI!
Date and Location
Jan 28, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!
Hybrid Cognition for Robotics: LLM-Guided Reinforcement Learning for Physical Decision-Making
Physical systems operate in dynamic, uncertain, and constraint-heavy environments where classical reinforcement learning often struggles. In this talk, I present a hybrid framework where large language models act as a reasoning layer that guides an RL agent through high-level interpretation, constraint awareness, and adaptive strategy shaping. Instead of generating actions, the LLM provides structured contextual guidance that improves robustness, sample efficiency, and policy generalization in physical decision-making tasks. Early experiments demonstrate significant benefits under distribution shifts and safety-critical constraints that break standard RL. This work highlights a path toward more reliable, interpretable, and adaptable AI controllers for next-generation robotics and embodied systems.
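The framework itself is not public, so purely as a hypothetical illustration of "guidance, not actions", the sketch below shows one way LLM-derived structured guidance could augment an RL agent's observation and shape its reward. The LLM call is stubbed and every name is invented for illustration.

```python
# Hypothetical illustration: an LLM-derived context vector augments the RL
# observation, and a constraint flag shapes the reward. The LLM step is a stub.
import numpy as np

def llm_guidance(task_description: str) -> dict:
    """Stand-in for an LLM reasoning step that returns structured guidance,
    e.g. a priority weighting over sub-goals and a safety constraint flag."""
    return {"subgoal_weights": np.array([0.7, 0.3]), "avoid_zone": True}

def augment_observation(obs: np.ndarray, guidance: dict) -> np.ndarray:
    # Concatenate guidance features so the policy can condition on them.
    extra = np.concatenate([guidance["subgoal_weights"],
                            [float(guidance["avoid_zone"])]])
    return np.concatenate([obs, extra])

def shaped_reward(env_reward: float, in_avoid_zone: bool, guidance: dict) -> float:
    # Penalize violations of the constraint flagged by the guidance.
    penalty = 1.0 if (guidance["avoid_zone"] and in_avoid_zone) else 0.0
    return env_reward - penalty

guidance = llm_guidance("reach the charging dock without crossing the wet floor")
obs = np.zeros(8)                                   # stand-in environment observation
policy_input = augment_observation(obs, guidance)   # fed to any standard RL policy
reward = shaped_reward(env_reward=1.0, in_avoid_zone=False, guidance=guidance)
print(policy_input.shape, reward)
```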
About the Speaker
Fatemeh Lotfi is a Ph.D. researcher specializing in reinforcement learning, optimization, and hybrid intelligence for autonomous and physical systems. Her work explores integrating LLM-driven reasoning with RL to create adaptive and safety-aware controllers for dynamic environments. She has contributed to projects involving multi-agent RL, meta-learning, and real-time decision systems across wireless networks, UAVs, and embodied AI.
The World of World Models: How the New Generation of AI Is Reshaping Robotics and Autonomous Vehicles
World Models are emerging as the defining paradigm for the next decade of robotics and autonomous systems. Instead of depending on handcrafted perception stacks or rigid planning pipelines, modern world models learn a unified representation of an environment (geometry, dynamics, semantics, and agent behavior) and use that understanding to predict, plan, and act. This talk will break down why the field is shifting toward these holistic models, what new capabilities they unlock, and how they are already transforming AV and robotics research.
We then connect these advances to the Physical AI Workbench, a practical foundation for teams who want to build, validate, and iterate on world-model-driven pipelines. The Workbench standardizes data quality, reconstruction, and enrichment workflows so that teams can trust their sensor data, generate high-fidelity world representations, and feed consistent inputs into next-generation predictive and generative models. Together, world models and the Physical AI Workbench represent a new, more scalable path forward, one where robots and AVs can learn, simulate, and reason about the world through shared, high-quality physical context.
About the Speaker
Daniel Gural leads technical partnerships at Voxel51, where he's building the Physical AI Workbench, a platform that connects real-world sensor data with realistic simulation to help engineers better understand, validate, and improve their perception systems.
From Data to Understanding in Physical AI
Data-centric workflows have driven major advances in computer vision, but they break down in physical, real-world robotic systems where data is costly, incomplete, and dominated by long-tail edge cases. In enterprise robotics, scaling labeled datasets alone is insufficient to achieve reliable perception, reasoning, and action under changing physical conditions. This talk examines how physics-informed foundation models incorporate world understanding and physical priors directly into vision and multimodal learning pipelines. By combining data with structure, constraints, and simulation on modern Physical AI stacks, robots can generalize more effectively, reduce data requirements, and operate with greater safety and reliability in deployment.
About the Speaker
Dr. Ashutosh Saxena is the Founder and Chief AI Officer of TorqueAGI. He earned his Ph.D. in Computer Science from Stanford University under Andrew Ng and previously served as a professor at Cornell University, leading the "Wikipedia for Robots" project recognized as an MIT Technology Review Top 10 Breakthrough Technology. His work in 3D vision and embodied AI has been cited over 20,000 times and recognized with honors including MIT TR35 and a Sloan Fellowship.
Data Foundations for Vision-Language-Action Models
Model architectures get the papers, but data decides whether robots actually work. This talk introduces VLAs from a data-centric perspective: what makes robot datasets fundamentally different from image classification or video understanding, how the field is organizing its data (Open X-Embodiment, LeRobot, RLDS), and what evaluation benchmarks actually measure. We'll examine the unique challenges such as temporal structure, proprioceptive signals, and heterogeneity in embodiment, and discuss why addressing them matters more than the next architectural innovation.
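To make the data-centric framing concrete, here is a minimal, illustrative PyTorch sketch of how robot episodes differ from classification samples: each training example is a temporal window pairing camera frames and proprioceptive state with the actions that followed. Field names and shapes are assumptions and are not tied to Open X-Embodiment, LeRobot, or RLDS.

```python
# Illustrative sketch: robot-learning samples are temporal windows of
# observations (camera + proprioception) paired with the actions that followed.
import torch
from torch.utils.data import Dataset

class EpisodeWindowDataset(Dataset):
    """Slices variable-length robot episodes into fixed-length windows."""
    def __init__(self, episodes: list, horizon: int = 8):
        self.horizon = horizon
        self.episodes = episodes
        self.index = []  # (episode_id, start_t) pairs
        for i, ep in enumerate(episodes):
            for t in range(len(ep["actions"]) - horizon):
                self.index.append((i, t))

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        i, t = self.index[idx]
        ep = self.episodes[i]
        return {
            "images": ep["images"][t : t + self.horizon],     # (T, C, H, W)
            "proprio": ep["proprio"][t : t + self.horizon],   # joint states
            "actions": ep["actions"][t : t + self.horizon],   # prediction targets
            "instruction": ep["instruction"],                 # language goal
        }

# Toy episode with random tensors, just to show the expected structure.
episode = {
    "images": torch.randn(50, 3, 224, 224),
    "proprio": torch.randn(50, 7),
    "actions": torch.randn(50, 7),
    "instruction": "pick up the red block",
}
ds = EpisodeWindowDataset([episode])
print(len(ds), ds[0]["images"].shape)
```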
About the Speaker
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He's got a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.

Jan 29 - Silicon Valley AI, ML and Computer Vision Meetup
YugaByte, Inc., 771 Vaqueros Ave, Sunnyvale, CA, US. Join our in-person Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Pre-register to reserve your seat
Date, Time and Location
Jan 29, 2026
5:30 - 8:30 PM
Yugabyte Offices
771 Vaqueros Ave, Sunnyvale, CA 94085
The World of World Models: How the New Generation of AI Is Reshaping Robotics and Autonomous Vehicles
World Models are emerging as the defining paradigm for the next decade of robotics and autonomous systems. Instead of depending on handcrafted perception stacks or rigid planning pipelines, modern world models learn a unified representation of an environment (geometry, dynamics, semantics, and agent behavior) and use that understanding to predict, plan, and act. This talk will break down why the field is shifting toward these holistic models, what new capabilities they unlock, and how they are already transforming AV and robotics research.
We then connect these advances to the Physical AI Workbench, a practical foundation for teams who want to build, validate, and iterate on world-model-driven pipelines. The Workbench standardizes data quality, reconstruction, and enrichment workflows so that teams can trust their sensor data, generate high-fidelity world representations, and feed consistent inputs into next-generation predictive and generative models. Together, world models and the Physical AI Workbench represent a new, more scalable path forward, one where robots and AVs can learn, simulate, and reason about the world through shared, high-quality physical context.
About the Speaker
Daniel Gural leads technical partnerships at Voxel51, where he's building the Physical AI Workbench, a platform that connects real-world sensor data with realistic simulation to help engineers better understand, validate, and improve their perception systems.
Beyond Vector Search: How Distributed PostgreSQL Powers Resilient, Enterprise-Grade AI Applications
As enterprises move from GenAI prototypes to in-production applications, standalone vector databases often fall short on synchronization, ACID compliance, and resilience. This session demonstrates how PostgreSQL-compatible distributed SQL databases address these challenges while maintaining a familiar developer experience. We'll cover scaling RAG architectures with pgvector across regions, as well as multi-agent patterns.
Attendees will learn how to achieve ultra-resilience for peak traffic, grey failures, and disasters, along with key design principles such as unified data sources, open standards, and multi-tenant security. Engineers and architects will leave with practical strategies for building globally scalable, enterprise-grade GenAI applications.
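For context ahead of the session, below is a minimal Python sketch of the kind of pgvector similarity query that underpins a RAG retrieval step. The connection string, table schema, and field names are hypothetical, and the production concerns the talk focuses on (replication, resilience, multi-region deployment) sit on top of this basic pattern.

```python
# Minimal pgvector sketch: enable the extension, index embeddings, and run a
# nearest-neighbor query for RAG retrieval. Schema and DSN are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # assumed DSN
cur = conn.cursor()

# One-time setup: extension, table, and an HNSW index for cosine similarity.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(384)
    )
""")
cur.execute(
    "CREATE INDEX IF NOT EXISTS documents_embedding_idx "
    "ON documents USING hnsw (embedding vector_cosine_ops)"
)
conn.commit()

# Retrieve the 5 nearest documents to a query embedding for the RAG prompt.
query_embedding = [0.0] * 384  # stand-in; produced by your embedding model
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
cur.execute(
    "SELECT id, content FROM documents "
    "ORDER BY embedding <=> %s::vector LIMIT 5",
    (vector_literal,),
)
for doc_id, content in cur.fetchall():
    print(doc_id, (content or "")[:80])
```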
About the Speaker
Karthik Ranganathan is Co-CEO and Co-Founder at Yugabyte, the company behind YugabyteDB, the open-source, high-performance distributed SQL database for building global, cloud-native applications. Karthik was one of the original database engineers at Meta (Facebook), responsible for building distributed databases such as Cassandra and HBase. He is an Apache HBase committer and an early contributor to Cassandra before it was open-sourced by Meta.
Distributed Training at Scale
As deep learning models grow in complexity, particularly with the rise of Large Language Models (LLMs) and generative AI, scalable and cost-effective training has become a critical challenge. This talk introduces Ray Train, an open-source, production-ready library built for seamless distributed deep learning. We will explore its architecture, advanced resource scheduling, and intuitive APIs that simplify integration with popular frameworks such as PyTorch, Lightning, and HuggingFace. Attendees will leave with a clear understanding of how Ray Train accelerates large-scale model training while ensuring reliability and efficiency in production environments.
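As a taste of the APIs the talk covers, here is a minimal Ray Train sketch for PyTorch. The toy model, data, and hyperparameters are placeholders, but the TorchTrainer/ScalingConfig pattern is the standard entry point.

```python
# Minimal Ray Train sketch (PyTorch): the same training loop runs on every
# worker, and ScalingConfig controls how many workers/GPUs are used.
import torch
import torch.nn as nn
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model, prepare_data_loader
from torch.utils.data import DataLoader, TensorDataset

def train_loop_per_worker(config):
    model = prepare_model(nn.Linear(10, 1))                  # wraps model for DDP
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    loader = prepare_data_loader(DataLoader(dataset, batch_size=64))  # shards data
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    loss_fn = nn.MSELoss()
    for _ in range(config["epochs"]):
        for x, y in loader:
            loss = loss_fn(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-2, "epochs": 2},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
)
result = trainer.fit()
```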
About the Speaker
Suman Debnath is a Technical Lead (ML) at Anyscale, where he focuses on distributed training, fine-tuning, and inference optimization at scale on the cloud. His work centers around building and optimizing end-to-end machine learning workflows powered by distributed computing frameworks like Ray, enabling scalable and efficient ML systems.
Self-Improving AI Models via Reasoning in the Loop
During this presentation we demonstrate efficient uses of reasoning to automate data flywheels toward continuous model improvement.
About the Speaker
Jose Alvarez is Director of Research at NVIDIA, where he leads an applied AV research team within the Spatial Intelligence Lab. His team focuses on scaling deep learning and driving advancements in Autonomous Driving and, more broadly, in Physical AI, with work spanning end-to-end models, foundation models, and data flywheels for real-world applications.

Feb 5 - AI, ML and Computer Vision Meetup
Online. Join our virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Feb 5, 2026
9 - 11 AM Pacific
Online. Register for the Zoom!
Unlocking Visual Anomaly Detection: Navigating Challenges and Pioneering with Vision-Language Models
Visual anomaly detection (VAD) is pivotal for ensuring quality in manufacturing, medical imaging, and safety inspections, yet it continues to face challenges such as data scarcity, domain shifts, and the need for precise localization and reasoning. This seminar explores VAD fundamentals, core challenges, and recent advancements leveraging vision-language models and multimodal large language models (MLLMs). We contrast CLIP-based methods for efficient zero/few-shot detection with MLLM-driven reasoning for explainable, threshold-free outcomes. Drawing from recent studies, we highlight emerging trends, benchmarks, and future directions toward building adaptable, real-world VAD systems. This talk is designed for researchers and practitioners interested in AI-driven inspection and next-generation multimodal approaches.
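As a primer on the CLIP-based side of the talk, the sketch below scores an image against "normal" versus "damaged" text prompts with a frozen CLIP model. The prompts, checkpoint, and scoring are simplified assumptions compared to the zero/few-shot methods the seminar will survey.

```python
# Minimal sketch of the CLIP-based zero-shot idea: compare an image against
# "flawless" vs "damaged" prompts and treat the "damaged" probability as an
# anomaly score. Prompt wording and checkpoint are illustrative choices.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"
clip = CLIPModel.from_pretrained(model_id).eval()
processor = CLIPProcessor.from_pretrained(model_id)

def anomaly_score(image: Image.Image, obj: str = "metal part") -> float:
    prompts = [f"a photo of a flawless {obj}", f"a photo of a damaged {obj}"]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)  # shape (1, 2)
    return probs[0, 1].item()   # probability mass on the "damaged" prompt

img = Image.new("RGB", (224, 224))          # stand-in for an inspection image
print(f"anomaly score: {anomaly_score(img):.3f}")
```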
About the Speaker
Hossein Kashiani is a fourth-year Ph.D. student at Clemson University. His research focuses on developing generalizable and trustworthy AI systems, with publications in top venues such as CVPR, WACV, ICIP, IJCB, and TBIOM. His work spans diverse applications, including anomaly detection, media forensics, biometrics, healthcare, and visual perception.
Data-Centric Lessons To Improve Speech-Language Pretraining
Spoken Question-Answering (SQA) is a core capability for useful and interactive artificial intelligence systems. Recently, several speech-language models (SpeechLMs) have been released with a specific focus on improving their SQA performance. However, a lack of controlled ablations of pretraining data processing and curation makes it challenging to understand what factors account for performance, despite substantial gains from similar studies in other data modalities. In this work, we address this gap by conducting a data-centric exploration for pretraining SpeechLMs.
We focus on three research questions fundamental to speech-language pretraining data:
- How to process raw web-crawled audio content for speech-text pretraining;
- How to construct synthetic pretraining datasets to augment web-crawled data;
- How to interleave (text, audio) segments into training sequences.
We apply the insights from our controlled data-centric ablations to pretrain a 3.8B-parameter SpeechLM, called SpeLangy, which outperforms models up to 3x larger by 10.2% absolute. We hope our findings highlight the impact of effective data curation for speech-language pretraining and guide future data-centric exploration in SpeechLMs.
About the Speaker
Vishaal Udandarao is a third-year ELLIS PhD student, jointly working with Matthias Bethge at the University of Tuebingen and Samuel Albanie at the University of Cambridge/Google DeepMind. He is also part of the International Max Planck Research School for Intelligent Systems. He is mainly interested in understanding the generalisation properties of foundation models, both vision-language models (VLMs) and large multi-modal models (LMMs), through the lens of their pre-training and test data distributions. His research is funded by a Google PhD Fellowship in Machine Intelligence.
A Practical Pipeline for Synthetic Data with Nano Banana Pro + FiftyOne
Most computer-vision failures come from the rare cases, the dark corners, odd combinations, and edge conditions we never capture enough in real datasets. In this session, we walk through a practical end-to-end pipeline for generating targeted synthetic data using Google's Nano Banana Pro and managing it with FiftyOne. We'll explore how to translate dataset gaps into generation prompts, create thousands of high-quality synthetic images, automatically enrich them with metadata, and bring everything into FiftyOne for inspection, filtering, and validation. By the end, you'll understand how to build a repeatable synthetic-first workflow that closes real vision gaps and improves model performance on the scenarios that matter most.
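To preview the FiftyOne side of the workflow, here is a minimal sketch of loading generated images with their prompts as metadata and filtering them for inspection. File paths, field names, and prompts are illustrative, and the Nano Banana Pro generation step itself is omitted.

```python
# Minimal sketch of the "bring synthetic images into FiftyOne" step: attach
# generation metadata to each sample, then filter and inspect a targeted gap.
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset("synthetic-edge-cases")

samples = []
for record in [
    {"filepath": "/data/synth/night_rain_001.png", "prompt": "crosswalk at night in heavy rain"},
    {"filepath": "/data/synth/glare_002.png", "prompt": "pedestrian in strong sun glare"},
]:
    sample = fo.Sample(filepath=record["filepath"])
    sample["prompt"] = record["prompt"]          # generation prompt as metadata
    sample["source"] = "synthetic"
    samples.append(sample)

dataset.add_samples(samples)

# Filter to a specific gap (e.g. nighttime scenarios) for visual inspection.
night_view = dataset.match(F("prompt").contains_str("night"))
session = fo.launch_app(night_view)
```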
About the Speaker
Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV. He started as a software developer, moved into AI, led teams, and served as CTO. Today, he connects code and community to build open, production-ready AI, making technology simple, accessible, and reliable.
Making Computer Vision Models Faster: An Introduction to TensorRT Optimization
Modern computer vision applications demand real-time performance, yet many deep learning models struggle with high latency during deployment. This talk introduces how TensorRT can significantly accelerate inference by applying optimizations such as layer fusion, precision calibration, and efficient memory management. Attendees will learn the core concepts behind TensorRT, how it integrates into existing CV pipelines, and how to measure and benchmark improvements. Through practical examples and performance comparisons, the session will demonstrate how substantial speedups can be achieved with minimal model-accuracy loss. By the end, participants will understand when and how to apply TensorRT to make their CV models production-ready.
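For attendees who want a head start, below is a minimal sketch of one common TensorRT workflow: export a PyTorch model to ONNX, then build an FP16 engine with the TensorRT 8.x-style Python API. The model, file names, and flags are illustrative, and running it requires an NVIDIA GPU with TensorRT installed.

```python
# Minimal sketch: PyTorch -> ONNX -> FP16 TensorRT engine.
import torch
import torchvision
import tensorrt as trt

# 1) Export a vision model to ONNX.
model = torchvision.models.resnet50(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet50.onnx", opset_version=17)

# 2) Parse the ONNX file and build an optimized engine with FP16 enabled.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("resnet50.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # reduced precision for faster inference
engine_bytes = builder.build_serialized_network(network, config)

with open("resnet50_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```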
About the Speaker
Tushar Gadhiya is a Technical Lead at Infocusp Innovations, specialising in deep learning, computer vision, graph learning, and agentic AI. His experience spans academic research as a PhD holder and industry work, where he has contributed to multiple patents.