
About us

🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we’ll bring you two diverse speakers working at the cutting edge of computer vision.

  • Are you interested in speaking at a future Meetup?
  • Is your company interested in sponsoring a Meetup?

Contact the Meetup organizers!

This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone

📣 Past Speakers

* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at UniversitĂ€t der Bundeswehr MĂŒnchen
* Harpreet Sahota at Deci.ai
* Nora Gourmelon at Friedrich-Alexander-UniversitĂ€t Erlangen-NĂŒrnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanya at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra at Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Ɓukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT

📚 Resources

* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links

Sponsors

Voxel51

Administration, promotion, giveaways and charitable contributions.

Upcoming events

  • Network event
    May 13 - Best of 3DV 2026
    Online
    30 attendees from 16 groups

    Welcome to the Best of 3DV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

    Date, Time and Location

    May 13, 2026
    9AM Pacific
    Online.
    Register for Zoom!

    Material selection in 2D and beyond - methods, tricks and applications

    In this talk, we'll explore image understanding from a material-centric perspective. Materials distinguish themselves by their response to light, which is governed and modelled through physical properties like roughness or gloss; however, understanding such properties is a non-trivial task for current models and network architectures. We'll see how we can select materials similar to a given query material, significantly improve selection fidelity, and eventually venture beyond 2D to enable selection in the 3D domain.

    About the Speaker

    Michael Fischer is a research scientist at Adobe Research London. He obtained his PhD from University College London (UCL), advised by Niloy Mitra and Tobias Ritschel. Michael has authored several top-tier publications (CVPR, ICCV, SIGGRAPH, ...) and is a recipient of both the Meta PhD scholarship and the Rabin Ezra scholarship. His research interests focus on image and scene understanding, material perception, selection and editing, and efficient optimization.

    Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers

    This paper presents LAPA (Look Around and Pay Attention), a novel end-to-end transformer-based architecture for multi-camera point tracking that integrates appearance-based matching with geometric constraints. Traditional pipelines decouple detection, association, and tracking, leading to error propagation and temporal inconsistency in challenging scenarios. LAPA addresses these limitations by leveraging attention mechanisms to jointly reason across views and time, establishing soft correspondences through a cross-view attention mechanism enhanced with geometric priors. Instead of relying on classical triangulation, we construct 3D point representations via attention-weighted aggregation, inherently accommodating uncertainty and partial observations.
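
    The attention-weighted aggregation step can be illustrated with a toy sketch (our own simplification, not the paper's code): each camera view proposes a candidate 3D point with a matching score, and a softmax over the scores replaces hard triangulation, so unreliable views are down-weighted rather than discarded.

```python
import math

def attention_aggregate(candidates, scores):
    """Softmax-weighted average of per-view 3D point candidates.

    candidates: list of (x, y, z) estimates, one per camera view
    scores: matching/confidence logits, one per view
    """
    # Numerically stable softmax over the per-view scores
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Attention-weighted aggregation instead of hard triangulation:
    # low-confidence views contribute little, but none are discarded outright
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, candidates))
        for axis in range(3)
    )

# Two views roughly agree on a point; the third is an unreliable outlier
views = [(1.0, 2.0, 3.0), (1.1, 2.1, 3.0), (5.0, 5.0, 5.0)]
logits = [4.0, 4.0, -2.0]
print(attention_aggregate(views, logits))
```

    With these made-up logits the outlier view receives a near-zero weight, so the aggregate stays close to the two agreeing views.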

    About the Speaker

    Bishoy Galoaa is a PhD student at Northeastern University.

    Gaussian Wardrobe: Compositional 3D Gaussian Avatars for Free-Form Virtual Try-On

    We introduce Gaussian Wardrobe, a novel framework to digitalize compositional 3D neural avatars from multi-view videos. Existing methods for 3D neural avatars typically treat the human body and clothing as an inseparable entity, which fails to capture the dynamics of complex free-form garments and limits the reuse of clothing across different subjects. To overcome these problems, our method decomposes neural avatars into bodies and layers of shape-agnostic neural garments. Our framework learns the geometry and deformations of each garment layer from multi-view videos and normalizes them into a shape-independent space using 3D Gaussians. We demonstrate that these compositional garments contribute to a versatile digital wardrobe, enabling a practical 3D virtual try-on application where clothing can be freely transferred to new subjects.

    About the Speaker

    Hsuan-I Ho obtained his doctoral degree from ETH Zurich, supervised by Prof. Otmar Hilliges and Prof. Marc Pollefeys. His research focuses on human-centric machine perception, including 3D human reconstruction, human modeling, and pose estimation. The goal is to push the boundary of human and machine interaction on future human-centric reasoning and physical AI.

    Consistency Models for 3D Point Cloud Generation

    ConTiCoM-3D is a new method for creating 3D point clouds. It works directly with 3D points and can generate shapes very quickly in only one or two steps. Unlike many older methods, it does not need a separate teacher model or a complex latent space. Tests on typical benchmarks show that it can produce high-quality 3D shapes while being faster than many existing approaches.

    About the Speaker

    Sebastian Eilermann is a PhD student specialising in 3D generative AI. His research focuses on developing advanced methods for creating and understanding three-dimensional content, exploring the intersection of machine learning, computer vision, and generative modelling to enable the generation of more realistic and efficient 3D assets.

    3 attendees from this group
  • Network event
    May 14 - AI, ML and Computer Vision Meetup

    Online
    41 attendees from 16 groups

    Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

    Day, Time, Location

    May 14, 2026
    9:00-11:00 AM Pacific
    Online.
    Register for the Zoom!

    Concept-Aware Batch Sampling Improves Language-Image Pretraining

    What data should a vision-language model be trained on, and who gets to decide what “good data” even means? Most existing curation pipelines are limited because they are offline (they produce a static dataset from a set of predetermined filtering criteria) and concept-agnostic (they rely on model-based scores that can silently introduce new biases in what concepts the model sees). In this talk, I will discuss our new work CABS that tackles both these problems with large-scale sample-level concept annotations and flexible online batch sampling.

    First, we construct DataConcept, a 128M web-crawled image–text collection annotated with fine-grained concept composition, and show how this enables Concept-Aware Batch Sampling (CABS)—a simple online method that constructs training batches on-the-fly to match target concept distributions. We develop two variants, CABS-DM for maximizing concept coverage and CABS-FM for prioritizing high object multiplicity, and demonstrate consistent gains for CLIP/SigLIP-style models across 28 benchmarks.

    Finally, I’ll show that these improvements translate into strong vision encoders for training generative multimodal models, including autoregressive systems like LLaVA, where the encoder quality materially affects downstream capability.
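
    As a rough illustration of the online batch-sampling idea (a toy sketch, not the CABS implementation): given per-sample concept annotations, greedily fill each batch so that its concept histogram tracks a target distribution.

```python
import random
from collections import Counter

def concept_aware_batch(pool, target, batch_size, seed=0):
    """Greedily pick samples whose concepts move the batch histogram
    toward a target concept distribution.

    pool: list of (sample_id, concept) pairs
    target: dict mapping concept -> desired fraction of the batch
    """
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)
    batch, counts = [], Counter()

    for _ in range(batch_size):
        def deficit(item):
            # How under-represented is this sample's concept so far?
            concept = item[1]
            want = target.get(concept, 0.0) * batch_size
            return want - counts[concept]

        # Pick the sample whose concept is currently most under target
        best = max(remaining, key=deficit)
        remaining.remove(best)
        batch.append(best)
        counts[best[1]] += 1
    return batch

# A pool heavily skewed toward "cat"; the sampler rebalances the batch
pool = [(i, "cat") for i in range(50)] + [(i + 50, "dog") for i in range(10)]
batch = concept_aware_batch(pool, {"cat": 0.5, "dog": 0.5}, batch_size=10)
print(Counter(concept for _, concept in batch))
```

    Despite the 5:1 skew in the pool, the sampled batch matches the 50/50 target; the real method operates at web scale over fine-grained concept annotations.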

    About the Speaker

    Adhiraj Ghosh is a first year ELLIS PhD student, working with Matthias Bethge at The University of TĂŒbingen. He completed his undergraduate degree in Electrical and Electronics Engineering jointly at the Manipal Institute of Technology and SMU Singapore from 2016 to 2020, and his masters in Machine Learning at The University of TĂŒbingen from 2022 to 2024.

    Do Your Agents Actually Work? Measuring Skills and MCP in Practice

    This talk shows how to evaluate agent performance in real scenarios using FiftyOne Skills and MCP. We will cover practical ways to design scenarios, run agents, and measure how they use tools, including signals like latency, token usage, and output quality. The goal is to move beyond final outputs and better understand agent behavior, helping teams build more reliable and measurable agent systems.

    About the Speaker

    Adonai Vera is a Machine Learning Engineer & DevRel at Voxel51, with over 7 years of experience building computer vision and machine learning models using TensorFlow, Docker, and OpenCV.

    The last mile of OCR [in 2026]

    OCR is nailing it on benchmarks, but the real work lies in the long tail of IDP. Large tables, old scans, mixed-language docs, handwriting, and complex layouts are where most enterprise and real-world document work happens, and where the best-benchmarked models still struggle. In this talk, we will go through how LandingAI’s Agentic Document Extraction (ADE) goes beyond OCR and parsing to enable real-world document AI use cases and workloads.

    We'll cover:

    • The pillars of Agentic Document Extraction
    • Building document processing pipelines with ADE API/SDK
    • Using Skills to have Coding Agents build for you
    • How ADE gives LLMs the last mile - analysing LLM performance on large tables, scanned docs, and complex layouts, and enabling them with the structured output from ADE

    About the Speaker

    Ankit Khare has built the Developer Relations function at high-growth startups like Rockset (a world-class retrieval system, later acquired by OpenAI), Twelve Labs (a video intelligence startup backed by Index Ventures, Radical Ventures, and NEA), and Abacus.AI (an AI Super Assistant backed by Index Ventures, Eric Schmidt, and Ram Shriram). Before that, he was an AI engineer at third insight and an AI researcher at the LEARN Lab at UT-Arlington, working on visual scene understanding and image captioning agents.

    The Energy Layer of AI: Powering the Next Wave of Inference

    The talk explores how inference cost is fundamentally tied to energy at scale, especially as the AI industry shifts toward always-on, agent-driven workloads and the focus moves from training to inference economics. Medi will share lessons and observations from his team's R&D efforts in making AI workloads grid-aware, energy-intelligent, and dynamically optimized in real time.

    About the Speaker

    Medi Naseri is the Founder and CEO of LƍD Technologies, where he leads the development of energy-intelligent infrastructure for flexible data centers and the broader compute ecosystem.
    With a PhD in Electrical Engineering specializing in control and power systems, Medi brings deep technical expertise to the challenge of scaling AI within real-time grid constraints.

    1 attendee from this group
  • Network event
    May 20 - Getting Started with FiftyOne

    Online
    6 attendees from 16 groups

    This workshop provides a technical foundation for managing large scale computer vision datasets. You will learn to curate, visualize, and evaluate models using the open source FiftyOne app.

    Date, Time and Location

    May 20, 2026
    10-11 AM Pacific
    Online. Register for the Zoom!

    The session covers data ingestion, embedding visualization, and model failure analysis. You will build workflows to identify dataset bias, find annotation errors, and select informative samples for training. Attendees leave with a framework for data-centric AI in research and production pipelines, prioritizing data quality over pure model iteration.

    What you'll learn

    • Structure unstructured data. Map data and metadata into a queryable schema for images, videos, and point clouds.
    • Query datasets with the FiftyOne SDK. Create complex views based on model predictions, labels, and custom tags. Use the SDK to filter data based on logical conditions and confidence scores.
    • Visualize high dimensional embeddings. Project features into lower dimensions to find clusters of similar samples. Identify data gaps and outliers using FiftyOne Brain.
    • Automate data curation. Implement algorithmic measures to select diverse subsets for training. Reduce labeling costs by prioritizing high entropy samples.
    • Debug model performance. Run evaluation routines to generate confusion matrices and precision recall curves. Visualize false positives and false negatives directly in the App to understand model failures.
    • Customize FiftyOne. Build custom dashboards and interactive panels. Create specialized views for domain specific tasks.
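
    The curation bullet above can be sketched in plain Python: rank samples by the Shannon entropy of their predicted class distribution and send the most uncertain ones to labeling first. This is a toy illustration with made-up predictions, not workshop code or the FiftyOne API.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (in nats) of one sample's class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, k):
    """Return the ids of the k most uncertain samples (highest entropy)."""
    ranked = sorted(
        predictions.items(),
        key=lambda item: prediction_entropy(item[1]),
        reverse=True,
    )
    return [sample_id for sample_id, _ in ranked[:k]]

# Made-up model outputs: "b" is most uncertain, "c" most confident
predictions = {
    "a": [0.7, 0.2, 0.1],
    "b": [0.34, 0.33, 0.33],
    "c": [0.98, 0.01, 0.01],
}
print(select_for_labeling(predictions, k=2))
```

    Labeling budget then goes to the near-uniform predictions first, which is the intuition behind prioritizing high-entropy samples during curation.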

    Prerequisites:

    • Working knowledge of Python and machine learning and/or computer vision fundamentals.
    • All attendees will get access to the tutorials and code examples used in the workshop.
  • Network event
    May 21 - Women in AI Meetup

    Online
    46 attendees from 16 groups

    Hear talks from experts on the latest topics in AI, ML, and computer vision on May 21.

    Date, Time and Location

    May 21, 2026
    9-11 AM Pacific
    Online.
    Register for the Zoom!

    Beyond Models: LLM-Guided Reinforcement Learning for Real-World Wireless Systems

    Reinforcement learning agents often perform well in simulation but break down when deployed in real, non-stationary, constraint-driven environments such as wireless systems. This work explores using large language models not as annotators or reward hacks, but as a reasoning layer that guides RL decision-making with domain logic, scenario interpretation, and adaptive constraints.

    We present an architecture where the LLM provides structured, high-level advisory signals while the RL policy remains the final action authority to avoid hallucination-driven failures. Early experiments show that this hybrid setup improves robustness under distribution shifts and complex constraint scenarios where standard RL collapses. The goal is not to replace RL with LLMs, but to combine learning and reasoning into a more deployable control-intelligence framework.
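
    The division of labor described above can be sketched in a few lines (a hypothetical toy, not the authors' system): the LLM contributes a bounded advisory score per action, and the RL side keeps final authority via an argmax over the blended values.

```python
def choose_action(q_values, advisory, alpha=0.3):
    """Blend RL Q-values with a bounded LLM advisory signal.

    q_values: dict action -> learned value (the RL policy's estimate)
    advisory: dict action -> score in [-1, 1] from the LLM reasoning layer
    alpha: cap on advisory influence, so hallucinated advice cannot
           override a clearly better learned action
    """
    scores = {
        action: q + alpha * advisory.get(action, 0.0)
        for action, q in q_values.items()
    }
    # The RL side keeps final action authority: argmax over blended scores
    return max(scores, key=scores.get)

# Hypothetical wireless-control actions; the LLM flags a scenario
# in which holding current power is safer, nudging the choice
q = {"increase_power": 1.0, "hold": 0.9, "handover": 0.2}
advice = {"increase_power": -0.8, "hold": 0.9}
print(choose_action(q, advice))
```

    Because the advisory term is capped by alpha, strongly separated Q-values cannot be overturned by the LLM alone, which is one simple way to guard against hallucination-driven failures.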

    About the Speaker

    Fatemeh Lotfi is a Ph.D. researcher focused on integrating large language models and reinforcement learning for adaptive wireless control systems. Her work targets the limitations of classical RL under real-world uncertainty by introducing reasoning-driven guidance mechanisms using LLMs. She has contributed to multiple AI-for-infrastructure projects, including advanced O-RAN automation.

    Responsible and Ethical AI in Healthcare: Building Trustworthy and Inclusive Intelligent Systems

    In this session, I will discuss how Responsible AI principles, including fairness, transparency, accountability, and reliability, can be practically embedded into healthcare AI systems. Key discussion points will include:

    • Addressing bias and equity challenges in healthcare datasets and model training.
    • Building explainable and interpretable AI to strengthen clinician trust and adoption.
    • Ensuring ethical deployment of generative AI models within regulated healthcare environments.
    • Establishing governance frameworks for data privacy, model monitoring, and regulatory compliance.

    About the Speaker

    Jahnavi Kachhia is the Global Product Owner, AI & ML at Abbott, leading large-scale AI initiatives for the FreeStyle Libre platform to enhance clinical decision-making and patient outcomes. Previously at Meta’s Reality Labs, she advanced AR/VR innovation and LLM-based intelligent systems. An active contributor to the AI research community, she serves on the IJCAI 2025 Program Committee and reviews for AAAI, IJCNN, and IEEE conferences.

    AI Applications in Drug Repurposing

    Drug repurposing is increasingly important because it offers a faster, lower-cost path to therapeutic discovery compared to de novo drug development, especially in oncology where many cancers still lack effective targeted options. In under-studied cancers such as endometrial cancer, the challenge is often a lack of large, high-quality clinical or response datasets, making purely data-dependent approaches difficult to scale reliably. This motivates combining data-independent strategies (e.g., pathway- and mechanism-driven modeling) with data-dependent learning when interaction evidence is available. A practical and scalable direction is drug–target interaction (DTI) prediction, where AI models can leverage molecular and protein representations to prioritize mechanistically plausible drug candidates for repurposing.

    About the Speaker

    Madhurima Mondal's academic journey has been shaped by strong foundations in mathematical and scientific problem-solving, including multiple national-level achievements such as the Regional Mathematics Olympiad (RMO), NTSE, and the KVPY fellowship. She completed her B.Tech and M.Tech in Electronics & Electrical Communication Engineering at IIT Kharagpur and is currently a PhD candidate in Electrical & Computer Engineering at Texas A&M University.

    Mapping to Belonging: How Ethically Governed AI Can Make Real Places More Accessible, Legible, and Human

    Can AI help people belong in the places where they live, work, travel, and get together?

    This talk explores that question through real-world work at the intersection of accessibility, computer vision mapping, civic data, and ethically governed AI. I will show how AI can support the collection and interpretation of pedestrian accessibility data, reduce the burden of documenting barriers, and help transform lived experience into structured information that can be used across routing tools, planning systems, and public decision-making. I will also argue that public-interest AI only works when it is governed well. In accessibility work, the risks are clear: over-averaging, hidden bias, false completeness, and systems that optimize for efficiency while overlooking the people most affected by missing or poor-quality data. Ethically governed AI must therefore be designed to preserve local context, support transparency, include community participation, and make room for experiences that conventional systems often ignore.

    About the Speaker

    Anat Caspi is Director of the Taskar Center for Accessible Technology at the University of Washington, where she leads research and public-interest technology efforts focused on accessibility, mobility, and inclusive transportation data.

    1 attendee from this group

Group links

Organizers

Dave M. and 1 other are Super Organizers

Members

235