Orr Zohar

Bio

I am a PhD student researching GenAI at Stanford University, where I am fortunate to be advised by Prof. Serena Yeung-Levy and funded by a Knight-Hennessy Scholarship.

My research primarily focuses on Large Multimodal Models, including model architectures, self-training methodologies, data generation, and evaluation strategies.

News

[2025/02]: Released SmolVLM2, a nano-scale LMM runnable on mobile devices.
[2025/01]: Video-STaR was accepted to ICLR 2025!
[2024/07]: Published Apollo, a comprehensive study on video-language multimodal models.
[2024/06]: VideoAgent was accepted to ECCV 2024.

Publications

Most recent publications on Google Scholar.
* indicates equal contribution.

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, Philippe Hansen-Estruch, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, Ning Zhang, Serena Yeung-Levy, and Xide Xia

CVPR (2025)

project arxiv

@article{zohar2024apollo,
  title={Apollo: An Exploration of Video Understanding in Large Multimodal Models},
  author={Zohar, Orr and Wang, Xiaohan and Dubois, Yann and Mehta, Nikhil and Xiao, Tong and Hansen-Estruch, Philippe and Yu, Licheng and Wang, Xiaofang and Juefei-Xu, Felix and Zhang, Ning and Yeung-Levy, Serena and Xia, Xide},
  journal={arXiv preprint arXiv:2412.10360},
  year={2024}
}

SmolVLM: Redefining small and efficient multimodal models

Andrés Marafioti*, Orr Zohar*, Miquel Farré*, Merve Noyan, Elie Bakouch, Pedro Cuenca, Cyril Zakka, Loubna Ben Allal, Anton Lozhkov, Nouamane Tazi, Vaibhav Srivastav, Joshua Lochner, Hugo Larcher, Mathieu Morlon, Lewis Tunstall, Leandro von Werra, Thomas Wolf

arXiv (2025)

project arxiv

@article{marafioti2025smolvlm,
  title={SmolVLM: Redefining small and efficient multimodal models},
  author={Marafioti, Andr{\'e}s and Zohar, Orr and Farr{\'e}, Miquel and Noyan, Merve and Bakouch, Elie and Cuenca, Pedro and Zakka, Cyril and Allal, Loubna Ben and Lozhkov, Anton and Tazi, Nouamane and others},
  journal={arXiv preprint arXiv:2504.05299},
  year={2025}
}

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Orr Zohar, Xiaohan Wang, Yonatan Bitton, Idan Szpektor, Serena Yeung-Levy

ICLR (2025)

project arxiv code

@inproceedings{zohar2025videostar,
  title={Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision},
  author={Zohar, Orr and Wang, Xiaohan and Bitton, Yonatan and Szpektor, Idan and Yeung-Levy, Serena},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=JYV2hrtFSv}
}

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Xiaohan Wang*, Yuhui Zhang*, Orr Zohar, Serena Yeung-Levy

ECCV (2024)

project arxiv code

@inproceedings{wang2024videoagent,
  title={VideoAgent: Long-form Video Understanding with Large Language Model as Agent},
  author={Wang, Xiaohan and Zhang, Yuhui and Zohar, Orr and Yeung-Levy, Serena},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Philippe Hansen-Estruch, David Yan, Ching-Yao Chung, Orr Zohar, Jialiang Wang, Tingbo Hou, Tao Xu, Sriram Vishwanath, Peter Vajda, Xinlei Chen

arXiv preprint (2025)

project arxiv

@article{hansenestruch2025vitok,
  title={Learnings from Scaling Visual Tokenizers for Reconstruction and Generation}, 
  author={Philippe Hansen-Estruch and David Yan and Ching-Yao Chung and Orr Zohar and Jialiang Wang and Tingbo Hou and Tao Xu and Sriram Vishwanath and Peter Vajda and Xinlei Chen},
  journal={arXiv preprint arXiv:2501.09755},
  year={2025}
}

LOVM: Language-Only Vision Model Selection

Orr Zohar, Shih-Cheng Huang, Kuan-Chieh Wang, Serena Yeung-Levy

NeurIPS (2023)

project proceeding arxiv code

@inproceedings{zohar2023lovm,
  title = {LOVM: Language-Only Vision Model Selection},
  author = {Zohar, Orr and Huang, Shih-Cheng and Wang, Kuan-Chieh and Yeung, Serena},
  year = {2023},
  booktitle = {Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
}

Open World Object Detection in the Era of Foundation Models

Orr Zohar, Alejandro Lozano, Shelly Goel, Serena Yeung-Levy, Kuan-Chieh Wang

arXiv preprint (2023)

project arxiv code

@article{zohar2023fomo,
  title={Open World Object Detection in the Era of Foundation Models},
  author={Zohar, Orr and Lozano, Alejandro and Goel, Shelly and Yeung, Serena and Wang, Kuan-Chieh},
  year={2023},
  journal={arXiv preprint arXiv:2312.05745},
  arxiv={2312.05745}
}

PROB: Probabilistic Objectness for Open World Object Detection

Orr Zohar, Kuan-Chieh Wang, Serena Yeung-Levy

CVPR (2023)

project proceeding arxiv code

@inproceedings{Zohar_2023_CVPR,
  title = {PROB: Probabilistic Objectness for Open World Object Detection},
  author = {Zohar, Orr and Wang, Kuan-Chieh and Yeung, Serena},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = jun,
  year = {2023},
  pages = {11444--11453}
}

Analyzing surgical technique in diverse open surgical videos with multitask machine learning

Emmett D Goodman, Krishna K Patel, Yilun Zhang, William Locke, Chris J Kennedy, Rohan Mehrotra, Stephen Ren, Melody Guan, Orr Zohar, Maren Downing, Hao Wei Chen, Jevin Z Clark, Margaret T Berrigan, Gabriel A Brat, Serena Yeung-Levy

JAMA Surgery (2023)

project

@article{goodman2022surgery,
  title = {Analyzing Surgical Technique in Diverse Open-Surgical Videos with Multi-Task Machine Learning},
  author = {Goodman, Emmett D. and Patel, Krishna K. and Zhang, Yilun and Locke, William and Kennedy, Chris J. and Mehrotra, Rohan and Ren, Stephen and Guan, Melody Y. and Zohar, Orr and Downing, Maren and Chen, Hao Wei and Clark, Jevin Z. and Berrigan, Margaret T. and Brat, Gabriel A. and Yeung, Serena},
  journal = {JAMA Surgery},
  issn = {2168-6254},
  month = dec,
  year = {2023}
}

Biointerfaced Sensors for Biodiagnostics

Orr Zohar*, Muhammad Khatib*, Rawan Omar, Rotem Vishinkin, Yoav Y Broza, Hossam Haick

View (2021)

Self-Healing Soft Sensors: From Material Design to Implementation

Muhammad Khatib, Orr Zohar, Hossam Haick

Advanced Materials (2021)

A Multifunctional Electronic Skin Empowered with Damage Mapping and Autonomic Acceleration of Self-Healing in Designated Locations

Muhammad Khatib, Orr Zohar, Walaa Saliba, Hossam Haick

Advanced Materials (2020)

Highly Efficient and Water-Insensitive Self-Healing Elastomer for Wet and Underwater Electronics

Muhammad Khatib, Orr Zohar, Walaa Saliba, Simcha Srebnik, Hossam Haick

Advanced Functional Materials (2020)

Angular Compounding for Speckle Reduction in Optical Coherence Tomography using Geometric Image Registration Algorithm and Digital Focusing

Jingjing Zhao, Yonatan Winetraub, Edwin Yuan, Warren H Chan, Sumaira Z Aasi, Kavita Y Sarin, Orr Zohar, Adam de la Zerda

Scientific Reports (2020)

Epitaxial Superconducting Tunnel Diodes for Light Detection Applications

Krishna Balasubramanian, John Wright, Orr Zohar, Boaz Taitler, Shlomi Bouscher, Huili Grace Xing, Debdeep Jena, Alex Hayat

Optical Materials Express (2020)

Photoresponse above 85 K of Selective Epitaxy Grown High-Tc Superconducting Microwires

Xinxi Xing, Krishna Balasubramanian, Shlomi Bouscher, Orr Zohar, Yuval Nitzav, Amit Kanigel, Alex Hayat

Applied Physics Letters (2020)

Vitæ

Full Resume in PDF.

Meta 2024 - 2025

Research Scientist, Intern
GenAI

Zohar Consulting Services 2023 - now

President
AI Consulting

Stanford University 2021 - now

Ph.D. Student & Knight-Hennessy Scholar
MARVL - Medical AI and Computer Vision Lab

Stanford University 2021 - 2023

M.Sc. Student
Computer Science

proteanTecs LTD 2020 - 2024

Machine Learning & Algorithms Engineer
AI

Technion University 2019 - 2021

M.Sc. Student
Master of Electrical Engineering

Website Design, Acknowledgements

You can find all the code needed to build this website on my GitHub. Feel free to use it, but please link back to this page, as well as to Martin Saveski and Nerfies, whose templates I adapted for the website. Licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.