I am an engineer specializing in AI and 3D computer vision, with 4+ years of professional experience building perception systems across autonomous driving, mobile mapping, infrastructure inspection, multimodal sensing, and robotics.
I completed an M.Sc. in Artificial Intelligence with a Minor in Robotics at FAU Erlangen-Nürnberg. My work spans images, point clouds, video, and sound, with a focus on building perception systems that remain useful outside controlled laboratory conditions.
I have worked on semantic segmentation, stereo vision, structure from motion, LiDAR processing, NeRF-based reconstruction, multimodal learning, synthetic video evaluation, and robotic perception. I am especially interested in the connection between 3D perception, world models, multimodal AI, and physical systems.
- 3D computer vision and geometric perception
- Physical AI and robot perception
- Multimodal and audio-visual learning
- Synthetic data, world models, and video evaluation
- Autonomous driving and mobile robotics
- Spatial AI, reconstruction, mapping, and active perception
- Contributed to computer-vision systems used across approximately 5,000 km of road inspection
- Helped reduce manual road-inspection review effort by up to 90%
- Developed perception methods reaching approximately 80–85% IoU
- Built a LiDAR dynamic-object-removal evaluation pipeline running at approximately 51 ms per scan
- Worked across research and engineering roles at Volkswagen, NavVis, Fraunhofer IIS, and RoadGauge AI
- Led a perception team and coordinated dataset-labeling workflows for production computer-vision systems
A multimodal event-recognition pipeline combining audio-event detection with prompted visual detection and segmentation. The system uses recipe-based reasoning to recognize activities such as chopping or washing objects from synchronized audio and video evidence.
Technologies: Python, PyTorch, YOLO-World, FastSAM, EPIC-SOUNDS, multimodal inference
A reproducible research demo for evaluating world-model-generated robotics videos using temporal stability, semantic consistency, physical plausibility, and downstream-task usefulness.
Technologies: Python, NVIDIA Cosmos/NIM, OpenCLIP, OpenCV, Streamlit, Pytest
Camera-calibration tooling for Basler industrial cameras using the PyPylon SDK, developed as part of practical machine-vision work.
Technologies: Python, PyPylon, OpenCV, industrial cameras
A project portfolio presenting work across computer vision, robotics, 3D reconstruction, multimodal AI, and autonomous systems.
Technologies: TypeScript, React, web deployment
My current work focuses on evaluating and improving perception data for autonomous and embodied systems. This includes questions such as:
- How can synthetic multiview video be evaluated beyond visual realism?
- How can temporal, semantic, and geometric consistency be measured?
- How can multimodal sensing improve robustness when one modality becomes unreliable?
- How can perception systems communicate uncertainty to downstream planning and control?


