Hi, I'm Muhammad Afaq Saeed

Physical AI | AI Engineer | 3D Computer Vision | Robotics

Email LinkedIn Portfolio YouTube Germany


About Me

I am an AI and 3D computer vision engineer with 4+ years of professional experience building perception systems across autonomous driving, mobile mapping, infrastructure inspection, multimodal sensing, and robotics.

I completed an M.Sc. in Artificial Intelligence with a Minor in Robotics at FAU Erlangen-Nürnberg. My work spans images, point clouds, video, and sound, with a focus on building perception systems that remain useful outside controlled laboratory conditions.

I have worked on semantic segmentation, stereo vision, structure from motion, LiDAR processing, NeRF-based reconstruction, multimodal learning, synthetic video evaluation, and robotic perception. I am especially interested in the connection between 3D perception, world models, multimodal AI, and physical systems.


Core Interests

  • 3D computer vision and geometric perception
  • Physical AI and robot perception
  • Multimodal and audio-visual learning
  • Synthetic data, world models, and video evaluation
  • Autonomous driving and mobile robotics
  • Spatial AI, reconstruction, mapping, and active perception

Selected Highlights

  • Contributed to computer-vision systems used across approximately 5,000 km of road inspection
  • Helped reduce manual road-inspection review by up to 90%
  • Developed perception methods reaching approximately 80–85% IoU where applicable
  • Built a LiDAR dynamic-object-removal evaluation pipeline running at approximately 51 ms per scan
  • Worked across research and engineering roles at Volkswagen, NavVis, Fraunhofer IIS, and RoadGauge AI
  • Led a perception team and coordinated dataset-labeling workflows for production computer-vision systems
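IoU (intersection over union) compares predicted and ground-truth regions. The following is a minimal sketch of the standard per-class metric over flattened label maps. It is not the evaluation code behind the figures above, and the names `per_class_iou` and `mean_iou` and the toy labels are illustrative.

```python
from typing import Dict, Sequence


def per_class_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> Dict[int, float]:
    """IoU for every class that appears in pred or target.

    pred and target are flattened label maps of equal length
    (one class id per pixel or per LiDAR point).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious: Dict[int, float] = {}
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # classes absent from both maps are skipped, not counted as 0
            ious[c] = intersection / union
    return ious


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average IoU over the classes that actually occur."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
print(per_class_iou(pred, target, 3))  # {0: 0.5, 1: 0.5, 2: 0.5}
```

Skipping classes that occur in neither map keeps them from dragging the mean toward zero, which is a common convention but not the only one.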

Tech Stack

Languages

AI and Computer Vision

Robotics and 3D

ROS ROS2 PCL Open3D Eigen CloudCompare

Engineering and Deployment

CUDA HPC


Featured Projects

Audio-Visual Event Recognition

A multimodal event-recognition pipeline combining audio-event detection with prompted visual detection and segmentation. The system uses recipe-based reasoning to recognize activities such as chopping or washing objects from synchronized audio and video evidence.

Technologies: Python, PyTorch, YOLO-World, FastSAM, EPIC-SOUNDS, multimodal inference

View repository
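Recipe-based reasoning can be pictured as matching each activity against a required sound event plus a set of required visible objects. The sketch below is a hypothetical illustration of that idea, not the repository's implementation: the `Recipe` structure, the label names ("chop", "water", "knife", and so on), the thresholds, and the combined score are all assumptions, and the label names are not taken from the EPIC-SOUNDS vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Recipe:
    audio_label: str          # sound event that must be heard (hypothetical label)
    objects: FrozenSet[str]   # objects that must all be visible (hypothetical labels)


# Hypothetical recipes for illustration only.
RECIPES: Dict[str, Recipe] = {
    "chopping": Recipe("chop", frozenset({"knife", "cutting board"})),
    "washing": Recipe("water", frozenset({"sink"})),
}


def recognize(
    audio_scores: Dict[str, float],
    visual_scores: Dict[str, float],
    audio_thresh: float = 0.5,
    visual_thresh: float = 0.3,
) -> List[Tuple[str, float]]:
    """Return activities whose audio and visual evidence both pass their thresholds,
    ranked by audio confidence times the weakest required-object confidence."""
    hits: List[Tuple[str, float]] = []
    for activity, recipe in RECIPES.items():
        audio_conf = audio_scores.get(recipe.audio_label, 0.0)
        if audio_conf < audio_thresh:
            continue  # the characteristic sound was not heard
        weakest = min(visual_scores.get(o, 0.0) for o in recipe.objects)
        if weakest < visual_thresh:
            continue  # at least one required object was not seen
        hits.append((activity, audio_conf * weakest))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Audio scores would come from an audio-event model, visual scores from an
# open-vocabulary detector; here they are hand-written toy values.
print(recognize({"chop": 0.9, "water": 0.2},
                {"knife": 0.8, "cutting board": 0.6, "sink": 0.9}))
```

Requiring agreement from both modalities trades recall for precision: a chopping sound without a visible knife is rejected rather than guessed.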


Robot Video Quality Evaluation with NVIDIA Cosmos

A reproducible research demo for evaluating world-model-generated robotics videos using temporal stability, semantic consistency, physical plausibility, and downstream-task usefulness.

Technologies: Python, NVIDIA Cosmos/NIM, OpenCLIP, OpenCV, Streamlit, Pytest

View repository
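Temporal stability is one of the listed evaluation axes. A very simple proxy is the mean absolute intensity change between consecutive frames. The sketch below uses plain Python lists for grayscale frames and is only an illustrative baseline; the function names are hypothetical and it is not the metric implemented in the repository.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale frame, H x W, values assumed in [0, 1]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def temporal_instability(frames: List[Frame]) -> float:
    """Average change between consecutive frames (0.0 means perfectly static)."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    diffs = [mean_abs_diff(f0, f1) for f0, f1 in zip(frames, frames[1:])]
    return sum(diffs) / len(diffs)


static = [[[0.5, 0.5], [0.5, 0.5]]] * 3
flicker = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
print(temporal_instability(static), temporal_instability(flicker))  # 0.0 1.0
```

Raw frame differencing penalizes genuine camera or object motion as much as flicker, so practical evaluations usually compensate for motion (for example with optical flow) before measuring residual change.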


Basler Camera Calibration with PyPylon

Camera-calibration tooling for Basler industrial cameras using the PyPylon SDK, developed as part of practical machine-vision work.

Technologies: Python, PyPylon, OpenCV, industrial cameras

View repository
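Camera calibration (for example with OpenCV's `cv2.calibrateCamera`) estimates a camera's intrinsic parameters (focal lengths `fx`, `fy` and principal point `cx`, `cy`) and its lens-distortion coefficients. The sketch below only illustrates what those parameters mean by projecting a 3D point through a pinhole model with two radial distortion terms, following the radial part of OpenCV's distortion model. The numeric values are made up, tangential distortion is omitted, and the code does not use PyPylon or the repository's tooling.

```python
from typing import Tuple


def project_point(
    point_cam: Tuple[float, float, float],
    fx: float, fy: float, cx: float, cy: float,
    k1: float = 0.0, k2: float = 0.0,
) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates (Z forward) to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    # Normalized image coordinates on the plane Z = 1.
    xn, yn = x / z, y / z
    # Radial distortion: scale by 1 + k1*r^2 + k2*r^4.
    r2 = xn * xn + yn * yn
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    xd, yd = xn * scale, yn * scale
    # Apply the intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    return fx * xd + cx, fy * yd + cy


# Hypothetical intrinsics for a 1280 x 960 sensor.
print(project_point((0.1, -0.05, 2.0), fx=1000.0, fy=1000.0, cx=640.0, cy=480.0))
print(project_point((0.1, -0.05, 2.0), fx=1000.0, fy=1000.0, cx=640.0, cy=480.0, k1=-0.2))
```

Calibration solves the inverse problem: given many detected checkerboard corners with known 3D positions, it searches for the parameters that minimize the reprojection error of exactly this kind of projection.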


Personal Portfolio

A project portfolio presenting work across computer vision, robotics, 3D reconstruction, multimodal AI, and autonomous systems.

Technologies: TypeScript, React, web deployment

View portfolio · View repository


Research and Engineering Focus

My current work focuses on evaluating and improving perception data for autonomous and embodied systems. This includes questions such as:

  • How can synthetic multiview video be evaluated beyond visual realism?
  • How can temporal, semantic, and geometric consistency be measured?
  • How can multimodal sensing improve robustness when one modality becomes unreliable?
  • How can perception systems communicate uncertainty to downstream planning and control?
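Two of these questions can be made concrete with a common baseline: reliability-weighted late fusion of per-modality class probabilities, followed by the entropy of the fused distribution as a single uncertainty value that a downstream planner could threshold. This is a generic sketch, not the author's method; the reliability weights are assumed to come from some external source such as a sensor-health monitor, and the function names and numbers are illustrative.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means a less certain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fuse(modality_probs: Dict[str, List[float]], reliability: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality class distributions.

    Weights are the (assumed externally supplied) reliabilities, normalized to sum to 1,
    so an unreliable modality contributes little to the fused distribution.
    """
    total_w = sum(reliability[m] for m in modality_probs)
    if total_w == 0:
        raise ValueError("all modalities are marked unreliable")
    num_classes = len(next(iter(modality_probs.values())))
    fused = [0.0] * num_classes
    for modality, probs in modality_probs.items():
        w = reliability[modality] / total_w
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


probs = {"audio": [0.7, 0.2, 0.1], "video": [0.1, 0.1, 0.8]}
# Assume the camera is degraded (e.g. low light), so video gets a low reliability.
fused = fuse(probs, {"audio": 0.9, "video": 0.1})
print([round(p, 2) for p in fused], round(entropy(fused), 3))
```

Because the fused output is still a distribution, its entropy rises when the modalities disagree or when the trusted modality is itself unsure, which gives the planner a signal to slow down or gather more evidence.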

Connect With Me

Email LinkedIn Portfolio


Pinned

  1. Audio-Visual-Object-Detection (Public, Python)

  2. Sensor-Simulation-Project (Public, Python)

  3. cosmos-robot-video-eval (Public, Python)

  4. deeplabv3 (Public, Python), forked from fregu856/deeplabv3: PyTorch implementation of DeepLabV3, trained on the Cityscapes dataset.

  5. mapillary/OpenSfM (Public, Python, 3.8k stars, 899 forks): Open source Structure-from-Motion pipeline.

  6. AfaqSaeed.github.io (Public, TypeScript)