duy-phamduc68/TrafficLab-3D

TrafficLab 3D

TrafficLab puts accessibility at the forefront. With only MP4 CCTV footage and knowledge of where the camera is located on Google Maps, anyone can create a digital twin demo that demonstrates advanced computer vision. It is aimed especially at students, individual investigators, and enthusiasts who might not have access to camera calibration data or synchronized, high-quality satellite imagery.

Demo

Release: v1.1

Developed by Yuk

Complementary resources:

It is strongly recommended that you read through this README if you want to run this program on your own machine. To skip ahead, jump to Getting Started.

Project Status and Future Direction

Important: This repository represents the initial Proof of Concept (PoC) for TrafficLab 3D. While the tool is functional for academic and demonstration purposes, the current codebase is an experimental, monolithic prototype. Instead of archiving this repository to focus on a rewrite, I am keeping development open to anyone who shares the vision of scaling this framework. The immediate goal is to refactor the project into a modular architecture and improve it using what I learned from this PoC. Please contact me first if you are interested in developing this vision with me!

Contributions and Collaboration

If you want to help transition this project to a maintainable ecosystem, contributions are welcome:

  • Architecture: Advice on structural decoupling and data schemas is encouraged via GitHub Issues.
  • Code: Before submitting a Pull Request, please open an Issue to describe the optimization or refactoring step so we can align on the direction.
  • Methodology: To understand the underlying theory first, please refer to the academic report or the blog post.
TrafficLab-3D/
├── location/
│   └── {location_code}/
│       ├── footage/
│       │   └── *.mp4
│       │
│       ├── illustrator/                 (optional, Adobe Illustrator assets)
│       │   ├── layout_{location_code}.ai
│       │   ├── roi_{location_code}.ai
│       │   └── *.ai
│       │
│       ├── G_projection_{location_code}.json
│       ├── cctv_{location_code}.png     (critical!)
│       ├── sat_{location_code}.png      (critical!)
│       ├── layout_{location_code}.svg   (optional)
│       └── roi_{location_code}.png      (optional)
│
├── media/                               (resources for README and Introduction tab)
│
├── trafficlab/                          (main codebase)
│
├── models/                              (object detection & tracker models)
│   └── *.pt                             (YOLO checkpoints)
│
├── output/
│   └── model-{model_name}_tracker-{tracker_name}/
│       └── {config-name}/
│           └── {location_code}/
│               └── *.json.gz             (inference outputs)
│
├── environment.yml
├── inference_config.yaml
├── prior_dimensions.json
└── main.py

Introduction

TrafficLab is an end-to-end traffic analysis suite that covers:

  • Calibration: Establishing a two-way projection between any CCTV view and its satellite map, with support for custom SVG.
  • Inference: Easily swap object detection models and object trackers, with configurable kinematics and comprehensive control over arguments.
  • Visualization: A "digital twin" experience with side-by-side, synchronized view of CCTV with 3D bounding boxes and satellite view with floor box, speed, and orientation.
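
The "two-way projection" in the Calibration bullet can be pictured as a planar homography and its inverse. The sketch below is only an illustration under that assumption: the matrix values are made up, and the function names (`apply_homography`, `invert_3x3`, `cctv_to_sat`, `sat_to_cctv`) are hypothetical, not TrafficLab's API.

```python
# Hypothetical 3x3 homography mapping CCTV pixels to SAT pixels.
# The numbers are invented for illustration only.
H = [
    [1.2, 0.1, 30.0],
    [0.05, 1.5, 12.0],
    [0.0002, 0.001, 1.0],
]


def apply_homography(M, pt):
    """Map a 2D point through a 3x3 matrix, including the perspective divide."""
    x, y = pt
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return (
        (M[0][0] * x + M[0][1] * y + M[0][2]) / w,
        (M[1][0] * x + M[1][1] * y + M[1][2]) / w,
    )


def invert_3x3(M):
    """Invert a 3x3 matrix via its adjugate; raises if the matrix is singular."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def cctv_to_sat(M, pt):
    """Forward direction: CCTV ground pixel to SAT pixel."""
    return apply_homography(M, pt)


def sat_to_cctv(M, pt):
    """Backward direction: SAT pixel to CCTV ground pixel."""
    return apply_homography(invert_3x3(M), pt)


sat_point = cctv_to_sat(H, (640.0, 360.0))
round_trip = sat_to_cctv(H, sat_point)  # should land back near (640, 360)
```

A homography of this kind is only valid for points on the ground plane, which is why elevated points such as vehicle roofs need the extra parallax handling described in the Calibration Tab.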

WelcomeTab

Get started by navigating to any of the tabs in the top-left corner of the program.

Functionality

TrafficLab's functionality is spread across 3 main tabs. You can navigate to any of them without losing work in another. Below is a brief description of each tab.

Calibration Tab

CalibrationStart

The Calibration Tab produces G Projection JSON files (refer to the report), which help establish a two-way projection between the CCTV and SAT (satellite) domains. It presents a comprehensive, backwards-compatible, stage-based calibration process comprising the following stages:

  • Phase 1: Undistort
    • Pick Stage: Quickly validate and initialize construction/reconstruction of G Projection for a given location code.
    • Lens Stage: Configure the intrinsic matrix K.
    • Undistort Stage: Adjust the distortion coefficients of the Brown-Conrady distortion model (5 coefficients).
    • Validation 1: Confirm the distortion and intrinsics, concluding the phase.
  • Phase 2: Homography
    • Homography Anchors Stage: Manual homography computation from point pairs with a RANSAC solver.
    • Homography FOV Stage: Check the warped CCTV overlaid on the SAT map, which also doubles as a FOV polygon for intuitive visualization.
    • Validation 2: Click a ground contact point in CCTV and see where it shows up on the SAT map.
  • Phase 3: Parallax
    • Parallax Subjects Stage: Establish the head and ground contact points of 2 subjects, input their heights, and calculate the camera's position.
    • Distance Reference Stage: Enter a known distance (obtainable from Google Maps/Earth) to establish the pixel-per-meter ratio.
    • Validation 3: Click a head point, enter the height, and see the ground contact point in CCTV and the actual position on the SAT map.
  • Optional:
    • SVG Stage: Compute affine matrix between SVG and SAT.
    • ROI Stage: Choose a discard strategy for ROI.
  • Final Validation: Test how a 2D bounding box converts to a 3D box in CCTV and a floor box in SAT.
  • Save Stage: Confirm saving a G Projection for the location code.
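
The Parallax Subjects Stage above can be illustrated with similar triangles. This is a sketch of one plausible geometry, not necessarily the computation TrafficLab performs. It assumes that for each subject we know, in SAT ground coordinates, the ground contact point `g` and the point `p` where the head lands when it is (incorrectly) projected through the ground homography. The camera's ground position then lies on the line through `g` and `p`, and two subjects give two lines to intersect. All names here are hypothetical.

```python
import math


def line_intersection(a1, a2, b1, b2):
    """Intersect the infinite 2D lines through a1-a2 and b1-b2."""
    dax, day = a2[0] - a1[0], a2[1] - a1[1]
    dbx, dby = b2[0] - b1[0], b2[1] - b1[1]
    denom = dax * dby - day * dbx
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; pick subjects in different directions")
    t = ((b1[0] - a1[0]) * dby - (b1[1] - a1[1]) * dbx) / denom
    return (a1[0] + t * dax, a1[1] + t * day)


def camera_from_subjects(subject_a, subject_b):
    """Estimate camera ground position and height from two (g, p, height) subjects.

    For a camera at height Hc and a subject of height h, similar triangles give
    |p - c| / |g - c| = Hc / (Hc - h), hence Hc = h * r / (r - 1) with r that ratio.
    """
    (g1, p1, _), (g2, p2, _) = subject_a, subject_b
    c = line_intersection(g1, p1, g2, p2)
    heights = []
    for g, p, h in (subject_a, subject_b):
        r = math.dist(p, c) / math.dist(g, c)
        heights.append(h * r / (r - 1.0))
    # Average the two estimates; with noisy clicks they will differ slightly.
    return c, sum(heights) / len(heights)


# Synthetic example: values generated from a camera at (0, 0), 10 m high.
subject_a = ((10.0, 0.0), (12.5, 0.0), 2.0)
subject_b = ((0.0, 20.0), (0.0, 400.0 / 17.0), 1.5)
camera_xy, camera_height = camera_from_subjects(subject_a, subject_b)
```

The two subjects should stand in clearly different directions from the camera; otherwise the two lines are nearly parallel and the intersection becomes unstable.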

CalibrationEnd
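
For readers unfamiliar with the Brown-Conrady model used in the Undistort Stage, the sketch below applies it to a normalized image point. It assumes the common OpenCV coefficient ordering (k1, k2, p1, p2, k3); the source does not state which ordering TrafficLab uses, and the function names are hypothetical.

```python
def distort_point(x, y, k1, k2, p1, p2, k3):
    """Apply Brown-Conrady distortion to a normalized point (x, y).

    k1, k2, k3 are radial coefficients; p1, p2 are tangential coefficients.
    """
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


def to_pixel(xd, yd, fx, fy, cx, cy):
    """Map a (distorted) normalized point to pixels using the intrinsics in K."""
    return fx * xd + cx, fy * yd + cy


# Mild barrel distortion (negative k1) pulls points toward the image center.
xd, yd = distort_point(0.5, 0.0, k1=-0.2, k2=0.0, p1=0.0, p2=0.0, k3=0.0)
u, v = to_pixel(xd, yd, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
```

Undistortion is the inverse of this mapping and has no closed form in general, so it is usually solved iteratively or with a precomputed remap table.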

Note: Location Code:

You will have to prepare the necessary folders and files to perform calibration. There is also a Location Tab to help you create a barebones location folder that is ready for calibration. You can create a custom SVG and ROI using Adobe Illustrator; refer to the blog post/YouTube video for a more detailed guide on crafting these resources.
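
If you prefer to prepare the folder by hand, the layout from the directory tree can be scaffolded with a few lines of Python. This is a hypothetical helper that mirrors the tree in this README, not the Location Tab's actual code; `create_location_skeleton` is an invented name.

```python
from pathlib import Path


def create_location_skeleton(root, location_code):
    """Create location/{location_code}/ with its subfolders.

    Returns the location folder and the names of the critical images
    (cctv_*.png and sat_*.png) that still need to be added by hand.
    """
    loc = Path(root) / "location" / location_code
    (loc / "footage").mkdir(parents=True, exist_ok=True)
    (loc / "illustrator").mkdir(exist_ok=True)  # optional Illustrator assets
    required = [
        loc / f"cctv_{location_code}.png",
        loc / f"sat_{location_code}.png",
    ]
    missing = [p.name for p in required if not p.exists()]
    return loc, missing
```

Calling `create_location_skeleton(".", "119NH")` from the project root would create `location/119NH/footage/` and report which critical images are still missing.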

LocationTab

Inference Tab

InferenceTab

The Inference Tab is a hub for keeping track of all the output JSON files you produce. These files are what the visualization engine actually uses, which eliminates the need to perform demanding computation on top of heavy rendering. Arguments are controlled through inference_config.yaml and prior_dimensions.json in the project's root. Output is saved in the compressed .json.gz format for storage efficiency. Controllable arguments include:

  • Object detection model.
  • Object tracker.
  • Speed and orientation smoothing kinematics.
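
Because the outputs are plain JSON compressed with gzip, they can be inspected outside TrafficLab using only the standard library. The sketch below shows the round trip; the record contents are invented for illustration, since the actual output schema is not documented here.

```python
import gzip
import json


def save_json_gz(path, data):
    """Write a JSON-serializable object to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(data, f)


def load_json_gz(path):
    """Read a gzip-compressed JSON file back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical record; the real TrafficLab schema may look different.
example = {"frames": [{"index": 0, "objects": []}]}
```

For example, `load_json_gz("some_output.json.gz")` returns ordinary dictionaries and lists that you can explore in a notebook.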

Visualization Tab

VisualizationTab

The visualization engine for the output JSON files. It features comprehensive controls via a toolbar and keyboard shortcuts, and a flexible side-by-side view of the CCTV and SAT panels.

Getting Started

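Create the conda environment from environment.yml (or set up an equivalent venv), activate it, then run main.py:

conda env create -f environment.yml
# the environment name is defined in environment.yml
conda activate <env-name>
python main.py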

In this Google Drive, you can find:

  • Some fine-tuned YOLOv8-s and YOLOv11-s models for the models/ folder.
  • Two folders of the same location code with their projection constructed, one with an SVG and one without. If you want more pre-constructed projections of different locations, contact me. Put these in location/.
  • UPDATE: I've added quite a few more pre-constructed location codes that I used in my own testing to the Drive. Feel free to load them into TrafficLab, run inference, and see the visualization!
  • One preprocessed .json.gz output file ready for visualization (requires the 119NH folder in location/ from the same Drive).

This project was inspired by the paper by Rezaei et al. (2023).

Run Configs

To configure your own model and adjust the kinematics, inspect and edit the inference_config.yaml and prior_dimensions.json files.
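
To give a feel for what speed smoothing does, here is a minimal sketch that converts a track of SAT pixel positions into speeds using a pixel-per-meter ratio (as established in the Distance Reference Stage) and then smooths them with an exponential moving average. The smoothing method and the parameter names (`px_per_meter`, `fps`, `alpha`) are assumptions for illustration; they are not keys taken from inference_config.yaml.

```python
import math


def speeds_kmh(track_px, px_per_meter, fps):
    """Per-frame speeds in km/h from consecutive SAT pixel positions."""
    speeds = []
    for (x0, y0), (x1, y1) in zip(track_px, track_px[1:]):
        meters = math.hypot(x1 - x0, y1 - y0) / px_per_meter
        speeds.append(meters * fps * 3.6)  # m/frame -> m/s -> km/h
    return speeds


def ema(values, alpha):
    """Exponential moving average; smaller alpha means heavier smoothing."""
    smoothed = []
    for v in values:
        smoothed.append(v if not smoothed else alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


# A jittery track: raw speeds jump around, smoothed speeds change gradually.
track = [(0.0, 0.0), (10.0, 0.0), (24.0, 0.0), (32.0, 0.0), (43.0, 0.0)]
raw = speeds_kmh(track, px_per_meter=10.0, fps=25.0)
smooth = ema(raw, alpha=0.3)
```

The trade-off is responsiveness: heavier smoothing hides detector jitter but also delays the reaction to real braking or acceleration.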

Open Problems

  • This method only works on a single flat, planar environment.
  • The data is very "physics-ignoring": it mostly takes the output from the detector and tracker as-is, which can be noisy:
    • Occlusion will make an object behind the occluder disappear (e.g. a large bus or truck covering a sedan or pedestrian).
    • The vehicle 3D boxes do not follow any vehicle model. Ensuring they adhere to something like a kinematic bicycle model would eliminate many problems where vehicle boxes rotate or teleport randomly.
    • A road user can't just appear or disappear in the middle of the scene; entry/exit annotations could help with this.
  • Crafting the initial G Projection and SVG map is still a very tedious task.
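
As a reference for the kinematic bicycle model mentioned above, here is a minimal sketch of one integration step in its standard rear-axle form. It is not part of TrafficLab; the function name and parameters are illustrative.

```python
import math


def bicycle_step(x, y, heading, speed, steer, wheelbase, dt):
    """Advance a kinematic bicycle model by one time step (forward Euler).

    x, y       position of the rear axle in meters
    heading    yaw angle in radians
    speed      forward speed in m/s
    steer      front wheel steering angle in radians
    wheelbase  distance between axles in meters
    """
    new_x = x + speed * math.cos(heading) * dt
    new_y = y + speed * math.sin(heading) * dt
    new_heading = heading + (speed / wheelbase) * math.tan(steer) * dt
    return new_x, new_y, new_heading


# Driving straight at 10 m/s for 0.1 s moves the vehicle 1 m along its heading.
state = bicycle_step(0.0, 0.0, 0.0, speed=10.0, steer=0.0, wheelbase=2.7, dt=0.1)
```

Two properties make this useful as a constraint: the heading rate is proportional to speed, so a stationary box cannot spin in place, and position changes only along the heading, so a box cannot jump sideways between frames.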

Changelog

  • v1.0: Initial release.
  • v1.1: Refactored codebase and bug fixes.

Long-term Vision

I wish to scale this idea city-wide, with automatic calibration and continuous detector and tracker improvement, eventually making it sufficient for high-fidelity downstream tasks such as simulation, digital twins, natural language queries, and reinforcement learning.