Immersive Audio-Visual VR Scene Reproduction

This project aims to reconstruct 3D geometry and acoustic properties of environments from a single 360° image for plausible audio-visual VR reproduction. It builds on the seminal work of Dr. Hansung Kim and extends research done by University of Southampton 4th year CS and EE students.

Project Structure

The repository is structured as follows:

  • 360monodepth/: Submodule for monocular 360° depth estimation (Docker-based)
  • Atiyeh-RIR-evaluation-Matlab/: MATLAB scripts for room impulse response (RIR) audio analysis
  • AVVR-Papers/: Related research papers
  • edgenet360/: Submodule for mesh generation (WSL-based)
    • Data/: Directory for input images
    • Output/: Directory for generated meshes in .obj format
  • Intern-logs/: Weekly logs from the internship, including the AudioResult Excel spreadsheet
    • Internship-Report.pdf: 10-week internship technical report
  • Dynamic-Backward-Attention_Transformer/: Submodule for material recognition using Dynamic Backward Attention Transformer
  • RIR_Analysis/: Python notebook by Mona for sine sweep and deconvolution
  • scripts/: Automation and integration scripts
  • Unity/:
    • AV-VR/: Main Unity project folder, extending GDP work
    • S3A/: Dr. Hansung's original Unity project for reference (Steam Audio integration, sound source positioning)

Key Files

  • scripts/config.ini: Modify the values in this file to match your system
  • scripts/GUI.py: Main script to be run after following the setup instructions
  • AVVR-Papers/report.pdf: The 2023/24 GDP report
  • Manual.docx / Manual.pdf: User manual provided by the GDP group
  • Intern-logs/Internship-Report.pdf: 10-week internship technical report
  • .gitignore: Lists files and directories to be ignored by Git
  • .gitmodules: Defines submodule configurations

Getting Started (Setup)

  1. Clone the project repository:
git clone https://git.soton.ac.uk/gdp-project-4/AVVR-Pipeline-GDP4.git
cd AVVR-Pipeline-GDP4
  2. Update submodules:
git submodule update --init --recursive
  3. Set up environments:

a. For material recognition/DBAT (uses conda):

cd Dynamic-Backward-Attention_Transformer
conda env create -f environment.yml

Download the pre-trained checkpoints and swin_tiny_patch4_window7_224.pth into the checkpoints folder, creating the subfolders first (the expected layout is sketched below):

mkdir checkpoints\dpglt_mode95\accuracy checkpoints\swin_pretrain
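
A sketch of the expected layout once the checkpoints are in place (placing swin_tiny_patch4_window7_224.pth under swin_pretrain is an assumption based on the folder name; the exact DBAT checkpoint filenames depend on the download):

checkpoints/
├── dpglt_mode95/
│   └── accuracy/          (pre-trained DBAT checkpoint goes here)
└── swin_pretrain/
    └── swin_tiny_patch4_window7_224.pth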

b. For blenderFlip.py (uses conda):

cd scripts
conda env create -f unity_conda_env.yml

c. For edgenet360 (uses WSL):

cd edgenet360
conda env create -f tf2_new_env.yml
  • Download the weights from here and place them in the edgenet360/weights folder if they are not already there
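
The edgenet360 environment is meant to be created inside WSL, so the commands above should be run from a WSL shell with conda installed there; a minimal sketch, assuming the repository sits on the Windows filesystem (the path is illustrative):

wsl
cd /mnt/c/<path-to-repo>/edgenet360
conda env create -f tf2_new_env.yml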

d. For 360monodepth (uses Docker):

  • Install Docker
  • Build and run the Docker container:
cd 360monodepth
docker build -t 360monodepth .
docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 360monodepth sh -c "cd /monodepth/code/python/src; python3 main.py --expname test_experiment --blending_method all --grid_size 8x7"
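
Optionally, before the full run, it can help to confirm that the container can see the GPU; a minimal check, assuming the NVIDIA Container Toolkit is installed (the flags mirror the run command above):

docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 360monodepth nvidia-smi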
  4. Configure paths: Edit scripts/config.ini to set the Anaconda binary directories for Windows and WSL (an illustrative example follows).
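
For illustration only, the entries take roughly this shape; the section and key names below are hypothetical, so keep the keys that already exist in config.ini and update only the paths:

[DEFAULT]
; illustrative values - point these at the Anaconda installs on your machine
windows_anaconda_dir = C:\Users\<user>\anaconda3\Scripts
wsl_anaconda_dir = /home/<user>/anaconda3/bin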

  5. Run the main ML pipeline:

cd scripts
python GUI.py
  6. GUI choices

[Screenshot of the pipeline GUI]

  • Tick Create depth map; tick Include Top to generate a mesh (.obj) with a ceiling.
  • Choose an input image from one of the scene folders (KT, ST, UL, MR, LR) in edgenet360/Data.
  • The pipeline takes roughly 5-15 minutes to run, depending on system specification.
  • The resulting .obj is written to the edgenet360/Output folder as final_output_scene_mesh.obj (a quick check is shown below).
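
Once the run completes, a quick way to confirm the mesh was written (Windows cmd shown; adjust for your shell):

dir edgenet360\Output\final_output_scene_mesh.obj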

If needed, refer to Manual.pdf for detailed prerequisites and setup instructions for both the ML pipeline and Unity VR rendering.

Pipeline Overview

[Pipeline overview diagram]

Video Demonstration

KT and ST scene demo

Contributing

To contribute to this project:

  1. Fork the necessary submodules if you are making changes to them.
  2. Create an issue describing the changes you propose.
  3. Submit a pull request referencing the issue.

Please ensure your code adheres to the project's coding standards and includes appropriate documentation.

Acknowledgements

This work is based on Dr. Hansung Kim's research at the University of Southampton and extends the Group Design Project by 4th year CS and EE students.

For more information on the foundational work, please visit:

[Github repo link for previous work TBD]

Future Work

  • Enhance the 360monodepth depth image so it fits better with EdgeNet360
  • Remove unnecessary files to reduce the Git repository size
  • Package the whole pipeline into a single executable, removing the need for prerequisites and manual setup (an ambitious goal)

License

Currently, this project is under the MIT License. However, considering that it builds upon existing research work, we are reviewing the most appropriate license that respects the original contributions while allowing for further research and development.

Note: The license may be subject to change in the future to better align with research and collaborative requirements.