Immersive Audio-Visual VR Scene Reproduction
This project aims to reconstruct the 3D geometry and acoustic properties of an environment from a single 360° image for plausible audio-visual VR reproduction. It builds on the seminal work of Dr. Hansung Kim and extends research by 4th year CS and EE students at the University of Southampton.
Project Structure
The repository is structured as follows:
- 360monodepth/: Submodule for monocular 360° depth estimation (Docker-based)
- Atiyeh-RIR-evaluation-Matlab: MATLAB scripts for audio analysis
- AVVR-Papers/: Related research papers
- edgenet360/: Submodule for mesh generation (WSL-based)
  - Data/: Directory for input images
  - Output/: Directory for generated meshes in .obj format
- Intern-logs/: Weekly logs from the internship work, including the AudioResult Excel file
  - Internship-Report.pdf: 10-week internship technical report
- Dynamic-Backward-Attention-Transformer/: Submodule for material recognition using the Dynamic Backward Attention Transformer (DBAT)
- RIR_Analysis: Python notebook for sine sweep and deconvolution by Mona
- scripts/: Automation and integration scripts
  - 360monodepthexecution/: PowerShell automation scripts for Docker (360monodepth)
  - debug_tool/: Debug tools to run modules one by one
- Unity/:
  - AV-VR/: Main Unity project folder, extending the GDP work
  - S3A/: Dr. Hansung Kim's original Unity project for reference (Steam Audio integration, sound source positioning)
Key Files
- scripts/config.ini: Modify the values in this file to fit your system
- scripts/GUI.py: Main script to run after following the setup instructions
- scripts/debug_tool/GUI_debug.py: Main script for debugging the modules one by one
- AVVR-Papers/report.pdf: 23/24 GDP report
- Manual.docx / Manual.pdf: User manual provided by the GDP group
- Intern-logs/Internship-Report.pdf: 10-week internship technical report
- .gitignore: Lists files and directories to be ignored by Git
- .gitmodules: Defines submodule configurations
Getting Started (Setup)
- Clone the project repository:
git clone https://git.soton.ac.uk/gdp-project-4/AVVR-Pipeline-GDP4.git
cd AVVR-Pipeline-GDP4
- Update submodules:
git submodule update --init --recursive
- Set up environments:
a. For material recognition/DBAT (uses conda): change into the Dynamic-Backward-Attention-Transformer directory and create the environment:
cd Dynamic-Backward-Attention-Transformer
conda env create -f environment.yml
While still inside the DBAT folder, create the checkpoint directories:
mkdir checkpoints\dpglt_mode95\accuracy checkpoints\swin_pretrain
Then download the pre-trained checkpoints and swin_tiny_patch4_window7_224.pth. Put the epoch=126-valid_acc_epoch=0.87.ckpt checkpoint in Dynamic-Backward-Attention-Transformer\checkpoints\dpglt_mode95\accuracy and swin_tiny_patch4_window7_224.pth in Dynamic-Backward-Attention-Transformer\checkpoints\swin_pretrain.
b. For blenderFlip.py (uses conda):
cd scripts
conda env create -f unity_conda_env.yml
c. For edgenet360 (uses WSL):
- Install WSL and Anaconda following these instructions: https://info.stat.cmu.edu/index.php?title=Windows_Subsystem_for_Linux_(WSL)_and_Python
- Make sure the wsl command invoked from cmd launches the distribution that has Anaconda installed.
- Then create the tf2 environment. Make sure the commands below are run inside WSL: from the WSL shell, navigate via /mnt/ (where WSL mounts the Windows drives) to wherever the repo is located so that the .yml file is accessible.
cd edgenet360
conda env create -f tf2_new_env.yml
- Download the weights from here and put them in the edgenet360/weights folder if they are not already there.
d. For 360monodepth (uses Docker):
- Install Docker
- Build and run the Docker container:
cd 360monodepth
docker build -t 360monodepth .
docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 360monodepth sh -c "cd /monodepth/code/python/src; python3 main.py --expname test_experiment --blending_method all --grid_size 8x7"
- Configure paths:
- Make a copy of scripts/config.example.ini in the same location and rename it to config.ini.
- Edit config.ini to set the Anaconda binary directories for Windows and WSL, and paste in the ID of the newly built Docker image.
- Run the main ML pipeline:
cd scripts
python GUI.py
- GUI choices
- Tick Create depth map, and tick Include Top to produce a mesh (.obj) with a ceiling.
- Choose an image from one of the scene folders (KT, ST, UL, MR, LR) in edgenet360/Data.
- The pipeline should run for about 5-15 minutes, depending on the system spec.
- The resulting .obj is written to the edgenet360/Output folder as final_output_scene_mesh.obj.
Refer to Manual.pdf for detailed prerequisites and setup instructions for the ML pipeline, if needed, and for Unity VR rendering.
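Once a run has finished, a quick way to confirm that a usable mesh was produced is to count the vertex and face records in the output file. The sketch below is illustrative only and is not part of the pipeline; the function name `summarize_obj` is hypothetical. It relies on the standard Wavefront .obj convention that vertex lines start with `v ` and face lines start with `f `, and it assumes it is run from the repository root.

```python
from pathlib import Path


def summarize_obj(path):
    """Count vertex ('v') and face ('f') records in a Wavefront .obj file."""
    counts = {"vertices": 0, "faces": 0}
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # 'vt' and 'vn' lines are not matched because of the trailing space.
            if line.startswith("v "):
                counts["vertices"] += 1
            elif line.startswith("f "):
                counts["faces"] += 1
    return counts


if __name__ == "__main__":
    # Output location as described in this README, relative to the repo root.
    mesh = Path("edgenet360/Output/final_output_scene_mesh.obj")
    if mesh.is_file():
        print(summarize_obj(mesh))
    else:
        print(f"{mesh} not found; has the pipeline finished?")
```

A mesh with zero faces, or a missing file, suggests that one of the stages failed and is worth rerunning on its own.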
- Use the debug GUI (scripts/debug_tool/GUI_debug.py) to run modules one by one when troubleshooting:
- It saves time by isolating an error without running all modules sequentially.
- It makes the input and output of each module easier to understand.
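A common source of failures in either GUI is a config.ini entry that was left blank after copying config.example.ini. The following is a hedged sketch of a check for that, using Python's standard configparser module. The function name `blank_options` is hypothetical, and the sketch deliberately does not assume any particular section or key names, since those are defined by the repository's own config.example.ini.

```python
import configparser
from pathlib import Path


def blank_options(config_path):
    """Return (section, option) pairs in an INI file whose value is empty."""
    # Interpolation is disabled so that '%' characters in paths are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(config_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    blanks = []
    for section in parser.sections():
        for option, value in parser.items(section):
            if not value.strip():
                blanks.append((section, option))
    return blanks


if __name__ == "__main__":
    # Location as described in this README, relative to the repo root.
    config = Path("scripts/config.ini")
    if config.is_file():
        for section, option in blank_options(config):
            print(f"[{section}] {option} is empty")
    else:
        print(f"{config} not found; copy config.example.ini first.")
```

This only detects empty values. It cannot tell whether a path is correct, in particular for WSL paths, which are not visible from a Windows Python interpreter.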
Pipeline Overview
Video Demonstration
Contributing
To contribute to this project:
- Fork the necessary submodules if you are making changes to them.
- Create an issue describing the changes you propose.
- Submit a pull request referencing the issue.
Please ensure your code adheres to the project's coding standards and includes appropriate documentation.
Acknowledgements
This work is based on Dr. Hansung Kim's research at the University of Southampton and extends the Group Design Project by 4th year EE students.
For more information on the foundational work, please visit:
[Github repo link for previous work TBD]
Future Work
- Improve the 360monodepth depth image so that it fits better with EdgeNet360
- Remove unnecessary files to reduce the Git repository size
- Export the whole pipeline as a single executable file that needs no prerequisites or setup (ambitious goal)
License
Currently, this project is under the MIT License. However, considering that it builds upon existing research work, we are reviewing the most appropriate license that respects the original contributions while allowing for further research and development.
Note: The license may be subject to change in the future to better align with research and collaborative requirements.