Introduction

🎉 Welcome to EzhouNet: a framework based on graph neural networks and anchor intervals for respiratory sound event detection.

This repository provides an end-to-end deep learning method for sound event detection (SED).
We focus on respiratory sound events, and the idea was inspired by anchor boxes in computer vision.

Instead of using frame-level post-processing, we directly learn event intervals by:

- Generating anchor intervals with desed_task/dataio/datasets_resp_v9_8_7.py → RespiraGnnSet(Dataset).generate_anchor_intervals
- Refining interval offsets with desed_task/nnet/EzhouNet_v9_7_9.py → GraphRespiratory(nn.Module).Interval_Refine
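
As a rough illustration of the anchor-interval idea, the sketch below tiles a clip with fixed-width intervals at several scales. It is not the repository's generate_anchor_intervals, whose details are not shown here: the function name make_anchor_intervals, the widths, and the 50% overlap are all assumptions chosen for the example.

```python
from typing import List, Tuple


def make_anchor_intervals(
    audio_len: float,
    widths: Tuple[float, ...] = (0.5, 1.0, 2.0),  # assumed scales, in seconds
    overlap: float = 0.5,                          # assumed overlap between neighbours
) -> List[Tuple[float, float]]:
    """Tile [0, audio_len] with (start, end) anchors at each width (hypothetical helper)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    anchors = []
    for width in widths:
        hop = width * (1.0 - overlap)
        n = 0
        while n * hop < audio_len:
            start = n * hop
            end = min(start + width, audio_len)  # clip the last anchors to the clip end
            anchors.append((start, end))
            n += 1
    return anchors


# Two scales on a 2-second clip: four 1.0 s anchors plus two 2.0 s anchors.
anchors = make_anchor_intervals(audio_len=2.0, widths=(1.0, 2.0))
for start, end in anchors:
    print(f"{start:.2f} - {end:.2f}")
```

Each anchor is then a candidate event interval whose boundaries the network only has to adjust, much like an anchor box in object detection.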

⚠ Please note: although this approach has proven effective for sound event detection in general, it is not yet ready for clinical use in respiratory sound detection.
This repository serves as a reference implementation for researchers. The original design principles are detailed in our paper, though many modules have since been updated.

🚀 Getting Started

Install the evaluation functions following the steps in DESED_task.
These will be used to compute sound event detection metrics.
Set up your environment:
- python=3.8
- pytorch=1.13.1
- pytorch-lightning=2.2.5
- torch_geometric=2.5.2

Install dependencies:
```
pip install -r requirements.txt
```
Prepare your dataset.
- For respiratory sounds, we used SPRsound and HF Lung V1.
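
To confirm that an environment matches the versions listed above, something like the following sketch may help. The helper check_pins is hypothetical, and the pip distribution names in PINS (for example torch for pytorch) are assumptions; adjust them to whatever requirements.txt actually installs.

```python
from importlib import metadata

# Versions taken from the list above; the distribution names are assumptions.
PINS = {
    "torch": "1.13.1",
    "pytorch-lightning": "2.2.5",
    "torch_geometric": "2.5.2",
}


def check_pins(pins):
    """Return {name: 'ok' | 'missing' | 'found <version>'} for each pinned package."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name) or ""
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Ignore local build tags such as "+cu117" when comparing.
        report[name] = "ok" if found.split("+")[0] == wanted else f"found {found}"
    return report


for package, status in check_pins(PINS).items():
    print(f"{package}: {status}")
```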

🏋️ Training

cd into this path:

/Respira_SED_LGNN/recipes/dcase2023_task4_baseline/

1. Learn start & end offsets of anchor intervals

Set requires_grad=True or False to control whether bins are learnable:

self.start_weight_params = nn.ParameterList([
    nn.Parameter(torch.linspace(-1.50, 1.50, dist_bins_list[i]), requires_grad=False)
    for i in range(self.num_scales)
])
self.end_weight_params = nn.ParameterList([
    nn.Parameter(torch.linspace(-1.50, 1.50, dist_bins_list[i]), requires_grad=False)
    for i in range(self.num_scales)
])

python train_respiratory_lab9_8_6.py
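
One common way to use bin values like the ones above is to have the network predict logits over the bins and take the probability-weighted average as the offset. The sketch below shows that idea in plain Python. It is an assumption about how the bins might be consumed, not a copy of Interval_Refine, and how the resulting offset is scaled and applied to an anchor is not covered here. The helper names linspace, softmax, and expected_offset are made up for the example.

```python
import math
from typing import List


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """Evenly spaced bin values for n >= 2, analogous to torch.linspace."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits: List[float], bins: List[float]) -> float:
    """Probability-weighted average of the bin values (assumed decoding rule)."""
    return sum(p * b for p, b in zip(softmax(logits), bins))


bins = linspace(-1.50, 1.50, 7)  # same range as above; 7 bins is an arbitrary choice

# Uniform logits over symmetric bins give an offset of about zero.
uniform = expected_offset([0.0] * 7, bins)

# A strongly peaked distribution selects (almost exactly) the last bin value.
peaked = expected_offset([0.0] * 6 + [20.0], bins)
print(uniform, peaked)
```

With requires_grad=False the bin values stay on this fixed grid; with requires_grad=True they become trainable and can drift away from it.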

2. YOLO-style learning of center & width offsets

a_w = (ends - starts).clamp(min=1e-6)  # anchor width, seconds
a_c = 0.5 * (starts + ends)            # anchor center

pred_centers = a_c + t_c_pred * a_w
pred_widths = a_w * torch.exp(t_w_pred.clamp(min=-6.0, max=6.0))

s = (pred_centers - 0.5 * pred_widths).clamp(min=0.0, max=float(audio_len))
e = (pred_centers + 0.5 * pred_widths).clamp(min=0.0, max=float(audio_len))

python train_respiratory_lab10_1_2.py
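
The decoding above maps predicted offsets (t_c_pred, t_w_pred) to an interval. Training also needs the inverse mapping, from a ground-truth interval to target offsets. A minimal sketch of that inverse, obtained by solving the decoding equations for t_c and t_w, is shown below in plain Python. The function names encode and decode are hypothetical, and whether the training script builds its targets this way is an assumption.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def encode(anchor: Interval, target: Interval, eps: float = 1e-6) -> Tuple[float, float]:
    """Offsets (t_c, t_w) that move `anchor` onto `target` (assumed target encoding)."""
    a_w = max(anchor[1] - anchor[0], eps)
    a_c = 0.5 * (anchor[0] + anchor[1])
    g_w = max(target[1] - target[0], eps)
    g_c = 0.5 * (target[0] + target[1])
    return (g_c - a_c) / a_w, math.log(g_w / a_w)


def decode(anchor: Interval, t_c: float, t_w: float, audio_len: float,
           eps: float = 1e-6) -> Interval:
    """Scalar version of the decoding shown above, including the clamps."""
    a_w = max(anchor[1] - anchor[0], eps)
    a_c = 0.5 * (anchor[0] + anchor[1])
    center = a_c + t_c * a_w
    width = a_w * math.exp(min(max(t_w, -6.0), 6.0))
    start = min(max(center - 0.5 * width, 0.0), audio_len)
    end = min(max(center + 0.5 * width, 0.0), audio_len)
    return start, end


anchor = (1.0, 2.0)
target = (1.2, 2.6)
t_c, t_w = encode(anchor, target)          # t_c = 0.4, t_w = log(1.4)
start, end = decode(anchor, t_c, t_w, audio_len=10.0)
print(t_c, t_w, start, end)                # decoding recovers the target interval
```

Expressing the center shift in units of the anchor width and the width change on a log scale keeps the targets comparable across anchors of different lengths.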

3. Combine both methods

Mixing center-offset and start/end-offset learning improves detection performance.

python train_respiratory_lab10_1_3.py

Here are reference results at three confidence thresholds:

Using confidence threshold: conf=0.501
Category-specific NMS IoU thresholds:
  Stridor: 0.5
  Wheeze: 0.4
  Crackle: 0.15
  Rhonchi: 0.3
	 call the compute event based metrics  

the Event based overall   f score: 0.19796610169491524, 	 error rate : 2.3084479371316307

the Event based class wise average f score: 0.1848416711564406,	 error rate : 2.966633604392851

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec    | ER       Del      Ins    |
    ------------ | -----   ----- | ------   ------   ------ | ------   ------   ------ |
    Rhonchi      | 29      94    | 14.6%  9.6%   31.0%   | 3.62   0.69   2.93    |
    Stridor      | 5       18    | 17.4%  11.1%  40.0%   | 3.80   0.60   3.20    |
    Crackle      | 287     496   | 17.4%  13.7%  23.7%   | 2.25   0.76   1.49    |
    Wheeze       | 188     358   | 24.5%  18.7%  35.6%   | 2.19   0.64   1.55    |
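
The class-wise rows in the table above can be reproduced from three counts: reference events (Nref), system events (Nsys), and true positives. The sketch below does this for the Rhonchi row. The true-positive count of 9 is not printed in the log; it is inferred here by rounding Rec × Nref, and the helper event_based_metrics is hypothetical rather than part of the evaluation package.

```python
def event_based_metrics(n_ref: int, n_sys: int, n_tp: int) -> dict:
    """Class-wise event-based metrics from raw counts (illustrative helper)."""
    precision = n_tp / n_sys if n_sys else 0.0
    recall = n_tp / n_ref if n_ref else 0.0
    f_score = 2 * n_tp / (n_ref + n_sys) if (n_ref + n_sys) else 0.0
    deletion = (n_ref - n_tp) / n_ref if n_ref else 0.0   # missed reference events
    insertion = (n_sys - n_tp) / n_ref if n_ref else 0.0  # spurious system events
    return {
        "F": f_score,
        "Pre": precision,
        "Rec": recall,
        "ER": deletion + insertion,
        "Del": deletion,
        "Ins": insertion,
    }


# Rhonchi at conf=0.501: Nref=29, Nsys=94, inferred TP=9.
rhonchi = event_based_metrics(n_ref=29, n_sys=94, n_tp=9)
print({k: round(v, 3) for k, v in rhonchi.items()})

# Pooling the four classes (inferred TP = 9 + 2 + 68 + 67) gives the overall F-score.
overall_f = event_based_metrics(n_ref=509, n_sys=966, n_tp=146)["F"]
print(round(overall_f, 4))
```

The insertion rate is normalized by Nref rather than Nsys, which is why the error rate can exceed 1 when the system outputs many more events than the reference contains.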

Using confidence threshold: conf=0.65
Category-specific NMS IoU thresholds:
  Stridor: 0.5
  Wheeze: 0.4
  Crackle: 0.15
  Rhonchi: 0.3
	 call the compute event based metrics  

the Event based overall   f score: 0.20275862068965514, 	 error rate : 2.257367387033399

the Event based class wise average f score: 0.19081504850632036,	 error rate : 2.8438182069170024

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec    | ER       Del      Ins    |
    ------------ | -----   ----- | ------   ------   ------ | ------   ------   ------ |
    Rhonchi      | 29      88    | 15.4%  10.2%  31.0%   | 3.41   0.69   2.72    |
    Stridor      | 5       17    | 18.2%  11.8%  40.0%   | 3.60   0.60   3.00    |
    Crackle      | 287     486   | 17.9%  14.2%  24.0%   | 2.21   0.76   1.45    |
    Wheeze       | 188     350   | 24.9%  19.1%  35.6%   | 2.15   0.64   1.51    |

Using confidence threshold: conf=0.8
Category-specific NMS IoU thresholds:
  Stridor: 0.5
  Wheeze: 0.4
  Crackle: 0.15
  Rhonchi: 0.3
	 call the compute event based metrics  

the Event based overall   f score: 0.20889202540578686, 	 error rate : 2.1886051080550097

the Event based class wise average f score: 0.20316344967739192,	 error rate : 2.6116479647528896

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec    | ER       Del      Ins    |
    ------------ | -----   ----- | ------   ------   ------ | ------   ------   ------ |
    Rhonchi      | 29      82    | 16.2%  11.0%  31.0%   | 3.21   0.69   2.52    |
    Stridor      | 5       14    | 21.1%  14.3%  40.0%   | 3.00   0.60   2.40    |
    Crackle      | 287     479   | 18.3%  14.6%  24.4%   | 2.18   0.76   1.43    |
    Wheeze       | 188     333   | 25.7%  20.1%  35.6%   | 2.06   0.64   1.41    |
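
The logs above combine a confidence threshold with category-specific NMS IoU thresholds. A minimal sketch of that kind of post-processing for 1-D intervals follows. The IoU thresholds are copied from the logs, while the function names, the detection tuple layout, and the exact suppression rule (drop a candidate when its IoU with a kept interval exceeds the class threshold) are assumptions, not the repository's code.

```python
from typing import Dict, List, Tuple

Detection = Tuple[str, float, float, float]  # (label, start, end, confidence)

# Thresholds as printed in the logs above.
NMS_IOU: Dict[str, float] = {"Stridor": 0.5, "Wheeze": 0.4, "Crackle": 0.15, "Rhonchi": 0.3}


def interval_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(dets: List[Detection], conf: float = 0.65,
                   iou_thresholds: Dict[str, float] = NMS_IOU) -> List[Detection]:
    """Drop low-confidence detections, then run greedy NMS separately per class."""
    kept: List[Detection] = []
    for label, thr in iou_thresholds.items():
        candidates = sorted(
            (d for d in dets if d[0] == label and d[3] >= conf),
            key=lambda d: d[3],
            reverse=True,  # highest confidence first
        )
        selected: List[Detection] = []
        for d in candidates:
            if all(interval_iou((d[1], d[2]), (k[1], k[2])) <= thr for k in selected):
                selected.append(d)
        kept.extend(selected)
    return sorted(kept, key=lambda d: d[1])  # order by start time


detections = [
    ("Wheeze", 1.0, 2.0, 0.90),
    ("Wheeze", 1.1, 2.1, 0.80),     # heavy overlap with the first wheeze: suppressed
    ("Wheeze", 3.0, 4.0, 0.60),     # below the confidence threshold: dropped
    ("Crackle", 0.2, 0.4, 0.70),
    ("Crackle", 0.35, 0.55, 0.95),  # IoU of about 0.14 with the other crackle: kept
]
result = filter_and_nms(detections, conf=0.65)
for det in result:
    print(det)
```

In the three runs above, raising the confidence threshold from 0.501 to 0.8 reduces Nsys for every class, which lowers insertions while recall stays roughly the same.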

🔮 Further Steps

If you’d like to improve upon this work, here are some suggestions:

- Avoid group cyclic slicing of spectrogram feature maps. While useful for grouped feature extraction, it makes quantization and deployment difficult.
- Try alternative respiratory features: spectrograms, MFCCs, energy, or statistical features (see the paper Benchmarking of eight RNN variants for breath phase and adventitious sound detection on hf_lung_v1).
- Explore advanced multi-scale graph convolution modules for node updates.
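
As a starting point for the energy and statistical features mentioned above, the sketch below computes per-frame log energy and zero-crossing rate in plain Python. The function name frame_features, the frame and hop sizes, and the 4 kHz sampling rate are assumptions for illustration; a real pipeline would more likely use an audio library for this.

```python
import math
from typing import List, Sequence, Tuple


def frame_features(signal: Sequence[float], frame_len: int = 400,
                   hop: int = 160) -> List[Tuple[float, float]]:
    """Return (log_energy, zero_crossing_rate) for each full frame of the signal."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silence
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((log_energy, crossings / (frame_len - 1)))
    return feats


# One second of a 200 Hz tone at an assumed 4 kHz sampling rate.
sr = 4000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
feats = frame_features(tone)
print(len(feats), feats[0])
```

Features like these could be attached to graph nodes alongside, or instead of, spectrogram patches.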

💡 Inspiration

The idea came from a biomedical conference at the university, where I saw graph neural networks being widely applied to biosignals. That is how EzhouNet was born, named after the city of Ezhou.

During my research stay in Ezhou, I met a friend there, Kun, who took me to visit Liangzi Lake. He said:

“People know Wuhan’s East Lake, but few know Liangzi Lake in Ezhou.” It truly is an ecological gem. 🌿🌊

Feel free to fork, experiment, and improve it in your own lab. If you like it, give it a star.

Happy coding, and good luck with your projects! 🚀
