audio/SongRec/README



SongRec is an open-source Shazam client for Linux, written in Rust.

Features:

* Recognize audio from an arbitrary audio file.
* Recognize audio from the microphone.
* Usage from both GUI and command line (for the file recognition part).
* Provide a history of the recognized songs in the GUI, exportable to 
CSV.
* Continuous song detection from the microphone, with the ability to 
choose your input device.
* Ability to recognize songs from your speakers rather than your 
microphone (on compatible PulseAudio setups).
* Generate a lure from a song that, when played, will fool Shazam into 
thinking that it is the original song.

A (command-line only) Python version, which I made before rewriting in 
Rust for performance, is also available for demonstration purposes. It 
supports file recognition only.

## How it works

For useful information about how audio fingerprinting works, you may 
want to read [this article](http://coding-geek.com/how-shazam-works/). 
Put simply, Shazam generates a spectrogram of the sound (a 2D 
time/frequency graph, with the amplitude at each intersection) and 
maps out the frequency peaks from it (which should match key points of 
the harmonics of voice or of certain instruments).

Shazam also downsamples the sound to 16 kHz before processing, and 
splits it into four bands of 250-520 Hz, 520-1450 Hz, 1450-3500 Hz and 
3500-5500 Hz (so that if one band is too scrambled by noise, 
recognition from the other bands may still apply). The frequency peaks 
are then sent to the servers, which subsequently look up the strongest 
peaks in a database, in order to look for the simultaneous presence of 
neighboring peaks both in the associated reference fingerprints and in 
the fingerprint we sent.
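
As an illustration of the band splitting described above, here is a minimal Python sketch that finds local maxima in a single spectrum frame and groups them by band. It is not SongRec's actual code: the FFT size, the simple "higher than both neighbors" peak test, and the names `band_index` and `frame_peaks` are assumptions made for the example. Only the 16 kHz sample rate and the four band limits come from the text.

```python
# The four frequency bands named above, as (low_hz, high_hz) pairs.
BANDS_HZ = [(250, 520), (520, 1450), (1450, 3500), (3500, 5500)]

SAMPLE_RATE_HZ = 16_000  # from the text: audio is downsampled to 16 kHz
FFT_SIZE = 2048          # assumption, chosen for illustration only


def band_index(frequency_hz):
    """Return the index of the band containing frequency_hz, or None."""
    for index, (low, high) in enumerate(BANDS_HZ):
        if low <= frequency_hz < high:
            return index
    return None


def frame_peaks(magnitudes):
    """Find local maxima in one spectrum frame and group them by band.

    magnitudes[k] is the magnitude of FFT bin k, whose centre frequency
    is k * SAMPLE_RATE_HZ / FFT_SIZE. Returns a dict mapping each band
    index to a list of (frequency_hz, magnitude) pairs.
    """
    peaks = {index: [] for index in range(len(BANDS_HZ))}
    for k in range(1, len(magnitudes) - 1):
        # Hypothetical peak test: strictly higher than both neighbors.
        if not (magnitudes[k] > magnitudes[k - 1]
                and magnitudes[k] > magnitudes[k + 1]):
            continue
        frequency_hz = k * SAMPLE_RATE_HZ / FFT_SIZE
        band = band_index(frequency_hz)
        if band is not None:  # peaks outside 250-5500 Hz are dropped
            peaks[band].append((frequency_hz, magnitudes[k]))
    return peaks
```

Keeping the peaks separated per band is what would let a matcher fall back on the cleaner bands when one of them is drowned in noise.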

Hence, the Shazam fingerprinting algorithm, as implemented by the 
client, is fairly simple, as much of the processing is done 
server-side. The general workings of Shazam have been documented in 
public [research 
papers](https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf) and 
patents.


Note: It is not mandatory, but if you want to be able to recognize more 
formats than WAV, OGG, FLAC and MP3, you should ensure that you have 
the `ffmpeg` package installed.

## Compilation

(**WARNING**: Remember to compile the code in `--release` mode for 
acceptable performance.)

### Installing Rust

First, you need to [install the Rust compiler and package 
manager](https://www.rust-lang.org/tools/install). It has been observed 
to work with `rustc` 1.43.0 to the current rustc 1.47.0.

Install Rust and add it to your `$PATH` (for all distributions):

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Type "1"
# Log out and back in to add Rust to the $PATH, or run:
source $HOME/.cargo/env

# If you already installed Rust, then update it:
rustup update
```

### Install dependent libraries (nothing exotic)

Debian:

```bash
sudo apt install build-essential libasound2-dev libgtk-3-dev libssl-dev
```

Void Linux (libressl):

```shell
sudo xbps-install base-devel alsa-lib-devel gtk+3-devel libressl-devel
```

Void Linux (openssl):

```shell
sudo xbps-install base-devel alsa-lib-devel gtk+3-devel openssl-devel
```

### Compiling the project

This will compile and run the project:

```bash
# For the stable release:
cargo install songrec
songrec

# For the GitHub tree:
git clone git@github.com:marin-m/songrec.git
cd songrec
cargo run --release
```

For the latter, you will then find the project's binary (that you will 
be able to move or execute directly) at `target/release/songrec`.

## Sample usage

Passing no arguments or using the `gui` subcommand will launch the GUI 
and try to recognize audio in real time as soon as the application is 
launched:

```
./songrec
./songrec gui
```

Using the `gui-norecording` subcommand will launch the GUI without 
recognizing audio as soon as the software is started (you will need to 
click the "Turn on microphone recognition" button to do so):

```
./songrec gui-norecording
```

The GUI allows you to recognize songs from your microphone, from your 
speakers (on compatible PulseAudio setups), or from an audio file. The 
MP3, FLAC, WAV and OGG formats should be accepted for audio files if 
FFmpeg is not installed, and any audio or video format supported by 
FFmpeg should be accepted if FFmpeg is installed.

The following commands allow you to recognize sound from your 
microphone or from a file using the command line. `listen` keeps 
running while the microphone is usable, whereas `recognize` recognizes 
only one song. Use the `-h` flag to see all the available options:

```
./songrec listen -h
./songrec recognize -h
```

By default, only the artist and track name of the recognized song are 
printed to the standard output, and other information may be printed 
to the error output. The `--csv` and `--json` options allow you to 
print more programmatically usable information to the standard 
output.

The above describes the newer CLI interface of SongRec, but an older 
interface, operating only on audio files or raw audio fingerprints, is 
also available and described below.

The following subcommand will try to recognize audio from the middle of 
an audio file, and print the JSON response from Shazam servers:

```
./songrec audio-file-to-recognized-song sound_file.mp3
```

The following subcommands will do the same with an intermediary step, 
manipulating data-URI audio fingerprints as used by Shazam internally:

```
./songrec audio-file-to-fingerprint sound_file.mp3
./songrec fingerprint-to-recognized-song 'data:audio/vnd.shazam.sig;base64,...'
```
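
To make the data-URI form above more concrete, the following Python sketch converts between such a URI and the raw bytes it carries. It treats the payload as an opaque binary blob, since its internal layout is not described here. The helper names `decode_fingerprint_uri` and `encode_fingerprint_uri` are hypothetical and are not part of SongRec.

```python
import base64

# Prefix of the data-URI fingerprints shown in the commands above.
SIGNATURE_PREFIX = "data:audio/vnd.shazam.sig;base64,"


def decode_fingerprint_uri(uri):
    """Return the raw bytes carried by a fingerprint data URI."""
    if not uri.startswith(SIGNATURE_PREFIX):
        raise ValueError("not a Shazam signature data URI")
    payload = uri[len(SIGNATURE_PREFIX):]
    # validate=True rejects characters outside the Base64 alphabet.
    return base64.b64decode(payload, validate=True)


def encode_fingerprint_uri(raw_bytes):
    """Wrap raw fingerprint bytes back into a data URI."""
    payload = base64.b64encode(raw_bytes).decode("ascii")
    return SIGNATURE_PREFIX + payload
```

A round trip such as `encode_fingerprint_uri(decode_fingerprint_uri(uri))` should give back the original URI when it uses standard padded Base64, which can be handy for saving a fingerprint to a binary file and restoring it later.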

The following will produce audible tones from a given fingerprint, 
which should be able to fool Shazam into thinking that this is the 
original song (played either on the default audio output device, or 
written to a .WAV file):

```
./songrec fingerprint-to-lure 'data:audio/vnd.shazam.sig;base64,...'
./songrec fingerprint-to-lure 'data:audio/vnd.shazam.sig;base64,...' /tmp/output.wav
```
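
The idea behind a lure can be sketched in a few lines of Python: play a short sine tone for each frequency peak at the time where it occurs. This is only an illustration under stated assumptions, not SongRec's actual synthesis. The peaks are assumed to be already decoded into "(frequency, amplitude, time)" tuples as described in the Privacy section, and the tone length, the function names and the normalization step are all hypothetical choices.

```python
import math
import struct
import wave

SAMPLE_RATE_HZ = 16_000  # assumption: same rate as the fingerprinted audio


def synthesize_tones(peaks, duration_s, tone_length_s=0.1):
    """Mix one short sine tone per (frequency_hz, amplitude, time_s) peak."""
    samples = [0.0] * int(duration_s * SAMPLE_RATE_HZ)
    tone_samples = int(tone_length_s * SAMPLE_RATE_HZ)
    for frequency_hz, amplitude, time_s in peaks:
        start = int(time_s * SAMPLE_RATE_HZ)
        for n in range(tone_samples):
            if start + n >= len(samples):
                break  # the tone would run past the end of the buffer
            phase = 2 * math.pi * frequency_hz * n / SAMPLE_RATE_HZ
            samples[start + n] += amplitude * math.sin(phase)
    return samples


def write_wav(path, samples):
    """Write the samples as a mono 16-bit WAV file, normalized to full scale."""
    loudest = max((abs(s) for s in samples), default=0.0) or 1.0
    frames = b"".join(
        struct.pack("<h", int(32767 * s / loudest)) for s in samples
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(frames)
```

For example, `write_wav("/tmp/tones.wav", synthesize_tones([(1000.0, 1.0, 0.0)], 0.5))` would write half a second of audio that starts with a 0.1 second tone at 1000 Hz.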

When using the application, you may notice that certain information 
is saved to `~/.local/share/SongRec` (or an equivalent directory 
depending on your operating system), including the CSV-format list of 
the last recognized songs and the last selected microphone input device 
(so that it is selected again when restarting the app). You may want to 
delete this directory in case of persistent issues.
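
If you want to locate this directory from a script, the sketch below resolves it in Python. It assumes that the location follows the XDG Base Directory convention on Linux (`$XDG_DATA_HOME`, falling back to `~/.local/share`). SongRec itself may compute the path differently, and the function name `songrec_data_dir` is hypothetical.

```python
import os
from pathlib import Path


def songrec_data_dir(environ=None):
    """Return the assumed SongRec data directory on a Linux system."""
    env = os.environ if environ is None else environ
    # Assumption: XDG_DATA_HOME takes precedence when it is set and non-empty.
    base = env.get("XDG_DATA_HOME")
    if not base:
        home = env.get("HOME") or os.path.expanduser("~")
        base = os.path.join(home, ".local", "share")
    return Path(base) / "SongRec"
```

Passing a custom `environ` mapping makes it easy to check the result without touching the real environment.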

## Privacy

SongRec collects no data and contacts no other servers than Shazam's. 
SongRec does not upload raw audio data anywhere: only fingerprints of 
the audio are uploaded, which means sequences of frequency peaks 
encoded in the form of "(frequency, amplitude, time)" tuples.

This alone does not suffice to represent anything audible (use the 
"Play a Shazam lure" button to hear how much it differs from the full 
sound). In other words, no actually audible sound (e.g., voice 
fragments) is sent to the servers: only metadata derived from the 
characteristics of the sound is sent, which may only suffice to 
recognize a song already known by Shazam.

## Legal

This software is released under the [GNU GPL 
v3](https://www.gnu.org/licenses/gpl-3.0.html) license. It was created 
with the intent of providing interoperability between the remote Shazam 
services and Linux-based desktop systems.

Please note that in certain countries located outside of the European 
Union, especially the United States, software patents may apply.