Guitar Music Transcription from Silent Video

Authors: 

Abstract

Musical note tracking (NT), identifying the pitch of played notes and their temporal information, is typically computed from audio data. Although audio is the natural source of information for NT, audio-based methods have limitations, particularly for polyphonic music analysis. When a string instrument is played, each of its strings vibrates at a certain frequency, producing a sound wave. We propose a novel, physics-based method for polyphonic NT of string instruments. First, the string vibrations are recovered from silent video captured by a commercial camera mounted on the instrument. These vibrations are also used to detect the string locations in the video. The NT of each string is then computed from a set of 1D signals extracted from the video. Analyzing each string separately allows us to overcome the limitations of audio-based polyphonic NT. By directly considering the expected frequencies of the played notes, their aliases, and their harmonics, we can overcome some of the limitations posed by the relatively low sampling rate of the camera. For a given frame rate, we analyze the set of notes that cannot be detected due to noise, as well as pairs of notes that are indistinguishable from each other. Our method is tested on real data, and its output is sheet music that allows musicians to play the visually captured music. Our results show that visual-based NT can play an important role in solving the NT problem.
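To illustrate the aliasing issue the abstract refers to: a camera's frame rate is far below audio sampling rates, so a string vibrating faster than half the frame rate appears in the video at a folded (aliased) frequency. The sketch below is not the paper's method, only a minimal illustration of the standard aliasing formula, assuming a hypothetical 240 fps camera and the nominal fundamentals of the open guitar strings.

```python
def aliased_frequency(f, fs):
    """Frequency (Hz) at which a tone of f Hz appears when sampled at fs Hz.

    A signal at f is indistinguishable from one at |f - k*fs| for any
    integer k; the observed alias is the folding closest to 0 Hz.
    """
    return abs(f - fs * round(f / fs))

# Illustrative example: open-string fundamentals of a guitar in standard
# tuning, observed by a (hypothetical) 240 fps camera.
fps = 240.0
open_strings = {"E2": 82.41, "A2": 110.00, "D3": 146.83,
                "G3": 196.00, "B3": 246.94, "E4": 329.63}
for name, f in open_strings.items():
    print(f"{name}: {f:7.2f} Hz -> aliased to {aliased_frequency(f, fps):6.2f} Hz")
```

For instance, A4 (440 Hz) would alias to |440 - 2*240| = 40 Hz at 240 fps; any other note whose fundamental folds onto the same alias would be indistinguishable from it using this frame rate alone, which is why the method also considers harmonics.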