Audio Indexing for Music Information Retrieval

An efficient Music Information Retrieval (MIR) system relies on similarity searching for two reasons:

it tolerates errors in the query.
each piece of music is produced as a variation on an original theme.

Given a collection of musical sequences and a query sequence, the aim is to find the subsequences in the collection that are similar to the query. Each musical sequence is treated as a sequence of integers in the range [0, 255], representing the signal amplitude of the music.
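The problem statement above can be illustrated with a brute-force baseline that compares the query against every window of every sequence. This is only a sketch for intuition, not the indexed method discussed in this post. The function name `range_search`, the use of Euclidean distance, and the threshold `epsilon` are assumptions made for the example.

```python
import numpy as np

def range_search(sequences, query, epsilon):
    """Return (sequence_id, offset, distance) for every window within epsilon of the query.

    Assumption: similarity is measured with Euclidean distance.
    """
    q = np.asarray(query, dtype=float)
    matches = []
    for seq_id, seq in sequences.items():
        s = np.asarray(seq, dtype=float)
        if len(s) < len(q):
            continue  # sequence is too short to contain the query
        # Every contiguous window of len(q) samples, one window per row
        windows = np.lib.stride_tricks.sliding_window_view(s, len(q))
        dists = np.linalg.norm(windows - q, axis=1)
        for offset in np.flatnonzero(dists <= epsilon):
            matches.append((seq_id, int(offset), float(dists[offset])))
    return matches

# Hypothetical data: amplitudes in [0, 255]
collection = {"a": [10, 20, 30, 40, 50], "b": [0, 0, 0]}
print(range_search(collection, [20, 31, 40], epsilon=2.0))
```

Scanning every window like this costs time proportional to the total size of the collection, which is the cost an index is meant to avoid.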

The time series representation of the music collection is used as the feature for indexing. It supports efficient query processing because it indexes the audio sequence more effectively than other available features. Using the Discrete Fourier Transform (DFT) together with the time series representation, each musical sequence in the database is converted to a series of points in space. To save space, only the first few coefficients are kept for further processing, which significantly reduces the amount of data stored per audio file. A sliding window is moved over the sequence, and the DFT is applied to each window to generate a 3-D point in space for time series analysis. These points are grouped into minimum bounding rectangles (MBRs), since points that lie close to each other in the transformed space can be enclosed by the same rectangle. For efficiency and speed, the MBRs are then indexed using an R*-tree.

Feature extraction process: converting a subsequence to a point in multidimensional space
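The feature extraction described above can be sketched in a few lines of NumPy. Several details here are assumptions for illustration only: the 3-D point is taken to be the magnitudes of the first three DFT coefficients of each window, and consecutive points are packed into MBRs of a fixed size, which is simpler than whatever grouping heuristic the original work uses. The names `window_features` and `group_into_mbrs` are hypothetical, and the R*-tree step is left out.

```python
import numpy as np

def window_features(sequence, window, n_coeffs=3):
    """Map every sliding window to a point built from its leading DFT coefficients."""
    s = np.asarray(sequence, dtype=float)
    # One row per window position; there are len(s) - window + 1 rows
    windows = np.lib.stride_tricks.sliding_window_view(s, window)
    # Normalised DFT of each window
    spectrum = np.fft.fft(windows, axis=1) / np.sqrt(window)
    # Assumption: a point is the magnitude of the first n_coeffs coefficients
    return np.abs(spectrum[:, :n_coeffs])

def group_into_mbrs(points, points_per_mbr):
    """Pack consecutive points into MBRs given as (first_offset, low_corner, high_corner)."""
    mbrs = []
    for start in range(0, len(points), points_per_mbr):
        chunk = points[start:start + points_per_mbr]
        mbrs.append((start, chunk.min(axis=0), chunk.max(axis=0)))
    return mbrs

# Hypothetical amplitude sequence in [0, 255]
samples = [12, 40, 200, 180, 90, 30, 15, 60, 220, 240]
points = window_features(samples, window=4)
for first_offset, low, high in group_into_mbrs(points, points_per_mbr=3):
    print(first_offset, low.round(1), high.round(1))
```

Each MBR is stored as its two opposite corners, which is the form a spatial index such as an R*-tree would take as input.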

False Alarm Resolution:

False alarm resolution post-processes the query results, discarding candidates that the index returned but that do not actually match the query. An improved version of false alarm resolution was proposed for the following reasons:

audio queries are long sequences, so each false alarm resolution can drastically reduce the efficiency of the entire retrieval system.
previous versions of false alarm resolution handled only a single data sequence, whereas a Music Information Retrieval system has to handle a collection of such sequences.

The former limitation is addressed by examining the pages in reverse order while resolving a false alarm. The latter is addressed by grouping the candidate matches by sequence and then sorting within each group. This avoids randomly scattered accesses to the music sequences when the results are checked.
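The grouping and sorting step can be sketched as follows. This is a minimal illustration under stated assumptions: candidates arrive as `(sequence_id, offset)` pairs, verification uses Euclidean distance against a threshold `epsilon`, and the reverse-order page examination is not modelled. The function name `resolve_false_alarms` is hypothetical.

```python
from collections import defaultdict
import numpy as np

def resolve_false_alarms(candidates, sequences, query, epsilon):
    """Verify index candidates against the raw data, one sequence at a time."""
    q = np.asarray(query, dtype=float)

    # Group candidate offsets by the sequence they belong to
    grouped = defaultdict(list)
    for seq_id, offset in candidates:
        grouped[seq_id].append(offset)

    confirmed = []
    for seq_id in sorted(grouped):
        s = np.asarray(sequences[seq_id], dtype=float)
        # Sorted offsets mean each sequence is read front to back, not at random
        for offset in sorted(grouped[seq_id]):
            window = s[offset:offset + len(q)]
            # Assumption: a true match lies within epsilon in Euclidean distance
            if len(window) == len(q) and np.linalg.norm(window - q) <= epsilon:
                confirmed.append((seq_id, offset))
    return confirmed

# Hypothetical candidates returned by an index, in arbitrary order
sequences = {"a": [1, 2, 3, 4, 5], "b": [9, 9, 9, 9]}
candidates = [("b", 0), ("a", 2), ("a", 1)]
print(resolve_false_alarms(candidates, sequences, [2, 3, 4], epsilon=0.5))
```

Because all candidates for one sequence are checked together and in offset order, each sequence only needs to be loaded once.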

Performance Evaluation:

MS-indexing and ST-indexing were compared in the evaluation. As the DFT window size was increased, the execution time of both algorithms decreased, for the following reason:

The length of the DFT coefficient sequence decreases as the window size increases. As a result, the average MBR size decreases and fewer MBRs are produced. Having fewer MBRs improves indexing efficiency and speeds up false alarm resolution. MS-Index is faster than ST-Index because larger queries use a proportionally larger DFT window, and the reverse-order scheme becomes more efficient by pruning intermediate pages.

Performance was further evaluated by measuring retrieval time for different query sizes. The results showed that retrieval time increases with query size.

How can the proposed system be improved?

The proposed approach is limited to range searches. Supporting nearest neighbor search would remove this limitation.
The methodology could also be improved by incorporating a similarity-based ranking scheme, so that audio sequences are ordered and clustered by their best possible match in the data set.
A hash map from each audio sequence to the description of its file would allow the details of a matched sequence to be retrieved in constant time.
Annotating the music sequences could help cluster similar files in the data set, which may improve the retrieval time of the overall system.
Finally, a collection of features (such as pitch and rhythm) could be used to improve the accuracy of the system.
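Two of the suggestions above, nearest neighbor search with ranking and a hash map for file descriptions, can be combined in a short sketch. It scans the data directly instead of using the index, so it only illustrates the interface, not an efficient implementation. The function name `nearest_subsequences`, the Euclidean distance, and the sample descriptions are assumptions made for the example.

```python
import heapq
import numpy as np

def nearest_subsequences(sequences, query, k, descriptions):
    """Rank sequences by their best-matching window and attach file descriptions."""
    q = np.asarray(query, dtype=float)
    best = []
    for seq_id, seq in sequences.items():
        s = np.asarray(seq, dtype=float)
        if len(s) < len(q):
            continue  # sequence is too short to contain the query
        windows = np.lib.stride_tricks.sliding_window_view(s, len(q))
        dists = np.linalg.norm(windows - q, axis=1)
        offset = int(np.argmin(dists))  # best window within this sequence
        best.append((float(dists[offset]), seq_id, offset))

    # The k smallest distances, closest match first
    ranked = heapq.nsmallest(k, best)
    # dict lookup gives the description in constant time on average
    return [(seq_id, offset, dist, descriptions.get(seq_id, "unknown"))
            for dist, seq_id, offset in ranked]

# Hypothetical collection and metadata
sequences = {"a": [1, 2, 3, 4], "b": [1, 2, 4, 4], "c": [9, 9, 9, 9]}
descriptions = {"a": "Song A"}
print(nearest_subsequences(sequences, [2, 3, 4], k=2, descriptions=descriptions))
```

Unlike a range search, this needs no distance threshold: the caller asks for the k closest matches and receives them already ranked.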

A use case for the above model could be an audio forgery detection system.

References:

1. Audio Indexing for Efficient Music Information Retrieval
2. A Survey of Audio-Based Music Classification and Annotation
