Text Processing Automation
Sergey Vanyushin, 2013-08-30 16:04:24

What to do with the lyrics of 372 thousand songs?

It just so happens that I have a parsed database of English lyrics from www.lyrics.net. It contains 56,198 artists, 113,151 albums, and 372,357 songs.
All metadata is stored in MySQL with the following structure:
[image: database structure diagram, 9f5cb97412ba994518914a3d756b46d2.png]
The lyrics are stored in txt.gz files, occupying 1.5 GB.
What to do with this data?
I have posted a dump of the database together with all the lyrics files. I was wrong about the size: the archive is 170 MB, and about 700 MB unpacked.
Download: yadi.sk/d/K5XoBd9S8hgGF

6 answer(s)
Sergey Vanyushin, 2013-09-04
@wapmorgan

Posted: yadi.sk/d/K5XoBd9S8hgGF

Viktor Kuznetsov, 2013-08-30
@janitor

You can search for similar songs, combine them into groups, and look at statistics (popular song titles, popular words, etc.).
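As a rough illustration of "similar songs", here is a minimal sketch that compares two lyrics by the overlap of their vocabularies (Jaccard similarity). The function names and the simple regex tokenizer are assumptions made for this example, not something taken from the dump; a real search over 372 thousand songs would need something smarter than pairwise comparison.

```python
import re

def word_set(text):
    """Lower-case the text and return its set of distinct words (rough tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(text_a, text_b):
    """Jaccard similarity of two texts: shared words / all distinct words."""
    a, b = word_set(text_a), word_set(text_b)
    if not a and not b:
        return 0.0  # two empty texts: define similarity as 0
    return len(a & b) / len(a | b)
```

Songs whose pairwise score exceeds some threshold could then be put into the same group.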

Killy, 2013-08-30
@Killy

Looking at the structure of the "wc_lyricsnet_songs" table, I could not tell where the actual lyrics are stored. But presumably they exist, otherwise this would not be interesting.
First, you can figure out what this data actually contains and which metrics can be calculated from it:
Artist:
Data:
- Artist name
Metrics:
- Number of registered albums
- Number of registered songs
- Average, maximum, etc. values of album and song metrics per artist
Album:
Data:
- Album title
- Year of album release
Metrics:
- Number of songs in the album
- Length of the album title
- Number of words in the album title
- Album release year
- Average, maximum, etc. values of song metrics within the album
Song:
Data:
- Title
- Text
Metrics:
- Length of the song title
- Number of words in the song title
- Length of the lyrics
- Number of words in the text
- Number of unique words in the text
- Average number of repetitions of each word in the text
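The song-level metrics listed above could be computed roughly like this. This is only a sketch: `song_metrics` and its dictionary keys are names invented for the example, and the regex tokenizer is a simplifying assumption about what counts as a word.

```python
import re

def song_metrics(title, text):
    """Compute simple per-song metrics from a title and its lyrics."""
    title_words = re.findall(r"[a-z']+", title.lower())
    words = re.findall(r"[a-z']+", text.lower())
    unique = set(words)
    return {
        "title_length": len(title),          # characters in the title
        "title_words": len(title_words),     # words in the title
        "text_length": len(text),            # characters in the lyrics
        "text_words": len(words),            # words in the lyrics
        "unique_words": len(unique),         # distinct words in the lyrics
        # average number of times each distinct word is repeated
        "avg_repetitions": len(words) / len(unique) if unique else 0.0,
    }
```

Album and artist metrics can then be obtained by aggregating these dictionaries (average, maximum, and so on).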
Then think about what can be learned from all this:
a)
For the entire database, or for samples selected by a metric, compute frequency dictionaries of the text data. Identify popular patterns in the names of {artists/albums/songs} and build your own name generator.
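A minimal sketch of both ideas, assuming the names and lyrics are already loaded as Python strings. The frequency dictionary is a plain word count; the name generator is a first-order word chain, which is just one possible technique chosen for illustration. All function names and the `<s>` / `</s>` boundary markers are made up for this example.

```python
import random
import re
from collections import Counter, defaultdict

def frequency_dictionary(texts):
    """Count how often each word occurs across the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def build_chain(names):
    """Map each word to the list of words that follow it in the names.

    "<s>" and "</s>" are artificial start and end markers.
    """
    chain = defaultdict(list)
    for name in names:
        words = ["<s>"] + name.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate_name(chain, rng, max_words=6):
    """Random walk over the chain from the start marker (needs at least one name)."""
    word, out = "<s>", []
    while len(out) < max_words:
        word = rng.choice(chain[word])
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)
```

For example, `generate_name(build_chain(titles), random.Random())` produces a new title stitched together from word transitions seen in `titles`.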
b)
Look for extreme values of the metrics (not forgetting about normalization).
For example, the most verbose artists, or the authors of the densest lyrics.
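To show why normalization matters, here is a sketch that ranks artists by average words per song rather than by total words, so that artists with many songs do not win automatically. The input format (pairs of artist name and word count), the `min_songs` cutoff, and the function name are assumptions for this example.

```python
from collections import defaultdict

def most_verbose_artists(songs, min_songs=1, top=3):
    """Rank artists by average words per song.

    songs: iterable of (artist, word_count) pairs.
    min_songs: ignore artists with fewer songs than this (small samples are noisy).
    """
    totals = defaultdict(lambda: [0, 0])  # artist -> [total words, song count]
    for artist, word_count in songs:
        totals[artist][0] += word_count
        totals[artist][1] += 1
    averages = {
        artist: words / count
        for artist, (words, count) in totals.items()
        if count >= min_songs
    }
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top]
```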
c)
Draw a "metric x metric" grid. See if anything interesting happens at the intersections.
For example, [Album release year] x [Number of unique words in text]. For each year (a sample by metric 1), sum metric 2 over that year's songs and divide by the number of songs registered for that year (the sample size) to get the average. Then plot the result and check whether lyrics are becoming more primitive on average.
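The per-year averaging in this example could be sketched as follows, assuming each song has already been reduced to a (year, unique word count) pair. The function name and input format are invented for the example; plotting is left out.

```python
from collections import defaultdict

def average_by_year(rows):
    """Average the second metric within each year.

    rows: iterable of (year, unique_words) pairs, one per song.
    Returns {year: mean unique words}, ordered by year.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for year, unique_words in rows:
        sums[year] += unique_words
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}
```

The resulting dictionary can be fed directly into any plotting tool to see the trend over time.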
d)
Frequency dictionaries for samples. Draw a "metric x data" grid and see if anything interesting happens at the intersections: compute frequency dictionaries from the data in each sample selected by the metric, then compare the results and look for deviations.
For example, [Album release year] x [Song title]. Can musical fashion be traced this way?
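One possible way to "find deviations" is to compare a word's share within one year's titles to its share across all titles. The sketch below does that; the input format (a dictionary from year to list of titles) and the function names are assumptions for illustration, and the ratio is just one of several reasonable deviation measures.

```python
import re
from collections import Counter

def tokenize(text):
    """Rough tokenizer: lower-cased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def overrepresented_words(titles_by_year, year, top=3):
    """Rank the words of one year's titles by (share in that year) / (share overall)."""
    overall = Counter(
        word
        for titles in titles_by_year.values()
        for title in titles
        for word in tokenize(title)
    )
    sample = Counter(
        word for title in titles_by_year[year] for word in tokenize(title)
    )
    total_all = sum(overall.values())
    total_sample = sum(sample.values())
    ratios = {
        word: (count / total_sample) / (overall[word] / total_all)
        for word, count in sample.items()
    }
    # Highest ratio first; ties broken alphabetically.
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

A ratio well above 1 means the word is unusually common in that year's titles.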
With external data (a frequency dictionary of the English language, lastfm, etc.), you can come up with many more metrics. They are not always trivial, but they are more promising. For example:
- "Simplicity of the text": how much of it consists of frequent or, conversely, rare words. This is more informative than the number of unique words in the text.
- The artist's affiliation with a particular musical direction (genre). With statistics on prominent representatives of a genre, you could try to build your own genre recognizer for arbitrary texts, for example.
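The "simplicity of the text" metric could be sketched as the share of a song's words that appear in a list of the most common English words. The `common_words` set here is a hypothetical stand-in for a real external frequency dictionary, and the function name is made up for the example.

```python
import re

def simplicity(text, common_words):
    """Share of words in the text that belong to the set of common words.

    common_words: a set of lower-cased frequent words, assumed to come from
    some external English frequency dictionary.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in common_words for word in words) / len(words)
```

A value close to 1.0 means the lyrics are built almost entirely from frequent words; lower values indicate rarer vocabulary.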
The idea, I hope, is clear. You can keep brainstorming and experimenting to come up with new and combined metrics, find sources of additional data, and so on.

deserg, 2013-08-30
@deserg

If there is really nothing better to do, then based on the lyrics you can compile statistics on the most frequently used words in songs (what people sing about), the length of the songs (who writes the most), and all sorts of ratings.

QZip, 2013-08-31
@QZip

Doorway pages are made from material like this ;)

Sergey, 2013-12-11
@begemot_sun

Your picture of the DB structure is broken.
