It has already been a year since our Kickstarter campaign was successfully funded. A lot has happened in the world of digital sheet music: a new app for IMSLP, the complete redesign of MuseScore’s website, its merger with Ultimate Guitar, and the release of MuseScore 2.3. We also oversaw the digitisation of over 250 pieces for voice and piano as part of the OpenScore Lieder Corpus, proving that large-scale crowdsourced music digitisation is possible.

We have also made significant progress with the pieces selected by our Kickstarter backers, but we haven’t got as far as we had hoped. There has been a huge amount of interest from transcribers, yet turning this interest into progress has been more challenging. We have received submissions for most of the works on the list, and the quality has generally been very good, but even the very best transcriptions have needed substantial reworking to ensure they meet our guidelines and are consistent with our other editions before they are ready for publication.

Key challenges

  • The length of the pieces
    • Many are over 100 pages
  • The number of instruments in each piece
  • The variety of instruments
    • Each piece uses a different set of instruments, so requires a different template score.
    • Pieces often contain specialist instruments that transcribers are not familiar with.
  • The need for consistency
    • Each transcriber brings their own unique skills, experience, and style to the project.
    • Consistency is needed between the different works, and also within works, as some of the longer works have been distributed among multiple transcribers.
  • The need for semantic correctness
    • Digital scores contain special markup that allows computers to understand them. Without this “semantic” information, the computer wouldn’t be able to tell whether “Allegro” is a tempo marking or just an ordinary word (see the sketch after this list).
    • Semantic information is required for playback, editing, transposition, part extraction, and content re-flow: in short, everything that makes a digital score different to a paper score!
    • Scores can still look fine without semantic information, so transcribers often don’t realise when it is missing or incorrect, and often don’t understand why it is important.
  • Exceptions to rules
    • Every piece brings its own unique set of challenges that require new decisions to be made, and past decisions to be reviewed, often before the transcription can even begin.
    • Virtually every piece has brought up at least one new quirk of music notation that we have never encountered before.
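
To make the “semantic correctness” point concrete, here is a minimal sketch of the kind of check involved, written against uncompressed MusicXML (a format MuseScore can export). In MusicXML, a tempo word is machine-readable only if its direction carries a sound element with a tempo attribute; the same word entered as plain text looks identical on the page. The file name and word list below are hypothetical, and this is an illustration rather than our actual tooling:

    import xml.etree.ElementTree as ET

    # Hypothetical input: an uncompressed MusicXML transcription.
    SCORE = "transcription.musicxml"

    # A few common tempo words, for illustration only.
    TEMPO_WORDS = {"allegro", "adagio", "andante", "largo", "presto"}

    def missing_tempo_markings(path):
        """Find text directions that look like tempo words but carry no
        machine-readable <sound tempo="..."> element for playback."""
        problems = []
        for direction in ET.parse(path).getroot().iter("direction"):
            words = direction.find("direction-type/words")
            if words is None or not (words.text or "").strip():
                continue
            sound = direction.find("sound")
            has_tempo = sound is not None and "tempo" in sound.attrib
            if words.text.strip().lower() in TEMPO_WORDS and not has_tempo:
                problems.append(words.text.strip())
        return problems

    for text in missing_tempo_markings(SCORE):
        print(f'"{text}" looks like a tempo marking but has no playback tempo')

A score failing this kind of check will still print and display correctly, which is exactly why the problem is so easy to miss.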

You might think that accuracy would be a problem for a crowdsourced project, but we have been pleasantly surprised by the lengths transcribers have gone to in ensuring that their transcriptions match the original. Where mistakes have slipped through, they have generally been spotted pretty quickly by other users, and corrected just as fast. The problem is not with accuracy, but with meeting the technical requirements needed for optimum playback, part extraction, transposition, and re-flowing the layout on different devices. Contributors are good at spotting inaccuracies because they make the score “look wrong”. However, scores that don’t meet the technical requirements can still “look right”, so contributors struggle to spot these issues, or even to understand why they matter.

As a result of these challenges, we have had to rethink our strategy. The original plan had been to digitise the initial set of pieces first and then develop the automation tools required to digitise the rest of the public domain. However, owing to the challenges listed above, it is now clear that the automation tools are needed even for the initial set of works.

What about the Lieder Corpus?

The Lieder Corpus project successfully digitised 250 pieces in the space of four months. It was able to do so because most of the above challenges did not apply. Pretty much all of the lieder consisted of just voice and piano, so we were able to provide transcribers with a single template score that met the guidelines and covered all of the pieces. The lieder were also all roughly 1–3 pages of music, so we could simply give every lieder contributor the same reward: one month of MuseScore Pro for each completed transcription. The majority of the OpenScore pieces are much longer, and the density of music varies from page to page, so even assigning a reward value to each piece is a significant task, let alone checking the scores for accuracy and consistency with the guidelines.
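
For a sense of what that valuation involves, here is a purely hypothetical heuristic, not our actual formula: weight a piece by measures and staves rather than pages, then convert to months of MuseScore Pro. The baseline constant is an assumption calibrated to a short lied:

    # Purely hypothetical heuristic, not the project's actual reward formula.
    # Assumed baseline: a short lied for voice and piano (3 staves, ~40
    # measures) is worth one month of MuseScore Pro.
    STAFF_MEASURES_PER_MONTH = 120

    def reward_months(measures, staves):
        """Estimate a reward in whole months of MuseScore Pro."""
        return max(1, round(measures * staves / STAFF_MEASURES_PER_MONTH))

    print(reward_months(40, 3))    # a short lied -> 1 month
    print(reward_months(600, 15))  # a long orchestral work -> 75 months

Even this toy version shows why pages alone are a poor measure: a dense orchestral page represents far more work than a sparse one, and in practice difficulty would need weighting too.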

What next?

Over the last few months we have been working on a set of automation tools designed to make the task easier for transcribers and reviewers. One of these tools allows us to join scores together much more reliably than we could using the albums feature built in to MuseScore. This enables us to split larger scores among multiple transcribers, with each transcriber working on a different section, and then join all the sections together at the end using the new tool. Another tool checks scores against the guidelines, reporting any problems and giving instructions on how to fix them. We will be making this tool available via a web interface, so that transcribers can upload scores and get feedback without waiting for a manual review. This in turn will mean less work for reviewers, enabling us to get through pieces much faster, while still maintaining a high standard of quality.
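
To give a feel for the joining problem, here is a minimal sketch of the core idea, assuming each section is an uncompressed MusicXML file with an identical part list. The file names are hypothetical, and the real tool has to do considerably more than this:

    import xml.etree.ElementTree as ET

    def join_sections(section_paths, out_path):
        """Append the measures of later sections onto the first section,
        then renumber all measures so they run continuously."""
        base = ET.parse(section_paths[0])
        base_parts = base.getroot().findall("part")
        for path in section_paths[1:]:
            parts = ET.parse(path).getroot().findall("part")
            for base_part, part in zip(base_parts, parts):
                for measure in part.findall("measure"):
                    base_part.append(measure)
        # Renumber measures continuously within each part.
        for base_part in base_parts:
            for n, measure in enumerate(base_part.findall("measure"), start=1):
                measure.set("number", str(n))
        base.write(out_path, encoding="UTF-8", xml_declaration=True)

    join_sections(["section1.musicxml", "section2.musicxml"], "joined.musicxml")

A real joiner must also reconcile details at each seam, such as the divisions value, key and clef changes, and page layout, which is part of what makes reliable joining harder than it looks.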

These changes mean that we have had to push the target date for delivery back to February 2019, by which time we hope to have complete transcriptions of all but the very longest pieces. We will be attending the FOSDEM open source event on the 2nd and 3rd of February 2019, where we will give a presentation about the project and unveil the completed collection. If you can make it to Brussels that weekend, then please come and join us to talk about all things OpenScore!