Our collection of authentic WhatsApp chats is the basis of our research and thus of the whole project. The chats were gathered in summer 2014, when we asked the Swiss population to donate their WhatsApp chats to science.

At the moment, we are busy aggregating the data into a corpus that can be used for linguistic research. In the course of this process, the size of the corpus will be adjusted, for example because in some cases both communication partners submitted the same chat. The figures presented here are thus highly tentative.

  • Number of chats: ~617
  • Number of messages (with permission to be used): ~750'000
  • Number of tokens: ~5.5 million
  • Number of emojis: ~350'000

The corpus will be made available to other researchers once the project ends, i.e. after March 2020.

More information about the corpus can be found in the publication "What’s up, Switzerland? A corpus-based research project in a multilingual country".