Our collection of authentic WhatsApp chats is the basis of our research and thus of the whole project. The chats were gathered in the summer of 2014, when we asked the Swiss population to donate their WhatsApp chats to science.

At the moment, we are busy aggregating the data into a corpus that can be used for linguistic research. In the course of this process, the size of the corpus will be adjusted, e.g. because in some cases both communication partners submitted the same chat. The figures presented here are therefore provisional.

  • Number of chats: ~617
  • Number of messages (with permission to be used): ~750'000
  • Number of tokens: ~5.5 million
  • Number of emojis: ~350'000

The corpus is freely available for academic, non-commercial research.

More information about the corpus can be found in the publication "What’s up, Switzerland? A corpus-based research project in a multilingual country".