Bijankhan Corpus

The Bijankhan corpus (Persian: پیکرهٔ بی‌جن‌خان‎) is a tagged corpus suitable for natural language processing (NLP) research on the Persian language. It was gathered from daily news and common texts. All documents in the collection are categorized by subject, such as political or cultural, into about 4,300 subject categories. The corpus contains about 2.6 million manually tagged words, annotated with a tag set of 550 Persian part-of-speech tags.
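
As an illustration of how a part-of-speech tagged corpus of this kind might be processed, the following Python sketch counts how often each tag occurs. It assumes a hypothetical layout of one token per line, with the word followed by whitespace and then its tag. The actual distribution format of the corpus may differ, and the sample words, tag names (N, ADJ, V), and function names are invented for the example rather than taken from the corpus or its 550-tag set.

```python
from collections import Counter


def parse_tagged_lines(lines):
    """Yield (word, tag) pairs from lines in an ASSUMED 'word<whitespace>tag' layout."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. sentence separators
        parts = line.rsplit(None, 1)  # split once, on the last run of whitespace
        if len(parts) != 2:
            continue  # skip lines that do not carry both a word and a tag
        word, tag = parts
        yield word, tag


def tag_frequencies(pairs):
    """Count how many tokens carry each tag."""
    return Counter(tag for _, tag in pairs)


# Hypothetical sample; the words and tags are illustrative, not corpus data.
SAMPLE = [
    "کتاب N",
    "خوب ADJ",
    "است V",
    "",
    "خبر N",
]

if __name__ == "__main__":
    for tag, count in tag_frequencies(parse_tagged_lines(SAMPLE)).most_common():
        print(tag, count)
```

For a real file, the same functions could be fed an open file handle (opened with encoding="utf-8") instead of the in-memory sample list, once the actual column layout has been confirmed.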
