al-Thurayya: an Islamic Supplement to Pleiades


With Teams Pelagios and Pleiades (in alphabetical order: Elton Barker, Tom Elliott, Leif Isaksen, Rainer Simon) visiting Tufts University within the framework of the Perseids Named Entity Hackathon (organized and led by Bridget Almas) at the Perseus Project, we had a chance to test how their systems work with Arabic texts.1 Pelagios offers a convenient workflow for the “geographical” reading of texts, which consists of two main steps: first, one tags the places that occur in a text; then one “geo-resolves” the tagged places into geographical locations, which are displayed on an interactive map (for more details, see the Pelagios website). The first step is smooth and easy, and works well for texts in any language as long as they are encoded in Unicode. The second step depends on the availability of relevant gazetteers to which Pelagios is or can be connected. Thus, Pelagios does a great job when it comes to the “geo-resolution” of toponyms included in Pleiades, which now has almost 35,000 places from the ancient world. Since there is no gazetteer for the classical Islamic world, “geo-resolution” of classical Arabic sources is problematic at the moment. A gazetteer for the Islamic world is badly needed in general.
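To make the two steps concrete, here is a toy sketch in Python. It is not Pelagios code and does not use its API: step one (tagging) is assumed to be done already, and step two simply looks each tagged string up in a gazetteer. The mini-gazetteer, its IDs, and its approximate coordinates are hypothetical stand-ins for a real gazetteer such as Pleiades. An Arabic toponym with no entry stays unresolved, which is exactly the gap described above.

```python
from typing import Dict, List, Optional, Tuple

LatLon = Tuple[float, float]
GazetteerRecord = Tuple[str, LatLon]  # (hypothetical ID, approximate lat/lon)

# Hypothetical mini-gazetteer: IDs are invented, coordinates approximate.
GAZETTEER: Dict[str, GazetteerRecord] = {
    "Alexandria": ("demo:alexandria", (31.20, 29.92)),
    "Antioch": ("demo:antioch", (36.20, 36.16)),
}

def geo_resolve(tagged_places: List[str],
                gazetteer: Dict[str, GazetteerRecord]
                ) -> Dict[str, Optional[GazetteerRecord]]:
    """Step 2: map every tagged toponym to a gazetteer record, or None."""
    return {place: gazetteer.get(place) for place in tagged_places}

# Step 1 (tagging) is assumed done: these strings were marked as places.
tagged = ["Alexandria", "Antioch", "بغداد"]  # the last one is Baghdad in Arabic
resolved = geo_resolve(tagged, GAZETTEER)
unresolved = [place for place, record in resolved.items() if record is None]
print(unresolved)  # ['بغداد']
```

Adding a gazetteer entry for the Arabic form is the only way to resolve it here, which is why the quality of geo-resolution depends entirely on gazetteer coverage.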

As with the creation of any database, creating a gazetteer is an extremely time-consuming task. The key seems to lie in generating a snowball effect: creating enough database entries to encourage a community of potentially interested individuals to start contributing new data, references, corrections and additions to an already substantial databank. Pleiades has successfully used this model. Having incorporated content from such extensive editions as the “Digital Atlas of Roman and Medieval Civilizations” (DARMC) and the “Barrington Atlas of the Greek and Roman World” (BAGRW), Pleiades offered a significant foundation for potential users to contribute to. It seems only logical to follow in the footsteps of such a successful project and to use its infrastructure for developing an Islamic gazetteer, which will feature in Pleiades as al-Thurayya: a Supplement for the Islamic World. (In this light, the name al-Thurayya, Arabic for Pleiades, seems quite appropriate; Tom Elliott, one of the managing editors of Pleiades, will be providing support for the integration of al-Thurayya into Pleiades.)

  1. For more details, see Marie-Claire Beaulieu’s post on the Perseids website.

Filed under Maps

Dissertation online: Computational Reading of Arabic Biographical Collections


My dissertation, “Computational Reading of Arabic Biographical Collections with Special Reference to Preaching in the Sunnī World (661-1300 CE)”, is now available online through the digital library of the University of Michigan. Even with very extensive appendices, several thousand graphs and maps did not make it into the dissertation. If I can find enough time, I hope to create an online appendix with visualizations of all the generated data, which consists mainly of chronological graphs of “descriptive names” and chronological maps showing how their geographies changed over time (all based on “The History of Islam” of al-Dhahabī (d. 1348)).

Filed under Dissertation Appendices, Graphs, Maps, Research

Toward Abstract Models for Islamic History

NB: the paper was presented at Digital Humanities and Islamic & Middle Eastern Studies, Brown University, Providence, RI (October 24-25, 2013); the video recording of the presentation is available online (Day One; the presentation starts at 2:48:00, Q&A at 3:51:30); the entire paper is also available as a PDF; comments are welcome @

All models are false, but some are useful
George E. P. Box

Why Models?

The advent of digital humanities has brought the notion of “big data” into the purview of humanistic inquiry. Humanists now have access to huge corpora that open research possibilities that were unthinkable a decade or two ago. However, working with corpora requires a rather different approach, one more characteristic of the sciences than of the humanities. Namely, one has to be transparent and explicit about how data are extracted and how they are analyzed. Text-mining techniques rely on explicit algorithms because such algorithms make it possible to trace mistakes, correct them and, ultimately, improve results.1 Analytical procedures for studying extracted data rest on explicit algorithms for the same reason. As a way of constructing algorithms, modeling is part and parcel of developing complex computational procedures.

Working with big data also requires a different kind of modeling. Opting for breadth of data, we have to give up richness of detail. Close reading, to which humanists are most accustomed, becomes impossible.2 Working with big data, one cannot maintain the nuanced complexity of detail that became the hallmark of close reading as an approach. Instead of relying on complex textual evidence and reading between the lines, one has to work with relatively simple textual markers (essentially, words or simple phrases) that are treated as indicators of large trends. Yet it is through such analysis that we can look into long-term and large-scale processes that will always remain beyond the scope of close reading. The literary historian Franco Moretti dubbed such an approach “distant reading,” explaining “distance” not as an obstacle but as a specific form of knowledge.3 Emphasis on fewer elements allows us to get a sharper sense of their overall interconnection: we can distinguish shapes, relations, structures. Most importantly, we can trace small changes over long periods of time.

Modeling is an important part of this approach. With models we simplify reality down to a limited number of factors4 through the analysis of which we hope to get insights into complex processes.5 This simplification is the reason why all models are false. Yet, models are a valuable and powerful tool. They pave the way to improving our understanding of the world. Unlike theories, models are experimental and driven by data. Good models offer invaluable glimpses into the subjects of our inquiry.6 With them we can explore, explain, project. With them we can get a big picture. That is why some models are useful.

What follows is an attempt to model Islamic élites based on data from al-Dhahabī’s (d. 748/1348 CE) Taʾrīkh al-islām in order to explore the major social transformations that the Muslim community underwent in the course of almost seven centuries of its history. The main types of data used in the model are dates, toponyms,7 linguistic formulae (or wording patterns), synsets (lists of words that point to a specific concept or entity), and, most importantly, “descriptive names” (sing. nisba).

A detailed discussion of the main assumptions regarding these types of data, as well as of general issues relevant to the study of Arabic biographical collections, can be found elsewhere.8 Here it is most important to dwell on our assumptions regarding “descriptive names,” which some scholars regard as the most valuable kind of data that literary sources offer to the social historian of the Islamic world, and others as inherently problematic. The major problem with nisbas is that it is not always clear what they stand for. For example, if an individual is described in a biographical collection as ṣaffār, does this actually mean that he was involved in “copper smithing”? When our subject is just one particular individual, it is not so difficult to establish the more or less exact meaning of this descriptive name by cross-examining the biographies of this individual in other biographical collections. This is particularly easy now, when dozens of electronic texts of biographical collections are just a few mouse clicks away. However, this rather time-consuming procedure becomes problematic when it has to be repeated for dozens of individuals. It becomes particularly difficult if our goal is to study a biographical collection in its entirety, since Arabic biographical collections often contain thousands of biographies and most biographies offer multiple descriptive names for the same individual. Beyond a certain threshold, the approach cannot be applied at all. Our source, Taʾrīkh al-islām, is well beyond this threshold. In the analysis that follows, we deal with a dataset of almost 70,000 nisbas (about 700 unique ones) that represent about 26,000 individuals over the period 41-700/661-1301 CE. Working with such a dataset, one cannot possibly know the exact meaning of each and every nisba.
At the same time, we do not have any solid foundation to argue that descriptive names should be treated in a particular manner, or discarded altogether. Yet such a dataset is too rare an opportunity for research to ignore simply because we are not entirely sure what all these data mean. This is where modeling offers an optimal solution: we need to start with assumptions and be upfront about them. In what follows, descriptive names will be taken at face value, if only because this is the most logical starting point.9
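As a minimal sketch of what taking nisbas at face value can mean in practice, the Python function below counts a descriptive name whenever it appears and reports, for each hijri century, the share of individuals who carry it. The record format (death year AH plus a list of nisbas), the century binning, and the sample data are hypothetical illustrations, not the author’s actual pipeline, which is described in Romanov (2013).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record: (death year AH, nisbas attached to one individual).
Record = Tuple[int, List[str]]

def nisba_share_by_century(records: List[Record], nisba: str) -> Dict[int, float]:
    """Share of individuals in each hijri century who carry `nisba`.

    Face-value assumption: the nisba counts whenever it appears, without
    checking what it actually meant for that particular person.
    """
    totals: Counter = Counter()
    hits: Counter = Counter()
    for death_year, nisbas in records:
        century = (death_year - 1) // 100 + 1  # 1-100 AH -> 1st century, etc.
        totals[century] += 1
        if nisba in nisbas:
            hits[century] += 1
    return {c: hits[c] / totals[c] for c in sorted(totals)}

# Hypothetical sample records.
sample: List[Record] = [
    (150, ["al-Baghdādī", "al-Ṣaffār"]),
    (180, ["al-Kūfī"]),
    (320, ["al-Ṣaffār", "al-Naysābūrī"]),
    (350, ["al-Ṣaffār"]),
]
print(nisba_share_by_century(sample, "al-Ṣaffār"))  # {2: 0.5, 4: 1.0}
```

Normalizing by the number of individuals per century, rather than using raw counts, keeps centuries with many biographies from dominating the trend.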

  1. For more details, see Chapter 1 in Romanov (2013).
  2. While most humanists remain skeptical about working with big data, the number of studies showing that close reading alone is not enough keeps growing. They emphasize that case studies based on close reading do not allow for extrapolation, and that humanists are prone to putting too much effort into studying objects that are unique and for this reason least likely to represent larger trends. The most vivid examples can be found in the field of literary history; see, e.g., Moretti (2007), Moretti (2013) and Jockers (2013).
  3. See Moretti (2007), p. 4.
  4. For example, Morris (2013) uses the size of the largest urban center as an indicator of the social development of the region to which it belongs. Bulliet (1979) uses onomastic data as an indicator of conversion.
  5. For valuable examples of modeling “big data,” see Moretti (2007) and Morris (2013); see also the geographical model of the Roman world developed by Walter Scheidel and Elijah Meeks. In the field of Islamic studies, see Bulliet (1979).
  6. Bulliet’s model of conversion is a great example of this. The very fact that this study is still criticized more than three decades after its publication shows that a solid model cannot be discarded through a critique of where it fails, as long as it otherwise remains plausible and coherent. For the most recent critique, see Wasserstein (2013).
  7. Both toponyms proper and toponymic nisbas linked with relevant toponyms. Toponymic data are crucial for our understanding of the social geography of the classical Islamic world. For my modeling of the geography of the Islamic world based on the data from Taʾrīkh al-islām, see Romanov (2013), pp. 35-37, 41-42, 87-113.
  8. Romanov (2013), pp. 28-51.
  9. For a detailed discussion, see Romanov (2013), pp. 43-46.

Filed under Working Papers

Webcast of: Digital Humanities + Islamic & Middle East Studies @ Brown University

The webcast of the entire conference is available online on Brown University’s servers. Below is the program of the conference with timestamps for each presentation and discussion session.

Filed under Events

Digital Humanities + Islamic & Middle East Studies @ Brown University (October 24-25)

On October 24-25, Brown University will be hosting an international conference on Digital Humanities + Islamic & Middle East Studies. The complete program can be found on the conference website, where you can also view a live webcast of the presentations.

Filed under Events

Prospects of Computational Reading

My dissertation, Computational Reading of Arabic Biographical Collections with Special Reference to Preaching in the Sunnī World (661-1300 CE), turned out to be more about the method of computational reading than about anything else, but the results are most exciting in terms of the prospects that this method opens. After less than two years of development, the method already allows almost instantaneous insights into a great number of historical issues. Although technologically the approach has been developed practically from scratch, in spirit it follows in the footsteps of the quantitative method that scholars of Islam have used since the 1970s. In its current state the method is best suited for analyzing biographical data from social, chronological and geographical perspectives, yet the complexity of analytical tasks can be increased ad infinitum. Computational reading is flexible, scalable and fast beyond comparison with conventional methods. Dwelling on these properties should offer a glimpse into the prospects for its further implementation.

Filed under Dissertation Appendices, Graphs

Digital Humanities in Middle East Studies (MESA, New Orleans, October 11)

We are just about two weeks away from the 47th Annual Meeting of the Middle East Studies Association (October 10-13, 2013) that will take place in New Orleans, Louisiana. We—Will Hanley, Børre Ludvigsen, and Maxim Romanov—are glad to present two panels and a roundtable on Digital Humanities in Middle Eastern Studies that will take place on Friday, October 11.

Filed under Events

50 Seconds of Islamic History, Versions 2.x

Version 2.2: Regions and Urban Centers (scaled down)

Versions 2.x are based on a dataset of about 330 toponyms: all place names that occur five or more times in al-Dhahabī’s Taʾrīkh al-islām. Expanding the list of toponyms did not significantly affect the number of biographies, which increased by slightly over a thousand: ~13,970 biographies in Versions 2.x versus ~12,850 biographies in Versions 1.x. Nonetheless, this now amounts to almost 50% of all the biographies in the section of the collection under consideration (vols. 4-52: ~29,000 biographical records).
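The selection rule and the coverage figures can be checked with a few lines of Python. The threshold of five occurrences comes from the text; the function name and the sample mention list are hypothetical, and the counts are the approximate figures quoted above.

```python
from collections import Counter
from typing import List

MIN_OCCURRENCES = 5  # "all place names that occur five or more times"

def frequent_toponyms(mentions: List[str], threshold: int = MIN_OCCURRENCES) -> List[str]:
    """Return, sorted, the toponyms mentioned at least `threshold` times."""
    counts = Counter(mentions)
    return sorted(name for name, n in counts.items() if n >= threshold)

# Hypothetical mentions extracted from biographies.
mentions = ["Baghdad"] * 6 + ["Wasit"] * 4 + ["Damascus"] * 5
print(frequent_toponyms(mentions))  # ['Baghdad', 'Damascus']

# Approximate figures quoted in the text.
v1_biographies, v2_biographies, total_records = 12_850, 13_970, 29_000
print(v2_biographies - v1_biographies)                 # 1120
print(round(100 * v2_biographies / total_records, 1))  # 48.2
```

In other words, the “almost 50%” works out to roughly 48% of the ~29,000 records.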

Cities, their quarters and suburbs are now merged into metropolitan areas (e.g., Baghdad, Nishapur, Damascus, Cairo, Cordoba). Provinces (not shown separately) and urban centers (black) are grouped into regions (firebrick), which makes it easier to see changes at the regional level. In general, regions correspond to major provinces of the Islamic world, but in some cases they include more than one. At the moment, grouping is done purely mathematically by comparing distances between urban centers and possible central points of regions. This has produced some minor distortions, which will be fixed in the following versions. (For example, Basra happened to be grouped with the province of al-Ahwāz/Khūzistān instead of al-ʿIrāq, to which it belongs.)
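A minimal sketch of this kind of distance-based grouping, assuming great-circle (haversine) distances and a nearest-central-point rule. The text does not specify the distance measure or how central points were chosen, so both are assumptions here; the central points below are hypothetical (placed near Baghdad and Ahvaz) and all coordinates are approximate modern ones. With these points, Basra falls to al-Ahwāz/Khūzistān, which illustrates how a purely geometric rule can produce the kind of distortion mentioned above.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]

def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def assign_regions(cities: Dict[str, LatLon],
                   centers: Dict[str, LatLon]) -> Dict[str, str]:
    """Assign each city to the region whose central point is nearest."""
    return {
        city: min(centers, key=lambda region: haversine_km(pos, centers[region]))
        for city, pos in cities.items()
    }

# Hypothetical central points of two regions (approximate coordinates).
IRAQ = "al-ʿIrāq"
KHUZISTAN = "al-Ahwāz/Khūzistān"
centers = {IRAQ: (33.31, 44.36),       # near Baghdad
           KHUZISTAN: (31.32, 48.67)}  # near Ahvaz
cities = {"Basra": (30.51, 47.81), "Kufa": (32.03, 44.40)}
print(assign_regions(cities, centers))
# {'Basra': 'al-Ahwāz/Khūzistān', 'Kufa': 'al-ʿIrāq'}
```

Basra is roughly 120 km from the Ahvaz point but about 450 km from the Baghdad point, so a nearest-point rule cannot recover its historical attachment to al-ʿIrāq without extra information, such as manually assigned provinces.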

Filed under Dissertation Appendices, Graphs, Maps

Review/Manual: The Encyclopaedia of Arabic Poetry by Cultural Foundation, Abu Dhabi (UAE)

Prepared in cooperation with Professor Michael Bonner

In 1997, the Cultural Foundation in Abu Dhabi started developing an extremely useful multimedia Encyclopaedia of [Arabic] Poetry (al-Mawsūʿa al-Shiʿriyya). The first edition (published in 1998) was relatively small and included only about 180,000 bayts by 88 poets, together with an electronic version of Ibn Manẓūr’s Lisān al-ʿArab. The second edition (published in 2001) was much more impressive, containing more than 1,800,000 bayts by more than 1,000 poets as well as 46 additional reference works.

The third edition, which will be reviewed in detail in what follows, significantly surpasses earlier versions: it contains 2,439,589 bayts by 2,300 poets and is supplemented with 265 books on different aspects of Arabic language and literature (al-lugha wa-l-adab) and 10 major Arabic dictionaries. Chronologically, it includes every piece of Arabic poetry from the pre-Islamic era to the present (kullu mā qīla min al-shiʿr al-ʿarabī mundhu mā qabla l-islām wa-ḥattā l-ʿaṣr al-ḥadīth): everything by poets who died before 1953 and, after that date, only the dīwāns of the most important poets (ahammu l-shuʿarāʾ). The entire program fits on a single CD.

Filed under Manuals, Reviews, Teaching Projects

Transliterating Arabic in MS Word

The auto-correct function in MS Word can make life easier when transliterating Arabic. For quite a while I have been using a template file in which the main symbols are assigned to simple combinations, and some of my fellow graduate students have found this method very convenient. It is quite simple: create a new file from the attached template (see below), then type commas before and after the letter to be transliterated (for capital letters, type the letter twice). Some major terms are also included in this auto-correct template, e.g. Quran > Qurʾān, hadith > ḥadīth, madhhab > madhhab, etc. The list can always be expanded if necessary.

ʿ = ,`,     ʾ = ,/,
Ā = ,aa,    ā = ,a,
Ī = ,ii,    ī = ,i,
Ū = ,uu,    ū = ,u,
Ḍ = ,dd,    ḍ = ,d,
Ḥ = ,hh,    ḥ = ,h,
Ṣ = ,ss,    ṣ = ,s,
Ṭ = ,tt,    ṭ = ,t,
Ẓ = ,zz,    ẓ = ,z,
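For text already typed with these codes outside Word (for example, in a plain-text file), the same table can be applied as a batch substitution. The Python sketch below approximates the template; it is not the template itself. Word’s auto-correct fires as you type, whereas this sketch replaces every code in a finished string. The dictionary mirrors the table above; the function name is hypothetical.

```python
import re

# Codes from the table above: ",x," for lowercase, ",xx," for capitals.
CODES = {
    ",`,": "ʿ", ",/,": "ʾ",
    ",aa,": "Ā", ",a,": "ā", ",ii,": "Ī", ",i,": "ī", ",uu,": "Ū", ",u,": "ū",
    ",dd,": "Ḍ", ",d,": "ḍ", ",hh,": "Ḥ", ",h,": "ḥ", ",ss,": "Ṣ", ",s,": "ṣ",
    ",tt,": "Ṭ", ",t,": "ṭ", ",zz,": "Ẓ", ",z,": "ẓ",
}
# Try longer codes first so the alternation is unambiguous.
_PATTERN = re.compile(
    "|".join(re.escape(code) for code in sorted(CODES, key=len, reverse=True))
)

def transliterate(text: str) -> str:
    """Replace every comma-delimited code with its transliteration symbol."""
    return _PATTERN.sub(lambda match: CODES[match.group(0)], text)

print(transliterate("Qur,/,,a,n"))  # Qurʾān
print(transliterate(",h,ad,i,th"))  # ḥadīth
```

Because each code carries its own pair of commas, two adjacent letters are typed with a double comma between them, as in “Qur,/,,a,n”.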

Download: MS Word Transliteration Template (MS Windows)

Filed under Workflow