al-Thurayyā Gazetteer Ver. 02


This is our first usable demo of al-Thurayyā Gazetteer. Currently it includes over 2,000 toponyms and almost as many route sections georeferenced from Georgette Cornu’s Atlas du monde arabo-islamique à l’époque classique: IXe-Xe siècles (Leiden: Brill, 1983). The gazetteer is searchable (upper left corner), although English equivalents are not yet included; in other words, look for Dimashq/دمشق, not Damascus.

You can browse the Gazetteer by clicking on any toponym marker. The popup shows the toponym both in Arabic script and in transliteration. We use a slightly modified transliteration system that facilitates conversion between fully transliterated, transliterated, and Arabic forms of toponyms; it should be easy to follow. Because of the way the data were generated, there may be typos, so please let us know if something should be corrected. The popup also offers a selection of possible sources on the toponym in question. You can check Arabic Sources: currently, al-Samʿānī’s Kitāb al-ansāb and Yāqūt’s Muʿjam al-buldān. For now the Gazetteer checks only for exact matches, which means that in some cases there will be no entry at all, while in others there may be more than one, and they may refer to other places with the same name. Improving the precision of this lookup is on our to-do list. You can also check whether there is information on the toponym in Brill’s Encyclopaedia of Islam, Pleiades, and Wikipedia.
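To make the exact-match behavior concrete, here is a minimal Python sketch. It is not the Gazetteer's actual code: the function names, the record layout, and the entries themselves are invented placeholders used only to show why an exact lookup can return nothing for one spelling and several candidates for another.

```python
from collections import defaultdict

def build_index(entries):
    """Map each headword to every (source, entry_id) pair filed under it."""
    index = defaultdict(list)
    for headword, source, entry_id in entries:
        index[headword].append((source, entry_id))
    return index

def exact_lookup(index, toponym):
    """Return all entries whose headword equals the toponym exactly."""
    return list(index.get(toponym, []))

# Invented placeholder records, NOT real entries from either source.
ENTRIES = [
    ("Dimashq", "Muʿjam al-buldān", "placeholder-1"),
    ("Dimashq", "Kitāb al-ansāb", "placeholder-2"),
    ("Dimashq", "Muʿjam al-buldān", "placeholder-3"),  # a homonym, possibly another place
]
INDEX = build_index(ENTRIES)

several = exact_lookup(INDEX, "Dimashq")   # more than one candidate entry
nothing = exact_lookup(INDEX, "Damascus")  # English equivalent: no match
```

A lookup of this kind has no way of telling homonymous places apart, which is why narrowing the candidates needs additional information beyond the name itself.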

Credits & Acknowledgments

Many thanks to Adam Tavares (programmer @ Perseus Project, Tufts) and, particularly, Cameron Jackson (senior, double-majoring in Arabic and Computer Science, Tufts) for the technical development; to Vickie Sullivan (Chair, Classics Department), Gregory Crane and the entire Perseus team on both sides of the Atlantic for support and inspiration.

Islamic Urban Centers (661–1300 CE)

Back to al-Dhahabī’s Ta’rīkh al-islām. The present dynamic cartogram shows how the prominence of major urban centers was changing over time. The focus is again on “descriptive names” (nisba) and the “size” of each urban center on the cartogram reflects the number of individuals with “descriptive names” that refer to that urban center. A “prominent center” in the current dataset is a place with which at least 10 individuals from Ta’rīkh al-islām are associated1 (the overall number of individuals in the current dataset is slightly over 29,000 for the period of 661–1300 CE). Each frame features the names of the top 15 urban centers (the largest among them gradually change their hue from green to red).
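The counting behind such a cartogram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(individual_id, center, period)` and the function names are hypothetical, and how the data are actually sliced into frames is not described here. The threshold of 10 individuals and the top 15 labels per frame come from the description above.

```python
from collections import Counter

MIN_INDIVIDUALS = 10  # threshold for a "prominent center", as stated in the text
TOP_N = 15            # number of centers labeled in each frame, as stated in the text

def prominent_centers(records, min_individuals=MIN_INDIVIDUALS):
    """Count distinct individuals per center and keep centers at or above the threshold.

    records: iterable of (individual_id, center, period) tuples (assumed layout).
    """
    people = {}
    for individual_id, center, _period in records:
        people.setdefault(center, set()).add(individual_id)
    return {c: len(ids) for c, ids in people.items() if len(ids) >= min_individuals}

def top_centers_for_frame(records, period, top_n=TOP_N):
    """Rank centers within one period by the number of distinct individuals."""
    counts = Counter()
    seen = set()
    for individual_id, center, rec_period in records:
        key = (individual_id, center)
        if rec_period == period and key not in seen:
            seen.add(key)
            counts[center] += 1
    return counts.most_common(top_n)

# Synthetic records for illustration only: 12 people tied to "A", 3 to "B".
SAMPLE = [(i, "A", 1) for i in range(12)] + [(i, "B", 1) for i in range(100, 103)]
SAMPLE.append((0, "A", 1))  # a repeated record must not be counted twice
```

Counting distinct individuals rather than raw nisba occurrences matters because one person is often described with more than one nisba.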


  1. The nature of nisbas is not unproblematic, and anyone who has worked with biographical collections is likely to object that, for example, not every individual identified as “al-Madanī” was actually a Medinan; besides, there definitely are Medinans who are not identified as such with this specific toponymic nisba, not to mention that the “descriptive name” al-Madanī (and its variant al-Madīnī) may refer to urban centers other than Medina. (See, for example, al-Samʿānī (1998), 5:235–239.) While such objections are not invalid, at this stage of our knowledge and understanding of the overabundant biographical data from Arabic sources we simply do not know to what extent the presence of false positives (i.e., Madanīs who have nothing to do with Medina) and false negatives (i.e., Medinans who are not identified as Madanīs) actually affects the overall picture. Working with big data requires some clearly identified methodological assumptions regarding the types of data used in modeling. My computational analysis of data from the Ta’rīkh al-islām yields about 700 unique nisbas (with over 300 toponymic ones) that identify at least 10 different individuals, while the overall number of instances of these nisbas runs to over 70,000, considering that individuals are often described with more than one nisba. While 70,000 data points can hardly be called “big data” by any scientific standard, this dataset is too big to make exact identification of each and every nisba possible. Thus, under these circumstances, treating nisbas at face value is simply the most logical way to begin large-scale analysis of biographical data from Arabic sources; as our knowledge about the “behavior” of nisbas in biographical collections improves (and this can be achieved only through large-scale exploratory analysis), these methodological assumptions can and will be adjusted. For a detailed discussion of methodological assumptions, see Romanov (2013), 28–40.

al-Thurayya: an Islamic Supplement to Pleiades


With Teams Pelagios and Pleiades (in alphabetical order: Elton Barker, Tom Elliott, Leif Isaksen, Rainer Simon) visiting Tufts University within the framework of the Perseids Named Entity Hackathon (organized and led by Bridget Almas) at the Perseus Project, we had a chance to test how their systems work with Arabic texts.1 Pelagios offers a convenient workflow for “geographical” reading of texts, which consists of two main steps: first, one tags places that occur in the text; then one “geo-resolves” the tagged places into geographical locations that are displayed on an interactive map (for more details, see the Pelagios website). The first step is smooth and easy and works well for texts in any language, as long as the text is provided in Unicode. The second step depends on the availability of relevant gazetteers to which Pelagios is or can be connected. Thus, Pelagios does a great job when it comes to the “geo-resolution” of toponyms included in Pleiades, which now has almost 35,000 places from the Ancient world. Since there is no gazetteer for the classical Islamic world, “geo-resolution” of classical Arabic sources is problematic at the moment. A gazetteer for the Islamic world is badly needed in general.
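The two-step workflow can be illustrated with a small Python sketch. It does not use or reproduce the Pelagios tools: the function names are hypothetical, the automatic string matching merely stands in for tagging, and the one-entry gazetteer (with approximate coordinates for Damascus) is an assumption made for the example. It shows why the second step fails for any place that no connected gazetteer covers.

```python
def tag_places(text, known_names):
    """Step 1: return the known place names found in the text, in order of first occurrence."""
    hits = [(text.find(name), name) for name in known_names if name in text]
    return [name for _position, name in sorted(hits)]

def geo_resolve(tagged, gazetteer):
    """Step 2: split tagged names into resolved ones (with coordinates) and unresolved ones."""
    resolved = {name: gazetteer[name] for name in tagged if name in gazetteer}
    unresolved = [name for name in tagged if name not in gazetteer]
    return resolved, unresolved

# Illustrative input: a transliterated phrase and a deliberately incomplete gazetteer.
TEXT = "sāra min Dimashq ilā Baghdād"
KNOWN_NAMES = ["Baghdād", "Dimashq"]
GAZETTEER = {"Dimashq": (33.51, 36.29)}  # approximate latitude, longitude

tagged = tag_places(TEXT, KNOWN_NAMES)
resolved, unresolved = geo_resolve(tagged, GAZETTEER)
```

Tagging succeeds for both names because it only requires Unicode text, while resolution leaves Baghdād without a location until a gazetteer entry for it exists.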

As with the creation of any database, creating a gazetteer is an extremely time-consuming task. The key seems to be in generating a snowball effect: creating enough database entries to encourage a community of potentially interested individuals to start contributing to an already substantial databank by offering new data, references, corrections and additions. Pleiades has successfully used this model. Having incorporated content from such extensive editions as “Digital Atlas of Roman and Medieval Civilizations” (DARMC) and “Barrington Atlas of the Greek and Roman World” (BAGRW), Pleiades offered a significant foundation for potential users to contribute to. It seems only logical to follow in the footsteps of such a successful project as Pleiades, and to use their infrastructure for developing an Islamic gazetteer, which will feature in Pleiades as al-Thurayya: a Supplement for the Islamic World. (In this light, the name al-Thurayya, Arabic for Pleiades, seems quite appropriate; Tom Elliott, one of the managing editors of Pleiades, will be providing support for the integration of al-Thurayya into Pleiades.)


  1. For more details, see Marie-Claire Beaulieu’s post on the Perseids website.

Dissertation online: Computational Reading of Arabic Biographical Collections


My dissertation, “Computational Reading of Arabic Biographical Collections with Special Reference to Preaching in the Sunnī World (661-1300 CE),” is now available online through the digital library @ the University of Michigan. Even with very extensive Appendices, several thousand graphs and maps still did not make it into the dissertation. Hopefully, if I can find enough time, I will make an online appendix with visualizations of all the generated data, consisting mainly of chronological graphs of “descriptive names” and chronological maps that show how their geographies were changing over time (all based on “The History of Islam” of al-Dhahabī (d. 1348)).

Toward Abstract Models for Islamic History

NB: the paper was presented @ Digital Humanities and Islamic & Middle Eastern Studies, Brown University, Providence, RI (October 24-25, 2013); the video recording of the presentation is available under Day One (presentation at 2:48:00; Q&A at 3:51:30); the entire paper is also available as a PDF

All models are false, but some are useful
George E. P. Box

Why Models?

The advent of digital humanities has brought the notion of “big data” into the purview of humanistic inquiry. Humanists now have access to huge corpora that open research possibilities that were unthinkable a decade or two ago. However, working with corpora requires a rather different approach that is more characteristic of the sciences than of the humanities. Namely, one has to be transparent and explicit with regard to how data are extracted and how they are analyzed. Text-mining techniques rely on explicit algorithms because they help trace mistakes, correct them and, ultimately, improve results.1 Analytical procedures for studying extracted data rest on explicit algorithms for the same reason. As a way of constructing algorithms, modeling is part and parcel of developing complex computational procedures.

Working with big data also requires a different kind of modeling. Opting for the breadth of data, we have to give up the richness of details. Close reading, to which humanists are most accustomed, becomes impossible.2 Working with big data, one cannot maintain the nuanced complexity of details that became the hallmark of close reading as an approach. Instead of relying on complex textual evidence and reading between the lines, one has to work with relatively simple textual markers (essentially, words or simple phrases) that are treated as indicators of large trends. Yet, it is through such analysis that we can look into long-term and large-scale processes that will always remain beyond the scope of close reading. The literary historian Franco Moretti dubbed such an approach “distant reading,” explaining “distance” not as an obstacle, but as a specific form of knowledge.3 With an emphasis on fewer elements, which allows us to get a sharper sense of their overall interconnection, we can distinguish shapes, relations, structures. Most importantly, we can trace small changes over long periods of time.

Modeling is an important part of this approach. With models we simplify reality down to a limited number of factors4 through the analysis of which we hope to get insights into complex processes.5 This simplification is the reason why all models are false. Yet, models are a valuable and powerful tool. They pave the way to improving our understanding of the world. Unlike theories, models are experimental and driven by data. Good models offer invaluable glimpses into the subjects of our inquiry.6 With them we can explore, explain, project. With them we can get a big picture. That is why some models are useful.

What follows is an attempt to model Islamic élites based on the data from al-Dhahabī’s (d. 748/1348 CE) Taʾrīkh al-islām in order to explore major social transformations that the Muslim community underwent in the course of almost seven centuries of its history. The main types of data used in the model are dates, toponyms,7 linguistic formulae (or, wording patterns), synsets (lists of words that point to a specific concept or entity), and, most importantly, “descriptive names” (sing. nisba).

A detailed discussion of the main assumptions regarding these types of data, as well as of general issues relevant to the study of Arabic biographical collections, can be found elsewhere.8 Here it is most important to dwell on our assumptions regarding “descriptive names,” which some scholars regard as the most valuable kind of data that literary sources offer to the social historian of the Islamic world, and others as highly problematic. The major problem with nisbas is that it is not always clear what they stand for. For example, if an individual is described in a biographical collection as ṣaffār, does this actually mean that he was involved in “copper smithing”? When our subject is just one particular individual, it is not so difficult to establish the more or less exact meaning of this descriptive name by cross-examining biographies of this individual in other biographical collections. This is particularly easy now that dozens of electronic texts of biographical collections are just a few mouse clicks away. However, such an approach becomes problematic when this rather time-consuming procedure has to be repeated for dozens of individuals. The approach becomes particularly difficult if our goal is to study some biographical collection in its entirety, since Arabic biographical collections often contain thousands of biographies and most biographies offer multiple descriptive names for the same individual. After a certain threshold it becomes impossible to apply this approach at all. Our source, Taʾrīkh al-islām, is well beyond this threshold. In the analysis that follows, we will deal with a dataset of almost 70,000 nisbas (with about 700 unique ones) that represent about 26,000 individuals over the period of 41–700 AH/661–1301 CE. Working with such a dataset, one cannot possibly know the exact meaning of each and every nisba. At the same time, we do not have any solid foundation to argue that descriptive names are to be treated in a particular manner, or to be discarded altogether. Yet, such a dataset is too unique an opportunity for research to ignore simply because we are not entirely sure what all these data mean. This is where modeling offers an optimal solution: we need to start with assumptions and be upfront about them. In what follows, descriptive names will be treated at their face value, if only because this is the most logical starting point.9


  1. For more details, see Chapter 1 in Romanov (2013).
  2. While most humanists remain skeptical about working with big data, the number of studies showing that close reading alone is not enough keeps growing. They emphasize that case studies based on close reading do not allow for extrapolations, and that humanists are prone to putting too much effort into studying objects that are unique and for this reason are least likely to represent larger trends. The most vivid examples can be found in the field of literary history; see, e.g., Moretti (2007), Moretti (2013) and Jockers (2013).
  3. See Moretti (2007), p. 4.
  4. For example, Morris (2013) uses the size of the largest urban center as an indicator of the social development of the region to which it belongs. Bulliet (1979) uses onomastic data as an indicator of conversion.
  5. For valuable examples of modeling “big data,” see Moretti (2007) and Morris (2013); see also the geographical model of the Roman world developed by Walter Scheidel and Elijah Meeks. In the field of Islamic studies: Bulliet (1979).
  6. Bulliet’s model of conversion is a great example of this. The very fact that this study is still criticized more than three decades after its publication shows that a solid model cannot be discarded through a critique of where it fails, if otherwise it still remains plausible and coherent. For the most recent critique, see Wasserstein (2013).
  7. Both toponyms proper and toponymic nisbas linked with relevant toponyms. Toponymic data is crucial for our understanding of the social geography of the classical Islamic world. For my modeling of the geography of the Islamic world based on the data from Taʾrīkh al-islām, see Romanov (2013), p. 35-37, 41-42, 87-113.
  8. Romanov (2013), p. 28-51.
  9. For a detailed discussion, see Romanov (2013), p. 43-46.

Prospects of Computational Reading

My dissertation, Computational Reading of Arabic Biographical Collections with Special Reference to Preaching in the Sunni World (661-1300 CE), turned out to be more about the method of computational reading than about anything else, but the results are most exciting in terms of the prospects that this method opens. After less than two years of development, this method allows one to get almost instantaneous insights into a great number of historical issues. Although technologically the approach has been developed practically from scratch, in spirit it follows in the footsteps of the quantitative method that has been used by scholars of Islam since the 1970s. In its current state the method is best suited for analyzing biographical data from social, chronological and geographical perspectives, yet the complexity of analytical tasks can be increased ad infinitum. Computational reading is flexible, scalable and fast beyond comparison with conventional methods. Dwelling on these properties should offer a glimpse into the prospects of its further implementation.


Digital Humanities in Middle East Studies (MESA, New Orleans, October 11)

We are just about two weeks away from the 47th Annual Meeting of the Middle East Studies Association (October 10-13, 2013) that will take place in New Orleans, Louisiana. We—Will Hanley, Børre Ludvigsen, and Maxim Romanov—are glad to present two panels and a roundtable on Digital Humanities in Middle Eastern Studies that will take place on Friday, October 11.


50 Seconds of Islamic History, Versions 2.x

Version 2.2: Regions and Urban Centers (scaled down)

Versions 2.x are based on a data set of about 330 toponyms: all place-names that occur five or more times in al-Dhahabī’s Taʾrīkh al-islām. Expanding the list of toponyms did not significantly affect the number of biographies, which increased by slightly over a thousand: ~13,970 biographies in Versions 2.x versus ~12,850 biographies in Versions 1.x. On the other hand, this now amounts to almost 50% of all the biographies from the considered section of this biographical collection (vols. 4-52: ~29,000 biographical records).

Cities, their quarters and suburbs are now merged into metropolitan areas (e.g., Baghdad, Nishapur, Damascus, Cairo, Cordoba, etc.). Provinces (not shown separately) and urban centers (black color) are grouped into regions (firebrick color), which makes it easier to see changes on the regional level. In general, regions correspond to major provinces of the Islamic world, but in some cases they include more than one. At the moment grouping is done purely mathematically, by comparing the distances between urban centers and possible central points of regions. This has resulted in some minor distortions, which will be fixed in the following versions. (For example, Basra happened to be grouped with the province of al-Ahwāz/Khūzistān, instead of al-ʿIrāq, to which it belongs.)
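A purely distance-based grouping of this kind can be sketched as follows. This is a minimal illustration, not the code used for the visualization: the great-circle (haversine) distance, the function names, and the choice of Baghdad and al-Ahwāz as the central points of the two regions are all assumptions, and the coordinates are approximate. Under these assumptions the sketch reproduces the Basra distortion mentioned above.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def assign_region(city, region_centers):
    """Return the region whose central point lies closest to the city."""
    return min(region_centers, key=lambda region: haversine_km(city, region_centers[region]))

# Assumed central points (approximate coordinates of Baghdad and al-Ahwāz).
REGION_CENTERS = {
    "al-ʿIrāq": (33.3, 44.4),
    "Khūzistān": (31.3, 48.7),
}
BASRA = (30.5, 47.8)  # approximate coordinates

basra_region = assign_region(BASRA, REGION_CENTERS)
```

Because Basra lies much closer to al-Ahwāz than to Baghdad, nearest-center assignment places it in Khūzistān; correcting such cases requires information beyond distance, for example an explicit list of exceptions.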
