SPRAP

The Spanish and Portuguese Research Apprenticeship Program (SPRAP) allows undergraduate students to participate in research under the mentorship of senate faculty, non-senate faculty, and graduate students in the Department of Spanish and Portuguese. As with the campus-wide Undergraduate Research Apprenticeship Program (URAP), participation in SPRAP is beneficial to both parties. Senate faculty, non-senate faculty, and graduate students will benefit from the work that undergraduate students contribute to their research projects and tasks, and from the opportunity to train and mentor undergraduate students. Participating undergraduate students will benefit from acquiring professional experience, research skills, and knowledge. Participating students will also be enrolled in 1-4 units of Spanish 197 on a P/NP grading basis.

Current Projects

The Archive of Latine Feelings

Sponsor: Dr. Raúl Coronado

How can we write a history of Latine feelings? How can private writing give us access to how Mexicans in the Southwest thought about their feelings, their interiority, their sense of self? Knowing more about this can give us a better sense of two things: how have Latine communities expressed their emotions? Are there patterns in how they’ve expressed emotions? Or are there beginnings and endings—a time when they expressed emotions in a certain way but that no longer exists for us today? A more specific question is: how have Latines expressed beauty, admiration, serenity, belonging, peace, equanimity? When have they felt these positive emotions?


One way we can begin to answer these questions is by turning to the history of nineteenth-century Spanish-Mexican private writing. By looking at their private writing—correspondence between loved ones, notebooks, diaries, etc.—we can get a glimpse into the history of Latine feelings.
Professor Coronado has spent many years collecting archival materials: private Latine writing from the 1800s held by various archival repositories. He’s scanned these documents and turned them into PDFs. Now, we are ready to turn these PDFs into OCRable scans. This means that we will convert handwritten letters, most of which are in cursive writing from the 1800s, into documents that are searchable. Once they are searchable, we can use AI to do large-scale searches for words that will allow us to answer our questions. This is a game changer.


We will be working with the Digital Humanities Lab. Previous SPRAP apprentices have prepared the PDFs to be OCR’ed. The D-Lab then used AI to provide a transcription of the handwriting. We are ready for the next big step! We are now going to compare the transcriptions to the originals.
We will look at the transcriptions created by AI and check for accuracy. Eventually, we will create a database that tracks the language of feelings.
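
One common way to put a number on transcription accuracy is the character error rate: the number of single-character edits needed to turn the AI transcription into a human-checked reading, divided by the length of the checked reading. The sketch below is only an illustration of that idea; the function names and the sample strings are hypothetical and are not taken from the project's materials or workflow.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the reference prefix seen
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edits per reference character (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: a checked reading and an AI transcription of it.
    checked = "mi querida hermana"
    automatic = "mi querida hermano"
    print(f"CER: {character_error_rate(checked, automatic):.3f}")
```

A low rate on a sample of pages would suggest the transcriptions are reliable enough for word searches, while a high rate would point to pages that need closer manual review.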

PhiloBiblon: A Database on Medieval Iberian Literature

Sponsor: Dr. Óscar Perea-Rodríguez

PhiloBiblon is a database for the study of the Romance literatures and cultures of medieval and early modern Iberia (Spanish, Catalan, Portuguese, and Galician). It currently contains over 420,000 records thanks to forty years of work by dedicated volunteers in the U.S., Spain, and Portugal. It has become an indispensable resource for Hispanists because of its comprehensive coverage of primary sources, both manuscript and printed, the texts they contain, the individuals and institutions involved with the production and transmission of those sources and texts, the libraries holding them, relevant secondary references, and authority files for persons, places, and institutions.


With the migration of all data from a siloed database system to Linked Open Data via FactGrid, we are excited to offer UC Berkeley students the possibility of joining us for the launch of the new database by:

  1. checking the accuracy of records on spreadsheets;
  2. entering new data into our database;
  3. publicizing the progress of the project through both social networks and our own YouTube channel.

All tasks will be performed completely remotely, so you will be able to manage your schedule at your convenience, provided that you fulfill the agreed number of units. If you have any previous experience in social networks as a community manager or content creator, please mention it in your application. Similarly, if you have any previous knowledge of data science, programming, or HTML, please mention it in your application. We are all looking forward to welcoming you to this pioneering Digital Humanities venture in medieval Iberian studies, a project based at UC Berkeley for the last four decades!

Venezuelan Phonetics

Sponsor: Nikolai Schwarz

Venezuelan Spanish has been severely underrepresented in the Hispanic linguistics literature. Though various impressionistic studies have been published, these often only scratch the surface of its linguistic processes. Pertinent to this project, the /f/ phoneme in Venezuelan Spanish is often described as surfacing as a bilabial fricative rather than a labiodental one. However, this project examines a different variant of /f/ in which it becomes a stop. To examine this, I will collect both video and audio data to assess the timing and acoustic characteristics of this variant. This project aims to contribute to a more advanced description of Venezuelan Spanish, as well as new phonetic variables. Assistants on this project will aid in the extraction and marking of audio and visual data. Additionally, students will gain greater knowledge of phonetic research and the tools that can be used to undertake it.

Ladino / Judeo-Spanish Phonetics Infrastructure Development

Sponsor: Julian Vargo

Ladino is a non-standardized and diasporic Ibero-Romance language spoken across North America and the Mediterranean, mostly used by the Sephardic Jewish community. Over the past century, the Ladino-speaking community has been subject to several sociopolitical issues across the world, such as Turkey’s ‘Tevhid-i Tedrisat Kanunu,’ Israel’s ‘Melting Pot Policy,’ and the Holocaust (Bey 1924, Gorny 2001). These events have led to further fragmentation of the speaker community and, subsequently, a lowered vitality of the language, with very low levels of intergenerational transmission. According to Ethnologue (2023), the language is spoken in Jerusalem, Sarajevo, Thessaloniki, Istanbul, Izmir, and Morocco, and online Ladino speakers (source: LadinoKomunita) have reported being from the United States, Canada, Macedonia, and Bulgaria, amongst other countries. A number of revitalization efforts have taken place in recent decades to increase language use, such as the development of the newspaper El Amaneser (est. 2005), a popular online forum LadinoKomunita (est. 1999), and Ladino language classes. After the COVID-19 pandemic, this highly diasporic language saw increased usage online, allowing speakers from different regions to more easily communicate with one another (Peck 2023). Despite Ladino’s small yet quickly growing renaissance, only a small number of studies (Sala 1971, Hualde & Şaul 2011, Bradley & Delforge 2006) have actually investigated the phonetic minutiae of the language (i.e., detailing the sounds of the language and what makes a Ladino ‘accent’). Thus, this project aims to build the infrastructure required to perform phonetic analysis on the language, setting up scholars interested in Judeo-Spanish for success in their future studies. Students working on this project will be expected to:

  1. Gather a list of all high-quality audio from various online sources (e.g., YouTube videos and archived recordings) and transcribe every sound from these recordings in Praat (Boersma & Weenink 2024).
  2. Develop the first online Ladino pronunciation dictionary, similar in format to the Carnegie Mellon University Pronouncing Dictionary, which will contain several thousand Ladino words and their respective phonemic transcriptions. Words will be sourced from existing Ladino dictionaries, online Ladino texts, and transcripts of Ladino interviews provided by the Sephardic Center of Istanbul.
  3. Develop a forced alignment model using the Montreal Forced Aligner software to automatically create sound-by-sound transcriptions of Ladino speech. This is a crucial step in phonetics research, allowing scholars to analyze larger datasets and generate more generalizable findings.
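
To give a sense of what a pronunciation dictionary in this style looks like, the sketch below writes entries as plain text with one word per line, followed by its phones separated by spaces. Everything here is an assumption for illustration: the function names, the tab separator, the two sample words, and the phone symbols are hypothetical, and the real dictionary would follow whatever phone inventory and file format the project and its aligner settle on.

```python
def format_entry(word: str, phones: list[str]) -> str:
    """Return one dictionary line: the word, a tab, then spaced phones."""
    if not phones:
        raise ValueError(f"no phones given for {word!r}")
    return f"{word.lower()}\t{' '.join(phones)}"


def build_dictionary(entries: dict[str, list[str]]) -> str:
    """Return all entries as text, sorted alphabetically by word."""
    lines = [format_entry(word, phones) for word, phones in sorted(entries.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample entries with a hypothetical phone set.
    sample = {
        "kaza": ["k", "a", "z", "a"],
        "ijo": ["i", "ʒ", "o"],
    }
    print(build_dictionary(sample))
```

Keeping the dictionary as sorted plain text makes it easy to review by hand and to track changes as new words are added.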

Acoustic Effects of Language Coactivation in Translation

Sponsor: Verónica Grajeda

Have you ever wondered about bilingual speech production? What about how translating or interpreting from one language to another affects target pronunciation? This study takes a dual perspective on bilingualism by comparing two acoustic productions elicited during a reading task versus an interpretation task. Bilingualism research seeks to explain the intricate interaction of two or more languages in the bi-/multilingual mind. Different fields have explored bilinguals' language capacities and production to different ends. In translation studies, for example, research has observed the cognitive process from semantic activation through to lexical selection during translation. Psycholinguistic research on bilinguals has questioned the extent to which multiple languages can be activated at the same time and has offered compelling evidence for total language activation in the bilingual brain regardless of the target language produced. From yet another perspective, phonetic studies offer evidence of cross-linguistic influences in production during bilingual mode tasks in comparison to unilingual mode tasks. Results from these phonetic studies also exhibit clear evidence of phonetic convergence that is correlated with increased co-activation. Inspired by this previous research, the current study aims to take a step towards unifying these different perspectives by investigating the acoustic effects that arise due to co-activation of both Spanish and English in the bilingual brain during a translation and interpretation task.

Multilingual Hispanic Speech in California (MuHSiC) Corpus

Sponsors: Dr. Justin Davidson and Julian Vargo

Thanks to a University of California Multicampus Research Grant, UC Berkeley is partnering with UCLA and UC Santa Cruz to document, linguistically analyze, and ultimately legitimize Spanish-English bilingualism in California. Despite having more Spanish speakers than any other Spanish-speaking country in the world (save Mexico), the United States is often considered a largely monolingual English-speaking country. The long history of (Standard) English hegemony in the United States has resulted in, among many other things, a general lack of empirical research on US Spanish and its speakers.
Having already collected a total of 200 interviews in prior semesters and gathered machine-automated transcriptions of their content, we continue to make progress on the corpus assembly phase in this Fall 2025 semester. In particular, students are needed to listen to recorded interviews and manually correct any errors in the automated transcripts. Only once all the transcripts are corrected can we proceed to phonetic and other linguistic analyses of the corpus.
Students working on this project will be expected to:

  1. Attend a training workshop to learn how to correct interview transcriptions
  2. Correct a given set of transcriptions over the semester
  3. Edit transcripts containing erroneous timestamps for phonetic analysis
  4. Handle sociolinguistic data such as speaker generation level or language dominance
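
As a rough illustration of what checking for erroneous timestamps can involve, the sketch below flags transcript segments whose end time is not after their start time, or that begin before an earlier segment has ended. It assumes a simple list of (start, end, text) segments for a single speaker; the `Segment` type, the function name, and the sample data are hypothetical and do not describe the project's actual file format or tools.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str


def find_timestamp_problems(segments: list[Segment]) -> list[tuple[int, str]]:
    """Return (index, description) pairs for suspicious segments."""
    problems = []
    previous_end = 0.0
    for index, segment in enumerate(segments):
        if segment.end <= segment.start:
            problems.append((index, "end is not after start"))
        elif segment.start < previous_end:
            problems.append((index, "overlaps previous segment"))
        # Track the latest end time seen so far.
        previous_end = max(previous_end, segment.end)
    return problems


if __name__ == "__main__":
    # Hypothetical sample transcript with two deliberate errors.
    sample = [
        Segment(0.0, 1.5, "hola"),
        Segment(1.2, 2.0, "cómo estás"),
        Segment(3.0, 2.5, "bien"),
    ]
    for index, description in find_timestamp_problems(sample):
        print(f"segment {index}: {description}")
```

In recordings where speakers genuinely talk over each other, overlaps between different speakers may be legitimate, so a check like this would be run per speaker rather than across the whole interview.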

Web Development of the Multilingual Hispanic Speech in California (MuHSiC) Corpus

Sponsors: Dr. Justin Davidson and Julian Vargo

Thanks to a University of California Multicampus Research Grant, UC Berkeley is partnering with UCLA and UC Santa Cruz to document, linguistically analyze, and ultimately legitimize Spanish-English bilingualism in California. Despite having more Spanish speakers than any other Spanish-speaking country in the world (save Mexico), the United States is often considered a largely monolingual English-speaking country. The long history of (Standard) English hegemony in the United States has resulted in, among many other things, a general lack of empirical research on US Spanish and its speakers.
Having already collected a total of 200 interviews in prior semesters and gathered machine-automated transcriptions of their content, in this Fall 2025 semester we aim to finalize the website that will host the corpus (recordings, transcripts, acoustic analytics, etc.).
Students working on this project will be expected to:

  1. Edit either front-end or back-end code for the website (Astro, CSS, HTML, Java).
  2. Design a professional-looking site using graphs of data analysis and relevant pictures to be uploaded.
  3. Manage access to participant metadata.
  4. Conduct acoustical or sociological analyses as needed for statistics to be displayed on the website.
  5. Troubleshoot any issues that may arise related to metadata, browser compatibility, file storage, etc.
  6. Edit any images for the website in Photoshop or another image editor that the student prefers.
  7. Create pull requests on GitHub.