Justin Davidson

Contact

5215 Dwinelle Hall

Office Hours: Thursdays and Fridays, 11AM-Noon

Job title: 
Associate Professor
Bio/CV: 

Ph.D., University of Illinois at Urbana-Champaign, 2015. Spanish Linguistics, Romance linguistics, SLATE (Second Language Acquisition and Teacher Education).

Research Expertise and Interests

Sociolinguistics, contact linguistics and language contact, language variation and change, Romance linguistics, quantitative methods (statistics, variable rule analyses for sociolinguistics, and computer software for statistics), sociohistorical linguistics, sociophonetics, bilingualism, Catalan, Spanish, dialectal diversification, non-English language pedagogy.

Research Profile

My main research agenda is guided by questions that primarily address language variation and language change in contact situations, specifically as linked to the empirical assessment of linguistic influence (via language contact), incorporating a variety of linguistic frameworks and methodologies. In particular, I have explored bi-directional effects of language contact between Spanish and Catalan manifested phonetically in the speech of the diverse community of Catalan-Spanish bilingual speakers in Barcelona and Valencia, Spain. I am interested in the dynamics of language use in bilingual speech communities, particularly as a consequence of a complex interplay between linguistic and social factors, and my research aims to account for why, as well as by what processes, certain linguistic features (and not others) propagate throughout the wider community of speakers. Central to this line of research is the pursuit of the best quantitative models in sociolinguistics, from which I have developed a vested interest in evaluating (and combining) various statistical toolkits, as well as in helping new R users become more accustomed to analyzing data with R (see the files at the bottom of this page!). I have also published on the diachronic development of diaspora varieties of Catalan from a framework of sociohistorical linguistics, as well as on the variable acquisition of Spanish inflectional morphology by U.S. heritage speakers and L2 speakers, using empirical methodologies informed by the fields of second language acquisition and psycholinguistics. With respect to U.S. Spanish, I am actively involved in the documentation and legitimization of U.S. Spanish via my Corpus of Bay Area Spanish (CBAS) project, expanded in 2023 to explore Spanish-English bilingualism across California as MuHSiC (Multilingual Hispanic Speech in California), in collaboration with Hispanic Linguistics faculty at UC Santa Cruz and UCLA.

Davidson_CV_September_2025

Active On-Site Project

In Fall 2016, I founded the Corpus of Bay Area Spanish (CBAS). Data collection, in the form of both formal (3-4 word phrase readings) and informal (casual interview) Spanish speech, responded to specific questions regarding Spanish-English language contact as manifested in the diverse population of Spanish-speakers living in the Bay Area (corresponding to approximately 25% of the total Bay Area population according to recent Census data).

CBAS has since been transformed, thanks to a 2023 UC Multicampus Research Initiatives Grant in partnership with UC Santa Cruz and UC Los Angeles, into MuHSiC (Multilingual Hispanic Speech in California). The MuHSiC corpus consists of 600 interviews culled from a diverse array of Spanish-English bilinguals across California. All 600 participants took part in a 35-minute sociolinguistic interview in Spanish and a separate 35-minute sociolinguistic interview in English, provided extensive sociodemographic background information, and completed the Bilingual Language Profile as an assessment of language dominance. All speech recordings, participant metadata, speech transcriptions, and Praat TextGrids force-aligned with the Montreal Forced Aligner will be freely available for consultation on the MuHSiC website (eta: late Spring 2025). The MuHSiC corpus will permit linguistic analyses of several multilingual linguistic features (from phonology/phonetics to morphosyntax and the lexicon, etc.) from a diverse set of Linguistics frameworks, including Variationist Sociolinguistics, Second/Heritage Language Acquisition, and Contact Linguistics.

For prospective and current undergraduate and graduate students, the MuHSiC project offers the opportunity to engage first-hand in corpus-based sociolinguistic research. Undergraduate and graduate students are openly invited to collaborate in data analysis, leading to possibilities for advanced (Hispanic) Linguistics undergraduate research in the form of a Senior Thesis, or, for graduate students in (Hispanic) Linguistics, opportunities for professional research and publications. Interested students should contact me via e-mail for an appointment to discuss MuHSiC collaboration.

Are you considering applying for a PhD in Hispanic Linguistics (HLL Track 3) or Romance Linguistics (RLL) at UC Berkeley?

Before you apply, I strongly advise you to contact me via email! Additionally, note the following expectations for strong, competitive candidates:

1) If you're applying with a BA as your highest degree, completing an honors thesis or equivalent capstone project in (Hispanic/Romance) Linguistics is highly recommended.

2) Students holding an MA degree in (Hispanic/Romance) Linguistics are especially competitive, but not all MA programs are the same! If you're considering an MA before you begin a PhD, contact me to discuss strong options, all of which are tuition-free and come with guaranteed graduate employment salaries/stipends.

R Tutorial for the Non-Coding-Inclined 

PDF File: R Tutorial for the Non-Coding-Inclined (version 4.2.2.t2)

R File: R Tutorial for the Non-Coding-Inclined (version 4.2.2.t2)

The files above contain R code and explanations for numerous kinds of quantitative analysis, as well as templates/examples. The files are current with respect to the most recent update to R, but will change as R and its packages are updated. Feel free to e-mail me if you believe any of the code is no longer valid.

In the PDF version, the introductory pages cover the minimal coding you’ll need, general tips and terminology, a flow-chart for deciding which test is appropriate for your data, the interpretation of ANOVA outputs vs. regression outputs, and a series of example R outputs with the corresponding tables/prose that one could create from them for a publication. Beyond these pages, each page is titled with the name of the test covered. Crucially, expectations for the data (i.e., which tests are suitable for which kinds of data) appear below each title.

The R file version allows for perhaps easier copy-pasting, since users will not need to constantly click between R and their PDF-viewer program, but at the cost of color-coding. The R file differs from the PDF in only two other ways: it omits the examples of R outputs and their interpretation, and, since Chi-Squared in R is difficult to describe in pure prose, it omits this test (which is still present as the final statistical test of the PDF file).

The only coding knowledge required relates to the independent variables included in a model, which is covered in the legend at the top of each test. For example, for models working with 4 IVs (GENDER as fixed, COUNTRY as fixed, VERB as fixed, and PARTICIPANT as random), the following “IVDump” notations are possible:

GENDER + COUNTRY                 (fixed effects model with 2 main effects)

GENDER * COUNTRY                 (fixed effects model with 2 main effects and their interaction)

GENDER + COUNTRY * VERB          (fixed effects model with 3 main effects and 1 interaction)

GENDER + (1|PARTICIPANT)         (mixed effects model with 1 main effect and 1 random intercept)

GENDER + (GENDER|PARTICIPANT)    (same as above but with the addition of a random slope)
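As a rough illustration of the notations above, the sketch below uses base R to list the terms that each fixed-effects “IVDump” expands to, and then fits one random-intercept model. The helper name expand_terms, the dependent variable SCORE, and the simulated data are all hypothetical, and the use of the lme4 package for the mixed model is an assumption; the tutorial files themselves may use different functions or packages.

```r
# Hypothetical helper (not from the tutorial): list the model terms
# that a right-hand-side "IVDump" string expands to.
expand_terms <- function(iv_dump) {
  f <- as.formula(paste("~", iv_dump))
  attr(terms(f), "term.labels")
}

# "+" adds main effects only.
print(expand_terms("GENDER + COUNTRY"))

# "*" adds both main effects and their interaction (written GENDER:COUNTRY).
print(expand_terms("GENDER * COUNTRY"))

# "*" binds more tightly than "+": only COUNTRY and VERB interact here.
print(expand_terms("GENDER + COUNTRY * VERB"))

# Mixed-effects models need an add-on package; lme4 is assumed here.
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  # Simulated data: 20 participants, 10 observations each (made up for illustration).
  d <- data.frame(
    PARTICIPANT = factor(rep(1:20, each = 10)),
    GENDER      = factor(rep(c("F", "M"), each = 100))
  )
  # Hypothetical dependent variable: noise plus a per-participant offset.
  d$SCORE <- rnorm(200) + rep(rnorm(20), each = 10)

  # (1|PARTICIPANT) gives each participant their own intercept.
  m <- lme4::lmer(SCORE ~ GENDER + (1 | PARTICIPANT), data = d)
  print(lme4::fixef(m))
}
```

Listing the expanded terms before fitting can be a quick way to confirm that a notation includes the main effects and interactions you intended.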

Finally, below is an Excel spreadsheet with templates to show how different kinds of data are to be organized for analysis with R, as well as example data to practice analyses with.

Templates & Example Datasets for R
