Prof. Dr. Anke Lüdeling
Profile
Research topics (38)
An Open Science Platform for Corpus Linguistics. Broadening the Scope of the Mind Research Repository
Funder: Federal Ministry of Research, Technology and Space · Period: 10/2014 - 09/2015 · Project lead: Prof. Dr. Anke Lüdeling
Bilateral workshops Boston/Berlin (NEH/DFG) (event series)
Funder: DFG, other programmes · Period: 04/2009 - 12/2011 · Project lead: Prof. Dr. Anke Lüdeling
CALLIDUS - Computer-Aided Language Learning: Vocabulary acquisition in Latin teaching through corpus-based methods
Funder: DFG, other programmes · Period: 08/2017 - 07/2020 · Project lead: Malte Dreyer, Prof. Dr. Stefan Kipf, Prof. Dr. Anke Lüdeling
CALLIDUS - Computer-Aided Language Learning: Vocabulary acquisition in Latin teaching through corpus-based methods
Funder: DFG Research Grant · Period: 08/2017 - 12/2020 · Project lead: Prof. Dr. Anke Lüdeling, Prof. Dr. Stefan Kipf, Malte Dreyer
CALLIDUS - Computer-Aided Language Learning: Vocabulary acquisition in Latin teaching through corpus-based methods
Funder: DFG Research Grant · Period: 06/2017 - 12/2020 · Project lead: Prof. Dr. Stefan Kipf, Prof. Dr. Anke Lüdeling, Malte Dreyer
CLARIN-DE-HU
Funder: Federal Ministry of Research, Technology and Space · Period: 11/2014 - 06/2016 · Project lead: Prof. Dr. Anke Lüdeling
CLARIN-D-MPI-HUB
Funder: Federal Ministry of Research, Technology and Space · Period: 09/2011 - 08/2014 · Project lead: Prof. Dr. Anke Lüdeling
Crosslingual Language Varieties: An overarching investigation
Funder: DFG Research Grant · Period: 01/2019 - 12/2023 · Project lead: Prof. Dr. Anke Lüdeling
Daidalos project - Development of an infrastructure for using natural language processing in research in classical philology
Funder: DFG Research Grant · Period: 07/2023 - 12/2026 · Project lead: Dr. Andrea Beyer, Prof. Dr. Anke Lüdeling, Malte Dreyer
Daidalos project - Development of an infrastructure for using natural language processing in research in classical philology
Funder: DFG Research Grant · Period: 08/2023 - 12/2026 · Project lead: Malte Dreyer, Dr. Maik Bierwirth
A minimal infrastructure for the sustainable provision of extensible multi-layer software for linguistic corpora
Funder: DFG Research Grant · Period: 10/2018 - 12/2021 · Project lead: Prof. Dr. Anke Lüdeling
Development of sustainable, user-oriented storage and provision of research data for historical linguistics
Funder: DFG Research Grant · Period: 10/2011 - 12/2014 · Project lead: Prof. Dr. Anke Lüdeling
FOR 2537/1: Emerging Grammars: A cross-linguistic corpus of comparative data from heritage and majority language use (subproject Pd)
Funder: DFG Research Unit · Period: 04/2018 - 06/2021 · Project lead: Prof. Dr. Heike Wiese, Prof. Dr. Anke Lüdeling
FOR 2537/1: Emerging Grammars: A cross-linguistic corpus of comparative data from heritage and majority language use (subproject Pd)
Funder: DFG Research Unit · Period: 05/2018 - 04/2021 · Project lead: Prof. Dr. Anke Lüdeling, Prof. Dr. Heike Wiese
FOR 2537/2: "The lexicon of heritage languages: Dynamics and interfaces" in the research unit "Grammatical dynamics in language contact: A comparative approach" (subproject P11)
Funder: DFG Research Unit · Period: 02/2022 - 02/2025 · Project lead: Prof. Dr. Anke Lüdeling
Google European Digital Humanities Award 2010: Anke Lüdeling (Annotated Corpora in Studying and Teaching Variation and Change in Academic German)
Funder: Other international funders · Period: 12/2010 - 12/2026 · Project lead: Prof. Dr. Anke Lüdeling
Handbook of corpus linguistics (Handbooks of Linguistics and Communication Science)
Period: 02/2004 - 12/2014 · Project lead: Prof. Dr. Anke Lüdeling
Complex databases for the reconstruction and simulation of evolutionary processes (subproject 02)
Funder: State of Berlin, other · Period: 07/2003 - 12/2007 · Project lead: Prof. Dr. Anke Lüdeling
Kompost: Identifying indicators for assessing competence in pupils' texts with computational-linguistic methods, and dialogic development of a prototype for computer-aided essay analysis
Funder: Federal Ministry of Research, Technology and Space · Period: 11/2009 - 10/2012 · Project lead: Prof. Dr. Anke Lüdeling
LangBank: Digital Infrastructure to Support the Study of Latin and Historical German
Funder: DFG Research Grant · Period: 08/2015 - 05/2018 · Project lead: Prof. Dr. Anke Lüdeling
Linguistic annotation of non-standard varieties - "Guidelines and Best Practices" (F-AG 7)
Funder: Federal Ministry of Research, Technology and Space · Period: 09/2012 - 03/2013 · Project lead: Prof. Dr. Anke Lüdeling
Long-term Access and Usage of Deeply Annotated Information II
Funder: DFG Research Grant · Period: 03/2015 - 02/2018 · Project lead: Prof. Dr. Anke Lüdeling
SFB 1412/1: Data management and statistical analysis (subproject INF)
Funder: DFG Collaborative Research Centre · Period: 01/2020 - 12/2023 · Project lead: Malte Dreyer, Thomas Krause, Prof. Dr. Anke Lüdeling
SFB 1412/1: Non-native addressee register: Variation in interaction with non-native speakers (subproject C06)
Funder: DFG Collaborative Research Centre · Period: 01/2020 - 12/2023 · Project lead: Prof. Dr. Christine Mooshammer, Prof. Dr. Anke Lüdeling
SFB 1412/1: Register competence in the varieties of advanced learners (subproject C04)
Funder: DFG Collaborative Research Centre · Period: 01/2020 - 12/2023 · Project lead: Prof. Dr. Anke Lüdeling
SFB 1412/1: Register: Situational and functional aspects of linguistic knowledge
Funder: DFG Collaborative Research Centre · Period: 01/2020 - 12/2023 · Project lead: Prof. Dr. Anke Lüdeling
SFB 1412/2: Data management, modelling and exploration (subproject Register INF)
Funder: DFG Collaborative Research Centre · Period: 01/2024 - 12/2027 · Project lead: Prof. Dr. Anke Lüdeling, Thomas Krause, Malte Dreyer
SFB 1412/2: Register: Situational and functional aspects of linguistic knowledge
Funder: DFG Collaborative Research Centre · Period: 01/2024 - 12/2027 · Project lead: Prof. Dr. Anke Lüdeling, Prof. Dr. Luka Szucsich
SFB 1412/2: Seemingly free (morpho)phonetic variation (subproject C06)
Funder: DFG Collaborative Research Centre · Period: 01/2024 - 12/2027 · Project lead: Prof. Dr. Anke Lüdeling, Dr. Malte Belz, Prof. Dr. Christine Mooshammer
SFB 1412/2: Specialized register knowledge of young adults: Modelling late language development in L1 and L2 (subproject C05)
Funder: DFG Collaborative Research Centre · Period: 01/2024 - 12/2027 · Project lead: Prof. Dr. Anke Lüdeling, Prof. Dr. Beate Lütke, Dr. Nicole Schumacher
SFB 1412: Register: Situational and functional aspects of linguistic knowledge
Funder: DFG Collaborative Research Centre · Period: 01/2020 - 12/2027 · Project lead: Prof. Dr. Anke Lüdeling, Prof. Dr. Luka Szucsich
SFB 632/2: Computational linguistics and corpus linguistics (subproject D 01)
Subject area: 409-02-A Software Engineering · Funder: DFG Collaborative Research Centre · Period: 07/2011 - 06/2015 · Project lead: Prof. Dr. Anke Lüdeling
SFB 632: Linguistic databases
Funder: DFG Collaborative Research Centre · Period: 01/2008 - 06/2011 · Project lead: Prof. Dr. Anke Lüdeling
Conference 'Quantitative Investigations in Theoretical Linguistics' (event: 28-31 March 2011, Berlin)
Period: 02/2011 - 10/2011 · Project lead: Prof. Dr. Anke Lüdeling
TextPloring: Research data exploration in the humanities with the LAUDATIO repository
Funder: DFG, other programmes · Period: 07/2025 - 06/2027 · Project lead: Malte Dreyer, Prof. Dr. Anke Lüdeling, Prof. Dr. Torsten Hiltmann
TextPloring: Research data exploration in the humanities with the LAUDATIO repository
Funder: DFG, other programmes · Period: 10/2025 - 09/2027 · Project lead: Prof. Dr. Torsten Hiltmann, Prof. Dr. Anke Lüdeling, Malte Dreyer
Textual Revisions in Second Language Writing / Project-related personnel exchange with Hong Kong
Funder: DAAD · Period: 01/2012 - 12/2013 · Project lead: Prof. Dr. Anke Lüdeling
What is difficult? A corpus-based analysis of structural learning difficulties in German as a foreign language
Funder: DFG Research Grant · Period: 08/2009 - 07/2012 · Project lead: Prof. Dr. Anke Lüdeling
Possible industry partners (10)
As of: 26.4.2026, 19:48:44 (Top-K=20, Min-Cosine=0.4)
- 51 hits · 62.1%: Professionalization in German-as-a-second-language support for refugees with learning difficulties
- 54 hits · 59.3%: Realizing Leibniz's Dream: Child Languages as a Mirror of the Mind (LeibnizDream)
- 42 hits · 59.2%: Grant under the programme "exist: Existenzgründungen aus der Wissenschaft" (business start-ups from science) from the federal budget, Section 09, Chapter 02, Title 68607, budget year 2026, and from European Structural Fund resources (here the European Social Fund Plus, ESF Plus), funding period 2021-2027; co-financing for the project "exist Women"
- 26 hits · 57.9%: Translation for Massive Open Online Courses
- 26 hits · 57.9%: Translation for Massive Open Online Courses
- 25 hits · 57.9%: Translation for Massive Open Online Courses
- 36 hits · 57.7%: Support for inclusive guidance in teaching English as a foreign language to deaf and hard-of-hearing pupils
- Ecole Pouchet · 44 hits · 57.7%: Support for inclusive guidance in teaching English as a foreign language to deaf and hard-of-hearing pupils
- 44 hits · 57.7%: Support for inclusive guidance in teaching English as a foreign language to deaf and hard-of-hearing pupils
- 42 hits · 57.7%: Support for inclusive guidance in teaching English as a foreign language to deaf and hard-of-hearing pupils
Publications (25)
Top 25 by citations. Source: OpenAlex (embedded with BAAI/bge-m3 for matching).
Cambridge University Press eBooks · 335 citations · DOI
The origins of learner corpus research go back to the late 1980s when large electronic collections of written or spoken data started to be collected from foreign/second language learners, with a view to advancing our understanding of the mechanisms of second language acquisition and developing tailor-made pedagogical tools. Engaging with the interdisciplinary nature of this fast-growing field, The Cambridge Handbook of Learner Corpus Research explores the diverse and extensive applications of learner corpora, with 27 chapters written by internationally renowned experts. This comprehensive work is a vital resource for students, teachers and researchers, offering fresh perspectives and a unique overview of the field. With representative studies in each chapter which provide an essential guide on how to conduct learner corpus research in a wide range of areas, this work is a cutting-edge account of learner corpus collection, annotation, methodology, theory, analysis and applications.
181 citations · DOI
This handbook provides an up-to-date survey of corpus linguistics. Spoken, written, and multimodal corpora serve as the bases for quantitative and qualitative research on many issues of linguistic interest. The two volumes together comprise 61 articles by renowned experts from around the world. They sketch the history of corpus linguistics and its relationship with neighbouring disciplines, show its potential, discuss its problems, and describe various methods of collecting, annotating, and searching corpora, as well as processing corpus data. Key features: an up-to-date and complete handbook; includes both an overview and detailed discussions; gathers together a great number of experts.
112 citations
Linguistic distinctions between the notions of a phrase, a word and their components are challenged by so-called particle verbs in German and similar features in other languages. Particle verbs look like single words, yet are typically assembled from word-like fragments that together behave more like components of a phrase than a word. Particle verbs have previously been analyzed as morphological objects or as phrasal constructions, but neither approach fits cleanly within its chosen framework. The resolution presented in this book is that particle verbs should be seen as lexicalized phrasal constructions. With an emphasis on morphological and syntactic testability, over 100 colloquial examples are shown to break the rules of previous approaches while remaining consistent with the book's proposition. Preverb constructions (PVCs) are introduced and diagrammed to help distinguish particle verbs from similar constructions, and to demonstrate how structural and morphological factors have been misidentified in the past. All this reveals the roles of listedness and non-transparency in word formation and clarifies the conclusion that particle verbs do not form a definable class of words.
edoc Publication server (Humboldt University of Berlin) · 101 citations · DOI
ANNIS (see Dipper & Götze 2005; Chiarcos et al. 2008) is a flexible web-based corpus architecture for search and visualization of multi-layer linguistic corpora. By multi-layer we mean that the same primary datum may be annotated independently with (i) annotations of different types (spans, DAGs with labelled edges and arbitrary pointing relations between terminals or non-terminals), and (ii) annotation structures that possibly overlap and/or conflict hierarchically. In this paper we present the different features of the architecture as well as actual use cases for corpus linguistic research on such diverse areas as information structure, learner language and discourse-level phenomena. The supported search functionalities of ANNIS2 include exact and regular expression matching on word forms and annotations, as well as complex relations between individual elements, such as all forms of overlapping, contained or adjacent annotation spans, hierarchical dominance (children, ancestors, left- or rightmost child etc.) and more. As an alternative to the query language, data can be accessed using a graphical query builder. Query matches are visualized depending on annotation types: annotations referring to tokens (e.g. lemma, POS, morphology) are shown immediately in the match list. Spans (covering one or more tokens) are displayed in a grid view, trees/graphs in a tree/graph view, and pointing relations (such as anaphoric links) in a discourse view, with same-colour highlighting for coreferent elements. Full Unicode support is provided and a media player is embedded for rendering audio files linked to the data, allowing for a large variety of corpora. Corpus data is annotated with automatic tools (taggers, parsers etc.) or task-specific expert tools for manual annotation, and then mapped onto the interchange format PAULA (Dipper 2005), where stand-off annotations refer to the same primary data.
Importers exist for many formats, including EXMARaLDA (Schmidt 2004), TigerXML (Brants & Plaehn 2000), MMAX2 (Müller & Strube 2006), RSTTool (O’Donnell 2000), PALinkA (Orasan 2003) and Toolbox (Stuart et al. 2007). Data is compiled into a relational DB for optimal performance. Query matches and their features can also be exported in the ARFF format and processed with the data mining tool WEKA (Witten & Frank 2005), which offers implementations of clustering and classification algorithms. ANNIS2 compares favourably with search functionalities in the above tools as well as other corpus search engines (EXAKT, http://www.exmaralda.org/exakt.html; TIGERSearch, Lezius 2002; CWB, Christ 1994) and other frameworks/architectures (NITE, Carletta et al. 2003; GATE, Cunningham 2002).
94 citations · DOI
The world wide web is a mine of language data of unprecedented richness and ease of access (Kilgarriff and Grefenstette 2003). A growing body of studies has shown that simple algorithms using web-based evidence are successful at many linguistic tasks, often outperforming sophisticated methods based on smaller but more controlled data sources (cf. Turney 2001; Keller and Lapata 2003). Most current internet-based linguistic studies access the web through a commercial search engine. For example, some researchers rely on frequency estimates (number of hits) reported by engines (e.g. Turney 2001). Others use a search engine to find relevant pages, and then retrieve the pages to build a corpus (e.g. Ghani and Mladenic 2001; Baroni and Bernardini 2004). In this study, we first survey the state of the art, discussing the advantages and limits of various approaches, and in particular the inherent limitations of depending on a commercial search engine as a data source. We then focus on what we believe to be some of the core issues of using the web to do linguistics. Some of these issues concern the quality and nature of data we can obtain from the internet (What languages, genres and styles are represented on the web?), others pertain to data extraction, encoding and preservation (How can we ensure data stability? How can web data be marked up and categorized? How can we identify duplicate pages and near duplicates?), and others yet concern quantitative aspects (Which statistical quantities can be reliably estimated from web data, and how much web data do we need? What are the possible pitfalls due to the massive presence of duplicates and mixed-language pages?). All points are illustrated through concrete examples from English, German and Italian web corpora.
75 citations
Learner corpora, principled collections of learner language, provide interesting insights into the mechanisms by which a foreign language is acquired. For overviews of the current state of learner corpus research, see Granger (2002, to appear), Nesselhauf (2004), and Pravec (2002). Learner corpora are used to test hypotheses in the theory of acquisition in two main ways. First,
publish.UP (University of Potsdam) · 73 citations
We present a general framework for integrating annotations from different tools and tag sets. When annotating corpora at multiple linguistic levels, annotators may use different expert tools for different phenomena or types of annotation. These tools employ different data models and accompanying approaches to visualization, and they produce different output formats. For the purposes of uniformly processing these outputs, we developed a pivot format called PAULA, along with converters to and from tool formats. Different annotations are not only integrated at the level of data format, but are also joined on the level of conceptual representation. For this purpose, we introduce OLiA, an ontology of linguistic annotations that mediates between alternative tag sets that cover the same class of linguistic phenomena. All components are integrated in the linguistic information system ANNIS: annotation tool output is converted to the pivot format PAULA and read into a database where the data can be visualized, queried, and evaluated across multiple layers. For cross-tag-set querying and statistical evaluation, ANNIS uses the ontology of linguistic annotations. Finally, ANNIS is also tied to a machine learning component for semiautomatic annotation.
Deutsch als Fremdsprache · 53 citations · DOI
46 citations · DOI
Cambridge University Press eBooks · 44 citations · DOI
Yearbook of morphology · 43 citations · DOI
arXiv (Cornell University) · 33 citations · DOI
This article describes the results of a case study that applies Neural Network-based Optical Character Recognition (OCR) to scanned images of books printed between 1487 and 1870 by training the OCR engine OCRopus [@breuel2013high] on the RIDGES herbal text corpus [@OdebrechtEtAlSubmitted]. Training specific OCR models was possible because the necessary ground truth is available as error-corrected diplomatic transcriptions. The OCR results have been evaluated for accuracy against the ground truth of unseen test sets. Character and word accuracies (percentage of correctly recognized items) for the resulting machine-readable texts of individual documents range from 94% to more than 99% (character level) and from 76% to 97% (word level). This includes the earliest printed books, which were thought to be inaccessible by OCR methods until recently. Furthermore, OCR models trained on one part of the corpus consisting of books with different printing dates and different typesets (mixed models) have been tested for their predictive power on the books from the other part containing yet other fonts, mostly yielding character accuracies well above 90%. It therefore seems possible to construct generalized models trained on a range of fonts that can be applied to a wide variety of historical printings still giving good results. A moderate postcorrection effort of some pages will then enable the training of individual models with even better accuracies. Using this method, diachronic corpora including early printings can be constructed much faster and cheaper than by manual transcription. The OCR methods reported here open up the possibility of transforming our printed textual cultural heritage into electronic text by largely automatic means, which is a prerequisite for the mass conversion of scanned books.
32 citations
Error annotation is a key feature of modern learner corpora. Error identification is always based on some kind of reconstructed learner utterance (target hypothesis). Since a single target hypothesis can only cover a certain amount of linguistic information while ignoring other aspects, the need for multiple target hypotheses becomes apparent. Using the German learner corpus Falko as an example we therefore argue for a flexible multi-layer standoff corpus architecture where competing target hypotheses can be coded simultaneously. Surface differences between the learner text and the target hypotheses can then be exploited for automatic error annotation.
Studies in corpus linguistics · 30 citations · DOI
Error annotation is a key feature of modern learner corpora. Error identification is always based on some kind of reconstructed learner utterance (target hypothesis). Since a single target hypothesis can only cover a certain amount of linguistic information while ignoring other aspects, the need for multiple target hypotheses becomes apparent. Using the German learner corpus Falko as an example, we therefore argue for a flexible multi-layer stand-off corpus architecture where competing target hypotheses can be coded in parallel. Surface differences between the learner text and the target hypotheses can then be exploited for automatic error annotation.
Artificial intelligence · 29 citations · DOI
26 citations · DOI
23 citations
In this paper we want to focus on a small facet of morphological productivity: on quantitative measures and their applicability to “real-life” corpus data. We will argue that, at least for German, there are at present no morphological systems available that can automatically preprocess the data to the quality necessary to apply statistical models for the calculation of productivity rates. Before coming to the quantitative aspects we want to clarify the notion of morphological productivity. Morphological productivity has long been a topic in theoretical morphology (see for example Schultink 1961, Aronoff 1976, van Marle 1985, and Plag 1999). It has been defined in many ways. We choose a definition by Schultink (1961, p. 113) which contains three aspects that are important to us: “We see productivity as a morphological phenomenon as the possibility for language users to coin unintentionally an in principle unlimited number of new formations, by using the morphological procedure that lies behind the form-meaning correspondence of some known words.” The three important aspects are unintentionality, unlimitedness, and regularity. They are all interdependent. The first aspect – unintentionality – helps us to distinguish between productivity (which is a linguistic rule-based notion) and creativity (which is a general cognitive ability and cannot be captured within morphology alone): Words formed by productive processes are often not recognized
Oxford University Press eBooks · 21 citations · DOI
This chapter describes the contributions that Corpus Linguistics (the study of linguistic phenomena by means of systematically exploiting collections of naturally-occurring linguistic data) can make to information structure (IS) research. It discusses issues of designing a corpus that can serve as a basis for qualitative or quantitative studies, and then turns to the central issue of data annotation: what corpora are available that have been annotated with IS-related annotations, and how can such annotations be evaluated? In case a corpus does not have direct IS annotation, can other types of annotations, especially in the form of multi-layer annotation, be used as indirect evidence for the presence of IS phenomena? Next, the present state of the art in automatic IS annotation (by means of techniques from computational linguistics) is sketched, and finally, several sample studies that exploit IS annotations are introduced briefly.
20 citations · DOI
Many linguistic research questions call for a qualitative and quantitative evaluation of corpus data. Both kinds of evaluation can refer to the corpus text but also to annotation layers. Every kind of annotation, that is, categorization, represents a controlled and necessary loss of information. This means that every kind of categorization is also an interpretation of the data. In most large corpora, exactly one interpretation is offered for each intended annotation layer, such as the part-of-speech layer or the lemma layer. In recent years, alongside the large, "flatly" annotated corpora, corpus models have emerged that can encode conflicting information: the so-called multi-level models (multilevel standoff corpora), in which all annotation layers are stored independently of the text and only point to specific text anchors. Using error annotation in a learner corpus as an example, I argue that at least corpora with strongly varying annotation needs and contested analyses benefit from being encoded in multi-level models.
edoc Publication server (Humboldt University of Berlin) · 20 citations · DOI
This paper deals with the syntactic annotation of corpora that contain both ‘canonical’ and ‘non-canonical’ sentences.
Language Learning · 18 citations · DOI
The present study examines morphological productivity of complex verbs in second language acquisition by analyzing a corpus of German as a Foreign Language (GFL). It shows that advanced learners of GFL use prefix and particle verbs relatively frequently and productively but less so than native speakers do, and discusses these findings in the light of different linguistic models and acquisition theories. It argues that corpus data must be evaluated against good models and that it is necessary to make the categorization decisions available as annotations.
Elsevier eBooks · 18 citations · DOI
edoc Publication server (Humboldt University of Berlin) · 17 citations · DOI
Neoclassical word-formation is word-formation with elements of Greek or Latin origin. In the European languages neoclassical word-formation is found 'next to' native word-formation. In these languages neoclassical elements combine productively with each
17 citations · DOI
No natural language has a closed vocabulary (Kornai 2002). In addition to mechanisms that add to the base vocabulary, such as borrowing, shortening and creativity, productive morphological processes can form new complex entries. Some word formation processes can be used to form new words more easily than others. This fact, called morphological productivity, has been recognized for a long time and discussed from many points of view (see for example Aronoff 1976; Booij 1977; Baayen and Lieber 1991; Baayen 1992; Plag 1999; Bauer 2001; Baayen 2001; Nishimoto 2004). This paper is concerned with evidence for different aspects of morphological productivity. Our claim is that the problem of productivity can only be understood when different kinds of evidence, quantitative and qualitative, are combined. We will try to understand more about the interaction of qualitative and quantitative aspects of morphological productivity. We illustrate our claim by looking at a morphological element that has so far received little attention in morphological descriptions: German -itis.
Frontiers in Psychology · 16 citations · DOI
In this paper, we present corpus data that questions the concept of native speaker homogeneity as it is presumed in many studies using native speakers (L1) as a control group for learner data (L2), especially in corpus contexts. Usage-based research on second and foreign language acquisition often investigates quantitative differences between learners, and usually a group of native speakers serves as a control group, but often without elaborating on differences within this group to the same extent. We examine inter-personal differences using data from two well-controlled German native speaker corpora collected as control groups in the context of second and foreign language research. Our results suggest that certain linguistic aspects vary to an extent in the native speaker data that undermines general statements about quantitative expectations in L1. However, we also find differences between phenomena: while morphological and syntactic sub-classes of verbs and nouns show great variability in their distribution in native speaker writing, other, coarser categories, like parts of speech, or types of syntactic dependencies, behave more predictably and homogeneously. Our results highlight the necessity of accounting for inter-individual variance in native speakers where L1 is used as a target ideal for L2. They also raise theoretical questions concerning a) explanations for the divergence between phenomena, b) the role of frequency distributions of morphosyntactic phenomena in usage-based linguistic frameworks, and c) the notion of the individual adult native speaker as a general representative of the target language in language acquisition studies or language in general.
Collaborations (8)
Confirmed researcher↔partner pairs from HU-FIS: gold-standard positives for matching.
- SFB 1412/2: Register: Situational and functional aspects of linguistic knowledge (partner type: university)
- TextPloring: Research data exploration in the humanities with the LAUDATIO repository (partner type: university)
- SFB 1412/2: Register: Situational and functional aspects of linguistic knowledge (partner type: other)
- Linguistic annotation of non-standard varieties - "Guidelines and Best Practices" (F-AG 7) (partner type: university)
- Crosslingual Language Varieties: An overarching investigation (partner type: university)
- TextPloring: Research data exploration in the humanities with the LAUDATIO repository (partner type: university)
- TextPloring: Research data exploration in the humanities with the LAUDATIO repository (partner type: university)
- SFB 1412/2: Register: Situational and functional aspects of linguistic knowledge (partner type: university)
Core data
Identity, organization and contact details from HU-FIS.
- Name: Prof. Dr. Anke Lüdeling
- Title: Prof. Dr.
- Faculty: Sprach- und literaturwissenschaftliche Fakultät
- Institute: Institut für deutsche Sprache und Linguistik
- Working group: Sprachwissenschaft des Deutschen
- Phone: +49 30 2093-85109
- Last scraped: 26.4.2026, 01:08:47