Norwegian Language Resources Inventory (draft)
This preliminary overview of digital language resources and tools in Norway was collected by questionnaire, with the support of the RCN through the NO-CLARIN preparatory project.
1. The NHH Termbase (NHH-T)
Type |
Multilingual terminology database |
Size |
1600 termbase entries |
Languages |
Norwegian, English |
Rightholders |
NHH |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
NHH |
Effort needed (a) technical (b) nontechnical |
a) 4 pw b) 4 pw |
Rationale for selection |
Updated and covering central concepts in microeconomics and other
economic-administrative domains |
Present usage |
Regular use for educational purposes at NHH, formal public launch autumn 2010, large
potential target group |
Similar resources or cooperations |
EuroTermBank (LV), Rikstermbanken (SE), DanTerm (DA), IATE (EU) |
Data or tool |
data |
2. KB-N (Kunnskapsbank for norsk økonomisk-administrativt domene)
Type |
Multilingual terminology database for the business and economics domains |
Size |
8467 termbase entries |
Languages |
Norwegian, English |
Rightholders |
NHH, Uni Digital |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
NHH |
Effort needed (a) technical (b) nontechnical |
a) 8 pw b) 8 pw, need for updating/quality check |
Rationale for selection |
Relevant for research and development due to wide scope of concepts in
economic-administrative domains |
Present usage |
Currently used by about ten researchers but has a much larger potential target group |
Similar resources or cooperations |
EuroTermBank (LV), Rikstermbanken (SE), DanTerm (DA), IATE (EU) |
Data or tool |
data |
3. The NOT database (NOT-basen, Norsk termbank)
Type |
Multilingual terminology database |
Size |
30,521 termbase entries |
Languages |
Norwegian, English |
Rightholders |
Uni Digital |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
NHH |
Effort needed (a) technical (b) nontechnical |
a) 8 pw b) 16 pw |
Rationale for selection |
Very wide coverage of domains and concepts, including the petroleum sector; not updated |
Present usage |
Currently used by about ten researchers but has a much larger potential target group |
Similar resources or cooperations |
EuroTermBank (LV), Rikstermbanken (SE), DanTerm (DA), IATE (EU) |
Data or tool |
data |
4. The UHR database (UHR-basen, Universitets- og høyskolerådets termbase)
Type |
Multilingual terminology database for administration in higher education |
Size |
1000 termbase entries |
Languages |
Norwegian, English |
Rightholders |
UHR |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
n/a |
Anticipated location |
UHR |
Effort needed (a) technical (b) nontechnical |
a) 4 pw b) 4 pw |
Rationale for selection |
Fully updated termbase covering central administrative concepts related to higher
education |
Present usage |
Regular use for administrative purposes, some hundreds of users |
Similar resources or cooperations |
EuroTermBank (LV), Rikstermbanken (SE), DanTerm (DA), IATE (EU) |
Data or tool |
data |
5. The RTT-material (RTT-materialet, Rådet for teknisk terminologi)
Type |
Multilingual terminology database for technical domains |
Size |
48,314 termbase entries |
Languages |
Mainly Norwegian, English, French |
Rightholders |
Fagbokforlaget |
Anticipated access policy |
n/a |
Anticipated reuse policy |
n/a |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
a) 8 pw b) 16 pw |
Rationale for selection |
Wide coverage of technical domains, not updated |
Present usage |
Currently a small user group but a wider target group |
Similar resources or cooperations |
EuroTermBank (LV), Rikstermbanken (SE), DanTerm (DA), IATE (EU) |
Data or tool |
data |
6. The EEA-EU database (EØS-EU-basen, EØS-sekretariatets terminologidatabase og norske
oversettelser av rettsakter innlemmet i EØS-avtalen)
Type |
Multilingual terminology and translation database related to EEA and EU |
Size |
36,776 termbase entries |
Languages |
Norwegian, English, French |
Rightholders |
EØS-sekretariatet, part of EØS/EFTA-seksjonen i Europaavdelingen, UD |
Anticipated access policy |
n/a |
Anticipated reuse policy |
n/a |
Anticipated location |
EØS-sekretariatet, part of EØS/EFTA-seksjonen i Europaavdelingen, UD |
Effort needed (a) technical (b) nontechnical |
a) 8 pw b) 8 pw |
Rationale for selection |
Continuously updated, high-quality termbase, inclusion must be negotiated with
EØS-sekretariatet |
Present usage |
Regular use for translation in government and public administration, hundreds of
external users, continuously updated |
Similar resources or cooperations |
EuroTermBank (LV), Rikstermbanken (SE), DanTerm (DA), IATE (EU) |
Data or tool |
data |
7. SNORRE (Standard Norges termbase)
Type |
Multilingual terminology database |
Size |
n/a |
Languages |
Norwegian, English |
Rightholders |
Standard Norge |
Anticipated access policy |
n/a |
Anticipated reuse policy |
n/a |
Anticipated location |
Standard Norge |
Effort needed (a) technical (b) nontechnical |
a) 16 pw b) 8 pw |
Rationale for selection |
Termbase being updated in ongoing project |
Present usage |
New terminology resource, launch planned in autumn 2010 |
Similar resources or cooperations |
EuroTermBank (LV), Rikstermbanken (SE), DanTerm (DA), IATE (EU) |
Data or tool |
data |
8. The KRIPOS database (KRIPOS-basen)
Type |
Multilingual terminology database for policing purposes |
Size |
n/a |
Languages |
Norwegian, English, French |
Rightholders |
KRIPOS |
Anticipated access policy |
n/a |
Anticipated reuse policy |
n/a |
Anticipated location |
KRIPOS, language section |
Effort needed (a) technical (b) nontechnical |
a) 16 pw b) 8 pw |
Rationale for selection |
Continuously updated, high-quality termbase, inclusion must be negotiated with
KRIPOS |
Present usage |
Regular internal use for policing purposes |
Similar resources or cooperations |
EuroTermBank (LV), Rikstermbanken (SE), DanTerm (DA), IATE (EU) |
Data or tool |
data |
9. NRK corpus
Type |
Speech and text corpus, phonetic transcription, lexicon. ~23 hours read and
spontaneous speech, 1.5 M words of news texts (subtitles) |
Size |
Ca. 23 hours of speech, 1.5 M words |
Languages |
Norwegian |
Rightholders |
NRK, SINTEF |
Anticipated access policy |
Free for research purpose |
Anticipated reuse policy |
Restricted |
Anticipated location |
SINTEF |
Effort needed (a) technical (b) nontechnical |
a) 4 pw b) 6 pw |
Rationale for selection |
Spontaneous speech in particular is scarcely available for Norwegian, yet it is necessary
for developing robust speech recognition. |
Present usage |
Development of automatic speech recognition systems |
Similar resources or cooperations |
NST database (University of Bergen), EUROM.0 (Norwegian University of Science and
Technology), EUROM.1 (Norwegian University of Science and Technology) |
Data or tool |
data |
10. NST database enhancements
Type |
Speech and text corpus |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Consortium of Norsk Språkråd, IBM, Norwegian University of Science and Technology,
University of Bergen, University of Oslo |
Anticipated access policy |
See NST database |
Anticipated reuse policy |
See NST database |
Anticipated location |
Norwegian University of Science and Technology |
Effort needed (a) technical (b) nontechnical |
a) 3 pw b) 5 pw |
Rationale for selection |
The enhancements and corrections made to the NST database save work for
potential users of the NST database |
Present usage |
Development of speech recognition systems |
Similar resources or cooperations |
NST database (University of Bergen) |
Data or tool |
data |
11. The Place Name Archive Hordaland (Stadnamn i Hordaland)
Type |
Place name database with home names from the county of Hordaland, spoken and written. |
Size |
Ca. 250,000 names |
Languages |
Norwegian |
Rightholders |
Department of Linguistic, Literary and Aesthetic studies, University of Bergen |
Anticipated access policy |
Public and free for both research and public purposes |
Anticipated reuse policy |
Public and free |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Comprehensive database with local place names in Hordaland. |
Present usage |
n/a |
Similar resources or cooperations |
Place name archives in Oslo, Trøndelag and Sogn og Fjordane county (University of
Oslo, Norwegian University of Science and Technology, Sogn og Fjordane Fylkesarkiv). |
Data or tool |
data |
12. Typology of Norwegian Tonal Accents/Norwegian Tonal Accent Database (Norsk
tonelagstypologi)
Type |
Speech database with recordings of scripted frame utterances containing target words
that represent realizations of the Norwegian tonal accents in different dialects,
systematized for accent, structure of stressed syllable, syllable count, structure of
phonological word, position in phrase, etc. Originates in a project funded by the
Norwegian Research Council 2000-2002: Norsk tonelagstypologi. 20,595 individual sound
files from 116 recordings, each of one speaker. Each recording consists of the same set
of 69 short utterances, read twice. Most of the recordings are split into separate files
and organized in a FileMaker database. |
Size |
6.7 GB, 20,595 sound files |
Languages |
30 different Norwegian dialects |
Rightholders |
Gjert Kristoffersen, University of Bergen |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Free for research purposes |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
8 pw. If the IMDI metadata standard and the ELAN annotation tool are chosen, the
original 'unsplit' recordings should be stored as single ELAN files. Annotations will
have to be added to these files utterance by utterance. All recordings contain the same
utterances (although not necessarily in the same order), which may make this task
easier. An IMDI template based on one of the informants already exists. Most of the
metadata are already defined in the FileMaker database, which should speed up the
remaining work. |
Rationale for selection |
Of interest for linguists working with linguistic tone and intonation. There is a
small international community interested in Scandinavian tonal accents. Use of the
resource will probably presuppose a certain knowledge of a Scandinavian language, but
this will to a certain extent also depend on the level of detail of the linguistic
metadata provided. These are not sufficient today in this respect. |
Present usage |
Has mostly been used by the participants of the original project, but other
colleagues have occasionally used parts of the data |
Similar resources or cooperations |
None, but the databases built by the Swedish project Swedia contain data that are to
a certain extent comparable to our data. |
Data or tool |
data |
13. Light stressed syllables (Jamvektsbasen)
Type |
Speech database with recordings of scripted frame utterances containing target words
that represent realizations of stressed syllables in different Norwegian and Swedish
dialects, most of them dialects where light stressed syllables have been preserved from
Old Norse, systematized for tonal accent, structure of stressed syllable, syllable count,
structure of phonological word, etc. Most of the recordings are split into separate
files and organized in a FileMaker database. |
Size |
Ca. 2 GB |
Languages |
Norwegian dialects from North Gudbrandsdal, Tinn, Oppdal. Dalarna Swedish: Älvdalen,
Våmhus, Vinäs, Skattungbyn, Östre Mora, Sollerön |
Rightholders |
Gjert Kristoffersen, University of Bergen |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Free for research purposes |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
8 pw. The three most recent recordings of three Swedish dialects are stored as
'unsplit', annotated ELAN files. All other files are yet to be annotated. |
Rationale for selection |
In addition to researchers working on the realization of stress in Germanic, the
database is also of interest to researchers working on Germanic tonal accents, not
least from a historical-comparative perspective. |
Present usage |
So far the material has been used only by its owner |
Similar resources or cooperations |
None. Concerning collaborations, see pt. 10 |
Data or tool |
data |
14. The Dialect Collection at the University of Bergen
Type |
Audio recordings/samples of dialects (mainly filed in analogue media). Digitising in
progress. Descriptions of dialects according to standard questionnaires. |
Size |
1600 sound recordings/samples on tape. Hours: unknown. Number of other documents:
unknown |
Languages |
Norwegian |
Rightholders |
The dialect collection at the Department of Linguistic, Literary and Aesthetic
Studies (LLE) at the University of Bergen. |
Anticipated access policy |
Restricted, but free for research purposes. Most recordings are protected by law.
The material is only to be used by researchers. |
Anticipated reuse policy |
Restricted. Material protected by law only to be used for scientific purposes. |
Anticipated location |
The dialect collections at the Department of Linguistic, Literary and Aesthetic
Studies (LLE), at the University of Bergen. |
Effort needed (a) technical (b) nontechnical |
The resource is currently being developed, and the project is fully financed.
Digitising in progress, the deadline of which is unknown. |
Rationale for selection |
n/a |
Present usage |
The material is used in connection with a research project. |
Similar resources or cooperations |
Dialect archives at other institutions of higher learning in Norway: the
University of Oslo, the Norwegian University of Science and Technology, and the
University of Tromsø |
Data or tool |
data |
15. The Industrial Area Project (Industristadprosjektet)
Type |
Speech corpus |
Size |
213 hours of sound recordings, 2100 pages of text. More material will be collected
during the project. |
Languages |
Norwegian |
Rightholders |
Department of Linguistic, Literary and Aesthetic Studies (LLE) at the University of
Bergen. |
Anticipated access policy |
Restricted, but free for research purposes. Most recordings are protected by law.
The material is only to be used by researchers. |
Anticipated reuse policy |
Restricted. Material protected by law only to be used for scientific purposes. |
Anticipated location |
The dialect collections at the Department of Linguistic, Literary and Aesthetic
Studies (LLE), at the University of Bergen. |
Effort needed (a) technical (b) nontechnical |
The resource is currently being developed, and the project is fully financed.
Digitising in progress; deadline: n/a. |
Rationale for selection |
n/a |
Present usage |
The material is used in connection with a research project. |
Similar resources or cooperations |
Dialect archives at other institutions of higher learning in Norway: the
University of Oslo, the Norwegian University of Science and Technology, and the
University of Tromsø |
Data or tool |
data |
16. Processes of dialect change (Dialektendringsprosessar)
Type |
Corpus of transcribed interviews |
Size |
491,621 words (September 1, 2010). Expected size next year: 1.5 M words |
Languages |
Norwegian |
Rightholders |
Dialektendringsprosessar (Helge Sandøy, LLE, University of Bergen). |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Restricted, but free for researchers |
Anticipated location |
University of Bergen and Uni Digital |
Effort needed (a) technical (b) nontechnical |
a) 1 person-year, b) 3 person-years. The corpus will hopefully be extended with data
from new projects |
Rationale for selection |
An efficient and important tool in research on changes in the Norwegian language |
Present usage |
10 researchers, currently |
Similar resources or cooperations |
The Text Laboratory, University of Oslo. |
Data or tool |
data |
17. Modern import words in the Nordic languages (Moderne importord i språka i Norden,
MIN)
Type |
Corpus of newspapers from certain days in 1975 and 2000 in all Nordic countries.
Import words are annotated for etymological source, style, topic, etc. |
Size |
1.9 M words. |
Languages |
Icelandic, Faroese, Norwegian, Danish, Sweden-Swedish, Finland-Swedish, Finnish |
Rightholders |
Moderne importord i språka i Norden v/ Helge Sandøy |
Anticipated access policy |
Free for public |
Anticipated reuse policy |
Public and free |
Anticipated location |
University of Bergen and Uni Digital |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
The data were collected in order to compare the rate and typology of the usage of
import words in the Nordic languages. The corpus can be reused e.g. in order to
illustrate language usage in related languages. |
Present usage |
This was a subproject of MIN, and the corpus is used in reports presented in two
volumes of the series Moderne importord i språka i Norden (Novus forlag, Oslo). |
Similar resources or cooperations |
none |
Data or tool |
data |
18. KIAP Corpus (Cultural Identity in Academic Prose)
Type |
Corpus of published research articles in economics, linguistics, and medicine |
Size |
3,150,000 words |
Languages |
English, French, Norwegian |
Rightholders |
Kjersti Fløttum and Uni Digital |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Research purposes |
Anticipated location |
University of Bergen/Uni Digital |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
Relevant for the study of differences in academic discourse |
Present usage |
Researchers and PhDs (5-10). During main project period: 20-30 |
Similar resources or cooperations |
University of Grenoble |
Data or tool |
data |
19. The Norwegian Spanish Parallel Corpus, NSPC
Type |
Corpus (parallel translation corpus, unidirectional) |
Size |
1.5 M words in each language |
Languages |
Norwegian - Spanish |
Rightholders |
Lidun Hareide, University of Bergen |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Research purposes |
Anticipated location |
Uni Digital |
Effort needed (a) technical (b) nontechnical |
a) 1 pm b) 1 pm |
Rationale for selection |
All texts in the Norbok database published in Norwegian and translated into Spanish
between 2000 and 2008. |
Present usage |
In process of completion |
Similar resources or cooperations |
The NSPC is built to be roughly comparable to the P-ACTRES English - Spanish
Parallel corpus built at the University of León, Spain |
Data or tool |
data |
20. Medieval Nordic Text Archive (Menota)
Type |
Corpus of Medieval Nordic texts, several of which are linguistically annotated |
Size |
Presently 17 texts comprising 923,000 words, encoded according to a high philological
standard. The archive is expected to grow considerably in the coming years, in part due
to the Menotec project (2010-2012) |
Languages |
Medieval Nordic (1100-1500), i.e. Old Icelandic, Old Norwegian, Old Swedish and Old
Danish, and also Latin texts of Nordic provenance from the same period |
Rightholders |
The editor(s) of each text, as specified in the header of each XML file,
bibliographical usage discussed here: http://www.menota.org/help/bibliographics.page |
Anticipated access policy |
Free |
Anticipated reuse policy |
As specified in § 3 of the access agreement (deposit license),
http://www.menota.org/avtaler/depo1-2.html |
Anticipated location |
University of Oslo, Unit for Digital Documentation |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Menota is presently the major digital archive of freely available Medieval Nordic
texts encoded according to a high academic standard |
Present usage |
Based on feedback, users are typically academics (at all levels), exact data for
usage not available |
Similar resources or cooperations |
Handrit.is is not a text archive but a catalogue that could be linked to
Menota, and vice versa; however, it covers only Old Icelandic and Old Norwegian |
Data or tool |
data |
21. Infrastructure for the Exploration of Syntax and Semantics (INESS)
Type |
E-infrastructure for syntactically (and semantically) annotated corpora, including
the first extensive treebank for Norwegian. |
Size |
The treebank for Norwegian will be built in the period until October 2015. The
projected size of the gold standard treebank is 500,000 words. The projected size of the
automatically annotated treebank is 500 M words. Other languages will also be added. |
Languages |
Norwegian, Sami, German, English and other languages. |
Rightholders |
University of Bergen and Uni Digital. |
Anticipated access policy |
Mostly public and free, but depending on conditions for source texts. |
Anticipated reuse policy |
Mostly public and free, but depending on conditions for source texts. |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
The resource is currently being developed, and the project is fully financed for
180 pm. |
Rationale for selection |
Treebanks can be used for developing analyzers for various applications. The
infrastructure will provide treebanking support for others. |
Present usage |
Not yet in use. |
Similar resources or cooperations |
No similar resources for Norwegian. |
Data or tool |
infrastructure |
22. Corpus of Norwegian as a second language, ASK (Norsk andrespråkskorpus)
Type |
Text corpus of Norwegian as a second language, searchable by linguistic annotations
and informant attributes. The data is collected from Norsk språktest's archives of
examination results from foreigners learning Norwegian as a second language. |
Size |
2000 texts, ca. 600,000 words in total. A control corpus with 200 texts written by
people with Norwegian as their mother tongue. |
Languages |
Norwegian |
Rightholders |
University of Bergen |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Restricted |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Useful for L2 studies. |
Present usage |
Intensively used by about 20 researchers at master's and PhD level. |
Similar resources or cooperations |
First corpus in Norway providing this type of language data. |
Data or tool |
data |
23. COLA: Corpus oral de lenguaje adolescente
Type |
Speech corpus with linked transcriptions. Teenage talk from Spanish speaking cities. |
Size |
0.8 M words, 50 hours of audio files |
Languages |
Spanish from Madrid, Buenos Aires and Santiago de Chile |
Rightholders |
University of Bergen, Annette Myre Jørgensen |
Anticipated access policy |
Non-commercial research (as agreed with informants) |
Anticipated reuse policy |
Non-commercial research |
Anticipated location |
Uni Digital |
Effort needed (a) technical (b) nontechnical |
Conversion of texts to Corpuscle format: 1 month. Work on user interface: 0.5 month |
Rationale for selection |
Important source for studying Spanish youth language. |
Present usage |
250 web users in 25 countries worldwide. Important source for studying Spanish youth
language. Popular among researchers of oral language, teachers of Spanish, and students
of Spanish. Has been used as a basis for 15 MA and 10 PhD theses. |
Similar resources or cooperations |
Can be compared with COLT and UNO. |
Data or tool |
data |
24. COLT: Corpus of London Teenage Language
Type |
Corpus |
Size |
0.5 M words, 50 hours of audio files |
Languages |
English |
Rightholders |
University of Bergen. |
Anticipated access policy |
Non-commercial research |
Anticipated reuse policy |
Non-commercial research |
Anticipated location |
Uni Digital or University of Bergen. COLT is presently distributed on the ICAME CD and
as a set of 3 CDs with audio. Some of the texts are available as a part of BNC, but they
have been further processed in Bergen. The audio files are only available from Bergen. |
Effort needed (a) technical (b) nontechnical |
Conversion of texts to Corpuscle format: 1 month. Work on user interface: 0.5 month |
Rationale for selection |
Important source for studying English youth language. |
Present usage |
We have distributed 25 sets of CDs with transcripts/audio. A wider user group accesses
the corpus through the web interface. |
Similar resources or cooperations |
Can be compared with COLA and UNO. |
Data or tool |
data |
25. UNO
Type |
Corpus |
Size |
0.2 M words, 30 hours of audio files |
Languages |
Norwegian |
Rightholders |
Probably Kristine Hasund, HIA |
Anticipated access policy |
Non-commercial research |
Anticipated reuse policy |
Non-commercial research |
Anticipated location |
Uni Digital |
Effort needed (a) technical (b) nontechnical |
Conversion of texts to Corpuscle format: 1 month. Work on user interface: 0.5 month |
Rationale for selection |
The first source of this kind for Norwegian (spontaneous dialogue) |
Present usage |
n/a |
Similar resources or cooperations |
Big Brother corpus, Oslo |
Data or tool |
data |
26. ICAME
Type |
Collection of corpora, written, spoken, historical |
Size |
18 corpora, 14 M words |
Languages |
English |
Rightholders |
The collectors of the various corpora |
Anticipated access policy |
Non-commercial research. Today the material can be distributed on a CD if the user
signs the conditions on the order form. We have to renegotiate the policy; the outcome
may differ between the corpora. |
Anticipated reuse policy |
Possibly the same as the access policy |
Anticipated location |
Uni Digital |
Effort needed (a) technical (b) nontechnical |
Conversion of texts to XML: 9 months. Work on user interface: 1 month |
Rationale for selection |
Standard resource for research on English. Many of the corpora are only available
for use with legacy concordance programs like WordSmith. The corpora should be made
searchable via a web interface in Corpuscle. The historical ones are particularly
valuable since few texts exist as compared to modern texts. |
Present usage |
We have distributed more than 1000 CDs with these corpora. The corpora are very
popular among scholars of English. |
Similar resources or cooperations |
Some of these corpora are available to registered users at the University of
Lancaster (hence duplication should be avoided through cooperation). |
Data or tool |
data |
27. Newspaper corpus
Type |
Corpus |
Size |
800 M running words |
Languages |
Norwegian |
Rightholders |
The newspaper publishers. We are allowed to let users search the corpus and show the
hits with limited context. |
Anticipated access policy |
Free but conditions to be re-negotiated |
Anticipated reuse policy |
Free but conditions to be re-negotiated |
Anticipated location |
Uni Digital |
Effort needed (a) technical (b) nontechnical |
Conversion and re-tagging of 23 newspapers: 11.5 months. Work on user interface: 1
month |
Rationale for selection |
The largest collection of Norwegian texts available for language studies. Dynamic
corpus, extraction of new word forms (unregistered earlier). Distribution of hits by
newspaper and year. |
Present usage |
290 registered users. |
Similar resources or cooperations |
NoWac, Text Laboratory Oslo |
Data or tool |
data |
28. Wittgenstein Archives Bergen 5000 pages (WAB 5000)
Type |
corpus in XML (TEI-P5) and HTML output (different versions) formats, XSLT
stylesheets, Web interface |
Size |
More than 2 M words |
Languages |
German and English |
Rightholders |
The Master and Fellows of Trinity College, Cambridge, Bertrand Russell Archives,
Ontario (Ts-201a1, Ts-201a2), Oxford University Press, Oxford, University of Bergen,
Bergen, Uni Research, Bergen |
Anticipated access policy |
Creative Commons General Public License Attribution, Non-Commercial, Share-Alike
version 3 (CCPL BY-NC-SA) |
Anticipated reuse policy |
Creative Commons General Public License Attribution, Non-Commercial, Share-Alike
version 3 (CCPL BY-NC-SA) |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
a) XSLT programming and Web services: 4 PM, b) guidance of programming,
administration, dissemination and communication, incl. Web sites: 4 PM |
Rationale for selection |
One of the most important resources for 20th century philosophy and thought, a
study case and test-bed for philology, literary studies and XML/TEI research and
applications, multilingual |
Present usage |
n/a |
Similar resources or cooperations |
http://wittgensteinsource.org/ , http://wab.aksis.uib.no/wab_hw.page/ |
Data or tool |
data |
29. Corpuscle (Korpuskel)
Type |
Corpus tool |
Size |
n/a |
Languages |
unknown |
Rightholders |
Uni Digital |
Anticipated access policy |
LLGPL (Lisp Lesser General Public License, i.e. the LGPL with a preamble clarifying how it applies to Lisp) |
Anticipated reuse policy |
LLGPL |
Anticipated location |
Uni Digital, downloadable |
Effort needed (a) technical (b) nontechnical |
2 person months |
Rationale for selection |
Usability: Handles any corpus that is annotated on a word and/or structural level.
Unicode support. Suitable for large corpora (order of magnitude 1 billion tokens and
more). Powerful search engine (functionality of Corpus Workbench's query language plus
support for multi-valued and set-valued attributes, hierarchical structures with
arbitrary nesting, and more), fast query processing using newly developed algorithms
based on suffix arrays and finite state automata. Support for user annotations and
editing using integrated relational database. Customizable Web interface. |
Present usage |
At present the system is used by several projects/corpora at Uni Digital and
University of Bergen: ASK, Dialektendringsprosessar, Norsk Aviskorpus |
Similar resources or cooperations |
Similar tools: Corpus Workbench/CQPWeb. Potential collaborations: The Text
Laboratory at the University of Oslo, Språkbanken (Sweden), many potential users
internationally |
Data or tool |
tool |
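The rationale for entry 29 mentions query processing based on suffix arrays. The following is only a minimal sketch of that general technique at token level, not the tool's actual implementation, which this inventory does not describe. The function names `build_suffix_array` and `find_phrase` are hypothetical, and the naive sort used here would not scale to the corpus sizes named above.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return the start positions of all token suffixes, sorted lexicographically.

    Naive construction for illustration only: sorting by full suffix slices is
    far too slow for large corpora.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    """Return all corpus positions where `phrase` occurs, via two binary searches."""
    k = len(phrase)

    # Lower bound: first suffix whose first k tokens are >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + k] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first k tokens are > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + k] <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the phrase.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat slept".split()
    index = build_suffix_array(corpus)
    print(find_phrase(corpus, index, ["the", "cat"]))
```

Because all suffixes sharing a prefix sit next to each other in the sorted array, a phrase lookup costs two binary searches regardless of how often the phrase occurs.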
30. TCA2 (Translation Corpus Aligner 2)
Type |
Software to prepare texts for parallel corpora |
Size |
n/a |
Languages |
n/a |
Rightholders |
Uni Digital |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
So far others have been allowed to modify the code for their own purposes. |
Anticipated location |
Uni Digital, downloadable |
Effort needed (a) technical (b) nontechnical |
Depends on which enhancements are desired. Text editing: 0.5 months? Word
alignment/term extraction module: 3 months? A web version: 4 months? All the work is
technical work. |
Rationale for selection |
Handles pairs of texts that are translations of each other, where sentences have
been XML tagged. Alignment is done partly automatically, partly by manual intervention.
Automatic alignment assumes sentences are related 1-1, 1-2, 1-0, 2-1, or 0-1. The
process is helped by "anchor files", which contain pairs of words/phrases that are more
or less translations of each other. The program is based on an earlier program by Knut
Hofland and a lot of (his) experience with sentence alignment. On the whole users are
very satisfied. |
Present usage |
Used by 10 users in 8 projects for 6 language pairs. |
Similar resources or cooperations |
Several command driven aligners exist, but not so many with a GUI. |
Data or tool |
tool |
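The rationale for entry 30 says that automatic alignment relates sentences 1-1, 1-2, 1-0, 2-1 or 0-1, helped by anchor word pairs. The sketch below illustrates that idea with a generic dynamic-programming aligner; it is an assumption-laden toy, not TCA2's own algorithm. The names `BEADS`, `bead_cost` and `align`, the length-based cost, and the weights 0.5 and 0.3 are all hypothetical choices made for the example.

```python
from typing import List, Optional, Set, Tuple

# Allowed alignment types: (source sentences consumed, target sentences consumed).
BEADS = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]


def bead_cost(src: List[str], tgt: List[str], anchors: Set[Tuple[str, str]]) -> float:
    """Hypothetical cost: relative length mismatch, a penalty for non-1-1 beads,
    and a bonus for every anchor pair found on both sides."""
    src_len = sum(len(s) for s in src)
    tgt_len = sum(len(t) for t in tgt)
    cost = abs(src_len - tgt_len) / max(src_len, tgt_len, 1)
    if (len(src), len(tgt)) != (1, 1):
        cost += 0.5  # arbitrary penalty favouring 1-1 alignments
    src_words = {w.lower() for s in src for w in s.split()}
    tgt_words = {w.lower() for t in tgt for w in t.split()}
    hits = sum(1 for a, b in anchors if a in src_words and b in tgt_words)
    return cost - 0.3 * hits  # arbitrary bonus per matched anchor pair


def align(source: List[str], target: List[str],
          anchors: Set[Tuple[str, str]]) -> List[Tuple[List[int], List[int]]]:
    """Return the cheapest sequence of beads as (source indices, target indices)."""
    n, m = len(source), len(target)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + bead_cost(source[i:ni], target[j:nj], anchors)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (di, dj)
    # Walk the back-pointers from the end to recover the chosen beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


if __name__ == "__main__":
    no = ["Prisen stiger.", "Markedet er lite og etterspørselen faller."]
    en = ["The price rises.", "The market is small.", "Demand falls."]
    anchor_pairs = {("prisen", "price"), ("markedet", "market"),
                    ("etterspørselen", "demand")}
    print(align(no, en, anchor_pairs))
```

In this toy example the anchor pairs pull the second Norwegian sentence towards both remaining English sentences, giving a 1-2 bead. A real aligner would also need the manual correction step that the entry describes.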
31. IDP (Interactive Dynamic Presentation)
Type |
XSLT-based software to filter and present XML-TEI-encoded texts in a web page, in
user-defined ways |
Size |
n/a |
Languages |
n/a |
Rightholders |
Uni Digital and University of Bergen |
Anticipated access policy |
Online use is open. |
Anticipated reuse policy |
So far others have been allowed to modify the code for their own purposes. |
Anticipated location |
Uni Digital and University of Bergen |
Effort needed (a) technical (b) nontechnical |
4 person months programming |
Rationale for selection |
not relevant |
Present usage |
n/a |
Similar resources or cooperations |
http://wittgensteinsource.org/ , http://wab.aksis.uib.no/wab_hw.page/ |
Data or tool |
tool |
32. PROIEL corpus
Type |
corpus of classical Bible translations |
Size |
518,000 words |
Languages |
Ancient Greek, Latin, Classical Armenian, Old Church Slavic, Gothic |
Rightholders |
PROIEL project, IFIKK, Oslo |
Anticipated access policy |
Creative Commons Attribution-Noncommercial-Share Alike 3.0 |
Anticipated reuse policy |
Public and free (noncommercial) |
Anticipated location |
University of Oslo |
Effort needed (a) technical (b) nontechnical |
Sources available, much work required to create a proper query interface |
Rationale for selection |
Covers the New Testament and translations, as well as a number of original texts in
Latin. The only large corpus covering the classical ancient languages; also of interest to
Bible scholars |
Present usage |
294 registered users |
Similar resources or cooperations |
(Small) Latin treebanks at the Perseus project (Tufts, USA) and in Milano,
otherwise there are very few resources for these languages |
Data or tool |
data |
33. English Resource Grammar (ERG)
Type |
computational grammar |
Size |
250,000 lines of code |
Languages |
English |
Rightholders |
DELPH-IN |
Anticipated access policy |
open source (MIT) |
Anticipated reuse policy |
Public and free |
Anticipated location |
CLARINO LAP (on-line use) and repository (for download) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
The ERG is the largest freely available precision grammar for English, already a
point of reference for many |
Present usage |
about one dozen current users world-wide |
Similar resources or cooperations |
ParGram English grammar |
Data or tool |
tool |
34. PET
Type |
parser |
Size |
60,000 lines of code |
Languages |
language-independent |
Rightholders |
DELPH-IN |
Anticipated access policy |
open source (LGPL) |
Anticipated reuse policy |
Public and free |
Anticipated location |
CLARINO LAP (on-line use) and repository (for download) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
PET provides the run-time environment for the ERG (and other DELPH-IN grammars) |
Present usage |
many dozens of current users world-wide |
Similar resources or cooperations |
Xerox Linguistic Environment (XLE) |
Data or tool |
tool |
35. Linguistic Knowledge Builder (LKB)
Type |
grammar engineering toolkit |
Size |
160,000 lines of code |
Languages |
language-independent |
Rightholders |
DELPH-IN |
Anticipated access policy |
open source (MIT) |
Anticipated reuse policy |
Public and free |
Anticipated location |
CLARINO repository (for download) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
the LKB is a very popular tool for unification-based grammar engineering |
Present usage |
several hundred current users world-wide, both for teaching and R&D usage |
Similar resources or cooperations |
Xerox Linguistic Environment (XLE) |
Data or tool |
tool |
36. Redwoods
Type |
manually annotated HPSG treebank |
Size |
250,000 words |
Languages |
English |
Rightholders |
DELPH-IN project |
Anticipated access policy |
open source (MIT) |
Anticipated reuse policy |
Public and free |
Anticipated location |
CLARINO LAP (on-line use) and repository (for download) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
largest available HPSG treebank |
Present usage |
about one dozen current users world-wide |
Similar resources or cooperations |
n/a |
Data or tool |
data |
37. WikiWoods
Type |
automatically annotated HPSG treebank based on Wikipedia |
Size |
900 M words |
Languages |
English |
Rightholders |
DELPH-IN project |
Anticipated access policy |
open source (MIT) |
Anticipated reuse policy |
Public and free |
Anticipated location |
CLARINO LAP (on-line use) and repository (for download) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
rich syntacto-semantic annotations for the complete English Wikipedia |
Present usage |
a handful of (power) users world-wide |
Similar resources or cooperations |
n/a |
Data or tool |
data |
38. MaltXLE
Type |
Architecture for 'stacked' dependency parsing with LFG features |
Size |
n/a |
Languages |
English and German |
Rightholders |
Lilja Øvrelid |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
CLARINO LAP (on-line use) |
Effort needed (a) technical (b) nontechnical |
a) 2 pw b) 2 pw |
Rationale for selection |
stacked parsing increases domain robustness, deeper linguistic features useful for
applications |
Present usage |
Internal (in-house) use only |
Similar resources or cooperations |
Presupposes MaltParser and XLE |
Data or tool |
tool |
39. Leksikografisk bokmålskorpus (LBK)
Type |
text corpus |
Size |
40 mill words |
Languages |
Norwegian |
Rightholders |
Dept of Linguistic and Nordic studies at University of Oslo |
Anticipated access policy |
Licensed |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
Coverage of text types is based on SSB statistics on time spent reading |
Present usage |
n/a |
Similar resources or cooperations |
None in Norway or for the Norwegian language |
Data or tool |
data |
40. The French Newspaper Corpus
Type |
Text corpus. Part-of-speech tagged newspaper texts in French. Available through web
interface. Search by variables such as part of speech, suffix etc. |
Size |
115 M words |
Languages |
French |
Rightholders |
Developer: The Text Laboratory. Texts: LCD and ACL. |
Anticipated access policy |
Accessible for users from University of Oslo |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
University of Grenoble |
Data or tool |
data |
41. The KAL Corpus
Type |
Text corpus. 3300 texts written by pupils. Marks and other background data are
available. Search by a range of variables. Annotation:
http://omilia.uio.no/kal/filer/tekn_info.html |
Size |
3300 texts |
Languages |
Norwegian |
Rightholders |
Annotated corpus: The Text Laboratory. Pupil texts: The project "Kvalitetssikring av
læringsutbyttet i norsk skriftlig": http://prosjekt.hihm.no/r97-kal/ |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
42. LOGON Tourist Corpus
Type |
Text corpus. Parallel aligned tourist information texts in Norwegian and English. |
Size |
ca 175,000 words |
Languages |
Norwegian and English |
Rightholders |
Developer: The Text Laboratory in cooperation with the LOGON-project |
Anticipated access policy |
Access only for research and development purposes. Association with Språkbanken
needs to be examined/determined. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
43. NoWaC - Norwegian Web as Corpus
Type |
Text corpus. Web based corpus for Norwegian bokmål, 700 M words. Constructed through
automatic retrieval of documents from the .no domain. The documents are downloaded from
the Internet and then processed. POS-tagged. |
Size |
700 M words |
Languages |
Norwegian bokmål |
Rightholders |
The Text Laboratory/PhD student Emiliano Guevara. |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Free for research purposes |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
44. NP-annotated Norwegian corpus
Type |
Text corpus. Norwegian texts in which all NPs are annotated with information about
their form, meaning and discourse relations. Available through web interface. Search by
all information marked on the NPs. |
Size |
The resource is under construction and is fully financed. |
Languages |
Norwegian |
Rightholders |
Developer: The Text Laboratory. Text and annotation: Norwegian University of Science
and Technology |
Anticipated access policy |
Restricted access, regulated by the Norwegian University of Science and Technology |
Anticipated reuse policy |
n/a |
Anticipated location |
Norwegian University of Science and Technology |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
45. The OPUS Corpus
Type |
Text corpus. Written language from 60 languages. OPUS is a growing collection of
translated texts from the web. The OPUS project converts and aligns free online data,
adds linguistic annotation, and provides the community with a publicly available
parallel corpus. OPUS is based on open source products and the corpus is also delivered
as an open content package. Several tools are used to compile the current collection.
All pre-processing is done automatically. No manual corrections have been carried out. |
Size |
30 M words |
Languages |
60 languages |
Rightholders |
Developer: The Text Laboratory in cooperation with Jörg Tiedemann, University of
Groningen |
Anticipated access policy |
Public and free. |
Anticipated reuse policy |
Restricted |
Anticipated location |
University of Groningen |
Effort needed (a) technical (b) nontechnical |
n/a. Currently available |
Rationale for selection |
The main motivation for compiling OPUS is to provide an open source parallel corpus
that uses standard encoding formats including linguistic annotation. A public collection
of parallel corpora that can freely be used and distributed makes it possible for
everyone to run experiments on bitexts and their results can easily be compared. |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
46. The Oslo Corpus of tagged Norwegian texts, Bokmål (Oslokorpuset av taggede, norske
tekster, bokmål)
Type |
Text corpus. Texts from fiction, newspapers/magazines and factual prose. Available
through web interface. Search by variables such as genre, part of speech, suffix etc.
Tagged with the Oslo-Bergen-tagger. |
Size |
18.5 M words |
Languages |
Norwegian bokmål |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Restricted: access only for research and development purposes. Association with
Språkbanken needs to be examined/determined. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
47. The Oslo Corpus of tagged Norwegian texts, Nynorsk (Oslokorpuset av taggede, norske
tekster, Nynorsk)
Type |
Text corpus. Texts from fiction, newspapers/magazines and factual prose. Available
through web interface. Search by variables such as genre, part of speech, suffix etc.
Tagged with the Oslo-Bergen-tagger. |
Size |
3.8 M words |
Languages |
Norwegian nynorsk |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Restricted: access only for research and development purposes. Association with
Språkbanken needs to be examined/determined. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
48. The Oslo Corpus of Bosnian Texts
Type |
Text corpus. Bosnian texts from various genres. Available through web interface.
Search by variables such as genre, part of speech, suffix etc. |
Size |
1.5 M words |
Languages |
Bosnian |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Restricted: access only for research and development purposes. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
49. Oslo Multilingual Corpus
Type |
Text corpus. Multilingual parallel text corpora (subcorpora) with original texts and
translations. Available through web interface. Search by variables such as genre, part
of speech, suffix etc. SGML-tagged, tagged grammatically with several different taggers. |
Size |
15.5 mill words |
Languages |
Principally Norwegian, English, French and German, but also smaller corpora with
Dutch and Portuguese texts. |
Rightholders |
Developer: The Text Laboratory in cooperation with the SPRIK project, ILOS,
University of Oslo |
Anticipated access policy |
Restricted: access only for research and development purposes at University of Oslo
and University of Bergen. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
50. Sami-Norwegian Corpus
Type |
Text corpus. Sami-Norwegian parallel aligned texts. Available through web interface.
Search by variables such as genre, part of speech, suffix etc. Tagged grammatically with
the Sami CG-tagger developed at the Center for Sami Language Technology (Senter for
samisk språkteknologi), University of Tromsø. |
Size |
Unknown |
Languages |
Sami and Norwegian |
Rightholders |
Developer: The Text Laboratory and Center for Sami Language Technology, University
of Tromsø. Text and annotation: Center for Sami Language Technology, University of
Tromsø |
Anticipated access policy |
Access regulated by the Center for Sami Language Technology, University of Tromsø. |
Anticipated reuse policy |
n/a |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
51. The Sidaama Corpus
Type |
Text corpus. Sidaama texts from the New Testament. Translated by Kjell Magne Yri. |
Size |
150,000 words |
Languages |
Sidaama |
Rightholders |
Developer: The Text Laboratory. Texts: Kjell Magne Yri |
Anticipated access policy |
At Kjell Magne Yri's (ILN) disposal |
Anticipated reuse policy |
n/a |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
52. The Usenet Corpus
Type |
Text corpus. Norwegian texts from the no*-hierarchy Usenet (newslist web domain) from
1998 to 2002. |
Size |
140 M words |
Languages |
Norwegian |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Public and free. Probably to be made accessible through Språkbanken. |
Anticipated reuse policy |
Public and free |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a. Currently available |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
53. The Big Brother Corpus
Type |
Speech corpus. Norwegian speech corpus. Orthographically transcribed and linked to
video files. Almost all the television broadcasts from the first season of Big Brother
in 2001. Spontaneous speech, including laughter, crying, yelling, discussions etc.
XML-tagged and grammatically tagged by the NoTa-tagger. |
Size |
Ca. 550,000 words |
Languages |
Norwegian |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Restricted: access only for research and development purposes. Association with
Språkbanken needs to be examined/determined. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a. Currently available |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
54. The Ruija Corpus
Type |
Speech corpus. Speech corpus with spoken language from 'kvensk'-speaking areas
(1962-2009). Both 'kvensk' and Norwegian speech. The corpus is modelled on
NoTa-Oslo, among others. |
Size |
Ca. 70 interviews |
Languages |
Kven and Norwegian |
Rightholders |
Developer: The Text Laboratory. Material from two projects with project manager Pia
Lane, ILN |
Anticipated access policy |
At the LICHEN project's (by researcher Pia Lane, ILN) disposal. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
55. Nordic Dialect Corpus (Nordisk dialektkorpus)
Type |
Speech corpus. Nordic Dialect Corpus is a corpus of Norwegian, Swedish, Danish,
Faroese and Övdalian (and soon Icelandic and Finland Swedish) spoken language. It
consists of spontaneous speech data from dialects of the North Germanic languages across
all of the Nordic countries. The linguistic data in the corpus come from a variety of
sources, both old and new. It is transcribed and linked to audio and video, has a map
function, and can be searched in a large variety of ways. For Norwegian: 100 points of
reference (målepunkt) in Norway. 400 informants, each doing a 10 minute interview,
participating in a 30 minute conversation. Transcription resembling spoken language, and
with Norwegian translation. Phonetic transcription translated by the ScanDiaSyn
transliterator, and grammatically tagged by the NoTa-tagger. |
Size |
Approx 2 M words (September 2010). More data will be added |
Languages |
Norwegian dialects, Swedish dialects, Danish dialects, Faroese and Övdalian dialects
(and soon Icelandic and Finland Swedish) |
Rightholders |
Developer: The Text Laboratory. The Norwegian material is collected in collaboration
with Norwegian University of Science and Technology and University of Tromsø. The
material from the other Nordic countries is supplied by the respective countries. |
Anticipated access policy |
Restricted: access only for research and development purposes. Association with
Språkbanken needs to be examined/determined. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
56. Norwegian Corpus of Spoken Language (Norsk talespråkkorpus Oslo)
Type |
Speech corpus. 166 informants, representative with respect to variables such as
gender, age, education and residence. Interviews and conversations. Orthographically
transcribed speech linked to audio and video files. Web interface, searchable by text
and variables. XML-tagged and grammatically tagged by the NoTa-tagger. |
Size |
Ca. 900,000 words |
Languages |
Norwegian |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Restricted: access only for research and development purposes. Association with
Språkbanken needs to be examined/determined. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
57. The TAUS Corpus
Type |
Speech corpus. Original audio files and transcripts from the TAUS project. 59
informants are orthographically re-transcribed, and this transcription linked to the
audio files. Web interface searchable by text and variables. XML-tagged and
grammatically tagged by the NoTa-tagger. |
Size |
Ca. 244,000 words |
Languages |
Norwegian |
Rightholders |
Developer: The Text Laboratory. Material from the TAUS-project. |
Anticipated access policy |
Restricted: access only for research and development purposes. Association with
Språkbanken needs to be examined/determined. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a. Currently available |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
58. The UPUS Corpus
Type |
Speech corpus. Conversations and interviews with young people from multi-ethnic
groups in Oslo. Orthographically transcribed and linked to audio and video files. Web
interface, searchable by text and variables. XML-tagged and grammatically tagged by the
NoTa-tagger. |
Size |
Unknown. Currently interviews and conversations with 55 adolescents |
Languages |
Norwegian |
Rightholders |
Developer: The Text Laboratory. Material from the UPUS-project. |
Anticipated access policy |
At the UPUS project's disposal. Currently only accessible for research related to
the project. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a. Currently available |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
59. GREI
Type |
Grammar game/treebank. Morphological and syntactic analysis of sentences in Norwegian
bokmål and nynorsk, used for grammar games and syntactic tree construction. Encoded
according to the VISL project standard. |
Size |
750 sentences in Norwegian Bokmål and 750 sentences in Norwegian Nynorsk |
Languages |
Norwegian (Bokmål and Nynorsk) |
Rightholders |
Developer: the VISL-project (at the University of Southern Denmark, Odense). The
Text Laboratory is responsible for creating and analysing the Norwegian sentences. |
Anticipated access policy |
The Norwegian sentence analyses are freely accessible. The VISL-project at the
University of Southern Denmark is the rightholder of games and other tools. |
Anticipated reuse policy |
The Norwegian sentence analyses are freely accessible |
Anticipated location |
University of Southern Denmark, Odense: beta.visl.sdu.dk |
Effort needed (a) technical (b) nontechnical |
n/a. Currently available |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
60. The Sofie Treebank
Type |
Treebank. Aligned sentences in nine languages from two chapters of Jostein Gaarder's
Sophie's World. |
Size |
Sentences from two chapters in Sophie's World |
Languages |
Danish, Dutch, English, Estonian, Finnish, German, Icelandic, Norwegian and Swedish. |
Rightholders |
Developer: The Text Laboratory in cooperation with the Nordic Treebank Network
participants. |
Anticipated access policy |
Restricted: access only for research and development purposes. |
Anticipated reuse policy |
Unresolved. Association with Språkbanken needs to be examined/determined. |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
61. The Norwegian Wordbank (Norsk ordbank)
Type |
Lexical database. Electronic database with lexical base units. Each unit is linked
to all its inflectional forms. Base forms from Bokmålsordboka, Nynorskordboka, the IBM
glossary and more. Currently updated according to recent changes in orthography/spelling
regulations. Words are listed with all inflectional forms. |
Size |
Ca. 150,000 entries in Norwegian Bokmål and 124,000 in Norwegian Nynorsk. |
Languages |
Norwegian (bokmål and nynorsk) |
Rightholders |
Developer: EDD. A board with members from bokmålsleksikografi (Lexicography for
Norwegian bokmål), Språkrådet, The Text Laboratory and EDD is responsible for
operations/running and sale/marketing. |
Anticipated access policy |
Accessible through GPL-licence, otherwise for sale. |
Anticipated reuse policy |
Unresolved. Probably to be made accessible through Språkbanken. |
Anticipated location |
EDD, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a. Currently available |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
62. Nordic Syntactic Judgment Database
Type |
Database. Electronic database with survey data from Nordic dialects. For Norwegian:
100 points of reference (målepunkt), 400 informants. Data from corresponding
surveys/investigations in other Nordic countries provided through ScanDiaSyn. |
Size |
Unknown. Growing |
Languages |
Nordic dialects |
Rightholders |
Developer: The Text Laboratory. The Norwegian material is collected in collaboration
with Norwegian University of Science and Technology and University of Tromsø. The
material from the other Nordic countries is supplied by the respective countries. |
Anticipated access policy |
Accessible for research purposes. |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
63. ScanLex-leksikon
Type |
Lexical database. Aligned word lists from English and six Nordic languages/language
variants. Automatically generated from the parallel aligned texts in the OPUS corpus. |
Size |
Ca. 76,000 pairs of words |
Languages |
Danish, Icelandic, Norwegian bokmål, Norwegian nynorsk, Swedish and English |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Public and free. |
Anticipated reuse policy |
Unresolved. Probably to be made accessible through Språkbanken. |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
64. List of animate nouns
Type |
Word list/lexicon/glossary. List of Norwegian animate nouns, extracted from Norwegian
web sites using automated Google searches. |
Size |
1018 nouns |
Languages |
Norwegian |
Rightholders |
Developer: Anders Nøklestad, The Text Laboratory. |
Anticipated access policy |
Public and free. |
Anticipated reuse policy |
Unresolved. Probably to be made accessible through Språkbanken. |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
65. Anaphora resolution system
Type |
Language processing tool. Tool identifying the antecedents of pronominal anaphors in
Norwegian texts. The Oslo-Bergen tagger is used to pre-process the input text. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Developer: Anders Nøklestad, The Text Laboratory. |
Anticipated access policy |
Public and free. |
Anticipated reuse policy |
n/a |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
66. PP Scope Disambiguator
Type |
Language processing tool. Tool for disambiguating PP scope. Determines whether PPs
which are syntactically ambiguous with respect to scope modify a preceding noun or the
main verb in the sentence. The Oslo-Bergen tagger is used to pre-process the input text. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Developer: Anders Nøklestad, The Text Laboratory. |
Anticipated access policy |
Public and free. |
Anticipated reuse policy |
n/a |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
67. Named Entity Recognizer for Norwegian
Type |
Language processing tool. Part of the Oslo-Bergen-tagger. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Developers: The Text Laboratory and Aksis, University of Bergen (now Uni Digital) |
Anticipated access policy |
May be downloaded for non-commercial use according to GPL conditions. |
Anticipated reuse policy |
Special terms for use of the LISP rule interpreter. |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
68. Named Entity Recognizer for Norwegian 2
Type |
Language processing tool. Named entity recognition tool (NE-recognizer). The NER
classifies names using statistical methods (memory-based learning or maximum entropy
modeling). The Oslo-Bergen tagger is used to pre-process input text. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Developer: Anders Nøklestad, The Text Laboratory. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
n/a |
Anticipated location |
The Text Laboratory, ILN, University of Oslo / Uni Digital |
Effort needed (a) technical (b) nontechnical |
n/a. Currently available |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
69. The NoTa-tagger
Type |
Language processing tool. Statistical speech tagger (tree tagger) trained on material
from NoTa-Oslo. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
n/a |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
70. The Oslo-Bergen-tagger
Type |
Language processing tool. Morphological and syntactic CG1-tagger for Norwegian bokmål
and nynorsk. Norsk ordbank is used for multi-tagging and pre-processing. The rule
interpreter is implemented in Allegro Common Lisp. A later version, CG3, uses a rule
interpreter from University of Southern Denmark, Odense |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Developers: The tagger project (The Text Laboratory and EDD) and Aksis, University
of Bergen (now Uni Digital) |
Anticipated access policy |
May be downloaded for non-commercial use according to GPL conditions. |
Anticipated reuse policy |
Unresolved. Linguistic rules can probably be made accessible through Språkbanken.
Special terms for use of the LISP rule interpreter. CG3 rule interpreter available under
GPL conditions from SDU |
Anticipated location |
The Text Laboratory, ILN, University of Oslo / Uni Digital |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
71. ScanDiaSyn Dialect Transliterator
Type |
Language processing tool. Semi-automatic dialect transliterator that converts
Norwegian dialect text into Norwegian bokmål. |
Size |
n/a |
Languages |
From Norwegian dialects to bokmål |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Public and free. |
Anticipated reuse policy |
n/a |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
72. Glossa
Type |
Corpus search and results management system. Web-based tool that facilitates writing
complex search expressions, exploring result sets, creating statistics based on the
result sets, and editing and storing the result sets. Supports several types of corpora:
monolingual corpora, parallel corpora and speech corpora (sound and audio) |
Size |
n/a |
Languages |
n/a |
Rightholders |
Developer: The Text Laboratory. |
Anticipated access policy |
Public and free under the GPL license |
Anticipated reuse policy |
GPL license |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
The system is operative, but needs extensions in the context of an infrastructure
centre |
Rationale for selection |
n/a |
Present usage |
Used by several research groups in Norway and abroad |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
73. SIMPLE-editing (SIMPLE-redigering)
Type |
Lexicon editing system. Makes it possible to edit the complex SIMPLE dictionary for
Norwegian. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Developers: The Text Laboratory and Lexicography for Norwegian bokmål
(bokmålsleksikografi), ILN |
Anticipated access policy |
Public and free. |
Anticipated reuse policy |
n/a |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
74. North Sámi analyser (Nordsamisk analyser)
Type |
tagger/analyser |
Size |
115,779 entries |
Languages |
sme |
Rightholders |
University of Tromsø |
Anticipated access policy |
GPL |
Anticipated reuse policy |
GPL |
Anticipated location |
University of Tromsø |
Effort needed (a) technical (b) nontechnical |
2 man-years |
Rationale for selection |
Working analysers |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
75. Lule Sámi analyser (Lulesamisk analyser)
Type |
tagger/analyser |
Size |
37,450 entries |
Languages |
smj |
Rightholders |
University of Tromsø |
Anticipated access policy |
GPL |
Anticipated reuse policy |
GPL |
Anticipated location |
University of Tromsø |
Effort needed (a) technical (b) nontechnical |
2 man-years |
Rationale for selection |
Working analysers |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
76. South Sámi analyser (Sørsamisk analyser)
Type |
tagger/analyser |
Size |
62,386 entries |
Languages |
sma |
Rightholders |
University of Tromsø |
Anticipated access policy |
GPL |
Anticipated reuse policy |
GPL |
Anticipated location |
University of Tromsø |
Effort needed (a) technical (b) nontechnical |
2 man-years |
Rationale for selection |
Working analysers |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
77. Faroese analyser (Færøysk analyser)
Type |
tagger/analyser |
Size |
87,528 entries |
Languages |
fao |
Rightholders |
University of Tromsø |
Anticipated access policy |
GPL |
Anticipated reuse policy |
GPL |
Anticipated location |
University of Tromsø |
Effort needed (a) technical (b) nontechnical |
2 man-years |
Rationale for selection |
Working analysers |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
78. Greenlandic analyser (Grønlandsk analyser)
Type |
tagger/analyser |
Size |
159,392 entries |
Languages |
kal |
Rightholders |
University of Tromsø, Oqaasileriffik |
Anticipated access policy |
GPL |
Anticipated reuse policy |
GPL |
Anticipated location |
University of Tromsø |
Effort needed (a) technical (b) nontechnical |
2 man-years |
Rationale for selection |
Working analysers |
Present usage |
400 entries/day |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
79. North Sámi corpus (Nordsamisk korpus)
Type |
corpus |
Size |
485,509 words |
Languages |
sme |
Rightholders |
University of Tromsø |
Anticipated access policy |
GPL |
Anticipated reuse policy |
GPL |
Anticipated location |
University of Tromsø |
Effort needed (a) technical (b) nontechnical |
1 man-year |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
80. Lule Sámi corpus (Lulesamisk korpus)
Type |
corpus |
Size |
25,832 words |
Languages |
smj |
Rightholders |
University of Tromsø |
Anticipated access policy |
GPL |
Anticipated reuse policy |
GPL |
Anticipated location |
University of Tromsø |
Effort needed (a) technical (b) nontechnical |
1 man-year |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
81. South Sámi corpus (Sørsamisk korpus)
Type |
corpus |
Size |
15,211 words |
Languages |
sma |
Rightholders |
University of Tromsø |
Anticipated access policy |
GPL |
Anticipated reuse policy |
GPL |
Anticipated location |
University of Tromsø |
Effort needed (a) technical (b) nontechnical |
1 man-year |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
82. Norwegian University of Science and Technology database of spoken language
Type |
Speech database, partly annotated |
Size |
8000 sound files, 13.1 GB |
Languages |
(Mainly) Norwegian, English and Czech: Norwegian spoken by natives and non-natives
(Chinese, English, German, French, Russian, Persian); English spoken by natives and
non-natives; Czech spoken by natives (Norwegian and Czech) |
Rightholders |
ISK, Norwegian University of Science and Technology |
Anticipated access policy |
Free for research purposes |
Anticipated reuse policy |
Free for research purposes |
Anticipated location |
ISK, Norwegian University of Science and Technology |
Effort needed (a) technical (b) nontechnical |
The effort to make this material available in a systematic way is hard to estimate
but will be large |
Rationale for selection |
Useful for research in phonetics and speech technology |
Present usage |
Internal use until now |
Similar resources or cooperations |
n/a |
Data or tool |
data |
83. LEXIN
Type |
Web-based dictionaries made especially for immigrants in Norway. In addition to
information about parts of speech, inflection and pronunciation, the dictionary includes
simple explanations and examples of everyday usage, concrete as well as metaphorical.
The Norwegian LEXIN project is based on a Swedish dictionary series of the same name. In
Sweden, the LEXIN dictionaries have been translated into more than 20 languages and are
published both electronically and in printed versions. Since 1996 Uni Digital has
worked on developing corresponding dictionaries for Norwegian, commissioned by the
Ministry of Education, Research and Church Affairs, the Norwegian Board of Education
(from 1999), and the Norwegian Directorate for Education and Training (since 2004). The
current inventory consists of a Bokmål dictionary, a Nynorsk dictionary, a
Bokmål-Nynorsk dictionary, and 25 dictionaries from Bokmål or Nynorsk to 13 languages.
Three dictionaries are currently under development. |
Size |
28 dictionaries |
Languages |
Norwegian, Arabic, Kurdish (Kurmanji), Kurdish (Sorani), Persian, Polish, Russian,
Somali, Tamil, Thai, Tigrinya, Turkish, Urdu, English. |
Rightholders |
Norwegian Directorate for Education and Training (Utdanningsdirektoratet) |
Anticipated access policy |
Restricted, unresolved |
Anticipated reuse policy |
Unresolved |
Anticipated location |
Uni Digital |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
The LEXIN dictionaries are developed for immigrants with little or no experience in
the use of dictionaries or other linguistic resources. They are clearly set out and
easy to use, among other things because all information about an entry word is found
with the word itself. |
Present usage |
Publicly available online. |
Similar resources or cooperations |
The Swedish and Danish LEXIN Dictionaries |
Data or tool |
data/tool |
84. Scarrie proofreading tool
Type |
Proofreading tool. The Norwegian part of SCARRIE aims at advanced spelling correction
in Bokmål. It uses word form dictionaries in combination with special mechanisms for
handling multi-word expressions and for recognizing newly seen compounds, proper names
and other words not present in the dictionaries. In cooperation with Norwegian
University of Science and Technology, a suitable Norwegian word form dictionary has been
built. Predictable misspellings are supplied with recommendations for corrections. New
compounds are detected by an analysis based on rules supplied by the University of Oslo.
Words that are outside the scope of the dictionary and are likely errors are processed
by the correction mechanisms, including sound-based similarity. In addition, a robust
grammar was developed for the detection and correction of certain classes of errors
that cannot be handled at word level, e.g. agreement errors. Finally, corrections are
made to fit the written norm in which the document is written (on a range from
conservative to radical Bokmål). |
Size |
Not relevant |
Languages |
Norwegian |
Rightholders |
The partners of the SCARRIE-project: WordFinder Software AB (Växjö, Sweden),
Universitetet i Bergen, Institutionen för lingvistik at Uppsala Universitet, Center for
Sprogteknologi (København) and Svenska Dagbladet (Stockholm). |
Anticipated access policy |
Free for research purposes. The material is restricted by property rights and
cannot be used for commercial purposes without agreement. |
Anticipated reuse policy |
Restricted |
Anticipated location |
University of Bergen and Uni Digital |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
The main result consists of an implemented and tested prototype with enhanced
capabilities for advanced error correction. It was tested on a limited test set and the
results were favourable in comparison to state of the art products. |
Present usage |
n/a |
Similar resources or cooperations |
The Swedish and Danish Scarrie tools |
Data or tool |
tool |
85. Scarrie lexicon
Type |
Norwegian word form dictionary. The word forms in this list are tagged with
information about lemma (basic form), standard, style or written norm, morphosyntactic
characteristics and possibly replacement. The lexical information for Norwegian has been
coded in several word lists. The main lexicon comprises open class words for Bokmål:
adjectives, adverbs, nouns and main verbs. This dictionary contains 360,933 wordform
entries organised in 72,626 lemmas (corresponding to citation forms). This means that
for each citation form, about 5 inflected word forms are stored on average. Additional
separate word lists have been made for closed class (grammatical) words, affixes,
abbreviations and words occurring only in multi-word expressions. |
Size |
360,933 wordform entries, 72,626 lemmas |
Languages |
Norwegian |
Rightholders |
The partners of the SCARRIE-project: WordFinder Software AB (Växjö, Sweden),
Universitetet i Bergen, Institutionen för lingvistik at Uppsala Universitet, Center for
Sprogteknologi (København) and Svenska Dagbladet (Stockholm). |
Anticipated access policy |
Free for research purposes. The material is restricted by property rights and
cannot be used for commercial purposes without agreement. |
Anticipated reuse policy |
Restricted |
Anticipated location |
University of Bergen and Uni Digital |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
The Norwegian lexicons for SCARRIE have been provided with new information not
available before, specifically verb subcategorization and lexical variants (style
replacements). The dictionaries with inflected forms contain extensive information on
replacements under given styles. |
Present usage |
n/a |
Similar resources or cooperations |
The Swedish and Danish Scarrie lexicons |
Data or tool |
data |
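The average quoted in the Type field of entry 85 can be checked from the two figures in its Size field. The following is a minimal sketch in Python; the variable names are illustrative only and are not part of the SCARRIE resources.

```python
# Figures copied from the Size field of the Scarrie lexicon entry.
wordform_entries = 360_933
lemmas = 72_626

# Average number of stored word forms per citation form (lemma).
forms_per_lemma = wordform_entries / lemmas

print(f"{forms_per_lemma:.2f} word forms per lemma")  # prints "4.97 word forms per lemma"
```

The result, just under 5, agrees with the rounded figure given in the entry.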
86. The Norwegian Treebank Pilot Project (TREPIL)
Type |
Development of a suitable methodology and sophisticated tools for the semiautomatic
construction of a treebank. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
University of Bergen and Uni Digital |
Anticipated access policy |
n/a |
Anticipated reuse policy |
n/a |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
87. BREDT
Type |
Development of statistical methods based on existing theories and resources to
automatically detect referential chains in texts. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Unknown (possibly University of Bergen) |
Anticipated access policy |
Demonstrator downloadable from the project web page. |
Anticipated reuse policy |
Demonstrator downloadable from the project web page. |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
88. The Text Corpus for Norwegian Nynorsk (Det nynorske tekstkorpuset)
Type |
Corpus |
Size |
More than 70 M words |
Languages |
Norwegian Nynorsk |
Rightholders |
Norsk Ordbok 2014, ILN, Universitetet i Oslo |
Anticipated access policy |
Public and free: available on the Internet in both tagged and untagged versions |
Anticipated reuse policy |
The textual property rights are regulated by NO 2014 |
Anticipated location |
Department of Linguistics and Scandinavian Studies (University of Oslo) |
Effort needed (a) technical (b) nontechnical |
1 man-year for extending the property rights of the existing material. Addition of new
material will continue throughout the NO 2014 project period, and is fully financed
by the project. |
Rationale for selection |
The corpus contains texts in Norwegian Nynorsk from the 1860s up to 2010. There are
no restrictions on use. The corpus material is highly important in the development
of standards/norms for the Norwegian language, is reusable and consists of both factual
prose and fiction. |
Present usage |
Used in the editing of Norsk Ordbok, as well as by public users online (user
statistics not available). |
Similar resources or cooperations |
None |
Data or tool |
data |
89. The Norsk Ordbok Card File Archive (Norsk Ordboks setelarkiv)
Type |
Database of electronic word cards |
Size |
3.2 M digitized word cards |
Languages |
Norwegian Nynorsk |
Rightholders |
Norsk Ordbok 2014, ILN, Universitetet i Oslo |
Anticipated access policy |
Public and free: available on the Internet |
Anticipated reuse policy |
No restrictions on use |
Anticipated location |
Department of Linguistics and Scandinavian Studies (University of Oslo) |
Effort needed (a) technical (b) nontechnical |
More effort could be put into online accessibility; the workload is estimated at
ca. 1 man-year. All technical resources will need regular maintenance/updates, estimated
at 1-2 man-years every 10 years. The 2014 card application was recently developed by the
Department for digital documentation (Eining for digital dokumentasjon), University of
Oslo |
Rationale for selection |
The archive is collocated with similar/corresponding collections in Norsk Ordboks
metaordbok. |
Present usage |
Used in the editing of Norsk Ordbok, as well as by public users online (user
statistics not available). |
Similar resources or cooperations |
Unique database, no similar existing resources for Norwegian |
Data or tool |
data |
90. The Norwegian Atlas of Dialects (Norsk Dialektatlas, kartsamlinga)
Type |
Collection of digital dialectal maps |
Size |
596 maps |
Languages |
Norwegian Nynorsk |
Rightholders |
Norsk Ordbok 2014, ILN, Universitetet i Oslo |
Anticipated access policy |
Public and free: available on the Internet |
Anticipated reuse policy |
No restrictions on use |
Anticipated location |
Department of Linguistics and Scandinavian Studies (University of Oslo) |
Effort needed (a) technical (b) nontechnical |
The search system is currently being developed. User systems for digital maps are
still under development internationally. The collection of digital maps will
be integrated into the existing collection of resources to give a more complete overview
of Norwegian dialects. At least 2 man-years are estimated to link language data and
maps. |
Rationale for selection |
Accounts for spoken language from all parts of Norway. Contains dialect isoglosses
and word geography. |
Present usage |
Used in the editing of Norsk Ordbok, as well as by public users online (user
statistics not available). |
Similar resources or cooperations |
Unique database, no similar existing resources for Norwegian |
Data or tool |
data |
91. The Dialect Synopsis (Målføresynopsisen)
Type |
Database of scanned protocol pages and a rudimentary search interface |
Size |
43 protocols (more than 10,000 handwritten protocol pages) |
Languages |
Norwegian Nynorsk |
Rightholders |
Norsk Ordbok 2014, ILN, Universitetet i Oslo |
Anticipated access policy |
Public and free: available on the Internet |
Anticipated reuse policy |
No restrictions on use |
Anticipated location |
Department of Linguistics and Scandinavian Studies (University of Oslo) |
Effort needed (a) technical (b) nontechnical |
The search system is currently being developed, and data is being added to the
digital Atlas. Estimated effort to make the search system sufficiently accurate is
approximately 1 man-year. |
Rationale for selection |
Accounts for spoken language from all parts of Norway, mapping Norwegian dialect
phonology, morphology, and, in part, syntax. Point of reference for future spoken
language research, both for Norwegian and other Scandinavian languages. |
Present usage |
Used in the editing of Norsk Ordbok, as well as by public users online (user
statistics not available). |
Similar resources or cooperations |
Unique database, no similar existing resources for Norwegian |
Data or tool |
data |
92. The Dictionary Hotel (Ordbokshotellet)
Type |
Database of digitized versions of published collections of words including metadata
from all over Norway. |
Size |
Ca. 30 digitized collections (1 September 2010). Another 30 collections are ready
to be incorporated in the database. In total there are approximately 500 such collections
of words that should be integrated in this database. |
Languages |
Norwegian Nynorsk |
Rightholders |
Norsk Ordbok 2014, ILN, Universitetet i Oslo |
Anticipated access policy |
Public and free: available on the Internet |
Anticipated reuse policy |
No restrictions on use |
Anticipated location |
Department of Linguistics and Scandinavian Studies (University of Oslo) |
Effort needed (a) technical (b) nontechnical |
Approximately 0.5 man-year for each collection of words (500 collections in total) |
Rationale for selection |
Accounts for spoken language from all parts of Norway. Contributes to the overall
knowledge of Norwegian dialects/spoken language. |
Present usage |
Used in the editing of Norsk Ordbok, as well as by public users online (user
statistics not available). |
Similar resources or cooperations |
Unique database, no similar existing resources for Norwegian |
Data or tool |
data |
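The per-collection estimate in entry 92 implies a large overall effort. The following minimal Python sketch works out the totals. It assumes, purely for illustration, that the roughly 30 collections already digitized need no further work; the inventory does not state this, and the variable names are hypothetical.

```python
# Figures taken from the Size and Effort fields of the Dictionary Hotel entry.
man_years_per_collection = 0.5
total_collections = 500
already_digitized = 30  # assumption: these need no further effort

# Effort if every collection is counted.
total_effort = man_years_per_collection * total_collections

# Effort for the collections not yet in the database (illustrative assumption).
remaining_effort = man_years_per_collection * (total_collections - already_digitized)

print(total_effort, remaining_effort)  # prints "250.0 235.0"
```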
93. The Norsk Ordbok Meta Dictionary (Norsk Ordboks metaordbok)
Type |
Electronic index of Norsk Ordbok 2014's digital resources. In this index, the
material is organized and made accessible by normalized entries. |
Size |
Ca. 600,000 entries |
Languages |
Norwegian Nynorsk |
Rightholders |
Norsk Ordbok 2014, ILN, Universitetet i Oslo |
Anticipated access policy |
Public and free: available on the Internet |
Anticipated reuse policy |
No restrictions on use |
Anticipated location |
Department of Linguistics and Scandinavian Studies (University of Oslo) |
Effort needed (a) technical (b) nontechnical |
As of October 2010, more than 10 man-years have been put into normalizing the Meta
Dictionary according to the 1938 standard. This effort requires regular and continuous
maintenance of standardizing procedures. Due to the constant addition of new material,
resources are allocated to this work throughout the entire project period. The cost of
integrating the Meta Dictionary with the Norwegian Wordbank has not yet been estimated,
since this is not a part of the NO 2014 project plan and responsibilities. |
Rationale for selection |
This index accounts for Norwegian Nynorsk according to the spelling standard of 1938,
and corresponding indexes should be established for each of the official spelling
standards in order to account for the entire history of Norwegian standards. As a fully
developed resource the Meta Dictionary could be integrated with the Norwegian Wordbank,
increasing the current lemma inventory from 100,000 entries to approximately 600,000. |
Present usage |
Used in the editing of Norsk Ordbok, as well as by public users online (user
statistics not available). |
Similar resources or cooperations |
Unique database, no similar existing resources for Norwegian |
Data or tool |
data |
94. The Nynorsk version of the Norwegian Word Bank (Norsk Ordbank, nynorskversjonen)
Type |
Electronic database of lemmas, incl. register of all inflectional forms |
Size |
Ca. 100,000 entries |
Languages |
Norwegian Nynorsk |
Rightholders |
Department of Linguistics and Scandinavian Studies (University of Oslo)/The
Norwegian Language Council (Språkrådet) |
Anticipated access policy |
Available on the Internet, password-restricted access |
Anticipated reuse policy |
Restricted (this resource is of commercial interest for products requiring
Norwegian according to current spelling norms; this applies to both digitized and
paper products) |
Anticipated location |
Department of Linguistics and Scandinavian Studies (University of Oslo) |
Effort needed (a) technical (b) nontechnical |
Under continuous development. Analyser tools should be applied in order to
increase the number of lemmas. Upgrades estimated at approximately 0.5 man-year during
2010. |
Rationale for selection |
Accounts for Norwegian Nynorsk according to current spelling norms. Also provides
a historical overview of Norwegian spelling norms. |
Present usage |
Used in the development of standards/norms for the Norwegian language |
Similar resources or cooperations |
Unique database, no similar existing resources for Norwegian |
Data or tool |
data |
95. The Dictionary Home (Ordboksheimen)
Type |
Electronic database of older dialectal collections |
Size |
Ca. 20,000 words |
Languages |
Older Norwegian Nynorsk (not standardized) |
Rightholders |
Norsk Ordbok 2014, ILN, Universitetet i Oslo |
Anticipated access policy |
Public and free: available on the Internet |
Anticipated reuse policy |
No restrictions on use |
Anticipated location |
Department of Linguistics and Scandinavian Studies (University of Oslo) |
Effort needed (a) technical (b) nontechnical |
Currently being developed. Approximately 10 man-years of specialized work is
required to collect and digitize all existing material of older Norwegian dialect
text/samples. |
Rationale for selection |
Accounts for spoken language from all over Norway, contributes to the overall
knowledge of Norwegian dialects/spoken language and their history. |
Present usage |
Used in the editing of Norsk Ordbok, as well as by public users online (user
statistics not available). |
Similar resources or cooperations |
Unique database, no similar existing resources for Norwegian |
Data or tool |
data |
96. ADB_OD_Nor.NOR by NST
Type |
Acoustic database for speech recognition. Recorded for acoustic modelling for
PC/Multimedia speech recognition and dictation software. The recordings were made in
office environments and are based on phonetically balanced manuscripts derived from the
Norwegian corpus. The database consists of a training and a testing part. The training
part is used to train the acoustic model and the testing part is used to test it. One
sound file contains one manuscript line, most often a sentence, in some cases a phrase
or a single word. The recording script for the training data contains a dictation part
and an ASR-part. The dictation part is aimed at general dictation and contains regular
sentences extracted from the corpus. The first 222 units (sentences) are aimed at
dictation. The last 90 units are aimed at ASR and consist of person names, place names,
single words, acronyms and other types of data specifically needed for training a speech
recognizer. The recording script for the test database is similarly divided into a
dictation and an ASR-part. |
Size |
312 training recordings, 987 test recordings |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Free for research and development purposes |
Anticipated reuse policy |
Restricted (free for research and development purposes) |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
97. ADB_D_IBM-N by NST
Type |
Acoustic database for speech recognition. Recorded for acoustic modelling for
dictation software (desktop). The recordings were made with the IBM-software ObjectRexx
in the start-up phase of the cooperation between NST and IBM as a part of the training
of NST-employees. The database consists of three parts recorded for the purposes of
testing, training and modelling. One sound file contains one manuscript line (e.g.,
sentence, phrase, single word, series of digits and numbers and series of letters). This
database is not validated, therefore documentation is limited. |
Size |
576 lines, 33,360 recordings |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Free for research and development purposes |
Anticipated reuse policy |
Restricted (free for research and development purposes) |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
98. ADB_T_Nor.NOR by NST
Type |
Acoustic database for speech recognition. Contains telephone recordings over landline
and mobile phones. These data are aimed at speech recognition over the telephone. The
material is not divided into testing and training data. NST followed the general SpeechDat
II-procedures for the recordings. The recordings were made partly with LandH-software
and partly with UMS Diginform. The recordings contain 17 utterances of semi-spontaneous
speech in the form of answers to questions and 40 utterances of read sentences. This
database is only partly validated. |
Size |
3108 landline recordings and 1596 mobile phone recordings (validated). |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Free for research and development purposes |
Anticipated reuse policy |
Restricted (free for research and development purposes) |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
99. Database with recorded hesitation sounds by NST
Type |
Acoustic database for speech recognition. Collected for the creation of acoustic
models of hesitation sounds, i.e., non-verbal sounds produced between words when a
speaker hesitates. This material is used for general dictation systems. |
Size |
50 sentences, 300 recordings |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Free for research and development purposes |
Anticipated reuse policy |
Restricted (free for research and development purposes) |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
100. Speech Synthesis for Norwegian by NST/IBM
Type |
Acoustic database for speech synthesis. For the development of IBM's speech
synthesiser, professional voices were engaged for the recordings, i.e., one male voice
per language. The recordings were made with IBM equipment in a sound studio in Voss,
Norway. The use of proprietary recording software does not prevent future use of the
data, since the data are available in standard PCM format. The recording manuscripts
are based on the NST corpus. An optimal set of sentences was produced with IBM's
OptScript software. |
Size |
5363 recordings |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Free for research and development purposes |
Anticipated reuse policy |
Restricted (free for research and development purposes) |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
101. The Norwegian NST lexicon
Type |
Meta-lexicon compiled from several resources. The lexicon was augmented with data from
a Norwegian inflector program. The inflector's input consists of 50,000 base forms.
These are identical to those in NorKompLeks – a purchased resource based on Bokmålsordboka.
The base forms are converted to SAMPA, manually checked and, if necessary, changed
to NST's transcription conventions. The transcriptions of approx. 254,000 entries are
manually checked, while the 499,000 entries generated from the inflector are only
partially checked. All entries, except garbage terms, are annotated with information
in all obligatory fields. The vocabulary is general and no special domains are
represented. The lexicon consists of the 100k-list. All terms in the NST-recording
manuscripts are transcribed in the lexicon. Further, the lexicon contains all entries in
Bokmålsordboka (via NorKompLeks/inflector), including inflected forms, and all terms
in the SpeechDat-material. More person names, place names, company names, etc. (from
e.g., Onomastica) have been added to the lexicon in later projects. |
Size |
Total number of entries: 784,240, total number of transcriptions: 1,006,562 |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Free for research and development purposes |
Anticipated reuse policy |
Free for research and development purposes |
Anticipated location |
Language Technology Resource Collection for Norwegian – Språkbanken |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
102. Transcription conventions for Norwegian by NST
Type |
Guidelines for the transcription of the NST Norwegian lexicon and the phoneme
inventory used in the Norwegian lexicon. |
Size |
n/a |
Languages |
n/a |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Free for research and development purposes |
Anticipated reuse policy |
Free for research and development purposes |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data/tool |
103. INSO (bokmål-utpakket, nynorsk-utpakket, NST)
Type |
Bought lexical resource, annotated with inflection, POS, morphology, compounding. |
Size |
71,006 base forms, 595,619 inflected forms |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Currently not accessible |
Anticipated reuse policy |
n/a |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
104. The Norwegian Computational Lexicon (Norkompleks, NST)
Type |
Bought lexical resource. The Norwegian Computational Lexicon (NorKompLeks) is the
result of a collaboration funded by NFR, Telenor and the Norwegian University of Science and
Technology. The outcome was a computational lexicon for both official written forms of
Norwegian (Bokmål and Nynorsk). The selection of words in the computational lexicon is
primarily from Bokmålsordboka and Nynorskordboka (both from the Lexicography division at
the Department of Scandinavian Studies and Comparative Literature located at the
University of Oslo). Annotated with POS, morphology, phonetic transcription. |
Size |
80,443 base forms, 460,777 inflected forms |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Currently not accessible |
Anticipated reuse policy |
n/a |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
105. Onomastica (NST)
Type |
Bought lexical resource. The Norwegian material of a multi-language pronunciation
lexicon of proper names. Annotated with POS, phonetic transcription, quality. |
Size |
556,499 names |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Currently not accessible |
Anticipated reuse policy |
n/a |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
106. Statistisk sentralbyrå (NST)
Type |
Bought lexical resource. Pronunciation database of proper names. Annotated with
frequency, POS. |
Size |
71,795 names |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Currently not accessible |
Anticipated reuse policy |
n/a |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
107. Bronnoy_navn
Type |
Bought lexical resource. Pronunciation database of proper names. Annotated with POS. |
Size |
1,019,643 names |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Currently not available |
Anticipated reuse policy |
n/a |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
108. Bokmål corpus by NST
Type |
Text corpus. The Bokmål Corpus was to some extent cleaned up before the development
of manuscript sentences and lexical data took place. The resulting corpus consists of
text files with approx. 735 M words, while the complete corpus consists of approx. 975 M
words. The clean-up consisted of conversions from proprietary formats to text files,
removal of duplicates, and removal of unusable files (e.g., TIFF files, QuarkXPress,
FrameMaker, etc.). The work was limited to conversions into text format. The texts
contain all text in the original documents. Correspondence was not anonymised,
which would be necessary for general distribution and usage of the material.
The material lacks tagging of linguistic information (POS, lemma, etc.). In some of the
material, structure is marked (paragraphs, headings, etc.). This was done by the
supplier and does not follow a standard defined for the corpus project (or any other
XML/SGML standard). NST started to code the corpus in XML. Some of the ITAvisen texts
are coded with structural information. 3.8 M words (0.4% of the complete material) were
coded in this way. Only texts from ITAvisen were coded and no corresponding Nynorsk
material is available. |
Size |
Ca. 975 M words |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Currently not available |
Anticipated reuse policy |
n/a |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
109. Nynorsk corpus by NST
Type |
Text corpus. Raw data from very few sources (mostly internet texts, very small files) |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Joint ownership between University of Oslo, University of Bergen, Norwegian
University of Science and Technology, The Norwegian Language Council (Språkrådet) and
IBM AS |
Anticipated access policy |
Currently not available |
Anticipated reuse policy |
n/a |
Anticipated location |
The Norwegian Language Bank (Norsk Språkbank) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Part of the estate of Nordisk Språkteknologi Holding AS |
Present usage |
Commercial use or research |
Similar resources or cooperations |
n/a |
Data or tool |
data |
110. Prehistoric Artifacts, Sites and Monuments in Western Norway
Type |
Archaeology: A comprehensive overview of historical sites, monuments and artifacts in
the 78 municipalities belonging to the region served by the Bergen Museum. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Bergen Museum, UiB |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
111. The Main Catalogue of the Artefact Collection (Oldsaksamlingen)
Type |
Archaeology: The main catalogue of acquisitions for Oldsaksamlingen in Oslo describes
all the artifacts that have arrived at the museum. All printed annual acquisition
records are now available. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Etnografisk museum, University of Oslo |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
112. The Archaeology Database
Type |
Archaeology: A prototype of the various museum databases that are being developed.
This database makes it possible to search in the acquisition catalogue of
Oldsaksamlingen (see above) and among artifacts and recorded sites and monuments found
in the Marum area in the Sandefjord municipality. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Bergen Museum, UiB |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
113. The Bergen Museum Main Inventory Catalogue
Type |
Archaeology, converted archive not yet available |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Bergen Museum, UiB |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
114. The Bergen Museum Topographical Archives
Type |
Archaeology, converted archive not yet available |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Bergen Museum, UiB |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
115. The Tromsø Museum Main Inventory Catalogue
Type |
Archaeology, converted archive not yet available |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
UiTø |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
116. Lists of Photographs, Tromsø Museum
Type |
Archaeology, converted archive not yet available |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
UiTø |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
117. The Main Inventory Catalogue of the Museum of Science and Natural History, Norwegian
University of Science and Technology
Type |
Archaeology, converted archive not yet available |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Museum of Science and Natural History, Norwegian University of Science and
Technology |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
118. The Bokmål Dictionary (Bokmålsordboka)
Type |
Lexicography, electronic dictionaries: The Section for Norwegian Lexicography at
the Department of Scandinavian Languages and Comparative Literature at the University
of Oslo has collaborated with the Norwegian Universities' Documentation Project to offer
a simplified version of the most recent edition of the Bokmål Dictionary on the
Internet. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
ILN/University of Oslo and Språkrådet |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Restricted |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
119. The Nynorsk Dictionary (Nynorskordboka)
Type |
Lexicography, electronic dictionaries: The Section for Norwegian Lexicography at the
Department of Scandinavian Languages and Comparative Literature at the University of
Oslo has collaborated with the Norwegian Universities' Documentation Project with the
aim of offering a simplified version of the most recent edition of the Nynorsk
Dictionary on the Internet. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
ILN/University of Oslo and Språkrådet |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Restricted |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
120. The Lexical Source Manuscript (Grunnmanuskriptet)
Type |
Lexicography, electronic dictionaries: Grunnmanuskriptet is an old dictionary
manuscript from the 1930s (approximately 13,500 typewritten A4 pages). The entries were
collected from the dictionaries by Aasen, Ross, Schjøtt, Vidsteen, Torp and others, but
the definitions for the entries are all given in Nynorsk. The manuscript was never
published as a dictionary, but has provided the basis for the development of the
Norwegian Dictionary (Norsk Ordbok). This manuscript has now been recorded as electronic
text and marked up in a way that enables the reader to find information such as dialects
with their location, etymology, quotations and sources. |
Size |
Ca. 13,500 typewritten A4 pages |
Languages |
Norwegian |
Rightholders |
Section for Norwegian Lexicography and Dialectology in the Department of
Scandinavian Languages and Comparative Literature, University of Oslo. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
121. Literary texts, Bokmål
Type |
Texts: Through the Documentation Project, a large part of the Bokmål text material
found in the Section for Norwegian Lexicography and Dialectology in the Department of
Scandinavian Languages and Comparative Literature at the University of Oslo was
digitized. The material in the archives consists of several card files with excerpts
from Norwegian literature. Instead of digitizing the material in the author archives,
the complete texts were scanned. The scanned material includes texts dating from
approximately 1550 to 1900, but most of it is from the 1800s. A total of 60,000 book
pages were scanned. |
Size |
60,000 pages |
Languages |
Norwegian |
Rightholders |
Section for Norwegian Lexicography and Dialectology in the Department of
Scandinavian Languages and Comparative Literature, University of Oslo. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
122. Older collections of words
Type |
Texts: The Nynorsk section of the project has scanned in a selection of 34 older
collections of words. Five of these have been given key words according to the 1938
standard: Norderhov (1698), Robyggjelaget (end of the 1600s), "Den Norske Dictionarium"
(printed in Copenhagen in 1646), Stavanger (1698), and Bø in Vesterålen (1698). The
remaining 29 will eventually be made accessible for free text searches. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Section for Norwegian Lexicography and Dialectology in the Department of
Scandinavian Languages and Comparative Literature, University of Oslo. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
123. Texts in Nynorsk
Type |
Texts: The scanned material from the Nynorsk project section is made up primarily of
texts of which only fragments are given in the card file. These texts will therefore
supplement the card-file database. The material includes a number of complete literary
texts by various authors writing in Nynorsk, the 1921 edition of the Bible in Nynorsk, a
selection of books from the Norwegian Folklore Association series (NFL), and some
complete annual volumes of "Syn og Segn", a Nynorsk literary journal. In addition, a
selection of older collections of words (see above) have been scanned in. The scanned
material will provide the basis for a larger body of texts which will be made available
for free text searches. These texts are not accessible at this stage. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Section for Norwegian Lexicography and Dialectology in the Department of
Scandinavian Languages and Comparative Literature, University of Oslo. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
124. The Norwegian Dictionary (Norsk Ordbok)
Type |
Word Archives: The card file consists of approximately three million entries with key
words organized alphabetically in accordance with the 1938 standard. Each entry contains
excerpts from Nynorsk literature, journals and newspapers. In addition, information
about dialects is provided by native speakers all around the country. A facsimile has
been made of each entry, allowing the electronic retrieval of its image. Basic
information related to each entry has also been registered, for example the key word
(the word or phrase defined or illustrated by the entry, standardized according to the
1938 standard), the grammar (which part of speech the key word corresponds to) and the
source (the source of the information on the entry). |
Size |
Ca. 3 M entries |
Languages |
Norwegian |
Rightholders |
Section for Norwegian Lexicography and Dialectology in the Department of
Scandinavian Languages and Comparative Literature, University of Oslo. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
125. The Norwegian Dictionary (Norsk Ordbok) - additions after 1990
Type |
Word Archives: Since the Documentation Project was launched, new entries have been
recorded in the card file. These new entries will eventually be incorporated in the
card-file database but, in the meantime, they are stored in a separate database for
new acquisitions. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Section for Norwegian Lexicography and Dialectology in the Department of
Scandinavian Languages and Comparative Literature, University of Oslo. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
126. The Dictionary of Dialect Words from the Trøndelag Region (Trønderordboka)
Type |
Word Archives: In 1981 a project was initiated at the Department of Scandinavian
Studies and Comparative Literature, Norwegian University of Science and Technology,
Trondheim, which was aimed at
compiling a dictionary of the Trøndelag dialects. A total of approximately 180,000
entries have been collected, giving examples of the variations of the Trøndelag dialect.
The examples have been collected from literature and from the spoken language. The
entries are of the same type as those contributing to the Norwegian Dictionary (Norsk
Ordbok). They consist of a key word, information about or examples of how the word is
used, and information about the source of the recorded information. |
Size |
Ca. 180,000 entries |
Languages |
Norwegian |
Rightholders |
Department of Scandinavian Studies and Comparative Literature, Norwegian University
of Science and Technology |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
127. The New Words Database
Type |
Word Archives: The New Words Archives at the Section for Norwegian Lexicography and
Dialectology in the Department of Scandinavian Languages and Comparative Literature at
the University of Oslo contain quotations from newspapers, journals and magazines.
There are approximately 300,000 quotes from 174 different sources. The compilation of
this database has been going on for several decades, so the term "new word" should be
interpreted in a historical perspective: the word was new, or had acquired a new
meaning, or had come to be used in a different way at the time of registration. This
edition of the New Words Database consists of 116,005 quotes, the majority of which are
from the years 1968 to 1972. However, the oldest entries are from 1920 while the most
recent ones are from 1994. The number of quotations will gradually increase as more of
the material is processed. In each citation one or more of the words are selected
(excerpted) and dealt with (this edition offers a total of 195,744 excerpts). The
excerpted words are transformed to their basic form and provided with a code for
grammatical function (part of speech) and other relevant codes. Simple words (including
derivatives and composites) are marked with a single code for grammatical function while
composite words are given codes for both word elements although it is the last element
that determines the grammatical function of the composite word. A total of 104
different auxiliary codes provide information about morphology, phraseology, imagery,
etc. |
Size |
116,005 quotes |
Languages |
Norwegian |
Rightholders |
Section for Norwegian Lexicography and Dialectology in the Department of
Scandinavian Languages and Comparative Literature, University of Oslo. |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
128. Allex - African Languages Lexical Project
Type |
Word Archives: Text Corpora, Sound Corpora, Parsers, Dictionaries for Zimbabwean
languages |
Size |
n/a |
Languages |
Shona, Ndebele, Nambya |
Rightholders |
Department of Linguistics and Scandinavian Studies (University of Oslo); African
Languages Research Institute, University of Zimbabwe, Zimbabwe; Unit for Digital
Documentation (University of Oslo) |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
129. Ballads
Type |
Folklore Studies: In its work with the ballad material, the Documentation Project has
aimed to make a scholarly edition of Norwegian ballads in electronic format which
faithfully reproduces the original. The original material, housed in several different
archives, includes original manuscripts of ballad texts, old notations of the tunes and
audio recordings of old and more recent renditions of the ballads. This collection
represents 240 types of Norwegian ballads. Approximately 3900 different varieties have
been digitized in this project. The material is extensive and varied. Some of the
digitized texts are accompanied by notations of the tunes and audio recordings. |
Size |
240 types of Norwegian ballads, approximately 3900 different varieties have been
digitized |
Languages |
Norwegian |
Rightholders |
Department of Cultural Studies, University of Oslo and Norwegian Ballad Archives |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
130. Court proceedings and protocols
Type |
Digitized records of 1600s and 1700s court proceedings at the lowest courts, where a
public registrar or a notary public presided as judge. The records include accounts of
disagreements over debts, conflicts between neighbors, broken marriage vows as well as
serious criminal acts. The Gothic handwriting is different from printed Gothic text as
well as from modern handwriting. The language used in the protocols is unfamiliar, being
influenced by formal, official styles of Danish and German. Furthermore, the texts include
a large number of peculiar symbols and abbreviations. The oldest preserved Norwegian
court protocols are from Rogaland in southwestern Norway (Jæren and Dalane, 1613, and
Ryfylke, 1616) and Finnmark (1620) in the far north. In 1633 a royal decree ordered the
recording of court protocols at the lower courts in Norway. Nonetheless, we have only a
few from the first half of the 1600s. However, after the introduction of absolute
monarchy in 1660, compliance with this decree seems to have been the rule though many of
the records from this period have been lost. After 1700 the court protocols were
systematically stored in well-organized volumes but, even so, there are occasional gaps. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Department of History, University of Oslo |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
131. Diplomatarium Norvegicum
Type |
Medieval Material: Diplomatarium Norvegicum is a series of text sources which give a
verbatim and linguistically faithful reading of documents older than 1570. It is now, in
1998, 150 years since the first volume was published; the first of a total of 22 volumes
which include approximately 19000 documents. As the foremost example of Norwegian source
editions, Diplomatarium Norvegicum is the principal source for anyone working with
medieval text material. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Old Norse Dictionary Unit (Gammalnorsk Ordboksverk), University of Oslo |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
132. Henrik Wergeland - collected works
Type |
Literature: The texts in this web edition are the same as those found in the 23-volume edition
edited by Herman Jæger, Didrik Arup Seip, Halvdan Koht and Einar Høigård, published by
Steenske Forlag, Kristiania/Oslo 1918-40. As part of the Bokmål project, the texts were
scanned and extensively coded in SGML. The web presentation (as HTML documents) is
generated automatically from this coding. This makes the graphical layout of some of the
pages look strange. We will continue to enhance the typographical quality. When printing
these texts, one should be aware that some printers divide HTML documents somewhat at
random. We have planned to make PDF versions of all the texts. These can be read with the
program Acrobat Reader (which is distributed along with web browsers). The PDF format is
better suited to preserving typography in print. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
n/a |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
133. Norwegian Farm Names
Type |
Place names: O. Rygh’s collection, "Norwegian Farm Names" (Norske Gaardsnavne)
consists of 18 volumes, one for each of the Norwegian counties. It contains information
about all Norwegian farms and some of their subunits, amounting to a total of 55000
entries. The names are organized according to districts and by consecutive, increasing
farm registration numbers. The electronic version of this collection ("Elektroniske
Norske Gaardsnavne") is being created with support from the Norwegian Research Council
and from the following counties: Østfold, Vestfold, Akershus, Rogaland, Hordaland, Møre
og Romsdal, Sogn og Fjordane, Sør-Trøndelag and Nord-Trøndelag. |
Size |
55,000 entries |
Languages |
Norwegian |
Rightholders |
Section for Place Name Studies, Department of Scandinavian Studies and Comparative
Literature, University of Oslo |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
134. The Land Register Draft of 1950 (Matrikkelutkastet frå 1950)
Type |
Place names: The Norwegian Ministry of Finance has compiled approximately 85,000 lists
of real estate in Norway, organized by consecutive, increasing farm numbers within each
municipality. The revision of the land register was never completed, and since Finnmark
county is not included in the lists, they are referred to as a draft. In addition to
farm names, the draft includes the names of private homes, vacation homes, lots, public
and private institutions, etc. The land register draft is an important tool for the
State Name-Consultancy Service (Statens navnekonsulenttjeneste) in its efforts to
standardize place names. Since it also includes the names of the owners of all the
listed properties, it can also be of use to researchers studying names of people. In
fact, it is often the only accessible comprehensive source of names for newer
properties. |
Size |
Approximately 85,000 lists of real estate |
Languages |
Norwegian |
Rightholders |
Section for Place Name Studies, Department of Scandinavian Studies and Comparative
Literature, University of Oslo |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
135. The Home Name Register
Type |
Place names: This register, found in the Section for the Study of Names at the
University of Oslo, includes the names of homes (farms, their subunits and summer
pastures) from ten of Norway’s counties. The names are organized by consecutive,
increasing farm numbers within each district. The register includes 109000 archive cards
which provide information on the spelling of the name, its pronunciation, correct
preposition and dative form of the name, older versions of the spelling, and variations
in spelling and pronunciation. They will often include comments on topography,
peculiarities of dialect, and the interpretation of the name. |
Size |
109,000 archive cards |
Languages |
Norwegian |
Rightholders |
Section for Place Name Studies, Department of Scandinavian Studies and Comparative
Literature, University of Oslo |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
136. The Place Name Archives at the University of Tromsø
Type |
Place names: The North Norway Archives of Dialects at the Department of Language and
Literature, University of Tromsø, includes records of Norwegian place names as well as
collections of Saami and Finnish place names. A significant part of the material was
collected locally in recent times, but it also includes copies of older collections from
other institutions. The size of the collection is estimated at one million names. In the
process of selecting material for digital conversion, certain guidelines have been
developed for determining priorities. |
Size |
Ca. 1 M names |
Languages |
Norwegian |
Rightholders |
Department of Language and Literature, University of Tromsø |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
137. Audio Recordings at the University of Tromsø
Type |
Place names |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Department of Language and Literature, University of Tromsø |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
138. Writings by School Children from Nordland and Troms Counties (the Indrebø material)
Type |
Place names: This database includes approximately 100000 names of places in North
Norway which were found in Indrebø’s collection of writings by school children. |
Size |
Ca. 100,000 names |
Languages |
Norwegian |
Rightholders |
Department of Language and Literature, University of Tromsø |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
139. Literature Lists, the etymology register in Uppsala, Sweden
Type |
Place names: The largest collection of place names in the Nordic countries is found
in the Ortnams Archives in Uppsala. It consists of approximately 240000 archive cards, a
number which is constantly increasing as new material is being excerpted. The register
includes Nordic names and name elements with literature references. The archives are not
yet accessible to the public. While the material is being digitized, a separate list is
being compiled of all the literature referred to in the archive. |
Size |
Ca. 240,000 archive cards |
Languages |
Swedish |
Rightholders |
Etymology registry in Uppsala, Sweden, and Section for Name Research, Department of
Scandinavian Studies and Comparative Literature, University of Oslo |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
140. Multi-tagger
Type |
The multi-tagger is a part of the Oslo-Bergen-tagger and is based on word lists from
Norsk Ordbank. The multi-tagger performs morphological analysis, compound analysis and
multi-word expression detection. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Developers: The tagger project (The Text Laboratory and EDD) and Aksis, University
of Bergen (now Uni Digital) |
Anticipated access policy |
May be downloaded for non-commercial use according to GPL conditions. |
Anticipated reuse policy |
May be downloaded for non-commercial use according to GPL conditions. |
Anticipated location |
The Text Laboratory, ILN, University of Oslo / Uni Digital |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
141. NorGram
Type |
LFG grammar which was developed in the Norwegian part of ParGram. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
LLE, University of Bergen |
Anticipated access policy |
Public and free, subject to a licence. The grammar is open source and free of charge, and
a demo version is available at http://decentius.aksis.uib.no/logon/xle.xml. However, the
parser requires the XLE (Xerox Linguistic Environment) from PARC, for which users need to
sign a licence. XLE is free of charge but comes without source code and with strong
restrictions on usage. |
Anticipated reuse policy |
Restricted |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
Will be continually updated in the INESS project |
Rationale for selection |
Reusable computational grammar for Norwegian Bokmål and Nynorsk with a broad
empirical coverage and a healthy theoretical foundation. |
Present usage |
Used within TREPIL. |
Similar resources or cooperations |
NorGram is affiliated with the Parallel Grammar Project (ParGram), an
international cooperative effort to develop parallel LFG grammars for English, French,
German, Norwegian, Japanese and Urdu. |
Data or tool |
tool |
142. Norwegian Syntax-based Grammar (Norsyg)
Type |
HPSG grammar for Norwegian. A continuation of earlier grammars: NorSource, Saargram,
Phdgram. The initial grammar was based on the Grammar Matrix version 0.6. The
implementation platform is the LKB system. Its main limitation is coverage and robustness:
it provides output for roughly 50% of the input sentences. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Petter Haugereid, Norwegian University of Science and Technology |
Anticipated access policy |
Free for research, LGPL. The dictionary can be downloaded from Norsk Ordbank's site
at the University of Oslo. |
Anticipated reuse policy |
Free for research, LGPL |
Anticipated location |
Norwegian University of Science and Technology |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Two important assumptions made in Norsyg distinguish it from other implemented
grammars. First, the linking between the syntax and the semantics is done in the syntax
rather than in the lexicon. Second, the topic is realized at the bottom of the tree, not
at the top. |
Present usage |
n/a |
Similar resources or cooperations |
NorGram |
Data or tool |
tool |
143. Shallow PARsing of TAgged Norwegian Nouns (Spartan)
Type |
Parser. A package of Perl scripts for extracting dependency relations between nouns
(from text). Requires input tagged with Oslo-Bergen-tagger. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
Erik Velldal, University of Oslo |
Anticipated access policy |
Public and free |
Anticipated reuse policy |
Public and free |
Anticipated location |
University of Oslo |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
tool |
144. NorKompLeks
Type |
Computational lexicon for both of the official Norwegian languages (Bokmål and
Nynorsk). The selection of words in the computational lexicon is primarily from
Bokmålsordboka and Nynorskordboka. The monolingual dictionary also provides argument
structure for verbs. |
Size |
n/a |
Languages |
Norwegian |
Rightholders |
The Department of Language and Communication Studies at the Norwegian University of
Science and Technology |
Anticipated access policy |
Available both for research and commercial use. |
Anticipated reuse policy |
Available both for research and commercial use. |
Anticipated location |
Norwegian University of Science and Technology |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
145. Stor Ordbok (the big dictionary)
Type |
Electronic dictionary. Digital version of the most comprehensive Norwegian-English
and English-Norwegian dictionary in print. |
Size |
217,000 entries and multi-word expressions, and 522,000 translations. |
Languages |
Norwegian, English |
Rightholders |
Probably Kunnskapsforlaget |
Anticipated access policy |
Restricted. Requires permission from publisher. Available through Internet
subscription. |
Anticipated reuse policy |
Restricted. Requires permission from publisher. |
Anticipated location |
Unknown (possibly Kunnskapsforlaget) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
Lists both British and American word forms and spelling standards. Covers both
everyday language and technical terms from special fields. |
Present usage |
Has been used in several research projects. |
Similar resources or cooperations |
The same publisher also offers Norwegian-German and Norwegian-Italian dictionaries. |
Data or tool |
data |
146. TriTrans
Type |
Online multi-language dictionary. Plain words only. |
Size |
Ca. 22,000 Norwegian words |
Languages |
Norwegian, English, Spanish. |
Rightholders |
n/a |
Anticipated access policy |
Free of charge |
Anticipated reuse policy |
Free of charge |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
147. Websters online dictionary
Type |
Online dictionary, Norwegian-English and English-Norwegian. |
Size |
n/a |
Languages |
Norwegian, English |
Rightholders |
Websters |
Anticipated access policy |
Crawling not allowed, but possibly free for research after permission from owners. |
Anticipated reuse policy |
Crawling not allowed, but possibly free for research after permission from owners. |
Anticipated location |
n/a |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
148. Clue dictionaries
Type |
Presumably the largest electronic dictionaries for Norwegian,
Norwegian-English-Norwegian and Norwegian-German-Norwegian. |
Size |
n/a |
Languages |
Norwegian, English, German. |
Rightholders |
Clue Norge ASA |
Anticipated access policy |
Commercial purchase (ca. 700 euros) |
Anticipated reuse policy |
Commercial purchase (ca. 700 euros) |
Anticipated location |
Clue Norge ASA |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
Commercial |
Similar resources or cooperations |
n/a |
Data or tool |
data |
149. dicts.info
Type |
Four sets of dictionaries in a multitude of languages: Universal dictionary,
Wiktionary, Omegawiki, and Wikipedia. All four sets include all the PRESEMT languages:
Norwegian, Italian, German, Czech, Greek and English. |
Size |
n/a |
Languages |
Norwegian, Italian, German, Czech, Greek and English. |
Rightholders |
dicts.info |
Anticipated access policy |
Free |
Anticipated reuse policy |
Free |
Anticipated location |
Unknown (possibly dicts.info) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
150. RuN corpus of parallel Norwegian-Russian-English-Serbian texts
Type |
A parallel Norwegian-Russian-English-Serbian corpus, partly based on the existing
Oslo Multilingual Corpus, is being developed in the RuN project with technical assistance from
the Text Laboratory, University of Oslo. The corpus will provide a basis for contrastive
studies, including the study of grammatical phenomena such as Russian aspect,
information structure, (in)definiteness, bare nominals and tense/mood in Russian and
Norwegian from the perspective of both native speakers and second language learners. |
Size |
n/a |
Languages |
Norwegian, Russian, English, Serbian |
Rightholders |
ILOS, University of Oslo |
Anticipated access policy |
Restricted |
Anticipated reuse policy |
Restricted |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
The resource is currently being developed, and the project is fully financed. The
project receives funding from the Norwegian Centre for International Cooperation in
Higher Education (SIU) through its Cooperation Programme with Russia. |
Rationale for selection |
The developers believe that focus on contrastive linguistics and translation
studies can bridge the gap between research and education in the field of advanced
second language learning of Russian and Norwegian. |
Present usage |
The RuN project has established an educational and research oriented environment for
graduate students and scholars from Russia (notably Murmansk Humanities Institute) and
the University of Oslo working on languages in contrast (Russian vs. Norwegian and/or
English). |
Similar resources or cooperations |
n/a |
Data or tool |
data |
151. Stockholm MULtilingual TReebank (SMULTRON)
Type |
A parallel treebank first developed by the Computational Linguistics Group at the
Department of Linguistics, at Stockholm University. Contains aligned syntactic trees for
(among others) Norwegian, English and German. Version 1.0 contains around 1000 sentences
in English, German and Swedish. The sentences have been PoS-tagged and annotated with
phrase structure trees. The trees have been aligned on sentence, phrase and word level.
Additionally, the German and Swedish monolingual treebanks contain lemma information.
The Institute of Computational Linguistics continues the work on the SMULTRON project.
Version 2.0 is an extension of the original treebank with a new text type: 500 sentences
from a user manual in English, German, Swedish and Spanish. The current distribution
(version 2.0) comprises around 1500 sentences in TIGER-XML format, in 9 treebank files
(Spanish not yet included) plus 8 alignment files. |
Size |
1500 sentences |
Languages |
Several languages including Norwegian. |
Rightholders |
The Computational Linguistics Group at the Department of Linguistics, at Stockholm
University. |
Anticipated access policy |
Free of charge for research purposes. Registered users only (name, affiliation, and
email address). |
Anticipated reuse policy |
Free of charge for research purposes. Registered users only (name, affiliation, and
email address). |
Anticipated location |
The Computational Linguistics Group at the Department of Linguistics, at Stockholm
University. |
Effort needed (a) technical (b) nontechnical |
There are plans to extend the treebank with new text types, more texts and more languages. |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
152. OPUS project corpora
Type |
Parallel text corpora for many different language pairs. OPUS is a growing collection
of translated texts from the web. The OPUS project converts and aligns free online data,
adds linguistic annotation, and provides the community with a publicly available
parallel corpus. OPUS is based on open source products and the corpus is also delivered
as an open content package. Several tools are used to compile the current collection.
All pre-processing is done automatically. No manual corrections have been carried out. |
Size |
n/a |
Languages |
Several languages including Norwegian. |
Rightholders |
Department of Linguistics and Philology, Uppsala University |
Anticipated access policy |
Free |
Anticipated reuse policy |
Free |
Anticipated location |
Unknown (possibly Department of Linguistics and Philology, Uppsala University) |
Effort needed (a) technical (b) nontechnical |
n/a |
Rationale for selection |
The main motivation for compiling OPUS is to provide an open source parallel corpus
that uses standard encoding formats including linguistic annotation. A public collection
of parallel corpora that can be freely used and distributed makes it possible for
everyone to run experiments on bitexts and to compare their results easily. |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |
153. Audio files with Norwegian dialects
Type |
Audio recordings of (older) dialects in Norway and from Norwegians in America (Seip
and Selmer 1931, Einar Haugen 1935-1948, Arnstein Hjelde 1987, Joseph Salmons,
University of Wisconsin 2009 and Janne Bondi Johannessen and Signe Laake 2010 (video
recordings)). The files are digitized, but not transcribed. |
Size |
n/a |
Languages |
Norwegian dialects |
Rightholders |
n/a |
Anticipated access policy |
Restricted: access only for research and development purposes. |
Anticipated reuse policy |
Restricted. |
Anticipated location |
The Text Laboratory, ILN, University of Oslo |
Effort needed (a) technical (b) nontechnical |
The audio files need to be transcribed and made searchable in Glossa |
Rationale for selection |
n/a |
Present usage |
n/a |
Similar resources or cooperations |
Dialect archives at other institutions and universities in Norway |
Data or tool |
data |
154. Cadastres for Bergen 1686 and 1673
Type |
Text corpora. Cadastres (grunnbøker) of Bergen city, years 1686 and 1673. Available
through web interface, WebGIS and as PDF. Partly indexed on place names, addresses and
person names. |
Size |
n/a |
Languages |
Danish and Norwegian |
Rightholders |
Arne Solli and Geir Atle Ersland, AHKR, University of Bergen |
Anticipated access policy |
Open, http://gandalf.aksis.uib.no/bergis/GBB1686.page |
Anticipated reuse policy |
Restricted. |
Anticipated location |
University of Bergen |
Effort needed (a) technical (b) nontechnical |
none |
Rationale for selection |
historical research |
Present usage |
n/a |
Similar resources or cooperations |
n/a |
Data or tool |
data |