Open linguistic data, governance and data activism

Introduction

This module aims to investigate how the concepts of open data and open governance trigger open linguistic actions, practices and data activism. In particular, the first unit introduces the data governance frameworks of Openness and the CARE (Collective benefit, Authority to control, Responsibility and Ethics) principles. The second unit, through a discussion on the concept of language variety, places a particular emphasis on the value of indigenous and endangered languages, while the last unit offers a new perspective of language activism by introducing the concept of data activism.

This module, through the introduction of two state-of-the-art frameworks on data governance, namely Openness and CARE principles, aims to interrelate the concept of linguistic variety with that of data activism. The final result is a new approach to language activism through the prism of data governance and particularly Openness and CARE principles.

Katerina Zourou, Stefania Oikonomou

On completion of this module participants will be able to:

  • understand the role and the meaning of Openness and CARE principles frameworks;
  • understand the complexity of defining language variety;
  • identify the value of indigenous and endangered languages;
  • understand the contribution of data activism, allowing them to act towards the protection of language variety.

This module is open to any interested person eager to learn more on open linguistic data, governance and data activism. It particularly serves training purposes of:

  • university students in education and the learning sciences: this module will make a critical contribution to students pursuing bachelor’s and/or master’s degrees in any education-related field. In a multicultural educational environment, bottom-up actions, in parallel with learning resources, are deemed necessary for safeguarding multilingualism;
  • civil society organisations: this module will offer individuals working in the civil society sector a concrete view of the topics of data governance, emphasising Openness and CARE principles, and language variety. The aim is to stimulate them to undertake actions in favour of language vitality and sustainability.

  • Unit 1: It focuses on Openness and CARE principles. The objective of this unit is for learners to become familiar with these key definitions and to develop their skills in identifying cases of Openness and CARE principles.
  • Unit 2: It introduces the theoretical framework on linguistic diversity, with an emphasis on indigenous and endangered languages.
  • Unit 3: It deals with the central question of how language activism can be realised through data activism.

2 hours

The module’s main materials are based on a variety of resources:

  • Studies and reports issued by distinguished authors and international organisations on data governance, language variety and language activism. 
  • Explanatory videos and video lectures.
  • Materials provided by international organisations and Higher Educational Institutions through their official web pages.

Unit 1: Openness and CARE principles

In a world of complex social structures and various centres of knowledge production (e.g., universities, knowledge communities, etc.), digital networking has become a game-changer, radically transforming the way knowledge is generated and diffused. In this context, new and participatory information channels and flows have been created, offering people access to newly produced information and knowledge. Additionally, digital technologies have fostered an unprecedented accumulation and systematisation of data (Pentland, 2008).

Against this background, this unit focuses on two recently emerged frameworks of data governance, namely Openness and the CARE principles. To achieve the objectives of this module, as well as the general objective of the BOLD project, it examines the role of data by placing particular emphasis on the following key questions:

  • What is the ownership status of collected data? 
  • Who has the right to use the data? 
  • At what cost can someone use a dataset? 
  • What are the possible consequences of making data freely available? 

Hence, the learning objectives of this unit for learners are to:

  • become familiarised with the key definitions; 
  • develop skills in identifying cases of Openness and CARE principles.

1.1 Openness

1.1.1 Introduction: Intellectual work and copyright

Looking at European history, we can identify two turning points in intellectual production. The first is Gutenberg’s invention of the printing press around 1440, which completely changed the way books are produced and accessed by readers. The second, a logical consequence of the first, is the Statute of Anne (also known as the Copyright Act 1710), an act passed by the British Parliament in 1710, which was the first in history to treat copyright as an activity legally regulated by the government and the courts.

According to the Oxford Dictionary, Copyright (symbol: ©) is the right of a person or an organisation (henceforth: the originator) to legally print, (re-)publish, perform, (re-)adapt, etc. an intellectual work. According to the World Intellectual Property Organization (WIPO), an intellectual work is defined as ‘the creations of the mind, such as inventions; literary and artistic works; designs; and symbols, names and images used in commerce’. In other words, an intellectual work is the intellectual property of its originator. Hence, Copyright refers to a set of legal regulations that make an intellectual work the exclusive property of its originator.

At the international level, Copyright is regulated by the Berne Convention (signed in 1886), which is administered by the WIPO [see here a synopsis of the Convention]. The Convention sets out specific restrictions on users or consumers of an intellectual work: they may not access, modify or share it without the legal approval of its originator.

1.1.2 The ‘Open’ definition

In contrast to the strict relation between an intellectual work and its originator, as regulated by Copyright, the concept of ‘Openness’ defines a looser relation between the two. Specifically, according to the Open Knowledge Foundation [see here], Open knowledge, and thus any intellectual work related to it, is knowledge that anyone is free to access and (re-)use. For users, this means that there are no charges for (re-)using the work and sharing it. All four requirements of the Open definition must be fulfilled for an intellectual work to be considered ‘Open’:

Prerequisites for Openness

According to the Open definition provided by the Open Knowledge Foundation (n.d.), for a work to be ‘Open’, it must satisfy the following requirements:1

  1. Open licence: “The work must be in the public domain or provided under an open licence2 [see section 1.1.3 for a detailed definition of the term]. Any additional terms accompanying the work (such as a terms of use, or patents held by the licensor) must not contradict the work’s public domain status or terms of the licence”.  
  2. Access: “The work must be provided as a whole and at no more than a reasonable one-time reproduction cost, and should be downloadable via the Internet without charge. Any additional information necessary for licence compliance (such as names of contributors required for compliance with attribution requirements) must also accompany the work”. 
  3. Machine readability: “The work must be provided in a form readily processable by a computer and where the individual elements of the work can be easily accessed and modified”. [See also the FAIR principles that require data to be findable, accessible, interoperable and reusable.]
  4. Open format: “The work must be provided in an open format. An open format is one which places no restrictions, monetary or otherwise, upon its use and can be fully processed with at least one free/libre/open-source software tool”.3

The following schema depicts the four dimensions of the Open definition.

It should be noted that the conditions under which an Open work is made publicly available must not contradict any of the above prerequisites of the Open definition. Otherwise, the work is not ‘Open’.
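The four prerequisites can be read as a simple checklist: a work is ‘Open’ only if every one of them holds. The following Python sketch illustrates this reading. The `Work` class, its field names and the two example works are hypothetical and serve only as an illustration; they are not part of the Open definition or of any Open Knowledge Foundation tool.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical description of a published work (field names are illustrative)."""
    title: str
    open_licence_or_public_domain: bool  # requirement 1: open licence
    downloadable_free_of_charge: bool    # requirement 2: access
    machine_readable: bool               # requirement 3: machine readability
    open_format: bool                    # requirement 4: open format


def unmet_requirements(work: Work) -> list[str]:
    """Return the names of the Open requirements that the work fails."""
    checks = {
        "open licence": work.open_licence_or_public_domain,
        "access": work.downloadable_free_of_charge,
        "machine readability": work.machine_readable,
        "open format": work.open_format,
    }
    return [name for name, satisfied in checks.items() if not satisfied]


def is_open(work: Work) -> bool:
    """A work is 'Open' only if all four requirements are satisfied."""
    return not unmet_requirements(work)


# Two invented examples: a word list published as CSV and a scanned dictionary
# published only as page images (assumed here not to be machine readable).
word_list = Work("Word list as CSV", True, True, True, True)
scanned_dictionary = Work("Scanned dictionary (page images)", True, True, False, True)

print(is_open(word_list))
print(unmet_requirements(scanned_dictionary))
```

In this sketch a single unmet requirement is enough for a work to fall outside the definition, which mirrors the point made above.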

Finally, regarding the definition of Openness, the following picture (made by Katja Mayer) summarises its several paradigms-fields of knowledge.

The Open paradigms CC BY 4.0 Katja Mayer

1 All definitions are quoted verbatim.
2 According to the Open Knowledge Foundation (n.d.), “the term public domain denotes the absence of copyright and similar restrictions, whether by default or waiver of all such conditions”.
3 According to opensource.com, “Open source software is software with source code that anyone can inspect, modify, and enhance”, while “source code” is defined as “the part of software that […] computer programmers can manipulate to change how a piece of software […] works”.

1.1.3 Open Licences

Technically, what distinguishes an open work from a copyright-protected one is its licence, namely the legal terms under which it is publicly provided (The Open Knowledge Foundation, see here). A licence contains the mixture of the conditions and permissions under which a work is available. 

Those conditions are divided into two main categories:

  • obligatory ones, touching upon the core of the Open definition, and 
  • optional ones (i.e. non-obligatory conditions). 

For a licence to be characterised as open, it must satisfy all obligatory conditions. [Check here the catalogue of compatible licences, and here the catalogue of the non-compatible ones.]

The obligatory conditions are listed below, accompanied by the definitions provided by the Open Knowledge Foundation (n.d.):4

Obligatory Open conditions 

  1. Use: “The licence must allow free use of the licensed work”. 
  2. Redistribution: “The licence must allow redistribution of the licensed work, including sale, whether on its own or as part of a collection made from works from different sources”. 
  3. Modification: “The licence must allow the creation of derivatives of the licensed work and allow the distribution of such derivatives under the same terms of the original licensed work”. 
  4. Separation: “The licence must allow any part of the work to be freely used, distributed, or modified separately from any other part of the work or from any collection of works in which it was originally distributed. All parties who receive any distribution of any part of a work within the terms of the original licence should have the same rights as those that are granted in conjunction with the original work”. 
  5. Compilation: “The licence must allow the licensed work to be distributed along with other distinct works without placing restrictions on these other works”. 
  6. Non-discrimination: “The licence must not discriminate against any person or group”. 
  7. Propagation: “The rights attached to the work must apply to all to whom it is redistributed without the need to agree to any additional legal terms”.
  8. Application: “The licence must allow use, redistribution, modification, and compilation for any purpose. The licence must not restrict anyone from making use of the work in a specific field of endeavour”.
  9. No charge: “The licence must not impose any fee arrangement, royalty, or other compensation or monetary remuneration as part of its conditions”. 

The non-obligatory conditions, as defined by the Open Knowledge Foundation (n.d.), are the following:5

Non-Obligatory Open conditions

  1. Attribution: “The licence may require distributions of the work to include attribution of contributors, rights holders, sponsors, and creators as long as any such prescriptions are not onerous”. 
  2. Integrity: “The licence may require that modified versions of a licensed work carry a different name or version number from the original work or otherwise indicate what changes have been made”. 
  3. Share-alike: “The licence may require distributions of the work to remain under the same licence or a similar licence”. 
  4. Notice: “The licence may require retention of copyright notices and identification of the licence”.
  5. Source: “The licence may require that anyone distributing the work provide recipients with access to the preferred form for making modifications”. 
  6. Technical restriction prohibition: “The licence may require that distributions of the work remain free of any technical measures that would restrict the exercise of otherwise allowed rights”.
  7. Non-aggression: “The licence may require modifiers to grant the public additional permissions (for example, patent licences) as required for exercise of the rights allowed by the licence. The licence may also condition permissions on not aggressing against licensees with respect to exercising any allowed right (again, for example, patent litigation)”.
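The interplay between the two lists can be sketched in a few lines of Python: a licence counts as open when it grants every obligatory condition and imposes nothing beyond the non-obligatory ones. The function `check_licence` and the three example licences are hypothetical simplifications made for illustration; they are not a legal assessment of any real licence.

```python
# Condition names taken from the two lists above.
OBLIGATORY = frozenset({
    "use", "redistribution", "modification", "separation", "compilation",
    "non-discrimination", "propagation", "application", "no charge",
})
OPTIONAL = frozenset({
    "attribution", "integrity", "share-alike", "notice", "source",
    "technical restriction prohibition", "non-aggression",
})


def check_licence(grants: set[str], requires: set[str]) -> tuple[bool, set[str]]:
    """Return (is_open, problems) for a licence.

    grants   -- obligatory conditions the licence satisfies
    requires -- extra conditions the licence imposes on users
    """
    missing = OBLIGATORY - grants          # obligatory conditions not granted
    not_permitted = requires - OPTIONAL    # requirements outside the optional list
    problems = set(missing) | set(not_permitted)
    return (not problems, problems)


# Invented, simplified licences (assumptions for illustration only).
attribution_only = check_licence(set(OBLIGATORY), {"attribution", "notice"})
non_commercial = check_licence(set(OBLIGATORY) - {"application"}, {"attribution"})
no_derivatives = check_licence(set(OBLIGATORY) - {"modification"}, {"attribution"})

print(attribution_only)
print(non_commercial)
print(no_derivatives)
```

Under these assumptions, requiring attribution does not affect openness, whereas restricting the purposes of use or forbidding derivatives does, because an obligatory condition is then missing.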

The most common open licences that systematise the above conditions are those of Creative Commons (i.e. CC licences). Each CC licence contains a different combination of the conditions and permissions that a work is subject to. The following chart depicts the most commonly used types of CC licences.

The Creative Commons license types CC Attribution- Share Alike 3.0
4 All definitions are quoted verbatim.
5 All definitions are quoted verbatim.

1.1.4 Why Open data?

If copyright aims to protect the originators of intellectual works, then why has the need for openness emerged? In the particular case of data, the question becomes: “why should data be open?”. The answer is provided through a series of questions related to the value of open data.

A. Who benefits from open data

If data were in the possession of a single person, organisation/company or government and protected by Copyright, accessing and using them would be extremely costly. They would thus be practically inaccessible to individuals, communities and organisations. By contrast, many studies (i.a. Allam & Dhunny, 2019; Zhao & Zhang, 2020) have pointed out that open data have a multiplicative value for economies, since they give rise to the elaboration of processes and the acceleration of innovation (The open data handbook, n.d.).

B. What are the costs associated with data

It is well established that data has a cost of production but minimal or no cost of reproduction (i.a. Benkler, 2006; Rifkin, 2014; Stigler, 1961). As raw data is not usually considered an intellectual output, it is questionable whether any Copyright protection should be applied to it.

C. To whom does data belong

A critical issue related to data is data ownership. As will be further explained below (see CARE principles, section 1.2), data belong to those who produce them, not to those who process them after collection. The challenge lies in delineating the concept of the ‘originator’, since the originators are usually not those who hold the data but the individuals who produced them (the concept is elaborated further in section 1.2).

D. How can data accelerate innovation

In most cases, data is the output of a process and is collected through a (separate) monitoring process (e.g. traffic data collected through GPS). Openness applied to data offers individuals, organisations or governments the opportunity to combine it with other datasets with which it has never been combined before, or to use it in innovative ways. In other words, the more accessible data is, the more likely new knowledge areas, and hence innovative solutions and products, are to emerge.

E. How does open data enhance multi-stakeholder collaboration?

Focusing on the value of the multi-stakeholder collaboration that evolves around open data, four possibilities of collaboration seem to emerge (Meijer et al., 2019). The first is exclusively about data themselves, the combination of which could lead to new knowledge areas. The second is related to the promotion and establishment of innovative types of collaboration among different stakeholders. The third is about the social bonds developed among citizens, while the fourth involves a variety of actors such as governments, civil society organisations, public institutions etc. To conceptualise what is described here, the term ‘multicentricity’, closely linked to openness, is used; it is defined as the development of networks of collaboration among multiple centres in society (ibid.).

F. How does open data contribute to active citizenship?

Data that are accessible to the individuals who produced them foster democracy, transparency and citizen participation, allowing citizens to become more aware of the meaning and value of their actions. Hence, in an open governance paradigm with open data at its heart, citizens, namely individuals who are members of a community and who hold certain rights and duties enshrined by law (Center for the Study of Citizenship, n.d.), would become empowered to critically influence public policies and decisions through citizen engagement (Meijer et al., 2019). Open governance is defined as those innovative, bottom-up forms of action that aim to solve complex public issues and that build on the fundamental changes brought about by the widespread use of network technologies and open data. In this context, citizen deliberation6 and social action7 can become catalysts of governance (Meijer et al., 2019), as depicted in the schema below.

An indicative example of an open governance initiative combining citizen engagement with open data is the citizens’ reaction to the Canterbury earthquake in New Zealand in 2010. In this case, university students, in collaboration with public entities and civil society organisations, used open governmental data to create an online regional map on which people could pin locations where fresh water, power, gas or roads were blocked or damaged.

In conclusion, openness seems to be a radically new concept pushing societal change in completely new directions that could prompt progress and development in a more democratic and transparent way. Hence, open data seems to play a central role in this process.

6 The term ‘citizen deliberation’ refers to the process of citizens discussing and debating public issues. The goal of citizen deliberation is to inform citizens about issues related to public policy and to stimulate them to participate in shaping it. For more information on this concept, see the review by Delli Carpini et al. (2004).
7 According to Budiman (2023), a general and simple definition of the term ‘social action’ is any “action that is influenced and affects other people during social interactions”. For more information you can visit this online course.

1.2 Data governance and Indigenous communities

In section 1.1.4, we dealt with the issue of data ownership. The main argument put forward was that data does not belong to the organisation or individual that manages it, but to those from whom it derives. To better understand the issue, a distinction must be made between data sovereignty and data governance. Specifically:

  • Data sovereignty: the (legal) concept according to which information and data that have been converted and stored in binary digital form are subject to the laws of the country in which they are located (Global Indigenous Data Alliance, 2022).
  • Data governance: refers to the ownership, collection, control, analysis, and use of data (Global Indigenous Data Alliance, 2022). 

In other words, data sovereignty indicates that data may be subject to independent legal restrictions and obligations stemming from the national legislation of the country in which it is located or generated. Hence, data may fall under legislative frameworks with which its owner, manager, originator and user have to comply. Openness, described in the previous section, is a general framework of data governance that defines and regulates issues beyond national or corporate legislation. In this context, Openness may also touch upon issues of data sovereignty. Similarly, the CARE principles, introduced in section 1.2.1, aim to provide an ethical and coherent framework for issues related to the data governance of indigenous communities.

1.2.1 The CARE principles

The CARE principles are a set of 4 principles standing for:

The CARE principles Logo CC BY 4.0 GIDA
  • Collective benefit
  • Authority to control 
  • Responsibility 
  • Ethics
The CARE principles constitute a framework of data governance that recognises the absolute right of indigenous communities and nations to govern the collection, ownership and application of their data (Global Indigenous Data Alliance, 2022). Their main purpose is to ensure that indigenous data is used ethically and for the protection of the community from which it derives (Australian Research Data Commons, n.d.). Hence, their use is intended to protect indigenous innovation8 and self-determination, thus serving indigenous people’s wellbeing in general (Russo Carroll et al., 2021). According to the Global Indigenous Data Alliance (2022), the exact content of the CARE principles is as follows:
The CARE principles Summary infographic CC BY 4.0 GIDA

A. Collective benefit

“Data ecosystems shall be designed and function in ways that enable Indigenous peoples to derive benefit from the data”.

According to the Global Indigenous Data Alliance (2022), for the principle to be met, the following conditions must also be met:

→ C1. For inclusive development and innovation 

“Governments and institutions must actively support the use and reuse of data by Indigenous nations and communities”.

→ C2. For improved governance and citizen engagement

“Ethical use of open data can improve transparency and decision-making by providing Indigenous nations and communities with a better understanding of their identity, territories, and resources”. 

→ C3. For equitable outcomes

“Any value created from Indigenous data should benefit Indigenous communities in an equitable manner and contribute to Indigenous aspirations for wellbeing”.

B. Authority to control

“Indigenous peoples’ rights and interests in indigenous data must be recognised and their authority to control such data be empowered”. 

According to the Global Indigenous Data Alliance (2022), for the principle to be met, the following conditions must also be met:

→ A1. Recognising rights and interests 

“Indigenous peoples have collective and individual rights to free, prior, and informed consent in the collection and use of such data, including the development of data policies and protocols for collection”.

→ A2. Data for governance

“Indigenous data must be made available and accessible to Indigenous nations and communities in order to support Indigenous governance”.

→ A3. Governance of data

“Indigenous peoples have the right to develop cultural governance protocols for  Indigenous data and be active leaders in the stewardship of, and access to, Indigenous data especially in the context of Indigenous Knowledge”.9

C. Responsibility 

“Those working with Indigenous data have a responsibility to share how those data are used to support Indigenous peoples’ self-determination and collective benefit”.

According to the Global Indigenous Data Alliance (2022), for the principle to be met, the following conditions must also be met:

→ R1. For positive relationships

“Those working with Indigenous data are responsible for ensuring that the creation, interpretation, and use of those data uphold, or are respectful of, the dignity of Indigenous nations and communities”.

→ R2. For expanding capability and capacity

“Use of Indigenous data invokes a reciprocal responsibility to enhance data literacy within Indigenous communities and to support the development of an Indigenous data workforce and digital infrastructure to enable the creation, collection, management, security, governance, and application of data”.

→ R3. For Indigenous languages and worldviews

Resources must be provided to generate data grounded in the languages, worldviews, and lived experiences (including values and principles) of Indigenous Peoples.

D. Ethics

“Indigenous peoples’ rights and wellbeing should be the primary concern at all stages of the data life cycle and across the data ecosystem”.

According to the Global Indigenous Data Alliance (2022), for the principle to be met, the following conditions must also be met:

→ E1. For minimising harm and maximising benefit

Ethical data are data that do not stigmatise or portray Indigenous peoples, cultures, or knowledge in terms of deficit. Ethical data are collected and used in ways that align with Indigenous ethical frameworks and with rights affirmed in the United Nations Declaration on the Rights of Indigenous Peoples.

→ E2. For justice

Ethical processes address imbalances in power and resources, and how these affect the expression of Indigenous and human rights. Hence, they must include representation from relevant Indigenous communities.

→ E3. For future use

Data governance should take into account the potential future uses and harms based on ethical frameworks grounded in the values and principles of the relevant Indigenous community. Metadata should acknowledge the provenance and purpose of any limitations or obligations in secondary use, including issues of consent.
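To make the E3 requirement on metadata more concrete, the following Python sketch shows what a metadata record acknowledging provenance, consent and obligations in secondary use might look like. All field names, the example dataset and the `may_reuse` rule are hypothetical; they are not a schema published by the Global Indigenous Data Alliance, and in practice such decisions rest with the community concerned.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record for a dataset (field names are illustrative)."""
    title: str
    provenance: str                 # community from which the data derives
    consent_obtained: bool          # free, prior and informed consent is recorded
    permitted_uses: set[str] = field(default_factory=set)
    obligations: list[str] = field(default_factory=list)


def may_reuse(meta: DatasetMetadata, purpose: str) -> bool:
    """Assumed rule: secondary use needs recorded consent and a listed purpose."""
    return meta.consent_obtained and purpose in meta.permitted_uses


# Invented example record.
recordings = DatasetMetadata(
    title="Oral history recordings (hypothetical)",
    provenance="Example community language council",
    consent_obtained=True,
    permitted_uses={"language teaching", "community archive"},
    obligations=[
        "acknowledge the community as the source of the data",
        "report results back to the community",
    ],
)

print(may_reuse(recordings, "language teaching"))
print(may_reuse(recordings, "commercial speech model"))
```

Keeping the limitations and obligations in the record itself means that anyone who later receives the dataset also receives the conditions attached to its use.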

9 According to Nyayo Discovery (n.d.), Indigenous Knowledge is “a term used to refer to the large body of local knowledge held by indigenous people and includes customs, traditions, traditional ecological knowledge (TEK), group history, spiritual beliefs, cosmology and traditional language. […] This knowledge is often viewed holistically by indigenous groups with each component being greatly inter-connected; forming the foundation of a group’s identity and how they are identified by others”. 

1.2.2 Indigenous peoples’ rights in data

The Global Indigenous Data Alliance (GIDA), responsible for monitoring the implementation of the CARE principles, has formulated a set of additional rights with regard to indigenous peoples’ data. They are the following:

Indigenous peoples’ rights in Data according to GIDA CC BY 4.0 GIDA

The rights of this charter are divided into two categories, namely: (a) ‘data for governance’, safeguarding the community’s self-determination, and (b) ‘governance of data’, referring to the ethical use of data.

1.3 Summing up

In this unit, we introduced two frameworks for data governance, namely Openness and the CARE principles. In a social context where data plays an increasingly important role, it becomes crucial to pursue a more socially relevant, participatory and transparent way of managing data. Openness is about the freedom to access, use and share a work (including datasets), while the CARE principles are about indigenous communities’ rights in their data, practically addressing their concerns about its use. However, it should be highlighted that the CARE principles are not on a par with Copyright, nor do they contradict Openness. In the CARE principles framework, indigenous communities retain exclusive rights over their data not to make a profit out of them but to protect them. Hence, Openness and the CARE principles complement each other.

Unit 2: Linguistic diversity

How many languages are there in the world? Why do people speak different languages and dialects? These are two of the most critical questions related to the concept of ‘Linguistic diversity’. Two further, related questions emerge, namely (a) what is a language and (b) what is a dialect? In other words, what are the criteria for distinguishing a language from a dialect?

Hence, the questions that users will be able to answer after the completion of this unit are: 

  • What is linguistic diversity?
  • What are indigenous and minority languages?
  • What are endangered languages? 
  • Why are indigenous and endangered languages important?

With the completion of this unit, learners will be able to: 

  • identify and critically reflect on their own myths and prejudices related to languages and linguistic diversity; 
  • become aware of indigenous and endangered languages, two distinct language categories, though sometimes co-existing.

2.1 Language(s) and dialect(s)

What distinguishes a language from a dialect? The simplest way to distinguish between the two is the mutual intelligibility criterion, which refers to:

→ “the extent to which speakers of different speech communities can understand each other” (Trudgill, 1974). 

    • If speakers can understand each other without intentional study, then they speak different dialects. 
    • If speakers do not understand each other, then they speak different languages. 

[Click here and here to learn more on the mutual intelligibility criterion]

This criterion gives rise to the following well-known definitions of language and dialect:

  • a language is a set of mutually intelligible dialects;
  • a dialect is an instance (or flavour) of a language.
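The criterion and the two definitions above can be illustrated with a small Python sketch that groups varieties into ‘languages’ whenever their speakers understand each other. The function name, the list of varieties and the intelligibility judgements are assumptions made for illustration only. The sketch also assumes that intelligibility can be chained (if A and B, and B and C, are mutually intelligible, all three end up in one group), which is a simplification.

```python
def group_into_languages(varieties, intelligible_pairs):
    """Group varieties so that mutually intelligible ones share a set."""
    # Build a symmetric "understands" relation between varieties.
    neighbours = {variety: set() for variety in varieties}
    for a, b in intelligible_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    languages, seen = [], set()
    for start in varieties:
        if start in seen:
            continue
        # Collect every variety reachable from `start` through intelligible pairs.
        group, to_visit = set(), [start]
        while to_visit:
            variety = to_visit.pop()
            if variety not in group:
                group.add(variety)
                to_visit.extend(neighbours[variety] - group)
        seen |= group
        languages.append(group)
    return languages


# Illustrative input: the two English dialects mentioned in this section,
# plus one variety that their speakers do not understand without study.
varieties = ["Standard British English", "African American English", "Modern Greek"]
pairs = [("Standard British English", "African American English")]

languages = group_into_languages(varieties, pairs)
for language in languages:
    print(sorted(language))
```

With this input, the two English dialects form one group (one ‘language’ made up of two dialects), while the third variety forms a group of its own.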

In this framework, for example, we would consider Standard British English (BBC English or Queen’s English) and African American English as two different dialects of the ‘English’ language. [Click here and here to learn more on English dialects]. Hence, the ‘English’ language, like any other language, does not have any particular prototypic instance/dialect, since all its instances/dialects are equal and form the ‘English language’.

2.1.1 Issues with the terms ‘Language’ and ‘Dialect’

This section presents two problems related to the use of the terms ‘language’ and ‘dialect’, both of which arise when the mutual intelligibility criterion is not taken into account. The first problem concerns the non-scientific meaning historically attached to both terms, while the second concerns the involvement of non-linguistic criteria, mainly political ones, in distinguishing between the two. Specific examples of each case are given below.

I. The non-scientific meaning of the terms ‘language’ and ‘dialect’

Looking more closely at the adopted definition of the term ‘dialect’, a non-scientific misunderstanding emerges, according to which “a dialect is not an instance of a language, but a deviation from the standard language”. In this case, a language is not defined as a set of dialects (i.e. a dialectal family) but as a high-quality prototypical instance of a language. For example, following this non-scientific reasoning, the Standard British English dialect would be the prototypical instance of the English language, while the African American English dialect, like any other English dialect, would be considered a deviation from the standard language. Many myths, prejudices and racist ideologies draw on this reasoning to formulate arbitrary racial generalisations, such as classifying people into groups or categories of lower “societal value”, “level of education” or even “human value”. Following this type of argumentation, those who speak the standard, prototypical language would be the upper class or the educated, while those who speak any other dialect would be ranked below the upper class and usually associated with a lower quality of education.

However, the scientific community has provided several counter-arguments that tackle the claims made by supporters of the “true language” myth. Specifically:

A. The first argument is that dialects, since they are used by their speakers for communication and social life, become associated with the myths, ideologies and stances attached to their speakers. Hence, the connection of a dialect with a particular ideology or stance is arbitrary and based on the role of the dialect in the cultural repertoire of the speakers’ community.

B. The second argument is closely connected to the first one, since there are no pure linguistic criteria for classifying languages. The existing classifying criteria (varying according to each case study) are of two types: (a) those related to cultural prejudices and ideological theses, hence non-linguistic criteria, and (b) those being apparently linguistic, namely, based on language characteristics, yet with non-scientific value. 

C. The third argument is condensed in the anecdotal quote attributed to the linguist Max Weinreich in 1945, according to which “a language is a dialect with an army and navy”.

Hence, what is usually perceived as the standard “prototypic” language is itself a dialect, that is, an instance of the language spoken by the community or group that exercises power and authority in society. Since this community or group has a high social status, its dialect inherits that status too.

[Click here to get an extra reading on the power dynamics in linguistics]. 

For a more comprehensive understanding of why languages and dialects are of equal value, read chapter 1 by De Korne (2021): Advocating for linguistic equality.

II. The non-linguistic criteria for distinguishing between ‘languages’ and ‘dialects’

As mentioned above, in many cases the division between the terms ‘language’ and ‘dialect’ is not based on pure linguistic criteria, such as the mutual intelligibility criterion, but on other types of criteria. The following two examples are indicative.

Case A: Norwegian, Danish and Swedish

On the one hand, according to the mutual intelligibility criterion, if members of two distinct speaker communities understand each other, then what they speak are different dialects. On the other hand, Norwegian, Danish and Swedish are officially recognised as entirely distinct languages due to their status as national languages of the three states in which they are spoken. However, speakers of all three languages are able to understand each other without intentional learning (see also: Lynganor, 2021). This means that, despite their ‘language’ status, according to the mutual intelligibility criterion they should be considered different dialects of a hypothetical ‘Scandinavian language’. In practice, however, they are not classified this way.

Case B: Griko

Grecians from Calabria in traditional clothing CC BY-SA 3.0

Griko is spoken by inhabitants of the so-called Greek villages in Southern Italy (mainly in the area of Salento). These people are called Grecians (Graikanoi) and since 1999 have been officially recognised by the Italian Parliament as the “Greek national and linguistic minority” in Italy. Hence, Griko is officially considered a dialect of the Greek language [see here and here more about Griko]. However, Griko and Standard Modern Greek are not mutually intelligible, so treating them as dialects of the same language violates the criterion of mutual intelligibility. According to that criterion, they should be identified as different languages.
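The two cases above can be summarised in a short Python sketch that applies the mutual intelligibility criterion and compares its outcome with the official classification. This is only an illustration: the class `VarietyPair`, the functions `label_by_criterion` and `mismatches`, and the yes/no treatment of intelligibility are assumptions made here, since real intelligibility is a matter of degree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarietyPair:
    """A group of varieties compared with each other (hypothetical structure)."""
    name: str
    mutually_intelligible: bool  # simplification: treated as a yes/no property
    official_label: str          # "languages" or "dialects"


def label_by_criterion(pair: VarietyPair) -> str:
    """Apply the mutual intelligibility criterion as described in this unit."""
    return "dialects" if pair.mutually_intelligible else "languages"


def mismatches(cases: list[VarietyPair]) -> list[str]:
    """Names of the cases where the official label contradicts the criterion."""
    return [c.name for c in cases if label_by_criterion(c) != c.official_label]


# The two cases discussed in this section.
CASES = [
    VarietyPair("Norwegian / Danish / Swedish", True, "languages"),
    VarietyPair("Griko / Standard Modern Greek", False, "dialects"),
]

if __name__ == "__main__":
    for case in CASES:
        print(f"{case.name}: criterion says '{label_by_criterion(case)}', "
              f"official status says '{case.official_label}'")
```

In both cases the label predicted by the criterion differs from the official one, which is exactly the point of this section: the official labels follow non-linguistic considerations.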


Therefore, although ‘language’ and ‘dialect’ can be distinguished on purely linguistic grounds (i.e. the mutual intelligibility criterion), what is generally considered a dialect or a language is usually defined by non-linguistic parameters, namely the political and social status of a particular group of people, states’ foreign policy, ideologies, etc. These non-linguistic parameters blur the dividing line between the terms language and dialect as determined by the mutual intelligibility criterion, rendering them unusable for scientific inquiry. Linguistic terminology, to be accurate and unbiased, therefore needs neutral terms. The following section introduces the term ‘language variety’ as the solution that linguists have come up with for problems such as those presented above.

2.1.2 Language Variety

To mitigate the issues arising between misconceptions of the words “language” and “dialect”, the term ‘language variety’ has been proposed to replace both. In particular: 

  • Language variety is any form of language, systematically distinctive from another one (Matthews, 2014; Nordquist, 2020).  

The application of this definition is extremely broad, since it captures not only the distinctiveness of two completely different languages, such as Greek and Japanese, but also the distinctiveness between the linguistic jargon in Greek and the Greek vernacular itself. Hence, the degree of distinctiveness varies according to what varieties are compared, based on pure linguistic characteristics and criteria. 

The most widely used and well-established types of language variety among linguists are the following:

  • Idiolect: the language variety of a specific individual (Matthews, 2014).
  • Dialect: a language variety located in a specific region; namely, a geographical variety (Matthews, 2014; Nordquist, 2020).
  • Sociolect: a language variety associated with a group of people sharing particular social features (e.g. a class, a social group, such as LGBTQ+, etc.) (Nordquist, 2020).
  • Ethnolect: a language variety spoken by an “ethnic group” (Matthews, 2014; Nordquist, 2020). In this case, an ethnic group is defined as a group of people who identify themselves with specific shared attributes that distinguish them from other ethnic groups. 
  • Register: the sum of choices made by speakers in a specific situation, according to the context, the purpose and the audience/addressee (Nordquist, 2020). 
  • Jargon: the language variety identified in specific occupational, scientific and professional groups (Nordquist, 2020). 
  • Pidgin: a simplified language code, not an autonomous language, that develops in a specific situation as a medium of communication between two speech communities in contact whose language varieties have zero mutual intelligibility. A pidgin depends entirely on the source varieties (i.e. the mother tongues of the speech communities in contact) and its purpose is to serve communication between the two communities (Matthews, 2014). 
  • Creole: a language variety that has emerged from a pidgin. In other words, a pidgin that has increased in grammatical and lexical complexity and gained the status of an autonomous language, no longer depending on the pidgin’s source languages (Matthews, 2014).  

Follow the links below to delve deeper into the notion of language variety: types of language varieties, why language groups are important, and language variation and change.

The following two sections of this unit focus on two language variety types: (a) indigenous and (b) endangered languages.

2.2 Indigenous languages

Even though the term language has been replaced to some degree by the term language variety, in this section it is used to describe a language variety that is not intelligible to speakers of other speech communities. 

The main topic of this section is indigenous languages, which are defined as those spoken by indigenous peoples around the world. There are several criteria for a language to be classified as “indigenous”. While specific definitions and criteria may vary across contexts and organisations, the following are some common parameters considered when identifying a language as indigenous (based on the following resources: Indigenous Peoples: Language Guidelines, Survival International and UN on Indigenous Languages):

  1. Historical and Cultural Connection: Indigenous languages are typically spoken by indigenous or native people who have a long-standing historical and cultural connection to a specific region or territory. These communities often have ancestral ties to the land which they have inhabited for generations.
  2. Pre-Colonial Origins: Indigenous languages are characterised by their existence prior to the arrival of colonists or dominant extra-territorial linguistic groups. They are typically developed independently from major world languages and have unique linguistic features, reflecting the specific cultural and social context of these indigenous communities.
  3. Small Speaker Populations: Indigenous languages are commonly spoken by relatively small populations, often within a specific indigenous community or group. They may have a limited number of speakers, and in some cases, the language may be endangered or facing the risk of extinction.
  4. Oral Tradition: Many indigenous languages have traditionally been transmitted through oral means rather than through written literature or formal education systems. They have a strong oral tradition, passed down through generations via storytelling, songs, rituals, and everyday communication.
  5. Cultural Identity and Vitality: Indigenous languages play a crucial role in preserving the cultural identity, knowledge systems, and worldview of indigenous communities. They are deeply intertwined with cultural practices, rituals, traditional knowledge, and spiritual beliefs, contributing to the vitality and resilience of the indigenous cultures.

Click here to watch the TEDx talk on what indigenous languages are, what their differences are compared to the well known languages and what we can do to safeguard them from extinction.

According to the United Nations (On Indigenous Languages, n.d.), there are estimated to be more than 4,000 indigenous languages globally, of which 2,100 face the risk of extinction.

2.2.1 Why do indigenous languages matter?

Understanding the value of indigenous languages is not an easy task. In most social contexts, indigenous languages are overlooked. Over the 19th and 20th centuries, colonisation presented indigenous languages as “primitive” or “incomplete” oral systems classified as codes or “dialects”, thus perpetuating a negative interpretation (i.e., a deviation from a “complete” or “high-level” language). At the end of the 20th century and the beginning of the 21st century, as the global paradigm shifted from colonisation to globalisation, the necessity for immediate communication and economic growth has forced indigenous populations to abandon their languages and embrace those that could facilitate their communication with, and social recognition by, speakers outside of their communities (Pine & Turin, 2017).

What Indigenous Languages offer? copyright unknown

However, the reasons for supporting indigenous languages can be presented through a two-axis approach, which is introduced here for the first time. 

A. The first axis is about understanding the human species deeper and better. Indigenous languages are an integral part of the communities they are spoken by. Hence, protecting, safeguarding and studying them is a way for the scientific community to explore new and sometimes exotic cultures and languages. A holistic view on human culture should take into account every aspect and variety of human activity, including indigenous communities’ culture and language. 

B. The second axis is about diversity itself. In a context of favouring individual diversity, yet according to specific standards, societies in the West tend to become monolithic, usually recycling the same trends and standards. However, contrary to this paradigm, indigenous communities could be considered as a source of cultural diversity, by providing new standards to the world community. Therefore, safeguarding them is important not only for these communities themselves, but for humanity as a whole. 

[To learn more about the value of indigenous languages and culture, click here.]

2.2.2 Language rights of Indigenous peoples

A well-established way of protecting a population is to acknowledge its rights. For example, the Convention on the Rights of the Child aims to protect children worldwide from certain threats and abuses. 

By the same token, the United Nations Declaration on the Rights of Indigenous Peoples, which was adopted in 2007, aims to protect the indigenous populations of its member states by providing them with specific rights having a legally validated status. Among other rights, article 13 recognises indigenous peoples’ right “to revitalise, use, develop and transmit to future generations their histories, languages, oral traditions, philosophies, writing systems and literatures, and to designate and retain their own names for communities, places and persons”. Additionally, according to the same article, governments “shall take effective measures to ensure that this right is protected […]”. Furthermore, article 14 mentions that specific measures should be taken by governments, “in conjunction with indigenous peoples, […] particularly [for] children, including those living outside their communities, [allowing them] to have access, when possible, to an education in their own culture and […] language”. 

Finally, in addition to the United Nations Declaration, the Universal Declaration of Linguistic Rights, signed in 1996 with the support of UNESCO, recognises the linguistic rights of every language group, including indigenous communities.

2.3 Endangered languages

Indicatively, it is estimated that one indigenous language dies every two weeks! In general, language death is defined as the situation in which a language ceases to exist, either because its last speaker has died or because its speakers have progressively shifted to another language as a consequence of social pressures, demographic change and external forces (Matthews, 2014).

The status of a language before its total extinction (language death) is called ‘endangered’ [see here Mandana Seyfeddinipur’s talk on the importance of endangered and indigenous languages]. Hence, an endangered language is a language for which there is evidence that, in the near or long-term future, it will or might cease to be spoken (Matthews, 2014), which amounts to its death. 

The following map, provided by the Endangered Languages Project, depicts the endangered languages of the world. [Click here to interact with it]

The ELP map of endangered languages in the world CC BY 4.0 ELP

[You are also encouraged to consult the UNESCO Atlas of the world’s languages in danger.]

Nevertheless, how do we classify a language as endangered? There are some approaches that aim to set specific and objective criteria. What is presented below is the framework embraced and suggested by UNESCO (2003), known as “Language Vitality and Endangerment”. 

This framework is based on three principles: 

  • 1st principle: The assessment of language vitality or endangerment should not be based on a single criterion. A set of criteria should be taken into account, corresponding to as many factors related to language vitality as possible. Otherwise, wrong or misleading conclusions may emerge. 
  • 2nd principle: Language vitality or endangerment is assessed on a continuum ranging from total stability to total extinction. Hence, the assessment of a language cannot be binary (endangered or not endangered), since there are intermediate positions on the scale of vitality/endangerment. 
  • 3rd principle: The scale of vitality/endangerment includes five degrees, which are introduced here:

The Languages vitality – endangerment scale

The criteria set by the framework are the following: 

  1. Intergenerational Language Transmission: this criterion is about whether or not a language is being transmitted from one generation to the next. 
  2. Absolute Number of Speakers: this criterion is about the size of the language community, i.e. the speakers’ community. The smaller the community, the more endangered the language is. [This is the only criterion that is not scalable.]
  3. Proportion of Speakers within the Total Population: this criterion is about the number of speakers in relation to the total population of a group, where a group could be an ethnic, a religious, a regional or a national group. 
  4. Shifts in Domains of Language Use: this criterion is about where, with whom, and for what range of topics a language is used. 
  5. Response to New Domains and Media: this criterion is about whether the language, rather than only the dominant language, comes to be used in new communicative domains as social life dynamically changes. 
  6. Availability of Materials for Language Education and Literacy: this criterion is about the existence of resources for educational purposes. 
  7. Governmental and Institutional Language Attitudes and Policies, Including Official Status and Use: this criterion is about the explicit policies and/or implicit attitudes of governments and institutions towards a language variety.
  8. Community Members’ Attitudes towards Their Own Language: this criterion refers to the attitude of the language community’s members towards their language, since speakers’ stances towards their own language are usually not neutral and vary from positive to negative. 
  9. Type and Quality of Documentation: this criterion is about the quantity and the quality of the documentation that exists for a language. 
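To illustrate the first two principles with the criteria listed above, the following Python sketch builds a vitality profile instead of a single yes/no verdict. It is a minimal illustration under stated assumptions: the function `vitality_profile`, the numeric grades from 0 to 5 (higher meaning more vital) and the summary fields are hypothetical choices made here, not part of the UNESCO framework as presented in this module.

```python
from statistics import median

# The eight criteria treated as gradable in this sketch; "Absolute Number of
# Speakers" is kept apart because the module describes it as not scalable.
SCALABLE_CRITERIA = (
    "Intergenerational Language Transmission",
    "Proportion of Speakers within the Total Population",
    "Shifts in Domains of Language Use",
    "Response to New Domains and Media",
    "Availability of Materials for Language Education and Literacy",
    "Governmental and Institutional Language Attitudes and Policies",
    "Community Members' Attitudes towards Their Own Language",
    "Type and Quality of Documentation",
)


def vitality_profile(grades: dict[str, int], absolute_speakers: int) -> dict:
    """Summarise all criteria together rather than relying on a single one.

    `grades` maps each scalable criterion to an assumed grade from 0 to 5.
    """
    missing = [c for c in SCALABLE_CRITERIA if c not in grades]
    if missing:
        # First principle: an assessment based on too few criteria is refused.
        raise ValueError(f"Missing grades for: {missing}")
    values = [grades[c] for c in SCALABLE_CRITERIA]
    if any(not 0 <= v <= 5 for v in values):
        raise ValueError("Grades must lie between 0 and 5")
    lowest = min(values)
    return {
        "absolute_speakers": absolute_speakers,  # reported, not graded
        "lowest_grade": lowest,
        "weakest_criteria": [c for c in SCALABLE_CRITERIA if grades[c] == lowest],
        "median_grade": median(values),
    }


if __name__ == "__main__":
    # Entirely invented grades for an imaginary language.
    example = {c: 3 for c in SCALABLE_CRITERIA}
    example["Intergenerational Language Transmission"] = 1
    print(vitality_profile(example, absolute_speakers=1200))
```

The profile keeps every criterion visible and points to the weakest ones, which reflects the idea that vitality is a position on a continuum assessed through several factors at once.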

To acquire a deeper understanding of endangered languages, you can watch the lecture given by  Prof. Norvin W. Richards.

2.4 Summing up

This unit started by considering and scientifically defining the two broadly known terms ‘language’ and ‘dialect’, based on a single criterion, that of mutual intelligibility. However, it was clarified that, despite the capability of the criterion to make accurate distinctions, the terms “language” and “dialect” were abandoned due to the political and ideological implicatures deriving from their use. The term ‘language variety’ was suggested instead. Next, particular emphasis was placed on two categories of language variety: indigenous and endangered languages. For indigenous languages, there was an analysis of their value, as well as of the linguistic rights of indigenous communities. Regarding endangered languages, beyond their definition, the criteria according to which a language is deemed endangered were specified.

Unit 3: Data activism for linguistic diversity

Throughout the preceding units, two particular topics were examined: unit 1 emphasised two critical concepts related to data governance (Openness and CARE), while unit 2 placed particular emphasis on language variation, focusing on indigenous and endangered languages. 

This unit rests on two pillars: the recognition that (a) all languages are of equal value and (b) a critical number of languages across the world are endangered, i.e. at risk of extinction. There emerges, therefore, a need not only for language experts but also for citizens to act towards safeguarding linguistic diversity, empowered by open data and the forms of activism connected to them. Hence, unit 3 deals with the central question of what needs to be done to stand up for linguistic diversity in an open governance framework. In particular, this unit introduces data activism as a means of advocacy for safeguarding language variety, and looks into forms of data-driven linguistic activism in a technologically advanced and interconnected era. Beyond that, it explores the relation between data activism and language revitalisation, thus bringing to the fore the value of data activism for linguistic diversity. 

Hence, after the completion of this unit, learners will be able to answer the following questions:

  • What is data activism?
  • How can open data enhance language activism?
  • What forms of open data-driven actions already exist?
  • What is the role of data activism in language revitalisation?

Therefore, the learning value of this unit for learners is:

  • to understand the concept of data activism; 
  • to understand innovative forms of language revitalisation, derived from data activism.

3.1 In the spotlight: Data activism

The digital world has become an integral part of our daily lives. In this interconnected world, our mere presence in the digital realm produces, and is translated into, data. Hence, as open data has revolutionised the way data governance is applied (see unit 1), critical questions have emerged on how to use these data and for which purposes. 

In this challenging socio-technological context, the term “data activism” has been coined by Milan and Gutiérrez to define a “social practice that takes a critical approach to data” (2015). Although there are several definitions of big data, Gutiérrez describes them as “modifiable, distributed and interactive artefacts” (2018). Thus, data activism has arisen either as a reaction to the mass gathering of data or as the use of data for social change (Milan & Gutiérrez, 2015). 

As a social phenomenon on the rise, data activism can also be applied in safeguarding linguistic diversity. Based on Openness and CARE principles, data activism can be approached as a particular and innovative form of language activism that aims to preserve endangered and indigenous languages. Thus, in the next section, we’ll find out how open data has empowered language activists in their quest for preservation of endangered languages.

3.2 Open data for language activism: An overview of initiatives

Data mean power for those who have access to them and know how to obtain knowledge and information by analysing them. Since languages provide a wealth of data, the notion of “linguistic data” has come to the fore. An indicative example of the value of these data is the creation, in 1992, of the Linguistic Data Consortium (LDC), which aims to support language-related education and resources (LDC, n.d.).

As individuals, facilitated by digital technologies, are more drawn into forms of actions for a social purpose, the concept of “language activism” has gained the attention of the linguistic community. Although this unit doesn’t aim to provide a complete understanding of language activism, we need to define this social phenomenon to better frame it in relation with open data. Thus, Combs & Penfield (2012) define language activism as “energetic action focused on language use […] to create, influence and change existing language policies”. 

Thus, in the ocean of “language activism”, is there a role for open data? 

As highlighted in unit 1, open data have become synonymous with the “democratisation” of knowledge diffusion and production (Ricker, Cinnamon & Dierwechter, 2020). The field of linguistics is no exception to this rule, since open data is of equal importance there as well!

The Open Knowledge Foundation (OKF) has set up a dedicated Working Group on open data in Linguistics that aims, among other things, to promote open practices and tools in linguistic data. [For example, you can check the OKF Open Data in Linguistics Working Group here.]

Initiatives like this one showcase the growing interest in the value of open data for individuals and institutions. In the field of language studies, an outstanding initiative based on open linguistic data is the Open Language Archives Community (OLAC) founded in December 2000. In this case, OLAC’s aim is to create a virtual library of language resources, ranging from data to tools and best practice guidelines (OLAC, n.d.). 

In the coming sub-section, we look at hands-on and bottom-up forms of (open) data activism aiming to preserve and enhance language diversity.

3.2.1 The spectrum of data activism forms for language diversity and safeguarding

Imagine you are a member of an endangered and/or indigenous language community that is underrepresented in the digital world, in ways that range from the lack of webpage translations in your language to the limited access this creates to crucial digital security instructions. What would you do to draw attention to this issue and enable societal change? In this case, data can be quite useful in a) making your cause known to the public and b) filling the gap in language representation in the online social environment.

Whether open data referring to your community’s linguistic identity already exist, or new data are collected by individuals and groups committed to safeguarding language diversity, data can enable social awareness and potential transformations in the lives of affected communities.

An indicative example, although not linked with the field of language studies, is the Data 4 Black Lives initiative, which uses data to empower Black communities in their struggle for social justice. Therefore, in an open governance paradigm, enhancing the social and digital representation of a language through open data can serve as a stimulus for civil society, academia, and other institutions or organisations to create or (re-)adapt their policies towards a commitment to language diversity. In this context, several data-driven forms of activism for language safeguarding and diversity have emerged, drawing our attention to the spectrum of diversified means of data-based action. 

Although not exhaustive, the list below presents innovative, language-related forms of data activism used by individuals, communities and institutions.

A. Crowdsourcing

Have you ever thought of creating a language repository where you and your community members can keep alive the knowledge and characteristics of your native language? Are you a researcher who is interested in mapping language changes in a community but does not have enough available data? Then crowdsourcing is a valuable data-driven solution for you!

Based on the advancement of new technologies and the power of crowds to contribute collectively to a common cause (Howe, 2006), crowdsourcing has emerged as a way to a) advocate for language inclusion and diversity and b) enrich existing perceptions and knowledge of languages (Munro et al. 2010). An indicative example is the crowdsourcing-powered Dialäkt Äpp (DÄ) that reached 60,000 German-speaking Swiss citizens who detected dialect variants in their region (Leemann et al. 2016).

The Dialäkt Äpp logo

To learn more about crowdsourcing, check module e of the BOLD project module series!

B. Datathons

Datathons leverage the potential of data and data-related tools to solve a pressing problem (Datacamp, 2023). Datathoners usually work in groups with distinctive roles and tasks that lead to the creation of a solution to the problem.

The Wikimedia Hackathon 2015 in Lyon CC BY-SA 2.0 Chris Koerner

In the realm of languages, datathons can enable individuals and communities to work on data-driven solutions to safeguard their linguistic identity. Although few datathons focus on linguistic matters, the Summer Datathon on Linguistic Linked Open Data (SD-LLOD) is a remarkable initiative that combines advanced technologies with open linguistic data provided by users. The SD-LLOD datathon is organised by NexusLinguarum, the European network for Web-centered linguistic data science.

C. Citizen science

Have you seen people in your neighborhood mapping, with their mobile phone or with pen and paper, a particular bird or plant species at a given time of the year? Have you heard stories of communities collecting data on air or water pollution to advocate for socio-environmental justice?

These types of data-driven actions, in which people help foster scientific research and advocacy on a common issue, are better known as “citizen science”. Although we usually think of citizen science as a practice applied to environmental problems, the rise of citizen social science gives prominence to social issues that emerge in communities (Tauginienė et al. 2020). Thus, setting up a citizen science project to monitor, analyse and advocate for the safeguarding and inclusion of your community’s linguistic identity is an innovative and data-based way to promote your cause through the collective mobilisation and participation of members of your community. 

To learn more about citizen science for linguistic diversity, check module a of the BOLD project module series.

Beyond data activism: digital actions for linguistic diversity

Ensuring linguistic and cultural diversity online becomes a task that is usually carried out by members of linguistic communities which remain underrepresented in the physical as well as the digital world. Motivated by the lack of official public initiatives and support, as well as the long-established socio-linguistic injustices, individuals organise to create space and foster presence of their linguistic identity in the digital realm. 

One such initiative is the Dagbani Wikimedians User Group, which aims to record Ghana-based Dagbani words and their meanings online. Likewise, the Igbo Wikimedians User Group focuses on fostering the presence of the Igbo language on the web, while particular emphasis has been given to the needs of minor Wikimedia Language Communities in terms of the digital security knowledge and competences needed to safely navigate the web (Friday & Malvido, 2023).

Finally, as language is an integral part of communities’ heritage, UNESCO, in collaboration with Global Voices, developed a Language Digital Activism Toolkit to enhance multilingualism and language safeguarding in the digital sphere. The toolkit was created in 2021, with UNESCO offering a series of workshops titled “Strengthening your language on the internet through digital activism” to speakers or users of an indigenous, minority, or low-resource language (Rising Voices, n.d.).

3.3 Language revitalisation as an aspect of data activism

In the previous sub-unit, we explored the multiple forms of data activism for linguistic diversity, and we also mentioned the ability of data to trigger societal change towards defending and safeguarding language diversity. In this context, “language revitalisation” refers to a socially collective effort to revive “the practice of lesser-used languages” (Eisenlohr, 2004), especially regarding endangered and/or indigenous languages. 

Data, as pointed out in a series of studies (Galla, 2016; Elliott, 2021), play a major role in the process of revitalisation (cf. Technology-enhanced language revitalisation, Penfield et al. 2006). More specifically, revitalisation should be perceived as a multilevel and multidimensional process that includes several means of enhancing advocacy and action, such as: 

  1. The existence of language data: The quality as well as the quantity of data are critical factors since revitalisation is based on the knowledge extracted from them. 
  2. Open and CARE data: Through open data, society has both the opportunity and the ability to contribute and act more actively, while the CARE framework safeguards principles such as accountability and transparency in research, especially towards indigenous and minority populations. 
  3. Open and CARE tools: designed to be used by citizens for the creation of linguistic resources and educational materials. 
  4. Active participation and engagement: of community and non-community members, with or without a scientific background, who recognise the value of linguistic variety. 
  5. The existence of an Open and CARE infrastructure: supporting the revitalisation process.

In light of all the above, Comajoan-Colomé and Coronel-Molina (2020) outline the latest trends in language revitalisation, approaching it as a multidisciplinary field. Below, we briefly introduce only those trends that are directly compatible with the framework of language activism. 

  1. Top-down and Bottom-up approaches: Language revitalisation efforts tend to incorporate both top-down (initiated by governments, institutions, and policies) and bottom-up approaches (driven by community-led initiatives and grassroots movements). This allows for a more collaborative and transparent approach to revitalisation, compatible with Openness and CARE principles.
  2. Integration of technology: Technologies, including social networks, also play a role in language revitalisation efforts. They can be used to promote and disseminate a revitalisation project, increasing its transparency and accountability and attracting more participants. Digital technologies can also facilitate and support language learning and the creation of (digital) resources.
  3. Focus on transmission beyond the intergenerational: While intergenerational transmission remains a cornerstone, language revitalisation efforts also draw on additional means of transmission, namely education, technology, social media, and other methods used to ensure the language’s vitality and its transmission to future generations.
  4. Increased understanding of language endangerment: Knowledge of the causes of language endangerment has grown. A deeper understanding allows for more targeted and tailored revitalisation efforts based on the specific challenges faced by endangered languages, such as natural disasters, wars, repression, and pressure from more dominant language varieties.

To learn more about language revitalisation, see Sapién and Hirata-Edds (2017), who explore how best to use available documentation to develop curricula, lessons, and materials that support language revitalisation.

Finally, in place of a conclusion for this section, we recommend watching the documentary series ‘Voices on the Rise’, which presents a bottom-up initiative by an indigenous community to revitalise its language and culture.

3.4 Summing up

In this unit, we explored “data activism” and “open data” for linguistic diversity and language revitalisation. By looking closely at those terms and identifying key initiatives carried out by individuals, communities and institutions, we aimed to provide an overview of what data-driven actions for language safeguarding can offer as a means of awareness raising, advocacy and capacity building. Furthermore, unit 3 outlined innovative forms of digitally-enhanced data activism that reflect the increasing potential and use of digital technologies for language revitalisation. Through virtual communities of volunteers, experts and non-experts alike, people join forces, driven by their determination to keep a particular language alive. As Penfield et al. (2006) highlighted, “Advocacy is born of passion for the cause of an endangered language and advocates need to be strong and clear in their conviction”.

Acknowledgements

The team would like to sincerely thank Mr. Stavros Samiotis, former Web2Learn employee, who contributed to the preparation of the module at various stages. We acknowledge his contribution here and thank him publicly.

Quiz

Bibliography

Allam, Z., & Dhunny, Z. (2019). On big data, artificial intelligence and smart cities. Cities, 81, 80-91.

Australian Research Data Commons. (n.d.). ARDC. Retrieved from CARE Principles: https://ardc.edu.au/resource/the-care-principles/

Benkler, Y. (2006). The Wealth of Networks: How Social Production Transforms Markets and Freedom. Connecticut: Yale University Press.

Budiman, A. (2023). Sinaumedia.com. Retrieved from Social Action: Definition, Types and Examples: https://sinaumedia.com/social-action-definition-types-and-examples/

Center for the Study of Citizenship. (n.d.). What is citizenship? Wayne State University. Retrieved from https://csc.wayne.edu/what-is-citizenship

Combs, M. C., & Penfield, S. D. (2012). Language activism and language policy. The Cambridge handbook of language policy, 461-474. https://old.coe.arizona.edu/sites/default/files/language_activism_and_language_policy.pdf 

Data For Black Lives. (n.d.). About. https://d4bl.org/about.html 

Datacamp. (February 2023). How to plan a successful datathon. https://www.datacamp.com/blog/how-to-plan-a-successful-datathon 

De Korne, H. (2021). Language Activism. Berlin: De Gruyter Mouton.

Delli Carpini, M., Lomax Cook, F., & Jacobs, L. (2004). Public deliberation, Discursive participation and Citizen engagement: A Review of the Empirical Literature. Annual Review of Political Science.

Eisenlohr, P. (2004). Language Revitalization and New Technologies: Cultures of Electronic Mediation and the Refiguring of Communities. Annual Review of Anthropology. Vol. 33:21-45. https://www.annualreviews.org/doi/abs/10.1146/annurev.anthro.33.070203.143900 

Elliott, R. (2021). Technology in Language Revitalization. Revitalizing Endangered Languages, 297. https://www.cambridge.org/core/services/aop-cambridge-core/content/view/ADCBBA31190F259BA13525C769E92A9A/9781108485753AR.pdf/Revitalizing_Endangered_Languages.pdf?event-type=FTLA 

Friday, T., Malvido, M. (January 22, 2023). Igbo Wikimedians: Digital safety challenges for activists preserving their language through open knowledge. Rising Voices. https://rising.globalvoices.org/blog/2023/01/22/igbo-wikimedians-digital-safety-challenges-for-activists-preserving-their-language-through-open-knowledge/

Galla, C.K. (2016). Indigenous language revitalization, promotion, and education: function of digital technology, Computer Assisted Language Learning, 29:7, 1137-1151, DOI: 10.1080/09588221.2016.1166137 

Gutiérrez, M. (2018). Data activism and social change. Palgrave Studies in Communication for Social Change. https://link.springer.com/book/10.1007/978-3-319-78319-2 

Howe, J. (2006, January 6). The rise of crowdsourcing. Wired magazine, 14(6), 1-4. Retrieved from http://www.wired.com/wired/archive/14.06/crowds_pr.html

Leemann et al. (January 4, 2016). Crowdsourcing Language Change with Smartphone Applications. PLOS ONE. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0143060&utm_source=mandiner&utm_medium=link&utm_campaign=mandiner_202110 

Linguistic Data Consortium. (n.d.). About. https://www.ldc.upenn.edu/about 

Lirri, E. (November 8, 2021). How Digital Activism Is Helping African Languages Be Part of a Multilingual Web. CIPESA. https://cipesa.org/2021/11/how-digital-activism-is-helping-african-languages-be-part-of-a-multilingual-web/ 

Lynganor, K. (2021). Norwegian.online. Retrieved from How similar are Danish, Norwegian and Swedish?: https://norwegian.online/how-similar-are-norwegian-swedish-danish/

Matthews, P. (2014). The Concise Oxford Dictionary of Linguistics. Oxford: Oxford University Press.

Meijer, A., Lips, M., & Chen, K. (2019). Open Governance: A New Paradigm for Understanding Urban Governance in an Information Age. Frontiers in Sustainable Cities, 3(1).

Milan, S., Gutiérrez, M. (2015). Citizens’ Media Meets Big Data: The Emergence of Data Activism. Mediaciones (14).

Munro et al. (June 2010). Crowdsourcing and language studies: the new generation of linguistic data. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pages 122–130, Los Angeles, California. https://aclanthology.org/W10-0719.pdf

Nordquist, N. (2020, August 26). Definition and Examples of Language Varieties. ThoughtCo.

Nyayo Discovery. (n.d.). Nyayo Discovery. Retrieved from Indigenous Knowledge: https://www.nyayodiscovery.com/indigenous-knowledge

Open Data Charter. (n.d.). Principles. https://opendatacharter.net/principles/  

Open Knowledge Foundation (n.d.). Open Data Handbook. What is open data? https://opendatahandbook.org/guide/en/what-is-open-data/  

Open Language Archives Community: http://www.language-archives.org/documents.html 

Penfield et al. (2006). Technology-enhanced language revitalization. https://aildi.arizona.edu/sites/default/files/technology_manual_2006_0.pdf 

Pentland, A. (2009). Reality Mining of Mobile Communications: Toward A New Deal On Data. Social Computing and Behavioral Modeling. Boston, MA: Springer.

Pine, A., & Turin, M. (2017). Language Revitalization. In Oxford Research Encyclopedias, Linguistics. Oxford: Oxford University Press.

Ricker, B., Cinnamon, J. and Dierwechter, Y. (2020), When open data and data activism meet: An analysis of civic participation in Cape Town, South Africa. The Canadian Geographer / Le Géographe canadien, 64: 359-373. https://doi.org/10.1111/cag.12608 

Rifkin, J. (2014). The Zero Marginal Cost Society. The internet of things, the collaborative commons and the eclipse of capitalism. New York: Palgrave Macmillan.

Rising Voices. (n.d.). Language Digital Activism Workshops for Asia. https://rising.globalvoices.org/language-digital-activism-workshops-for-asia/ 

Russo Carroll, S., Herczog, E., Hudson, M., Russell, K., & Stall, S. (2021). Operationalizing the CARE and FAIR Principles for Indigenous data futures. Scientific Data, 8.

Sapién, R.-M., & Hirata-Edds, T. (2017). “What can I do with this?” Using existing language documentation for teaching and learning. 5th International Conference on Language Documentation and Conservation (ICLDC). Honolulu: University of Hawaiʻi at Mānoa.

SD-LLOD-23. (n.d.). 5th Summer Datathon on Linguistic Linked Open Data. https://datathon2023.jezik.hr/ 

Stigler, G. (1961). The Economics of Information. Journal of Political Economy, 69(3), 213-225.

Tauginienė et al. (2020). Citizen science in the social sciences and humanities: the power of interdisciplinarity. Palgrave Commun 6, 89. https://doi.org/10.1057/s41599-020-0471-y 

Trudgill, P. (1974). Linguistic Change and Diffusion: Description and Explanation in Sociolinguistic Dialect Geography. Language in Society, 3(2), 215-246.

United Nations. (n.d.). Indigenous languages. The United Nations Permanent Forum on Indigenous Issues. United Nations.

UNESCO. (2003). Language vitality and endangerment. International Expert Meeting on the UNESCO Programme Safeguarding of Endangered Languages, Paris.

Universal Declaration of Linguistic Rights. (1998). Inresa: Institut d’Edicions de la Diputació de Barcelona.

Waterloo Institute for Social Innovation and Resilience. (n.d.). University of Waterloo. Retrieved from Indigenous Innovation: https://uwaterloo.ca/waterloo-institute-for-social-innovation-and-resilience/research/indigenous-innovation

Zhao, Z., & Zhang, Y. (2020). Impact of Smart City Planning and Construction on Economic and Social Benefits Based on Big Data Analysis. Cognitive Computing Solutions for Complexity Problems in Computational Social Systems.
