Data science has been a well-known and much-discussed term, especially in recent years. Given how widely the term is used, one would expect a clear definition of it and of the underlying concepts. However, the existing definitions differ, as they tend to focus on different aspects of the field, such as the interdisciplinary domains from which data science derives (Oracle, 2022), the techniques used (Stedman, 2021), the aim of the field (AUEB, 2022; IBM, 2022) or a combination of these. Some argue that the absence of an unambiguous definition is due to the unclear boundaries between different scientific domains and related concepts (Provost et al., 2013).
In an attempt to unify the existing definitions into one that encompasses all these aspects, data science may be defined as a "hybrid field that combines elements of different domains, including but not limited to mathematics, statistics, and informatics, by using scientific methods, mainly computational, mathematical, and statistical, aiming to provide insights, answer questions, develop predictions, communicate results, and generally extract knowledge, based on collected data" (Oracle, 2022; Dhar, 2013; IBM, 2022; Stedman, 2021; AUEB, 2022).
Regardless of whether a precise definition of data science is essential, a clear definition helps in understanding the roles and responsibilities of a data scientist. Just as definitions of data science differ, the data scientist's work may entail different duties and responsibilities, with the differences centring on how commitments and functions are divided between data scientists, data analysts, data system developers, data engineers (Royal Society, 2019; Berkeley, 2021), and other related professions. Data scientists practise data science, and as such, the definition provided for data science applies directly to these professionals. Basing the roles and responsibilities further on the data science lifecycle (Berkeley, 2021), a data scientist is a professional who applies data science (as defined above) and performs the activities of capturing, maintaining, processing and analysing data and communicating the outcomes. Depending on the field in which the data scientist works, the outcome may be data-driven decision making or scientific discovery. Apart from the purely technical aspects of the job, a data scientist should also possess traits such as critical thinking, curiosity, communication and storytelling in order to undertake a complete analysis and convey the messages and outcomes clearly (Violino, 2018; Stedman, 2021).
Data science applications are numerous, and virtually all sectors, whether business, scientific or public, may create value by employing data scientists. This broad spectrum of applications, combined with the present data deluge, makes data science a highly sought-after profession. Research found that demand for data scientists rose by nearly 1300% from 2013 to 2018-2019 (Royal Society, 2019), with a projected further increase of nearly 30% until 2026 (Berkeley, 2021), and that demand for the skills most closely linked to the data scientist's skill-set increased by between 4% and 80% from 2013 to 2018-2019 (Royal Society, 2019). Similarly, the big-data era further increases the data-related expertise that different sectors need.
Even though the statistics on demand indicate an ever-expanding need for data scientists, there are dissenting opinions. Given the broad spectrum of roles and responsibilities of data scientists and the range of domains that data science covers, one of the main points of criticism concerns the number of hats that a professional must wear to achieve the set objectives, as well as the multilevel knowledge that one must acquire to fulfil the data scientist's role (Yildirim, 2020). Another point of criticism concerns the need for data scientists in the near future, the main argument being that the continuous automation of data science tasks may curb the growing need for data scientists (Saxeva, 2021).
In reality, both of the above arguments have a solid foundation, but there is room for further elaboration, and they may even neutralize each other. The breadth of responsibilities and fields is the main strength of data science. While this breadth can be both an advantage and a disadvantage, one should not perceive data science as "holier-than-thou", as many professions require a similarly broad spectrum of knowledge, such as doctors, IT technicians, biomedical engineers, and others. Additionally, the automation of data-specific tasks has limitations: there is no general artificial intelligence, only task-specific artificial intelligence that works well on well-defined tasks; data sources are diverse; data cleansing needs human intervention; ethical issues in data science can be tackled only by humans; and data bias is present. For these reasons, full automation of data science cannot be achieved at this point in time. The fast-changing nature of big data further supports this point.
To summarize, data science is a field that draws on a variety of other domains, in which data scientists use different techniques to provide value to stakeholders through data. The field is flourishing but fast-paced, considering the rate of technology adoption and the adaptation of tasks (World Economic Forum, 2020). Professionals should be dedicated and stay on top of forthcoming changes, and the data suggest that higher demand is yet to come.
Having different roles and responsibilities within the greater data science field may result in conflicting priorities and duplication of effort among team members, which can be described, in general, as "common internal difficulties" (Johns Hopkins University, 2022). The fact that a step of the Data Analytics Lifecycle does not need to be fully completed before moving to the next one, as long as enough progress has been made (EMC Education Services, 2015), adds to the complexity and the overlaps that may exist.
The human factor plays a crucial role in streamlining the processes and minimising conflicts. The leaders of data science teams have the primary responsibility of ensuring a smooth implementation of the projects within their teams. This can be achieved by creating internal documents that clearly outline roles and responsibilities, by creating Standard Operating Procedures for the Data Analytics Lifecycle and, at the same time, by staying on top of every arising conflict and working towards de-escalation (Johns Hopkins University, 2022; Day R., 2021). In addition, ISO AWI TR 23347, currently under development with the title "Statistics — Big Data Analytics — Data Science Life Cycle" (International Organization for Standardization, 2022), may provide more clarity on the Data Science Life Cycle and serve as an essential guidance document for data science operations, but this is yet to come.
This question is of general interest, especially when trying to predict the future of data-science-related functions (after all, predicting the future is what data science is all about). As correctly mentioned, there are different roles and responsibilities, and thus different "work titles", within data science teams, with some being more susceptible to automation than others. For clarity, this projection is based on the data science roles described by Berkeley: Data Scientist, Data Analyst, and Data Engineer (Berkeley, 2021).
The growth rates of the above-mentioned professions differ: from 2013 to 2018, data scientist roles grew by nearly 1300%, data engineering roles by slightly over 450% and data analyst roles by nearly 45% (Royal Society, 2019). This indicates that the needs for different data-related professions, and their growth rates, differ, which may imply underlying conditions such as a shift of needs from one profession to another or the automation (through AI or other means) of specific roles. Even though these are assumptions and do not answer the question directly, they provide fundamental insight into how fluid these roles have been over the years.
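To make the three cumulative figures easier to compare, they can be converted into an equivalent constant yearly growth rate. The snippet below is only a minimal sketch: it assumes that the reported percentages are cumulative increases over a five-year span (2013 to 2018), it uses the rounded figures quoted above, and the function name `annualised_growth` is hypothetical.

```python
def annualised_growth(total_growth_pct: float, years: int) -> float:
    """Return the constant yearly growth rate (in %) that compounds
    to `total_growth_pct` over `years` years."""
    if years <= 0:
        raise ValueError("years must be positive")
    multiplier = 1 + total_growth_pct / 100  # e.g. 1300% growth -> 14x
    return (multiplier ** (1 / years) - 1) * 100


# Rounded cumulative growth figures quoted in the text (Royal Society, 2019).
reported_growth_pct = {
    "data scientist": 1300,
    "data engineer": 450,
    "data analyst": 45,
}
YEARS = 5  # assumption: the period 2013 to 2018 is treated as five years

annual_rates = {
    role: annualised_growth(pct, YEARS)
    for role, pct in reported_growth_pct.items()
}

for role, rate in annual_rates.items():
    print(f"{role}: ~{rate:.1f}% per year")
```

Under these assumptions the rates come out at roughly 70%, 41% and 8% per year respectively, which shows the gap between the professions on a common yearly scale.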
Predictive information about the future of automation is limited, mainly due to the nature of technical progress: although it follows patterns, it is hard to pin down to specific time-frames and can only be described in terms of outcomes (Doyne Farmer & Lafond, 2016). As an example, no one could have predicted how quickly we would go from the first successful aeroplane flight to reaching the moon. One can, however, investigate the changes that AI has brought to data-specific professions so far.
As outlined in the initial post, data-related functions are not well-defined, and different tasks may be handed over to different data professionals. Moreover, AI can automate specific tasks, but not every task needed to complete a data science project. Questions such as "what is the business problem" or "what are the assumptions", along with issues such as ethical considerations while implementing data science projects, cannot be automated by machines yet (Adilin, 2021; Li, 2020). Furthermore, the advancement of AI, like every other technological advancement, may "destroy" some professions, but others will flourish, and new ones may be created (Adilin, 2021; Nunes, 2021).
In a nutshell, different professions will be affected differently (and disproportionately), depending on the type of AI advancement and how it relates to each data-related profession. However, the factors are multidimensional, and thus predicting change is complex and uncertain.
To summarize the previous points, data science is an assemblage of tools, techniques, methods, and experience from different scientific fields, combined with domain knowledge, and aims to provide value to stakeholders through data (Oracle, 2022; Dhar, 2013; IBM, 2022; Stedman, 2021; AUEB, 2022). Thus, data scientists have many different roles, and their area of expertise can be classified as vertical or horizontal (Mkonto, 2022), with the additional classification of full-stack or specialized data professionals (Kyriacou, 2022; Foley, 2022).
In addition to the above, technological advancements such as cloud computing, AutoML, UiPath, and the expansion of databases (Mkonto, 2022; Kyriacou, 2022) change the nature of data-related tasks from "implementation" to "system handling" (Kyriacou, 2022), which may represent a future trend.
All these changes and the growth rate of data-scientist demand (Royal Society, 2019) indicate that data science is an ever-growing field. This does not come without opposing opinions, which present data science's "dark side" and argue that data science should not be considered the "emperor's new clothes", mainly due to the broad area of expertise that professionals need, as well as the expansion of AI, which may take over some of the tasks typically handled by data scientists (Yildirim, 2020; Saxeva, 2021). Even though automation poses a genuine risk, data science is heavily dependent on human intervention, including but not limited to legal and ethical considerations, such as GDPR compliance, and domain-specific problems, which can only be analyzed in an unstructured way that AI does not yet support (Koilakos, 2022).
Finally, data are often maintained in unclear formats, duplicated and poorly preserved. Methods such as Master Data Management (Profisee, 2022) can standardize the maintenance of such data, thus enhancing automation capacities, but they are heavily dependent on the human factor.
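As a purely illustrative sketch of the kind of standardization mentioned above (not the method of Profisee or of any specific Master Data Management product), duplicated records can be collapsed by comparing a normalised key. The field names `name` and `email`, the sample records and the helper functions `match_key` and `deduplicate` are all hypothetical; in practice, deciding which fields identify a record and which copy to keep still requires human judgement.

```python
import re


def match_key(record: dict) -> tuple:
    """Build a comparison key that ignores case and stray whitespace."""
    name = re.sub(r"\s+", " ", record["name"]).strip().lower()
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(records: list) -> list:
    """Keep the first record seen for each normalised key."""
    kept = {}
    for record in records:
        # setdefault stores the record only if the key is not present yet
        kept.setdefault(match_key(record), record)
    return list(kept.values())


# Hypothetical sample data: the first two rows describe the same person.
records = [
    {"name": "Ada  Lovelace", "email": "ADA@example.org"},
    {"name": "ada lovelace ", "email": "ada@example.org"},
    {"name": "Alan Turing", "email": "alan@example.org"},
]

unique_records = deduplicate(records)
print(len(unique_records), "unique records out of", len(records))
```

Exact matching on a normalised key is the simplest possible rule; records that differ by a typo or a changed email address would still need to be reviewed by a person.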
All the above elements indicate that data science will flourish in the immediate future. Regardless of advancements in automation techniques, there will always be a need for data professionals to act as links between stakeholders and the tools used (Kyriacou, 2022).
Data protection has been one of the most debated topics in recent years and is inextricably linked
with data-related professions such as data scientists, data analysts and others. Some (Gutwirth, S. et al., 2009) even
considered it a utopian but fundamental right connected with the level of democracy, long before data protection was
strictly incorporated into the European Union's legislation as a unified document. Since then, data protection has
become more important for governments, with essential steps such as the General Data Protection Regulation (GDPR)
of the European Union, ISO 27001 for Information Security Management, and steps comparable to the GDPR
being taken in the United States, a recent example being the California Consumer Privacy Act (Clark, B.).
The GDPR revolutionised data protection regulation in the EU, as it repealed the obsolete Data
Protection Directive 95/46/EC (European Union, 1995) and unified the numerous laws which, until the issuance
of the GDPR, were country-dependent, had no specific commonalities and lacked the essential element of cross-country
intra-union law-enforcement and judicial cooperation. The GDPR, being a regulation,
is considered an act legally binding for all member states, with obligatory implementation through embedded
national laws recognising the regulation (European Union). With its issuance, specific provisions were put in place, such
as administrative fines and penalties for non-compliance with the GDPR, joint operations of bodies of different
union members, specific corporate obligations and, most importantly, the rights of data subjects
(European Union, 2016).
The United Nations High Commissioner for Refugees (UNHCR) has developed and has been
implementing the "Policy on the Protection of Personal Data of Persons of Concern to UNHCR"
(hereafter the "Policy") since 2015, as well as the "Data Transformation Strategy" since 2020.
Both documents provide insights into data protection issues, with the former being considered
the GDPR of UNHCR. Similarly to the GDPR, UNHCR's Policy discusses areas such as the basic principles
(section 2 of the Policy and Chapter 2 of the GDPR), the rights of data subjects (section 3 of
the Policy and sections 2 to 4 of Chapter 3 of the GDPR), the prerequisites for safe processing and
transferring (sections 4 to 6 of the Policy and Chapter 5 of the GDPR) as well as accountability
and supervision matters (section 7 of the Policy and Chapter 6 of the GDPR) (European Union, 2016; UNHCR, 2015).
The two documents differ in nature, as one is a legally binding union document while the other is
a policy. However, as described above, they share several similarities and have comparable components.
The Policy does not mention Master Data per se. However, with the proper context knowledge and the provided
definition of "Personal Data", one can understand that Master Data are mainly personal data. Thus, their processing
should be in line with the Policy. Moreover, provisions such as the data protection focal point, the data
controller (the most senior staff member in each operation) and the data protection officer (equivalent to the data protection
officers under the GDPR) contribute significantly to safeguarding the personal data of Persons of Concern.
Additionally, given UNHCR's mandate, which is the protection of the world's most vulnerable, protecting
their personal data is of utmost importance, as data breaches may endanger their lives and well-being. To this
end, UNHCR, in addition to the data protection officers, has legal professionals in each operation to
deal with data sharing and data protection issues, as well as technical measures to limit the possibility and impact
of data breaches (UNHCR, 2016).
Finally, as an additional good practice, UNHCR has appointed personnel on data protection issues
in two of its most essential legal units, the Division of International Protection and the Legal
Affairs Service. The former is, inter alia, responsible for protecting persons of concern, while
the latter handles all non-mandate-related legal issues. This ensures that specialised personnel
keep the organisation's primary functions compliant with data protection requirements. On the technical
side, measures that are strictly internal and for limited circulation are in place,
along with data audits, ensuring that the processing is legitimate and fair.
The "Evaluation of UNHCR's data use and information management approaches" (UNHCR, 2019), which resulted
in the issuance of the Data Transformation Strategy (UNHCR, 2019), found that "There needs to be a far
deeper understanding of data protection and data access, especially personal data". To improve its data
protection, UNHCR has introduced mandatory training, greatly enhanced its data protection capabilities,
and adhered to information and cybersecurity standards, including "privacy by design, by default". The level
of improvement will only become known through the upcoming evaluation report. The 2019 evaluation found no incidents
related to data breaches in the organisation.
To conclude, due to UNHCR's mandate, data protection is one of the most critical components of the organisation's
day-to-day functions. The presence of policies, regulations and officers in charge of data protection issues
significantly improves the organisation's stance on such matters. Moreover, granting access to personal data
on a need-to-know basis limits unintentional data breaches.
Abstract
The Welsh transportation system was initially created to move agricultural products and was significantly shaped by Wales's geological characteristics and its industrial history, starting from the early 19th century under the influence of the industrial revolution (Welsh Assembly Government, 2008). By cross-referencing the maps of population density, rail network and road network, one can quickly notice the relationship between population density and the existing infrastructure.
Abstract
The Welsh transportation system has followed the classic definitions of transportation throughout the years (Cooley, 1894; Hall, 1999; Koilakos, 2022). It was initially created to move agricultural products, and it was also shaped by the later industrial revolution (Welsh Assembly Government, 2008; Koilakos, 2022). The maps below present Wales's population density in relation to its topography, population density in relation to the transportation system, and topography in relation to the transportation system. From these it becomes apparent that the Welsh transportation system is also strongly shaped by current geomorphological and demographic characteristics.