Panagiotis Koilakos

Associate Operational Data Management Officer, UNHCR

MSc in Data Science - The Data Professional

In general

The future of the Data Scientist

Data science has been a renowned and much-discussed term, especially in recent years. Given how widely the term is used, one would expect a clear definition of it and of its underlying concepts. However, the existing definitions differ, as they tend to focus on different aspects of the field, such as the interdisciplinary domains from which data science derives (Oracle, 2022), the techniques used (Stedman, 2021), the aim of the field (AUEB, 2022; IBM, 2022) or a combination of these. Some argue that the absence of an unambiguous definition is due to the unclear boundaries between different scientific domains and related concepts (Provost & Fawcett, 2013).

 In an attempt to unify the existing connotations into a definition that encompasses all the different aspects of the matter at hand, data science may be defined as a "hybrid field that combines elements of different domains, including but not limited to mathematics, statistics, and informatics, by using scientific methods, mainly computational, mathematical, and statistical, aiming to provide insights, answer questions, develop predictions, communicate results, and generally extract knowledge, based on collected data" (Oracle, 2022; Dhar, 2013; IBM, 2022; Stedman, 2021; AUEB, 2022).

 Regardless of whether precisely defining data science is essential, a clear definition helps in understanding the roles and responsibilities of a data scientist. As with the definitions of data science, the data scientist's work may entail different duties and responsibilities, with the differences lying in how commitments and functions are divided between data scientists, data analysts, data system developers, data engineers (Royal Society, 2019; Berkeley, 2021) and other related professions. Data scientists practise data science, and as such, the definition provided for data science applies directly to these professionals. By further basing the roles and responsibilities of data scientists on the data science lifecycle (Berkeley, 2021), a data scientist is a professional who applies data science (as defined above) and performs the activities of capturing, maintaining, processing and analysing data and communicating the outcomes. Depending on the field in which the data scientist works, the outcome may be data-driven decision making or scientific discovery. Apart from the purely technical aspects of the job, a data scientist should also possess traits such as critical thinking, curiosity, communication and storytelling in order to undertake a complete analysis and convey the messages and outcomes clearly (Violino, 2018; Stedman, 2021).

 Data science applications are numerous, and virtually all sectors, whether business, scientific or public, may create value by employing data scientists. This broad spectrum of applications, combined with the present data deluge, makes data scientists highly sought-after professionals. Research found that demand for data scientists rose by nearly 1300% from 2013 to 2018-2019 (Royal Society, 2019), with a projected further increase of nearly 30% until 2026 (Berkeley, 2021), and that demand for skills closely linked to the data scientist's skill set increased by between 4% and 80% from 2013 to 2018-2019 (Royal Society, 2019). Similarly, the big-data era further increases the data-related expertise that different sectors need.

 Even though the statistics on the increase in data science demand indicate an ever-expanding need for data scientists, there are dissenting opinions. Due to the broad spectrum of roles and responsibilities of data scientists and the breadth of the domains that data science pertains to, one of the main points of criticism focuses on the significant number of hats that a professional must wear in order to achieve the set objectives, as well as on the multilevel knowledge that one must acquire to fulfil the data scientist's role (Yildirim, 2020). Another point of criticism questions the need for data scientists in the near future, arguing that the continuous automation of data science tasks may curb the growing demand for them (Saxena, 2021).

 In reality, both of the above arguments have a solid foundation, but this does not mean that there is no room for further elaboration, and they possibly neutralize each other. The main advantage of data science lies in its broad range of responsibilities and fields. While this breadth can be both an advantage and a disadvantage, one should not perceive data science as "holier-than-thou", as many professions, such as doctors, IT technicians and biomedical engineers, require a similarly broad spectrum of knowledge. Additionally, the automation of data-specific tasks has limitations. Full automation of data science cannot be achieved at this point in time, given issues such as the absence of general artificial intelligence (only task-specific artificial intelligence exists, which works well only on well-defined tasks), the diversity of data sources, the need for human intervention in data cleansing, the ethical issues in data science, which can be tackled only by humans, and the presence of data bias. The fast-changing nature of big data further supports this point.

 To summarize, data science is a field that draws on a variety of other domains, and within it data scientists use different techniques to provide value to stakeholders through data. The field is flourishing but fast-paced, considering the rate of technology adoption and adaptation of tasks (World Economic Forum, 2020). Professionals must be dedicated and stay on top of forthcoming changes, and the data suggest that higher demand is yet to come.


Oracle. (2022) What is data science?. Available from: https://www.oracle.com/data-science/what-is-data-science/ [Accessed 16 March 2022].

Stedman, C. (2021) What is data science? The ultimate guide. Available from: https://www.techtarget.com/searchenterpriseai/definition/data-science [Accessed 17 March 2022].

AUEB. (2022) About Data Science. Available from: https://datascience.aueb.gr/page.php?id=109 [Accessed 17 March 2022]

IBM. (2022) Data Science. Available from: https://www.ibm.com/cloud/learn/data-science-introduction [Accessed 17 March 2022]

Provost, F., Fawcett, T. (2013) Data science and its relationship to big data and data-driven decision making. Big Data 1(1): 51-59. DOI: 10.1089/big.2013.1508

Dhar V. (2013) Data Science and Prediction. Communications of the ACM 56(12): 64-73. DOI:10.1145/2500499

Royal Society. (2019) Dynamics of data science skills: How can all sectors benefit from data science talent?. Available from: https://royalsociety.org/-/media/policy/projects/dynamics-of-data-science/dynamics-of-data-science-skills-report.pdf?la=en-GB&hash=212DAE7D599B0A48687B372C90DC3FEA [Accessed 10 March 2022]

Berkeley. (2021) What is Data Science?. Available from: https://ischoolonline.berkeley.edu/data-science/what-is-data-science/ [Accessed 18 March 2022]

Violino, B. (March 27, 2018) Essential skills and traits of elite data scientists. CIO. Available from: https://www.cio.com/article/228620/the-essential-skills-and-traits-of-an-expert-data-scientist.html [Accessed 18 March 2022]

Yildirim, S. (2020) The Dark Side of the Sexiest Job of the 21st Century. Available from: https://towardsdatascience.com/the-dark-side-of-the-sexiest-job-of-the-21st-century-fd9c46bf4cae [Accessed 11 March 2022]

Saxena, P. (2021) There Will Be a Shortage of Data Science Jobs in the Next 5 Years?. Available from: https://towardsdatascience.com/there-will-be-a-shortage-of-data-science-jobs-in-the-next-5-years-9f783737ed23 [Accessed 11 March 2022]

World Economic Forum. (2020) The Future of Jobs Report. Available from: https://www3.weforum.org/docs/WEF_Future_of_Jobs_2020.pdf [Accessed 11 March 2022]

Data Scientist Team Conflicts

 Having different roles and responsibilities within the greater data science field may result in conflicting priorities and duplication of effort among team members, which can be described, in general, as "common internal difficulties" (Johns Hopkins University, 2022). The fact that a step of the Data Analytics Lifecycle does not need to be fully completed before moving to the next one, as sufficient progress is enough (EMC Education Services, 2015), adds to the complexity and the overlaps that may exist.

 The human factor plays a crucial role in streamlining processes and minimising conflicts. Team leaders of data science teams have the primary responsibility of ensuring the smooth implementation of projects within their teams. This can be achieved by creating internal documents that clearly outline roles and responsibilities, by creating Standard Operating Procedures for the Data Analytics Lifecycle and, at the same time, by staying on top of every arising conflict and working towards de-escalation (Johns Hopkins University, 2022; Day, 2021). In addition, ISO/AWI TR 23347, "Statistics — Big Data Analytics — Data Science Life Cycle" (International Organization for Standardization, 2022), which is still under development, may provide more clarity on the Data Science Life Cycle and serve as an essential instructional document for data science operations, but this is yet to come.


Johns Hopkins University. (2022) Building a Data Science Team. Available from: https://www.coursera.org/lecture/build-data-science-team/common-internal-difficulties-lwSOV [Accessed 26 March 2022]

EMC Education Services. (2015) Data Science and Big Data Analytics: Discovering, Analysing, Visualising and Presenting Data. [n.k.]. [s.I]: Wiley Professional Development (P&T). Available via the Vitalsource Bookshelf. [Accessed 27 March 2022]

Day R. (2021) 4 Things I Didn't Know About Being a Team Lead in Data Science. Available from: https://towardsdatascience.com/4-things-i-didnt-know-about-being-a-team-lead-in-data-science-1f96293cb8aa [Accessed 27 March 2022]

International Organization for Standardization. (2022) ISO/AWI TR 23347. Available from: https://www.iso.org/standard/75289.html [Accessed 27 March 2022]


How will automation & AI affect data-related professions?

 This question is of general interest, especially when trying to predict the future of data-science-related functions (after all, this is what data science is all about: predicting the future). As correctly mentioned, there are different roles and responsibilities, and thus different "work titles", within data science teams, with some being more susceptible to automation than others. For clarity, this projection will be based on Berkeley's data science roles: Data Scientist, Data Analyst, and Data Engineer (Berkeley, 2021).

 The growth rates of the above-mentioned professions differ, with data scientist roles growing by nearly 1300% from 2013 to 2018, data engineering roles by just over 450% over the same period and data analyst roles by nearly 45% (Royal Society, 2019). This indicates differences in demand for the different data-related professions, which may imply underlying conditions such as a shift of needs from one profession to another or the automation (through AI or other means) of specific roles. Even though these are assumptions and do not answer the question directly, they provide fundamental insight into the fluidity and change of these professions over the years.
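 To make the scale of these figures easier to compare, the short Python sketch below converts a percentage growth figure into a multiplier and into an implied compound annual rate. It is only an illustration: the function names are hypothetical, the inputs are the rounded figures quoted above (1300%, 450% and 45%), and the five-year span assumes the 2013 to 2018 period mentioned in the text.

```python
def growth_multiplier(pct_growth: float) -> float:
    """Turn a percentage growth figure into a multiplier of the starting value."""
    return 1 + pct_growth / 100


def annualised_rate(pct_growth: float, years: int) -> float:
    """Compound annual growth rate (as a fraction) implied by total growth over `years`."""
    return growth_multiplier(pct_growth) ** (1 / years) - 1


# Rounded figures quoted from the text (Royal Society, 2019); assumed 5-year span.
reported_growth = {
    "data scientist": 1300,
    "data engineer": 450,
    "data analyst": 45,
}

for role, pct in reported_growth.items():
    multiplier = growth_multiplier(pct)
    yearly = annualised_rate(pct, years=5)
    print(f"{role}: x{multiplier:.2f} overall, about {yearly:.0%} per year")
```

 Read this way, growth of nearly 1300% means roughly fourteen times as many roles as at the start of the period, whereas growth of nearly 45% means fewer than one and a half times as many.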

 Predictive information about the future of automation is limited, mainly due to the nature of technological progress: even though it follows patterns, it is hard to pin down to specific time frames and can only be described in terms of outcomes (Doyne Farmer & Lafond, 2016). As an example, no one could have predicted how quickly we would go from the first successful aeroplane flight to reaching the moon. One can, however, investigate the changes that AI has brought to data-specific professions so far.

 As outlined in the initial post, data-related functions are not well defined, and different tasks may be handed over to different data professionals. Moreover, AI can automate specific tasks but not every task needed to complete a data science project. Questions such as "what is the business problem" or "what are the assumptions", along with issues such as ethical considerations while implementing data science projects, cannot yet be automated by machines (Adilin, 2021; Li, 2020). Furthermore, the advancement of AI, like every other technological advancement, may "destroy" some professions, but others will flourish, and new ones may be created (Adilin, 2021; Nunes, 2021).

 In a nutshell, different professions will be affected differently (and disproportionately), depending on the type of AI advancement and the relation of the advancements with the different data-related professions. However, the factors are multidimensional and thus, predicting change is complex and precarious.


Berkeley. (2021) What is Data Science?. Available from: https://ischoolonline.berkeley.edu/data-science/what-is-data-science/ [Accessed 27 March 2022]

Royal Society. (2019) Dynamics of data science skills: How can all sectors benefit from data science talent?. Available from: https://royalsociety.org/-/media/policy/projects/dynamics-of-data-science/dynamics-of-data-science-skills-report.pdf?la=en-GB&hash=212DAE7D599B0A48687B372C90DC3FEA [Accessed 27 March 2022]

Doyne Farmer, J., Lafond, F. (2016) How predictable is technological progress?. Research Policy 45(3): 647-665. DOI: https://doi.org/10.1016/j.respol.2015.11.001

Adilin, B. (2021) The Dystopia is Here, AI is Taking over Data Science Jobs in 2021. Available from: https://www.analyticsinsight.net/the-dystopia-is-here-ai-is-taking-over-data-science-jobs-in-2021/ [Accessed 27 March 2022]

Li, M. (2020) Will automation eliminate data science positions?. Available from: https://techcrunch.com/2020/08/27/will-automation-eliminate-data-science-positions/ [Accessed 27 March 2022]

Nunes, A. (2021) Automation Doesn’t Just Create or Destroy Jobs — It Transforms Them. Available from: https://hbr.org/2021/11/automation-doesnt-just-create-or-destroy-jobs-it-transforms-them [Accessed 27 March 2022]


Summary

 To summarize the previous points, data science is an assemblage of tools, techniques, methods, and experience from different scientific fields, combined with domain knowledge, which aims to provide value to stakeholders through data (Oracle, 2022; Dhar, 2013; IBM, 2022; Stedman, 2021; AUEB, 2022). Thus, data scientists have many different roles, and their area of expertise can be classified as vertical or horizontal (Mkonto, 2022), with the additional classification of full-stack or specialized data professionals (Kyriacou, 2022; Foley, 2022).

 In addition to the above, technological advancements such as cloud computing, AutoML, UiPath and the expansion of databases (Mkonto, 2022; Kyriacou, 2022) change the nature of data-related tasks from "implementation" to "system handling" (Kyriacou, 2022), which may represent a future trend.

 All these changes and the growth rate of demand for data scientists (Royal Society, 2019) indicate that data science is an ever-growing field. This does not come without opposing opinions, which present data science's "dark side" by indicating that data science should not be considered the "emperor's new clothes", mainly due to the broad area of expertise that professionals need, as well as the expansion of AI, which may take over some of the tasks typically handled by data scientists (Yildirim, 2020; Saxena, 2021). Even though automation poses a genuine risk, data science is heavily dependent on human intervention, including but not limited to legal and ethical considerations, such as GDPR compliance, or domain-specific problems, which can only be analyzed in an unstructured way, in a specific form that AI does not yet support (Koilakos, 2022).

 Finally, data are often maintained in unclear formats, duplicated and not well preserved. Methods such as Master Data Management (Profisee, 2022) can standardize the maintenance of such data, thus enhancing automation capacities, but they are heavily dependent on the human factor.
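 As a rough illustration of why such standardisation still relies on people, the Python sketch below normalises a handful of made-up contact records, merges duplicates into one "golden" record per e-mail address, and sets aside the cases where the names disagree for human review. It is a toy example under stated assumptions, not the method of Profisee or of any Master Data Management product: the `Record` type, the `normalise` and `consolidate` helpers and the choice of e-mail as the matching key are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    email: str


def normalise(record: Record) -> Record:
    """Standardise formatting: collapse whitespace, title-case names, lower-case e-mails."""
    clean_name = " ".join(record.name.split()).title()
    clean_email = record.email.strip().lower()
    return Record(name=clean_name, email=clean_email)


def consolidate(records):
    """Keep one golden record per e-mail (assumed matching key) and flag name conflicts."""
    golden = {}
    conflicts = []  # pairs that a person would need to review
    for raw in records:
        clean = normalise(raw)
        kept = golden.setdefault(clean.email, clean)
        if kept.name != clean.name:
            conflicts.append((kept, clean))
    return list(golden.values()), conflicts


# Made-up sample data for illustration only.
sample = [
    Record("  ada   LOVELACE ", "Ada@Example.org "),
    Record("Ada Lovelace", "ada@example.org"),
    Record("A. Lovelace", "ada@example.org"),
    Record("Alan Turing", "alan@example.org"),
]

golden_records, needs_review = consolidate(sample)
print(golden_records)
print(needs_review)
```

 The purely cosmetic duplicates are merged automatically, while the record with a differently written name is not resolved by the code and is left for a person to decide.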

 All the above elements indicate that data science will flourish in the immediate future. Regardless of advancements in automation techniques, there will always be a need for data professionals to act as links between stakeholders and the tools used (Kyriacou, 2022).


Oracle. (2022) What is data science?. Available from: https://www.oracle.com/data-science/what-is-data-science/ [Accessed 16 March 2022].

Stedman, C. (2021) What is data science? The ultimate guide. Available from: https://www.techtarget.com/searchenterpriseai/definition/data-science [Accessed 17 March 2022].

AUEB. (2022) About Data Science. Available from: https://datascience.aueb.gr/page.php?id=109 [Accessed 17 March 2022]

IBM. (2022) Data Science. Available from: https://www.ibm.com/cloud/learn/data-science-introduction [Accessed 17 March 2022]

Dhar V. (2013) Data Science and Prediction. Communications of the ACM 56(12): 64-73. DOI:10.1145/2500499

Mkonto, T. (2022) Initial Post. Available from: https://www.my-course.co.uk/mod/hsuforum/discuss.php?d=302440 [Accessed 27 March 2022]

Kyriacou, C. (2022) Initial Post. Available from: https://www.my-course.co.uk/mod/hsuforum/discuss.php?d=301153 [Accessed 26 March 2022]

Foley, E. (2022) Initial Post. Available from: https://www.my-course.co.uk/mod/hsuforum/discuss.php?d=300412 [Accessed 26 March 2022]

Royal Society. (2019) Dynamics of data science skills: How can all sectors benefit from data science talent?. Available from: https://royalsociety.org/-/media/policy/projects/dynamics-of-data-science/dynamics-of-data-science-skills-report.pdf?la=en-GB&hash=212DAE7D599B0A48687B372C90DC3FEA [Accessed 27 March 2022]

Yildirim, S. (2020) The Dark Side of the Sexiest Job of the 21st Century. Available from: https://towardsdatascience.com/the-dark-side-of-the-sexiest-job-of-the-21st-century-fd9c46bf4cae [Accessed 11 March 2022]

Saxena, P. (2021) There Will Be a Shortage of Data Science Jobs in the Next 5 Years?. Available from: https://towardsdatascience.com/there-will-be-a-shortage-of-data-science-jobs-in-the-next-5-years-9f783737ed23 [Accessed 11 March 2022]

Koilakos, P. (2022) Initial Post. Available from: https://www.my-course.co.uk/mod/hsuforum/discuss.php?d=301038 [Accessed 28 March 2022]

Profisee. (2022) Master Data Management - What, why, how & who. Available from: https://profisee.com/master-data-management-what-why-how-who/ [Accessed 27 March 2022]


IT Code of Conduct

 Data protection has been one of the most debated topics in recent years and is inseparably linked with data-related professions such as data scientists, data analysts and others. Some (Gutwirth et al., 2009) even considered it a utopian but fundamental right connected with the level of democracy, long before data protection was strictly incorporated into the European Union's directives as a unified document. Since then, data protection has become more important for governments, with essential steps such as the General Data Protection Regulation (GDPR) of the European Union, ISO 27001 for Information Security Management, and similar steps equivalent to the GDPR being taken in the United States, a recent example being the California Consumer Privacy Act (Clark, 2021).

 The GDPR revolutionised data protection regulation in the EU, as it repealed the obsolete Data Protection Directive 95/46/EC (European Union, 1995) and unified the numerous laws which, until the issuance of the GDPR, were country-dependent, had no specific commonalities and lacked the essential element of cross-country intra-union law-enforcement and judicial cooperation. Being a regulation, the GDPR is a legally binding act for all member states, with obligatory implementation through embedded national laws recognising the regulation (European Union, 2022). With its issuance, specific provisions were put in place, such as administrative fines and penalties for non-compliance, joint operations of bodies of different union members, specific corporate obligations and, most importantly, the rights of data subjects (European Union, 2016).

 The United Nations High Commissioner for Refugees (UNHCR) has developed and has been implementing the "Policy on the Protection of Personal Data of Persons of Concern to UNHCR" (hereafter the "Policy") since 2015, as well as the "Data Transformation Strategy" since 2020. Both documents provide insights into data protection issues, with the former being considered the GDPR of UNHCR. Similarly to the GDPR, UNHCR's Policy discusses areas such as the basic principles (section 2 of the Policy and Chapter 2 of the GDPR), the rights of data subjects (section 3 of the Policy and sections 2 to 4 of Chapter 3 of the GDPR), the prerequisites for safe processing and transferring (sections 4 to 6 of the Policy and Chapter 5 of the GDPR), as well as accountability and supervision matters (section 7 of the Policy and Chapter 6 of the GDPR) (European Union, 2016; UNHCR, 2015). The two documents differ in nature, as one is a legally binding union document while the other is a policy, but, as described above, they share several similarities and have identical components.

 The Policy does not specify Master Data per se. However, with the proper contextual knowledge and the provided definition of "Personal Data", one can understand that Master Data are mainly personal data, so their processing should be in line with the Policy. Moreover, provisions such as the data protection focal point, the data controller (the most senior staff member in each operation) and the data protection officer (equivalent to the data protection officers under the GDPR) significantly contribute to safeguarding the personal data of Persons of Concern.

 Additionally, due to UNHCR's mandate, which is the protection of the world's most vulnerable, protecting their personal data is of utmost importance, as data breaches may endanger their lives and well-being. In this direction, UNHCR, in addition to the data protection officers, has legal professionals in each operation to deal with data sharing and data protection issues and technical measures to limit the possibility and impact of data breaches (UNHCR, 2016).

 Finally, as an additional good practice, UNHCR has appointed personnel for data protection issues in two of the most essential (legally related) units, the Division of International Protection and the Legal Affairs Service. The former is inter alia responsible for protecting persons of concern, while the latter handles all non-mandate-related legal issues. Doing this ensures that the organisation's primary functions are covered by personnel specialised in data protection. On the technical side, measures that are strictly internal and for limited circulation are in place, along with data audits, ensuring that the processing is legitimate and fair.

 The "Evaluation of UNHCR's data use and information management approaches" (UNHCR, 2019), which resulted in the issuance of the Data Transformation Strategy (UNHCR, 2019), found that "There needs to be a far deeper understanding of data protection and data access, especially personal data". To improve in these areas, UNHCR has introduced mandatory training, greatly enhanced its data protection capabilities, and adhered to information and cybersecurity standards, including "privacy by design, by default". The level of improvement is yet to be established through the upcoming evaluation report. The 2019 evaluation found no incidents related to data breaches in the organisation.

 To conclude, due to UNHCR's mandate, data protection is one of the most critical components of the organisation's day-to-day functions. The presence of policies, regulations and officers in charge of data protection issues significantly improves the organisation's stance on such matters. Moreover, granting access to personal data on a need-to-know basis limits unintentional data breaches.


Gutwirth, S. et al. (2009) Reinvent Data Protection?. 1st ed. s.I.: Springer

Clark, B. (March 18, 2021) GDPR in the USA? New State Legislation Is Making This Closer to Reality. National Law Review. Available from: https://www.natlawreview.com/article/gdpr-usa-new-state-legislation-making-closer-to-reality [Accessed 27 April 2022]

Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data (1995) Directive no. 95/46/EC. n.k. n.k.:31-50. Available from: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:31995L0046 [Accessed 1 May 2022]

European Union. (2022) Types of EU law. Available from: https://ec.europa.eu/info/law/law-making-process/types-eu-law_en [Accessed 1 May 2022]

REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL (2016) Regulation no. 2016/679. n.k. 2: 2-78. Available from: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:02016R0679-20160504&from=EN [Accessed 3 April 2022]

Policy on the Protection of Personal Data of Persons of Concern to UNHCR (2015) n.k. n.k. 1: 1-48. Available from: https://data2.unhcr.org/en/documents/download/44570 [Accessed 10 April 2022]

UNHCR (2019) Evaluation of UNHCR's data use and information management approaches. Available from: https://www.unhcr.org/5dd4f7d24.pdf [Accessed 1 May 2022]

UNHCR (2019) DATA TRANSFORMATION STRATEGY 2020-2025 Supporting protection and solutions. Available from: https://www.unhcr.org/5dc2e4734.pdf [Accessed 30 April 2022]

ICRC (2022) Sophisticated cyber-attack targets Red Cross Red Crescent data on 500,000 people. Available from: https://www.icrc.org/en/document/sophisticated-cyber-attack-targets-red-cross-red-crescent-data-500000-people [Accessed 1 May 2022]


Data Analytics Report

Abstract

 Initially, the Welsh transportation system was created due to the need to move agricultural products and was significantly shaped by Wales's geological characteristics and industrial history, starting from the early 19th century under the influence of the industrial revolution (Welsh Assembly Government, 2008). By cross-referencing the maps of population density, rail network and road network, one can quickly notice the relationship between population density and the existing infrastructure.


Data Analytics Implementation

Abstract

 The Welsh transportation system has followed the classic definitions of transportation throughout the years (Cooley, 1894; Hall, 1999; Koilakos, 2022). It was initially created due to the need to move agricultural products, and it was also shaped by the later industrial revolution (Welsh Assembly Government, 2008; Koilakos, 2022). As is visible in the maps below, which present Wales's population density in relation to its topography, population density in relation to the transportation system and topography in relation to the transportation system, the Welsh transportation system is also strongly shaped by the current geomorphological and demographic characteristics.


Featured Resources