Post-Technical Data Practitioner

by Davide Posillipo

The extinction hypothesis

In May 2024, nearly two years ago, I was invited by the University of Milan to share with graduate students my perspective on the world of Data Science and Data Engineering and its future evolution. The exchange was aimed at students on the verge of entering the workforce and about to become, like myself, Data Practitioners (I use the term Data Practitioner as a synonym for Data Worker, encompassing all professionals employed as Data Analysts, Data Engineers, Data Scientists, ML Engineers, etc.). During that session, which I repeated a year later with a similar audience, I introduced the concept of post-technical Data Science: the idea that in a world where GenAI tools enable software generation, a data practitioner's focus can no longer be on the technical aspects, which are increasingly becoming a commodity, but must shift to something else. Otherwise, the field itself risks "extinction," struggling to find a meaningful place in companies and society.

During that session, I proposed that Data Practitioners should see themselves as active participants in society, with a fundamental and unique role. Specifically, their background and familiarity with "data as an object" could position them not merely as low-level executors of increasingly automatable routines, but as responsible stewards of information management and sentinels against misinformation and data-related pitfalls. They could also be the ones who manage and influence the entire process of data creation, retrieval, transformation, and use (in short, the entire data journey), actively engaging with other parts of society and organizations.

I closed my session with a somewhat generic appeal to multidisciplinarity, understood as key to realizing this vision and the path to building these data practitioners of the future.

The extinction hypothesis is, of course, just a hypothesis and could be disproven by facts. Specifically, it could be disproven if GenAI turns out to be insufficiently reliable at generating the code data practitioners need, or if poor data quality and hyper-specific business logic and rules keep manual low-level data manipulation necessary.

Is data broken by too much technique?

While I'm fairly convinced that GenAI today already enables automation of much of the code needed for standard data practitioner activities, or at least dramatically speeds up its production, I believe the second point (the need for manual low-level data manipulation) is harder to disprove. I haven't conducted a quantitative survey, but the vox populi emerging from countless interactions with data practitioners of all types is that data quality is low, that you need to "hammer" the data left and right to make the numbers add up and keep business stakeholders happy, and that while it's nice to make your approach sophisticated, automated, and production-ready, all of that is worthless if the numbers don't square up.
Does this mean we're safe? Is the "extinction" hypothesis therefore disproven, because not even GenAI could handle the databases, data lakes, and tables of this world?

It depends on what we mean by "safe." We'll probably still need this manual work for a long time—work that makes data "speak" and bridges the gap between the IT layer and the knowledge needs of those querying the data. But I believe we miss an enormous opportunity if this type of activity is conceived and executed, as it is now, primarily as a "technical" activity. This space between software architecture and the questions of those looking to data for guidance is the space that the post-technical Data Practitioner can and must occupy.

And not just to "save the species." My second hypothesis is that the current state of the information assets of companies, organizations, and society is fragmented, low-quality, and unreliable precisely because it's managed as a purely technical matter, not because of a lack of technique. A very strong current trend proposes the equation data = code, which works perfectly well at a certain level of abstraction and representation (data is, after all, also a digital artifact produced by IT routines). My conviction is that treating data as software is limited. Necessary, but not sufficient.

Over these nearly two years, I've spent considerable time reflecting on these two hypotheses, discussing them many times with colleagues and professionals from other disciplines. What are the activities, competencies, and mindset that should characterize a post-technical Data Practitioner?

The PTDP: a new role in four pillars

Today I feel I can articulate a perspective that's beginning to take shape, and which I'm grounding in reality through Komebi Studio, which I co-founded.
The post-technical Data Practitioner (PTDP) embraces the most recent frameworks dedicated to new ways of thinking about data and AI, such as Sabina Leonelli's Environmental Intelligence and Fabio Ferrari's EthnoAI, and builds their work around four pillars, shown in the image above as four green circles: Holistic Data Engineering, Machine-Human Learning, Knowledge Analysis and Magnification, and Impactful Changes Design.
Each of these pillars represents a field of action where the PTDP can contribute by engaging with other professionals, always mediating between technological and engineering possibilities and business needs within the broader context in which the business operates.

Holistic Data Engineering is the effort to think about, define, and structure data without falling into the extremes of "data as software" or, on the opposite side, "datafication as absolute evil." It's the idea that recording digital traces of our common living can make sense and be useful, but that this utility can only be defined through dialogue among the parties involved (those who create the data, those who collect it, those who use it) and a real, "embodied and situated" understanding of the context.

Machine-Human Learning means bringing the human being back into the machine learning workflow. Not as an afterthought or "ethics" checklist, but as an integral part of building AI models, blending each technical decision with ideas and considerations that come from understanding the broader context and wider implications. This pillar deliberately aligns closely with the concept of Environmental Intelligence and the EthnoAI framework I referenced earlier. And not just because it's right, but because it works better.

Knowledge Analysis and Magnification covers the entire set of activities and design phases where data (whether constructed, collected, or synthesized through a model) is actually put to use. Here the PTDP no longer limits themselves to interpreting data merely as data, but broadens their view to consider the dynamics underlying data generation and the people and entities involved, without being afraid to actively and consciously influence critical decisions.

Impactful Changes Design is the set of activities that can elevate the PTDP to a new level of relevance within organizations and society. No longer just a "guardian" and administrator of data and knowledge, the PTDP sees themselves as a front-line actor in the dialectic of democracy and business. They do this through an approach and actions strongly tied to design thinking and systems design, as well as through active participation in collective life.

For each pillar, I've identified four fundamental and new activities, not already part of a Data Practitioner's standard workflow, that enable the PTDP to make a difference in this particular middle ground where they operate. These activities are shown in the image as yellow rectangles and will be described in detail in articles I'll publish subsequently.

The way forward

These pillars, even from this initial summary description, convey the idea of a profile very different from the typical one we encounter today in the data industry. Is it possible to cover so many different aspects while remaining credible and sufficiently deep in each? In other words, is the PTDP something achievable or just an unattainable chimera? This is my third and final hypothesis: I believe that by carefully blending different disciplines and experiences, we can create a profile capable of engaging with the four pillars described. In the image, I've included, as a blue flow, twelve fundamental disciplines that can help a Data Practitioner make the transition to PTDP. Their order follows a logical and conceptual descent from the foundational layer provided by Data Engineering to the broader and more dynamic aspects related to creating and managing impact.
In the coming months, I intend to advance these reflections by: