cv | Maria Levchenko

Basics

Name	Maria Levchenko
Label	NLP Engineer / Python Developer / AI Enthusiast
Email	marylevchenko@gmail.com
Phone	+39 (351) 8120615
Url	https://mary-lev.github.io
Summary	NLP researcher with a PhD in Russian Language and Literature and experience in Python development, combining a background in the humanities with technical skills. Specializes in digital humanities, multilingual digital editions, and computational literary studies.

Work

2020.12 - 2024.01
Python Developer

Welltory Inc.

Welltory is a science-based app that helps you measure your stress & energy levels and improve productivity and health.
- Contributed to the development of a microservices platform (Django, PostgreSQL) for the wellness app with over 7 million users
- Improved the performance of the message generation pipeline (Celery, Redis, RabbitMQ)
- Implemented key performance metrics for continuous system monitoring (Kibana/Elasticsearch, Grafana, Sentry, Kubernetes)
- Created an AI-based tool that helps the QA team write test cases faster (OpenAI API)
2019.12 - 2020.12
Data Engineer

EPAM Systems

EPAM Systems, Inc. is a global provider of software engineering and IT consulting services headquartered in Newtown, Pennsylvania, United States. The company has software development centers and branch offices in North America, Europe, Asia and Australia.
- Developed ML model pipelines to recognize tables in medical articles and reports (Flask, PaddleOCR)
- Implemented ML model evaluation metrics for order volume prediction
- Python, Flask, ML pipelines, NLP algorithms, OCR, AWS
2014.07 - 2019.12
Independent Researcher

Concordances 2.0
- Python
- web2py
- MySQL
- NLP algorithms
- word2vec
- sklearn

Education

2022.09 - 2024.12

Bologna, Italy
MA

University of Bologna, Bologna, Italy

Digital Humanities and Digital Knowledge
- Natural Language Processing
- Machine Learning
- Open Science
- Information Modeling
- Knowledge Management
- Network Analysis
2000.09 - 2001.12

St. Petersburg, Russia
PhD

Herzen State Pedagogical University of Russia

Russian Language and Literature

Certificates

	DevOps on AWS
	AWS / Coursera	2023-07-30

	React Basics
	Meta / Coursera	2023-03-12

	IELTS
	IELTS Official	2022-05-20

	Django
	Stepic.org	2020-09-15

	R Programming
	Coursera	2014-11-04

	The Data Scientist’s Toolbox
	Coursera	2014-08-07

Publications

2025

Computational Analysis of Literary Communities: Event-Based Social Network Study of St. Petersburg 1999-2019

Journal of Computational Literary Studies

Conference paper accepted for publication
2025

Evaluating Named Entity Recognition Models for Russian Cultural News Texts: From BERT to LLM

arXiv preprint

Preprint available on arXiv
2025

TEI Encoding as a Unified Structure for Multilingual Digital Editions: The LeggoManzoni Case Study

AIUCD-2025, Verona, Italy

Co-authored with Beatrice Nava and Ersilia Russo
2025

AI-Supported Scaffolded Learning for Teaching Python in Digital Humanities Education

ADHO Digital Humanities Conference 2025

Accepted for presentation at ADHO DH 2025
2024

Как живут Digital Humanities в Китае: рассказ очевидца / What Digital Humanities Is Like in China: An Eyewitness Account

SysBlok.ru

Firsthand account of Digital Humanities development in China
2024

Как можно улучшить ответы языковых моделей? Гайд по промтам / How Can Language Model Responses Be Improved? A Guide to Prompts

SysBlok.ru

Guide on prompt engineering techniques for improving language model responses
2024

Commentare – Leggo Manzoni. Quaranta commenti alla Quarantana

Manzoni e Leopardi in digitale. Idee e proposte per la scuola, Clueb, Bologna

Co-authored with Giulia Menna and Beatrice Nava, pages 73-86
2024

Mapping Literary Space: A Social Network from the Timeline of Cultural Events

DH2024. Book of Abstracts, Zenodo

Pages 423-424, George Mason University, Washington, DC, USA
2024

Automatic Translation Alignment Pipeline for Multilingual Digital Editions of Literary Works

Proceedings of the Computational Humanities Research Conference 2024

Pages 1086-1104, Aarhus University, Denmark
2021

Издательские стратегии в поле современной русской поэзии / Publishing Strategies in the Field of Contemporary Russian Poetry

Артикуляция / Artikuljacija, Volume 14

Analysis of contemporary Russian poetry publishing strategies and market dynamics

Skills

	Python
	Django / Django Rest Framework
	SQL / PostgreSQL
	Microservices / REST API

	JavaScript
	React / Redux
	Node.js / Express.js
	HTML / CSS / SASS
	Git / GitHub / GitLab

	Machine Learning / Artificial Intelligence
	Natural Language Processing
	Named Entity Recognition / Classification / Clustering
	Data Analysis / Data Engineering
	OpenAI / GPT-4 / Llama

	CI/CD
	Git / GitLab CI
	Docker / Kubernetes

Languages

	Russian
	Native speaker

	English
	Fluent

	Italian
	Intermediate

Projects

2024.01 - 2024.12
BertAlign API: Multilingual Sentence Alignment Service

FastAPI-based web service for multilingual sentence alignment using LaBSE embeddings, developed as part of DiScEPT (Digital Scholarly Editions Platform and aligned Translations). Supports 25 languages, TEI XML documents, and flexible alignment patterns for digital scholarly editions.
- Developed production-ready API service deployed on Google Cloud Run
- Implemented semantic alignment using LaBSE embeddings for 25 languages
- Created specialized TEI XML processing endpoints for digital humanities workflows
- FastAPI, sentence-transformers, Docker, Google Cloud Run, TEI XML
2024.01 - 2024.12
ALTO-to-TEI Converter: Document Processing Toolkit

Python toolkit for converting eScriptorium ALTO XML files to TEI format with Segmonto ontology support. Features page-level and book-level conversion modes, YAML-driven configuration, and advanced cross-page paragraph merging for historical document processing.
- Implemented Segmonto ontology compliance for document structure classification
- Developed YAML-driven configuration system for flexible document processing
- Created cross-page paragraph merging with hyphenation handling
- Built both page-level and book-level conversion modes for different use cases
- Python, XML/TEI, YAML, eScriptorium, Segmonto ontology
2023.01 - 2025.07
Computational Analysis of Literary Communities

Event-based social network analysis of Saint Petersburg's literary landscape from 1999 to 2019, based on SPbLitGuide newsletter data. Processes 15,012 cultural events involving 11,777 participants across 862 venues to map the formation of literary communities.
- Processed 20 years of cultural event data using Python data science stack
- Applied network analysis algorithms to identify 49 distinct literary communities
- Developed data processing pipeline with entity recognition and geospatial mapping
- Created interactive visualizations and statistical analysis of literary networks
- Python, NetworkX, pandas, NLP, geospatial analysis, data visualization
2023.05 - 2024.12
LeggoManzoni: Quaranta commenti alla Quarantana

Interactive web platform for teaching Italian literature, offering 40 critical commentaries on Alessandro Manzoni's 'The Betrothed' (I promessi sposi). Developed at the University of Bologna, with XML/TEI processing and multilingual digital edition capabilities.
- Developed interactive web platform for Italian literature education
- Implemented XML/TEI processing for text-commentary alignment
- Created segment-level text correlation algorithms
- Built responsive interface for educational use in secondary schools
- Node.js, XML/TEI, JavaScript, digital editions, educational technology
2020.01 - 2025.06
Zemelah.online: Soviet Jewish Egodocuments Archive

Digital archive with computational text analysis of Soviet-era Jewish egodocuments. Features topic modeling with LDA, GPT-powered chatbot interface, and AI-assisted indexing for historical document discovery and analysis.
- Implemented topic modeling analysis using Latent Dirichlet Allocation (LDA)
- Developed GPT-powered chatbot for thematic document queries
- Created AI-assisted markup editor with pre-annotation using Claude
- Applied sentiment analysis and word embeddings for semantic pattern mapping
- Python, LDA, OpenAI GPT, topic modeling, historical text analysis
