CV
Basics
Name | Maria Levchenko |
Label | NLP Engineer / Python Developer / AI Enthusiast |
Email | marylevchenko@gmail.com |
Phone | +39 (351) 8120615 |
Url | https://mary-lev.github.io |
Summary | NLP researcher with a PhD in Russian Language and Literature and experience in Python development, combining a humanities background with technical skills. Specializing in digital humanities, multilingual digital editions, and computational literary studies. |
Work
- 2020.12 - 2024.01
Python Developer
Welltory Inc.
Welltory is a science-based app that helps users measure their stress and energy levels and improve their productivity and health.
- Contributed to the development of a microservices platform (Django, PostgreSQL) for the wellness app with over 7 million users
- Improved the performance of the message generation pipeline (Celery, Redis, RabbitMQ)
- Implemented key metrics for consistent monitoring of system performance (Kibana/Elasticsearch, Grafana, Sentry, Kubernetes)
- Created an AI-based tool to help the QA team develop test cases faster (OpenAI API)
- 2019.12 - 2020.12
Data Engineer
EPAM Systems
EPAM Systems, Inc. is a global provider of software engineering and IT consulting services headquartered in Newtown, Pennsylvania, United States. The company has software development centers and branch offices in North America, Europe, Asia and Australia.
- Developed ML model pipelines to recognise tabular data in medical articles and reports (Flask, PaddleOCR)
- Implemented ML model evaluation metrics for order volume prediction
- Python, Flask, ML pipelines, NLP algorithms, OCR, AWS
- 2014.07 - 2019.12
Education
-
2022.09 - 2024.12 Bologna, Italy
MA
University of Bologna, Bologna, Italy
Digital Humanities and Digital Knowledge
- Natural Language Processing
- Machine Learning
- Open Science
- Information Modeling
- Knowledge Management
- Network Analysis
-
2000.09 - 2001.12 St. Petersburg, Russia
Certificates
DevOps on AWS | AWS / Coursera | 2023-07-30 |
React Basics | Meta / Coursera | 2023-03-12 |
IELTS | IELTS Official | 2022-05-20 |
Django | Stepic.org | 2020-09-15 |
R Programming | Coursera | 2014-11-04 |
The Data Scientist’s Toolbox | Coursera | 2014-08-07 |
Publications
-
2025 Computational Analysis of Literary Communities: Event-Based Social Network Study of St. Petersburg 1999-2019
Journal of Computational Literary Studies
Conference paper accepted for publication
-
2025 Evaluating Named Entity Recognition Models for Russian Cultural News Texts: From BERT to LLM
arXiv preprint
Preprint available on arXiv
-
2025 TEI Encoding as a Unified Structure for Multilingual Digital Editions: The LeggoManzoni Case Study
AIUCD-2025, Verona, Italy
Co-authored with Beatrice Nava and Ersilia Russo
-
2025 AI-Supported Scaffolded Learning for Teaching Python in Digital Humanities Education
ADHO Digital Humanities Conference 2025
Accepted for presentation at ADHO DH 2025
-
2024 Как живут Digital Humanities в Китае: рассказ очевидца / How Digital Humanities Lives in China: An Eyewitness Account
SysBlok.ru
Firsthand account of Digital Humanities development in China
-
2024 Как можно улучшить ответы языковых моделей? Гайд по промтам / How Can Language Model Responses Be Improved? A Guide to Prompts
SysBlok.ru
Guide on prompt engineering techniques for improving language model responses
-
2024 Commentare – Leggo Manzoni. Quaranta commenti alla Quarantana
Manzoni e Leopardi in digitale. Idee e proposte per la scuola, Clueb, Bologna
Co-authored with Giulia Menna and Beatrice Nava, pages 73-86
-
2024 Mapping Literary Space: A Social Network from the Timeline of Cultural Events
DH2024. Book of Abstracts, Zenodo
Pages 423-424, George Mason University, Washington, DC, USA
-
2024 Automatic Translation Alignment Pipeline for Multilingual Digital Editions of Literary Works
Proceedings of the Computational Humanities Research Conference 2024
Pages 1086-1104, Aarhus University, Denmark
-
2021 Издательские стратегии в поле современной русской поэзии / Publishing Strategies in the Field of Contemporary Russian Poetry
Артикуляция / Artikuljacija, Volume 14
Analysis of contemporary Russian poetry publishing strategies and market dynamics
Skills
Python | |
Django / Django Rest Framework | |
SQL / PostgreSQL | |
Microservices / REST API | |
JavaScript | |
React / Redux | |
Node.js / Express.js | |
HTML / CSS / SASS | |
Git / GitHub / GitLab | |
Machine Learning / Artificial Intelligence | |
Natural Language Processing | |
Named Entity Recognition / Classification / Clustering | |
Data Analysis / Data Engineering | |
OpenAI / GPT-4 / LLaMA | |
CI/CD | |
Git / GitLab CI | |
Docker / Kubernetes | |
Languages
Russian | Native speaker |
English | Fluent |
Italian | Intermediate |
Projects
- 2024.01 - 2024.12
BertAlign API: Multilingual Sentence Alignment Service
FastAPI-based web service for multilingual sentence alignment using LaBSE embeddings, developed as part of DiScEPT (Digital Scholarly Editions Platform and aligned Translations). Supports 25 languages, TEI XML documents, and flexible alignment patterns for digital scholarly editions.
- Developed production-ready API service deployed on Google Cloud Run
- Implemented semantic alignment using LaBSE embeddings for 25 languages
- Created specialized TEI XML processing endpoints for digital humanities workflows
- FastAPI, sentence-transformers, Docker, Google Cloud Run, TEI XML
- 2024.01 - 2024.12
ALTO-to-TEI Converter: Document Processing Toolkit
Python toolkit for converting eScriptorium ALTO XML files to TEI format with SegmOnto ontology support. Features page-level and book-level conversion modes, YAML-driven configuration, and advanced cross-page paragraph merging for historical document processing.
- Implemented SegmOnto ontology compliance for document structure classification
- Developed YAML-driven configuration system for flexible document processing
- Created cross-page paragraph merging with hyphenation handling
- Built both page-level and book-level conversion modes for different use cases
- Python, XML/TEI, YAML, eScriptorium, SegmOnto ontology
- 2023.01 - 2025.07
Computational Analysis of Literary Communities
Event-based social network analysis of Saint Petersburg's literary landscape from 1999 to 2019 using SPbLitGuide newsletter data. Processes 15,012 cultural events involving 11,777 participants across 862 venues to map literary community formation.
- Processed 20 years of cultural event data using Python data science stack
- Applied network analysis algorithms to identify 49 distinct literary communities
- Developed data processing pipeline with entity recognition and geospatial mapping
- Created interactive visualizations and statistical analysis of literary networks
- Python, NetworkX, pandas, NLP, geospatial analysis, data visualization
- 2023.05 - 2024.12
LeggoManzoni: Quaranta commenti alla Quarantana
Interactive web platform for teaching Italian literature, providing 40 critical comments on Alessandro Manzoni's I promessi sposi ('The Betrothed'). Developed at the University of Bologna with XML/TEI processing and multilingual digital edition capabilities.
- Developed interactive web platform for Italian literature education
- Implemented XML/TEI processing for text-commentary alignment
- Created segment-level text correlation algorithms
- Built responsive interface for educational use in secondary schools
- Node.js, XML/TEI, JavaScript, digital editions, educational technology
- 2020.01 - 2025.06
Zemelah.online: Soviet Jewish Egodocuments Archive
Digital archive with computational text analysis of Soviet-era Jewish egodocuments. Features topic modeling with LDA, GPT-powered chatbot interface, and AI-assisted indexing for historical document discovery and analysis.
- Implemented topic modeling analysis using Latent Dirichlet Allocation (LDA)
- Developed GPT-powered chatbot for thematic document queries
- Created AI-assisted markup editor with pre-annotation using Claude
- Applied sentiment analysis and word embeddings for semantic pattern mapping
- Python, LDA, OpenAI GPT, topic modeling, historical text analysis