cv

Basics

Name Maria Levchenko
Label NLP Engineer / Python Developer / AI Enthusiast
Email marylevchenko@gmail.com
Phone +39 (351) 8120615
Url https://mary-lev.github.io
Summary NLP researcher with a PhD in Russian Language and Literature and Python development experience, combining a humanities background with engineering skills. Specializes in digital humanities, multilingual digital editions, and computational literary studies.

Work

  • 2020.12 - 2024.01
    Python Developer
    Welltory Inc.
    Welltory is a science-based app that helps you measure your stress & energy levels and improve productivity and health.
    • Contributed to the development of a microservices platform (Django, PostgreSQL) for the wellness app with over 7 million users
    • Sped up the message generation pipeline (Celery, Redis, RabbitMQ)
    • Implemented key performance metrics for consistent system monitoring (Kibana/Elasticsearch, Grafana, Sentry, Kubernetes)
    • Built an AI-based tool that helps the QA team write test cases faster (OpenAI API)
  • 2019.12 - 2020.12
    Data Engineer
    EPAM Systems
    EPAM Systems, Inc. is a global provider of software engineering and IT consulting services headquartered in Newtown, Pennsylvania, United States. The company has software development centers and branch offices in North America, Europe, Asia and Australia.
    • Developed ML model pipelines to recognise tabular data in medical articles and reports (Flask, PaddleOCR)
    • Implemented ML model evaluation metrics for order volume prediction
    • Python, Flask, ML pipelines, NLP algorithms, OCR, AWS
  • 2014.07 - 2019.12
    Independent Researcher
    Concordances 2.0
    • Python
    • web2py
    • MySQL
    • NLP algorithms
    • word2vec
    • sklearn

Education

  • 2022.09 - 2024.12
    Bologna, Italy
    MA
    University of Bologna, Bologna, Italy
    Digital Humanities and Digital Knowledge
    • Natural Language Processing
    • Machine Learning
    • Open Science
    • Information Modeling
    • Knowledge Management
    • Network Analysis
  • 2000.09 - 2001.12
    Saint Petersburg, Russia
    PhD
    Herzen State Pedagogical University of Russia
    Russian Language and Literature

Certificates

DevOps on AWS
AWS / Coursera 2023-07-30
React Basics
Meta / Coursera 2023-03-12
IELTS
IELTS Official 2022-05-20
Django
Stepic.org 2020-09-15
R Programming
Coursera 2014-11-04
The Data Scientist’s Toolbox
Coursera 2014-08-07

Skills

Python
Django / Django Rest Framework
SQL / PostgreSQL
Microservices / REST API
JavaScript
React / Redux
Node.js / Express.js
HTML / CSS / SASS
Git / GitHub / GitLab
Machine Learning / Artificial Intelligence
Natural Language Processing
Named Entity Recognition / Classification / Clustering
Data Analysis / Data Engineering
OpenAI / GPT-4 / LLaMA
CI/CD
GitLab CI
Docker / Kubernetes

Languages

Russian
Native speaker
English
Fluent
Italian
Intermediate

Projects

  • 2024.01 - 2024.12
    BertAlign API: Multilingual Sentence Alignment Service
    FastAPI-based web service for multilingual sentence alignment using LaBSE embeddings, developed as part of DiScEPT (Digital Scholarly Editions Platform and aligned Translations). Supports 25 languages, TEI XML documents, and flexible alignment patterns for digital scholarly editions.
    • Developed production-ready API service deployed on Google Cloud Run
    • Implemented semantic alignment using LaBSE embeddings for 25 languages
    • Created specialized TEI XML processing endpoints for digital humanities workflows
    • FastAPI, sentence-transformers, Docker, Google Cloud Run, TEI XML
  • 2024.01 - 2024.12
    ALTO-to-TEI Converter: Document Processing Toolkit
    Python toolkit for converting eScriptorium ALTO XML files to TEI format with Segmonto ontology support. Features page-level and book-level conversion modes, YAML-driven configuration, and advanced cross-page paragraph merging for historical document processing.
    • Implemented Segmonto ontology compliance for document structure classification
    • Developed YAML-driven configuration system for flexible document processing
    • Created cross-page paragraph merging with hyphenation handling
    • Built both page-level and book-level conversion modes for different use cases
    • Python, XML/TEI, YAML, eScriptorium, Segmonto ontology
  • 2023.01 - 2025.07
    Computational Analysis of Literary Communities
    Event-based social network analysis of Saint Petersburg's literary landscape from 1999 to 2019, using SPbLitGuide newsletter data. Processes 15,012 cultural events involving 11,777 participants across 862 venues to map literary community formation.
    • Processed 20 years of cultural event data using Python data science stack
    • Applied network analysis algorithms to identify 49 distinct literary communities
    • Developed data processing pipeline with entity recognition and geospatial mapping
    • Created interactive visualizations and statistical analysis of literary networks
    • Python, NetworkX, pandas, NLP, geospatial analysis, data visualization
  • 2023.05 - 2024.12
    LeggoManzoni: Quaranta commenti alla Quarantana
    Interactive web platform for teaching Italian literature, providing 40 critical commentaries on Alessandro Manzoni's 'The Betrothed'. Developed at the University of Bologna with XML/TEI processing and multilingual digital edition capabilities.
    • Developed interactive web platform for Italian literature education
    • Implemented XML/TEI processing for text-commentary alignment
    • Created segment-level text correlation algorithms
    • Built responsive interface for educational use in secondary schools
    • Node.js, XML/TEI, JavaScript, digital editions, educational technology
  • 2020.01 - 2025.06
    Zemelah.online: Soviet Jewish Egodocuments Archive
    Digital archive with computational text analysis of Soviet-era Jewish egodocuments. Features topic modeling with LDA, GPT-powered chatbot interface, and AI-assisted indexing for historical document discovery and analysis.
    • Implemented topic modeling analysis using Latent Dirichlet Allocation (LDA)
    • Developed GPT-powered chatbot for thematic document queries
    • Created AI-assisted markup editor with pre-annotation using Claude
    • Applied sentiment analysis and word embeddings for semantic pattern mapping
    • Python, LDA, OpenAI GPT, topic modeling, historical text analysis