Short bio

I'm originally from a beautiful Eastern European country called Moldova, where I graduated with a licentiate degree in software engineering.

Orheiul Vechi
My favorite place in Moldova is Orheiul Vechi ('Old Orhei' in English), a natural and historical complex located about 50 km from Chisinau, the capital of Moldova. I strongly recommend visiting it if you are planning a trip to Moldova.

There, I gained experience developing desktop and web applications. I was very passionate about this work, but I decided to move to France. I earned my master's degree in Computer Science, specializing in Data Mining and Exploration (EID2), at the University of Paris 13. I then obtained a Ph.D. in Statistical Learning at the University of Toulon. I have had various opportunities to gain experience in both academic and industrial environments. Today, I am working on the 11 projects of the OpenClassrooms "Data Scientist: Artificial Intelligence Engineer" path to improve my skills.

My professional interests are in AI: machine learning, deep learning, and their applications to text, video, and audio.

My hobbies are sports (running 🏃, football, chess), travel 🧳, and movies/series.

On weekends I am the happiest man, spending time with my family: my wife and our boys 👹‍👩‍👩‍👩.

You can download my short CV in PDF format, or consult the more detailed digital version on this page.

I am looking forward 👀 to getting in touch with you and working on some fascinating projects.

Primary Skills as a Data Scientist

  • Process structured and unstructured data
  • Check data quality; detect patterns and outliers
  • Propose and apply statistical models (regression, etc.) or data science methods (machine learning, etc.) to meet business challenges
  • Collaborate with technical and business teams to define needs and explain results
  • Present results (reports, presentations, etc.)
  • Implement and develop algorithms and tools in Python or R, relying on the associated statistical and data science libraries
  • Develop and industrialize data science algorithms (machine learning, NLP, etc.)

Technical skills

  • Programming languages. As a software engineer, I have studied many programming languages. These are the main languages I use in my work as a Data Scientist:
    • Python (the primary programming language I use every day)
    • Matlab (used during my Ph.D. and Master 2)
    • R (I contributed to the development of an open-source R package that is available on GitHub: link)
    • SQL

Soft Skills

  • Result-oriented: Every week, I set goals and achieve them. I use tools like MindNote, TodoList, and Notion to organize my tasks and work. In each of my experiences, I achieved my goals on time and never looked back.

  • Time management: I like to manage my time and use it as efficiently as possible. Google Calendar is a great tool for organizing and managing your time.

  • Motivation to learn: I consider myself highly motivated to learn and acquire new skills every day. This helped me a lot in completing my Ph.D. and then working on different machine-learning projects as a postdoc. I stay up to date with the state of the art in machine learning.

  • Analytical mind: I developed this skill during my Ph.D. in Computer Science and throughout all of my subsequent experiences.

  • Communication: I enjoy interacting with the team and the clients we work with.

  • Empathy: I like to listen to, understand, and share the feelings of others.

Current occupation


I am currently a student at OpenClassrooms in the Artificial Intelligence Engineer path. I'm working on 11 projects covering Data Analysis and Exploration, Data Pre-processing, Supervised and Unsupervised Modeling, Deep Learning, NLP, Cloud Computing, Visualization, and Model Deployment. The full description of this online learning path is available here.

Project 1: Language Translator

Project 8: Semantic segmentation for future autonomous cars

Project 10: ChatBot for booking flights

Experiences

AI Engineer at Centre LĂ©on BĂ©rard & INSA, Lyon, France

September 2020 – February 2022

Description: Automatic segmentation of 3D lung nodules using deep learning. Lung cancer is one of the leading causes of cancer death worldwide. Characterizing lung tumors with machine learning and radiomics techniques requires automatically segmenting lung nodules in CT images. To segment lung tumors, we use deep learning, which gives reliable results in terms of quality, robustness, and computation time.
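
As a rough illustration of this kind of pipeline, here is a minimal sketch of a 3D encoder-decoder segmentation network, assuming PyTorch. It is a toy model (the TinyUNet3D name and all sizes are mine) for binary voxel-wise segmentation of CT patches, not the architecture actually used in the project.

```python
# Illustrative toy 3D segmentation network (not the project's actual model).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3D convolutions, each followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyUNet3D(nn.Module):
    """A small U-Net-like model for binary nodule segmentation on CT patches."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool3d(2)
        self.up = nn.ConvTranspose3d(32, 16, kernel_size=2, stride=2)
        self.dec1 = conv_block(32, 16)
        self.head = nn.Conv3d(16, 1, kernel_size=1)  # one logit per voxel

    def forward(self, x):
        e1 = self.enc1(x)              # (B, 16, D, H, W)
        e2 = self.enc2(self.pool(e1))  # (B, 32, D/2, H/2, W/2)
        d1 = self.up(e2)               # back to (B, 16, D, H, W)
        d1 = self.dec1(torch.cat([d1, e1], dim=1))  # skip connection
        return self.head(d1)           # raw logits; apply sigmoid for masks

# Example: one forward pass on a random 64^3 CT patch
model = TinyUNet3D()
patch = torch.randn(1, 1, 64, 64, 64)
logits = model(patch)
print(logits.shape)  # torch.Size([1, 1, 64, 64, 64])
```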

Project

AI Engineer at University of Caen, Caen, France

September 2018 – February 2020

Description: The task was to participate in the creation of a data science platform dedicated to the unsupervised classification of high-dimensional data. The first step concerned prototyping the algorithms developed by the members of the AStERiCs project and integrating them into the platform for various real applications. These are unsupervised classification algorithms based on latent variable models. The second step was to integrate the algorithms developed during the project. Part of the work was carried out in collaboration with the LMNO lab on distributed regularized mixture models, with applications to environmental data and genomic sequences. The main missions were: unsupervised learning models, prototyping unsupervised learning algorithms, high-performance distributed cloud computing, and web integration and interfacing with the platform.
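
For context, here is a minimal sketch of a latent-variable clustering pipeline for high-dimensional data (PCA followed by a Gaussian mixture), assuming scikit-learn and synthetic data. It only illustrates the general idea; the project's actual distributed, regularized mixture-model algorithms are not shown.

```python
# Illustrative sketch only: PCA + Gaussian mixture clustering of high-dimensional data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline

# Synthetic high-dimensional data: 4 hidden clusters in 500 dimensions
X, y_true = make_blobs(n_samples=1000, n_features=500, centers=4, random_state=0)

# Reduce the dimension with PCA, then cluster with a Gaussian mixture model
pipeline = make_pipeline(
    PCA(n_components=10, random_state=0),
    GaussianMixture(n_components=4, covariance_type="full", random_state=0),
)
labels = pipeline.fit_predict(X)   # unsupervised cluster assignments
print(np.bincount(labels))         # cluster sizes
```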

R&D Engineer at Orange Labs, Lannion, France

March 2017 – February 2018

Description: Co-clustering is a data mining technique that aims at identifying the underlying structure between the rows and the columns of a data matrix in the form of homogeneous blocks. It has many real-world applications; however, many current co-clustering algorithms are not suited to large data sets. One approach that has been successfully used to co-cluster big data sets, reaching millions of instances and tens of thousands of values per dimension, is the MODL co-clustering method, which optimizes a criterion based on a regularized likelihood. However, difficulties are encountered with up to billions of values per variable. This post-doc focused on developing a co-clustering algorithm based on the MODL criterion that can efficiently deal with very large data sets that do not fit in memory. My work was to co-cluster large-scale data sets in a reasonable time while using less RAM than existing co-clustering techniques. It was applied to real-world data sets with pairs of variables: texts-words, source-target web sites, and users-films.
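
To give a feel for what co-clustering produces, here is a minimal sketch using scikit-learn's SpectralCoclustering on a synthetic matrix with a hidden block structure. This is a different, much simpler algorithm than the MODL-based method described above; it only illustrates the row/column block output.

```python
# Minimal co-clustering sketch (SpectralCoclustering, not the MODL criterion).
import numpy as np
from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering

# Synthetic data matrix with a hidden block structure (e.g., texts x words)
data, rows, cols = make_biclusters(shape=(300, 200), n_clusters=4,
                                   noise=5, shuffle=True, random_state=0)

model = SpectralCoclustering(n_clusters=4, random_state=0)
model.fit(data)

# Row and column cluster assignments define the homogeneous blocks
print(np.bincount(model.row_labels_))     # sizes of row clusters
print(np.bincount(model.column_labels_))  # sizes of column clusters
```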

GitHub

Postdoc researcher at Toulon University, Toulon, France

December 2015 – February 2017

Description: I worked on two applications: humpback whale song structuring and bird classification. For the humpback whale songs, the task was to segment the signal and propose different hypotheses about the song structure. Bayesian non-parametric learning approaches were used to automatically decompose the whale signal into song units that can be considered a kind of whale alphabet. For the second application, the task was to automatically classify bird songs in order to identify the bird species present in a recording. When several species are present in a recording, the goal is to find the foreground species (the one that appears closest to the microphone), which leads to a single-label classification problem.
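
As a simplified illustration of decomposing a signal into discrete units, here is a sketch using a truncated Dirichlet process Gaussian mixture (scikit-learn's BayesianGaussianMixture) on synthetic frame features. The actual work relied on HDP-HMM models; this stand-in only shows how a non-parametric mixture infers the number of units from the data.

```python
# Sketch: discovering "song units" with a truncated Dirichlet process mixture
# on synthetic MFCC-like frame features (a simplified stand-in for the HDP-HMM).
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic "acoustic frames": three hidden unit types in a 13-dim feature space
frames = np.vstack([
    rng.normal(loc=m, scale=1.0, size=(200, 13)) for m in (-4.0, 0.0, 4.0)
])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                   # truncation level
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
)
units = dpgmm.fit_predict(frames)

# The number of effectively used components is inferred from the data
print("units discovered:", len(np.unique(units)))
```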

Ph.D. researcher at Toulon University, Toulon, France

September 2011 – October 2015

Description: This thesis focuses on statistical learning and multi-dimensional data analysis. It particularly focuses on unsupervised learning of generative models for model-based clustering. We study Gaussian mixture models, both in the context of maximum likelihood estimation via the EM algorithm and in the Bayesian context of maximum a posteriori estimation via Markov Chain Monte Carlo (MCMC) sampling techniques. We mainly consider the parsimonious mixture models, which are based on a spectral decomposition of the covariance matrix and provide a flexible framework, particularly for the analysis of high-dimensional data. We then investigate non-parametric Bayesian mixtures, which are based on general flexible processes such as the Dirichlet process and the Chinese Restaurant Process. This non-parametric model formulation is relevant both for learning the model and for dealing with the issue of model selection. We propose new Bayesian non-parametric parsimonious mixtures and derive an MCMC sampling technique where the mixture model and the number of mixture components are simultaneously learned from the data. The selection of the model structure is performed using Bayes factors. These models, by their non-parametric and sparse formulation, are useful for the analysis of large data sets when the number of classes is undetermined and increases with the data, and when the dimensionality is high. The models are validated on simulated data and standard real data sets. They are then applied to a difficult real-world problem: the automatic structuring of complex bioacoustic data derived from whale song signals. Finally, we open Markovian perspectives via hierarchical Dirichlet process hidden Markov models.
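
For a rough illustration of the "parsimonious" idea (constraining the covariance structure) and of model selection, here is a sketch using scikit-learn's GaussianMixture with different covariance types, choosing the best configuration by BIC. The thesis itself relies on Bayesian non-parametric mixtures, MCMC sampling, and Bayes factors, which scikit-learn does not implement, so this is only a frequentist analogy.

```python
# Sketch: "parsimonious" Gaussian mixtures via constrained covariance structures,
# with the covariance type and number of components selected by BIC.
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X = load_iris().data

best = None
for covariance_type in ("spherical", "diag", "tied", "full"):  # most to least parsimonious
    for k in range(1, 7):                                       # candidate numbers of components
        gmm = GaussianMixture(n_components=k,
                              covariance_type=covariance_type,
                              random_state=0).fit(X)
        bic = gmm.bic(X)                                         # lower BIC is better
        if best is None or bic < best[0]:
            best = (bic, covariance_type, k)

print("Best model by BIC:", best)
```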

Education


Ph.D. in Computer Science

2012 - 2015, Bac+8
Bayesian non-parametric parsimonious mixtures for model-based clustering
University of Toulon, France

You can download the PDF here

Master 2 in Data Exploration and Decision-Making

2012, Bac+5
University of Paris 13

Most important skills:

  • Capable of understanding complex computer problems
  • General computer science training for both research and professional approaches
  • In-depth knowledge of machine learning, data science, and business intelligence
  • Data exploration and exploitation tools
  • Ability to conduct fundamental or applied research work on a well-targeted problem in the field of data science and artificial intelligence

Here is the link to the full path.

Licentiate degree in IT Engineering

2010, Bac+4
Technical University of Moldova

Bac in Engineering

2006, Bac
Technical University of Moldova

Publications

International conferences

1) Marius Bartcus, Faicel Chamroukhi, and Hervé Glotin. Hierarchical Dirichlet Process Hidden Markov Model for Unsupervised Bioacoustic Analysis. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, July 2015

2) Marius Bartcus and Faicel Chamroukhi. Hierarchical Dirichlet Process Hidden Markov Model for unsupervised learning from bioacoustic data. In Proceedings of the International Conference on Machine Learning (ICML) workshop on unsupervised learning from big bioacoustic data (uLearnBio), Beijing, China, June 2014

3) Marius Bartcus, Faicel Chamroukhi, Joseph Razik, and HervĂ© Glotin. Unsupervised whale song decomposition with Bayesian non-parametric Gaussian mixture. In Proceedings of the Neural Information Processing Systems (NIPS) workshop on Neural Information Processing Scaled for Bioacoustics (NIPS4B), pages 205–211, Nevada, USA, December 2013

4) Faicel Chamroukhi, Marius Bartcus, and Hervé Glotin. Bayesian Non-Parametric Parsimonious Clustering. In Proceedings of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, April 2014

5) Faicel Chamroukhi, Marius Bartcus, and Hervé Glotin. Bayesian Non-Parametric Parsimonious Gaussian Mixture for Clustering. In Proceedings of the 22nd International Conference on Pattern Recognition (ICPR), Stockholm, Sweden, August 2014

Francophone conferences

1) Marius Bartcus, Marc Boullé, and Fabrice Clérot. A two-level co-clustering algorithm for very large data sets. In 18Úme Journées Francophones Extraction et Gestion des Connaissances (EGC), 2018

2) Marius Bartcus, Faicel Chamroukhi, and HervĂ© Glotin. Clustering BayĂ©sien Parcimonieux Non-ParamĂ©trique. In Proceedings of the 14Ăšmes JournĂ©es Francophones Extraction et Gestion des Connaissances (EGC), Atelier CluCo: Clustering et Co-clustering, pages 3–13, Rennes, France, January 2014. The presentation is available here

International journals (in preparation)

1) Vincent Roger, Marius Bartcus, Faicel Chamroukhi, and Hervé Glotin. Unsupervised Bioacoustic Segmentation by Hierarchical Dirichlet Process Hidden Markov Model. Multimedia Tools and Applications for Environmental & Biodiversity Informatics, 2018. hal-01879385

2) Faicel Chamroukhi, Marius Bartcus, and Hervé Glotin. Dirichlet Process Parsimonious Mixture for clustering, January 2015. Preprint, 35 pages, available online: arXiv:1501.03347

3) Marius Bartcus, Vincent Roger, Faicel Chamroukhi, and Hervé Glotin. Unsupervised learning of acoustic sequences in non-human animals with HDP-HMM, applied to whale and bird song vocalizations, 2015. To be submitted to the PLOS Biology journal

Invited and contributed seminars

1) Bartcus Marius. Dirichlet Process Parsimonious Mixture. Séminaire IFSTTAR - COSYS - GRETTIA, Equipe Diagnostic et Maintenance, 2015

2) Bartcus Marius. Clustering Bayésien Parcimonieux Non-Paramétrique. Journées des doctorants, Séminaire LSIS Laboratory, 2014

Thesis

1) Marius Bartcus. Bayesian non-parametric parsimonious mixtures for model-based clustering. PhD thesis, University of Toulon, 28 October 2015

2) Bartcus Marius. Nonnegative Matrix Factorization for Unsupervised Learning. Master's thesis, LIPN UMR CNRS 7030, Université Paris 13, Villetaneuse, France, September 2012

Languages

French
English
Lithuanian
Romanian
Russian