**Research and Commercial Research projects of Mikhail Langovoy**

My research covers the methodology and business applications of statistics, machine learning and applied mathematics. The main focus is on developing methods and algorithms for extracting value from the massive, complex datasets encountered in practice today. This includes the analysis of structured data (such as networks) and of dynamic data (such as time series and stochastic processes), as well as methods and algorithms for real-time data analysis.

**Statistics of networks and structures, statistical and probabilistic image analysis, spatial statistics.**

I developed a novel unsupervised nonparametric method for detecting signals and anomalies and for reconstructing images and clusters in the presence of random noise, possibly with extreme characteristics. We are able to detect and estimate not only regular signals in images or networks, but also weak objects of unknown shape and fine structures such as curves (even ones that are not visible to the human eye) or small clusters in networks.

The new algorithms have linear or sub-linear computational complexity and exponential accuracy, and are therefore suitable for real-time systems. Our analysis is mathematically rigorous. Each algorithm has a built-in data-driven stopping rule, so no human intervention is needed to stop the algorithm at an appropriate step.

Application fields include nanotechnology, image analysis, robot vision, mathematical biology and network analysis. Important contributions to this development have been made by Professors Michael Habeck, Olaf Wittich, Laurie Davies and Bernhard Schölkopf.

**Statistics, machine learning and optimization for giant networks, statistics of mobile and wireless networks and of the Internet**

A wide variety of applications in machine learning, data mining and related areas involve large-scale graphs. A useful step in analyzing such graphs is to obtain summary statistics, since these statistics provide insight into the graph's structure. Estimating them makes it possible to predict properties of the entire graph without examining the whole graph. This is generally very practical, and it can be the only option when the whole network cannot, in principle, be observed, as is the case for the Internet.
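
As a generic illustration of estimating a summary statistic without looking at the whole graph, the sketch below estimates edge density by averaging the densities of induced subgraphs on randomly chosen vertices. This is a standard Monte Carlo idea, not the method from the work described below; the function names, the edge-set representation and the Erdős–Rényi test graph are assumptions made for the example.

```python
import itertools
import random


def induced_density(nodes, edges):
    """Fraction of vertex pairs in `nodes` joined by an edge of the full graph."""
    pairs = list(itertools.combinations(nodes, 2))
    if not pairs:
        return 0.0
    hits = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
    return hits / len(pairs)


def estimate_density(n, edges, k, num_samples, seed=0):
    """Monte Carlo estimate of the edge density of a graph on vertices 0..n-1.

    Each round draws k distinct vertices uniformly at random and measures the
    density of the induced subgraph. Every sampled pair is a uniformly random
    pair of vertices, so the average over rounds is an unbiased estimate of
    the global density.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += induced_density(rng.sample(range(n), k), edges)
    return total / num_samples


if __name__ == "__main__":
    # Hypothetical test graph: an Erdős–Rényi graph G(n, p), edges as frozensets.
    n, p = 1000, 0.05
    rng = random.Random(1)
    edges = {frozenset((u, v)) for u, v in itertools.combinations(range(n), 2)
             if rng.random() < p}
    exact = len(edges) / (n * (n - 1) / 2)
    approx = estimate_density(n, edges, k=40, num_samples=100)
    print(f"exact={exact:.4f} estimate={approx:.4f}")
```

Here 100 subgraphs on 40 vertices inspect about 78,000 vertex pairs instead of the roughly 500,000 pairs of the full graph; the savings grow with the size of the network.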

Motivated by this, Suvrit Sra and I proposed a novel approach to studying statistical properties of structured subgraphs of a given graph and, in particular, to estimating the expected objective function value of a combinatorial optimization problem over these subgraphs.
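
To make "expected objective function value over subgraphs" concrete, the sketch below uses the minimum spanning tree as a hypothetical stand-in for the combinatorial optimization problem: it draws random vertex subsets of a complete Euclidean graph and averages the MST length of the induced subgraphs by Monte Carlo. The choice of problem, the Euclidean weights and the function names are assumptions for illustration only; this is not the estimator analyzed in the joint work.

```python
import math
import random


def mst_length(points):
    """Length of the Euclidean minimum spanning tree of `points` (Prim, O(m^2))."""
    m = len(points)
    if m < 2:
        return 0.0
    in_tree = [False] * m
    best = [math.inf] * m  # cheapest known edge from each vertex into the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(m):
        # Add the outside vertex with the cheapest connecting edge.
        u = min((i for i in range(m) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        # Update connecting-edge costs for vertices still outside the tree.
        for v in range(m):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def estimate_expected_mst(points, k, num_samples, seed=0):
    """Average MST length over `num_samples` random k-vertex induced subgraphs."""
    rng = random.Random(seed)
    samples = (mst_length(rng.sample(points, k)) for _ in range(num_samples))
    return sum(samples) / num_samples


if __name__ == "__main__":
    # Hypothetical data: 2000 uniform points in the unit square.
    rng = random.Random(2)
    cloud = [(rng.random(), rng.random()) for _ in range(2000)]
    print(estimate_expected_mst(cloud, k=100, num_samples=50))
```

Each sample solves a small problem on 100 vertices, so the cost depends on `k` and `num_samples` rather than on the size of the full point set.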

We have shown that, even for regular graphs, very surprising phenomena occur when a property of the whole network is studied through random subgraphs of this network. In particular, statistical estimators exhibit nontrivial behavior, and their consistency depends on a number of unexpected conditions.

These estimators can be consistent even when they are based on a “small” graph; this provides theoretical grounds for replacing the processing of giant networks with the processing of suitably constructed, smaller networks.

I applied these results to the analysis of statistical and dynamical properties of Orange's structured, dynamic, spatially distributed mobile phone network in Ivory Coast, and to the analysis and modeling of Smart Grids and transportation networks.

**Active learning of manifolds and geometric statistics for computer vision, design of experiments, customer perception prediction**

Since 2013, I have been leading statistics and machine learning research, as well as data analysis activities, within the international Industrial Joint Research Program xDReflect of the European Association of National Metrology Institutes.

**Machine learning theory and support vector machines**

In developing new methodology for machine learning theory and support vector machines (SVMs), I combine techniques from machine learning and mathematical statistics.

For research in this area, I was awarded the Max Planck Society Grant “Statistical learning theory for autonomous systems”. The grant started in October 2011, with me as principal investigator, and ran until I left for a permanent position at Bell Labs.

SVMs are one of the most popular and successful classes of learning algorithms used in Artificial Intelligence (AI) systems, which often use them to classify objects automatically. Typically, the number of possible classes is assumed to be fixed in advance. However, modern autonomous and intelligent systems have to operate in uncertain environments where the types of objects and the required number of classes are unknown.

I designed a new generation of SVMs, called ISVMs, which adapt to these uncertainties, and proved that ISVMs solve the classification problem with an unknown number of classes. In joint work with Bernhard Schölkopf, we proved the universal consistency of ISVMs.

Another important line of my research is developing methodology for unsupervised learning by combining recent advances in learning theory and nonparametric statistics. This approach has already led to new results in unsupervised learning for image and network analysis.

**Time series analysis and prediction, Smart Grid and energy research**

**Statistical inverse problems, inference for stochastic processes, nonparametric testing and machine learning**

**Stochastic analysis**

**Research Experience (keywords)**

– Statistics of networks and of the Internet

– Big Data analytics and scalable algorithms

– Machine learning theory, unsupervised learning

– Support vector machines

– Mobile and wireless networks analysis

– Statistical modeling and traffic analysis

– Structured, dynamic, spatially distributed data

– Computational advertising

– Games research

– Spatial statistics and statistical image analysis

– Time series analysis for Smart Grid

– Econometrics and financial statistics

– Randomized algorithms

– Statistical inverse problems and their applications

– Statistics for stochastic processes, nonparametric statistics

– Stochastic analysis and quantitative finance

– Heavy-tailed distributions, strong dependencies and extremes

– Nonstandard statistics: singular or discrete models, unusual limit distributions

– Statistical quality control and sampling strategies

– Geometric statistics and learning on manifolds for computer vision

– Statistical design of experiments, active learning, online learning

– Customer perception analysis and prediction, A/B testing