Machine Learning & Data Science

Sakura provides a complete Managed Service backed by our Data expertise.

We define, design, and deliver data science microservices, machine learning products, and systems. Our Data Science team researches and implements machine learning algorithms and techniques for tasks such as item classification, product recognition and matching, and attribute extraction. They can also conduct design and code reviews and help you productionize machine learning solutions.

Selecting the right platform for your use case is difficult, and we are here to help cut through the marketing spin and identify the solutions that will work for you. Our advisory team are here when you need the right implementation with the right skills to ensure success. The Sakura Data Science and Data Engineering team have proven commercial experience in bringing machine learning products and methods into production.

Statistical Methods

The Sakura team work with the common statistical methods used across most projects as well as less common methods that are typically implemented by highly skilled Data Scientists. Here are some of the methods we currently work with:

  • Generalized linear models, including logistic regression, which underpin many supervised machine learning methods.
  • Time series methods including ARIMA, SSA, and machine learning based approaches.
  • Structural equation modeling that allows you to model and test mediated pathways.
  • Factor analysis, both exploratory and confirmatory, for survey design and validation.
  • Power analysis and trial design, particularly simulation based trial design.
  • Nonparametric testing, including deriving tests from scratch, particularly through simulations.
  • K-means clustering.
  • Bayesian methods covering Naïve Bayes, Bayesian model averaging, and Bayesian adaptive trials.
  • Penalized regression models (eg LASSO, LARS) and penalties added to models in general (eg SVM, XGBoost), which are often used for datasets in which predictors outnumber observations, as in genomics and social science research.
  • Spline based models such as multivariate adaptive regression splines (MARS) for flexible modeling of processes.
  • Markov chains and stochastic processes, which serve as an alternative approach to time series and forecast modeling.
  • Missing data imputation schemes, including missForest and MICE, and their assumptions.
  • Survival analysis for modeling churn and attrition processes.
  • Statistical inference and group testing used for A/B testing and more complicated marketing campaigns.

Machine Learning

Machine learning extends many of these statistical methods, including k-means clustering and generalized linear modeling. Some of the methods we use include:

  • Regression and classification trees, an early extension of generalized linear models offering high accuracy, good interpretability, and low computational expense.
  • Dimensionality reduction covering PCA and manifold learning approaches like MDS and t-SNE.
  • Classic feed forward neural networks.
  • Bagging ensembles that form the basis of algorithms like random forest and KNN regression ensembles.
  • Boosting ensembles that form the basis of gradient boosting and XGBoost algorithms.
  • Optimization algorithms for parameter tuning or design projects, including genetic algorithms, evolutionary algorithms, simulated annealing, and particle swarm optimization.
  • Topological data analysis tools such as persistent homology, Morse-Smale clustering, and Mapper, which are well suited to unsupervised learning on small sample sizes.
  • Deep learning architectures and deep architectures in general.
  • KNN approaches for local modeling, eg regression and classification.
  • Gradient-based optimization methods.
  • Network metrics and algorithms for centrality measures, betweenness, diversity, entropy, Laplacians, epidemic spread and spectral clustering.
  • Convolution and pooling layers in deep architectures, which are useful in computer vision and image classification models. We also support third-party tooling and APIs for these outcomes.
  • Hierarchical clustering, which is related to both k-means clustering and topological data analysis tools.
  • Bayesian networks (pathway mining).
  • Complexity and dynamical systems, including differential equations.

We also implement original algorithms for natural language processing (NLP).

AI Development Expertise

Leverage our artificial intelligence development skills to build adaptive, intelligent products and tools. We can assist you across the entire product lifecycle, from initial concept through to architecture, development, delivery, and maintenance. We build tools and run analysis using statistical programming languages such as R and Python, and frameworks such as TensorFlow, MLlib, and scikit-learn.

Intelligent Learning Algorithms

We can detect trends and identify anomalies in your analytics and operations using intelligent machine learning algorithms. Our skills in this area can be applied to cybersecurity, banking, insurance, manufacturing, and more.

Custom Optimization Algorithms

Our data team develops custom optimization algorithms to maximize your efficiency and profitability. From basic dynamic optimizations through to complex implementations, we can have an immediate impact on your business.

Predictive Models & Classifiers

We build predictive models that classify and quantify new events, content, images, video, and customers based on historical data. We can also implement bespoke customer segmentation models to personalize your marketing and products. We help you identify challenges and opportunities before they arise.

Find out more about our Data Practice or book a consultation.