The Five Pillars of Fluid ML

A few months ago, I was talking with the CTO of a major bank about machine learning. At one point he shook his head ruefully and said, “Dinesh, it only took me 3 weeks to develop a model. It’s been 11 months, and we still haven’t deployed it.”

This is just one example of the hazards you meet when machine learning encounters the real world. One thing is becoming clear: Machine learning data and models aren’t static. They never will be.

We need to embrace the fact that machine learning will only work over the long term if it’s fluid. In this case, being fluid means building your machine learning system on five important pillars as shown in figure #1:


Figure #1: Five pillars of “Fluid ML”

1. Managed.

For machine learning to do real and lasting work for an organization, you need thoughtful, durable, transparent infrastructure. That starts with identifying the data pipelines and correcting any issues around poor or missing data that can hamstring the accuracy of the models. It also means integrated governance and version control for models. Be sure that the version of each model (and there may be thousands of models in use concurrently) clearly indicates its inputs; regulators will want to know.
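One way to picture a model version that "clearly indicates its inputs" is a small registry record. The sketch below is only an illustration: the `ModelVersion` class, its field names, and the sample values are all hypothetical and do not describe any particular product's registry.

```python
from dataclasses import dataclass
from datetime import date
import hashlib
import json


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical registry entry; the field names are illustrative only."""
    name: str
    version: str
    trained_on: date
    input_features: tuple   # columns the model reads at scoring time
    training_data_ref: str  # pointer to the exact data snapshot used

    def fingerprint(self) -> str:
        """Stable hash of the record, handy for audit logs."""
        payload = json.dumps(
            {
                "name": self.name,
                "version": self.version,
                "trained_on": self.trained_on.isoformat(),
                "input_features": list(self.input_features),
                "training_data_ref": self.training_data_ref,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up example values, for illustration only.
record = ModelVersion(
    name="fraud-scoring",
    version="1.4.0",
    trained_on=date(2017, 6, 1),
    input_features=("amount", "merchant_category", "hour_of_day"),
    training_data_ref="snapshots/transactions-2017-05",
)
print(record.version, record.input_features)
print(record.fingerprint()[:12])
```

Because the record is immutable and hashed, two deployments that claim to run the same version can be compared by fingerprint when an auditor asks which inputs a model used.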

2. Resilient.

Being fluid means accepting from the outset that your models will fall out of sync. That “drift” can happen quickly or slowly depending on what’s changing in the real world. You need a way to do the data science equivalent of regression testing, and you need to do that testing frequently without burning up your time.

That means configuring a system that lets you set accuracy thresholds and automatic alerts to let you know your models need attention. Will you need to retrain the model on old data, acquire new data, or re-engineer your features from scratch? The answer depends on the data and the model, but the first step is knowing there’s a problem.
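As a rough sketch of what an accuracy threshold with an automatic alert might look like, consider the following. Everything here is an assumption made for illustration: the function names, the 0.90 threshold, and the made-up weekly batches. A real system would route the alert to email, a pager, or a dashboard rather than print it.

```python
from statistics import mean


def check_model(labels, predictions, threshold=0.90):
    """Return (accuracy, needs_attention) for one batch of scored records.

    `threshold` is a hypothetical accuracy floor chosen for illustration.
    """
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal length")
    accuracy = mean(1.0 if y == p else 0.0 for y, p in zip(labels, predictions))
    return accuracy, accuracy < threshold


def alert(model_name, accuracy, threshold):
    """Stand-in for a real alerting hook (email, pager, dashboard)."""
    return f"ALERT: {model_name} accuracy {accuracy:.2f} is below {threshold:.2f}"


# Made-up weekly batches: the second one shows the model drifting.
weekly_batches = {
    "week 1": ([1, 0, 1, 1, 0, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]),
    "week 2": ([1, 0, 1, 1, 0, 1, 0, 0, 1, 1], [1, 1, 0, 1, 0, 0, 0, 0, 1, 1]),
}

for week, (labels, predictions) in weekly_batches.items():
    accuracy, needs_attention = check_model(labels, predictions, threshold=0.90)
    if needs_attention:
        print(week, alert("fraud-scoring", accuracy, 0.90))
    else:
        print(week, f"ok ({accuracy:.2f})")
```

The check only tells you that there is a problem; deciding whether to retrain, acquire new data, or re-engineer features remains a human judgment.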

WATCH: I introduce the concept of Fluid ML in my keynote at O’Reilly Strata Data Conference in February.

3. Performant.

Most machine learning is computationally intense, both during training and, in particular, once models have been deployed. Most enterprises need models that can score transactions in milliseconds, not minutes, to identify and prevent fraud or to seize a fleeting opportunity. You need excellent performance in both realms. Ideally, you can train models on GPUs and then deploy them on high-performance CPUs with enough memory to do real-time scoring.

And of course you want everything to run fast and error-free regardless of where you deploy: on-prem, cloud, or multi-cloud. Here, Fluid ML equals flexibility for the run time environment, without compromise.

4. Measurable.

These days, organizations across sectors are budgeting generously for machine learning projects, but those budgets will dry up if data science teams can’t deliver concrete results. You need to be able to quantify and visualize changes over time: improvements in data access and data volume, improvements in model accuracy, and ultimately improvements to the bottom line.

Begin with the end in mind. Think not only about what you need to measure now, but also about what you’ll want to measure in the future as your data science work matures. Is the system fluid enough to track those long term goals?

5. Continuous.

I started by pointing out that machine learning data and models aren’t static and never will be. The fifth and final pillar of Fluid ML is about continuous learning as the world changes. Ensure that your system lets you use tools like Jupyter and Zeppelin notebooks that can plug into processes for scheduling evaluations and retraining models.

At the same time, expect your own learning to grow and evolve as you absorb the advantages and limitations of various algorithms, languages, data sets, and tools. Fluid machine learning requires not only continuous improvement from the data and the system, but also continuous improvement from you and your teams.


The first three pillars are about “always-on” and the last two are about continuous learning. Wherever you are in your data science journey, the pillars of Fluid ML can bring focus to each moment and clarity for the future. It’s a bright future, and thinking carefully about machine learning can get us there. Try it today at


Dinesh Nirmal,

VP IBM Analytics Development

Follow me on twitter @DineshNirmalIBM


Breaking New Ground: A Unified Data Solution With Machine Learning, Speed and Ease Of Use

Imagine being able to arrive at your destination as much as 200 times quicker, or to complete your most important tasks as much as 200 times faster than normal. That would be pretty impressive. What if you could get answers to your analytics queries that many times faster, and run your machine learning algorithms on your data with maximum efficiency, simply by plugging a pre-configured, pre-optimized system into your infrastructure? That’s what the IBM Integrated Analytics Systems (IIAS) is designed to do.

As part of an organization’s “ground to cloud” hybrid data warehouse strategy, IIAS is a machine learning enabled, cloud-ready unified data solution (in the past, this was called a “data warehouse appliance”) that can accelerate your analytics queries by up to 210 times[1]. From a machine learning perspective, IIAS is pre-loaded with Apache™ Spark and IBM Data Science Experience (DSX), enabling organizations to use the system as an integral part of their data science collaborations.

Converging analytics and ML technologies

IIAS represents a convergence of Db2 Warehouse and PureData Systems for Analytics that enables organizations to write analytics queries and machine learning algorithms and run them anywhere across their hybrid infrastructure. It can handle mixed workloads from structured to unstructured data, offering integration with Hadoop, high-speed query routing, bulk data movement and real-time data ingest.

Architected for Performance

Built on the latest IBM POWER8 technology, IIAS leverages 4X threads per core, 4X memory bandwidth and 6X more cache at lower latency compared to select x86 architectures, which helps optimize an organization’s analytics, as shown in figure #1. The hardware-based all-flash storage can translate to faster insights than disk storage, with high reliability and operational efficiency. The system is designed for massively parallel performance, leveraging in-memory BLU columnar processing with dynamic movement of data from storage. It skips processing of irrelevant data, and patented compression techniques preserve order so that data can be processed without decompressing it. Another aspect of performance is that Spark is embedded into the core engine and therefore co-located on the same data node, which removes unnecessary network and hardware latencies.


Figure #1: Optimized Hardware for Big Data and Analytics

Design Simplicity

IIAS is designed around simplification and ease of use. For data experts who don’t want to be database experts, IIAS helps provide fast time to value with an easy-to-deploy, easy-to-operate “Load and Go” architecture. As a preconfigured system (what we’ve often called an appliance), it can help lower the total cost of ownership with built-in tools for data migration and data movement. Using a common analytics engine enables organizations to write their analytics queries once and run them across multiple environments, with IBM Fluid Query providing data virtualization through federated queries. I cover this in more detail in the “A hybrid approach to the cloud and your data” section below.

With no configuration, storage administration, physical data model, indexing or tuning required, business intelligence developers and DBAs can achieve fast delivery times. IIAS is also data model agnostic and can handle structured and unstructured data and workloads. It also comes with a self-service management dashboard.

Business analysts can run ad hoc queries without needing to tune or create indexes, run complex queries against large datasets, and load and query data simultaneously.

Machine Learning built-in.

IIAS offers organizations the opportunity to embrace a machine learning ecosystem by simply plugging a preconfigured, ready-to-go system into their existing infrastructure. It’s all an organization needs for a truly cognitive experience, which includes fast data ingest, data mining, prediction, transformations, statistics, spatial analysis and data preparation for predictive and prescriptive in-place analytics.

IIAS comes preconfigured with IBM’s award-winning IBM Data Science Experience (DSX), so data scientists, engineers, business analysts and cognitive app developers can build, train and deploy models through a sophisticated but easy-to-use interface that lets them collaborate on cognitive applications across multiple platforms. DSX Local instances from an expanded IIAS can be joined to create a larger DSX Local cluster to support additional users. For those who prefer notebooks, IIAS offers built-in Jupyter Notebooks (Zeppelin coming soon) for visualizing and coding data science tasks using Python, R and Scala. RStudio is also built in, and Spark is embedded on the system (see figure #2), allowing parallelization and acceleration of tasks that leverage the sparklyr and dplyr libraries.


Figure #2: The power of embedded Spark 

Users can now create and deploy models through programmatic as well as visual builder interfaces, in three to four simple steps that run from ingesting and cleaning data to training, deploying and scoring a model.

A hybrid approach to the cloud and your data

When it comes to your data, a one-size-fits-all approach rarely works. The IIAS is built on the Common SQL Engine, a set of shared components and capabilities across the IBM hybrid data management offering family that helps deliver seamless interoperability across your infrastructure.

For example, a data warehouse that your team has been using might need to be moved to the cloud to meet seasonal capacity demands. Migrating this workload to IBM Db2 Warehouse on Cloud can be done seamlessly with tools like IBM Bluemix® Lift. The Common SQL Engine helps ensure no application rewrites are required on your part.

Essentially, the Common SQL Engine provides a view of your data, regardless of where it physically sits or whether it is unstructured or semi-structured data. The system’s built-in data virtualization service in the Common SQL Engine helps unify data access across the logical data warehouse allowing an organization to federate across Db2, Hadoop and even third-party data sources.

Integrated and Open

IIAS provides integration with tools for model building and scoring, including IBM SPSS, SAS, Open Source R and Fuzzy Logix. For BI and visualization there is integration with IBM Cognos, Tableau, MicroStrategy, Business Objects, SAS, Microsoft Excel, SPSS, Kognitio and QlikView. And for those looking to build their own custom analytics solutions, IIAS integrates with Open Source R, Java, C, C++, Python and Lua, enabling organizations to use the skill sets they already have. Integration with IBM InfoSphere Governance Catalog also helps users with self-service data discovery.

The Secret Sauce – the sum of the parts.

IBM Integrated Analytics Systems (IIAS) is the only unified data solution currently on the market equipped with the full combined set of capabilities discussed above. The key differentiator of the IIAS, in my view, is the convergence of multiple analytics technologies onto a single platform. Together they create a hybrid data warehouse capable of massive parallelism, scalability and query acceleration, with an embedded machine learning engine and built-in cognitive tools. Integration with open source technologies, as well as with IBM and third-party analytics and BI technologies, all based on a common analytics engine with load-and-go simplicity, makes it a very open platform. Add to this the simplicity and performance characteristics mentioned earlier, and it’s easy to see how the IIAS can help organizations tackle their most challenging analytics and cognitive workloads more efficiently and effectively. In summary (see figure #3 below), the IBM Integrated Analytics Systems is designed to help organizations do data science faster.

Figure #3 : IIAS – Do data science faster

For more information, read the announcement letter or the solution page.


Dinesh Nirmal,

VP Analytics Development

Follow me on twitter @DineshNirmalIBM


  1. Based on IBM internal tests of 97 analytics queries run on a full rack of IBM N3001-010 (Mak) and a full rack of IBM Integrated Analytics Systems (IIAS), the average speed was 5 times faster, the median was 2 times faster and the maximum was 210 times faster. More than 80% of queries ran faster. Performance is based on measurements using an internal IBM benchmark in a controlled environment. This benchmark is a variant of an industry standard decision support workload. It is configured to use a 30TB scale factor and a single user issuing queries, and contains a mix of queries that are compute-bound or I/O-bound in the test environment. Note: Actual throughput or performance will vary depending upon many factors, including considerations such as the workload characteristics, application logic and concurrency.

Piotr Gnysinski: QA Wizard, Former Farmhand, and Family Man

I originally started the “You in the Private Cloud” series as a way to introduce our talented team to each other across our many geographies. I knew it was important for us to know each other as more than email addresses or voices during meetings.

But I didn’t realize at the time that it would become one of the favorite parts of my job. I truly love settling in for great conversations with the terrific people working on IBM Analytics offerings across the globe.

This time was no different. Many of you know that we have a vibrant presence in Krakow, Poland. While I was there recently, I got the chance to visit with Piotr Gnysinski, who works as Test Lead on the Information Governance Catalog, a key part of our InfoSphere Information Server offering.


Piotr with Dinesh

Dinesh: I know you worked for a while for Comarch, whose founder is Janusz Filipiak, a famous, larger-than-life tech founder. What was it like working there?

Piotr: When I joined, Comarch was already a big company. It was my first job in IT and the first time I experienced emotions from customers coming our way: real people on the receiving end of my work — sometimes with real joyful reactions, sometimes with irritation as a result of bugs that made it through to the field.

I had to switch to real proactive thinking. I would say this attitude, this deep and strong engagement in customer advocacy and not just technical skills, is the single most important characteristic that can help someone do well in our business, or any business for that matter.

Dinesh: You’ve got a reputation for designing robust testing frameworks that cover a lot of ground. I think testing can seem like a mystery to many of us. Give me a sense of how you approach things.

Piotr: It depends on what you’re testing, but a big tool for us across the board is the idea of pair-wise testing. We know from studies that most defects (65-97%) can be discovered by testing the interactions between the values of just two factors[1]. A factor could be the browser vendor, the underlying operating system, and so on.

So, when you have an almost infinite number of tests you could run and very limited time, you first think of all those possible factors and figure out their possible values, then you classify these into groups called “equivalence classes”. You know that testing a single value from a class will probably give the same result as testing any other value in the group, so now you use algorithms that make sure each pair of classes is covered at least once — and you make sure to mix up which specific values are getting tested in the different pairs. That gives you good coverage.
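The approach Piotr describes can be sketched in a few lines of Python. This is a minimal brute-force greedy illustration and not the tooling his team uses; the factor names and equivalence classes below are made up, and real combinatorial test design tools use far more scalable algorithms.

```python
from itertools import combinations, product


def pairs_of(test, names):
    """All (factor, value) pairs exercised together by a single test case."""
    return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}


def pairwise_suite(factors):
    """Greedily pick test cases until every pair of values is covered once.

    `factors` maps a factor name to its equivalence classes. This brute-force
    greedy search is only suitable for small examples.
    """
    names = sorted(factors)
    candidates = [dict(zip(names, combo))
                  for combo in product(*(factors[n] for n in names))]
    uncovered = set().union(*(pairs_of(c, names) for c in candidates))
    suite = []
    while uncovered:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c, names) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best, names)
    return suite


# Hypothetical factors and equivalence classes, for illustration only.
factors = {
    "browser": ["chrome", "firefox", "safari"],
    "os": ["windows", "linux", "macos"],
    "locale": ["en", "pl"],
}

suite = pairwise_suite(factors)
print(f"{len(suite)} pairwise tests instead of 18 exhaustive combinations")
for test in suite:
    print(test)
```

Exhaustive testing of these three factors would need 3 × 3 × 2 = 18 runs; the pairwise suite needs fewer while still exercising every two-way interaction at least once.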

I’ll send you a link to some information about Combinatorial Test Design if anybody wants to read up some more.


Piotr with wife Justyna, daughter Julia, and son Szymon

Dinesh: What do you do on weekends for fun?

Piotr: Almost every weekend, my wife Justyna and I take our son and daughter on some adventure: water park, bike riding, or visiting the playground. But my favorite is to bring them to visit Henrykow, which is a small village with about 30 people. My aunt and uncle have a farm and I used to go there every summer when I was a kid. I collected so many fantastic memories from there.

So now, whenever I have a chance, I pack up the family and two hours later we are in ‘Neverland’. They still keep livestock and they still work the land, so my kids get to see and do all that as well. For instance, not so long ago, they witnessed a calf being born, they very often get to ‘drive’ — being on my lap — a tractor, play in the hay for hours, or we go through the woods or the swamps, which always ends up with at least two of us all wet and muddy.


At the beach with friends and family 

Dinesh: It looks like you also make it to the gym once in a while. Am I crazy?

Piotr: Ha! Yes, I do weights mostly. There is something very satisfying in pushing yourself over imagined limits and doing completely exhausting training sessions, after which you can barely move. Yeah, gym is fun!

I’ll also get ideas for work at the gym, usually related to current work stuff: how are we going to approach creating our environment matrix for an upcoming release or how can we improve a process that was raised during a Lessons Learned session. Nothing revolutionary that would change the IT world, but very down-to-earth solutions that help us get better and better at what we do.

Dinesh Nirmal

Vice President, Analytics Development

Follow me on twitter @DineshNirmalIBM



Piotr’s hometown is  Bedzin, Poland, most famous for its castle.



Piotr: “A nearby roundabout, which was designed back when we had Communism here aiming to be perfect non-collision intersection for cars and trams. What we are left with, is this ’roundabout’ that is called ‘a kidney’ and where cars cross paths with trams three times before they leave it 🙂 It makes just about as much sense as Communism itself.”

Favorite programming language: Java™

Top 5 authors:

  1. Terry Pratchett
  2. Andrzej Sapkowski
  3. James Whitaker
  4. J.K. Rowling
  5. Wiktor Suworow

  1. IBM Haifa Research Laboratory Combinatorial Test Design (CTD)

Mihai Nicolae: Code Craftsman, Aspiring Chef and World Traveler

As much as I love meeting long-time IBMers and hearing their perspective on our evolution over the years, it’s a special pleasure to visit with our newer team members and to hear their visions for IBM’s future. You’ll remember my conversations with Martyna Kuhlmann, Ketki Purandare, and Phu Truong.

This time, I’m talking with Mihai Nicolae, a developer working out of our Markham office near Toronto. In just two years with IBM, Mihai has already been transformational on flagship products: Db2, Watson Data Platform, and Data Science Experience. He’s currently trading time between DSX Local, IBM Data Platform, and the new Machine Learning Hub in Toronto.


Dinesh and Mihai

I hope you’ll take as much inspiration from our conversation as I did.

Dinesh: Where are you from originally?

Mihai: Romania. I’m very grateful — and always will be — for my parents having the courage to emigrate to Canada in their forties for me to have the opportunity to attend university here.

Dinesh: I bet they’re proud of you.

Mihai: Oh absolutely, I can’t ever have a doubt about that based on how much they talk about it.

Dinesh: If my son’s first job out of college was at IBM, I’d be proud, too. Tell me about your experience so far.

Mihai: I’ve been at IBM for two years full-time. Currently, I’m working on DSX Local and IBM Data Platform, which just started in January, after my time on the Db2 team. It’s been an amazing journey, especially GA-ing the product in only 4 months.

Dinesh: First of all, thanks and kudos to you and the team for delivering DSX in such a short amount of time. You’re now diving into machine learning. Did you take ML classes at university?

Mihai: I took one Intro-to-AI class, but frankly I feared the stats component of the ML course — and that 40% of my performance would depend on a 2-3 hour, stats-intensive exam.  At this point, I know that no hard thing is insurmountable if you put in the work.


Mihai at Big Sur.

Dinesh: Where do you see machine learning or data science going from here?

Mihai: I think it’ll be a vital component of every business. AI is the once-in-a-lifetime technology destined to advance humanity at an unprecedented scale. I think the secrets to defeating cancer, reversing climate change, and managing the global economy lie within the growing body of digital data.

But reaching that potential has to happen with the trust of end-users, trust in security and lack of bias. That’s why I think IBM will be a leader in those efforts: because IBMers really do value trust — I see it in the way we interact with each other day to day, as much as I see it in our interactions with clients. Trustworthiness is not something that can be compartmentalized.

Dinesh: Well said. I know you also work on encryption. Where does that fit in?

Mihai: When data is the core of everything, encryption is critical — encryption plus everything to do with security, including authentication and authorization. They’re all essential for earning and keeping user trust.

Dinesh: I love your passion for your work. Do you ever leave the office? What are your hobbies?

Mihai: Ha! I go to the gym, and I recently subscribed to one of those recipe services that delivers ingredients in pre-determined amounts. But traveling is really my fixation: California, Miami, Rhode Island and Massachusetts last year. And this year, I’ve been to the Dominican Republic, and then I head to Nova Scotia this summer.


…and at the Grand Canyon.

Dinesh: Nice. Do you have a particular dream destination?

Mihai: Thailand has a moon festival in April, where you get to have a water fight for three days. It’s the Thai new year. That might be my next big pick.

Dinesh: I travel a lot and I think there can be something really creative about travel, especially with the types of trips you’re talking about. I like asking developers whether they think of themselves as creative people. What’s your thought?

Mihai: Travel is definitely creative, but you’re making me think of the recipe service. I think of cooking from a card like learning programming from sample code: You get the immediate wow factor from building and running the working product but you don’t necessarily understand how and why the pieces fit so well together, or even what the pieces are. But over time, and with experience, you get understanding and appreciation. I think that’s when innovation and creativity can flourish.

Dinesh: Thanks, Mihai. Thanks for taking the time, thanks for the great work, and thanks for evolving IBM for our customers.

Dinesh Nirmal

Vice President Analytics Development

Follow me on twitter @DineshNirmalIBM


Home town: Constanta, Romania

Currently working on: DSX Local, Machine Learning Hub Toronto

Favorite programming language: Python

Top 5 future travel destinations:

  1. Thailand for Songkran
  2. Australia for scuba diving in Great Barrier Reef and surfing
  3. Brazil for Rio Carnaval
  4. Mexico for Mayan ruins and Diez y Seis
  5. Germany for Oktoberfest and driving on the Autobahn



Opening up the Knowledge Universe.

IBM Data Science Experience Comes to a Powerful, Open, Big Data Platform.

I have just finished presenting at the DataWorks Summit in San Jose, CA, where a partnership between IBM and Hortonworks was announced. Its aim is to help organizations further leverage their Hadoop infrastructures with advanced data science and machine learning capabilities.

Some Background.

When Apache™ Hadoop® first hit the market there was huge interest in how the technology could be leveraged – from being able to perform complex analytics on huge data sets by using a cluster of thousands of cheap commodity servers and Map/Reduce  – to predictions that it would replace the enterprise data warehouse.  About three years ago Apache™ Spark™ gained a lot of interest unleashing a multi-purpose advanced analytics platform to the masses – a platform capable of performing streaming analytics, graph analytics, SQL and Machine Learning with a focus on efficiency, speed and simplicity.

I won’t go into details on the size of the Hadoop market, but many organizations invested heavily for numerous reasons. These include, but are not limited to, Hadoop being seen as an inexpensive way to store massive amounts of data, and the ability to perform advanced queries and analytics on large data sets with rapid results thanks to the Map/Reduce paradigm. From one perspective, it was a data scientist’s dream to be able to reveal deeper insights and value from one’s data in ways not previously possible.

Spark represented a different but complementary opportunity allowing data scientists to apply cognitive techniques on data using machine learning – and other ways of querying data – in HDFS™ as well as data stored on native operating systems.

Many organizations including IBM made investments in Hadoop and Spark based offerings. Customers were enthused because these powerful analytics technologies were all based on open source representing freedom and low cost. Organizations including IBM participated in initiatives such as ODPi to help ensure interoperability and commonality between their offerings without introducing proprietary code.

Self-Service, Consumable, Cognitive tools.

Frustrated with IT departments not being able to respond fast enough to the needs of the business, departments sought a “platform” that would allow them to perform “self-service” analytics without having to be die-hard data scientists / engineers or developers.

The IBM Data Science Experience (DSX) emerged as a tool that could help abstract complexity and unify all aspects of the data science disciplines, allowing a single user or multiple personas, regardless of technical ability, to collaborate on data science initiatives on cloud, locally (on-prem) or while disconnected from the office (desktop). Whether you prefer your favorite Jupyter notebook, RStudio, Python, Spark or a rich graphical UI that gives advanced users all the tools they need, while cognitively guiding inexperienced users through a step-by-step process of building, training, testing and deploying a model, DSX helps unify many aspects into an end-to-end experience.

Figure #1 : Data Science Experience – Making data simple and accessible to all. 

Enterprise Ready.

A lot needs to happen for machine learning to be enterprise ready and robust enough to withstand business critical situations. Through DSX (see figure #1), advanced machine learning capabilities, statistical methods, advanced algorithms and Brunel visualizations are available. Sophisticated capabilities such as automated data cleansing help ensure models are executing against trusted data. Deciding which parts of the data set are key to the predictive model (feature selection) can be a difficult task. Fortunately, this capability is automated as part of the machine learning process within DSX. An issue that many data scientists face is the potential for predictive models to be impacted by rogue data or sudden changes in the marketplace. IBM machine learning helps address this issue by keeping the model in its optimal state through a continuous feedback loop that can fine-tune the parameters of the model without having to take it offline. This allows the model to sense and respond to each interaction (at a level of granularity defined by policy) without any human intervention.
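To make the idea of a feedback loop that tunes a model without taking it offline more concrete, here is a generic sketch of online (incremental) learning. It is not IBM's implementation, which this post does not describe; the `OnlineLogisticModel` class, the learning rate, and the toy feedback stream are all assumptions made for illustration.

```python
import math


class OnlineLogisticModel:
    """Tiny logistic regression updated one observation at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Probability of the positive class for feature vector x."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One stochastic-gradient step on the log loss for a single example."""
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


model = OnlineLogisticModel(n_features=1)

# Toy feedback stream (made up): positive feature values are labeled 1.
feedback_stream = [([1.0], 1), ([-1.0], 0)] * 50

for features, outcome in feedback_stream:
    score = model.predict_proba(features)  # the model keeps serving scores
    model.update(features, outcome)        # and learns from each outcome

print(model.predict_proba([2.0]) > 0.5, model.predict_proba([-2.0]) < 0.5)
```

The point of the sketch is the shape of the loop: scoring and parameter updates interleave, so the model never has to be pulled from service for a batch retrain. A policy could just as easily apply updates per batch instead of per interaction.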

A knowledge Universe – Unleashing Cognitive insights on Hadoop Data Lakes – with Power.

The potential of integrating the richness of DSX and the cognitive ML capabilities with all the data residing in HDFS (as well as many other data sources outside of Hadoop) is an exciting proposition for the data science community. It could help unlock deeper insights, increasing an organization’s knowledge about itself, the market, products, competitors, customers and sentiment, at scale and at speeds approaching real time. One of the key features delivered as part of Hadoop 2.0 was YARN (Yet Another Resource Negotiator), which manages the resources involved when queries are submitted to a Hadoop cluster far more efficiently than earlier versions of Hadoop did, making it ideal for managing ever-increasing cognitive workloads.

Simply put, I cannot think of a better time than now for organizations to leverage their Hadoop investments. The combination of Hadoop-based technologies integrated with IBM ML and DSX unleashes cognitive insights to a very large Hadoop install base.

All very promising so far, but there is one more nugget to unleash that will help organizations with their cognitive workloads. IBM just announced HDF 3.0 for IBM Power Systems, bringing the built-for-big-data performance and efficiency of Power Systems with POWER8 to the edge of the data platform for streaming analytics applications. This solution joins HDP for Power Systems, recently launched, which offers a 2.4X price-performance advantage [1] versus x86-based deployments.

I’m excited at the possibilities that lie ahead – how data scientists and machine learning experts might leverage and benefit from our offerings and the integration with Hadoop infrastructures – how they might take it to the next level in ways we’ve not yet imagined as we continue to enrich our offerings with more capabilities.

For more information on how to get started with Machine Learning click the link below :


Dinesh Nirmal – VP Analytics Development.  

Follow me on twitter @DineshNirmalIBM



IBM, the IBM logo, IBM Elastic Storage Server, IBM Spectrum Scale, POWER8 and Power Systems are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at

Apache Spark, Apache Hadoop, HDFS, Spark, Apache, Hadoop and the Spark, Hadoop logos are trademarks of The Apache Software Foundation.

Other company, product or service names may be trademarks or service marks of others.

1 – Based on IBM internal testing of 10 queries (simple, medium, complex) with varying run times, running against a 10TB DB on 10 IBM Power Systems S822LC for Big Data servers (20 C/40 T), 256GB memory, HDP 2.5.3, compared to published Hortonworks results based on the same 10 queries running on 10 AWS d2.8xlarge EC2 nodes (Intel Xeon E5-2676 v3), HDP 2.5. Individual results may vary based on workload size and other conditions.  Data as of April 20, 2017; pricing is based on web prices for the Power Systems S822LC for Big Data ( and HP DL380 Intel Xeon HP DL380; 20 C/40 T, 2 X E5-2630 v4; 256 GB found at

Meet Sebastian – A developer with a recipe for success

What a pleasure it was to meet Sebastian! He was recommended to me as a technical whiz with Python™ skills par excellence, but he impressed me just as much with his infectious, happy energy, his thinking on the advancement of society and technology, and how he chooses to spend his time sharing his passion for electronics and software with children and adults at his local community center. Sebastian hails from a close-knit village in the Ruhr Valley — perhaps that’s where he learned how to be effortlessly generous. Like all of you, I am constantly learning — not just about business or the next turn of the blade in machine learning, but about life, empathy, and leadership. More and more this year I’ve noticed the difference positive leadership makes. Sebastian, though a very young man, had much to teach me on this score.

When you walked into this room, you brought with you a burst of energy. I felt more positive as soon as we started talking — and I am already a very positive person. How do you do that?


By believing in a cause. Positivity is what we all need in life, and in business. If you are stretching yourself you’ll inevitably encounter failure and distress, but you have to stay positive. If we are talking about a group of people having a positive attitude, it doesn’t matter where you come from, or how old you are, it only matters that you all believe in the same cause.

What is the cause you believe in?

At work, it’s the team. We are all working on IBM DB2 Analytics Accelerator for z/OS, and I’ve never experienced a team that is as close together as this one, even though we are working on so many different parts. That’s the great aspect. When it comes to designing a new feature, we have to congregate and think about lots of different use cases. It’s not a simple product. Although we consider ourselves as writing “glue code,” we have to take special care with every little aspect and think through the consequences of potential failure. If I make a mistake in programming or designing a feature, it has a heavy impact on customers, and I know intimately what that can feel like from when I was in a customer-facing situation.

“Seeing someone learn and advance, and become an expert themselves, it’s the best thing that you can see. It lays the groundwork for society to advance.”

You started your career not long ago in customer support and now you’re a developer on a critical analytics product for large enterprises. What was it like, for a social person like you, to make the leap from facing customers to facing an Integrated Development Environment (IDE)?


It was natural. I did it using communications, and deep technical knowledge. I studied computer science at university, as a lot of people who work at IBM do, but we specialized in intercultural and international communications. We learned to communicate with passion and dedication, and to have empathy for other people and their needs and demands. My job in support was to understand the customer’s vision, and to show them that we at IBM are great partners to them. I also have deep technical knowledge, so now, knowing the architecture and where to expand it, that’s just awesome. But the foundation is the clients. They put so much trust in us that we have to give back to them.

Are you just as intensely involved with life outside of work? 

I’m interested in hardware, not just software: I love to lay out printed circuit boards and teach children how to solder and how to control the boards programmatically. It is a great balance to the complex software of my work life. With hardware, you can achieve simple things, like making an LED blink, and it makes children crazy with excitement.

You volunteer with children?

Absolutely! And adults. It’s great to see people learn and to share your knowledge, because sharing is what advances all of us. It helps me to find ways to explain what I know in different words. And, seeing someone understand what you just said, seeing someone learn and advance, and become an expert themselves, it’s the best thing that you can see. It lays the groundwork for society to advance.


For such a young person, you speak profoundly, and you are involved with noble causes: sharing your time and knowledge to move society forward. It maps exactly to what you do at work: using empathy and knowledge to advance the product. What do you do for downtime? Or is it all uptime?

Oh no! I love to do things with my friends. I am a baking enthusiast, and I frequently come to work on a Monday with lots of cookies and a big cake to share. I can relax if I bake. I love going to movies with friends and playing board games — that’s a great thing — and walks in nature. Nature helps me find my inner point of …


That is saying a little too much I think, but some peace, and calm.

Dinesh Nirmal,

Vice President Analytics Development

Follow me on twitter @DineshNirmalIBM

Name: Sebastian Muszytowski

Hometown: Ruhr Valley
Currently working on: IBM DB2 Analytics Accelerator for z/OS
Favorite Programming Language: Python™
Top 5 movies to see with friends:

1) Hedwig and the Angry Inch
2) Scott Pilgrim vs. The World
3) Juno
4) Little Miss Sunshine
5) Deadpool  

Sebastian’s Favorite New York Style White Chocolate Cheesecake with Blueberries.

200 g whole wheat cookies or Amaretti biscuits
100 g butter
250 g white chocolate
100 g crème fraîche (or heavy whipping cream)
600 g cream cheese
1 tbsp vanilla flavored sugar (or vanilla extract)
100 g powdered sugar
a handful of washed blueberries
How To:
0) Preheat your oven to 180°C or 350°F
1) Crumble the cookies (either by hand or in a food processor)
2) Melt the butter (short 10-second bursts in the microwave are fine for melting. Give it a good stir after each 10-second burst. Be cautious since butter in the microwave can become a huge mess if you heat it too quickly.)
3) Put some non-stick baking paper into the baking tin or use some butter or non-stick baking spray to cover the area of the baking tin.
4) Combine your crumbled cookies and the melted butter and put it in the baking tin to form the bottom of your cheesecake.
5) Put it in the oven for about 10 minutes and let it completely cool. (Hint: you do not need your oven any longer – you can turn it off ;-))
For the yummy cheesecake filling:
1) Chop up the chocolate in small pieces and mix it with the crème fraîche (or heavy whipping cream)
2) Heat it and stir it until it combines (I recommend short microwave bursts or a double boiler to do so)
3) In another bowl mix the cream cheese, vanilla sugar (or extract) and powdered sugar until it is well combined
4) Slowly add the chocolate and crème fraîche mixture into the bowl while you constantly stir.
5) Once it is combined (do not over-stir!) put it on top of your cooled cheesecake bottom, flatten the top and let it sit in the freezer for a while.
Decoration time!
1) Put some of the washed blueberries on top of the cheesecake to make it look even better. Be assured that it tastes even more delicious with them!
2) For an even better effect you can grate some leftover white chocolate (if there is any) to make the cake even more attractive.

“Python” is a registered trademark of the Python Software Foundation.

The Data Scientist Who “Listens to the Problem”

My most recent in-flight reading was Thank You for Being Late. In it, Thomas Friedman says the risk of AI isn’t that it’s going to take over humanity, HAL-like, but that we as humans could become so entranced by technology that we’ll neglect to teach it human values. It’s not machines vs. humans or technology vs. creativity. The more technology develops, the greater the opportunity to add to it our kindness, our fairness, and our creativity.

Jorge Castañon, Data Scientist at the IBM Machine Learning Hub and this week’s “You in the Private Cloud” interviewee, clearly agrees. He and I met this week to discuss math, art, and the future of data science.

What was your first job at IBM?

To understand what data science is. There were so many different definitions! I decided it’s the combination of three things: mathematics, computation, and creativity. You need the creativity to listen to the problem and come up with the math. You need the math because data science requires a very deep understanding of the math that lies behind it. Then you need to compute the solution.

What do you mean, “Listen to the problem?”

Math is like a foreign language that not everyone can speak. When I’m listening to a problem, I’m translating from English to math and then translating back to English to continue the conversation.


The mathematics is distinct from the computation?

Yes. You think of a method mathematically. You eventually need to implement it in the computer: that’s the algorithm part. But first, it’s you and a blank piece of paper, and your thoughts, and eventually a math solution. The first person who thought about linear regression or least squares, that person was mathematical. It was a bunch of data points in space, and then, “Let’s find a model that fits those points” — but first it was math.

IBM was named on Gartner’s Magic Quadrant for Data Science Platforms for 2017, because of DSX with machine learning, and also the work you and the team are doing. A lot of it is side-by-side with clients: what’s that like?

It’s super fun! Learning about new problems is the best part of data science. The minute you start a conversation with a domain expert, to see what are the important parts of the model, what you can use for your math solution: that’s the exciting part. Talking to customers is a way to find the most interesting problems to solve.


I would imagine coming out of Rice with a PhD in Computational Mathematics that you had a lot of career choices. Why did you choose tech and why IBM?

Rice University is in Houston, so there were opportunities in the financial world and the energy sector and a lot of money to be made. I went to a conference and met IBM recruiters and got a good indication of the spectrum of expertise at IBM. I felt I would be able to go wherever I wanted to in terms of the research and technical challenges; I would not be limited to one narrow role.

What’s the one thing about work that you are most excited about?

Collaboration. As a computational mathematician, you know a lot about certain things. But to go and talk about energy efficiency, or credit unions, or TV marketing, that gives me new topics where I can apply math and make a difference: to health care for example, or by making a building more efficient.

You are working at the edge of technology that doesn’t quite exist yet.

Definitely. My first project was to identify what is data science: that was unstructured. Then, how to use data science in our products: unstructured. How to apply machine learning: unstructured. It’s very exciting, to find the structure of things that are amorphous or not yet reified. And that’s what mathematics is. It goes back to my whole path, to the creative problem-solving that drives me.


Where do you see data science going? Is it part of the machine learning path, or will it diverge?

It’s an open question whether data science is going to be automated so that humans won’t be needed. I think they will be needed. The creativity aspect of data science cannot be automated.

What do you do for fun outside of work?

I love art, and traveling with my wife: she’s also an applied mathematician. We go to museums and I take photographs of art. I used to do life drawing, but after the PhD and work, you get busy! I love James Turrell in particular; his work is based on what he called “the geometry of light” and he studied math in college.

Customers tell me it’s not just our skills they appreciate, it’s the commitment we make to their success, and they see that from working directly with you and IBMers like you. Thank you.

You are welcome. It is a pleasure to work here. I have a lot of space to grow.

Name: Jorge Castañon

Years at IBM: 3

Hometown: Mexico City

Currently working on: IBM Machine Learning Hub

All-time top five artists:

  1. Francisco Toledo
  2. James Turrell
  3. Willem De Kooning
  4. Mark Rothko
  5. M.C. Escher


Dinesh Nirmal,

Vice President Analytics Development

Follow me on twitter @DineshNirmalIBM