Dr. Jerry Smith welcomes you to another episode of AI Live and Unbiased to explore the breadth and depth of Artificial Intelligence and to encourage you to change the world, not just observe it!
Dr. Jerry is joined today by Alonso Castañeda Andrade, Managing Director of Data Engineering and Analytics at AgileThought. They talk about the role that Data Engineering and Analytics play in AI.
- Why is it a challenge today to create quality data products?
- Technically, many tools are available today, including databases and the cloud, which allows us to scale quickly and manage all the data. Most of the challenges come from the organizational side and from processes, including the dynamic nature of the data
- You can’t do AI without having data. Where do we start to create good quality data?
- Data is available in a variety of forms and places
- Organizing data is a challenging job, and data engineers need support to do it, such as a well-architected data platform and a well-defined flow of information
- Once the data is organized, analytics can be run on it
- What are customers looking for out of their dashboards? What are they really looking to get out of their analytic solutions?
- Data Engineering and Analytics are asked to work on the integration of systems
- Customers expect their business to gain more visibility
- Customers want to receive trusted data in a timely manner
- The analytics team, dashboard engineers, and data scientists need to work together for better outcomes
- The democratization of the data: How do we enable everyone in the company to have access to the data that they require, and do that by themselves without depending on others?
- Trends for 2022: The continuous migration to the cloud
- The cloud plays a very important role in the data modernization of platforms, since it allows businesses to deploy data products faster (up to a 50% velocity increase)
- Services become really significant, especially the cognitive services and analytical databases
- Businesses require their data as soon as something happens; for example, in the banking industry, five minutes of fraud can cost millions of dollars
- What is going on today in the world of data apps?
- One of the challenges is to provide the data that the business requires in a timely manner. Traditional analytics projects have generally been waterfall in nature, bringing in all the data to create a massive data model. Many fail in this process because it is time-consuming and expensive, and by the time they are ready, the data may be obsolete
- Data is an asset of an organization and being able to make that into a competitive advantage is key
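One point from the conversation, that BI tools are best served by a star schema while machine learning needs flat records with many rows of observations and many columns of variables, can be sketched in a few lines of Python. This is only a minimal illustration and was not presented in the episode; the table names, the fields, and the `flatten` helper are all hypothetical.

```python
# Hypothetical star schema: one fact table plus two dimension tables.
# All names and values here are made up for illustration.
dim_customer = {
    1: {"customer_name": "Acme", "segment": "Retail"},
    2: {"customer_name": "Globex", "segment": "Wholesale"},
}
dim_product = {
    10: {"product_name": "Widget", "category": "Hardware"},
    20: {"product_name": "Gadget", "category": "Electronics"},
}
fact_sales = [
    {"customer_id": 1, "product_id": 10, "quantity": 3, "amount": 30.0},
    {"customer_id": 1, "product_id": 20, "quantity": 1, "amount": 45.0},
    {"customer_id": 2, "product_id": 10, "quantity": 7, "amount": 70.0},
]


def flatten(fact_rows, customers, products):
    """Join each fact row to its dimensions, giving one wide record per observation."""
    flat = []
    for row in fact_rows:
        # Start with the measures from the fact table.
        record = {"quantity": row["quantity"], "amount": row["amount"]}
        # Add the descriptive columns looked up through the foreign keys.
        record.update(customers[row["customer_id"]])
        record.update(products[row["product_id"]])
        flat.append(record)
    return flat


flat_records = flatten(fact_sales, dim_customer, dim_product)
for record in flat_records:
    print(record)
```

The star schema stays the source for dashboards, while the flattened records are the shape a data scientist would typically feed into a model. In practice this join would more likely run inside the data platform itself than in application code.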
Transcript [This transcript is auto-generated and may not be completely accurate in its depiction of the English language or rules of grammar.]
Intro: [00:04] You’re listening to AI Live and Unbiased, the podcast where we explore how to apply artificial intelligence and the impact it has on enterprises and society. Now, here’s your host, Dr. Jerry Smith.
Dr. Jerry Smith: [00:19] Welcome to AI Live and Unbiased. I am your host, Dr. Jerry. I’ll be traveling with you on this journey as we explore the depth and breadth of artificial intelligence. Pleased to have as a guest today Alonso Castañeda Andrade. Alonso is a Managing Director of Data Engineering and Analytics at AgileThought. He’s been with the company for a very long time. Alonso, thanks for talking with us today about the role data engineering and analytics play in AI. First of all, can you briefly introduce yourself to the group?
Alonso Castañeda Andrade: [00:53] Thank you. Thank you very much. It’s a pleasure to be here. Yes, my name is Alonso. I’m a computer scientist. I started as a developer with Java, building enterprise applications. Then my life took me to the data world, where I started first as a BI Developer, then Data Engineer, then Data Architect. And from there I’ve been managing and leading projects for a lot of big enterprises, for Fortune 500 companies that have a lot of challenges with their data. And they want to create quality data products. And that’s a challenge.
Dr. Jerry Smith: [01:58] Let’s unpack that for a second. Why is it a challenge today to create quality data products? I mean, my goodness, you know, I’ve been around the block for a while, so have you, you know, it’s been 10, 20, 30 years, we’ve all been in this business. Why do you think it’s still so hard for companies to create quality data products?
Alonso Castañeda Andrade: [02:20] Yeah, well, I believe there are a lot of aspects to this. First, technically we have a lot of tools now. We have databases that handle very large numbers of records very easily. We have the cloud, which allows us to scale quickly and be able to manage all of the data. So I don’t think, technically, it’s such a hard thing. The challenge that we mostly see is on the organizational side, where the business, the company, is not well established or mature in its data practices. They don’t have good data governance, and they are still struggling to define what they’re really looking for, and standardizing the data concepts and the value is hard. So that’s one thing, and the other is processes and change and the dynamic nature of the data.
Dr. Jerry Smith: [03:55] What do you mean by dynamic nature of the data?
Alonso Castañeda Andrade: [03:58] Yeah, data is frequently changing. The data sources get updated every second, the data structure changes, applications evolve. So with all of those changes, we have to be agile to be able to incorporate them into data pipelines and be able to expose them to the business and the applications that consume them.
Dr. Jerry Smith: [04:36] Absolutely. Absolutely. And I think with that as a basis, we can sort of get into the heart of today, which is we’re here to talk about AI, but you can’t do artificial intelligence without having data. Right. I mean, that’s the way the current practices are today in artificial intelligence: most of our data science, machine learning, and even AI models are predicated on having access to good data. Right. So what is that process like for us, right, to create good quality data? You know, where do we start, you know, from all the vast data sources, you know, how do we grab it all? Where do we bring it in? What is that process like? Cause a lot of people, they just assume, Hey, you know, you’re a data scientist, you go do that work, but that’s not true. Right. I mean, there is a data engineering component to all this.
Alonso Castañeda Andrade: [05:31] Yes, yes. Data is available in a variety of forms and places: APIs, databases, files, unstructured data, text, audio, video, and in a lot of formats and ways. A challenge a lot of times is getting that data accurately and on a timely basis, so that it’s useful to the business and to the process and to the machine learning and the AI algorithm.
Dr. Jerry Smith: [06:17] And by the way, that’s key. Right? And I think what I’ve seen on the other side of this curb, right, you being on the front curb and me being on the back curb, is organizations go, Hey, Alonso, bring your team in, help me organize this data, get it into a database, make it operationally great. Right. So that we can run our point-of-sale solution, or we know a bit more about, we have a great organization for our people. Right. We’ll put it in the data cloud. I think we’re going to talk about that a little bit. But what they don’t realize, though, is for folks like me, that star schema, you know, that third normal form data, that’s rough for me. Right. I need it. I need lots of rows of observations and lots of columns of variables in order for me to fundamentally do my role. Do you see that as a challenge, and what kind of tools do you use, things like Databricks and stuff like that, to make that organization?
Alonso Castañeda Andrade: [07:11] Yes. So, I mean, if you have a well-architected data platform, that makes things much easier. So if you have a well-defined flow of your information and a centralized place where you can do all of the data manipulation, cleaning, and standardization that you need for the data to be consumed by machine learning and AI, that makes it much easier. So I think the key is to have this well-structured and well-architected platform that allows you to have this flexibility and generate the data in the format that each of the consumers needs, right? Like, machine learning and AI, they need it in a machine learning record type of form. But for BI tools, the best format is a star schema that facilitates the data consumption, right. And other applications might require it in a slightly different way. So if you have it in a place that is flexible and, first of all, the data is accessible for this process, I think that makes it much easier.
Dr. Jerry Smith: [08:50] Well, that’s important. So, I mean, if we got nothing else out of anything else we talk about today, as we get into some of the trends and stuff that you see, having people understand that there could be this branch in your data ops, right. A branch that says, okay, now that we know your data, all the sources you point out, there’s some enterprise data, there’s some IT data, there’s some source data, social media and stuff like that. Now that we know your data, and we’ve brought it in, now that we’ve got it organized, there may be a branch in there that says you’re going to run analytics on this. And by the way, I’d like you to think about what kind of analytics are important today. You have another group of people that have a different use of the data, which may be organized differently, and data engineering can help with that, right. A star schema version for analytics, a machine learning flat record for the data science and machine learning areas in there. So that’s an important piece I think people can take away from today. Tell me a little bit about analytics, right? What does that mean today for customers that are working with you, and what are they looking for out of their data dashboards? Right. They just can’t be, let me be blunt about this, they can’t just be looking for, oh, tell me how much data I have and how many records are in there. What are they really looking to get out of their analytical solutions today?
Alonso Castañeda Andrade: [10:09] So as I was mentioning, the data is very dynamic, and organizations are evolving every day, acquiring companies, merging with other companies, operating on or migrating to other systems. So all of those changes in the organization require a lot of integration across systems. So a lot of what we are seeing in analytics and data engineering is integration of systems, making data talk together through APIs and standard data formats. So that’s one thing. And then making it visible to everyone: the main point is to give visibility to the business users, to the decision makers, to allow them to get the data that they need in a timely manner and that is trusted.
Dr. Jerry Smith: [11:32] That is, I think that’s a second takeaway, which is having access to those dashboards, regardless of what the dashboards are, having some sort of understanding that, analytically, you can trust this data, right. Being able to see, you know, I think you were mentioning a couple different kinds of dashboards. Maybe even three types, I heard in your voice. One is an IT dashboard that basically says, here’s all the data you have. The second set of dashboards says, here’s the quality of that data. And both of those seem to be very contextually independent. It doesn’t matter whether you’re a healthcare company or a bank, you need both of those. Then you have your business dashboard, which is very contextually dependent, right? So for example, I know you’re starting to work with a home healthcare provider, one of the largest ones in the United States. And they’re looking at doing things like personal care services. This is how many baths you need to do and how many massages you need to have. And cleaning up things on people’s bodies and stuff like that. That’s a very business-oriented kind of section. Having all three of those kinds of dashboards available to a decision maker seems to be an important capability. Do you see that as well?
Alonso Castañeda Andrade: [12:46] Yeah, exactly. And leveraging analytics, leveraging all the advanced analytics functionality that we have nowadays, with ML and AI embedded in the dashboard. These insights and prescriptions and predictions from the data, that’s another important point.
Dr. Jerry Smith: [13:16] Well, so now you bring up a third piece, which is an awesome, I mean. So first we realize that not all data’s the same to everyone. The second is that, as we look at analytics, it plays a very important role for decision makers. The next piece is that, I think you’re saying, analytics is drifting into the world of data science and machine learning. And that is, what’s a dashboard, right? It’s a visual representation of some information. And what kind of insights do executives want? Not only do they want to know how many people am I taking care of today, but how many people will I take care of tomorrow or next week, or next month. So analytics plays a very, very important role in that, right.
Alonso Castañeda Andrade: [14:00] That’s right. That’s right.
Dr. Jerry Smith: [14:02] So having your analytics team, your dashboard engineers that are just brilliant at creating the visualizations, working with your data scientists to couple those two things, do you see that as an important emerging activity coming down the road?
Alonso Castañeda Andrade: [14:17] Yes, definitely. And making all of these insights available to the end users. Making them consumable, which is one of the trends that we’re seeing, which is democratization of the data. So how do we enable everyone in the company to have access to the data that they require? And be able to do that, a lot of times, by themselves, not depending on...
Dr. Jerry Smith: [14:58] People like me
Dr. Jerry Smith: [15:01] You’re trying to put me out of business. That’s what you’re trying. I hear that in your voice.
Alonso Castañeda Andrade: [15:05] Yeah. So, making it available, and not just, as you were saying, for the different kinds of users, right. We have the normal user, sometimes known as citizen users. These are the business users that use the data for their operations, for decision making, and for business analysis and data analysis. And you have your data scientists that are working on maybe some exploratory analysis, on experimentation, or creating some data algorithm to find some insights. And you also have your applications, right, which also require data to be accessible.
Dr. Jerry Smith: [16:06] Oh, by the way, for those who don’t know, today we are live in the home of Alonso. And Alonso is in Mexico City. I should have brought this up before: you’re actually live from Mexico City. Mexico, just like all of us, is in the breadth and depth of the coronavirus. And, you know, it affects the children and the families down there just as it does up here. And today, Alonso has as his administrative aide his younger son. How old is your boy?
Alonso Castañeda Andrade: [16:36] He’s another three year old. So
Dr. Jerry Smith: [16:40] We have two administrative aides here today working with us. So if you hear music or if you hear banging in the background, or if you hear some little human beings doing little human being things, I hope you can put up with it because that’s just part of life today, I think, right. I mean, we’re seeing this all the time, Alonso. So, let’s get into the trends then. I mean, trends for 2022, right. We’re here in the first couple of weeks. One of the areas that you talked to me about before was this continued migration to the cloud. Can you talk to me a little bit about that?
Alonso Castañeda Andrade: [17:11] So cloud plays a very important role in the data modernization of the platform, because it allows businesses to deploy data products with much more speed. We’ve seen speed improvements of up to 50%.
Dr. Jerry Smith: [17:49] Wow. And that 50% velocity increase, what’s that due to? Is that because you don’t have to buy the equipment, you just, like, you want a server, boom, it’s up and running with all the stuff, or is there something else going on?
Alonso Castañeda Andrade: [17:59] Yeah. The elasticity of the cloud services, being able to scale quickly and also use the resources in an effective way. When you have a lot of things to process, you scale the system up; when you don’t need it anymore, you scale it down. And all of that flexibility, and also having access to all of the services that the cloud provides, the platforms that are already published and available to us, they make it much easier.
Dr. Jerry Smith: [18:48] And those services become really important. One of the services that we rely on heavily is those cognitive services, right. You know, having all that raw data sitting there is great. Right. I mean, we talked about that, but behavior is an important part, right? In this new world of AI, it’s all about changing behavior. You change the world, not just observe it. We say that quite a bit. So being able to pull behaviors out through services that are offered by the vendors allows you to keep that benefit without having to go through them yourself, right? Are there other services besides the cognitive stuff? Right. That’s the one that I’m most familiar with. Do you have any other services out there that are important?
Alonso Castañeda Andrade: [19:31] Yes. So one important one is the analytical databases that we have available today from the cloud providers. Azure has a very good product, Synapse, that allows great flexibility in exposing data, consuming it, transforming it. So it’s a great platform where you can do a lot of your data pipelines.
Dr. Jerry Smith: [20:08] And, is that a batch process? We do it once and over, or is that a continuous process that’s happening there?
Alonso Castañeda Andrade: [20:15] Uh, it’s a mixture. I mean, you have your batch processes and then you have your real time or near real time, which is another trend that we’re seeing. People want their data, or the business requires their data, as soon as possible, right. As soon as something happens. It’s no longer acceptable or useful to have data that is one day old for a lot of use cases. You require data that is up to date.
Dr. Jerry Smith: [20:57] Right. I mean, just think about the banking industry.
Dr. Jerry Smith: [21:03] I mean, five minutes in the banking industry with fraud is millions of dollars in losses by individuals, right? So you need it in real time in that area. So we have folks like Microsoft, AWS, Google, those sorts of folks that are working in that area. So we talked about a couple of trends. One is the continued movement to the cloud through those vendors. You also introduced us to the democratization of data, which I think is important in that area. You mentioned in the past things like data ops and stuff like that. You know, I’m familiar with ML ops, because we’re always trying to manage our machine learning operations. And even to some extent AI ops, which is applying AI to IT operations. What’s going on in the world of data ops? Is that a big thing today? Is it not a big thing today?
Alonso Castañeda Andrade: [21:53] Yeah. And this comes back to my previous comments about the challenges in most organizations, which is to provide the data that the business requires and the operation requires as soon as possible, with quality, in a way that is agile and evolving and provides value soon. Generally, traditional analytics or data projects have been waterfall in nature and also try to boil the ocean in the first project: they try to bring in all of your data, creating a massive data model where you define all your rules upfront, all your business, all your transformations and data integrations. And that takes a lot of time. And once you finish these...
Dr. Jerry Smith: [23:11] He’s excited about data and analytics. I can just tell from the background. He’s like, Hey dad, you forgot to introduce the fact that you’re inventing analytical ops as well. I think that’s what he was saying.
Alonso Castañeda Andrade: [23:23] Yeah. So, a lot of times these projects fail because of that. They are time consuming and expensive, and once they are ready, they may be obsolete at some point.
Dr. Jerry Smith: [23:41] I think that’s a good place to stop for today. You know, we’ve sort of reached that 25 minute mark. We’ve talked a bit about your background, about the analytics world. We talked a lot about data, right? The need for data, the branches of it. You led us through some understandings around the world of analytics, and also some of the trends in terms of continuing to move to the cloud, and democratization, and this sort of world of data ops, and your son introduced us to this unknown concept of analytical ops. If we end there today, is there any last thing that comes to mind that you would say, well, somewhere down the road, we should probably talk about this too? What are the last words you have?
Alonso Castañeda Andrade: [24:23] Yeah. I think the importance of data in today’s business is more evident every day. Data is an asset for organizations. So being able to take advantage of that and make it a competitive advantage, and being able to outperform your competitors and perform better, is key. So having these practices well established, and your processes, your data governance, your technology, and your architecture well defined and well established, is very important. And we can help organizations do that here.
Dr. Jerry Smith: [25:31] Well, data is an asset. That is something that I think, hopefully, people still recognize is important. I really appreciate you being here as a guest today on AI Live and Unbiased. And I look forward to talking with you and working with you as a colleague in the months to come.
Alonso Castañeda Andrade: [25:48] Thank you very much, Jerry.
Dr. Jerry Smith: [25:49] Well, that’s it for the show today. How’d we do? Please send me a note on what you found interesting and how it impacted you so that we can make our time together better. So with that, this is your host, Dr. Jerry Smith, your AI Uber driver, asking you to change the world, not just observe it. See you later.
Outro: [26:12] This has been the AI Live and Unbiased podcast brought to you by AgileThought. The views, opinions and information expressed in this podcast are solely those of the hosts and the guests, and do not necessarily represent those of AgileThought. Get the show notes and other help for this episode and other episodes at agilethought.com/podcast.