What is Robotic Process Automation?

   Fri, 19-Apr-2019   Intellyk


Robotic process automation (RPA) is currently receiving a lot of attention and rightly so. Organizations are driving considerable value by streamlining enterprise processes and reducing cost.

Forrester predicts that the RPA industry will be worth around US$2.9 billion by 2021. The research firm also estimates that by 2021, more than 4 million robots will come into existence and will perform administrative and office work as well as sales and related jobs.

Presently, the RPA market is dominated by North America, while the Asia-Pacific region is rapidly adopting the technology in areas such as healthcare, BFSI, IT, retail, and telecommunications.

John Cryan, CEO of Deutsche Bank, summed up the sweeping changes brought about by technologies like RPA: "In our bank we have people doing work like robots. Tomorrow we will have robots behaving like people. It doesn't matter if we as a bank will participate in these changes or not, it is going to happen."

What is robotic process automation (RPA)?

Robotic process automation (RPA) refers to the use of software driven by artificial intelligence (AI) and machine learning (ML) to automate high-volume, repetitive tasks of a business process.
Just like humans, RPA robots utilize the user interface to manipulate applications and capture data. They communicate with other systems and interpret messages and send responses to perform a wide variety of tasks.

However, RPA robots cost much less than an employee, never take sick days, never sleep, and make absolutely no errors. Thus, RPA offers a competitive edge to businesses that adopt it.

How is RPA different from traditional automation?

In principle, both RPA and traditional automation integrate software to automate business processes. However, traditional automation typically achieves this on the backend. Specifically, RPA and traditional automation differ in the following ways:

  • Low technical barrier: Traditional automation involves the use of APIs and thorough knowledge of the target system. RPA robots can work at the level of the graphical user interface (GUI) and application integration is not required.

  • Vendor/software limitations: In traditional automation, only certain systems can be integrated due to API limitations, and applications cannot be customized by users because the source code is not available. Automating legacy systems is a challenge because it requires thorough knowledge of the software. RPA does not have these limitations, as it works on the GUI.

  • Customization: Traditional automation does not lend itself to customization to the user's platforms in the manner in which RPA does.

  • Audience: Skilled software developers are required to deploy traditional automation applications, whereas RPA requires techno-functional SMEs to train robots.

Why should organizations consider RPA implementation?

RPA is the most cost-effective and efficient way to automate modern office tasks. Today, employees use a greater number of tools than they did in the past. It is simply not feasible any longer to automate all these tools and interactions using macros.

Implementation of RPA offers several benefits to organizations:

  • Highly scalable and flexible: RPA has the ability to perform a wide variety and volume of tasks across business units and geographies in parallel. Extra robots can be deployed to deal with fluctuations in volume at minimal cost.

  • Greater accuracy, improved compliance: RPA robots are trained to work according to the rules of a program. They work with greater precision and accuracy than humans. They never make mistakes and never get tired; thus, they are compliant and consistent.

  • Identify workflow inefficiencies: Business processes become streamlined after deployment of RPA. Businesses may also discover ways to reduce inefficiencies by simplifying existing workflows and eliminating needless processes.

  • Better security: RPA robots are immune to cyber attacks like social engineering and spear phishing, thus making the system safer than if humans were operating it.

  • Improved productivity: RPA takes over repetitive tasks from humans and allows them to focus on value-added tasks. This helps improve employee productivity significantly.

  • Cost savings: RPA has been found to reduce processing costs by up to 80%. Most enterprises receive a positive ROI in less than 12 months. According to McKinsey, RPA has the potential to offer a 30%-200% ROI in the first year of deployment.

  • Non-invasive: RPA robots do not require custom software or deep systems integration and can be easily trained. Thus, organizations using legacy infrastructure can also implement RPA. This also often makes RPA cheaper to implement than traditional automation since there is no need for infrastructure remodelling, offshore/onshore manual processing, or outsourcing.

How does RPA work?

While industrial robots transformed the factory floor, RPA robots are taking over the back office. RPA robots emulate human actions like opening files, copy-pasting fields, and inputting data in an automated manner. They interact with different systems through integration and screen scraping, allowing RPA tools to work like human employees.

An RPA bot is the fundamental unit of automation and it can be deployed from an employee's desktop or from the cloud. The two common types of bots available in the market are:

  • Programmable bots: Such bots are defined by set rules and parameters created by a programmer. However, training them for complex processes is time-consuming, as it requires step-by-step mapping of the process.

  • Intelligent bots: AI-powered bots and self-learning bots can analyse historical data and current data to "learn" how an employee performs a process. Once the bot has analysed enough data, it can perform the process on its own. Such bots are well-suited for processes that have unstructured data or fluctuating parameters.
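To make the first of these concrete, here is a purely illustrative sketch, not any vendor's actual API, of a programmable bot: an ordered list of steps, each copying and normalizing one field from a source record into a target system, the way a rules-based bot replays a mapped process.

```python
# Purely illustrative sketch: a "programmable bot" modeled as an ordered
# list of steps, each copying one field from a source record into a
# target system, the way a rules-based RPA bot replays a mapped process.
def run_bot(steps, source, target):
    for field, transform in steps:
        # Emulates the copy-paste and normalization a human would do in the UI.
        target[field] = transform(source[field])
    return target

# Hypothetical invoice-processing process map (not any vendor's format).
steps = [
    ("invoice_id", str.strip),
    ("amount", lambda v: round(float(v), 2)),
]
source = {"invoice_id": " INV-001 ", "amount": "149.99"}
print(run_bot(steps, source, {}))  # → {'invoice_id': 'INV-001', 'amount': 149.99}
```

The point of the step-by-step mapping is visible here: every field the bot touches has to be enumerated and given an explicit rule, which is exactly what makes complex processes slow to automate this way.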

RPA has evolved from three key technologies:

  • Screen scraping: It involves processing the HTML of a webpage and converting it into a format the RPA bot can recognize, helping it interact with the page.

  • Workflow automation: It involves software that eliminates manual data entry and increases the speed of order fulfilment, thus improving accuracy, efficiency, and customer satisfaction.

  • Artificial intelligence: It refers to the ability of computers to perform tasks that normally require human intelligence and intervention. With machine learning, computers can be trained to perform tasks independently using large quantities of historical and current data.
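As a concrete illustration of the first of these technologies, here is a hedged sketch of screen scraping using only Python's standard-library html.parser. A real RPA tool is far more robust, but the idea is the same: turn page markup into values a bot can act on. The sample markup and values are invented for illustration.

```python
from html.parser import HTMLParser

# Minimal screen-scraping sketch using only the standard library:
# it pulls the text of every <td> cell so a bot could act on the values.
class CellScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

scraper = CellScraper()
scraper.feed("<table><tr><td>INV-001</td><td>149.99</td></tr></table>")
print(scraper.cells)  # → ['INV-001', '149.99']
```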

Challenges to RPA:

Although RPA is poised to become a major business need, there are a few questions that enterprises must consider before implementing the technology:

  • Is RPA expensive?

  • RPA may have initial installation costs, but it is still cheaper to deploy than traditional automation solutions because it works alongside the existing infrastructure and does not require restructuring of systems.

  • Does RPA pose a data security risk?

  • There is always a potential for misuse of sensitive data that RPA robots handle on a day-to-day basis. Malicious programming by software developers to introduce malware is a risk. Enterprises can combat such dangers by introducing additional security measures like role-based access to confidential data and data encryption.

  • Will RPA cause loss of jobs?

  • Certain jobs will be replaced by RPA bots, like those of data entry specialists. However, the technology will also create new jobs such as those of RPA developers and RPA engineers. RPA will also enhance existing jobs by providing tools to help humans focus on value-added tasks.

    Some Top RPA vendors

    Some of the leading vendors offering RPA tools customized for enterprise-level companies are:

    • Automation Anywhere Inc. offers an enterprise digital workforce platform that enables quote-to-cash, procure-to-pay, claims processing, HR, and other back-office processes.

    • Blue Prism offers organizations desktop-aligned robots that are defined and managed centrally.

    • UiPath provides an open platform to help organizations efficiently automate business processes. It is the most widely used RPA platform in the world currently.

    • Pega Systems offers an end-to-end automation solution that combines business process management, artificial intelligence, and robotic desktop automation.

    • EdgeVerve Ltd., an Infosys company, helps organizations improve business processes, modernize customer service, and enhance operational productivity.

    • Workfusion automates enterprise business processes by bringing together workforce orchestration, robotics, and AI-powered cognitive automation.


    Although automation software is estimated to eliminate 140 million jobs by 2025, it is also set to create high-quality jobs for enterprises which maintain and improve RPA software. RPA technology will require CTOs or CIOs to take accountability for business outcomes and the risks of deploying RPA tools. Advances in RPA will continue to help businesses reduce their overhead costs and maximize their ROI, thus justifying its relevance and use.

    Using data science for customer retention

       Thu, 02-Feb-2017   Intellyk

    Customer retention is a key priority for any business. Multiple factors drive customer churn, and understanding these factors enables proactive churn management. A combination of data processing and statistics can help in understanding the possible reasons and in identifying customers at risk.

    For this example, we have used the telecom data set from the IBM community website. Customers go through a complex decision-making process before subscribing to any one of the numerous telecom services.

    The goal of the predictive model in this blog is to identify the set of customers who have a high probability of unsubscribing from the service. For this model, we are using personal details, demographic information, pricing, and plan information. We will also identify the set of independent variables related to customers unsubscribing from the service.


    • The dataset has 7,043 rows with 21 features.
    • Independent variables considered for this exercise:
      • Customer demographics (age, gender, marital status, location, etc.)
      • Billing information (monthly and yearly payment)
      • Voice and data services (phone service, multiple lines, internet service, online security, device protection, tech support)
      • Contract type
      • Bill payment mode
    • Response/dependent variable considered for the model:
      • Value '1' indicates UNSUBSCRIBED customers
      • Value '0' indicates ACTIVE customers
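The sketch below shows, in plain Python, how such variables might be encoded into the 0/1 form the model expects. The column names and category values here are illustrative, not the actual schema of the IBM dataset.

```python
# Hypothetical sketch of feature encoding; column names and category
# values are illustrative, not the actual schema of the IBM dataset.
def encode_row(row):
    return {
        "churn": 1 if row["churn"] == "Yes" else 0,          # 1 = UNSUBSCRIBED
        "gender": 1 if row["gender"] == "Female" else 0,
        "monthly_charges": float(row["monthly_charges"]),
    }

raw = {"churn": "Yes", "gender": "Female", "monthly_charges": "70.35"}
print(encode_row(raw))  # → {'churn': 1, 'gender': 1, 'monthly_charges': 70.35}
```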


    Source code for this exercise is available at


    Logistic Regression:

    For this exercise, we are using the logistic regression algorithm. Logistic regression is useful for establishing a relationship between a binary outcome and a group of continuous and/or categorical predictor variables. It can also indicate the proportion of variance in the dependent variable explained by the independent variables.
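Under the hood, logistic regression passes a weighted sum of the predictors through the sigmoid (logistic) function to produce a churn probability. The weights below are made-up illustrative values, not fitted coefficients from this exercise.

```python
import math

# The core of logistic regression: a weighted sum of predictors squashed
# through the sigmoid into a probability between 0 and 1.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Made-up illustrative weights, not fitted coefficients from this exercise.
p = predict_proba(weights=[0.8, -1.2], bias=-0.5, features=[1.0, 0.3])
print(round(p, 3))  # → 0.485, the modeled probability of unsubscribing
```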

    Fitting the model to the training data:

    lr = LogisticRegression()
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    print("explained variance is {}".format(explained_variance_score(y_test, y_pred)))

    ROC: The ROC curve evaluates the trade-off between the true positive rate (sensitivity) and the false positive rate (1 - specificity). The higher the area under the curve (AUC), the better the predictive power of the model.
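The quantities behind each point on an ROC curve can be sketched in a few lines of plain Python; the labels and scores below are illustrative, not the model's actual output.

```python
# The quantities behind one point on an ROC curve: true positive rate and
# false positive rate at a single score threshold (illustrative data).
def tpr_fpr(y_true, scores, threshold):
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)  # (sensitivity, 1 - specificity)

y_true = [1, 1, 0, 0, 1, 0]              # 1 = unsubscribed, 0 = active
scores = [0.9, 0.7, 0.6, 0.3, 0.2, 0.1]  # hypothetical model scores
tpr, fpr = tpr_fpr(y_true, scores, 0.5)
print(round(tpr, 2), round(fpr, 2))  # → 0.67 0.33
```

Sweeping the threshold from 1 down to 0 and plotting these pairs traces the ROC curve; the AUC summarizes it in one number.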

    mean_squared_error is 0.345635202271
    mean_absolute_error is 0.345635202271
    explained_variance_score is -0.728474919601
    r2 score is -0.791188969636
    jaccard similarity is 0.654364797729


    Training: 70% of the data is used for training a model

    Testing: 30% of data is used to test the model.
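The 70/30 split can be sketched without scikit-learn so the mechanics are visible; in practice, train_test_split from sklearn.model_selection would typically be used.

```python
import random

# A minimal 70/30 split sketched without scikit-learn so the mechanics are
# visible; in practice sklearn's train_test_split would typically be used.
def split(rows, train_frac=0.7, seed=42):
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = split(list(range(7043)))  # same row count as the dataset
print(len(train), len(test))  # → 4930 2113
```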


    It is common to try multiple models. We chose this model based on prediction accuracy and the impact of Type I error.

                 precision    recall    f1-score    support
    0                 0.84      0.90        0.87       1061
    1                 0.61      0.47        0.53        348
    avg/total         0.78      0.79        0.79       1409
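Each row of the report above comes from the same three counts. A small sketch with illustrative numbers, not the exercise's actual confusion matrix:

```python
# How each row of the classification report is derived: precision, recall,
# and F1 from true-positive / false-positive / false-negative counts.
def prf(tp, fp, fn):
    precision = tp / (tp + fp)  # of predicted churners, how many really churned
    recall = tp / (tp + fn)     # of actual churners, how many we caught
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts, not the exercise's actual confusion matrix.
print(tuple(round(v, 2) for v in prf(60, 40, 60)))  # → (0.6, 0.5, 0.55)
```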



    Why should I consider a big data project?

       Wed, 08-Feb-2017   Intellyk

    In this blog post I will try to explain big data implementation projects: the why and how of big data, what drives a big data project, and, most importantly, how business intelligence has changed over the years. One comment I often hear is that "this is the business intelligence I have been doing, my data warehouse already handles huge amounts of data, and big data is just old wine in a new bottle." There has always been hype around big data and its importance. We need to understand why we need it, what value it provides, and how we can best adapt to the changes.

    Having worked in the business intelligence space for a considerable number of years, I have seen many changes along the way. To understand business intelligence and data mining projects better, I would like to divide them into three components: first, understanding the data; second, the questions we need answered; and third, how we use technology to answer these questions adequately and insightfully.

    Business intelligence, its purpose and structure per se, has not changed, but the above-mentioned components most definitely have. Here's how:


    Data has changed dramatically over the years. Let's take a look at the 4 V's of data: volume, variety, velocity, and veracity.

    Volume: Big data implies enormous volumes of data. Previously, data was created mainly by employees. Today, data is generated by machines, networks, and human interactions across media such as social media, making the volume of data to be analyzed massive. Data is growing faster than ever before; by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.

    Variety: Variety refers to the many sources and types of data both structured and unstructured. We used to store data from sources like spreadsheets and databases. Now data comes in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc.

    In 2015, over 1.4 billion smartphones were shipped, all packed with sensors capable of collecting all kinds of data, not to mention the data the users create themselves.

    Velocity: Velocity refers to the speed at which data flows in from various sources such as business processes, machines, networks and human interaction with things like social media sites, mobile devices, etc. The flow of data is massive and continuous. This real-time data can help researchers and businesses make valuable decisions that provide strategic competitive advantages and ROI if we are able to handle the sheer velocity of data.

    Facebook users send on average 31.25 million messages and view 2.77 million videos every minute. There is massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.

    Veracity: Veracity refers to the biases, accuracy, and abnormality in data. It is very important to know whether the data being stored and mined is meaningful to the problem being analyzed. In scoping out big data strategies, it is imperative to keep your data clean and to have processes that keep 'dirty data' from accumulating in your systems.


    Now that we have an abundance of changing data, we are able to give business decision makers more insightful answers. With the combination of structured and unstructured data, we are able to answer these questions far more meaningfully than we have in the past. Previously, we would spend a lot of time and money finding these answers; today BI is far less time-consuming and can be delivered at lower cost, driving ROI up significantly. The combination of new data sources, data mining, predictive analytics, and today's complex machine learning algorithms is more effective, and predictive business intelligence has become more mainstream.

    Machine learning brings value to all the data that enterprises have been saving for years by churning through high volumes of data, helping gain deeper insights and improve decision-making. The beauty of it all is that these algorithms keep getting better over time on their own.


    In the early years, companies like Google architected solutions that use huge amounts of unstructured data to get actionable, meaningful insights for users. Here are two research papers on how it was done: and .

    Inspired by this, the open source community has built an ecosystem of Hadoop technologies. These technologies have reached a state of maturity that companies of all sizes can use to build their data/BI solutions. They can handle the type of data available to organizations today, can scale economically, and are well suited to building next-generation data warehouses for predictive and prescriptive analytics. Storage and processing at petabyte scale is now economical enough that companies of any size can create data solutions.

    Traditional business intelligence and data warehousing were not designed to handle or make use of the type of data we have now, whereas newer technologies are designed from the ground up for these scenarios. It is therefore important to assess the impact on data warehouse and ETL process infrastructure; it may require integrating a Hadoop distribution and creating an enterprise data lake/data hub for a variety of reasons, bringing architecture changes, infrastructure changes, and so on.


    The value that organizations can get from the above changes in data and technology should drive new big data projects. There are multiple ways organizations can derive value from big data projects: for example, reducing the cost of the data warehouse, gaining new insights from new types of data sources to improve operations, better customer targeting and customer experience, and creating new lines of business using data.

    However, it is advisable to create a vision for the big data project before starting, and that vision should include the value the project will drive for the organization.

    Many companies have invested in data mining, but the focus was always on structured data. Now we can apply the same techniques to valuable unstructured data, and more descriptive and prescriptive analytics can be done. This is possible because of the data as well as technology advancements in machine learning and modelling algorithms. This can be further integrated with agent-based technologies like Cortana, Siri, etc.

    Data Sources:

    Why Data Lake?

       Wed, 08-Feb-2017   Intellyk

    Data Lake

    In recent years, the data lake has come to look like a much-needed concept in the enterprise. But what does "data lake" mean? How does it benefit organizations? What are the advantages of incorporating a data lake into the architecture? Are organizations adopting the data lake model?

    What is a Data Lake?

    First, let us define what a data lake actually means.

    "A data lake is a storage repository that contains massive amount of raw data in native formats, to enable users to easily access to large volumes of structured (rows and columns), semi-structured (CSV, logs, XML, JSON), and unstructured (emails, documents, PDFs) data as needed."

    The main idea of a data lake is to have a centralized store of all enterprise data, from raw (primary) data onward, which can be transformed when required for various tasks, including reporting, visualization, analytics, and machine learning.
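A toy sketch of that idea in Python: land each file in its raw, native format under one root and record it in a minimal catalog. The directory layout and catalog format here are illustrative assumptions, not any product's actual design.

```python
import json
import tempfile
from pathlib import Path

# Toy sketch of the data-lake idea: land every file in its raw, native
# format under one root and record it in a small catalog for later use.
# The layout and catalog format are illustrative assumptions.
def ingest(lake_root, name, payload):
    path = Path(lake_root) / "raw" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(payload)                      # keep the raw content as-is
    catalog = Path(lake_root) / "catalog.json"
    entries = json.loads(catalog.read_text()) if catalog.exists() else []
    entries.append({"name": name, "size": len(payload)})
    catalog.write_text(json.dumps(entries))
    return path

lake = tempfile.mkdtemp()
ingest(lake, "clicks.csv", "user,ts\n1,100\n")   # structured
ingest(lake, "email-42.txt", "Subject: hello")   # unstructured
print([e["name"] for e in json.loads((Path(lake) / "catalog.json").read_text())])
# → ['clicks.csv', 'email-42.txt']
```

Note the design choice the sketch mirrors: nothing is transformed at ingest time; schema and structure are applied later, when the data is actually used.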

    Next-Generation Data: Modernizing the Enterprise Hub

    Data lake architecture has become a scalable data storage approach for many businesses, and it plays a crucial role in the future of businesses that need to be flexible about the types of data they store and the ways they process and query it.

    Understanding the Data Lake Architecture with Hadoop

    The data lake is a single repository capable of storing huge volumes of data in various formats. Big organizations like Facebook, Google, Yahoo and other web scale companies have gained numerous benefits and advanced to the next level by using data lakes.

    A successful Hadoop journey typically begins with new analytic applications, which lead to a data lake. As new applications are created that derive value from new formats of data (web server logs, databases, social media, clickstreams, and other sources), the data lake takes shape, with Hadoop acting as a shared service that scales efficiently across a diverse value chain from which new types of business value emerge. In addition, curation takes place through capturing, mixing, and exploring new types of data, making it available in the data catalog.

    Key Advantages of a Data Lake

    A data lake ends data silos by centralizing data and providing flexible access to all the diverse data sources within your business.

    • Low Cost and Extremely Scalable Processing: A data lake is low-cost and scales to extremely high volumes of data processing with high efficiency.

    • Compatibility with Multiple Platforms: The raw data stored in the data lake can be worked with effectively in multiple programming languages, such as Java or Python, and with framework technologies such as Pig and Hive.

    • Data Accessibility: A data lake can contain any data, whether structured, semi-structured, or unstructured, in a single centralized location. Since everything is in one location, users can be granted immediate access to all data.

    • Data Does Not Have to Be Moved: Since all the data is stored in one central location, silos are no longer necessary, which gives easy accessibility. Additionally, it is not necessary to move data from one warehouse to another.

    • Insights from the Data Lake: Organizations can store data in its raw format in the data lake. This means the information is kept in a secure location without fear of data loss.

      The data lake can be an effective data management solution for advanced analytics and user-facing applications. However, some security challenges arise when everything is stored in one particular location. The objective of building a data lake is to derive value; done correctly, with the data in a single repository, it is easily accessible, and the raw data can be analyzed with existing data analytics tools, yielding significant new insights.

      A data lake and an enterprise data warehouse together provide a synergy of capabilities that allows users to do more with data and drives business results faster.
