TheDevTeam

We are bespoke software developers with over 22 years' experience of making things work. We have a wide range of web development tools that we can rapidly bring to bear on any problem, and we are confident of delivering on our promises. In many cases we have worked with our clients for over 10 years, covering East Sussex and surrounding areas.
  • 1
    Web

    Web Development

    Web Development from The Dev Team The websites and web applications that we build are more akin to bespoke software applications than traditional websites. They incorporate complex business logic and processes from stock control and fulfilment to CRM functionality and data manipulation tools.

  • 2
    Software

    Software Development

    Software Development from The Dev Team Whether it's websites, web applications, PC based software, services or server based software we've got tonnes of development experience going back more than 20 years.

  • 3
    PM

    Project Management

    Project Management from The Dev Team Project Management is all about having good tools to collaborate and a team able to deliver.

  • 4
    Mapping

    Dynamic Mapping

    Dynamic Mapping from The Dev Team It's quick, easy and inexpensive (if you know how). One of the biggest advances over the last few years has been the ability for anybody to gather longitude and latitude information and then plot it onto maps. If you're not already doing it then contact us to find out how you can.

  • 5
    SEO

    Search Engine Optimization

    Search Engine Optimization from The Dev Team For most websites, being accessible to Google is probably the most important requirement. The only way that this can be done is to make it a carefully thought out and integrated part of your website. If anybody phones up offering to get you to the top of the search engine rankings for a few hundred pounds, don't believe them.

  • 6
    Data Visualisation

    Big Data & Data Visualisation

    Big Data & Data Visualisation from The Dev Team The amount of data that we, as humankind, are collecting is increasing exponentially. If you know what to do with the data, as we do, it can also be turned into a vital competitive advantage and business critical information that makes a real difference....

  • 7
    E-Commerce

    E-Commerce

    E-Commerce from The Dev Team Shopping carts, online shops and taking payments online is one of the more standard implementations in a website but the choice available to you is huge. We can help you implement your online shop whether it's a 3rd party plugin or built from scratch. We've done it all!

  • 8
    Collaboration

    Collaboration Tools

    Collaboration Tools from The Dev Team The ability to work on the same documents, build specifications together, build projects together, share ideas and collaborate is essential for modern business. Effective collaboration is at the core of every project that we work on and we use a variety of tools and skill-sets to facilitate this.

  • 1
    Construction

    Construction

    Construction from The Dev Team Denaploy has been deeply involved in many aspects of the construction industry since its inception and has worked with Main Contractors, Sub-contractors, Architects, Surveyors and CDMCs. We understand the pressures of the industry and its sometimes arcane vocabulary.

  • 2
    Property

    Property

    Property from The Dev Team There are over 22 million homes in the UK and we move once every six years on average throughout our lifetime. That's a lot of data and we deal with a good proportion of it.

  • 3
    NFP

    Not for Profit & Charities

    Not for Profit & Charities from The Dev Team We do a number of different projects for Charities and Not for Profit organisations. Most of the problems needing to be solved are still the same and it suits our skillset to have a more hands-on approach to the project management.

  • 4
    Publishing

    Publishing

    Publishing from The Dev Team We've been working with publishing companies for over 20 years now, most notably for one of our major clients Newsstand.co.uk. If you're after a fully integrated stock control and automatic ordering system or something as simple as data feed integration then look no further.

  • 5
    Blue Chip

    Blue Chip

    Blue Chip from The Dev Team We've worked with many blue chip companies over the last 20 years and continue to receive work on a regular basis, underlining the fact that we are competitively priced and possess skills that are hard to come by even for companies with thousands of employees.

  • 6
    International

    International

    International from The Dev Team The World is getting smaller by the day and the ability to work at a desk thousands of miles away is something we've been doing for years. Why let the logistics of getting from A to B get in the way of getting the software solutions you need when everything from meetings to deployment can be done remotely?

  • 7
    Miningware

    Eclipse Miningware

    Eclipse Miningware from The Dev Team Our Eclipse Miningware Solution is a cost-effective solution for any startup mining company.

  • 8
    Small Business

    Small Businesses

    Small Businesses from The Dev Team Even if you've got a small business then you can afford a big IT department because you'll be hiring our skills and services for a fraction of the time and therefore cost compared to hiring permanent staff.

  • 9
    Joint Ventures

    Joint Ventures

    Joint Ventures from The Dev Team One of the downfalls of working in IT is that everybody and his dog has an idea of how they're going to make a fortune with the Internet. They usually have no money to invest but they do know somebody in the industry who might listen. Sometimes they do listen....

The continuing evolution of the internet creates many business opportunities needing specialist IT skills.

That's where we fit in. We are a highly skilled collective who are more than a match for most blue chip companies' internal IT departments.

We're here to make IT and the Internet work for you.

Written on 10 December 2015

Tools to help clean big data

You'd be forgiven for thinking that in this day and age, extracting useful information from data provided by the supermarkets would be a simple task. The data provided by the "Big Four" supermarkets in the UK (Sainsbury's, Asda, Tesco and Waitrose) doesn't contain UPCs or EANs. There are countless spelling mistakes and variations on spellings, even down to the brand level. Different character sets are used, and supermarkets will even change the way they name products from week to week. To combat all of these things and more, we have developed a number of different tools to help make the process as easy as possible.
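As a rough illustration of the kind of clean-up involved, here is a minimal Python sketch that normalises a product description before any matching is attempted. It is not the tool described in this article: the function name `normalise_description` and the list of units are assumptions made for the example.

```python
import re
import unicodedata


def normalise_description(text: str) -> str:
    """Return a lower-case, accent-free description with a uniform pack-size format.

    Hypothetical helper for illustration only.
    """
    # Decompose accented characters and drop the combining marks ("é" becomes "e").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Collapse "6 x 330 ml" and "6x330 ml" into the single token "6x330ml".
    # The unit list is an assumption; extend it to suit the data.
    text = re.sub(r"(\d+)\s*x\s*(\d+)\s*(ml|l|g|kg)\b", r"\1x\2\3", text)
    # Replace any remaining punctuation with spaces, then squeeze whitespace.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalise_description("Lilt Zero (6X330 ML)"))
    print(normalise_description("Lilt z p/apl & grpfrt 6 x 330 ml pack"))
```

Normalising first means that differences in case, spacing and punctuation no longer count as differences between products, which leaves the genuinely ambiguous cases for a human to decide.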

A Brief Overview of the Product Matching tool

The human element is most important in identifying duplicate products and classifying them as the same. With the huge number of possible variations in spelling, only a human can come to the conclusion, with a good degree of certainty, that

Lilt zero 6x330ml

is the same as

Lilt z p/apl & grpfrt 6 x 330 ml pack

The computer's job is to make the human's job as simple as possible. The image below shows the other possible duplicates of Lilt Zero (6X330 ML) in the bottom list. If they are the same product then the user clicks "combine" or "make parent" to identify a single value to report on.

using computers to help humans is the only way to make sense of this data


When the user clicks on "Lilt Zero (6x330ml)", the application automatically searches the product table for possible matches and returns them based on the description. The results are weighted according to the position of the words and ordered accordingly. This single screen is also where the user can specify parent/child products, view existing matches and roll back changes, as well as change all metrics apart from the product description (because the same product description is likely to be sent from the supermarkets in the future). Keeping the layout simple and uncluttered is important for the human user and belies the complexity of the underlying functionality.
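One way to weight results by word position can be sketched as follows. This is only an assumption about how such a weighting might work, not the application's actual algorithm: earlier words in the selected description (typically the brand) count for more, and the function names are hypothetical.

```python
def position_weighted_score(query: str, candidate: str) -> float:
    """Score a candidate between 0 and 1, giving earlier query words more weight."""
    query_words = query.lower().split()
    candidate_words = set(candidate.lower().split())
    if not query_words:
        return 0.0
    # Word i of n gets weight n - i, so the first word counts most.
    n = len(query_words)
    weights = [n - i for i in range(n)]
    matched = sum(w for word, w in zip(query_words, weights) if word in candidate_words)
    return matched / sum(weights)


def rank_candidates(query: str, candidates: list[str]) -> list[str]:
    """Return candidates sharing at least one word with the query, best match first."""
    scored = [(position_weighted_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score > 0]


if __name__ == "__main__":
    products = [
        "lilt z p apl grpfrt 6x330ml pack",
        "fanta zero 6x330ml",
        "lilt zero 2l",
        "coke 2l",
    ]
    for name in rank_candidates("lilt zero 6x330ml", products):
        print(name)
```

A scheme like this only produces a shortlist. The decision to combine two products still rests with the user.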

One man's trash is another man's treasure

Most people out there groan when they have to think about making sense of the masses of data they collect; we revel in it. Maybe it's because we like to tut at other people's mistakes and roll our eyes at the flawed processes that generated the "mess" in the first place. Maybe it's because we like to see a picture emerge from that seemingly impossible jigsaw. Whatever it is, we've had one of the most difficult puzzles to date trying to make sense of data from the "Big Four".

One of the major stumbling blocks we experienced is that the data supplied by the supermarkets doesn't include UPCs/EANs (tut-tut). That means we need to match by product descriptions which, as you can imagine, vary not only between the supermarkets but from week to week. There are also lots of variations between products in the same range from the same supermarket and plenty of spelling mistakes to boot (not to mention other mistakes in quantity, size, packaging, flavour....). Most people would give in here and quote the term GIGO (garbage in, garbage out), which was first referred to in 1963 and relates to the fact that computers, because they work by logical processes, unquestioningly process unintended, even nonsensical, input data ("garbage in") and produce undesired, often nonsensical, output ("garbage out").

We're a little more in the 'unpolished jewel' camp, believing that most data isn't complete garbage but that human intervention is required to clean things up. You can't leave it entirely to the machines (even 150+ years after Charles Babbage's musings) but the machines can help make the process of cleaning it up a heck of a lot easier.

We've ended up developing tools to help make the process as simple as possible whilst providing the ability for users to undo their changes as well as track them, report on them and even create their own rules.

Fortnightly updates

Even when the products have been cleaned there is still the risk of importing more errant data when the new supermarket data comes through. Every two weeks the client imports supermarket data and goes through this same process:

  • Import the supermarket data using the tool (above).
  • Rules automatically run to fix generic errors and non-UTF-8 characters.
  • Check for unmatched products. Unmatched products are ordered by sales value by default so that the user can ascertain how important it is that they are fixed.
  • Using dashboards created by Moor Consulting, view possible exception data, combined data, changes and other areas that may require investigation.
  • And finally, the whole purpose of this exercise is to use dashboards for analysing sales of products across the entire industry.
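The rule and unmatched-product steps above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the `Row` type, the two sample rules, the Windows-1252 fallback and the function names are hypothetical and are not taken from the client application.

```python
from dataclasses import dataclass


@dataclass
class Row:
    description: str
    sales_value: float


# Hypothetical generic rules: (text to find, replacement).
RULES = [("\u00a0", " "), ("grpfrt", "grapefruit")]


def decode_line(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Windows-1252 for legacy exports (an assumption)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def apply_rules(description: str) -> str:
    """Apply every find/replace rule, then squeeze repeated whitespace."""
    for find, replacement in RULES:
        description = description.replace(find, replacement)
    return " ".join(description.split())


def unmatched_by_sales(rows: list[Row], known_descriptions: set[str]) -> list[Row]:
    """Clean each row, keep those with no known match, highest sales value first."""
    cleaned = [Row(apply_rules(r.description), r.sales_value) for r in rows]
    unmatched = [r for r in cleaned if r.description not in known_descriptions]
    return sorted(unmatched, key=lambda r: r.sales_value, reverse=True)


if __name__ == "__main__":
    rows = [Row("Lilt z grpfrt", 10.0), Row("Fanta  orange", 50.0), Row("Coke", 5.0)]
    for row in unmatched_by_sales(rows, {"Coke"}):
        print(row.sales_value, row.description)
```

Sorting by sales value puts the products that matter most to the reports at the top of the user's list.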

Data management

Of course, what computers do well is store the cleaned data and our tools help manage the data that has been imported. This includes rolling back / deleting unwanted data which may, for example, have been imported in error. Users need to be able to see the batches that have previously been imported and the software needs to minimise the risk of the same data being imported again.
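A minimal sketch of how imported batches might be tracked is given below. It assumes each batch arrives as a file whose SHA-256 fingerprint can identify it. The `BatchRegistry` class is hypothetical and keeps everything in memory, whereas a real system would persist this in its database.

```python
import hashlib


class BatchRegistry:
    """In-memory record of imported batches, keyed by a content fingerprint."""

    def __init__(self) -> None:
        self._batches: dict[str, tuple[str, list[str]]] = {}

    @staticmethod
    def fingerprint(raw: bytes) -> str:
        # Identical file contents always give the same fingerprint.
        return hashlib.sha256(raw).hexdigest()

    def import_batch(self, label: str, raw: bytes) -> str:
        """Store a batch and return its key; refuse content already imported."""
        key = self.fingerprint(raw)
        if key in self._batches:
            existing_label = self._batches[key][0]
            raise ValueError(f"batch already imported as {existing_label!r}")
        self._batches[key] = (label, raw.decode("utf-8").splitlines())
        return key

    def list_batches(self) -> list[str]:
        """Labels of all batches currently held, in import order."""
        return [label for label, _rows in self._batches.values()]

    def roll_back(self, key: str) -> None:
        """Remove a batch, for example one that was imported in error."""
        del self._batches[key]


if __name__ == "__main__":
    registry = BatchRegistry()
    key = registry.import_batch("week 1", b"Lilt zero 6x330ml,120\n")
    print(registry.list_batches())
    registry.roll_back(key)
    print(registry.list_batches())
```

Fingerprinting the content rather than the file name means a renamed copy of the same export is still caught.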

Updates and changes

Using RESTful web services we send rules and initiate updates to the client application without getting the user involved with installing anything. Why get humans involved when you don't need to?
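On the client side, handling a rule update might look something like the sketch below. The JSON payload shape (`version` and `rules` fields) is an assumption made for this example, and the HTTP request that would fetch the payload is left out.

```python
import json


def apply_rule_update(current: dict, payload: str) -> dict:
    """Return the rule set to use after receiving an update payload.

    `payload` is assumed to be JSON of the form
    {"version": 3, "rules": [["grpfrt", "grapefruit"]]}.
    Updates that are not newer than the current version are ignored.
    """
    update = json.loads(payload)
    if update["version"] <= current["version"]:
        return current
    return {"version": update["version"], "rules": update["rules"]}


if __name__ == "__main__":
    current = {"version": 2, "rules": [["p/apl", "pineapple"]]}
    payload = '{"version": 3, "rules": [["p/apl", "pineapple"], ["grpfrt", "grapefruit"]]}'
    current = apply_rule_update(current, payload)
    print(current["version"], len(current["rules"]))
```

Comparing version numbers keeps the update safe to repeat: receiving the same payload twice leaves the client unchanged.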



Related to Tools to help clean big data

Eclipse Mining software - Summary

A mine planning, tracking, QC and reporting package. Used at gold mines in Ghana, Cote d’Ivoire, Eritrea and soon many others.

AMO Scandinavian Online Ordering System

AMO (Advanced Medical Optics), the largest optics manufacturer in the World, asked us to create an online ordering system so that opticians in Scandinavia could order their products online.

AMO Online Sales Tool

Online sales tool for Advanced Medical Optics

AMO CRM

Customer Relationship Management tool for AMO (Advanced Medical Optics), the largest optics manufacturer in the World.

Bisha Mining

We further developed the Mining Planning Database for a small surface gold mine in Eritrea. Small in the mining World is a startup cost of around $300m and we like to think that we played a part in keeping those costs to a minimum.


Knowledge Base

EMail marketing done properly

We've worked hard over the last few months to create a tool which can be used for creating and managing your email marketing campaigns. This feature packed set of tools will help us help you make the most of your existing customers and maximise your revenue.

Advantages of Sub-contracting your Software development

Software development has become increasingly specialised, so only the largest companies can have all the skills necessary to be leading edge in all areas.

PowerBI digger production example using Eclipse

There are literally thousands of different types of reports you can extract from our mining production tracking system, Eclipse Miningware. This article is a brief demonstration of just one: a daily report for the diggers which worked in production "yesterday".

The Agile Manifesto

Agile software development is a group of software development methodologies that promote adaptive planning, evolutionary development and delivery, and a time-boxed iterative approach, and that encourage rapid and flexible response to change. We are moving towards the Agile Software development process and ISO 12207.

Address:
TheDevTeam
8 The Lawn,
St Leonards-On-Sea,
East Sussex, TN38 0HH