Showing posts with label Big Data. Show all posts

Friday, 27 December 2019

UK Firms: Why Not Simply Process EEA Residents' Personal Data In the EEA?

It's time for UK businesses to get creative in dealing with Brexit and all its uncertainties. As I've explained here, the processing of personal data relating to EEA residents is a particular problem. The UK will join the queue of countries - currently 13th in line - waiting for the European Commission to declare its personal data regime 'adequate' so that data can continue to be transferred as of right (as happens now).

So, rather than bring personal data into the UK from the EEA, you could - as many already have - simply incorporate an entity within the EEA to hold the data and determine the means and purposes of processing there. That EEA entity could do the processing itself within the EEA or outsource that to an EEA-based processor with the right experience and expertise. Ireland, for example, is the top AI hub in the EU and it can be a simple matter to transfer existing English law contracts to a new entity there, particularly as Irish law is so similar.  

Only the aggregated results would need to come in to the UK.


Sunday, 16 June 2019

Of Caution And Realistic Expectations: AI, ANN, BDA, ML, DL, UBI, PAYD, PHYD, PAYL...

A recent report into the use of data and data analysis by the insurance industry provides some excellent insights into the pros and cons of using artificial intelligence (AI) and machine learning (ML) - or Big Data Analytics (BDA). The overall message is to proceed with caution and realistic expectations...

The report starts by contrasting in detail the old and new types of data being used by the motor and health segments in the European insurance industry: 
  • Existing data sources include medical files, demographics, population data, information about the item/person insured ('exposure data') and loss data; behavioural data, frequency of hazards occurring and so on;
  • New data sources include data from vehicles and other machines or devices like phones, clothing and other 'wearables' (Internet of things); social media services; call centres; location co-ordinates; genetics; and payment data.
Then the report explains the analytical tools being used, since "AI" is a term used to refer to many things (including some not mentioned in the report, like automation, robotics and autonomous vehicles). Here, we're talking algorithms, ML, artificial neural networks (ANN) and deep learning networks (DLN) - the last two being the main focus of the report.

The difference between your garden variety ANN and DLN is the number of "hidden" layers of processing that the inputs undergo before the results pop out the other end. In a traditional computing scenario you can more readily discover that the wrong result was caused by bad data ("shit in, shit out", as the saying goes) but this may be impracticable with a single hidden layer of computing in an ANN, let alone in a DLN with its multiple hidden layers and greater "challenges in terms of accuracy, transparency, explainability and auditability of the models... which are often correlational and not causative...".
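To make the "hidden layers" point concrete, here is a purely illustrative sketch (my own, not from the report): the same inputs pass through one weighted layer or many, and nothing in the final output tells you which internal weights drove the result - the opacity grows with every layer you add.

```python
import random

def forward(inputs, layers):
    """Pass inputs through successive weighted layers (no training; illustration only)."""
    values = inputs
    for weights in layers:  # each element of 'layers' is one "hidden" layer
        values = [sum(w * v for w, v in zip(row, values)) for row in weights]
    return values

random.seed(42)

def layer(n_out, n_in):
    """Build one layer of random weights (n_out neurons, each reading n_in values)."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

# A shallow ANN (one hidden layer) vs a "deep" network (several hidden layers):
shallow = [layer(3, 2), layer(1, 3)]
deep = [layer(3, 2)] + [layer(3, 3) for _ in range(5)] + [layer(1, 3)]

# Both map the same two inputs to a single output; neither output explains itself.
print(forward([1.0, 0.5], shallow), forward([1.0, 0.5], deep))
```

Auditing why the deep network produced its number means unpicking every intermediate layer - which is precisely the "explainability" challenge the report describes.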

Of course, this criticism could be levelled at the human decision-making process in any major financial institution, but let's not go there...

In addition, "fair use" of algorithms relies on data that has no inherent bias. Everyone knows the story about the Amazon recruitment tool that had to be shut down because they couldn't figure out how to kill its bias against women. The challenge (I'm told) is to reintroduce randomness to data sets. Also:
As data scientists find themselves working with larger and larger data sets and working harder and harder to find results that are just slightly better than random, they will also have to spend significantly more time and effort in accurately determining what exactly constitutes true randomness in the first place.
Alarmingly, the insurers are mainly using BDA tools for pricing and underwriting, claims handling, sales and distribution - so you'd think it pretty important that their processes are accurate, transparent, explainable and auditable; and that they understand what results are merely correlated as opposed to causative...

There's also a desire to use data science throughout the insurance value chain, particularly on product development using much more granular data about each potential customer (see data sources above). The Holy Grail is usage-based insurance (UBI), which could soon represent about 10% of gross premiums: 
  • pay-as-you-drive (PAYD): premium based on kms driven;
  • pay-how-you-drive (PHYD): premium based on driving behaviour; and
  • pay-as-you-live (PAYL): premium based on lifestyle, tracking.
This can enable "micro-segmentation" - many small risk pools with more accurate risk assessments and relevant 'rating factors' for each pool - so pricing is more risk-based with less cross-subsidy from consumers who are less likely to make claims. A majority of motor insurers think the number of risk pools will increase by up to 25%, while few health insurers see that happening. 
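The usage-based pricing models listed above all reduce to the same shape: a base charge plus a usage component, scaled by a behavioural risk factor. Here is a hypothetical sketch of a pay-as-you-drive calculation (all figures and parameter names are mine, invented for illustration - real insurers' rating models are far more elaborate):

```python
def payd_premium(base_rate, km_driven, rate_per_km, risk_multiplier=1.0):
    """Toy pay-as-you-drive premium: a base charge plus a distance-based
    component, scaled by a behaviour/risk factor. All figures hypothetical."""
    return (base_rate + km_driven * rate_per_km) * risk_multiplier

# A low-mileage driver vs a high-mileage driver whose telematics data
# (harsh braking, say) attracts a higher risk multiplier:
print(payd_premium(100.0, 5_000, 0.02))                         # 200.0
print(payd_premium(100.0, 20_000, 0.02, risk_multiplier=1.3))   # 650.0
```

The pay-how-you-drive and pay-as-you-live variants simply move more of the weight onto the behavioural multiplier - which is exactly where the micro-segmentation (and the fairness questions below) come in.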

Of course, micro-segmentation could also identify customers to whom insurers decide not to offer insurance at all (though many countries have rules requiring inclusion, or public schemes for motorists who can't otherwise get insurance, like Spain, Netherlands, Luxembourg, Belgium, Romania and Austria). Some insurers say it's just a matter of price - e.g. using telematics to allow young high risk drivers to literally 'drive down' their premiums by showing they are sensible behind the wheel. 

An increase in the number of 'rating factors' is likely to be more prevalent in the motor insurance segment, where 80% (vs 67%) are said to have a direct causal link to premium (currently driver/vehicle details, or age in health insurance), rather than indirect (such as location or affluence).

Tailoring prices ('price optimisation') has also been banned or restricted on the basis that it can be unfair - indeed the FCA has explained the factors it considers when deciding whether or not price discrimination is unfair.

Apparently 2% of firms apply BDA to the sales process, resulting in "robo-advice" (advice to customers with little or no human intervention). BDA is also used for "chatbots" that help customers through initial inquiries; to forecast volumes and design loyalty programmes to retain customers; to prevent fraud; to assist with post-sales assistance and complaints handling; and even to try to "introduce some demand analytics models to predict consumer behaviour into the claims settlement offer."

Key issues include how to determine when a chatbot becomes a robo-adviser; and the fact that some data is normally distributed (data about human physiology) while other data is not (human behaviour).

All of which raises the question: how do you govern the use of BDA?

Naturally, firms who responded to the report claim they have no data accuracy issues and have robust governance processes in place. They say they don't use discriminatory variables and that outputs are unbiased. Some firms say third party data is less reliable and only use it for marketing, while others outsource BDA altogether. But none of this was verified for the report, let alone whether or not outputs of ANN or DLN were 'correct' or 'accurate'.

Some firms claim they 'smoothed' the output of ML with human intervention or caps to prevent unethical outcomes.

Others were concerned that it may not be possible to meet the privacy law (GDPR) requirements to explain the means of processing or the output where ANN or DLN is used.

All of the concerns lead some expert legal commentators to suggest that ANN and DLN are more likely to be used to automate decision-making where "the level of accuracy only needs to be "tolerable" for commercial parties [who are] interested only in the financial consequences... than for individuals concerned with issues touching on fundamental rights." And there remain vast challenges in how to resolve disputes arising from the use of BDA, whether in the courts or at the Financial Ombudsman.

None of this is to say, "Stop!" But it is important to proceed with caution and for its users to be realistic in their expectations of what BDA can achieve...


Wednesday, 15 February 2017

#PSD2: What Is An Account Information Service?

The Treasury is consulting on its proposed regulations to implement the new Payment Services Directive (PSD2) in the UK.  The consultation ends on 16 March 2017 and the regulations must take effect on 13 January 2018. The FCA will consult on the guidance related to its supervisory role in Q2 2017. Time is tight and there are still plenty of unanswered questions, which I've been covering in a series of posts. In this one, I'm exploring the issues related to the new "account information service", which is being interpreted very broadly indeed by the FCA.  Firms providing such services will need to register with the FCA, rather than become fully authorised (unless they provide other payment services); and they are spared from compliance with a number of provisions that apply to other types of payment service provider. But now is the time for assessing whether a service qualifies, and whether to restructure or become registered.

The Treasury has, naturally, copied the definition from the directive:
‘account information service’ means an online service to provide consolidated information on one or more payment accounts held by the payment service user with either another payment service provider or with more than one payment service provider (article 4(16)) - [my emphasis] - but has added:
"and includes such a service whether information is provided—
(a) in its original form or after processing;
(b) only to the payment service user or to the payment service user and to another person in accordance with the payment service user’s instructions" [which do not appear in PSD2]
This reflects the government's broad interpretation of the directive (para 6.27 of the consultation paper) - consistent with the UK needlessly creating a rod for its own back and particularly ironic in the light of Brexit. The account information service provider (AISP) should be granted access by the account service provider to the same data on the payment account as the user of that account (para 6.25). A firm will be considered an AISP even if it only "uses" some and not all of that account information to provide "an information service" (para 6.28).

Services that the government believes are AISs include (but are not limited to):
  • dashboard services that show aggregated information across a number of payment accounts; 
  • price comparison and product identification services;
  • income and expenditure analysis, including affordability and credit rating or credit worthiness assessments; and 
  • expenditure analysis that alerts users to consequences of particular actions, such as breaching their overdraft limit.
The services could be either standardised or bespoke, so might include accountancy or legal services, for example (para 6.30).

Some key points to consider:
  • does it matter to whom the account information service is provided? The additional wording seems to suggest that the 'payment service user' must be at least one recipient of the information, but does that mean the payment service user of the payment account or the person using the account information service?  This would seem to cover every firm that prepares and files tax or VAT returns, for example, since these are usually provided to both the client and HMRC.
  • the service has to be "online", but what if some of it is not?
  • little seems to turn on the word "consolidated", since the Treasury says a firm only needs to use some of the information from the payment account to be offering an AIS, and it could be from only one payment account. For instance, what if a service provides a simple 'yes' or 'no' to a balance inquiry or request to say whether adequate funds are available in an account, and that 'information' or conclusion/knowledge is not drawn from the payment account itself, but merely based on comparing the balance with the amount in the customer's inquiry or proposed transaction?
  • the payment account that the information relates to must be 'held by the payment service user' with one or more PSPs, so presumably this would not include an online data account or electronic statement that shows the amount of funds held for and on behalf of a client in a trust account or other form of safeguarded or segregated account which is in the name of, say, a law firm or crowdfunding platform operator (albeit designated and acknowledged as holding 'client money' or 'customer funds');
  • it seems impossible for the relevant data to be provided in its 'original form', since data has to be processed in some way to be 'provided' online, but this could cover providers of personal data stores or cloud services that simply hold a copy of your bank data for later access;
  • what is meant by 'after processing':
  1. it may not be clear that a firm is providing information 'on a payment account', as opposed to the same information from another type of account;
  2. does this mean each data processor in a series of processors is providing an AIS to its customer(s) - which brings us back to whether it matters who the customer is - or does interim processing 'break the chain' so that the next processor can say that the information was not 'on a payment account' but came from some other service provider's database (whether or not it was an AIS), such as a credit reference agency?
  3. what about accounting/tax software providers who calculate your income and expenditure by reference to payment account information but may not necessarily display or 'provide' the underlying data - although presumably the figures for bank account interest income (if any) in a tax return might qualify?
Sorry, more questions than answers at this stage!
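The 'yes' or 'no' balance inquiry scenario above could be as thin as this hypothetical sketch (names and logic are mine, purely for illustration). The responder never discloses the balance itself - which is part of why it's debatable whether 'information on a payment account' is really being 'provided':

```python
def sufficient_funds(balance: float, requested_amount: float) -> bool:
    """Return only a yes/no answer; the underlying balance is never exposed.
    The output is a conclusion drawn by comparison, not account data as such."""
    return balance >= requested_amount

print(sufficient_funds(250.00, 100.00))  # True
print(sufficient_funds(250.00, 300.00))  # False
```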

Update on 21 April 2017:

The FCA has indicated in Question 25A of its proposed draft changes to the Perimeter Guidance that:
"Account information service providers include businesses that provide users with an electronic “dashboard” where they can view information from various payment accounts in a single place, businesses that use account data to provide users with personalised comparison services, and businesses that, on a user’s instruction, provide information from the user’s various payment accounts to both the user and third party service providers such as financial advisors or credit reference agencies." [my emphasis added]

Thursday, 15 October 2015

Keeping Humans At The Heart Of Technology: Conference Wrap

This is a long overdue summary of my closing remarks at the SCL Technology Law Futures Conference on whether humans can survive the advent of super-intelligent machines. The podcasts for each session are available on the SCL site.

I am confident that we can keep humans at the heart of technology during the current era of artificial narrow intelligence.  It seems we are a long way into the process of coping with computers being better than us at certain things in some contexts. The sense was that the dawn of artificial general intelligence, where computers can do anything a human can, is 20-40 years away. It's also possible, of course, that the machines may never completely exceed human capabilities - more a matter of faith, in any event, as it would only be us who judged that to be the case. 

There are clear signs that humans are using computers to enhance the human experience, rather than replace it. E-commerce marketplaces for everything from secondhand goods, to lending and borrowing, to outsourcing household tasks and spare rooms show that humans are working together directly to remove intermediaries by relying on facilitators who add significant value to that human-to-human experience. 

This underscores the fact that computers' lack of 'common sense' will severely limit their ability to replace us – not just rationally speaking but also in terms of a shared understanding of our own five senses, and how we co-operate and use that shared understanding with each other in subtle yet important and uniquely human ways, for example, simply to summon the smell of freshly cut grass. 

Misuse of machines by humans - to constrain choice, for example - will also hold back development or lead humans to develop alternatives. We have worked around technology-based monopolies in various industries, such as music, but we also heard how the few major mobile 'app stores' are not only becoming the preferred distribution platforms for software, but also choke points to throttle competition. Such attempts at control will prove futile if those platforms do not give us what we want or are not aligned with how we behave or fail to reinforce the shared sense of community that is a feature of, say, peer-to-peer marketplaces and the new distributed ledgers.

The point was also made that we should recognise the value in our freedom to make mistakes or to simply forget or fail to do something – indeed the fact that someone else has forgotten or failed presents an opportunity for someone else. Perhaps this is the key driver of competition and innovation in the first place. [So, would machines evolve to be so efficient that change would no longer be necessary? Superintelligence could be a dull experience!]

Yet it is human fallibility, not that of machines, which is behind most online fraud. Turns out that it's simpler and cheaper to hack the human operating system with confidence tricks than it is to cut through the security systems themselves. Ironically, in this context, it seems there’s more a role for machines to help us avoid being fooled by other humans into giving out sensitive information, rather than to evolve ever more sophisticated encryption, for example.

A key issue is that the evolution of machine ability and interoperability is adding vast complexity to the rules and contracts that govern their use. Layers and layers of rules, terms and conditions must knit together to ensure effective governance of even the humble home entertainment network. Of course, the earlier the lawyers, legislators and regulators are involved in this, the easier it is for governance infrastructure to keep up.  That point is often made by lawyers, but it was also very heartening to hear the direct invitation for more lawyers to be involved directly with engineers in the step-by-step development of driverless cars, so they are aligned with how we humans want them to work on our roads. 

Yet the speed of technological development versus the speed at which the law moves makes it unlikely that the law and rules alone will be effective in directly controlling the development of machines, whereas incentives such as commission, fees and fines will likely prove more useful in nudging behaviour in the right direction and keeping interests aligned. How the economic models evolve is therefore critical - and a good area for less direct legal control of machines, particularly through the apportionment of liability and the regulation of markets and competition.

Economically speaking, however, it was pointed out that we are prone to overstating the impact technology has had in the past, and overestimating its likely effect in the future. In terms of GDP growth, for example, it turns out there was no industrial 'revolution' but merely a steady increase in output in parallel with various technological improvements. Tech booms and busts are also evidence of this.

We also tend to get hung up on globalisation and the need for harmonious rules across regions, yet much of the benefit of the internet, for example, has actually occurred at local level, and most of us use our phones and email to stay in touch with local people. 

Against this background, the conference keynote speech provided an entertaining overview of artificial intelligence and the community behind it, finishing nicely with a list of the top priorities for urgent human attention. The 'Internet of things' - 50 billion connected devices by 2020 - clearly covers a vast area, so it's important to bring it down to specific scenarios, such as the home, the car, the streets and how sensors, software and machines in each context inter-operate. Other critical developments and scenarios deserving our attention are driverless cars; the use of drones in the context of both civil surveillance and warfare; and applications that control or monitor our health.

More on those fronts in due course, no doubt.

Thanks again to all the speakers for such a thought provoking series of presentations.

Tuesday, 19 May 2015

Of #Smart Contracts, Blockchains And Other Distributed Ledgers

Seems I caught Smart Contract Fever at last week's meeting of the Bitcoin & Blockchain Leadership Forum. So rather than continuing to fire random emails at colleagues, I've tried to calm myself down with a post on the topic.

For context it's important to understand that 'smart contracts' rely on the use of a cryptographic technology or protocol which generates a 'ledger' that is accessible to any computer using the same protocol. One type of 'distributed ledger' is known as a 'blockchain', since every transaction which is accepted is then 'hashed' (shortened into a string of letters and numbers) and included with other transactions into a single 'block', which is itself hashed and added to a series or chain of such blocks. The leading distributed ledger is 'Bitcoin', the blockchain-based virtual currency. But virtual currencies (commodities?) are just one use-case for a distributed ledger - indeed the Bitcoin blockchain is being used for all sorts of non-currency applications, as explained in the very informative book, Cryptocurrency: How Bitcoin and Digital Money are Challenging the Global Economic Order. As Jay Cassano also explains, another example is Ripple, which is designed to be interoperable with other ledgers to support the wider payments ecosystem; while Ethereum is even more broadly ambitious in its attempt to use smart contracts as the basis for all kinds of ledger-based applications.
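The block-hashing idea described above can be sketched in a few lines of Python (a deliberate simplification of my own: no proof-of-work, no Merkle trees, no network - just the core point that each block embeds the hash of its predecessor, chaining them together):

```python
import hashlib
import json

def make_block(transactions, previous_hash):
    """Bundle transactions with the previous block's hash and hash the result,
    so each block is cryptographically chained to its predecessor."""
    block = {"transactions": transactions, "previous_hash": previous_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

genesis = make_block(["alice pays bob 5"], previous_hash="0" * 64)
second = make_block(["bob pays carol 2"], previous_hash=genesis["hash"])

# The recorded link is what makes tampering detectable: altering the first
# block would change its hash, breaking the chain for every later block.
assert second["previous_hash"] == genesis["hash"]
```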

Generally speaking, the process of forming a 'smart contract' would be started by each party publishing a coded bid/offer or offer/acceptance to the same ledger or 'blockchain', using the same cryptographic protocol. These would be like two (or more) mini-apps specifying the terms on which the parties were seeking to agree. When matched, these apps would form a single application encoding the terms of the concluded contract, and this would also be recorded in the distributed ledger accessible to all computers running the same protocol. Further records could be 'published' in the ledger each time a party performed or failed to perform a contractual obligation. So the ledger would act as its own trust mechanism to verify the existence and performance of the contract. Various applications running off the ledger would be interacting with the contract and related performance data, including payment applications, authentication processes and messaging clients of the various people and machines involved as 'customers' or 'suppliers' in the related business processes. In the event of a dispute, a pre-agreed dispute resolution process could be triggered, including enforcement action via a third party's systems that could rely on the performance data posted to the ledger as 'evidence' on which to initiate a specific remedy. 
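The matching-and-recording process just described might look something like this hypothetical sketch (all names and structures are mine, for illustration only - real smart contract platforms like Ethereum compile contracts to code executed by every node, rather than matching plain records):

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """A toy shared ledger: an append-only list of records visible to all parties."""
    entries: list = field(default_factory=list)

    def publish(self, record):
        self.entries.append(record)
        return record

def try_match(offer, bid):
    """Form a 'contract' record when a published offer and bid agree on item and price."""
    if offer["item"] == bid["item"] and bid["price"] >= offer["price"]:
        return {"type": "contract", "item": offer["item"], "price": offer["price"],
                "parties": [offer["party"], bid["party"]]}
    return None

ledger = Ledger()
offer = ledger.publish({"type": "offer", "party": "seller", "item": "widget", "price": 10})
bid = ledger.publish({"type": "bid", "party": "buyer", "item": "widget", "price": 10})

contract = try_match(offer, bid)
if contract:
    ledger.publish(contract)  # the concluded contract is itself recorded on the ledger
print(len(ledger.entries))    # 3: offer, bid, and the resulting contract
```

Performance events (delivery, payment, default) would be further records appended in the same way, giving the "evidence trail" for any dispute resolution process.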

Some commentators have suggested this will kill-off various types of intermediaries, lawyers and courts etc. But I think the better view is that existing roles and processes in the affected contractual scenarios will adapt to the new contractual methodology. Some roles might be replaced by the ledger itself, or become fully automated, but it's likely that the people or entities occupying today's roles would be somehow part of that evolution (if they aren't too sleepy). The need for a lot of human-readable messages would also disappear, signalling the demise of applications like email, SMS and maybe even the humble Internet browser. Most data could flow among machines, and they could alert humans in ways that don't involve buttons and keyboards.

So what are the benefits?

Well, it might take significant investment to set up such a process, but it should produce great savings in time, cost, record-keeping and so on throughout the lifetime of a contract. And, hey, no more price comparison sites or banner ads! Crypto-tech distributed ledgers would enable you to access and use a 'semantic web' of linked-data, open data, midata, wearables, smart meters, robots, drones and driverless cars - the Internet of Things - to control your day-to-day existence.

The downside?

This might also play into the hands of the Big Data crowd (if they find a way to snoop on your encrypted contracts), or even the machines themselves. So it's critical that we figure out the right control mechanisms to 'keep humans at the heart of technology' - the topic of the SCL's Tech Law Futures Conference in June, for example.

Meanwhile, I'm reviewing my first smart contract, which is proving rather like being involved in the negotiation of a software development agreement - which it is, of course. I'll post on that in due course, confidentiality permitting...


Saturday, 7 March 2015

Artificial Intelligence, Computer Misuse and Human Welfare

The big question of 2015 is how humans can reap the benefit of artificial intelligence without being wiped out. Believers in 'The Singularity' reckon machines will develop their own superintelligence and eventually out-compete humans to the point of extinction. Needless to say, we humans aren't taking this lying down, and the Society for Computers and Law is doing its bit by hosting a conference in June on the challenges and opportunities that artificial intelligence presents. However, it's also timely that the Serious Crime Act 2015 has just introduced an offence under the UK's Computer Misuse Act for unauthorised acts causing or creating the risk of serious damage to "human welfare", not to mention the environment and the economy. Specifically, section 3ZA now provides that: 
(1) A person is guilty of an offence if—
(a) the person does any unauthorised act in relation to a computer;
(b) at the time of doing the act the person knows that it is unauthorised;
(c) the act causes, or creates a significant risk of, serious damage of a material kind; and
(d) the person intends by doing the act to cause serious damage of a material kind or is reckless as to whether such damage is caused.

(2) Damage is of a “material kind” for the purposes of this section if it is—
(a) damage to human welfare in any country;
(b) damage to the environment in any country;
(c) damage to the economy of any country; or
(d) damage to the national security of any country.

(3) For the purposes of subsection (2)(a) an act causes damage to human welfare only if it causes—
(a) loss to human life;
(b) human illness or injury;
(c) disruption of a supply of money, food, water, energy or fuel;
(d) disruption of a system of communication;
(e) disruption of facilities for transport; or
(f) disruption of services relating to health.
I wonder how this has gone down in Silicon Valley...


Tuesday, 28 January 2014

A Google Tax Is Not The Answer

The task of enabling people to control the use of their data by Big Data platforms acquired a new urgency this month. 

The issue was perhaps highlighted most by Google CEO Eric Schmidt's recent assertions that "a race between computers and people" obliges humans to avoid jobs that machines can do. That's a somewhat disingenuous recommendation, because pushing people into a narrower and narrower range of 'creative' jobs enables the Big Data platforms like Google to continue their reliance on creative output (not to mention the data generated by user participation) to attract the vast advertising revenues needed to build ever smarter machines. Indeed, Jaron Lanier has warned that many among the Silicon Valley elite believe this process will end in the The Singularity and human extinction. They even have a university dedicated to achieving it, although in fairness I see that one of its directors posted this yesterday in favour of "social causes" and "giving back to the world" in answer to recent criticisms of Silicon Valley's elitist attitude.

Meanwhile, publishers have plumbed new depths in their own quest to regain those same advertising revenues which Google et al. effectively stole from under their noses. Their latest effort involves persuading Israeli legislators to attempt to pass a "Google Tax Law" that would require search engines to pay royalties to the state based on the number of times that eligible content is 'clicked' on. A committee would then divide the spoils amongst eligible 'content creators'. But this does not include users whose participation online is critical to the ecosystem. To be eligible to share in the tax revenues, content creators must be at least one year old (no, we aren't talking about toddlers), update their sites at least weekly and produce at least 30 percent of their own content, excluding user-generated posts. In denouncing yet another assault by the publishing industry a Google spokesperson is quoted as saying:
“Innovation and commercial cooperation is a better way forward than new legislation in order to ensure that the content industry thrives online. Google works closely with publishers to develop new technology to increase their audiences, revenue, and engagement on their sites, and we will continue doing so.”
From this it appears that Google is happy enough to help Big Media find a way to survive at the expense of 'the audience' that is currently sharing its data for free...

The proposed Israeli Google Tax is a bad way to redistribute the excessive value extracted by the Big Data platforms from the use of everyone else's data. Not only does it reward publishers over the users whose participation is taken for granted, but governments are also grossly wasteful and inefficient financial conduits. Nobody would really be any better off if Big Data platforms were forced to part with a fair slice of their advertising revenues, only for that money to be soaked up by a few big publishers and state bureaucracy. 

Fortunately, an alternative ecosystem is steadily developing that should pave the way to fairly rewarding the use of everyone's data, but governments need to be patient and spare us the red tape in the meantime. 

More on this soon.


Tuesday, 10 September 2013

Regulating Convergence

This week I get the chance to chat about three of my favourite topics from a legal standpoint: payments, peer-to-peer finance and data.

All three are in a state of regulatory flux (which is also making for some late nights). But that tells you a lot about where commerce, and society itself are headed. The much vaunted 'convergence' of Web 1.0 has definitely arrived.

As ever, the challenge for independent regulation of these areas is to approach electronic commerce in a holistic way that promotes competition and innovation, rather than in a blinkered fashion that strangles innovative services at birth...

It should be a lively week.

More in a wrap-up post at the end.

Tuesday, 12 March 2013