AI-generated synthetic data: easy and fast access to high-quality data?

AI-generated synthetic data in practice

Syntho, an expert in AI-generated synthetic data, aims to turn privacy by design into a competitive advantage with AI-generated synthetic data. The company helps organizations build a strong data foundation with easy and fast access to high-quality data, and recently won the Philips Innovation Award.

However, generating synthetic data with AI is a relatively new solution that tends to raise a set of frequently asked questions. To answer these, Syntho started a case study together with SAS, a market leader in advanced analytics and AI software.

In collaboration with the Dutch AI Coalition (NL AIC), they investigated the value of synthetic data by comparing AI-generated synthetic data from the Syntho Engine with the original data through various assessments of data quality, legal validity and usability.

Isn't data anonymization a solution?

Classic anonymization techniques have in common that they manipulate the original data to make it harder to trace individuals. Examples are generalization, suppression, wiping, pseudonymization, data masking, and shuffling of rows and columns. Examples are shown in the table below.

Table: examples of data anonymization techniques

These techniques introduce three key challenges:

  1. They work differently per data type and per dataset, which makes them hard to scale. Moreover, because they work differently, there will always be debate about which methods to apply and which combination of techniques is needed.
  2. There is always a one-to-one relationship with the original data. This means there will always be a privacy risk, especially given all the open datasets and the techniques available to link those datasets.
  3. They manipulate data and thereby destroy data in the process. This is especially damaging for AI tasks where "predictive power" is essential, because poor-quality data results in poor insights from the AI model (garbage in, garbage out).

These points are also assessed in this case study.
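To make challenges 2 and 3 concrete, here is a minimal sketch of generalization and a k-anonymity check. The records, field names and bucketing rules are hypothetical and chosen only for illustration; they are not taken from the SAS dataset or from the techniques SAS applied.

```python
from collections import Counter

# Hypothetical toy records; field names are illustrative assumptions.
records = [
    {"age": 23, "zip": "1011", "churned": 0},
    {"age": 27, "zip": "1012", "churned": 1},
    {"age": 34, "zip": "1011", "churned": 0},
    {"age": 38, "zip": "1013", "churned": 1},
]

def generalize(record):
    """Generalize quasi-identifiers: bucket age by decade, keep only 2 zip digits."""
    decade = record["age"] // 10 * 10
    return {
        "age": f"{decade}-{decade + 9}",
        "zip": record["zip"][:2] + "**",
        "churned": record["churned"],
    }

def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

anonymized = [generalize(r) for r in records]
k_before = k_anonymity(records, ["age", "zip"])     # every row is unique
k_after = k_anonymity(anonymized, ["age", "zip"])   # rows now hide in groups
```

In this toy case k rises from 1 to 2, but every anonymized row still corresponds to exactly one original row, and the exact ages and zip codes that a model might have learned from are gone.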

Introduction to the case study

For the case study, the target dataset was a telecom dataset provided by SAS containing the data of 56,600 customers. The dataset includes one column indicating whether a customer has left the company (i.e. 'churned') or not. The goal of the case study was to use synthetic data to train several models to predict customer churn and to evaluate the performance of those trained models. As churn prediction is a classification task, SAS selected four popular classification models to make the predictions:

  1. Random forest
  2. Gradient boosting
  3. Logistic regression
  4. Neural network

Before generating the synthetic data, SAS split the telecom dataset into a train set (for training the models) and a holdout set (for scoring the models). Keeping a separate holdout set for scoring allows an unbiased assessment of how well a classification model might perform when applied to new data.
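The split described above can be sketched as follows. This is an illustration only: the 25% holdout fraction, the seed and the helper name `split_train_holdout` are assumptions, since the article does not say how SAS performed the split.

```python
import random

def split_train_holdout(rows, holdout_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into (train, holdout).

    The fraction and seed are illustrative assumptions, not values from the study.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

customers = list(range(1000))  # stand-in for customer records
train, holdout = split_train_holdout(customers)
```

The holdout rows never reach the anonymization step or the synthetic data generator, which is what keeps the later scoring unbiased.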

Using the train set as input, Syntho used its Syntho Engine to generate a synthetic dataset. For benchmarking, SAS also created an anonymized version of the train set by applying various anonymization techniques until a certain threshold (of k-anonymity) was reached. These steps resulted in four datasets:

  1. A train dataset (i.e. the original dataset minus the holdout dataset)
  2. A holdout dataset (i.e. a subset of the original dataset)
  3. An anonymized dataset (based on the train dataset)
  4. A synthetic dataset (based on the train dataset)

Datasets 1, 3 and 4 were used to train each of the classification models, resulting in twelve (3 x 4) trained models. SAS then used the holdout dataset to measure the accuracy with which each model predicts customer churn. The results are presented below, starting with some basic statistics.
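As a quick check on the experiment design, three training datasets combined with four algorithms give twelve trained models, all scored on the same holdout set. The sketch below only enumerates that grid; the labels are illustrative and no model is actually trained.

```python
from itertools import product

# Labels are illustrative; they mirror datasets 1, 3 and 4 and the four models above.
training_sets = ["original (train)", "anonymized", "synthetic"]
algorithms = [
    "random forest",
    "gradient boosting",
    "logistic regression",
    "neural network",
]

# One experiment per (training dataset, algorithm) pair, always scored on the holdout set.
experiments = [
    {"trained_on": dataset, "algorithm": algorithm, "scored_on": "holdout"}
    for dataset, algorithm in product(training_sets, algorithms)
]
```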

Figure: Machine learning pipeline generated in SAS Visual Data Mining and Machine Learning

Basic statistics when comparing anonymized data to original data

Anonymization techniques destroy even basic patterns, business logic, relationships and statistics (as in the example below). Using anonymized data for basic analytics therefore produces unreliable results. In fact, the poor quality of the anonymized data made it impossible to use for advanced analytics tasks (e.g. AI/ML modelling and dashboarding).

Figure: comparison of anonymized data with original data

Basic statistics when comparing synthetic data to original data

Synthetic data generation with AI preserves basic patterns, business logic, relationships and statistics (as in the example below). Using synthetic data for basic analytics therefore produces reliable results. The key question is: does synthetic data also hold up for advanced analytics tasks (e.g. AI/ML modelling and dashboarding)?

Figure: comparison of synthetic data with original data

AI-generated synthetic data and advanced analytics

Synthetic data not only preserves basic patterns (as shown in the previous plots), it also captures the deep 'hidden' statistical patterns required for advanced analytics tasks. The latter is shown in the bar chart below, which indicates that the accuracy of models trained on synthetic data is similar to that of models trained on original data. Furthermore, with an area under the curve (AUC*) close to 0.5, the models trained on anonymized data perform by far the worst. The full report with all advanced analytics assessments of synthetic data in comparison with the original data is available on request.

* AUC: the area under the curve is a measure of the accuracy of advanced analytics models, taking into account true positives, false positives, false negatives and true negatives. 0.5 means that a model predicts randomly and has no predictive power, and 1 means that the model is always correct and has full predictive power.
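The two reference points in this footnote can be reproduced with a small sketch. It uses the rank-based reading of AUC (the probability that a randomly chosen churner gets a higher score than a randomly chosen non-churner, with ties counting half), which is one standard way to compute it; the labels and scores are made up for illustration and are not results from the study.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 0]  # 1 = churned, 0 = stayed (hypothetical)
perfect = auc(labels, [0.9, 0.8, 0.2, 0.1])        # every churner outranks every non-churner
uninformative = auc(labels, [0.5, 0.5, 0.5, 0.5])  # constant score, no ranking ability
```

Here `perfect` evaluates to 1.0 and `uninformative` to 0.5, matching the two ends of the scale described above.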

Additionally, this synthetic data can be used to understand the data characteristics and the main variables needed for the actual training of the models. The inputs selected by the algorithms on the synthetic data were very similar to those selected on the original data. Hence, the modelling process can be carried out on this synthetic version, which reduces the risk of data breaches. However, when inferencing individual records (e.g. a telco customer), retraining on the original data is recommended for explainability, increased acceptance or simply because of regulation.

Figure: AUC by algorithm, grouped by method

Conclusions:

  • Models trained on synthetic data show highly similar performance to models trained on original data.
  • Models trained on data anonymized with 'classic anonymization techniques' show inferior performance compared to models trained on the original data or on synthetic data.
  • Synthetic data generation is easy and fast because the technique works exactly the same per dataset and per data type.

Value-adding synthetic data use cases

Use case 1: Synthetic data for model development and advanced analytics

Having a strong data foundation with easy and fast access to usable, high-quality data is essential for developing models (e.g. dashboards [BI] and advanced analytics [AI & ML]). However, many organizations suffer from a suboptimal data foundation, resulting in three key challenges:

  • Getting access to data takes ages due to (privacy) regulations, internal processes or data silos
  • Classic anonymization techniques destroy data, making the data no longer suitable for analysis and advanced analytics (garbage in = garbage out)
  • Existing solutions are not scalable because they work differently per dataset and per data type, and cannot handle large multi-table databases

Synthetic data approach: develop models with as-good-as-real synthetic data to:

  • Minimize the use of original data, without hindering your developers
  • Unlock personal data and gain access to more data that was previously restricted (e.g. due to privacy)
  • Get easy and fast access to relevant data
  • Use a scalable solution that works the same for each dataset, data type and for massive databases

This allows organizations to build a strong data foundation with easy and fast access to usable, high-quality data to unlock data and leverage data opportunities.

 

Use case 2: Smart synthetic test data for software testing, development and delivery

Testing and development with high-quality test data is essential to deliver state-of-the-art software solutions. Using original production data seems obvious, but is not allowed due to (privacy) regulations. Alternative Test Data Management (TDM) tools introduce "legacy-by-design" in getting the test data right:

  • They do not reflect production data, and business logic and referential integrity are not preserved
  • They work slowly and are time-consuming
  • Manual work is required

Synthetic data approach: test and develop with AI-generated synthetic test data to deliver state-of-the-art software solutions smartly, with:

  • Production-like data with preserved business logic and referential integrity
  • Easy and fast data generation with state-of-the-art AI
  • Privacy-by-design
  • Easy, fast and agile

This allows organizations to test and develop with next-level test data to deliver state-of-the-art software solutions!

More information

Interested? For more information about synthetic data, visit the Syntho website or contact Wim Kees Janssen. For more information about SAS, visit www.sas.com or contact kees@syntho.ai.

In this use case, Syntho, SAS and the NL AIC work together to achieve the intended results. Syntho is an expert in AI-generated synthetic data, and SAS is a market leader in analytics that offers software for exploring, analyzing and visualizing data.

* Predicts 2021 - Data and Analytics Strategies to Govern, Scale and Transform Digital Business, Gartner, 2020.
