AI-generated synthetic data: easy and fast access to high-quality data?

AI-generated synthetic data in practice

Syntho, an expert in AI-generated synthetic data, aims to turn privacy by design into a competitive advantage with AI-generated synthetic data. The company helps organizations build a strong data foundation with easy and fast access to high-quality data, and it recently won the Philips Innovation Award.

However, generating synthetic data with AI is a relatively new solution that tends to raise frequently asked questions. To answer them, Syntho started a case study together with SAS, a market leader in Advanced Analytics and AI software.

In collaboration with the Dutch AI Coalition (NL AIC), they investigated the value of synthetic data by comparing AI-generated synthetic data produced by the Syntho Engine with the original data, using various assessments of data quality, legal validity and usability.

Is data anonymization not a solution?

Classic anonymization techniques have in common that they manipulate the original data to make it harder to trace records back to individuals. Examples are generalization, suppression, wiping, pseudonymization, data masking, and shuffling of rows and columns. You can find examples in the table below.

Table: data anonymization
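
To make a few of the techniques named above concrete, here is a minimal Python sketch of generalization, suppression and masking applied to toy records. The records, the column names (`name`, `age`, `zip`, `churned`) and the helper functions are hypothetical and chosen for illustration only; they are not taken from the case study.

```python
# Hypothetical toy records; the column names are assumptions for illustration.
records = [
    {"name": "Alice", "age": 34, "zip": "1012AB", "churned": False},
    {"name": "Bob", "age": 47, "zip": "2034CD", "churned": True},
]


def generalize_age(age, width=10):
    """Generalization: replace an exact age with a coarser range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress(_value):
    """Suppression: hide the value entirely."""
    return "*"


def mask_zip(zip_code, keep=2):
    """Masking: keep the first characters and hide the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def anonymize(record):
    """Apply one technique per column; the target column is left untouched."""
    return {
        "name": suppress(record["name"]),
        "age": generalize_age(record["age"]),
        "zip": mask_zip(record["zip"]),
        "churned": record["churned"],
    }


anonymized = [anonymize(r) for r in records]
for row in anonymized:
    print(row)
```

Note that every output row still corresponds to exactly one input row, and that detail (exact age, full postal code) is lost along the way.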

Those techniques introduce 3 key challenges:

  1. They work differently per data type and per dataset, which makes them hard to scale. Moreover, because they work differently, there will always be debate about which methods to apply and which combination of techniques is needed.
  2. There is always a one-to-one relationship with the original data. This means there will always be a privacy risk, especially given all the open datasets and the techniques available to link those datasets.
  3. They manipulate data and thereby destroy data in the process. This is especially damaging for AI tasks where "predictive power" is essential, because poor-quality data results in poor insights from the AI model (garbage in, garbage out).

These points are also assessed in this case study.

An introduction to the case study

For the case study, the target dataset was a telecom dataset provided by SAS containing the data of 56,600 customers. The dataset contains 128 columns, including one column indicating whether a customer has left the company (i.e. 'churned') or not. The goal of the case study was to use the synthetic data to train several models to predict customer churn and to evaluate the performance of those trained models. As churn prediction is a classification task, SAS selected four popular classification models to make the predictions:

  1. Random forest
  2. Gradient boosting
  3. Logistic regression
  4. Neural network

Before generating the synthetic data, SAS randomly split the telecom dataset into a train set (for training the models) and a holdout set (for scoring the models). Having a separate holdout set for scoring allows for an unbiased assessment of how well a classification model might perform when applied to new data.
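
A random train/holdout split of this kind can be sketched in a few lines of Python. This is only an illustration using the standard library: the 30% holdout fraction, the fixed seed and the integer stand-ins for customer records are assumptions, since the article does not state how SAS configured the split.

```python
import random


def train_holdout_split(rows, holdout_fraction=0.3, seed=42):
    """Randomly split rows into a train set and a holdout set.

    holdout_fraction and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(rows)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


customers = list(range(1000))  # stand-in for customer records
train, holdout = train_holdout_split(customers)
print(len(train), len(holdout))
```

The important property is that the two sets are disjoint, so the holdout set never influences training.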

Using the train set as input, Syntho used its Syntho Engine to generate a synthetic dataset. For benchmarking, SAS also created an anonymized version of the train set by applying various anonymization techniques until a certain threshold (of k-anonymity) was reached. These steps resulted in four datasets:

  1. A train dataset (i.e. the original dataset minus the holdout dataset)
  2. A holdout dataset (i.e. a subset of the original dataset)
  3. An anonymized dataset (based on the train dataset)
  4. A synthetic dataset (based on the train dataset)
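
For readers unfamiliar with the k-anonymity threshold mentioned above: a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below measures that k for a toy table. The rows, the choice of `age` and `zip` as quasi-identifiers and the function name are hypothetical; the article does not say which columns or which threshold SAS used.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the
    same values for all quasi-identifier columns (0 for an empty table)."""
    groups = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return min(groups.values(), default=0)


# Hypothetical, already generalized rows for illustration.
rows = [
    {"age": "30-39", "zip": "10**", "churned": True},
    {"age": "30-39", "zip": "10**", "churned": False},
    {"age": "40-49", "zip": "20**", "churned": False},
    {"age": "40-49", "zip": "20**", "churned": False},
    {"age": "40-49", "zip": "20**", "churned": True},
]

k = k_anonymity(rows, ["age", "zip"])
print(f"The table is {k}-anonymous")
```

Reaching a higher k usually means generalizing or suppressing more values, which is the information loss discussed earlier in the article.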

Datasets 1, 3 and 4 were each used to train every classification model, resulting in 12 (3 x 4) trained models. SAS then used the holdout dataset to measure how accurately each model predicts customer churn. The results are presented below, starting with some basic statistics.

Figure: Machine learning pipeline generated in SAS Visual Data Mining and Machine Learning

Basic statistics when comparing anonymized data to original data

Anonymization techniques destroy even basic patterns, business logic, relationships and statistics (as in the example below). Using anonymized data for basic analytics therefore produces unreliable results. In fact, the poor quality of the anonymized data made it nearly impossible to use for advanced analytics tasks (e.g. AI/ML modeling and dashboarding).

Figure: anonymized data compared with original data

Basic statistics when comparing synthetic data to original data

Synthetic data generation with AI preserves basic patterns, business logic, relationships and statistics (as in the example below). Using synthetic data for basic analytics therefore produces reliable results. The key question is whether synthetic data also holds up for advanced analytics tasks (e.g. AI/ML modeling and dashboarding).

Figure: synthetic data compared with original data

AI-generated synthetic data and advanced analytics

Synthetic data holds up not only for basic patterns (as shown in the previous plots); it also captures the deep 'hidden' statistical patterns required for advanced analytics tasks. This is demonstrated in the bar chart below, which indicates that the accuracy of models trained on synthetic data is similar to that of models trained on original data. Furthermore, with an area under the curve (AUC*) close to 0.5, the models trained on anonymized data perform by far the worst. The full report with all advanced analytics assessments of synthetic data in comparison with the original data is available on request.

* AUC: the area under the curve is a measure of the accuracy of advanced analytics models, taking into account true positives, false positives, false negatives and true negatives. 0.5 means that a model predicts randomly and has no predictive power, and 1 means that the model is always correct and has full predictive power.
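
One way to build intuition for these two reference values: the AUC equals the probability that a randomly chosen positive example (here, a churner) receives a higher score than a randomly chosen negative one, with ties counted as half. The following Python sketch computes it that way. It is a simple pairwise illustration on made-up labels and scores, not the procedure SAS used in the case study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive (e.g. churned), 0 for negative.
    scores: model scores, higher means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]  # made-up ground truth
perfect = auc(labels, [0.9, 0.8, 0.2, 0.1])        # ranks every churner first
uninformative = auc(labels, [0.5, 0.5, 0.5, 0.5])  # same score for everyone
print(perfect, uninformative)
```

The pairwise loop is quadratic in the number of records, which is fine for illustration; on a dataset of the size used in the case study a rank-based implementation from an analytics library would be the practical choice.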

Additionally, this synthetic data can be used to understand the data characteristics and the main variables needed for the actual training of the models. The inputs selected by the algorithms on synthetic data were very similar to those selected on the original data. Hence, the modeling process can be carried out on the synthetic version, which reduces the risk of data breaches. However, when inferring on individual records (e.g. a telco customer), retraining on the original data is recommended for explainability, for increased acceptance, or simply because of regulation.

Figure: AUC by algorithm, grouped by method

Conclusions:

  • Models trained on synthetic data show highly similar performance to models trained on original data.
  • Models trained on data anonymized with 'classic anonymization techniques' show inferior performance compared to models trained on the original data or on synthetic data.
  • Synthetic data generation is easy and fast because the technique works the same way for every dataset and every data type.

Value-adding use cases for synthetic data

Use case 1: Synthetic data for model development and advanced analytics

Having a strong data foundation with easy and fast access to usable, high-quality data is essential for developing models (e.g. dashboards [BI] and advanced analytics [AI & ML]). However, many organizations suffer from a suboptimal data foundation, which results in 3 key challenges:

  • Getting access to data takes ages due to (privacy) regulations, internal processes or data silos
  • Classic anonymization techniques destroy data, making it no longer suitable for analysis and advanced analytics (garbage in = garbage out)
  • Existing solutions are not scalable because they work differently per dataset and per data type and cannot handle large multi-table databases

Synthetic data approach: develop models with as-good-as-real synthetic data to:

  • Minimize the use of original data, without hindering your developers
  • Unlock personal data and gain access to data that was previously restricted (e.g. due to privacy)
  • Get easy and fast access to relevant data
  • Have a scalable solution that works the same for each dataset, data type and for massive databases

This allows organizations to build a strong data foundation with easy and fast access to usable, high-quality data, so they can unlock data and leverage data opportunities.

Use case 2: Smart synthetic test data for software testing, development and delivery

Testing and development with high-quality test data is essential for delivering state-of-the-art software solutions. Using original production data seems obvious, but it is not allowed due to (privacy) regulations. Alternative Test Data Management (TDM) tools introduce "legacy-by-design" in getting the test data right:

  • They do not reflect production data, and business logic and referential integrity are not preserved
  • They are slow and time-consuming
  • They require manual work

Synthetic data approach: test and develop with AI-generated synthetic test data to deliver state-of-the-art software solutions, with:

  • Production-like data with preserved business logic and referential integrity
  • Easy and fast data generation with state-of-the-art AI
  • Privacy-by-design
  • An easy, fast and agile workflow

This allows organizations to test and develop with next-level test data to deliver state-of-the-art software solutions!

More information

Interested? For more information about synthetic data, visit the Syntho website or contact Wim Kees Janssen. For more information about SAS, visit www.sas.com or contact kees@syntho.ai.

In this use case, Syntho, SAS and the NL AIC worked together to achieve the intended results. Syntho is an expert in AI-generated synthetic data, and SAS is a market leader in analytics that offers software for exploring, analyzing and visualizing data.

* Predicts 2021: Data and Analytics Strategies to Govern, Scale and Transform Digital Business, Gartner, 2020.
