Anonymized data vs Synthetic data

If you anonymize your data before using it for data testing or data analytics, several factors come into play:

  1. In almost all cases, anonymized data can still be traced back to individuals because of specific and unique rows (for example, medical records)
  2. The more you anonymize or generalize, the more data you destroy. This lowers the quality of your data and therefore of your insights
  3. Anonymization works differently for different data formats. This means it is not scalable and can be very time-consuming
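The first two points can be made concrete with a small sketch. The records, the choice of age and zip code as identifying columns, and the helper names below are all hypothetical and are not taken from the SAS or Syntho work. The sketch counts how many rows share the same combination of identifying values: a smallest group of 1 means at least one person is unique and could be traced back, and coarsening the values removes that uniqueness only by throwing detail away.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis). Illustrative only.
records = [
    (34, "1011", "asthma"),
    (35, "1012", "diabetes"),
    (36, "1013", "asthma"),
    (52, "2011", "flu"),
    (53, "2012", "flu"),
    (58, "2013", "asthma"),
]


def raw_key(row):
    """Identifying values exactly as stored: full age and full zip code."""
    age, zip_code, _ = row
    return (age, zip_code)


def generalized_key(row):
    """Coarsened values: age rounded down to its decade, zip code cut to 2 digits."""
    age, zip_code, _ = row
    return (age // 10 * 10, zip_code[:2])


def smallest_group(rows, key):
    """Size of the smallest group of rows that share the same key values."""
    counts = Counter(key(row) for row in rows)
    return min(counts.values())


def distinct_keys(rows, key):
    """Number of distinct key combinations, a rough proxy for retained detail."""
    return len({key(row) for row in rows})


k_raw = smallest_group(records, raw_key)
k_generalized = smallest_group(records, generalized_key)

print("smallest group, raw:", k_raw)
print("smallest group, generalized:", k_generalized)
print("distinct combinations, raw:", distinct_keys(records, raw_key))
print("distinct combinations, generalized:", distinct_keys(records, generalized_key))
```

In this toy example every raw row is unique, while the generalized version hides each person in a group of three but keeps only two distinct age and location combinations. That is the trade-off described above: less re-identification risk is bought with less usable information.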

Synthetic data solves all of these shortcomings and more. Watch the video below to see an analytics expert from SAS (global market leader in analytics) explain his assessment of the difference in quality between original data, anonymized data, and synthetic data generated by Syntho.

This video is taken from the Syntho x SAS D[N]A Café on AI Generated Synthetic Data. Find the full video here.

Edwin van Unen sent an original dataset to Syntho and we synthesized it. But the question was also: "What will happen if we compare synthetic data to anonymized data?" Because you lose a lot of information in anonymized data, will this also happen when you synthesize a dataset? We started with a dataset from the telecom industry containing 56,000 rows of company churn information. This dataset was both synthesized and anonymized so that Edwin could compare synthetization with anonymization.

Then Edwin started modeling with SAS Viya. He built several churn models on the original dataset, using classical regression techniques and decision trees, but also more advanced techniques such as neural networks, gradient boosting, and random forests. He used the standard SAS Viya options when building the models.

Then it was time to look at the results. They were very promising for the synthetic data, and not for the anonymized data. For the non-machine-learning experts in the audience: we look at the area under the ROC curve, which tells us something about the accuracy of a model. Comparing the original data with the anonymized data, we see that the model built on the original data has an area under the ROC curve of .8, which is quite good. The model built on the anonymized data, however, has an area under the ROC curve of .6. This means that we lose a lot of information with the anonymized model, and therefore a lot of predictive power.
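For readers unfamiliar with the metric, here is a minimal sketch of what the area under the ROC curve measures. It is not the SAS Viya computation used in the study, and the labels and scores are made-up illustration values, not the study's data. The sketch uses the standard rank interpretation: the area equals the probability that a randomly chosen churner receives a higher model score than a randomly chosen non-churner, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 values (1 = churned); scores: model scores.
    Returns the fraction of positive/negative pairs in which the positive
    has the higher score, counting ties as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: the same six customers scored by two models.
labels = [1, 1, 1, 0, 0, 0]
sharp_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]      # separates churners well
blurred_scores = [0.6, 0.4, 0.3, 0.5, 0.45, 0.2]   # separates churners poorly

print("sharp model AUC:", round(roc_auc(labels, sharp_scores), 3))
print("blurred model AUC:", round(roc_auc(labels, blurred_scores), 3))
```

A value of 1.0 means the model ranks every churner above every non-churner, and 0.5 is no better than guessing. On that scale, a drop from .8 to .6 moves a model most of the way toward a coin flip. The pairwise loop is quadratic in the number of rows, so it suits small illustrations only.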

But now the question is: what about synthetic data? Here we did exactly the same, except that instead of anonymizing the data, Syntho synthesized it. Now we see that the models built on the original data and on the synthetic data both have an area under the ROC curve of .8. The results are not exactly identical, because of variability, but they are very similar. This means that the potential of synthetic data is very promising, and Edwin is very happy about this.


The data is synthetic, but our team is real!

Contact Syntho and one of our experts will get in touch with you at the speed of light to explore the value of synthetic data!