If you anonymize your data before using it for data testing or data analytics, there are several factors at play:
Synthetic data solves all of these shortcomings and more. Watch the video below to see an analytics expert from SAS (the global market leader in analytics) explain his assessment of the difference in quality between original data, anonymized data, and synthetic data generated by Syntho.
This video is taken from the Syntho x SAS D[N]A Café on AI Generated Synthetic Data. Find the full video here.
Edwin van Unen sent an original dataset to Syntho, and we synthesized it. But the question was also: "What will happen if we compare synthetic data to anonymized data?" You lose a lot of information when you anonymize data, so does the same happen when you synthesize a dataset? We started with a dataset from the telecom industry containing 56,000 rows of company churn information. This dataset was both synthesized and anonymized so that Edwin could compare synthetization with anonymization. Edwin then started modeling in SAS Viya. He built several churn models on the original dataset, using classical regression techniques and decision trees, but also more sophisticated techniques such as neural networks, gradient boosting, and random forests. He used the standard SAS Viya options when building the models.
Then it was time to look at the results. They were very promising for synthetic data, but not for anonymized data. For the non-machine-learning experts in the audience: we look at the area under the ROC curve, which says something about the accuracy of a model. Comparing the original data with the anonymized data, we see that the model trained on the original data has an area under the ROC curve of 0.8, which is quite good. The model trained on the anonymized data, however, has an area under the ROC curve of 0.6. This means that anonymization discards a lot of information, so the model loses much of its predictive power.
But what about synthetic data? Here, we did exactly the same thing, but instead of anonymizing the data, Syntho synthesized it. Now we see that the models trained on the original data and on the synthetic data both have an area under the ROC curve of 0.8. The two results are not exactly identical because of variability, but they are very similar. This means that the potential of synthetic data is very promising, and Edwin is very happy about it.
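For readers who want to see what the area under the ROC curve measures, here is a minimal sketch in plain Python. It is not the SAS Viya implementation used in the study. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. The sketch relies on a standard property of the metric: it equals the probability that a randomly chosen churner receives a higher model score than a randomly chosen non-churner.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical churn labels (1 = churned) and model scores, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

On this scale, 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which puts the reported values of 0.8 and 0.6 in context.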
Contact Syntho and one of our experts will get in touch with you at the speed of light to explore the value of synthetic data!