Anonymized data versus synthetic data

If you anonymize your data before using it for testing or data analytics, several factors come into play:

  1. In almost all cases, anonymized data can still be traced back to individuals because of specific, unique rows (e.g. medical records).
  2. The more you anonymize or generalize, the more data you destroy. This lowers the quality of your data and therefore of your insights.
  3. Anonymization works differently for different data formats. This means it does not scale and can be very time-consuming.
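The first two points can be illustrated with a small sketch. The records, the column choice (age and postal code as quasi-identifiers) and the generalization rule below are hypothetical toy assumptions, not data or methods from the study discussed here. The sketch counts how many rows are unique on their quasi-identifiers before and after generalization, and how much detail the generalization removes.

```python
from collections import Counter

# Hypothetical toy records: (age, postal_code, diagnosis). Illustration only.
records = [
    (34, "1012AB", "asthma"),
    (36, "1012CD", "diabetes"),
    (35, "1013XY", "asthma"),
    (61, "3511KL", "migraine"),
    (67, "3512MN", "diabetes"),
    (29, "9711PQ", "asthma"),
]


def generalize(record):
    """Coarsen age to a 10-year band and keep only the first two postal characters."""
    age, postal_code, diagnosis = record
    band_start = age // 10 * 10
    return (f"{band_start}-{band_start + 9}", postal_code[:2], diagnosis)


def count_unique_rows(rows):
    """Count rows whose (age, postal) combination occurs exactly once."""
    counts = Counter((row[0], row[1]) for row in rows)
    return sum(1 for row in rows if counts[(row[0], row[1])] == 1)


generalized = [generalize(r) for r in records]

# Rows that could be singled out by their quasi-identifiers.
unique_before = count_unique_rows(records)
unique_after = count_unique_rows(generalized)

# Distinct age values: a rough proxy for how much detail survives.
distinct_ages_before = len({r[0] for r in records})
distinct_ages_after = len({r[0] for r in generalized})

print("unique rows before/after:", unique_before, unique_after)
print("distinct age values before/after:", distinct_ages_before, distinct_ages_after)
```

In this toy example, generalization reduces the number of unique rows but does not remove all of them, while the age column loses half of its distinct values. That is the trade-off described in points 1 and 2.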

Synthetic data addresses all of these shortcomings and more. Watch the video below to see an analytics expert from SAS (the global market leader in analytics) explain his assessment of the difference in quality between original data, anonymized data and Syntho's synthetic data.

This video was taken from the Syntho x SAS D[N]A Café on AI Generated Synthetic Data. Find the full video here.

Edwin van Unen sent an original dataset to Syntho and we synthesized it. But the question was also: "What happens if we compare synthetic data with anonymized data?" Because you lose a lot of information in anonymized data, will the same happen when you synthesize a dataset? We started with a dataset from the telecom industry with 56,000 rows and 128 columns of churn information. This dataset was both synthesized and anonymized so that Edwin could compare synthesis with anonymization. Edwin then started modeling with SAS Viya. He built several churn models on the original dataset, using classical regression techniques and decision trees, but also more advanced techniques such as neural networks, gradient boosting and random forests. He used the standard SAS Viya options when building the models.

Then it was time to look at the results. The results were very promising for synthetic data, but not for anonymization. For the non-machine-learning experts in the audience: we look at the area under the ROC curve, which says something about the predictive quality of the model. Comparing the original data with the anonymized data, we see that the model on the original data has an area under the ROC curve of 0.8, which is quite good. The model on the anonymized data, however, has an area under the ROC curve of 0.6. This means we lose a lot of information with the anonymized data, and therefore a lot of predictive power.
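For readers who want to see what the area under the ROC curve measures, here is a minimal sketch in plain Python. It uses the pairwise definition: the share of (churner, non-churner) pairs in which the churner receives the higher score, with ties counted as half. The labels and scores are invented toy values, and the function name `roc_auc` is our own. This is not the SAS Viya implementation and these are not the figures from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy churn labels (1 = churned) and two hypothetical sets of model scores.
labels = [1, 1, 1, 0, 0, 0]
sharp_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]    # model that separates the classes well
blurred_scores = [0.6, 0.4, 0.3, 0.5, 0.4, 0.2]  # model trained on less informative data

auc_sharp = roc_auc(labels, sharp_scores)
auc_blurred = roc_auc(labels, blurred_scores)

print(f"AUC, sharper scores: {auc_sharp:.3f}")
print(f"AUC, blurred scores: {auc_blurred:.3f}")
```

A value of 0.5 corresponds to ranking at random and 1.0 to a perfect ranking, which is why a drop from 0.8 to 0.6 represents a substantial loss of predictive power.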

But then the question is: what about synthetic data? Here we did exactly the same thing, but instead of anonymizing the data, Syntho synthesized it. Now we see that the models on both the original data and the synthetic data have an area under the ROC curve of 0.8, which is very similar. They are not exactly identical because of variability, but very close. This means the potential of synthetic data is very promising, and Edwin is very pleased with this.

The data is synthetic, but our team is real!

Contact Syntho and one of our experts will get in touch with you at the speed of light to explore the value of synthetic data!