I have already seen this, this and this question, but none of the suggestions seemed to fix my problem (so I have reverted them).

I have the following code:

    nlp = spacy.load('en_core_web_sm')

    class CleanTextTransformer(TransformerMixin):
        def transform(self, X, **transform_params):

    lemmas.append(tok.lemma_.lower().strip() if tok.lemma_ != "-PRON-" else tok.lower_)
    lemma_ for tok in tokens if tok not in SYMBOLS]

    self.encoder = MultiLabelBinarizer(*args, **kwargs)

    def represent(rd, ed, number, category, text):
        doc_train = ]
        doc_test = ]
        for row in range(len(doc_train)):
        doc_train = transformed_r
        doc_test = transformed_e
        print("categorical columns encoded using MultiLabelBinarizer()")
        doc_train = ansform(doc_train)
        doc_test = ansform(doc_test)
        print("numbers scaled using StandardScaler()")
        vec = TfidfVectorizer(tokenizer=tokenizeText, ngram_range=(1, 1))
        doc_train = vec.transform(doc_train).todense()
        doc_test = vec.transform(doc_test).todense()
        print("preprocessing completed successfully")

    def train_classifier(train_docs, classAxis):
        clf = OneVsRestClassifier(LogisticRegression(solver='saga'))
        X = ]) for i in range(1, len(train_docs))]

    df = pd.DataFrame(pd.read_csv("testdata.csv", header=0))
    test_data = pd.DataFrame(pd.read_csv("test.csv", header=0))
    train, test = represent(df, test_data,, , )

As you can see, there are text values, number values and categorical values. My code first splits up the categorical values (which are comma-delimited) before running them through MultiLabelBinarizer(). Then I process the text using the spaCy settings found in this tutorial. I make sure to apply the transformations to the test data, too, so there can be no inconsistency there. Finally, I convert everything to lists in the train_classifier function, which supposedly should help.

In the line classifier = clf.fit(list(X), y), I get the following error:

    Traceback (most recent call last):
      File "C:\Users\User\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\191.7141.48\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
        pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
      File "C:\Users\User\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\191.7141.48\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:/Users/User/PycharmProjects/ml/ml.py", line 148, in <module>
      File "C:/Users/User/PycharmProjects/ml/ml.py", line 124, in train_classifier
      File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\multiclass.py", line 215, in fit
      File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\externals\joblib\parallel.py", line 917, in __call__
      File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\externals\joblib\parallel.py", line 759, in dispatch_one_batch
      File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\externals\joblib\parallel.py", line 716, in _dispatch

If you really want to set a list as the value for the element, the issue is with the dtype of the column. When you create the DataFrame, the dtype gets inferred as float64, since it only contains numeric values. Then, when you try to set a list as the value, it errors out because of the dtype. A way to fix this would be to use a non-numeric dtype (like object). Example:

    output.loc = #Your list

Demo:

    In : output = pd.DataFrame(data = ], columns=, index=)

You can also specify the dtype while creating the DataFrame. Example:

    output = pd.DataFrame(data = ], columns=, index=,dtype=object)
    output.loc =

Demo:

    In : output = pd.DataFrame(data = ], columns=, index=,dtype=object)
    In : output.
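Since the demos above lost their data during extraction, here is a minimal sketch of the same idea with made-up values. The column name "Sold Count", the index label "Project1" and the numbers are hypothetical, and the helper make_frame is only there for illustration. The sketch uses `.at` instead of the `.loc` call from the answer; that swap is my own assumption, chosen because `.at` addresses exactly one cell, so the list is treated as a single value rather than as something to spread across rows.

```python
import pandas as pd


def make_frame(dtype=None):
    """Build a one-cell frame; the labels and values are hypothetical."""
    return pd.DataFrame(
        data=[[800.0]], columns=["Sold Count"], index=["Project1"], dtype=dtype
    )


# Option 1: cast an existing numeric column to object, then assign the list.
converted = make_frame()
converted["Sold Count"] = converted["Sold Count"].astype(object)
converted.at["Project1", "Sold Count"] = [2000.0, 800.0]

# Option 2: request dtype=object when the frame is created.
created = make_frame(dtype=object)
created.at["Project1", "Sold Count"] = [2000.0, 800.0]

print(make_frame().dtypes["Sold Count"])       # dtype inferred from numeric data
print(converted.at["Project1", "Sold Count"])  # the stored list
print(created.at["Project1", "Sold Count"])    # the stored list
```

Both options end up with an object column, which can hold arbitrary Python objects. The trade-off is that numeric operations on such a column are no longer vectorised.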